How can I see my job's memory usage?
For a job that has already finished, start with the command "finishedjobinfo -j". It prints the maximum memory used by your job.
If you want more detail, we also save memory information for the job every 5 minutes in a file under /sw/share/slurm/[cluster-name]/uppmax_jobstats//. Note that these files are only kept for 30 days.
You can also ask for an e-mail containing the log, when you submit your job with sbatch or start an "interactive" session, by adding a "-C usage_mail" flag to your command. Two examples:
sbatch -A testproj -p core -n 5 -C usage_mail batchscript1
interactive -A testproj -p node -n 1 -C "fat&usage_mail"
As you see, you have to be careful with the syntax when asking for two features, like "fat" and "usage_mail", at the same time. The logical AND operator "&" combines the features, and the whole expression must be quoted, since the shell would otherwise treat "&" as its own background operator.
If you use more RAM than you asked for, you will probably get an automatic e-mail anyway.
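The same request can probably also be placed inside the batch script itself, since sbatch reads "#SBATCH" comment lines as options. The following is a minimal sketch, assuming the project name "testproj" from the examples above; the echo line is only a placeholder for your real commands.

```shell
#!/bin/bash
#SBATCH -A testproj
#SBATCH -p node
#SBATCH -n 1
# Quote the feature expression so that "&" is read as a logical AND.
#SBATCH -C "fat&usage_mail"

# Placeholder for the actual work of the job (hypothetical).
echo "Job started"
```

With these lines in batchscript1, the job would be submitted with just "sbatch batchscript1".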
If, on the other hand, you want to view your memory consumption in real time, then you will have to log in to the node in question in another SSH session. (You will probably find a more recently updated memory information file there, named /var/spool/uppmax_jobstats/.)
If you naively look at memory consumption with tools like "ps" and "top", you can easily get the wrong impression of the system, because the Linux kernel uses free memory for buffers and caches to speed up other processes (but releases it as soon as applications request it).
If you know that you are the only user running on the node (because you requested a node job, for example), then you could issue the command "free -g" instead. That will show you how much memory is used and free on the whole system, excluding these caches. Look for the row called "-/+ buffers/cache".
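The figure in the "-/+ buffers/cache" row can also be computed by hand, which may help on newer versions of "free" that no longer print that row. The following is a minimal sketch, assuming a Linux /proc/meminfo with the usual MemTotal, MemFree, Buffers and Cached fields; the function name used_excluding_cache is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the memory (in MiB) used by applications,
# not counting kernel buffers and page cache.
# $1: a file in /proc/meminfo format (defaults to /proc/meminfo itself).
used_excluding_cache() {
    awk '
        $1 == "MemTotal:" { total   = $2 }   # values are given in kB
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { printf "%d\n", (total - free - buffers - cached) / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

Calling used_excluding_cache without arguments on the compute node reads the live values for the whole system, so the number is only meaningful for your job if you are alone on the node.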
If you require more detailed live information, then the tool called "smem" is probably your best option. Download the latest version from http://www.selenic.com/smem/download/ and unpack it in your home directory. Inside you will find an executable Python script, and by executing the command "smem -utk" you will see your user's memory usage reported in three different ways.
USS is the total memory used by the user without shared buffers or caches. RSS is the number reported in "top" and "ps"; i.e. including ALL shared buffered/cached memory. And then there's also the PSS figure which tries to calculate a proportional memory usage per user for all shared memory buffers and caches (i.e. the figure will fall between USS and RSS).
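To make the three figures concrete, they can be approximated for a single process from the kernel's /proc/<pid>/smaps file. This is only a sketch, assuming the usual Rss, Pss, Private_Clean and Private_Dirty fields; the function name mem_summary is made up for this example, and unlike "smem -u" it does not add up all processes of a user.

```shell
#!/bin/sh
# Hypothetical helper: sum USS, PSS and RSS (in kB) over all mappings.
# Input: text in /proc/<pid>/smaps format, from the named files or stdin.
mem_summary() {
    awk '
        $1 == "Rss:"           { rss += $2 }   # all resident pages, shared or not
        $1 == "Pss:"           { pss += $2 }   # shared pages split among their users
        $1 == "Private_Clean:" { uss += $2 }   # pages used by this process only
        $1 == "Private_Dirty:" { uss += $2 }
        END { printf "USS=%d kB PSS=%d kB RSS=%d kB\n", uss, pss, rss }
    ' "$@"
}
```

For example, "mem_summary /proc/$$/smaps" reports the figures for the current shell. The three numbers should come out in the order USS <= PSS <= RSS, as described above.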