Tintin user guide
This is the user guide for Tintin, a high-performance computing cluster at UPPMAX. Separate guides are available for the other UPPMAX systems.
Please refer to this user guide for up-to-date information.
All heavy usage of the cluster must go through the batch system, SLURM. On the login nodes, each process is limited to 30 minutes of CPU time.
The login node for Tintin is called tintin.uppmax.uu.se. (In fact, there may be three login nodes behind this name; you are automatically redirected to one of them.)
Tintin consists of 160 dual-socket compute servers (nodes). Each node has two 8-core AMD Opteron 6220 (Bulldozer) processors running at 3 GHz. 144 nodes have 64 GB of memory (ti1-ti144) and 16 nodes have 128 GB (ti145-ti160). All nodes are interconnected with a 2:1 oversubscribed QDR InfiniBand fabric. There are also four nodes with NVIDIA Tesla S2050 GPUs (ti161-ti164). In total, Tintin provides 2624 CPU cores in compute nodes.
The login nodes (tintin1-3) are identical to the compute nodes but have only 32 GB of memory.
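The total core count can be reproduced with a little shell arithmetic. This sketch assumes that the four GPU nodes have the same two 8-core CPUs as the other compute nodes. That assumption is what makes the numbers add up to 2624.

```shell
# Node counts as listed above. Assumption: the GPU nodes (ti161-ti164) have
# the same two 8-core Opteron 6220 CPUs as the other compute nodes.
thin_nodes=144      # ti1-ti144, 64 GB each
fat_nodes=16        # ti145-ti160, 128 GB each
gpu_nodes=4         # ti161-ti164
cores_per_node=$(( 2 * 8 ))

total_cores=$(( (thin_nodes + fat_nodes + gpu_nodes) * cores_per_node ))
echo "Compute cores: $total_cores"
```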
Important information about computer architecture
Tintin has a significantly different architecture from Milou, so it may be important to recompile your applications to get them to run as fast as (or faster than) on Milou. For best performance, UPPMAX recommends that you compare the speed of binaries built with more than one compiler.
Note that, due to features available only on Tintin CPUs, code compiled on Tintin will normally *not* run at all on Milou. So even for code that is not performance-critical, take care to run binaries compiled on Tintin only on Tintin, and binaries compiled on Milou only on Milou.
Besides missing libraries, a typical symptom of compiling on one machine and running on another is the program terminating with an "illegal instruction" error.
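One way to check whether a machine supports the instruction set extensions that Tintin-optimized binaries may use is to look at the CPU flags in /proc/cpuinfo. The function below is a hypothetical helper written for illustration, not an installed UPPMAX command. It checks only for AVX and FMA4, the extensions named in the gcc notes.

```shell
# Hypothetical helper: succeed only if the CPU flags include both AVX and FMA4.
# Reads /proc/cpuinfo by default; pass another file to check a saved copy.
has_tintin_isa() {
    local cpuinfo=${1:-/proc/cpuinfo}
    local flags ext
    # Take the first "flags" line (the kernel prints one per core).
    flags=$(grep -m1 '^flags' "$cpuinfo") || { echo "no flags line found"; return 1; }
    for ext in avx fma4; do
        case " $flags " in
            *" $ext "*) ;;                      # extension present
            *) echo "missing: $ext"; return 1 ;;
        esac
    done
    echo "ok: avx fma4"
}
```

A "missing: fma4" result on some machine suggests that a binary built on Tintin with FMA4 enabled may stop there with an illegal instruction.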
OS and software
There are several compilers available through the module system on Tintin. This gives you flexibility to obtain programs that run optimally on Tintin.
- gcc - the newest version usually generates the best code, if you tell it to use the new instructions. Run "module avail" to see which version is the newest.
The compiler executable is named gcc for C, g++ for C++, and gfortran for Fortran.
To use the new instructions available on Tintin (AVX and FMA4), give the additional options "-mavx -mfma4" to gcc. For good performance with this compiler you should also specify optimization level -O2 or -O3. You can also try -march=native, which enables all the instructions supported by the CPU of the machine you compile on.
- intel+mkl - usually generates good code, even on the AMD CPUs of Tintin. As with gcc, it is good to use the latest version.
The compiler executable is named icc for C, icpc for C++, and ifort for Fortran.
You should use optimization level at least -O2, preferably -O3 or -fast. You can also try the -mavx option to make the compiler emit AVX instructions, but please verify your results, as we have seen problems with this option for some codes.
- pgi - often generates somewhat slower code, but it is stable, so it is often easier to obtain working code, even with quite aggressive optimizations.
The compiler executable is named pgcc for C, pgCC for C++, and pgfortran, pgf77, pgf90, or pgf95 for Fortran.
For this compiler, you can generate code for Tintin using the options "-Mvect=simd:128 -tp bulldozer-64". Also use optimization level at least -O2, preferably -Ofast. Compile times are much longer, but the result is often worth the wait. It is possible to generate 256-bit vector instructions using "-Mvect=simd:256" instead of "-Mvect=simd:128", but our tests show that the compiler often generates suboptimal code with this option. In any case, 256-bit vector instructions give little benefit over 128-bit ones on the Bulldozer CPUs.
- open64 - This compiler has special optimizations for the Bulldozer CPU, and can give good results, but it tends to break code at high optimization levels. You can use this compiler by doing
module load open64/amd-18.104.22.168
The compiler executable is named opencc for C, openCC for C++, and openf90 or openf95 for Fortran.
The options "-mavx -mfma4 -mcpu=bdver1 -mtune=bdver1" generate code for Tintin. Also use optimization level at least -O2, preferably -Ofast.
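To make it easier to compare compilers, the Tintin-specific flags from the list above can be collected in one place. This is a hypothetical helper, not an UPPMAX-provided command. It uses the conservative optimization levels mentioned above. You may raise them to the higher levels suggested for each compiler once you have verified that your program still gives correct results.

```shell
# Hypothetical helper: print the Tintin-specific flags suggested above for a compiler family.
# Optimization levels are the conservative ones; raise them once results are verified.
tintin_cflags() {
    case $1 in
        gcc)    echo "-O3 -mavx -mfma4" ;;
        intel)  echo "-O3" ;;   # -mavx is optional; verify results if you add it
        pgi)    echo "-O2 -Mvect=simd:128 -tp bulldozer-64" ;;
        open64) echo "-O2 -mavx -mfma4 -mcpu=bdver1 -mtune=bdver1" ;;
        *)      echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Example (myprog.c is a placeholder; load the matching module first):
#   gcc    $(tintin_cflags gcc)    -o myprog_gcc    myprog.c
#   icc    $(tintin_cflags intel)  -o myprog_intel  myprog.c
#   pgcc   $(tintin_cflags pgi)    -o myprog_pgi    myprog.c
#   opencc $(tintin_cflags open64) -o myprog_open64 myprog.c
```

Building the same source once per compiler and timing each binary on a compute node is one way to make the compiler comparison recommended earlier in this guide.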
See our software pages for more details about the OS, compilers and installed software.
The following commands are useful:
- uquota - telling you about your file system usage.
- projinfo - telling you about the CPU hour usage of your projects.
- jobinfo - telling you about running and waiting jobs on Tintin.
- finishedjobinfo - telling you about finished jobs on Tintin.
- projmembers - telling you about project memberships.
- projsummary [project id] - summarizes some useful information about projects
For SLURM commands and for commands like projinfo, jobinfo and finishedjobinfo, you can use the "-M" flag to get information about a system that you are not logged in to. For example, when logged in to Tintin, you can check the current core-hour usage on Milou with the command projinfo -M Milou
Accounts and log in
All access to this system is via secure shell (SSH) interactive login to the login node, using the domain name tintin.uppmax.uu.se. Replace username below with your UPPMAX user name:
ssh -AX username@tintin.uppmax.uu.se
To get a user account and start using UPPMAX, see the Getting Started page.
For questions concerning accounts and access to Tintin, please contact UPPMAX support.
Note that the machine you arrive at when you log in is only a so-called login node, where you can do various smaller tasks. Limits are in place there to restrict your usage. For larger tasks, use our batch system, which runs your jobs on the compute nodes of the cluster.
Using the batch system
To allow a fair and efficient usage of the system we use a resource manager to coordinate user demands. On Tintin we use the SLURM software. Read our SLURM user guide for detailed information on how to use SLURM.
- There is a job walltime limit of ten days (240 hours).
- We restrict each user to at most 5000 running and waiting jobs in total.
- Each project has an allocation of CPU hours over a running 30-day period. Jobs are not forbidden once the allocation is overdrawn. Instead, you can still submit jobs, but they get a very low queue priority, so they may run anyway if enough nodes happen to be free.
- Very wide jobs are started only in connection with a maintenance window (just before it or at the end of it). Maintenance windows are planned for the first Wednesday of each month. On Tintin, a "very wide" job is one that asks for 54 nodes or more.
- $SNIC_TMP - Path to node-local temporary disk space
The $SNIC_TMP variable contains the path to a node-local temporary directory that you can use when running your jobs to get maximum disk performance, since the disks are local to the compute node. This directory is created automatically on your (first) compute node before the job starts, and deleted automatically when the job has finished.
The path in $SNIC_TMP is /scratch/$SLURM_JOB_ID, where the job variable $SLURM_JOB_ID contains the unique identifier of your job.
WARNING: In "core" jobs (see below), if you write data in the /scratch directory but outside the /scratch/$SLURM_JOB_ID directory, your data may be deleted automatically while your job is running.
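A common pattern is to copy input files to $SNIC_TMP at the start of a job, work there, and copy the results back before the job ends, because the directory is deleted afterwards. The helper functions below sketch that pattern for use inside a job script. The function names, my_program and the file names are all hypothetical placeholders.

```shell
# Hypothetical helpers for a job script that works on node-local disk.
# Inside a Tintin job, SNIC_TMP is /scratch/$SLURM_JOB_ID.

# Copy one or more input files into the node-local directory.
stage_in() {
    cp -- "$@" "${SNIC_TMP:?SNIC_TMP is not set; run this inside a job}/"
}

# Copy a file from the node-local directory back to shared storage.
# Do this before the job ends: the directory is deleted afterwards.
stage_out() {
    cp -- "${SNIC_TMP:?SNIC_TMP is not set; run this inside a job}/$1" "$2"
}

# Example job body (placeholders):
#   stage_in "$HOME/input.dat"
#   cd "$SNIC_TMP"
#   ./my_program input.dat > output.dat
#   stage_out output.dat "$HOME/output.dat"
```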
Details about the "core" and "node" partitions
A normal Tintin node contains 64 GB of RAM and sixteen compute cores. An equal share of RAM for each core would mean that each core gets at most four GB of RAM. This simple calculation gives one of the limits mentioned below for a "core" job.
You need to choose between running a "core" job or a "node" job. A "core" job must keep within certain limits, to be able to run together with up to fifteen other "core" jobs on a shared node. A job that cannot keep within those limits must run as a "node" job.
Some serial jobs must run as "node" jobs. You tell Slurm that you need a "node" job with the flag "-p node". (If you do not specify this, your job runs as a "core" job by default.)
A "core" job:
- Will use a part of the resources on a node, from a 1/16 share to a 15/16 share of a node.
- Must request fewer than 16 cores, i.e. between "-n 1" and "-n 15".
- Must not demand "-N", "--nodes", or "--exclusive".
- Should preferably not demand "--mem".
- Must not demand to run on a fat node (see below for an explanation of "fat"), a devel node or a GPU node.
- Must not use more than four GB of RAM for each core it requests. If a job needs half of the RAM, i.e. 32 GB, you also need to reserve at least half of the cores on the node, i.e. eight cores, with the "-n 8" flag.
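The four-GB-per-core rule translates directly into a core count: divide the memory your job needs by four GB and round up. The hypothetical helper below does that arithmetic and prints suggested sbatch flags. It assumes a normal 64 GB node, takes memory in whole GB, and ignores any other reasons for choosing a "node" job.

```shell
# Hypothetical helper: suggest sbatch flags from the RAM (whole GB) a job needs
# on a normal 64 GB Tintin node, using the 4 GB-per-core rule for "core" jobs.
tintin_mem_flags() {
    local mem_gb=$1
    local cores=$(( (mem_gb + 3) / 4 ))   # ceil(mem_gb / 4) in integer arithmetic
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    if [ "$cores" -le 15 ]; then
        echo "-p core -n $cores"
    else
        echo "-p node -N 1"                # needs 16 cores' worth of RAM: a whole node
    fi
}
```

For the 32 GB example above, tintin_mem_flags 32 prints "-p core -n 8".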
A "core" job is charged to your project as one "core hour" (sometimes also called a "CPU hour") per allocated core, for each wallclock hour that it runs. A "node" job, on the other hand, is charged as sixteen core hours for each wallclock hour that it runs, multiplied by the number of nodes that you asked for.
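The accounting rule can be written as a small calculation. This hypothetical helper assumes whole wallclock hours and multiplies the numbers as described above.

```shell
# Hypothetical helper: core hours charged to a project.
#   tintin_core_hours core <cores> <hours>   -> cores * hours
#   tintin_core_hours node <nodes> <hours>   -> 16 * nodes * hours
# Assumes whole wallclock hours (shell arithmetic is integer-only).
tintin_core_hours() {
    case $1 in
        core) echo $(( $2 * $3 )) ;;
        node) echo $(( 16 * $2 * $3 )) ;;
        *)    echo "usage: tintin_core_hours core|node COUNT HOURS" >&2; return 1 ;;
    esac
}
```

For example, an 8-core "core" job running for 10 hours is charged 80 core hours, while a 2-node "node" job running for 10 hours is charged 320.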
Tintin has two node types: thin nodes, which are the typical cluster nodes, and fat nodes, which have double the normal amount of memory (128 GB). To get nodes with more RAM, add "-C mem128GB" or "-C fat" to your job submission line (for example, your sbatch command). This makes sure that every node in your job has 128 GB of RAM. Please note that there are only sixteen nodes with this amount (or more) of RAM.
How to run on a node with GPU
Specify the partition "gpu", as in this example:
$ interactive -A p2010999 -p gpu -n 16 -t 8:00:00
for running interactively on a GPU-enabled node for up to eight hours.
File storage and disk space
At UPPMAX we have a few different kinds of storage areas for files; see the Disk Storage User Guide for more information and recommended use.