MPI and OpenMP user guide
This is a short tutorial about how to use the queuing system, and how to compile and run MPI and OpenMP jobs.
Compiling and running parallel programs on UPPMAX clusters.
Introduction
These notes use brief examples to show how to compile and run serial and parallel programs on the clusters at UPPMAX.
Section 1 shows how to compile and run serial programs, written in Fortran, C, or Java, on the login nodes. Things work very much like on any Unix system, but the subsections also demonstrate the use of modules.
Section 2 shows how to run serial programs on the execution nodes by submitting them as batch jobs to the queue system SLURM.
Section 3 demonstrates parallel message-passing programs in C and Fortran, using MPI.
Section 4 demonstrates threaded programs in C and Fortran using OpenMP directives. These programs must be executed on processors on the same node, since the threads share memory.
Section 5, finally, demonstrates threaded programs using pthreads instead of OpenMP.
All programs are of the trivial "hello, world" type. The point is to demonstrate how to compile and execute the programs, not how to write parallel programs.
Serial programs on the login node
Fortran programs
Enter the following Fortran program and save it in the file hello.f:
C     HELLO.F : PRINT MESSAGE ON SCREEN
      PROGRAM HELLO
      WRITE(*,*) "hello, world"
      END
To compile this, you must first decide which compiler to use. At UPPMAX two different Fortran compilers are installed: GCC (gfortran) and Intel (ifort).
For this example we will use the GNU Compiler Collection (GCC) compilers installed on UPPMAX, so the gfortran command is used to compile Fortran code. GFortran is fully compliant with the Fortran 95 standard and includes legacy F77 support. In addition, a significant number of Fortran 2003 and Fortran 2008 features are implemented. Fortran 2008 and Fortran 2018 have full support from gcc/9.
A module must be loaded before the compilers can be used. Check what is available, then load a specific version: choose a recent one, or one you know works for your needs.
$ module avail gcc
$ module load gcc/10.3.0
To compile, enter the command:
$ gfortran -o hello hello.f
To run, enter:
$ ./hello
hello, world
To compile with good optimization you can give the compiler the -Ofast flag, but be a bit careful with it: the compiler is sometimes overenthusiastic in its optimization, especially if your code contains programming errors. If you are responsible for the code you ought to fix those errors; if it is someone else's code, your options are often more limited. Should -Ofast not work for your code, try -O3 instead.
Intel oneAPI (intel) compilers are also installed on UPPMAX, so the ifort command can be used to compile Fortran code. ifort is fully compliant with the Fortran 95 standard and includes legacy F77 support. In addition, a significant number of Fortran 2003 and Fortran 2008 features are implemented; Fortran 2008 has full support from intel/18, and Fortran 2018 from intel/19+.
If you want to use Intel, check what is available and choose a recent version, or one you know works for your needs.
$ module avail intel
$ module load intel/20.4
To compile, enter the command:
$ ifort -o hello hello.f
To run, enter:
$ ./hello
hello, world
C programs
Enter the following C program and save it in the file hello.c:
/* hello.c : print message on screen */
#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}
To compile using gcc installed with the system (4.8.5, 2015) and with no optimization, use the gcc command.
$ gcc -o hello hello.c
To use a newer version of gcc, load a module:
$ module load gcc/10.3.0
$ gcc -o hello hello.c
To compile with optimization:
$ gcc -O3 -o hello hello.c
The C11 standard is fully supported from gcc/4.8, and the C17 standard (a bug-fix revision of C11) from gcc/8.
To use the intel compiler, first load the intel module:
$ module load intel/20.4
and then compile with the command icc:
$ icc -o hello hello.c
To run, enter:
$ ./hello
hello, world
The C11 and C17 (bug-fix) standards are supported from intel/17+ (fully from intel/19).
Java programs
Enter the following Java program and save it in the file hello.java:
/* hello.java : print message on screen */
class hello
{
    public static void main(String[] args)
    {
        System.out.println("hello, world");
    }
}
Before compiling a java program, the module java has to be loaded.
To load the java module, enter the command:
$ module load java
To check that the java module is loaded, use the command:
$ module list
To compile, enter the command:
$ javac hello.java
The java module is not always needed to run the program.
To verify this, unload the java module:
$ module unload java
To run, enter:
$ java hello
hello, world
Running serial programs on execution nodes
Jobs are submitted to execution nodes through the resource manager.
We use SLURM on our clusters.
To run the serial program hello as a batch job using SLURM, enter the following shell script in the file hello.sh:
#!/bin/bash -l
# hello.sh : execute hello serially in SLURM
# command: $ sbatch hello.sh
# sbatch options use the sentinel #SBATCH
# You must specify a project
#SBATCH -A your_project_name
#SBATCH -J serialtest
# Put all output in the file hello.out
#SBATCH -o hello.out
# request 5 seconds of run time
#SBATCH -t 0:0:5
# request one core
#SBATCH -p core -n 1
./hello
The last line in the script is the command used to start the program.
Submit the job to the batch queue:
$ sbatch hello.sh
The program's output to stdout is saved in the file named by the -o flag.
$ cat hello.out
hello, world
MPI using the OpenMPI system
Before compiling a program for MPI we must choose, in addition to the compiler, which version of MPI we want to use. At UPPMAX there are two, openmpi and intelmpi. These, with their versions, are compatible only with a subset of the gcc and intel compiler versions. The lists below summarise the best choices.
- GCC
  - v5: gcc/5.3.0 openmpi/1.10.3
  - v6: gcc/6.3.0 openmpi/2.1.0
  - v7: gcc/7.4.0 openmpi/3.1.3
  - v8: gcc/8.3.0 openmpi/3.1.3
  - v9: gcc/9.3.0 openmpi/3.1.3
  - v10: gcc/10.3.0 openmpi/3.1.6 or openmpi/4.1.0
  - v11: gcc/11.2.0 openmpi/4.1.1 (will work on Miarka)
- Intel
  - v18: intel/18.3 openmpi/3.1.3
  - v20: intel/20.4 openmpi/3.1.6 or openmpi/4.0.4
Check this compatibility page for a more complete picture of compatible versions.
C programs
Enter the following MPI program in C and save it in the file hello.c:
/* hello.c : mpi program in c printing a message from each process */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int npes, myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("From process %d out of %d, Hello World!\n", myrank, npes);
    MPI_Finalize();
    return 0;
}
As noted above, MPI is available at UPPMAX as openmpi and intelmpi. For this example we will use openmpi.
To load the openmpi module, enter the command below or choose other versions according to the lists above.
$ module load gcc/10.3.0 openmpi/3.1.6
To check that the openmpi module is loaded, use the command:
$ module list
The command to compile a C program for MPI is mpicc. Which compiler this command invokes depends on which compiler module was loaded before openmpi.
To compile, enter the command:
$ mpicc -o hello hello.c
Add optimization and other flags to the mpicc command just as you would for the underlying compiler. So if gcc is used and you wish to compile an MPI program written in C with good optimization, use a command similar to the following:
$ mpicc -O3 -o hello hello.c
To run the MPI program hello using the batch system, enter the following shell script in the file hello.sh:
#!/bin/bash -l
# hello.sh : execute parallel mpi program hello on slurm
# use openmpi
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -A your_project_name
#SBATCH -J mpitest
#SBATCH -o hello.out
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
#SBATCH -p node -n 8
# load the same modules that were used to compile hello
module load gcc/10.3.0 openmpi/3.1.6
mpirun ./hello
The last line in the script is the command used to start the program.
The last word on the last line is the program name hello.
Submit the job to the batch queue:
$ sbatch hello.sh
The program's output to stdout is saved in the file named by the -o flag.
A test run of the above program yields the following output file:
$ cat hello.out
From process 4 out of 8, Hello World!
From process 5 out of 8, Hello World!
From process 2 out of 8, Hello World!
From process 7 out of 8, Hello World!
From process 6 out of 8, Hello World!
From process 3 out of 8, Hello World!
From process 1 out of 8, Hello World!
From process 0 out of 8, Hello World!
Fortran programs
The following example program does numerical integration to find Pi (inefficiently, but it is just an example). Save it in the file testampi.f90:
program testampi
  implicit none
  include 'mpif.h'
  double precision :: h,x0,x1,v0,v1
  double precision :: a,amaster
  integer :: i,intlen,rank,size,ierr,istart,iend

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD,size,ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD,rank,ierr)
  intlen=100000000
  write (*,*) 'I am node ',rank+1,' out of ',size,' nodes.'

  h=1.d0/intlen
  istart=(intlen-1)*rank/size
  iend=(intlen-1)*(rank+1)/size
  write (*,*) 'start is ', istart
  write (*,*) 'end is ', iend

  a=0.d0
  do i=istart,iend
    x0=i*h
    x1=(i+1)*h
    v0=sqrt(1.d0-x0*x0)
    v1=sqrt(1.d0-x1*x1)
    a=a+0.5*(v0+v1)*h
  enddo
  write (*,*) 'Result from node ',rank+1,' is ',a

  call MPI_Reduce(a,amaster,1, &
       MPI_DOUBLE_PRECISION,MPI_SUM,0,MPI_COMM_WORLD,ierr)
  if (rank.eq.0) then
    write (*,*) 'Result of integration is ',amaster
    write (*,*) 'Estimate of Pi is ',amaster*4.d0
  endif

  call MPI_Finalize(ierr)
  stop
end program testampi
The program can be compiled by this procedure, using mpif90:
$ module load intel/20.4 openmpi/3.1.6
$ mpif90 -Ofast -o testampi testampi.f90
The program can be run by creating a submit script sub.sh:
#!/bin/bash -l
# execute parallel mpi program in slurm
# command: $ sbatch sub.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J mpitest
#SBATCH -A your_project_name
#SBATCH -o pi
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
#
#SBATCH -p node -n 8
module load intel/20.4 openmpi/3.1.6
mpirun ./testampi
Submit it:
$ sbatch sub.sh
Output from the program on Rackham, as saved in the file pi:
I am node 8 out of 8 nodes.
start is 87499999
end is 99999999
I am node 3 out of 8 nodes.
start is 24999999
end is 37499999
I am node 5 out of 8 nodes.
start is 49999999
end is 62499999
I am node 2 out of 8 nodes.
start is 12499999
end is 24999999
I am node 7 out of 8 nodes.
start is 74999999
end is 87499999
I am node 6 out of 8 nodes.
start is 62499999
end is 74999999
I am node 1 out of 8 nodes.
start is 0
end is 12499999
I am node 4 out of 8 nodes.
start is 37499999
end is 49999999
Result from node 8 is 4.0876483237300587E-002
Result from node 5 is 0.1032052706959522
Result from node 2 is 0.1226971551244773
Result from node 3 is 0.1186446918315650
Result from node 7 is 7.2451466712425514E-002
Result from node 6 is 9.0559231928350928E-002
Result from node 1 is 0.1246737119371059
Result from node 4 is 0.1122902087263801
Result of integration is 0.7853982201935574
Estimate of Pi is 3.141592880774230
OpenMP
OpenMP uses threads that share memory, so all threads of a program run on the same node. OpenMP is supported by both the gcc and intel compilers, for C/C++ and Fortran. Don't confuse it with OpenMPI, which is an open-source MPI library. OpenMP support is built into all modern compilers.
Depending on your preferences load the chosen compiler:
$ module load gcc/12.1.0
or
$ module load intel/20.4
C programs
Enter the following OpenMP program in C and save it in the file hello_omp.c:
/* hello_omp.c : openmp program in c printing a message from each thread */
#include <stdio.h>
#include <omp.h>

int main()
{
    int nthreads, tid;

#pragma omp parallel private(nthreads, tid)
    {
        nthreads = omp_get_num_threads();
        tid = omp_get_thread_num();
        printf("From thread %d out of %d, hello, world\n", tid, nthreads);
    }
    return 0;
}
To compile, enter the command (note the -fopenmp or -qopenmp flag depending on compiler):
$ gcc -fopenmp -o hello_omp hello_omp.c
or
$ icc -qopenmp -o hello_omp hello_omp.c
Also here you should add optimization flags, such as -O3, as appropriate.
To run the OpenMP program hello_omp using the batch system, enter the following shell script in the file hello.sh:
#!/bin/bash -l
# hello.sh : execute parallel openmp program hello on slurm
# use openmp
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J omptest
#SBATCH -A your_project_name
#SBATCH -o hello.out
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
#SBATCH -p node -n 8
uname -n
# Tell the openmp program to use 8 threads
export OMP_NUM_THREADS=8
module load intel/20.4 # or gcc...
# if STACKLIMIT is unset or empty, this only prints the current stack limit
ulimit -s $STACKLIMIT
./hello_omp
The last line in the script is the command used to start the program.
Submit the job to the batch queue:
$ sbatch hello.sh
The program's output to stdout is saved in the file named by the -o flag.
A test run of the above program yields the following output file:
$ cat hello.out
r483.uppmax.uu.se
unlimited
From thread 0 out of 8, hello, world
From thread 1 out of 8, hello, world
From thread 2 out of 8, hello, world
From thread 3 out of 8, hello, world
From thread 4 out of 8, hello, world
From thread 6 out of 8, hello, world
From thread 7 out of 8, hello, world
From thread 5 out of 8, hello, world
Fortran programs
Enter the following OpenMP program in Fortran and save it in the file hello_omp.f90:
PROGRAM HELLO
  INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)

  ! Obtain thread number
  TID = OMP_GET_THREAD_NUM()
  PRINT *, 'Hello World from thread = ', TID

  ! Only master thread does this
  IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
  END IF

! All threads join master thread and disband
!$OMP END PARALLEL

END
With gcc compiler:
$ gfortran hello_omp.f90 -o hello_omp -fopenmp
and with Intel compiler:
$ ifort hello_omp.f90 -o hello_omp -qopenmp
Run with:
$ ./hello_omp
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 0
Hello World from thread = 3
Number of threads = 4
A batch file would look similar to the C version, above.
Pthreads
Pthreads (POSIX threads) are more low-level than OpenMP. For a beginner this means that OpenMP makes it easier to get the expected speed-up with only a few lines of code. On the other hand, pthreads may let you get more efficiency out of your code, though with considerably more effort. Pthreads is native to C/C++; with an additional POSIX library for Fortran it can be used from Fortran as well.
Enter the following program in C and save it in the file hello_pthreads.c:
/* hello_pthreads.c : create system pthreads and print a message from each thread */
#include <stdio.h>
#include <pthread.h>

// "const int NTHR = 8;" does not work for setting the array length of "tid":
// in C a const int is not a constant expression, so use "#define" instead
#define NTHR 8

int nt = NTHR, tid[NTHR];
pthread_attr_t attr;

void *hello(void *id)
{
    printf("From thread %d out of %d: hello, world\n", *((int *) id), nt);
    pthread_exit(0);
}

int main()
{
    int i;
    pthread_t thread[NTHR];

    /* system threads */
    pthread_attr_init(&attr);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    /* create threads */
    for (i = 0; i < nt; i++) {
        tid[i] = i;
        pthread_create(&thread[i], &attr, hello, (void *) &tid[i]);
    }

    /* wait for threads to complete */
    for (i = 0; i < nt; i++)
        pthread_join(thread[i], NULL);

    pthread_attr_destroy(&attr);
    return 0;
}
To compile, enter the commands
$ module load gcc/10.2.0
$ gcc -pthread -o hello_pthread hello_pthreads.c
To run the pthread program hello using the batch system, enter the following shell script in the file hello.sh:
#!/bin/bash -l
# hello.sh : execute parallel pthreaded program hello on slurm
# command: $ sbatch hello.sh
# slurm options use the sentinel #SBATCH
#SBATCH -J pthread
#SBATCH -A your_project_name
#SBATCH -o hello.out
#
# request 5 seconds of run time
#SBATCH -t 00:00:05
# request a whole node, so that all threads
# run on processors on the same node
#SBATCH -p node -n 8
uname -n
./hello_pthread
The last line in the script is the command used to start the program.
Submit the job to the batch queue:
$ sbatch hello.sh
The program's output to stdout is saved in the file named by the -o flag.
A test run of the above program yields the following output file:
$ cat hello.out
r483.uppmax.uu.se
From thread 0 out of 8: hello, world
From thread 4 out of 8: hello, world
From thread 5 out of 8: hello, world
From thread 6 out of 8: hello, world
From thread 7 out of 8: hello, world
From thread 1 out of 8: hello, world
From thread 2 out of 8: hello, world
From thread 3 out of 8: hello, world