Running jobs and accessing the systems
-
How do I use the modules?
-
To make running installed programs easier, use the module command. Each installed module sets the environment variables that the program needs in order to run, such as PATH, LD_LIBRARY_PATH and MANPATH.
To see which modules are available, type module avail.
To see which modules you have loaded, type module list.
Note: for modules to work in jobs run by the batch system SLURM (for example on Kalkyl), your submit script must start with
#!/bin/bash -l
so that modules can be loaded in the script.
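A minimal submit script that follows this rule might look like the sketch below. The project name p2010099 is the example project used elsewhere on this page; the task count, the time limit and the module loaded are placeholders to adjust for your own job.

```shell
#!/bin/bash -l
# Sketch of a SLURM submit script. Project, task count and time limit
# are placeholders.
#SBATCH -A p2010099
#SBATCH -n 1
#SBATCH -t 00:10:00

# "bash -l" starts a login shell, which reads the startup files that
# define the "module" command, so modules can be loaded here.
module load uppmax
module list

# ... run your program here ...
```

Submit it with sbatch, for example: sbatch myjob.sh (the file name is hypothetical).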
For more information, use this link.
-
I can't log in.
-
From outside Sweden
Please note that at the moment we do not allow computers outside Sweden to connect to our resources. If you try this you will get "ssh_exchange_identification: Connection closed by remote host" (OpenSSH client) or "Server unexpectedly closed network connection" (Putty client).
If you need to access UPPMAX from outside Sweden, please use the VPN service at Uppsala University (http://uadm.uu.se/iti/helpdesk/tjanster/vpn). If this is not possible for you, please contact support, providing the following information: your user name, the IP address (or DNS name) of the computer you wish to connect from, and for how long you need this access.
From within Sweden
If you get "ssh_exchange_identification: Connection closed by remote host" and are connecting from a computer in Sweden, this is typically caused by your computer not having a proper DNS name, or by its forward and reverse name resolution not matching. If this is the case, please contact your ISP and ask them to correct it. If you are certain that the computer you are connecting from is in Sweden and has a proper DNS name, please contact support, providing the DNS name of the computer, and we will add it to the list of networks allowed to connect.
If it still fails
Finally, there may be a service stop; please check news/events on the front page.
Note
You can check forward and reverse name resolution on Linux with the following commands:
- Forward resolution: 'host mycomputername.domain.tld'. Replace mycomputername.domain.tld with your computer's actual name. Example:
$ host kalkyl1.uppmax.uu.se
kalkyl1.uppmax.uu.se has address 130.238.136.97
- Reverse resolution: 'host my_ipnumber'. Replace my_ipnumber with your computer's actual IP address. Example:
$ host 130.238.136.97
97.136.238.130.in-addr.arpa domain name pointer kalkyl1.uppmax.uu.se
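The two checks can be combined in a small script. This is a sketch that assumes the "host" command shown above is installed; the function names are hypothetical, only the first IPv4 address and the first PTR record are considered, and names are compared literally (DNS names are case-insensitive, so a difference only in letter case would be reported as a mismatch).

```shell
#!/bin/sh
# Sketch: check that forward and reverse name resolution agree.

# Print the first IPv4 address from the output of "host NAME".
parse_forward() {
    awk '/ has address / { print $4; exit }'
}

# Print the first PTR name from the output of "host IP", without the
# trailing dot that DNS names often carry.
parse_reverse() {
    awk '/ domain name pointer / { sub(/\.$/, "", $5); print $5; exit }'
}

# Usage: check_name_resolution mycomputername.domain.tld
check_name_resolution() {
    local name=$1 ip back
    ip=$(host "$name" | parse_forward)
    if [ -z "$ip" ]; then
        echo "No forward (A) record found for $name" >&2
        return 1
    fi
    back=$(host "$ip" | parse_reverse)
    if [ "$back" = "$name" ]; then
        echo "OK: $name -> $ip -> $back"
    else
        echo "Mismatch: $name -> $ip -> ${back:-no PTR record}" >&2
        return 1
    fi
}
```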
-
How can I display my disk quota
-
To limit the amount of disk space each user can allocate, UPPMAX uses a disk quota system. The default disk quota is 32 GByte in your home directory and 250 GByte in your global scratch area.
When you exceed the hard limit, you cannot store any more files or data; you have to remove some files or request more quota. You may exceed the soft limit, up to the hard limit, for a limited time: once you have been over the soft limit for as many days as the grace period permits, you cannot store any more files or data until you have reduced your disk usage below the soft limit.
To display your current usage and quota, first load the module 'uppmax' and then run the command 'uquota'.
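When you are close to your quota, it helps to see which files or directories use the most space. A sketch using the standard du and sort commands; the function name largest_entries is hypothetical, and sizes are printed in KiB with the largest entry last.

```shell
#!/bin/sh
# Sketch: list the entries of a directory by disk usage, largest last.
largest_entries() {
    local dir=${1:-$HOME}
    # ".[!.]*" also matches hidden entries such as .cache. Patterns that
    # match nothing are passed literally to du, whose error is discarded.
    du -sk -- "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -n
}
```

For example, largest_entries ~ | tail shows the ten largest entries in your home directory.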
-
My program crashes with the error message "Bus error". Why?
-
This may happen if your executable binary file is deleted while the program is running. For example, if you recompile your program, the previous executable file is deleted, which can cause running instances of the program to crash with "Bus error". If you need to recompile or reinstall while the program is running, the recommended solution is to create a copy of the executable file and run the copy; the original executable file can then be safely deleted. Alternatively, rename the currently running file to something new and unique (using the mv command) before recompiling or reinstalling your program.
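The renaming advice can be wrapped in a small helper. This is a sketch that assumes the executable and its new name are on the same file system; the function name move_aside is hypothetical.

```shell
#!/bin/sh
# Sketch: move an executable aside under a unique name before rebuilding.
# Renaming with mv (within one file system) does not change the file's
# contents or inode, so instances already running from it keep working,
# while the rebuild writes a brand-new file at the original path.
move_aside() {
    local exe=$1
    local backup
    backup="$exe.old.$(date +%Y%m%d-%H%M%S).$$"
    mv -- "$exe" "$backup" && printf '%s\n' "$backup"
}
```

For example, move_aside ./myprog && make would move a hypothetical ./myprog aside and then rebuild it; the old copy can be deleted once no running instance uses it.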
-
My program suddenly seems to stop executing but it does not crash, the process is still alive. What is wrong?
-
This may happen if your executable binary file is deleted while the program is running. For example, if you recompile your program, the previous copy of the executable file is deleted, which can cause running instances of the program to freeze in this way. If you need to recompile while the program is running, the recommended solution is to create a copy of the executable file and run the copy; the original executable file can then be safely deleted. Alternatively, rename the currently running file to something new and unique (using the mv command) before recompiling or reinstalling your program.
-
How can I compile MPI and OpenMP programs on UPPMAX computers?
-
Please see the tutorial on MPI and OpenMP
-
I have strange problems with my text-files / scripts when they have been copied from other computers
-
Lines of text files are terminated differently on UNIX, Windows and Mac.
This might happen because your file was created on, for instance, a Windows computer and later copied to the UNIX machines at UPPMAX. Text files have different line terminations on, for instance, Windows and UNIX. If this is an ordinary text file, you can test this with the "file" command, like this:
$ file myfile
myfile: ASCII text, with CRLF line terminators
CRLF terminators tell you that each line of the file ends with both a carriage return and a line feed, as on Windows. On Ra, the file can simply be converted to a UNIX-style text file with the "dos2unix" command:
$ dos2unix myfile
dos2unix: converting file myfile to UNIX format ...
Checking the file again with the "file" command shows that it now has ordinary UNIX line terminators (only LF):
$ file myfile
myfile: ASCII text
Similarly, a file from a Mac can be converted with the "mac2unix" command.
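If dos2unix is not installed on the machine you are using, standard tools can do the same conversion for CRLF files. A sketch; the function names has_cr and strip_cr are hypothetical. It removes every carriage return, so it is not a substitute for mac2unix, which turns lone CRs into line breaks.

```shell
#!/bin/sh
# Sketch: detect and strip carriage returns without dos2unix.
CR=$(printf '\r')

# Succeeds if the file contains at least one carriage-return (CR) byte.
has_cr() {
    grep -q "$CR" -- "$1"
}

# Remove every CR byte from the file in place (CRLF -> LF). Writing back
# with cat keeps the original file's permissions.
strip_cr() {
    local tmp status
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat -- "$tmp" > "$1"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

For example, has_cr myscript.sh && strip_cr myscript.sh converts a script only if it contains CR bytes.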
If a shell script behaves strangely, it can be due to the same problem. Trying to execute a script with the wrong end-of-line markers might result in an error message such as the one below:
$ cat myscript.sh
#!/bin/sh
./program
$ ./myscript.sh
: No such file or directory
The "file" command does not help in this case, as it simply tells us that the script is a "Bourne shell script text executable". Opening the script in "vi" shows "myscript.sh" [dos] 2L, 22C at the bottom of the screen; the "[dos]" is a sure marker of the same problem. Opening the same file in emacs reveals the same thing (-uu-(DOS)---F1 myscript.sh). Convert the script to UNIX format with the "dos2unix" command as described above. An alternative is to copy the file, use the "dos2unix" command on the copy, and compare the file sizes with "ls -l":
$ ls -l testme.sh
-rwxr-xr-x 1 daniels uppmax_staff 22 Dec 15 10:53 testme.sh
$ dos2unix testme.sh
dos2unix: converting file testme.sh to UNIX format ...
$ ls -l testme.sh
-rwxr-xr-x 1 daniels uppmax_staff 20 Dec 15 10:54 testme.sh
Note that the file size went from 22 bytes to 20, reflecting that the two CR bytes, one just before the end of each line, were removed.
-
My job has very low priority! What can be wrong?
-
One reason could be that your project has consumed its allocated hours.
Background: Every job (and user) has to be associated with a project. Say that you are working in a SNIC project s00101-01 that has been granted 10000 core hours every month. On the first of every month, s00101-01 is credited with 10000 hours, and jobs that run in that project are given a high priority. When enough jobs have run to consume this amount of hours, the priority is set very low.
On SLURM:
To check status for your projects, run
$ module load uppmax
$ projinfo
(Counting the number of core hours used since 2010-05-01/00:00:00 until now.)
Project             Used[h]   Current allocation [h/month]
   User
-----------------------------------------------------
s00101-01          72779.48   50000
   some-user       72779.48
If there are enough jobs left in projects that have not gone over their allocation, jobs associated with this project will therefore be stuck waiting at the bottom of the jobinfo list until the beginning of June.
On the other hand, they may be lucky and get some free nodes, so they can run already in May.
The job queue, which you can see with the jobinfo command, is ordered by job priority. Jobs with a high priority will run first, if they can (depending on the number of free nodes and any special demands on e.g. memory).
Job priority is the sum of the following numbers (you may use the sprio command to get exact numbers for individual jobs):
- A high number (100000 or 130000) if your project is within its allocation and a lower number otherwise. There are different grades of lower numbers, depending on how many times over its allocation your project is. As an example, a 2000 core hour project gets priority 70000 when it has used more than 2000 core hours, gets priority 60000 when it has used more than 4000 core hours, gets priority 50000 when it has used more than 6000 core hours, and so on. The lowest grade gives priority 10000 and does not go down from there.
- The number of minutes the job has been waiting in queue (for a maximum of 20160 after fourteen days).
- A job size number, higher for more nodes allocated to your job, for a maximum of 104.
- A very, very high number for "short" jobs, i.e. very short jobs that are not wider than four nodes. See the Kalkyl User Guide for more information about this.
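The sum can be sketched as simple integer arithmetic. This is only a rough model of the rules listed above, not the scheduler's actual formula: the function names are hypothetical, the high value for projects within their allocation (100000 or 130000), the job-size part and whole core hours are inputs you supply, and the boost for "short" jobs is left out. Use sprio for the real numbers.

```shell
#!/bin/sh
# Rough model of the job priority rules; NOT the scheduler's real code.

# Allocation part: HIGH while used <= allocation; after that 70000, 60000,
# 50000, ... for each further multiple of the allocation, never below 10000.
allocation_priority() {  # used_hours allocated_hours [high]
    local used=$1 alloc=$2 high=${3:-100000}
    if [ "$used" -le "$alloc" ]; then
        echo "$high"
        return
    fi
    # Whole allocations exceeded: for 2000 h, 2001..4000 -> 1, 4001..6000 -> 2.
    local over=$(( (used + alloc - 1) / alloc - 1 ))
    local prio=$(( 80000 - 10000 * over ))
    if [ "$prio" -lt 10000 ]; then
        prio=10000
    fi
    echo "$prio"
}

# Total: allocation part + queue minutes (max 20160) + size part (max 104).
estimate_priority() {  # used_hours allocated_hours wait_minutes size_part
    local alloc_part wait=$3 size=$4
    alloc_part=$(allocation_priority "$1" "$2")
    if [ "$wait" -gt 20160 ]; then
        wait=20160
    fi
    if [ "$size" -gt 104 ]; then
        size=104
    fi
    echo $(( alloc_part + wait + size ))
}
```

For example, a 2000 core hour project that has used 2500 hours, with a job that has waited 30000 minutes and a size part of 50, gets estimate_priority 2500 2000 30000 50, which prints 90210 (70000 + 20160 + 50).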
If your job priority is zero or one, there are more serious problems, for example that you have asked for more resources than the batch system can find on the system.
If you ask for a longer run time (TimeLimit) than the maximum on the system, your job will not run. See the Kalkyl User Guide for information about the current maximum. At the time of writing, the maximum is seven days.
On SGE (isis):
To check the status of a project for the month of May, do
$ module load uppmax
$ projinfo -m 05 -P s00101-01
Project      Used[h]   Allocated[h]
   User                @isis
--------------------------------------
s00101-01    11032     10000@isis
   some-user 11032
Jobs associated with this project will therefore be stuck waiting at the bottom of the qstat list until the beginning of June.
If the user belongs to more than one project, it is possible to change the project of a queued job. For details on how to do this, type man qalter
The job queue, which you can see with the "qstat" command, is ordered by job priority. Jobs with a high priority will run first if they can (depending on the number of free nodes and any special demands on e.g. memory).
A job can also start using the "back fill" mechanism if it can finish its work without delaying the topmost (higher priority) waiting job. In this way very short jobs have the possibility to find a small, empty space in the time schedule.
Job priority is based on several factors:
- A high project policy contribution if the project is within its monthly allocation.
- The waiting time in the queue, priority grows with time.
- Job size, higher for more slots.
- User-assigned "-p" priority (range -1023 to 1024) affects the internal order of the user's jobs.
- Finally, if the user has many pending jobs, the priority is shared between them, thus decreasing the priority of all their jobs.
One can read the man page of "qstat" and use the informational flags "-ext" or "-pri" to get all the details on how this math is done.
-
I want my program to send data to both stdout and to a file but nothing comes until the program ends.
-
There is a program called unbuffer (part of the Expect package). You could try using it like this:
unbuffer your_program | tee some_output_file
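If unbuffer is not available, stdbuf from GNU coreutils is an alternative that works for many programs. A sketch; the function name tee_line_buffered is hypothetical. "stdbuf -oL" asks the program's C standard library to line-buffer stdout, so it has no effect on programs that set their own buffering or do not use C stdio.

```shell
#!/bin/sh
# Sketch: run a command with line-buffered stdout and copy it to a file.
tee_line_buffered() {  # output_file command [args...]
    local out=$1
    shift
    stdbuf -oL "$@" | tee -- "$out"
}
```

For example: tee_line_buffered some_output_file your_program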
-
Why can't I scp/sftp to UPPMAX when I can connect with ssh?
-
Probably your login scripts output some text.
Scp/sftp only works if your login scripts do not produce any output to stdout for a non-interactive login.
If you want your login scripts to produce output, you MUST make sure they do so only for interactive logins.
You can use the following code (assuming bash):
if tty -s; then
    echo "Interactive print stuff here"
fi
-
How to run interactively on a compute node?
-
You may want to run an interactive application on one or several compute nodes. You may want to use one or several compute nodes as a development workbench, interactively. How can this be arranged?
On SLURM:
The program interactive may be what you are looking for.
You need to know your project name, which you can find with the projinfo command. Let us assume that the project name is p2010099.
The best way to use the command is usually to add as few parameters as possible, because the interactive command tries to find an optimal solution to give you a high queue priority and thus a quick job start. If you have a clear idea about what parameters you need, please specify them, otherwise it might be a good idea to first see what you get with fewer parameters.
To get one node of eight cores, we recommend the simplest command
interactive -A p2010099
If you need more than one node, or special features on your node, you can specify that to the interactive command, e.g.
interactive -A p2010099 -n 16 -C fat
as if it were an sbatch command. In fact, interactive is partly implemented as an sbatch command, and you can use most sbatch flags here. Please note that only a few nodes are fat, so you may have to wait quite a long time for your session to start.
There are three ways to get a priority boost, and the interactive command knows how to use them all:
- Internally using the sbatch flag "--qos=interact", which allows a single-node job with a timelimit of up to 12 hours. (Please note that you are not allowed to keep more than one "--qos=interact" job in the batch system simultaneously, and that you cannot use this "priority lane" when you have exceeded your monthly core hour allocation.)
- Internally using the special devel partition, which allows the job to use 1-4 nodes, with a timelimit of up to one hour. (Please note that you are not allowed to keep more than one "devel" job in the batch system simultaneously, regardless of whether they are running or merely queued.)
- Internally using the sbatch flag "--qos=short", which allows the job to use 1-4 nodes, with a timelimit of up to 15 minutes. (Please note that you are not allowed to keep more than two "short" jobs in the batch system simultaneously.)
If you do not specify any timelimit, the interactive command will give you the maximum timelimit allowed, according to the rules for priority boosts.
In the last example ("interactive -A p2010099 -n 16 -C fat"), the interactive command cannot use "priority lane" 1 above, because the job uses more than one node (one node contains eight cores, so sixteen cores need two nodes), and it cannot use "priority lane" 2 above, because the special devel partition contains no fat nodes, so the interactive command tries to give you a high-priority 15-minute job.
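The choice between the lanes can be illustrated with a very simplified decision function. It is a hypothetical model of the rules listed above, tried in the order given, and not how the interactive command is actually implemented; it ignores the limits on how many jobs of each kind you may have, and treats a time limit of 0 as "not specified".

```shell
#!/bin/sh
# Hypothetical, simplified model of the three "priority lanes".
# Arguments: nodes, minutes (0 = no time limit given), fat (1 if "-C fat"),
# over (1 if the project has exceeded its monthly allocation).
pick_lane() {
    local nodes=$1 minutes=$2 fat=$3 over=$4
    if [ "$nodes" -eq 1 ] && [ "$minutes" -le 720 ] && [ "$over" -eq 0 ]; then
        echo "interact"   # --qos=interact: one node, up to 12 hours
    elif [ "$nodes" -le 4 ] && [ "$minutes" -le 60 ] && [ "$fat" -eq 0 ]; then
        echo "devel"      # devel partition: 1-4 nodes, up to 1 hour, no fat nodes
    elif [ "$nodes" -le 4 ] && [ "$minutes" -le 15 ]; then
        echo "short"      # --qos=short: 1-4 nodes, up to 15 minutes
    else
        echo "normal"     # no priority lane: normal queue priority
    fi
}
```

With this model, the example above (two fat nodes, no time limit) gives pick_lane 2 0 1 0, which prints short, and the same request with -t 15:00:00 gives pick_lane 2 900 1 0, which prints normal.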
If you also want to run for 15 hours, you may say so, with the command
interactive -A p2010099 -n 16 -C fat -t 15:00:00
but no "priority lane" can be used, so you get your normal queue priority and might have to wait a very long time for your session to start. Please keep watch over when the job starts, because you are charged for all the time from job start, even if you are asleep, and because an allocated but unused node is a waste of expensive resources.
-
How can I make data available over HTTP?
-
Expert version
Your project's public www folder is located in /bubo/webexport/<project id>, and is accessible through https://export.uppmax.uu.se/<project id>
Read further for information on the www-tools scripts used to configure access to the public www folder, unless you are familiar with Apache configuration.
End expert version
The Basics
Some information about the web server:
- All scripting languages (PHP etc) are disabled for security reasons.
- The web server has read-only access to the files in your webexport folder, and only to your webexport folder.
- There is no way whatsoever for the web server to communicate with the Uppmax clusters.
The purpose of this web server is mainly to make data accessible for users outside of Uppmax. What you do with this feature is completely up to you.
Webexport Guide
To enable the public www folder of your project, please send an e-mail to support, and they will help you activate it.
Once activated, there will be a new folder in your project folder called webexport. Everything in this folder will be available online at https://export.uppmax.uu.se/<project id> (Ex. https://export.uppmax.uu.se/b2011999).
If you are already familiar with web servers, you can customize the behavior of the files and folders within the webexport folder using .htaccess/.htpasswd files.
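As an illustration for readers who know Apache, a hand-written .htaccess that enables file listing and password protection could be created as in the sketch below. The function name and the AuthName text are hypothetical, it is not necessarily what the www-tools scripts write, and the server configuration decides which directives are allowed. The password-file path must be the one the web server sees, which may differ from the path in your shell (this guide mentions both /bubo/webexport/<project id> and /proj/<project id>/webexport).

```shell
#!/bin/sh
# Hypothetical sketch: write a minimal .htaccess with file listing and
# HTTP Basic authentication. Arguments: directory, .htpasswd path as seen
# by the web server.
write_htaccess() {
    cat > "$1/.htaccess" <<EOF
Options +Indexes
AuthType Basic
AuthName "Restricted area"
AuthUserFile $2
Require valid-user
EOF
}
```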
For those who are not familiar with web servers, a couple of tools have been created to help with the process. More about these later on.
A newly created webexport folder will contain two folders: "private" and "files". They differ in behavior, so mind where you put your data!
- Files put in the "files"-folder will be visible and downloadable from any computer.
- Files put in the "private" folder will be hidden (only hidden from web browsers, NOT over ssh) and will require you to define user/password combinations (manually, or using the dedicated tools mentioned below), which can then be used to access the files.
Where can I find my files?
If you put a file named test.txt in your webexport folder, ex. /proj/b2010074/webexport/test.txt, it will be accessible through http://export.uppmax.uu.se/b2010074/test.txt
If you put a file called test.txt in the 'files' folder in your webexport folder, ex. /proj/b2010074/webexport/files/test.txt, it will be accessible through http://export.uppmax.uu.se/b2010074/files/test.txt
If you put a file called test.txt in a folder you have created yourself in your webexport folder, ex. /proj/b2010074/webexport/myNewFolder/test.txt, it will be accessible through http://export.uppmax.uu.se/b2010074/myNewFolder/test.txt
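The mapping from file path to web address follows a simple pattern, sketched below. The function name is hypothetical; it assumes the /proj/<project id>/webexport layout used in the examples and the https address given when the folder is activated, and it does not URL-encode special characters such as spaces.

```shell
#!/bin/sh
# Sketch: print the web address of a file in a project's webexport folder.
webexport_url() {
    local file=$1 rest proj rel
    case $file in
        /proj/*/webexport/*) ;;
        *) echo "Not inside a webexport folder: $file" >&2; return 1 ;;
    esac
    rest=${file#/proj/}              # b2010074/webexport/files/test.txt
    proj=${rest%%/*}                 # b2010074
    rel=${rest#"$proj"/webexport/}   # files/test.txt
    if [ "$rel" = "$rest" ]; then    # "webexport" was not right after the project
        echo "Not inside a webexport folder: $file" >&2
        return 1
    fi
    printf 'https://export.uppmax.uu.se/%s/%s\n' "$proj" "$rel"
}
```

For example, webexport_url /proj/b2010074/webexport/files/test.txt prints https://export.uppmax.uu.se/b2010074/files/test.txt.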
Tools, and how to use them
Common to all tools is that you first have to load the www-tools module:
$ module load www-tools
Adding a user to a password protected directory
IMPORTANT: If the directory you add the user to has not been set up to be password protected, standard settings will be applied (listing of files, password protected). Follow "Creating a new folder, or changing the behavior of an existing one" to change these settings for a directory.
1. Enter the directory you want to add a user to
$ cd <password protected folder>
( Ex. $ cd /proj/b2010999/private )
2. Run the www-add-user script:
$ www-add-user <user name>
( Ex. $ www-add-user martin)
3. Done! The script will generate a password for you for security purposes. Send the user name and password to the person(s) you want to be able to access the data in the password protected folder.
Removing a user from a password protected directory
1. Enter the directory you want to remove a user from
$ cd <password protected folder>
( Ex. $ cd /proj/b2010999/private )
2. Run the www-rem-user script:
$ www-rem-user <user name>
( Ex. $ www-rem-user martin )
3. Done! If you wish to see a list of all users in the password protected folder you are currently standing in, type
$ cat .htpasswd
and you will see a list of the users.
Ex.
martin:uBbBQTqv1Kx9M
martin2:A9gf7e0nmHLNA
martin3:dUs/DhBOhod4Q
martin4:34nPPePzHLfxU
Where martin, martin2, martin3 and martin4 are the user names.
Creating a new folder, or changing the behavior of an existing one
1. Run the www-add-dir script:
$ www-add-dir <path to the directory you want to create/change>
( Ex. $ www-add-dir webexport/moreFiles )
If you wish to modify the behavior of an existing directory, you will have to use the -f flag to force a change.
( Ex. $ www-add-dir -f /proj/b2010999/webexport/files )
2. You will now be asked 2 questions:
- Would you like to enable listing of files in the directory? (y/n) [N]:
Activating the listing of files will enable users to browse the folder in their browser. All files in the folder will appear in a list and the user can choose which file he/she wants to download.
Deactivating listing of files will require the user to know the name of the file he/she wants to download. No list of files will appear, and the user will not be able to download anything without the file name.
- Would you like to have your new directory password protected? (y/n) [Y]:
Activating the password protection will, as the name suggests, require a valid user name and password from the user before any downloading can be done.
Deactivating it will leave the directory open for downloading by anyone.
Please note that users have to be added to password protected folders. Follow the guide for adding users on this page for instructions on how to do that.
How to remove password protection of a directory
If you want to remove the password protection of a folder, you can remove two files called .htpasswd and .htaccess
These filenames start with a dot (.), so they are hidden files, so you have to use
ls -a
to see them in the directory.
To remove them, simply type
rm .htpasswd .htaccess
in the directory that is password protected.
Advanced usage
Nested password protection
If you password protect a directory, all its files and subdirectories will be password protected with the same users and passwords. This is called inheritance: a subdirectory inherits the settings of its parent directory.
If you want the subdirectory to have different users and passwords than the parent directory, you can create nested security zones. That means that you create a new password protected directory as a subdirectory of the first password protected directory.
This new directory will not inherit any of the settings or users from the parent directory. You will have to add users to this new directory the same way you have added them to the parent directory. Even if you create a new user with the same user name as in the parent directory, they will have different passwords. All the files and sub directories of this newly created directory will inherit the new settings and users specified in the newly created directory.
There is no limit on how many times you can nest security zones. You can even create a sub directory with the password protection turned off, and you will have a password free sub directory (NOT the same as just creating a new directory).




