My job has very low priority! What can be wrong?
One reason could be that your project has consumed its allocated hours.
Background: Every job is associated with a project. Suppose that you are working in a SNIC project, s00101-01, that has been granted 10000 core hours per running 30-day period. At the start of the project, s00101-01 is credited with 10000 hours, and jobs that run in that project are given a high priority. The core hours of all jobs that have finished or are running during the last 30 days are compared with this granted time. If enough jobs have run to consume this amount of hours, the priority is lowered. The more you have overdrafted your granted time, the lower the priority.
If you have overdrafted your granted time, it is still possible to run jobs, but you will probably wait longer in the queue.
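The running 30-day comparison described above can be sketched in Python. This is only an illustration, not the batch system's actual accounting: the record layout (a list of `(end_time, core_hours)` tuples, with `end_time` set to `None` for a running job) and the function names are assumptions, and the sketch counts a job's full core hours even if part of it ran before the window started.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # the running 30-day period from the text


def core_hours_in_window(jobs, now):
    """Sum core hours of jobs that ended in the last 30 days or are still running.

    `jobs` is a hypothetical list of (end_time, core_hours) tuples;
    end_time is None for a job that is still running.
    """
    window_start = now - WINDOW
    return sum(
        hours
        for end_time, hours in jobs
        if end_time is None or end_time >= window_start
    )


def is_overdrafted(jobs, granted_hours, now):
    """True if usage in the window is more than the granted core hours."""
    return core_hours_in_window(jobs, now) > granted_hours
```

For example, with a grant of 10000 core hours, a job of 6000 core hours that ended ten days ago plus a running job of 5000 core hours puts the project over its grant, while a job that ended more than 30 days ago no longer counts.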
To check the status of your projects, run:
$ projinfo
(Counting the number of core hours used since 2010-05-12/00:00:00 until now.)

Project             Used[h]   Current allocation [h/month]
   User
-----------------------------------------------------
s00101-01          72779.48                          50000
   some-user       72779.48
If enough jobs remain in projects that have not gone over their allocation, jobs associated with this project stay waiting at the bottom of the jobinfo list until the project's usage for the last 30 days drops below its allocated budget again.
On the other hand, such jobs may be lucky and find some free nodes, so they can run as bonus jobs before this happens.
The job queue, which you can see with the jobinfo command, is ordered by job priority. Jobs with a high priority will run first, if they can (depending on the number of free nodes and any special requirements, e.g. for memory).
Job priority is the sum of the following numbers (you may use the sprio command to get exact numbers for individual jobs):
- A high number (100000 or 130000) if your project is within its allocation and a lower number otherwise. There are different grades of lower numbers, depending on how many times over its allocation your project has gone. As an example, a 2000 core hour project gets priority 70000 when it has used more than 2000 core hours, 60000 when it has used more than 4000 core hours, 50000 when it has used more than 6000 core hours, and so on. The lowest grade gives priority 10000, and the number does not go down from there.
- The number of minutes the job has been waiting in queue (for a maximum of 20160 after fourteen days).
- A job size number, higher for more nodes allocated to your job, for a maximum of 104.
- A very, very high number for "short" jobs, i.e. very short jobs that are not wider than four nodes.
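As a rough illustration of how the numbers in the list above add up, here is a minimal Python sketch. It is not the batch system's implementation, and the function names are made up for this example. It assumes that the "and so on" in the overdraft example continues in steps of 10000 for each further multiple of the allocation, uses 100000 as the within-allocation number, takes the job size number as a given input (the text does not say how it is computed), and leaves out the bonus for "short" jobs because its value is not stated. Use sprio for the real numbers.

```python
import math

MAX_WAIT_MINUTES = 20160    # fourteen days, from the text
MAX_JOB_SIZE_NUMBER = 104   # from the text
LOWEST_GRADE = 10000        # from the text


def project_number(used_hours, allocated_hours, within_allocation=100000):
    """Project part of the priority, following the 2000 core hour example."""
    if used_hours <= allocated_hours:
        return within_allocation
    # Number of whole multiples of the allocation that usage has exceeded:
    # 1 for usage above 1x, 2 for usage above 2x, and so on (assumed pattern).
    steps = math.ceil(used_hours / allocated_hours) - 1
    return max(LOWEST_GRADE, 80000 - 10000 * steps)


def estimate_priority(used_hours, allocated_hours, minutes_waiting, job_size_number):
    """Sum of the project, queue-time, and job-size numbers (no short-job bonus)."""
    return (
        project_number(used_hours, allocated_hours)
        + min(minutes_waiting, MAX_WAIT_MINUTES)
        + min(job_size_number, MAX_JOB_SIZE_NUMBER)
    )
```

With the projinfo figures shown earlier (72779.48 hours used against an allocation of 50000), `project_number(72779.48, 50000)` gives 70000 under these assumptions, and a job that has waited 600 minutes with a job size number of 4 would be estimated at 70604.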
If your job priority is zero or one, there are more serious problems, for example that you asked for more resources than the batch system finds on the system.
If you ask for a longer run time (TimeLimit) than the maximum on the system, your job will not run. The maximum is currently ten days. If you must run a longer job, submit it with a ten-day runtime and contact UPPMAX support.