Job priority
============

Triton queues are not first-in first-out, but "fairshare". This means
that every person has a priority. The more you run, the lower your
user priority. As time passes, your user priority increases again.
The longer a job waits in the queue, the higher its job priority goes.
So, in the long run (if everyone is submitting a never-ending stream
of jobs), everyone will get exactly their share.

Once there are priorities, jobs are scheduled in order of priority,
and any gaps are then backfilled with smaller jobs that can fit in.
So small jobs usually get scheduled fast regardless.
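
The ordering described above can be sketched as a toy model. This is
only an illustration, not how Slurm is implemented: the job tuples and
the ``schedule`` function are hypothetical, and real backfill also
looks at time limits and must not delay higher-priority jobs.

```python
def schedule(jobs, free_cpus):
    """Start jobs in priority order; smaller jobs fill leftover CPUs.

    jobs: list of (name, priority, cpus) tuples (hypothetical structure).
    Returns the names of the jobs that start, in start order.
    """
    started = []
    # Highest priority first.
    for name, priority, cpus in sorted(jobs, key=lambda j: j[1], reverse=True):
        if cpus <= free_cpus:
            free_cpus -= cpus
            started.append(name)
        # A job that does not fit keeps waiting; lower-priority jobs
        # that do fit may still use the gap.
    return started


jobs = [("big", 900, 32), ("huge", 800, 64), ("small", 100, 4)]
started = schedule(jobs, free_cpus=40)
print(started)
```

Here ``huge`` has to wait because only 8 CPUs are left after ``big``
starts, while ``small`` fits into that gap despite its low priority.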

*Warning: from this point on, things get more and more technical, in
case you really want to know the details. There is a summary at the
end.*

What's a share? Currently, shares are based on departments and their
respective funding of Triton (see ``sshare``). Shares are shared among
everyone in the department, but each person has their own priority.
Thus, for medium users, the 2-week usage of the rest of your
department can affect how fast your jobs run. However, again, things
are balanced per-user within departments. (One heavy user in a
department can still affect all others in that department a bit too
much; we are working on this.)

Your priority goes down via the "job billing": roughly time×power.
CPUs are billed at 1/s (but older, less powerful CPUs cost less!).
Memory costs 0.2/GB/s. But: you only get billed for the max of memory
or CPU. So if you use one CPU and all the memory (so that no one else
can run on it), you get billed for all memory but no CPU. Same for
all CPUs and little memory. This encourages balanced use. (This also
applies to GPUs.)
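
As a rough sketch of the "max of memory or CPU" rule, assuming the
weights quoted above (1 per CPU-second, 0.2 per GB-second): the
function name ``job_billing`` and the ``gpu_weight`` parameter are
made up for illustration, and the real weights may differ per
partition.

```python
CPU_WEIGHT = 1.0     # per CPU per second, as quoted in the text
MEM_WEIGHT_GB = 0.2  # per GB per second, as quoted in the text


def job_billing(cpus, mem_gb, seconds, gpus=0, gpu_weight=0.0):
    """Bill the largest single resource charge, not the sum.

    gpu_weight is a placeholder: the text gives no exact GPU weight.
    """
    rate = max(cpus * CPU_WEIGHT, mem_gb * MEM_WEIGHT_GB, gpus * gpu_weight)
    return rate * seconds


# One hour with 4 CPUs and 20 GB: CPU charge 4/s, memory charge 4/s.
balanced = job_billing(cpus=4, mem_gb=20, seconds=3600)
# One hour with 1 CPU and 100 GB: memory (20/s) dominates the CPU (1/s).
memory_heavy = job_billing(cpus=1, mem_gb=100, seconds=3600)
print(balanced, memory_heavy)
```

The second job uses a quarter of the CPUs but is billed five times as
much, because the memory reservation is what blocks other users.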

GPUs also have a billing weight, currently tens of times higher than a
CPU billing weight for the newest GPUs. (In general, all of these can
change; for the latest info, search for ``BillingWeights`` in
``/etc/slurm/slurm.conf``.)
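
If you want to read the weights programmatically, a minimal sketch
could look like the following. It assumes the weights appear as a
``TRESBillingWeights="..."`` setting on a partition line, which is the
usual Slurm spelling; the sample line and all numbers in it are
invented for illustration and are not Triton's actual values.

```python
def parse_billing_weights(line):
    """Turn 'TRESBillingWeights="CPU=1.0,Mem=0.2G"' into a dict of strings."""
    # Keep only what follows the setting name.
    _, _, value = line.partition("TRESBillingWeights=")
    # The value ends at the next whitespace and may be quoted.
    value = value.split()[0].strip('"')
    return dict(item.split("=", 1) for item in value.split(","))


# Invented sample line, not copied from any real slurm.conf.
sample = (
    'PartitionName=example '
    'TRESBillingWeights="CPU=1.0,Mem=0.2G,GRES/gpu=50.0" State=UP'
)
weights = parse_billing_weights(sample)
print(weights)
```

The values are kept as strings because memory weights carry a unit
suffix such as ``G``.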

If you submit a long job but it ends early, you are only billed for
the actual time you use (but a job with a longer time limit might take
longer to start in the first place). Memory is always billed for the
full reservation even if you use less, since it isn't shared.

The "user priority" is actually just a record of how much you have
consumed lately (the billing numbers above). This number goes down
with a half-life decay of 2 weeks. Your personal priority is your
share compared to that, so we get the effect described above: the more
you (or your department) have run lately, the lower your priority.
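
The half-life decay can be written out as a small calculation. This is
a sketch of the arithmetic only, assuming the 2-week half-life stated
above; the function name ``decayed_usage`` is hypothetical.

```python
HALF_LIFE_DAYS = 14.0  # two-week half-life, as stated in the text


def decayed_usage(usage, days_elapsed, half_life_days=HALF_LIFE_DAYS):
    """Recorded usage remaining after days_elapsed with no new jobs."""
    return usage * 0.5 ** (days_elapsed / half_life_days)


after_two_weeks = decayed_usage(1000.0, 14)   # half is left
after_four_weeks = decayed_usage(1000.0, 28)  # a quarter is left
print(after_two_weeks, after_four_weeks)
```

So a burst of heavy usage counts for half as much after two weeks and
a quarter as much after four.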

If you want your stuff to run faster, the best way is to specify your
time more accurately (so that the job may find a place sooner) and
your memory too (which avoids needlessly wasting your priority).

While your job is pending in the queue, Slurm checks those metrics
regularly and recalculates job priority constantly. If you are
interested in details, take a look at the `multifactor priority plugin
<https://slurm.schedmd.com/priority_multifactor.html>`__ page (general
info) and the `depth-oblivious fair-share factor
<https://slurm.schedmd.com/priority_multifactor3.html>`__ page for what we
use specifically (warning: very in-depth page). On Triton, you can
always see the latest billing weights in ``/etc/slurm/slurm.conf``.

Numerically, job priorities range from 0 to 2^32-1. A higher number
means the job runs sooner, but the number doesn't mean much by itself.

These commands can show you information about your user and job
priorities:

.. csv-table::
   :delim: |

   ``slurm s`` | list of jobs per user with their current priorities
   ``slurm full`` | as above, but almost all of the job parameters are listed
   ``slurm shares`` | displays usage (RawUsage) and current fair-share weights (FairShare, higher is better) for all users
   ``slurm j <jobid>`` | shows detailed info for ``<jobid>``, including priority, requested nodes, etc.

..
   ``slurm p gpu`` | # shows partition parameters incl. Priority=


tl;dr: Just select the resources you think you need, and Slurm
tries to balance things out so everyone gets their share. The best
way to maintain high priority is to use resources efficiently so you
don't need to over-request.

From an interactive question: how is priority calculated? We shouldn't go into depth, but it could be mentioned in one or two more sentences.
This bit of history is an old write-up about it, which was deprecated some time ago as the page was redundant (a new description shouldn't be this long, but it could be an FAQ?):
scicomp-docs/triton/usage/jobs.rst.old
Lines 189 to 273 in af73a37