triton/tut/slurm: how does priority work? · Issue #685 · AaltoSciComp/scicomp-docs

From an interactive question: how is priority calculated? We shouldn't go into depth, but it could be mentioned in one or two more sentences.

Below is an old write-up about it, from a page that was deprecated some time ago because it was redundant. (The new description shouldn't be this long, but maybe this could become a FAQ?)

scicomp-docs/triton/usage/jobs.rst.old

Lines 189 to 273 in af73a37

    


	Job priority
	============

	Triton queues are not first-in, first-out but "fairshare". This means
	that every person has a priority. The more you run, the lower your
	user priority; as time passes, your user priority increases again.
	Also, the longer a job waits in the queue, the higher its job priority
	goes. So, in the long run (if everyone is submitting a never-ending
	stream of jobs), everyone will get exactly their share.

	Given these priorities, jobs are scheduled in order of priority, and
	any gaps are then backfilled with smaller jobs that can fit in without
	delaying higher-priority jobs. So small jobs usually start quickly
	regardless of priority.
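To make "priority order, then backfill" concrete, here is a toy Python sketch of EASY-style backfilling. It is a simplified stand-in, not Slurm's actual scheduler: only the single highest-priority blocked job gets a reservation, whereas Slurm's backfill plugin tries not to delay the expected start of any higher-priority job. The `Job` class, the `schedule` function, and the "cores only" resource model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    walltime: int  # requested time limit, in arbitrary time units


def schedule(total_cores, running, queue, now=0):
    """Return the names of queued jobs that start at time `now`.

    running: list of (end_time, cores) tuples for jobs already running.
    queue:   list of Job objects, sorted by priority (highest first).
    """
    free = total_cores - sum(cores for _, cores in running)
    running = list(running)
    pending = list(queue)
    started = []

    # 1. Start jobs strictly in priority order for as long as they fit.
    while pending and pending[0].cores <= free:
        job = pending.pop(0)
        free -= job.cores
        running.append((now + job.walltime, job.cores))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reserve cores for the top blocked job: find the "shadow time",
    #    the earliest moment enough running jobs will have ended.
    top = pending[0]
    avail, shadow = free, None
    for end, cores in sorted(running):
        avail += cores
        if avail >= top.cores:
            shadow = end
            break
    if shadow is None:
        return started  # the top job can never fit; nothing to protect
    extra = avail - top.cores  # cores still spare once the top job starts

    # 3. Backfill: a lower-priority job may start now if it fits in the free
    #    cores and either ends before the shadow time or only uses spare cores.
    for job in pending[1:]:
        if job.cores > free:
            continue
        if now + job.walltime <= shadow:
            free -= job.cores
            started.append(job.name)
        elif job.cores <= extra:
            free -= job.cores
            extra -= job.cores
            started.append(job.name)
    return started


# 10 cores; 8 are busy until t=5. The big job must wait, but a short
# 2-core job fits in the gap without delaying it.
queue = [Job("big", 6, 10), Job("short", 2, 3), Job("long", 2, 20)]
print(schedule(10, [(5, 8)], queue))  # ['short']
```

In this toy, the short job is let in precisely because its time limit ends before the big job's reservation, which is one reason an accurate time limit can help a job start sooner.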

	*Warning: from this point on, things get more and more technical; read
	on if you really want to know the details. There is a summary at the end.*

	What's a share? Currently, shares are assigned per department, based on
	each department's funding of Triton (see ``sshare``). A department's
	share is shared among everyone in it, but each person has their own
	priority. Thus, for medium users, the recent (2-week) usage of the
	rest of your department can affect how fast your jobs run. Within a
	department, however, things are still balanced per user. (That said, one
	heavy user in a department can currently affect everyone else in that
	department a bit too much; we are working on this.)

	Your priority goes down according to the "job billing", which is roughly
	time × resources. Each CPU is billed at 1 unit per second (older, less
	powerful CPUs cost less!). Memory costs 0.2 units per GB per second. But
	you are only billed for the maximum of the memory and CPU charges, not
	their sum. So if you use one CPU and all the memory of a node (so that
	no one else can run on it), you are billed for all the memory but not
	for the CPU. The same holds for all the CPUs and little memory. This
	encourages balanced use. (This also applies to GPUs.)
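A minimal Python sketch of the "maximum, not sum" rule for a single-node job, assuming the weights quoted above (1 per CPU-second, 0.2 per GB-second). `billing_rate` and its parameters are hypothetical names, and the GPU weight defaults to 0 as a placeholder rather than Triton's real value. In Slurm, this kind of billing corresponds to the `MAX_TRES` priority flag, though exactly how Triton configures it is an assumption here.

```python
def billing_rate(cpus, mem_gb, gpus=0, cpu_weight=1.0, mem_weight=0.2, gpu_weight=0.0):
    """Billing units charged per second of run time for a single-node job.

    The charge is the MAXIMUM of the weighted resources, not their sum,
    so a lopsided request is billed for its dominant resource.
    Defaults follow the write-up (1 per CPU, 0.2 per GB);
    gpu_weight=0.0 is only a placeholder, not Triton's real GPU weight.
    """
    return max(cpus * cpu_weight, mem_gb * mem_weight, gpus * gpu_weight)


print(billing_rate(cpus=1, mem_gb=4))    # 1.0  -> the CPU dominates
print(billing_rate(cpus=1, mem_gb=100))  # 20.0 -> billed like 20 CPUs
print(billing_rate(cpus=20, mem_gb=4))   # 20.0 -> the memory adds nothing
```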

	GPUs also have a billing weight; for the newest GPUs it is currently
	tens of times higher than the CPU billing weight. (In general, all of
	these weights can change; for the latest values, search for
	``BillingWeights`` in ``/etc/slurm/slurm.conf``.)
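If you want to pull these weights out programmatically rather than by eye, a small Python sketch could look like the one below. It assumes the weights appear in the usual slurm.conf form `TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"` (the sample line mimics the style of the Slurm documentation's example and is not Triton's actual configuration). `parse_billing_weights` is a hypothetical helper, and it reports any unit suffix as-is without interpreting it.

```python
import re

# TRESBillingWeights="..." (quoted) or an unquoted value up to whitespace.
_SETTING = re.compile(r'TRESBillingWeights="?([^"\s]+)"?')
# A weight such as "1.0" or "0.25G": a number plus an optional unit suffix.
_VALUE = re.compile(r'^([0-9]*\.?[0-9]+)([KMGTP]?)$')


def parse_billing_weights(conf_text):
    """Return one {tres_name: (weight, unit_suffix)} dict per setting found."""
    results = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore slurm.conf comments
        for match in _SETTING.finditer(line):
            weights = {}
            for item in match.group(1).split(","):
                name, _, value = item.partition("=")
                parsed = _VALUE.match(value)
                if parsed:
                    weights[name] = (float(parsed.group(1)), parsed.group(2))
            results.append(weights)
    return results


sample = 'PartitionName=batch TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"'
print(parse_billing_weights(sample))
# [{'CPU': (1.0, ''), 'Mem': (0.25, 'G'), 'GRES/gpu': (2.0, '')}]
```

On Triton, the input would be the text of `/etc/slurm/slurm.conf`, e.g. `parse_billing_weights(open("/etc/slurm/slurm.conf").read())`, assuming the file is readable from your login node.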

	If you request a long time limit but the job ends early, you are only
	billed for the time actually used (though a job with a longer time
	limit may take longer to start). Memory, however, is always billed for
	the full amount requested, even if you use less, since reserved memory
	can't be shared with other jobs.
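Combining these two rules with the billing paragraph above, a rough Python sketch shows why over-requesting memory costs priority while an over-long time limit does not directly do so. It assumes the same weights as in that paragraph (1 per CPU, 0.2 per GB) and a single-node job without GPUs; `job_charge` is a hypothetical name.

```python
def job_charge(elapsed_s, cpus, mem_gb_requested, cpu_weight=1.0, mem_weight=0.2):
    """Total billing units for a finished single-node job (simplified model).

    Time:   the elapsed run time, not the requested --time limit.
    Memory: the amount requested, not the amount actually used.
    """
    rate = max(cpus * cpu_weight, mem_gb_requested * mem_weight)
    return rate * elapsed_s


# Both jobs asked for 24 h but finished after 2 h, using 1 CPU and ~5 GB.
generous = job_charge(elapsed_s=2 * 3600, cpus=1, mem_gb_requested=50)
accurate = job_charge(elapsed_s=2 * 3600, cpus=1, mem_gb_requested=5)
print(generous, accurate)  # 72000.0 7200.0
```

The 24-hour limit did not change either charge, but requesting 50 GB instead of 5 GB made the same work ten times more expensive in priority terms.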

	The "user priority" is actually just a record of how much you have
	consumed lately (the billing numbers above). This number decays with a
	half-life of 2 weeks. Your personal priority is based on comparing your
	share to this recent usage, which gives the effect described above: the
	more you (or your department) have run lately, the lower your priority.
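A simplified Python model of this paragraph: past charges decay with the 2-week half-life, and the resulting usage is compared to your share. For the comparison it uses the classic Slurm fair-share formula F = 2^(-U/S), where U is your normalized usage and S your normalized shares. Triton's depth-oblivious variant computes its factor differently, so treat this only as an illustration of the shape of the effect. The function names are hypothetical.

```python
HALF_LIFE_DAYS = 14.0  # the 2-week half-life described above


def decayed_usage(charges, now_days, half_life_days=HALF_LIFE_DAYS):
    """Recent usage: each past charge is halved once per elapsed half-life.

    charges: list of (day, billing_units) pairs, with day <= now_days.
    """
    return sum(units * 0.5 ** ((now_days - day) / half_life_days)
               for day, units in charges)


def fairshare_factor(norm_usage, norm_shares):
    """Classic Slurm fair-share factor F = 2**(-U/S), between 0 and 1.

    1.0 means no recent usage; 0.5 means usage exactly equals your share.
    """
    return 2.0 ** (-norm_usage / norm_shares)


print(decayed_usage([(0, 1000.0)], now_days=14))  # 500.0
print(decayed_usage([(0, 1000.0)], now_days=28))  # 250.0
print(fairshare_factor(0.0, 0.1))  # 1.0  (no recent usage)
print(fairshare_factor(0.2, 0.1))  # 0.25 (used twice your share)
```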

	If you want your jobs to run sooner, the best approach is to specify
	your time limit more accurately (a shorter limit may let the job fit
	into a gap sooner) and your memory more accurately (requesting only
	what you need avoids needlessly using up your priority).

	While your job is pending in the queue, Slurm regularly rechecks these
	metrics and recalculates its priority. If you are interested in the
	details, take a look at the `multifactor priority plugin
	<https://slurm.schedmd.com/priority_multifactor.html>`__ page (general
	info) and the `depth-oblivious fair-share factor
	<https://slurm.schedmd.com/priority_multifactor3.html>`__ page for what
	we use specifically (warning: very in-depth). On Triton, you can
	always see the latest billing weights in ``/etc/slurm/slurm.conf``.
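The multifactor plugin linked above combines several factors, each between 0 and 1, into a weighted sum. A minimal Python sketch of that sum, keeping only the age and fair-share terms, with made-up weights (Triton's real `PriorityWeight*` values are not given here and may differ). `age_factor` assumes Slurm's default `PriorityMaxAge` of 7 days, which may not match Triton's setting; both function names are hypothetical.

```python
def age_factor(wait_days, max_age_days=7.0):
    """Grows linearly with time spent pending, capped at 1.0 (PriorityMaxAge)."""
    return min(wait_days / max_age_days, 1.0)


def job_priority(weights, factors):
    """Weighted sum of factors in [0, 1]; missing factors count as 0.

    Rounded down to an integer here for illustration.
    """
    return int(sum(weight * factors.get(name, 0.0) for name, weight in weights.items()))


# Hypothetical weights: fair-share dominates, queue age adds a little.
WEIGHTS = {"age": 1000, "fairshare": 10000}

# Same user (fair-share factor 0.5), job pending for 3.5 days vs. 10 days.
print(job_priority(WEIGHTS, {"age": age_factor(3.5), "fairshare": 0.5}))  # 5500
print(job_priority(WEIGHTS, {"age": age_factor(10), "fairshare": 0.5}))   # 6000
```

This mirrors the earlier points: waiting in the queue raises a job's priority, while heavy recent usage (a lower fair-share factor) lowers it.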

	Numerically, job priorities range from 0 to 2^32-1. Higher values run
	sooner, but the number itself doesn't mean much; what matters is how it
	compares to the priorities of other pending jobs.

	These commands can show you information about your user and job
	priorities:

	.. csv-table::
	   :delim: |

	   ``slurm s``         | list of jobs per user with their current priorities
	   ``slurm full``      | as above, but with almost all of the job parameters listed
	   ``slurm shares``    | usage (``RawUsage``) and current fair-share values (``FairShare``, higher is better) for all users
	   ``slurm j <jobid>`` | detailed info for job ``<jobid>``, including priority, requested nodes, etc.

	..
	   ``slurm p gpu``       |     # shows partition parameters incl. Priority=


	tl;dr: Just select the resources you think you need, and Slurm
	tries to balance things out so that everyone gets their share. The best
	way to maintain a high priority is to use resources efficiently, so you
	don't need to over-request.
