SLURM Programmer's Guide
Overview
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Components include machine status, partition management, job management, scheduling, and stream copy modules. SLURM requires no kernel modifications for its operation and is relatively self-contained.
There is an overview of the components and their interactions available in a separate document, SLURM: Simple Linux Utility for Resource Management [PDF].
SLURM is written in the C language and uses a GNU autoconf configuration engine. While initially written for Linux, other UNIX-like operating systems should be easy porting targets. Code should adhere to the Linux kernel coding style. (Some components of SLURM have been taken from various sources. Some of these components do not conform to the Linux kernel coding style. However, new code written for SLURM should follow these standards.)
Many of these modules have been built and tested on a variety of Unix computers including Red Hat Linux, IBM's AIX, Sun's Solaris, and Compaq's Tru64. The only module at this time that is operating system dependent is src/slurmd/read_proc.c. We will be porting and testing on additional platforms in future releases.
Plugins
To make the use of different infrastructures possible, SLURM uses a general purpose plugin mechanism. A SLURM plugin is a dynamically linked code object that is loaded explicitly at run time by the SLURM libraries. It provides a customized implementation of a well-defined API connected to tasks such as authentication, interconnect fabric, task scheduling, etc. A set of functions is defined for use by all of the different infrastructures of a particular variety. When a SLURM daemon is initiated, it reads the configuration file to determine which of the available plugins should be used. A plugin developer's guide is available with general information about plugins. Most plugin types also have their own documentation available, such as SLURM Authentication Plugin API and SLURM Job Completion Logging API.
Directory Structure
The contents of the SLURM directory structure will be described below in increasing detail as the structure is descended. The top level directory contains the scripts and tools required to build the entire SLURM system. It also contains a variety of subdirectories for each type of file.
General build tools/files include: acinclude.m4, autogen.sh, configure.ac, Makefile.am, Make-rpm.mk, META, README, slurm.spec.in, and the contents of the auxdir directory. autoconf and make commands are used to build and install SLURM in an automated fashion. NOTE: autoconf version 2.52 or higher is required to build SLURM. Execute autoconf -V to check your version number. The build process is described in the README file.
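As a sketch, the usual sequence from the top-level directory looks like the following (the --prefix value is illustrative; see the README for the authoritative procedure):

```shell
# From the top-level SLURM directory.
./autogen.sh                   # regenerate configure (developers only)
./configure --prefix=/usr/local
make
make install
```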
Copyright and disclaimer information are in the files COPYING and DISCLAIMER. All of the top-level subdirectories are described below.
auxdir: Used for building SLURM.
contribs: Various contributed tools.
doc: Documentation including man pages.
etc: Sample configuration files.
slurm: Header files for API use. These files must be installed. Placing these header files in this location makes for better code portability.
src: Contains all source code and header files not in the "slurm" subdirectory described above.
testsuite: DejaGnu and Expect are used for testing; all of their files are here.
Documentation
All of the documentation is in the subdirectory doc. Two directories are of particular interest:
doc/man contains the man pages for the APIs, configuration file, commands, and daemons.
doc/html contains the web pages.
Source Code
Functions are divided into several categories, each in its own subdirectory. The details of each directory's contents are provided below. The directories are as follows:
api: Application Program Interfaces into the SLURM code. Used to send and get SLURM information from the central manager. These are the functions user applications might utilize.
common: General purpose functions for widespread use throughout SLURM.
database: Various database files that support the accounting storage plugin.
plugins: Plugin functions for various infrastructures or optional behavior. A separate subdirectory is used for each plugin class:
- accounting_storage for specifying the type of storage for accounting,
- auth for user authentication,
- checkpoint for system-initiated checkpoint and restart of user jobs,
- crypto for cryptographic functions,
- jobacct_gather for job accounting,
- jobcomp for job completion logging,
- mpi for MPI support,
- priority for calculating job priority based on a number of factors including fair-share,
- proctrack for process tracking,
- sched for job scheduling,
- select for a job's node selection,
- switch for switch (interconnect) specific functions,
- task for task affinity to processors,
- topology for methods of assigning nodes to jobs based on node topology.
sacct: User command to view accounting information about jobs.
sacctmgr: User and administrator tool to manage accounting.
salloc: User command to allocate resources for a job.
sattach: User command to attach standard input, output and error files to a running job or job step.
sbatch: User command to submit a batch job (a script for later execution).
sbcast: User command to broadcast a file to all nodes associated with an existing SLURM job.
scancel: User command to cancel (or signal) a job or job step.
scontrol: Administrator tool to manage SLURM.
sinfo: User command to get information about SLURM nodes and partitions.
slurmctld: SLURM central manager daemon code.
slurmd: SLURM daemon code to manage the compute server nodes, including the execution of user applications.
slurmdbd: SLURM database daemon that manages access to the accounting storage database.
smap: User command to view the layout of nodes, partitions, and jobs. This is particularly valuable on systems like BlueGene, which has a three-dimensional torus topology.
sprio: User command to see the breakdown of a job's priority calculation when the Multifactor Job Priority plugin is installed.
squeue: User command to get information about SLURM jobs and job steps.
sreport: User command to view various reports about past usage across the enterprise.
srun: User command to submit a job, get an allocation, and/or initiate a parallel job step.
srun_cr: Checkpoint/Restart wrapper for srun.
sshare: User command to view shares and usage when the Multifactor Job Priority plugin is installed.
sstat: User command to view detailed statistics about running jobs when a Job Accounting Gather plugin is installed.
strigger: User and administrator tool to manage event triggers.
sview: User command to view and update node, partition, and job state information.
Configuration
Sample configuration files are included in the etc subdirectory. The slurm.conf file can be built using a configuration tool. See doc/man/man5/slurm.conf.5 and the man pages for other configuration files for more details. init.d.slurm is a script that determines which SLURM daemon(s) should execute on any node based upon the configuration file contents. It can also start, signal, restart, and stop those daemons.
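As a minimal sketch (host names, CPU counts, and partition layout are illustrative; see doc/man/man5/slurm.conf.5 for the authoritative parameter list), a slurm.conf ties the plugin classes described above to concrete implementations:

```
# Minimal slurm.conf sketch; host names are illustrative.
ControlMachine=control1
AuthType=auth/munge            # plugin from src/plugins/auth
SchedulerType=sched/backfill   # plugin from src/plugins/sched
SelectType=select/linear       # plugin from src/plugins/select
NodeName=node[1-4] CPUs=2 State=UNKNOWN
PartitionName=debug Nodes=node[1-4] Default=YES State=UP
```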
Test Suite
The testsuite files use a DejaGnu framework for testing. These tests are very limited in scope.
We also have a set of Expect SLURM tests available under the testsuite/expect directory. These tests are executed after SLURM has been installed and the daemons initiated. About 250 test scripts exercise all SLURM commands and options including stress tests. The file testsuite/expect/globals contains default paths and procedures for all of the individual tests. At the very least, you will need to set the slurm_dir variable to the correct value. To avoid conflicts with other developers, you can override variable settings in a separate file named testsuite/expect/globals.local.
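A globals.local override is ordinary Expect/Tcl; a minimal sketch (the installation path is illustrative) is:

```
# testsuite/expect/globals.local -- per-developer overrides
set slurm_dir "/usr/local"
```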
Set your working directory to testsuite/expect before starting these tests. Tests may be executed individually by name (e.g. test1.1) or the full test suite may be executed with the single command regression. See testsuite/expect/README for more information.
Adding Files and Directories
If you are adding files and directories to SLURM, it will be necessary to re-build configuration files before executing the configure command. Update Makefile.am files as needed then execute autogen.sh before executing configure.
Tricks of the Trade
HAVE_FRONT_END
You can make a single node appear to SLURM as a Linux cluster by running configure with the --enable-front-end option. This defines HAVE_FRONT_END with a non-zero value in the file config.h. All (fake) nodes should be defined in the slurm.conf file. These nodes should be configured with a single NodeAddr value indicating the node on which the single slurmd daemon executes. Initiate one slurmd and one slurmctld daemon. Do not initiate too many simultaneous job steps to avoid overloading the slurmd daemon executing them all.
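A sketch of the relevant slurm.conf lines (node and partition names are illustrative) is:

```
# Sixteen fake nodes, all served by one slurmd on the local host.
NodeName=tux[0-15] NodeAddr=localhost
PartitionName=debug Nodes=tux[0-15] Default=YES State=UP
```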
Multiple slurmd support
It is possible to run multiple slurmd daemons on a single node, each using a different port number and NodeName alias. This is very useful for testing networking and protocol changes, or any time you want to simulate a larger cluster than you really have. The author uses this on his desktop to simulate multiple nodes. However, it is important to note that not all SLURM functions will work with multiple slurmd support enabled (e.g., many switch plugins will not work; it is best to use switch/none).
Multiple slurmd support is enabled at configure time with the "--enable-multiple-slurmd" parameter. This enables a new parameter on the NodeName line of the slurm.conf file, "Port=<port number>".
Each slurmd needs to have its own NodeName, and its own TCP port number. Here is an example of the NodeName lines for running three slurmd daemons on each of ten nodes:
NodeName=foo[1-10] NodeHostname=host[1-10] Port=17001
NodeName=foo[11-20] NodeHostname=host[1-10] Port=17002
NodeName=foo[21-30] NodeHostname=host[1-10] Port=17003
It is likely that you will also want to use the "%n" symbol in any slurmd related paths in the slurm.conf file, for instance SlurmdLogFile, SlurmdPidFile, and especially SlurmdSpoolDir. Each slurmd replaces the "%n" with its own NodeName. Here is an example:
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdSpoolDir=/var/spool/slurmd.%n
It is up to you to start each slurmd daemon with the proper NodeName. For example, to start the slurmd daemons for host1 from the above slurm.conf example:
host1> slurmd -N foo1
host1> slurmd -N foo11
host1> slurmd -N foo21
Last modified 27 March 2009