SLURM Administrator Guide for Sun Constellation systems

Overview

This document describes the unique features of SLURM on Sun Constellation computers. You should be familiar with SLURM's mode of operation on Linux clusters before studying the relatively few differences in Sun Constellation system operation described in this document.

SLURM's primary mode of operation is designed for clusters whose nodes are configured in a one-dimensional space. For Sun Constellation systems, a topology plugin was developed to optimize resource allocations in three dimensions, and hostlist parsing was extended to support node names that encode each node's location.

Configuration

Two variables must be defined in the config.h file: HAVE_SUN_CONST and SYSTEM_DIMENSIONS=4 (more on that value later). Depending upon how SLURM is being built, use any one of the following methods:

  1. Execute the configure command with the option --enable-sun-const.
  2. Execute the rpmbuild command with the option --with sun_const.
  3. Add %with_sun_const 1 to your ~/.rpmmacros file.

Node names must have a four-digit suffix identifying their location (this is why SYSTEM_DIMENSIONS is set to 4). The first three digits specify the node's zero-origin position in the X, Y and Z dimensions respectively, starting at X=0, Y=0, Z=0. The fourth digit specifies the node's sequence number at that coordinate. For example, "tux0123" is the node at X=0, Y=1, Z=2 with sequence number 3, and "tux1234" is the node at X=1, Y=2, Z=3 with sequence number 4. The sequence number should also start at zero and can include upper-case letters for higher values, allowing up to 36 nodes at a specific coordinate (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, ... Z). To avoid confusion, we recommend that the node name prefix consist of lower-case letters.

Numerically sequential node names may be specified in SLURM commands and configuration files using the system name prefix with the end-points enclosed in square brackets and separated by a "-". For example, "tux[0000-000B]" represents the twelve nodes with sequence numbers 0 through B, all at coordinate X=0, Y=0, Z=0.

Alternatively, rectangular prisms of nodes can be specified using the system name prefix with the end-points enclosed in square brackets and separated by an "x". For example, "tux[0000x0111]" represents the eight nodes in the block with end-points "tux0000" and "tux0111" (tux0000, tux0001, tux0010, tux0011, tux0100, tux0101, tux0110 and tux0111). Viewed another way, these eight nodes have sequence numbers 0 or 1 at four distinct coordinates (000, 001, 010 and 011).

While node names of this form are required for SLURM's internal use, they need not be the names returned by the hostname -s command. See man slurm.conf for details on using the NodeName, NodeAddr and NodeHostName configuration parameters for flexibility in this matter.
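The naming rules above can be illustrated with a small Python decoder. This is a sketch, not SLURM's hostlist code: the helper names (`decode_suffix`, `encode_suffix`, `expand_prism`) are hypothetical, the prefix is assumed to be lower-case letters as recommended, and all four suffix characters are assumed to use the same 0-9, A-Z digit set, although the text above states this explicitly only for the sequence number.

```python
import itertools
import re

# Digit set assumed for every suffix position: 0-9, then upper-case A-Z.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def decode_suffix(name):
    """Split a node name such as 'tux0123' into (prefix, (x, y, z, seq))."""
    prefix, suffix = name[:-4], name[-4:]
    if not prefix or any(c not in DIGITS for c in suffix):
        raise ValueError(f"not a Sun Constellation node name: {name!r}")
    return prefix, tuple(DIGITS.index(c) for c in suffix)


def encode_suffix(x, y, z, seq):
    """Inverse of decode_suffix for the four-character suffix."""
    return "".join(DIGITS[v] for v in (x, y, z, seq))


def expand_prism(expr):
    """Expand a rectangular-prism expression such as 'tux[0000x0111]'.

    Each of the four digits ranges independently between the two
    end-points, so the result is every node inside the 4-D box.
    """
    m = re.fullmatch(r"([a-z]+)\[([0-9A-Z]{4})x([0-9A-Z]{4})\]", expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    prefix, lo, hi = m.groups()
    ranges = [range(DIGITS.index(a), DIGITS.index(b) + 1) for a, b in zip(lo, hi)]
    return [prefix + encode_suffix(*p) for p in itertools.product(*ranges)]


print(decode_suffix("tux0123"))  # ('tux', (0, 1, 2, 3))
nodes = expand_prism("tux[0000x0111]")
print(len(nodes), nodes[0], nodes[-1])  # 8 tux0000 tux0111
```

Applied to the first example configuration's tux[0000x3337], the same function yields 512 names (4 x 4 x 4 coordinates with 8 nodes each).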

Next, select one of two options for the resource selection plugin (the SelectType parameter in SLURM's slurm.conf configuration file):

  1. select/cons_res - Performs a best-fit algorithm based upon a one-dimensional space to allocate whole nodes, sockets, or cores to jobs based upon other configuration parameters.
  2. select/linear - Performs a best-fit algorithm based upon a one-dimensional space to allocate whole nodes to jobs.

For select/cons_res or select/linear to allocate resources that are physically nearby in four-dimensional space, the nodes must be specified in SLURM's slurm.conf configuration file in such a fashion that nodes adjacent in slurm.conf (managed internally by SLURM as a one-dimensional space) are also nearby in the physical four-dimensional space.

SLURM can perform that conversion automatically using a Hilbert curve. Set TopologyPlugin=topology/3d_torus in SLURM's slurm.conf configuration file for the nodes to be reordered appropriately. First, a three-dimensional Hilbert curve is constructed through all coordinates in the system such that consecutive coordinates in the ordered list are physically adjacent. The node list is then reordered following that Hilbert curve while maintaining each node's sequence number (i.e. the curve is not built through the fourth dimension). If the number of nodes at each coordinate varies, it may be necessary to use a separate node definition line in slurm.conf for each coordinate. In that case, list those lines in numeric order for the topology/3d_torus plugin to function properly.
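The reordering step can be sketched with a Python port of John Skilling's published AxestoTranspose routine, which maps coordinates to a Hilbert-curve index. This is an illustration, not the topology/3d_torus plugin itself: the function names are hypothetical, and the grid is assumed to be padded to the next power of two in every dimension. On grids whose sides are not powers of two, consecutive coordinates in the resulting order are not guaranteed to be adjacent.

```python
def axes_to_transpose(coords, bits):
    """Skilling's AxestoTranspose: coordinates -> 'transposed' Hilbert index."""
    x = list(coords)
    n = len(x)
    m = 1 << (bits - 1)
    # Inverse undo, from the most significant bit down.
    q = m
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p              # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p  # exchange the low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray encode.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = m
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    return x


def hilbert_index(coords, bits):
    """Interleave the transposed form into one integer, most significant bit first."""
    h = 0
    for b in range(bits - 1, -1, -1):
        for xi in axes_to_transpose(coords, bits):
            h = (h << 1) | ((xi >> b) & 1)
    return h


def hilbert_order(nodes, dims):
    """Order (x, y, z, seq) tuples along a 3-D Hilbert curve.

    The curve runs through coordinates only; nodes that share a coordinate
    stay together in sequence-number order (no curve through the 4th dimension).
    """
    bits = max(1, (max(dims) - 1).bit_length())
    return sorted(nodes, key=lambda n: (hilbert_index(n[:3], bits), n[3]))


# 4x4x4 system with eight nodes per coordinate.
dims = (4, 4, 4)
nodes = [(x, y, z, s) for x in range(4) for y in range(4) for z in range(4) for s in range(8)]
ordered = hilbert_order(nodes, dims)
print([n[:3] for n in ordered[:32:8]])  # [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]
```

Sorting on the pair (Hilbert index, sequence number) is what keeps every node at one coordinate contiguous, so the fourth dimension is simply carried along rather than folded into the curve.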

Alternatively, configure TopologyPlugin=topology/none and construct your own node ordering in slurm.conf. Note that each node must be listed exactly once and consecutive nodes should be nearby in three-dimensional space. The open source code SLURM uses to generate the Hilbert curve is included in the distribution at contribs/skilling.c in case you wish to experiment with it to generate your own node ordering. Two example SLURM configuration files are shown below:
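With topology/none, the two requirements just stated can be checked before the ordering goes into slurm.conf. The helper below is hypothetical tooling, not part of SLURM. It takes node names in the intended slurm.conf order, reports any name listed more than once, and flags each move between distinct coordinates that is not to an immediate neighbour. It assumes the 0-9, A-Z suffix digits described earlier and measures distance without torus wrap-around, which may be stricter than the hardware requires.

```python
from collections import Counter

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def coords_of(name):
    """Return (x, y, z, seq) decoded from the four-character suffix."""
    return tuple(DIGITS.index(c) for c in name[-4:])


def check_ordering(names):
    """Return a list of problems found in a hand-built node ordering."""
    problems = []
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"{name} listed {count} times")
    # Collapse consecutive nodes that share a coordinate into one path step.
    path = []
    for name in names:
        xyz = coords_of(name)[:3]
        if not path or path[-1] != xyz:
            path.append(xyz)
    # Every move between coordinates should be a single step in one dimension.
    for a, b in zip(path, path[1:]):
        hops = sum(abs(p - q) for p, q in zip(a, b))
        if hops != 1:
            problems.append(f"jump of {hops} steps from {a} to {b}")
    return problems


good = ["tux0000", "tux0001", "tux0010", "tux0011", "tux0110", "tux0100"]
bad = ["tux0000", "tux0110", "tux0000"]
print(check_ordering(good))       # []
print(len(check_ordering(bad)))   # 3
```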

# slurm.conf for Sun Constellation system of size 4x4x4
# with eight nodes at each coordinate (512 nodes total)

# Configuration parameters removed here

# Nodes are ordered automatically following a Hilbert curve
NodeName=DEFAULT CPUs=8 RealMemory=2048 State=Unknown
NodeName=tux[0000x3337]
PartitionName=debug Nodes=tux[0000x3337] Default=Yes State=UP

# slurm.conf for Sun Constellation system of size 1x2x2
# with a different count of nodes at each coordinate

# Configuration parameters removed here

# Manual ordering of nodes following a space-filling curve
NodeName=DEFAULT CPUs=8 RealMemory=2048 State=Unknown
NodeName=tux[0000-0007]  #  8 nodes at 0,0,0
NodeName=tux[0010-001B]  # 12 nodes at 0,0,1
NodeName=tux[0100-0107]  #  8 nodes at 0,1,0
NodeName=tux[0110-0115]  #  6 nodes at 0,1,1
PartitionName=DEFAULT Default=Yes State=UP
PartitionName=debug Nodes=tux[0000-0007,0010-001B,0100-0107,0110-0115]
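As a cross-check of the second example, the sketch below expands the debug partition's node list and counts nodes per coordinate. It is a simplified, hypothetical parser (`expand_ranges` is not a SLURM function). It handles only a single bracketed, comma-separated list of "-" ranges, and it assumes each range steps through the four-character suffix as a base-36 number, which agrees with every range used in this document.

```python
import re
from collections import Counter

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_int(suffix):
    """Interpret a suffix such as '001B' as a base-36 number."""
    value = 0
    for c in suffix:
        value = value * 36 + DIGITS.index(c)
    return value


def to_suffix(value):
    """Inverse of to_int, zero-padded to four base-36 digits."""
    chars = []
    for _ in range(4):
        value, r = divmod(value, 36)
        chars.append(DIGITS[r])
    return "".join(reversed(chars))


def expand_ranges(expr):
    """Expand e.g. 'tux[0000-0007,0010-001B]' into individual node names."""
    m = re.fullmatch(r"([a-z]+)\[([0-9A-Z,\-]+)\]", expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    prefix, body = m.groups()
    names = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        for v in range(to_int(lo), to_int(hi or lo) + 1):
            names.append(prefix + to_suffix(v))
    return names


nodes = expand_ranges("tux[0000-0007,0010-001B,0100-0107,0110-0115]")
# The first three suffix digits are the X, Y, Z coordinate.
per_coord = Counter(tuple(DIGITS.index(c) for c in n[-4:-1]) for n in nodes)
print(len(nodes))                       # 34
print(dict(sorted(per_coord.items())))  # {(0, 0, 0): 8, (0, 0, 1): 12, (0, 1, 0): 8, (0, 1, 1): 6}
```

The per-coordinate counts match the comments on the NodeName lines above.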

Tools

The node names output by the scontrol show nodes command will be ordered as defined (sequentially along the Hilbert curve) rather than in numeric order (e.g. "tux0010" may follow "tux1010" rather than "tux0000"). The output of the smap and sview commands will also display nodes ordered by the Hilbert curve, so that nodes appearing adjacent in the display are physically adjacent. This makes the locality of a job, partition or reservation easy to determine. To locate specific nodes with the sview command, select Actions, Search and Node(s) Name, then enter the desired node names. The output of other SLURM commands (e.g. sinfo and squeue) will use a SLURM hostlist expression with the node names numerically ordered. For optimal performance, SLURM partitions should contain nodes that are defined sequentially in that ordering.

Last modified 4 August 2009