SLURM Task Plugin Programmer Guide
Overview
This document describes SLURM task management plugins and the API that defines them. It is intended as a resource for programmers wishing to write their own SLURM task plugins. The API version is given in the Versioning section at the end of this document.
SLURM task management plugins are SLURM plugins that implement the SLURM task management API described herein. They would typically be used to control task affinity (i.e. binding tasks to processors). They must conform to the SLURM Plugin API with the following specifications:
const char plugin_type[]
The major type must be "task". The minor type can be any recognizable
abbreviation for the type of task management. We recommend, for example:
- affinity: A plugin that implements task binding to processors. The actual mechanism used for task binding depends upon the available infrastructure, as determined by the "configure" program when SLURM is built, and upon the value of TaskPluginParam as defined in slurm.conf (the SLURM configuration file).
- cgroup: Use Linux cgroups for binding tasks to resources.
- none: A plugin that implements the API without providing any services. This is the default behavior and provides no task binding.
The plugin_name and plugin_version symbols required by the SLURM Plugin API require no specialization for task support. Note carefully, however, the versioning discussion below.
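As a minimal sketch, the identification symbols for a hypothetical plugin with the minor type "example" might be declared as shown here. The "major/minor" slash form of plugin_type follows the general SLURM Plugin API. The name string, the version value of 100, and the helper has_task_major_type() are illustrative assumptions and are not taken from this guide.

```c
#include <stdint.h>
#include <string.h>

/* Symbols that the SLURM Plugin API expects a plugin to export. */
const char plugin_name[] = "Example task plugin"; /* free-form description (assumed) */
const char plugin_type[] = "task/example";        /* major "task", hypothetical minor "example" */
const uint32_t plugin_version = 100;              /* placeholder value (assumption) */

/* Hypothetical helper: returns 1 if a type string has the major type "task". */
int has_task_major_type(const char *type)
{
    return strncmp(type, "task/", 5) == 0;
}
```

The helper is not part of any SLURM API; it only makes the major/minor naming rule checkable in isolation.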
Data Objects
The implementation must maintain (though not necessarily directly export) an enumerated errno so that SLURM can discover, as far as is practical, the reason for any failed API call. These values must not be used as return values in integer-valued functions in the API. The proper error return value from integer-valued functions is SLURM_ERROR.
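One way to read this requirement is sketched below: the plugin keeps a private enumerated error code, sets errno, and returns only SLURM_ERROR from the integer-valued function. The enum names, the accessor task_example_last_error(), and the operation task_example_claim_cpus() are hypothetical. The SLURM_SUCCESS and SLURM_ERROR macros are local stand-ins for the definitions that SLURM's own headers provide.

```c
#include <errno.h>

/* Stand-ins for the values SLURM defines in its own headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR   (-1)

/* Hypothetical plugin-private error codes; the names are illustrative. */
enum task_example_errno {
    TASK_EX_OK = 0,
    TASK_EX_NO_RESOURCES,
    TASK_EX_BAD_REQUEST
};

/* Maintained internally; exposed only through the accessor below. */
static enum task_example_errno last_error = TASK_EX_OK;

enum task_example_errno task_example_last_error(void)
{
    return last_error;
}

/* Hypothetical operation: fails when more CPUs are requested than exist. */
int task_example_claim_cpus(int requested, int available)
{
    if (requested < 0) {
        last_error = TASK_EX_BAD_REQUEST;
        errno = EINVAL;
        return SLURM_ERROR;   /* never return the enum value itself */
    }
    if (requested > available) {
        last_error = TASK_EX_NO_RESOURCES;
        errno = EBUSY;
        return SLURM_ERROR;
    }
    last_error = TASK_EX_OK;
    return SLURM_SUCCESS;
}
```

Keeping the detailed code out of the return value means callers only ever test against SLURM_SUCCESS or SLURM_ERROR.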
API Functions
The following functions must appear. Functions which are not implemented should be stubbed.
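A stubbed plugin, in the spirit of the "none" type, could look roughly like the following sketch. The function names and signatures are the ones listed in this section. The opaque struct typedefs and the SLURM_SUCCESS macro are stand-ins so that the fragment compiles without the SLURM source tree; a real plugin would include the SLURM headers instead.

```c
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0   /* stand-in for SLURM's own definition */

/* Opaque stand-ins for the SLURM types named in the prototypes. */
typedef struct batch_job_launch_msg batch_job_launch_msg_t;
typedef struct launch_tasks_request_msg launch_tasks_request_msg_t;
typedef struct slurmd_job slurmd_job_t;

/* Hooks executed by the slurmd daemon. */
int task_slurmd_batch_request(uint32_t job_id, batch_job_launch_msg_t *req)
{ (void)job_id; (void)req; return SLURM_SUCCESS; }

int task_slurmd_launch_request(uint32_t job_id,
                               launch_tasks_request_msg_t *req,
                               uint32_t node_id)
{ (void)job_id; (void)req; (void)node_id; return SLURM_SUCCESS; }

int task_slurmd_reserve_resources(uint32_t job_id,
                                  launch_tasks_request_msg_t *req,
                                  uint32_t node_id)
{ (void)job_id; (void)req; (void)node_id; return SLURM_SUCCESS; }

int task_slurmd_suspend_job(uint32_t job_id)
{ (void)job_id; return SLURM_SUCCESS; }

int task_slurmd_resume_job(uint32_t job_id)
{ (void)job_id; return SLURM_SUCCESS; }

int task_slurmd_release_resources(uint32_t job_id)
{ (void)job_id; return SLURM_SUCCESS; }

/* Hooks executed by the slurmstepd program. */
int task_pre_setuid(slurmd_job_t *job) { (void)job; return SLURM_SUCCESS; }
int task_pre_launch(slurmd_job_t *job) { (void)job; return SLURM_SUCCESS; }
int task_post_term(slurmd_job_t *job)  { (void)job; return SLURM_SUCCESS; }
int task_post_step(slurmd_job_t *job)  { (void)job; return SLURM_SUCCESS; }
```

Each stub ignores its arguments and reports success, so the plugin satisfies the API while providing no services.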
int task_slurmd_batch_request (uint32_t job_id, batch_job_launch_msg_t *req);
Description: Prepare to launch a batch job. Establish node, socket, and core resource availability for it. Executed by the slurmd daemon as user root.
Arguments:
job_id (input)
ID of the job to be started.
req (input/output)
Batch job launch request specification.
See src/common/slurm_protocol_defs.h for the
data structure definition.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_slurmd_launch_request (uint32_t job_id, launch_tasks_request_msg_t *req, uint32_t node_id);
Description: Prepare to launch a job. Establish node, socket, and core resource availability for it. Executed by the slurmd daemon as user root.
Arguments:
job_id (input)
ID of the job to be started.
req (input/output)
Task launch request specification including node, socket, and
core specifications.
See src/common/slurm_protocol_defs.h for the
data structure definition.
node_id (input)
ID of the node on which resources are being acquired (zero origin).
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_slurmd_reserve_resources (uint32_t job_id, launch_tasks_request_msg_t *req, uint32_t node_id);
Description: Reserve resources for the initiation of a job. Executed by the slurmd daemon as user root.
Arguments:
job_id (input)
ID of the job being started.
req (input)
Task launch request specification including node, socket, and
core specifications.
See src/common/slurm_protocol_defs.h for the
data structure definition.
node_id (input)
ID of the node on which the resources are being acquired
(zero origin).
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_slurmd_suspend_job (uint32_t job_id);
Description: Temporarily release resources previously reserved for a job. Executed by the slurmd daemon as user root.
Arguments: job_id (input) ID of the job which is being suspended.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_slurmd_resume_job (uint32_t job_id);
Description: Reclaim resources which were previously released using the task_slurmd_suspend_job function. Executed by the slurmd daemon as user root.
Arguments: job_id (input) ID of the job which is being resumed.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_slurmd_release_resources (uint32_t job_id);
Description: Release resources previously reserved for a job. Executed by the slurmd daemon as user root.
Arguments: job_id (input) ID of the job which has completed.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
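The reserve, suspend, resume, and release calls above suggest a small per-job state machine. The sketch below is one possible bookkeeping scheme and not the method of any shipped plugin: the fixed-size table, the state names, the errno choices, and the decision to treat a repeated reservation as success are all assumptions. The macros and the opaque request type are stand-ins for SLURM's own definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0      /* stand-ins for SLURM's own definitions */
#define SLURM_ERROR   (-1)
#define MAX_JOBS      16     /* hypothetical table size */

typedef struct launch_tasks_request_msg launch_tasks_request_msg_t;

enum slot_state { SLOT_FREE = 0, SLOT_RESERVED, SLOT_SUSPENDED };

static struct {
    uint32_t job_id;
    enum slot_state state;
} slots[MAX_JOBS];

/* Return the index of the slot tracking job_id, or -1 if none. */
static int find_slot(uint32_t job_id)
{
    int i;
    for (i = 0; i < MAX_JOBS; i++)
        if (slots[i].state != SLOT_FREE && slots[i].job_id == job_id)
            return i;
    return -1;
}

/* Move a tracked job from one state to another, or fail with errno set. */
static int change_state(uint32_t job_id, enum slot_state from,
                        enum slot_state to)
{
    int i = find_slot(job_id);
    if (i < 0) { errno = ESRCH; return SLURM_ERROR; }
    if (slots[i].state != from) { errno = EINVAL; return SLURM_ERROR; }
    slots[i].state = to;
    return SLURM_SUCCESS;
}

int task_slurmd_reserve_resources(uint32_t job_id,
                                  launch_tasks_request_msg_t *req,
                                  uint32_t node_id)
{
    int i;
    (void)req; (void)node_id;
    if (find_slot(job_id) >= 0)
        return SLURM_SUCCESS;          /* already tracked (assumed benign) */
    for (i = 0; i < MAX_JOBS; i++) {
        if (slots[i].state == SLOT_FREE) {
            slots[i].job_id = job_id;
            slots[i].state = SLOT_RESERVED;
            return SLURM_SUCCESS;
        }
    }
    errno = ENOMEM;                    /* table full */
    return SLURM_ERROR;
}

int task_slurmd_suspend_job(uint32_t job_id)
{
    return change_state(job_id, SLOT_RESERVED, SLOT_SUSPENDED);
}

int task_slurmd_resume_job(uint32_t job_id)
{
    return change_state(job_id, SLOT_SUSPENDED, SLOT_RESERVED);
}

int task_slurmd_release_resources(uint32_t job_id)
{
    int i = find_slot(job_id);
    if (i < 0) { errno = ESRCH; return SLURM_ERROR; }
    slots[i].state = SLOT_FREE;
    return SLURM_SUCCESS;
}
```

In this sketch a resume without a prior suspend fails, and a release is accepted from either the reserved or the suspended state.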
int task_pre_setuid (slurmd_job_t *job);
Description: task_pre_setuid() is called before the UID is set to that of the user whose job is being launched. Executed by the slurmstepd program as user root.
Arguments: job (input) pointer to the job to be initiated. See src/slurmd/slurmstepd/slurmstepd_job.h for the data structure definition.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_pre_launch (slurmd_job_t *job);
Description: task_pre_launch() is called prior to the exec of the application task. Executed by the slurmstepd program as the job's owner. It is followed by the TaskProlog program (as configured in slurm.conf) and --task-prolog (from the srun command line).
Arguments: job (input) pointer to the job to be initiated. See src/slurmd/slurmstepd/slurmstepd_job.h for the data structure definition.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
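Since task_pre_launch() runs just before the application task is exec'd, it is a natural place to decide which processor the task should be bound to. The sketch below shows only that selection step, using a round-robin choice over a 64-bit mask of allowed CPUs. The slurmd_job_t defined here is a hypothetical stand-in with invented fields (the real structure is in src/slurmd/slurmstepd/slurmstepd_job.h), and pick_cpu() is an illustrative helper, not a SLURM function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SLURM_SUCCESS 0      /* stand-ins for SLURM's own definitions */
#define SLURM_ERROR   (-1)

/* Hypothetical stand-in for slurmd_job_t; all three fields are invented. */
typedef struct slurmd_job {
    uint64_t allowed_mask;   /* bit n set: CPU n may be used      */
    uint32_t local_task_id;  /* index of this task on the node    */
    int      bound_cpu;      /* output: CPU chosen for this task  */
} slurmd_job_t;

/* Return the (task_id mod N)-th allowed CPU, where N is the number of
 * set bits in the mask, or -1 if the mask is empty. */
int pick_cpu(uint64_t allowed_mask, uint32_t local_task_id)
{
    int count = 0, cpu;
    uint32_t want;

    for (cpu = 0; cpu < 64; cpu++)
        if (allowed_mask & (UINT64_C(1) << cpu))
            count++;
    if (count == 0)
        return -1;

    want = local_task_id % (uint32_t)count;
    for (cpu = 0; cpu < 64; cpu++) {
        if (!(allowed_mask & (UINT64_C(1) << cpu)))
            continue;
        if (want == 0)
            return cpu;
        want--;
    }
    return -1;               /* not reached when count > 0 */
}

int task_pre_launch(slurmd_job_t *job)
{
    int cpu;

    if (job == NULL) { errno = EINVAL; return SLURM_ERROR; }
    cpu = pick_cpu(job->allowed_mask, job->local_task_id);
    if (cpu < 0) { errno = EINVAL; return SLURM_ERROR; }

    /* A real plugin would apply the binding at this point, for example
     * with sched_setaffinity(2) on Linux; this sketch only records it. */
    job->bound_cpu = cpu;
    return SLURM_SUCCESS;
}
```

For example, with allowed CPUs 0, 2, 4, and 5 (mask 0x35), local task 2 is assigned CPU 4 and local task 5 wraps around to CPU 2.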
int task_post_term (slurmd_job_t *job);
Description: task_post_term() is called after termination of the job step. Executed by the slurmstepd program as the job's owner. It is preceded by --task-epilog (from the srun command line), which is followed by the TaskEpilog program (as configured in slurm.conf).
Arguments: job (input) pointer to the job which has terminated. See src/slurmd/slurmstepd/slurmstepd_job.h for the data structure definition.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
int task_post_step (slurmd_job_t *job);
Description: task_post_step() is called after termination of all the tasks of the job step. Executed by the slurmstepd program as user root.
Arguments: job (input) pointer to the job which has terminated. See src/slurmd/slurmstepd/slurmstepd_job.h for the data structure definition.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
Versioning
This document describes version 2 of the SLURM Task Plugin API. Future releases of SLURM may revise this API.
Last modified 29 April 2011