Configuration

Details and default values of the general pipeline-engine-level options can be printed to stdout using:

$> pbsmrtpipe show-workflow-options

The pipeline level options can be set in the options section of the preset (or workflow) XML file.

<?xml version="1.0" ?>
<pipeline-template-preset id="MyPreset">

    <metadata>
        <version>1.1.0</version>
        <name>My Preset</name>
        <description>A description of my preset</description>
    </metadata>

    <!-- Reference Workflow template by id -->
    <importTemplate id="pbsmrtpipe.pipelines.dev_01" />

    <!-- Default Pipeline Engine Options -->
    <options>
        <!-- MAX Number of Processors per Task-->
        <option id="pbsmrtpipe.options.max_nproc">
            <value>24</value>
        </option>

        <!-- Enable Chunked mode -->
        <option id="pbsmrtpipe.options.chunk_mode">
            <value>True</value>
        </option>

        <!-- MAX Number of Chunks -->
        <option id="pbsmrtpipe.options.max_nchunks">
            <value>48</value>
        </option>

    </options>

    <!-- Default Task specific Options -->

    <task-options>
        <option id="pbsmrtpipe.task_options.option_id1">
            <value>1234</value>
        </option>
        <option id="pbsmrtpipe.task_options.option_id2">
            <value>abcd</value>
        </option>
    </task-options>
</pipeline-template-preset>

To apply a global base-level preset to all pipeline executions, create a preset.xml file, then export the environment variable PB_SMRTPIPE_XML_PRESET so that it points to that file.

Example

<?xml version="1.0" encoding="UTF-8"?>
<pipeline-template-preset>
    <!-- Pipeline Engine Level Options -->
    <options>
        <option id="pbsmrtpipe.options.max_nchunks">
            <value>13</value>
        </option>
        <option id="pbsmrtpipe.options.cluster_manager">
            <value>/path/to/my-cluster-template-dir</value>
        </option>
    </options>

    <!-- Task Options -->
    <task-options>
        <!-- Typically, it is NOT a good idea to apply a single task option
         to every pipeline execution. -->
        <option id="pbsmrtpipe.task_options.option_id1">
            <value>1234</value>
        </option>
    </task-options>
</pipeline-template-preset>
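
Exporting the variable might look like the following sketch. The path is a placeholder, and the sketch assumes the variable holds the path to the preset file. Placing the export in a shell profile or setup script would make it apply to every pbsmrtpipe run started from that environment.

```shell
# Placeholder path: replace with the location of your own preset.xml.
# Assumption: the variable holds the path to the preset file.
export PB_SMRTPIPE_XML_PRESET=/path/to/preset.xml

# Confirm the variable is visible to child processes.
printenv PB_SMRTPIPE_XML_PRESET
```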

Cluster Manager

The cluster manager configuration (pbsmrtpipe.options.cluster_manager) is often the first item that needs to be configured to enable distributed computing of your pipeline.

The cluster manager configuration points to a directory containing two cluster templates, start.tmpl and stop.tmpl.

The start.tmpl is the more important of the two. It must contain a single bash command line that exposes several template variables, which are replaced with job-specific values. This line will often also include the name of the queue that jobs are submitted to.

qsub -pe smp ${NPROC} -S /bin/bash -V -q default -N ${JOB_ID} \
    -o ${STDOUT_FILE} \
    -e ${STDERR_FILE} \
    ${CMD}

Required Template Variables

  • NPROC the number of processors/slots to use
  • JOB_ID This will adhere to the
  • STDOUT_FILE the qsub level stdout
  • STDERR_FILE the qsub level stderr
  • CMD the absolute path to the command to be executed
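
To make the substitution concrete, the sketch below fills in a start.tmpl-style line by hand using sed. It is an illustration only, not how pbsmrtpipe renders templates internally, and the sample values (8 slots, job_0001, the /path/to/job files) are hypothetical.

```shell
# A start.tmpl-style command on one line; single quotes keep ${...} literal.
template='qsub -pe smp ${NPROC} -S /bin/bash -V -q default -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${CMD}'

# Replace each placeholder with a hypothetical sample value.
rendered=$(printf '%s\n' "$template" | sed \
    -e 's|\${NPROC}|8|g' \
    -e 's|\${JOB_ID}|job_0001|g' \
    -e 's|\${STDOUT_FILE}|/path/to/job/stdout|g' \
    -e 's|\${STDERR_FILE}|/path/to/job/stderr|g' \
    -e 's|\${CMD}|/path/to/job/run.sh|g')

# Print the fully substituted submission command.
echo "$rendered"
```

Rendering a template by hand like this can be a quick way to check that the resulting qsub line is valid for your scheduler before pointing pbsmrtpipe at the template directory.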

Note

The cluster manager configuration can point either to a Python package that contains the cluster templates or to an absolute path to the directory of cluster templates. The Python package model is used primarily for internal purposes.

Example stop template (stop.tmpl). Only JOB_ID is required.

qdel ${JOB_ID}

Environment Handling

By default, pbsmrtpipe follows the standard *nix model: it inherits the parent environment and finds executables on your PATH.

The simplest approach is to create a setup.sh that explicitly sets your desired PATH.

A simple example:

#!/bin/bash

# Contrived example

# Source my python virtualenv with pbsmrtpipe
source /path/to/my-ve/bin/activate

# Use a custom version of blasr
MY_BLASR_BIN=/path/to/my-blasr/bin

# Make sure qsub is on the PATH so jobs can run on an SGE cluster
export SGE_ROOT=/path/to/sge-root

# this must be consistent with the SGE_ROOT
SGE_BIN=/mnt/software/g/gridengine/6.2u5/usr/bin

# Now pbsmrtpipe will be able to find blasr, qsub
export PATH=$MY_BLASR_BIN:$SGE_BIN:$PATH

If you’re building several tools, you may need to wrap each tool so that it runs with its own environment settings.
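
One way to sketch such a wrapper is a small bash function that prepends a tool-specific directory to PATH inside a subshell, so the change never leaks into the calling environment. The function name run_with_path and the directory are hypothetical; the same body could equally live in a standalone wrapper script, one per tool.

```shell
# Run a command with a tool-specific bin directory at the front of PATH.
# The parentheses start a subshell, so the caller's PATH is left unchanged.
run_with_path() {
    local tool_bin=$1
    shift
    ( PATH="$tool_bin:$PATH"; export PATH; "$@" )
}

# Usage: show the PATH that the wrapped command sees.
run_with_path /path/to/my-blasr/bin sh -c 'echo "$PATH"'
```

In a real wrapper, the final line would invoke the tool itself with its arguments instead of echoing PATH.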