The general pipeline engine-level options, their descriptions, and their default values can be printed to stdout using:

$> pbsmrtpipe show-workflow-options

The pipeline level options can be set in the options section of the preset (or workflow) XML file.

<?xml version="1.0" ?>
<pipeline-template-preset id="MyPreset">

    <name>My Preset</name>
    <description>A description of my preset</description>

    <!-- Reference Workflow template by id -->
    <importTemplate id="pbsmrtpipe.pipelines.dev_01" />

    <!-- Default Pipeline Engine Options (example values shown) -->
        <!-- MAX Number of Processors per Task -->
        <option id="pbsmrtpipe.options.max_nproc">
            <value>8</value>
        </option>

        <!-- Enable Chunked mode -->
        <option id="pbsmrtpipe.options.chunk_mode">
            <value>True</value>
        </option>

        <!-- MAX Number of Chunks -->
        <option id="pbsmrtpipe.options.max_nchunks">
            <value>12</value>
        </option>

    <!-- Default Task specific Options -->
        <option id="pbsmrtpipe.task_options.option_id1">
            <value>example-value-1</value>
        </option>
        <option id="pbsmrtpipe.task_options.option_id2">
            <value>example-value-2</value>
        </option>
</pipeline-template-preset>

To have a global base-level preset applied to all pipeline executions, create a preset.xml file, then export the environment variable "PB_SMRTPIPE_XML_PRESET" pointing at it.


<?xml version="1.0" encoding="UTF-8"?>
<pipeline-template-preset>
    <!-- Pipeline Engine Level Options -->
        <option id="pbsmrtpipe.options.max_nchunks">
            <value>12</value>
        </option>
        <option id="pbsmrtpipe.options.cluster_manager">
            <value>/path/to/cluster-templates</value>
        </option>

    <!-- Task Options -->
        <!-- Typically, it is NOT a good idea to apply a single task option
             to every pipeline execution. -->
        <option id="pbsmrtpipe.task_options.option_id1">
            <value>example-value</value>
        </option>
</pipeline-template-preset>
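
Assuming the preset file above is saved to a path of your choosing (the path below is a made-up example), the variable can be exported in your shell profile:

```shell
# Point pbsmrtpipe at a global base preset (hypothetical path)
export PB_SMRTPIPE_XML_PRESET=/path/to/global_preset.xml
```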

Cluster Manager

The cluster manager configuration (pbsmrtpipe.options.cluster_manager) is often the first item that needs to be configured to enable distributed execution of your pipeline.

The cluster manager configuration points to a directory containing two cluster templates, start.tmpl and stop.tmpl.

The start.tmpl is the most important. It must contain a single bash line that uses several template variables, which are replaced at job submission time. This line will often include the queue name that jobs will be submitted to.

qsub -pe smp ${NPROC} -S /bin/bash -V -q default -N ${JOB_ID} \
    -o ${STDOUT_FILE} \
    -e ${STDERR_FILE} \
    ${CMD}

Required Template Variables

  • NPROC the number of processors/slots to use
  • JOB_ID a unique identifier for the job (used as the job name)
  • STDOUT_FILE the qsub-level stdout file
  • STDERR_FILE the qsub-level stderr file
  • CMD the absolute path to the command to be executed
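
To make the substitution concrete, here is a sketch (not part of pbsmrtpipe itself) that renders a start.tmpl with sed; all variable values below are made-up examples:

```shell
# Write an example start.tmpl (same shape as the template above)
cat > start.tmpl <<'EOF'
qsub -pe smp ${NPROC} -S /bin/bash -V -q default -N ${JOB_ID} \
    -o ${STDOUT_FILE} \
    -e ${STDERR_FILE} \
    ${CMD}
EOF

# Illustrative values only
NPROC=8
JOB_ID=job.12345
STDOUT_FILE=/tmp/job.12345.stdout
STDERR_FILE=/tmp/job.12345.stderr
CMD=/tmp/run-task.sh

# Approximate the engine's substitution step with sed
sed -e "s|\${NPROC}|${NPROC}|g" \
    -e "s|\${JOB_ID}|${JOB_ID}|g" \
    -e "s|\${STDOUT_FILE}|${STDOUT_FILE}|g" \
    -e "s|\${STDERR_FILE}|${STDERR_FILE}|g" \
    -e "s|\${CMD}|${CMD}|g" start.tmpl > rendered.sh
cat rendered.sh
```

The rendered file is the exact qsub command the engine would submit for that task.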


The cluster manager configuration can point either to a python package which contains the cluster templates, or to an absolute path to a directory of cluster templates. The python package model is used primarily for internal purposes.
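
In the preset XML, the cluster manager is set like any other engine-level option; the directory path here is a made-up example:

```xml
<option id="pbsmrtpipe.options.cluster_manager">
    <value>/path/to/my-cluster-templates</value>
</option>
```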

Stop Template example. Only JOB_ID is required.

qdel ${JOB_ID}

Environment Handling

By default, pbsmrtpipe uses the standard *nix model of inheriting the parent environment to locate executables on your PATH.

The simplest model is to create a setup script that explicitly sets your desired environment (in particular, PATH) before invoking pbsmrtpipe.

A simple example:


# Contrived example

# Source my python virtualenv with pbsmrtpipe
source /path/to/my-ve/bin/activate

# Use a custom version of blasr
export PATH=/path/to/custom-blasr/bin:${PATH}

# Make sure qsub is in the PATH to run jobs on the cluster via SGE
export SGE_ROOT=/path/to/sge-root

# this must be consistent with the SGE_ROOT
export PATH=${SGE_ROOT}/bin/lx-amd64:${PATH}

# Now pbsmrtpipe will be able to find blasr and qsub

If you're building several tools, you may need to wrap each tool so that it runs with its own environment settings.
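
One way to do that (a hypothetical sketch; the install path is made up) is a small per-tool wrapper script that sets the tool's environment and then exec's the real binary:

```shell
# Create a wrapper for a hypothetical custom blasr install
cat > blasr-wrapper.sh <<'EOF'
#!/bin/bash
# Set this tool's own environment, then exec the real binary,
# forwarding all arguments unchanged.
export PATH=/path/to/custom-blasr/bin:${PATH}
exec /path/to/custom-blasr/bin/blasr "$@"
EOF
chmod +x blasr-wrapper.sh
```

Pointing your pipeline at the wrapper instead of the binary keeps each tool's environment isolated from the others.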