Pbsmrtpipe TestKit Framework

The core functionality of pysiv for running integration tests (using RS-era smrtpipe.py) is now in pbsmrtpipe. It resides in pbsmrtpipe.testkit.* and the monkey_patch decorator is in pbsmrtpipe.testkit.core. Both individual tasks and pipelines can be run using testkit, although we mostly use it with pipelines for production testing.

The main change is the ability to load tests from any Python module (e.g., pysiv2.core.test_zero). It should be straightforward to extend the monkey_patch decorator to define your own test cases and build assertions.

Tests are driven by configuration files, which by default are in “INI file” format. An example:

[pbsmrtpipe:pipeline]

id = dev_01
description = Dev example
author = mkocher

# debug mode
debug = True

pipeline_xml = workflow_id.xml
preset_xml = preset.xml

[entry_points]
e_01 = input.txt


[tests]
# Tests can be loaded from any python module
# Specifically, any TestBase subclass in pbsmrtpipe.testkit.core.test_zero will be loaded
pbsmrtpipe.testkit.core = test_zero, test_resources, test_datastore

# This will raise an import error if the Python module isn't installed
#tacos = test_tacos

The pipeline to run is specified by a separate XML file (pipeline_xml), as are any preset task options. Entry points are specified as entry ID/path pairs.
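
To make the layout of the three sections concrete, here is a minimal sketch that reads a config like the one above with Python's standard configparser. It is an illustration only: pbsmrtpipe's own loader is not shown in this document and may behave differently, and the variable names are not part of any pbsmrtpipe API.

```python
import configparser

CFG_TEXT = """
[pbsmrtpipe:pipeline]
id = dev_01
description = Dev example
author = mkocher
debug = True
pipeline_xml = workflow_id.xml
preset_xml = preset.xml

[entry_points]
e_01 = input.txt

[tests]
pbsmrtpipe.testkit.core = test_zero, test_resources, test_datastore
"""

parser = configparser.ConfigParser()
# Keep option names as written (the default lower-cases them).
parser.optionxform = str
parser.read_string(CFG_TEXT)

pipeline = dict(parser["pbsmrtpipe:pipeline"])
# Entry points are entry ID -> path pairs.
entry_points = dict(parser["entry_points"])
# Each key in [tests] is a Python package; the value lists its test modules.
tests = {
    package: [name.strip() for name in modules.split(",")]
    for package, modules in parser["tests"].items()
}

print(pipeline["id"], entry_points, tests)
```

Note that every value is read as a string; a flag such as debug needs parser.getboolean("pbsmrtpipe:pipeline", "debug") to become a real boolean.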

For pipelines, an alternative is to use JSON format, which is more flexible but not as lightweight.

{
  "testId": "dev_local_fasta_chunk",
  "description": "Dev example for Fasta Chunking",
  "author": "mkocher",
  "debug": true,
  "workflowXml": "workflow_id.xml",
  "presetXml": "preset.xml",
  "entryPoints": [
    {
      "entryId": "e_01",
      "path": "input.txt"
    }
  ],
  "pythonTests": {
    "pbsmrtpipe.testkit.core": ["test_zero", "test_resources", "test_datastore", "test_datastore_chunking"]
  }
}

As an alternative to a workflow XML, you can instead specify pipelineId.
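
The sketch below loads a JSON config of this shape and reports which of the two pipeline sources it uses. The helper pipeline_source is hypothetical, and treating workflowXml and pipelineId as mutually exclusive is an assumption of this sketch, since the behavior when both are present is not described here. The pipeline ID shown in the usage note is a placeholder, not a real pipeline.

```python
import json


def pipeline_source(cfg):
    """Return ("workflowXml", path) or ("pipelineId", id) for a testkit JSON config.

    Hypothetical helper: requiring exactly one of the two keys is an
    assumption of this sketch, not documented pbsmrtpipe behavior.
    """
    keys = [k for k in ("workflowXml", "pipelineId") if cfg.get(k)]
    if len(keys) != 1:
        raise ValueError("expected exactly one of workflowXml or pipelineId, got %r" % keys)
    return keys[0], cfg[keys[0]]


cfg = json.loads("""
{
  "testId": "dev_local_fasta_chunk",
  "workflowXml": "workflow_id.xml",
  "presetXml": "preset.xml",
  "entryPoints": [{"entryId": "e_01", "path": "input.txt"}],
  "pythonTests": {"pbsmrtpipe.testkit.core": ["test_zero", "test_resources"]}
}
""")

print(pipeline_source(cfg))

# Flatten pythonTests into fully qualified module names.
modules = ["%s.%s" % (pkg, name)
           for pkg, names in cfg["pythonTests"].items()
           for name in names]
print(modules)
```

Calling pipeline_source({"pipelineId": "placeholder.pipeline.id"}) would return the pipelineId pair instead.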

Currently, there are dev integration tests in the testkit-data root-level directory that depend only on pbcore. These are useful for running example jobs and understanding how pipelines are constructed. In our Perforce repos, most tests live in //depot/software/smrtanalysis/siv/testkit-jobs/sa3_pipelines, which has multiple examples for all of our production pipelines. Each test should have its own subdirectory, preferably grouped by application type. Please note that although pbsmrtpipe pipelines will often work with “bare” BAM or FASTA files as inputs instead of PacBio DataSet XML files, testkit jobs should always use the corresponding DataSet XML as entry points, to ensure compatibility with SMRT Link services (see below).
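
As a rough pre-flight check for the DataSet XML recommendation, a sketch like the following flags entry points whose path does not end in .xml. The helper name and the file names are hypothetical, and the extension test is only a crude proxy: confirming that a file really is a PacBio DataSet XML would require inspecting its contents (for example with pbcore), which this sketch does not attempt.

```python
def non_xml_entry_points(entry_points):
    """Return the entry IDs whose path does not look like an XML file.

    Hypothetical helper; it checks the file extension only.
    """
    return sorted(
        entry_id
        for entry_id, path in entry_points.items()
        if not path.lower().endswith(".xml")
    )


# Illustrative entry ID -> path pairs (file names are made up).
entry_points = {"e_01": "input.txt", "e_02": "reads.subreadset.xml"}
suspect = non_xml_entry_points(entry_points)
print(suspect)
```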

Defining Test Cases

As in pysiv, test cases can inherit from TestBase.

Here’s an example taken from test_resources, which checks whether the core job resources were created correctly. The job directory can be accessed via self.job_dir.

import os
import logging

from .base import TestBase
from pbsmrtpipe.testkit.base import monkey_patch

log = logging.getLogger(__name__)


@monkey_patch
class TestCoreResources(TestBase):

    """
    Test to see if the core job directory structure and files exist
    """
    # Directories (relative to the root job) that should be created
    DIRS = ('html', 'workflow', 'tasks', 'logs')

    # Directories and files that should exist within the 'html' subdirectory
    HTML_DIRS = ('css', 'images', 'js')
    HTML_FILES = ('index.html', 'settings.html', 'workflow.html', 'datastore.html')

    # Files that should exist within the 'workflow' directory
    WORKFLOW_FILES = ('datastore.json', 'entry-points.json', 'report-tasks.json', 'options-task.json',
                      'options-workflow.json', 'workflow.dot', 'workflow.png', 'workflow.svg')

    def test_job_path_exists(self):
        self.assertTrue(os.path.exists(self.job_dir))

Running Testkit

pbsmrtpipe testkit jobs may be run in one of two ways:

  1. directly on the command line, using pbtestkit-runner.
  2. indirectly via SMRT Link services, using pbtestkit-service-runner; this requires a separate working SMRT Link server.

In addition, tests may be run in parallel using the corresponding “multirunner” commands (see below).

For command-line-only testing, pbtestkit-runner testkit.cfg is the minimal command required, which will use the default workflow options specified in the preset.xml. This can be further modified by the --force-distributed and --local-only options which control use of a queuing system to dispatch tasks, and by --force-chunk-mode and --disable-chunk-mode to toggle chunking. (We recommend that you try all possible modes when developing a pipeline, but simple functional tests are often quicker to run locally and unchunked.) If you have run the testkit job already and just want to re-run the test suite with modifications, add --only-tests.
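
When trying every combination of modes, it can help to generate the command lines programmatically. The sketch below only assembles argument lists from the flags named above and does not execute anything. runner_command is a hypothetical helper, not part of pbsmrtpipe.

```python
import itertools


def runner_command(cfg, distributed=None, chunked=None, only_tests=False):
    """Build an argv list for pbtestkit-runner.

    distributed / chunked: True forces the mode on, False forces it off,
    None leaves the setting from preset.xml untouched.
    """
    cmd = ["pbtestkit-runner"]
    if distributed is not None:
        cmd.append("--force-distributed" if distributed else "--local-only")
    if chunked is not None:
        cmd.append("--force-chunk-mode" if chunked else "--disable-chunk-mode")
    if only_tests:
        cmd.append("--only-tests")
    cmd.append(cfg)
    return cmd


# Print the four distributed/chunked combinations for one config.
for distributed, chunked in itertools.product((True, False), repeat=2):
    print(" ".join(runner_command("testkit.cfg", distributed, chunked)))
```

Each returned list can be passed to subprocess.run as-is. Because each pair of flags is chosen from a single argument, the mutually exclusive options can never be combined by accident.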

Testkit Tool to run pbsmrtpipe jobs.

usage: pbtestkit-runner [-h] [--version] [--log-file LOG_FILE]
                        [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                        [--output-xml OUTPUT_XML] [--ignore-test-failures]
                        [--only-tests] [--force-distributed | --local-only]
                        [--force-chunk-mode | --disable-chunk-mode]
                        testkit_cfg

Positional Arguments

testkit_cfg Path to testkit.cfg file.

Named Arguments

--version
 show program’s version number and exit
--log-file
 Write the log to file. Default (None) will write to stdout.
--log-level
 Set log level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: “INFO”
--debug
 Alias for setting log level to DEBUG. Default: False
--quiet
 Alias for setting log level to CRITICAL to suppress output. Default: False
-v, --verbose
 Set the verbosity level.
--output-xml
 Path to output XUnit XML
--ignore-test-failures
 Exit with code 0 if pbsmrtpipe ran successfully, even if some tests fail. Default: False
--only-tests
 Only run the tests. Default: False
--force-distributed
 Override XML settings to enable distributed mode (if cluster manager is provided)
--local-only
 Override XML settings to disable distributed mode. All tasks will be submitted to the local host.
--force-chunk-mode
 Override to enable Chunk mode
--disable-chunk-mode
 Override to disable Chunk mode
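
The file written by --output-xml can be post-processed to list failing tests. The sketch below assumes the common JUnit/XUnit layout of testcase elements with nested failure or error children. The exact schema pbtestkit-runner emits is not shown in this document, and SAMPLE is a hand-written stand-in, not real tool output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the common JUnit/XUnit layout (an assumption).
SAMPLE = """<testsuite name="example" tests="3">
  <testcase classname="TestA" name="test_ok"/>
  <testcase classname="TestA" name="test_bad"><failure message="boom"/></testcase>
  <testcase classname="TestB" name="test_err"><error message="oops"/></testcase>
</testsuite>"""


def failed_tests(xml_text):
    """Return "classname.name" for every test case with a failure or error."""
    root = ET.fromstring(xml_text)
    failed = []
    # iter() also finds <testcase> elements nested under a <testsuites> root.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append("%s.%s" % (case.get("classname"), case.get("name")))
    return failed


failures = failed_tests(SAMPLE)
print(failures)
```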

Testing via services is slightly less flexible, but it makes the job visible in SMRT Link, including all reports and plots, and it checks functionality in the server environment. Note that services mode strictly requires all entry points to be DataSet XML files. The equivalent command is slightly more complicated:

$ pbtestkit-service-runner --host smrtlink-bihourly --port 8081 testkit.cfg

Workflow options (but not task options) will be ignored in this mode in favor of whatever the server defaults are. To re-run tests only, you need to first determine the integer ID of the smrtlink job, then add --only-tests <ID> to the arguments.
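
The two invocation styles (a full run and a tests-only re-run) can be sketched as a small command builder. service_runner_command is a hypothetical helper, and the job ID 1234 is a made-up value. The defaults for host and port mirror those listed in the tool's help text.

```python
def service_runner_command(cfg, host="localhost", port=8081, only_tests_job_id=None):
    """Build an argv list for pbtestkit-service-runner.

    only_tests_job_id: integer ID of an existing SMRT Link job; when given,
    only the tests are re-run against that job.
    """
    cmd = ["pbtestkit-service-runner", "--host", host, "--port", str(port)]
    if only_tests_job_id is not None:
        # int() rejects anything that is not an integer job ID.
        cmd += ["--only-tests", str(int(only_tests_job_id))]
    cmd.append(cfg)
    return cmd


full_run = service_runner_command("testkit.cfg", host="smrtlink-bihourly")
rerun = service_runner_command("testkit.cfg", host="smrtlink-bihourly",
                               only_tests_job_id=1234)
print(" ".join(full_run))
print(" ".join(rerun))
```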

Utility for running a testkit job through services as an alternative to pbtestkit-runner.

usage: pbtestkit-service-runner [-h] [--version] [--log-file LOG_FILE]
                                [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                                [-u HOST] [-p PORT] [--user USER]
                                [--password PASSWORD] [-x XML_OUT]
                                [-n NUNIT_OUT] [-t TIME_OUT] [-s SLEEP]
                                [--ignore-test-failures] [--import-only]
                                [--only-tests TEST_JOB_ID]
                                testkit_cfg

Positional Arguments

testkit_cfg

Named Arguments

--version
 show program’s version number and exit
--log-file
 Write the log to file. Default (None) will write to stdout.
--log-level
 Set log level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: “INFO”
--debug
 Alias for setting log level to DEBUG. Default: False
--quiet
 Alias for setting log level to CRITICAL to suppress output. Default: False
-v, --verbose
 Set the verbosity level.
-u, --host
 Hostname of SMRT Link server. If this is anything other than ‘localhost’ you must supply authentication. Default: “localhost”
-p, --port
 Services port number. Default: 8081
--user
 User to authenticate with (if using HTTPS)
--password
 Password to authenticate with (if using HTTPS)
-x, --xunit
 Output XUnit test results. Default: “test-output.xml”
-n, --nunit
 Optional NUnit output file, used for JIRA/Xray integration; will be written only if the ‘xray_tests’ field is populated. Default: “nunit_out.xml”
-t, --timeout
 Timeout for blocking after job submission. Default: 1800
-s, --sleep
 Sleep time after job submission. Default: 2
--ignore-test-failures
 Only exit with non-zero return code if the job itself failed, regardless of test outcome. Default: False
--import-only
 Import datasets without running pipeline. Default: False
--only-tests
 Run tests on an existing smrtlink job

For running suites of related tests, the multirunner commands take a FOFN (file of file names) listing the testkit configs to run:

$ pbtestkit-multirunner mapping_tests.txt --nworkers 8
$ pbtestkit-service-multirunner mapping_tests.txt --nworkers 8 --host smrtlink-bihourly --port 8081

pbtestkit-multirunner also supports the same arguments to control job distribution and chunking. Note however that test-only mode is not supported by the multirunners.
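
A FOFN for the multirunners is just a text file with one testkit.cfg path per line. The sketch below builds one by walking a directory tree. write_fofn is a hypothetical helper, and the directory names are made up for the demonstration. Paths are written relative to the root directory, on the assumption that the multirunner is then launched from that directory.

```python
import os
import tempfile


def write_fofn(root, fofn_path, cfg_name="testkit.cfg"):
    """Write one line per cfg_name found under root; return the paths.

    Paths are relative to root and sorted for reproducibility.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if cfg_name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, cfg_name), root))
    found.sort()
    with open(fofn_path, "w") as out:
        for path in found:
            out.write(path + "\n")
    return found


# Demonstration on a throwaway directory tree with two fake test jobs.
with tempfile.TemporaryDirectory() as root:
    for job in ("mapping_a", "mapping_b"):
        os.makedirs(os.path.join(root, job))
        open(os.path.join(root, job, "testkit.cfg"), "w").close()
    fofn = os.path.join(root, "mapping_tests.txt")
    paths = write_fofn(root, fofn)
    with open(fofn) as handle:
        lines = handle.read().splitlines()

print(paths)
```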

Run multiple testkit.cfg files in parallel

usage: pbtestkit-multirunner [-h] [--version] [--ignore-test-failures]
                             [--force-distributed | --local-only]
                             [--force-chunk-mode | --disable-chunk-mode]
                             [--debug] [-n NWORKERS] [-j JUNIT_OUT]
                             testkit_cfg_fofn

Positional Arguments

testkit_cfg_fofn
 File of testkit.cfg file names, relative to the current directory (e.g., RS_Resequencing/testkit.cfg)

Named Arguments

--version
 show program’s version number and exit
--ignore-test-failures
 Exit with code 0 if pbsmrtpipe ran successfully, even if some tests fail. Default: False
--force-distributed
 Override XML settings to enable distributed mode (if cluster manager is provided)
--local-only
 Override XML settings to disable distributed mode. All tasks will be submitted to the local host.
--force-chunk-mode
 Override to enable Chunk mode
--disable-chunk-mode
 Override to disable Chunk mode
--debug
 Alias for setting log level to DEBUG. Default: False
-n, --nworkers
 Number of jobs to concurrently run. Default: 1
-j, --junit-xml
 JUnit output file for all tests. Default: “junit_combined_results.xml”

Parallel wrapper for pbtestkit-service-runner, similar to pbtestkit-multirunner

usage: pbtestkit-service-multirunner [-h] [--version] [--log-file LOG_FILE]
                                     [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} | --debug | --quiet | -v]
                                     [-u HOST] [-p PORT] [-n NWORKERS]
                                     [-t TIME_OUT] [-s SLEEP]
                                     [--ignore-test-failures] [--import-only]
                                     [-j JUNIT_OUT] [-x NUNIT_OUT]
                                     testkit_cfg_fofn [testkit_cfg_fofn ...]

Positional Arguments

testkit_cfg_fofn
 Text file listing testkit.cfg files to run; you may provide more than one of these

Named Arguments

--version
 show program’s version number and exit
--log-file
 Write the log to file. Default (None) will write to stdout.
--log-level
 Set log level. Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL. Default: “INFO”
--debug
 Alias for setting log level to DEBUG. Default: False
--quiet
 Alias for setting log level to CRITICAL to suppress output. Default: False
-v, --verbose
 Set the verbosity level.
-u, --host
 Default: “localhost”
-p, --port
 Port number. Default: 8081
-n, --nworkers
 Number of jobs to concurrently run. Default: 1
-t, --timeout
 Timeout for blocking after job submission. Default: 1800
-s, --sleep
 Sleep time after job submission. Default: 4
--ignore-test-failures
 Only exit with non-zero return code if the job itself failed, regardless of test outcome. Default: False
--import-only
 Import datasets without running pipelines. Default: False
-j, --junit-xml
 JUnit output file for all tests. Default: “junit_combined_results.xml”
-x, --nunit-xml
 NUnit output file for all tests. Default: “nunit_combined_results.xml”

See the CLI docs (Command Line Interface Tools) for details.