Job Directory Structure

The layout of a job directory is shown below. Details and examples of the individual files follow.

  • root-job-dir/
    • stdout (pbsmrtpipe exe standard out. Minimal status updates)
    • stderr (pbsmrtpipe exe standard err. Should be the first place to look for workflow-level tracebacks and task errors)
    • logs/
      • pbsmrtpipe.log
      • master.log
    • workflow/
      • entry_points.json (this is essentially a heterogeneous dataset of the input.xml or cmdline entry points with entry_id)
      • datastore.json (fundamental store of all output files. Also contains initial task and workflow level values)
    • html/
      • css/
      • js/
      • index.html (main summary page)
    • tasks/ # all tasks are here
      • {task_id}-{instance_id}/ # each task has a directory
        • tool-contract.json (Tool Contract for {task_id})
        • resolved-tool-contract.json (Resolved Tool Contract. Contains paths to files and specific options used in the task execution)
        • runnable-task.json (all metadata needed to run the task on the execution node or locally)
        • task-report.json (when the task is completed)
        • outfile.1.txt (output files)
        • stdout
        • stderr
        • task.log (optional; written if the task uses $logfile in resources)
        • cluster.stdout (if non-local job)
        • cluster.stderr (if non-local job)
        • cluster.sh (qsub submission script)
      • {task_id}-{instance_id}/ # more tasks
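
As a rough illustration of the layout above, the following sketch lists the task directories of a job and notes which ones already contain a task-report.json. The function name list_task_dirs is hypothetical and is not part of pbsmrtpipe; the only assumption about the layout is the tasks/ subdirectory shown in the tree.

```python
import os


def list_task_dirs(job_dir):
    """Return (task_dir_name, has_report) pairs for each directory under tasks/.

    Hypothetical helper: it only assumes the tasks/{task_id}-{instance_id}/
    layout described above.
    """
    tasks_root = os.path.join(job_dir, "tasks")
    results = []
    for name in sorted(os.listdir(tasks_root)):
        task_dir = os.path.join(tasks_root, name)
        if not os.path.isdir(task_dir):
            continue
        # task-report.json is only present once the task has completed
        has_report = os.path.isfile(os.path.join(task_dir, "task-report.json"))
        results.append((name, has_report))
    return results
```

A directory without a report may be a task that is still running or one that failed before the report was written, so the stderr files remain the place to look for the cause.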

Example datastore.json

File paths are relative to the workflow directory.

{
    "files": [
        {
            "file_id": "global-graph-file-node-id",
            "type_id": "PacBio.FileTypes.JsonReport",
            "path": "relative/path/to/report.json",
            "produced_by": "task_instance_id"
        },
        {
            "file_id": "my_file_id",
            "type_id": "PacBio.FileTypes.csv",
            "path": "relative/path/to/f.csv",
            "produced_by": "task_id-instance_id"
        }
    ],
    "version": "0.4.3"
}
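
A minimal sketch of reading a datastore.json like the one above and turning each entry into a full path. The helper name load_datastore_files and the base_dir argument are assumptions made for illustration; which directory the relative paths should be joined to depends on the job layout and is left to the caller.

```python
import json
import os


def load_datastore_files(datastore_path, base_dir):
    """Map file_id -> path joined onto base_dir for every datastore entry.

    Hypothetical helper; base_dir is whatever directory the relative
    paths in the datastore are meant to be resolved against.
    """
    with open(datastore_path) as f:
        datastore = json.load(f)
    resolved = {}
    for entry in datastore["files"]:
        # os.path.join returns an absolute entry["path"] unchanged
        resolved[entry["file_id"]] = os.path.join(base_dir, entry["path"])
    return resolved
```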

Task Specific Components

There are several structured JSON files of note within a task directory.

  • tool-contract.json
  • resolved-tool-contract.json
  • runnable-task.json
  • task-report.json

Example Tool Contract JSON

See pbcommand for more details.

{
    "driver": {
        "env": {},
        "exe": "python -m pbsmrtpipe.pb_tasks.dev  run-rtc  ",
        "serialization": "json"
    },
    "tool_contract": {
        "_comment": "Created by v0.4.11",
        "description": "Quick tool dev_reference_ds_report pbsmrtpipe.tasks.dev_reference_ds_report",
        "input_types": [
            {
                "description": "description for PacBio.DataSet.ReferenceSet_0",
                "file_type_id": "PacBio.DataSet.ReferenceSet",
                "id": "Label PacBio.DataSet.ReferenceSet_0",
                "title": "<DataSetFileType id=PacBio.DataSet.ReferenceSet name=file >"
            }
        ],
        "is_distributed": false,
        "name": "Tool dev_reference_ds_report",
        "nproc": 3,
        "output_types": [
            {
                "default_name": "report",
                "description": "description for <FileType id=PacBio.FileTypes.JsonReport name=report >",
                "file_type_id": "PacBio.FileTypes.JsonReport",
                "id": "Label PacBio.FileTypes.JsonReport_0",
                "title": "<FileType id=PacBio.FileTypes.JsonReport name=report >"
            }
        ],
        "resource_types": [],
        "schema_options": [
            {
                "$schema": "http://json-schema.org/draft-04/schema#",
                "pb_option": {
                    "default": false,
                    "description": "Option dev_diagnostic_strict description",
                    "name": "Option dev_diagnostic_strict",
                    "option_id": "pbsmrtpipe.task_options.dev_diagnostic_strict",
                    "type": "boolean"
                },
                "properties": {
                    "pbsmrtpipe.task_options.dev_diagnostic_strict": {
                        "default": false,
                        "description": "Option dev_diagnostic_strict description",
                        "title": "Option dev_diagnostic_strict",
                        "type": "boolean"
                    }
                },
                "required": [
                    "pbsmrtpipe.task_options.dev_diagnostic_strict"
                ],
                "title": "JSON Schema for pbsmrtpipe.task_options.dev_diagnostic_strict",
                "type": "object"
            }
        ],
        "task_type": "pbsmrtpipe.task_types.standard",
        "tool_contract_id": "pbsmrtpipe.tasks.dev_reference_ds_report"
    },
    "tool_contract_id": "pbsmrtpipe.tasks.dev_reference_ds_report",
    "version": "0.1.0"
}
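
To show how the schema_options section can be read, here is a small sketch that collects the default value of every task option from a parsed tool contract. It relies only on the pb_option keys visible in the example above; the function name option_defaults is hypothetical, and pbcommand may well offer its own loaders for this.

```python
def option_defaults(tool_contract_doc):
    """Return {option_id: default} from a parsed tool-contract.json dict.

    Hypothetical helper based only on the "pb_option" entries shown
    in the example file.
    """
    options = tool_contract_doc["tool_contract"]["schema_options"]
    return {
        opt["pb_option"]["option_id"]: opt["pb_option"]["default"]
        for opt in options
    }
```

For the example file this would yield a single entry for pbsmrtpipe.task_options.dev_diagnostic_strict with the value False.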

Example Resolved Tool Contract JSON

See pbcommand for more details.

{
    "driver": {
        "env": {},
        "exe": "python -m pbsmrtpipe.pb_tasks.dev  run-rtc  ",
        "serialization": "json"
    },
    "resolved_tool_contract": {
        "_comment": "Created by pbcommand v0.4.11",
        "input_files": [
            "/Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/referenceset.xml"
        ],
        "is_distributed": false,
        "log_level": "INFO",
        "nproc": 3,
        "options": {
            "pbsmrtpipe.task_options.dev_diagnostic_strict": false
        },
        "output_files": [
            "/Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/job_output/tasks/pbsmrtpipe.tasks.dev_reference_ds_report-0/report.json"
        ],
        "resources": [],
        "task_type": "pbsmrtpipe.task_types.standard",
        "tool_contract_id": "pbsmrtpipe.tasks.dev_reference_ds_report"
    }
}
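
Because a resolved tool contract lists absolute paths, a quick sanity check before rerunning a task by hand is to confirm that every input file is visible from the current host. The sketch below is an illustration only; missing_input_files is a hypothetical name and not a pbsmrtpipe or pbcommand function.

```python
import os


def missing_input_files(rtc_doc):
    """Return the input files of a parsed resolved tool contract that do not exist.

    Hypothetical helper; expects the dict loaded from
    resolved-tool-contract.json.
    """
    rtc = rtc_doc["resolved_tool_contract"]
    return [path for path in rtc["input_files"] if not os.path.exists(path)]
```

An empty list means all inputs were found; any returned path is worth checking for a typo or an unmounted file system.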

Example Runnable Task JSON

In each task directory, a runnable-task.json is written. It contains all the metadata necessary for the task to be run on the execution node (or run locally). This task manifest is run via the pbtools-runner command-line tool.

There are several motivations for the pbtools-runner abstraction:

  • NFS checks to validate that input files can be found (related to Python NFS caching errors that often result in IOErrors)
  • create and clean up tmp resources on the execution node
  • write env.json to document the environment variables
  • write metadata results about the output of the tasks
  • allow a bit of tweaking and rerunning by hand for failed tasks
  • strict documentation of the input files, resolved task type, and resolved task options used to run the task

runnable-task.json contains the resolved task type and all resolved task values, such as resolved options, input files, output files, and resources (e.g., temp files, temp dirs).

Example file.

{
    "cluster": {
        "start": "qsub -S /bin/bash -sync y -V -q default -N ${JOB_ID} \\\n    -o \"${STDOUT_FILE}\" \\\n    -e \"${STDERR_FILE}\" \\\n    -pe smp ${NPROC} \\\n    \"${CMD}\"", 
        "stop": "qdel ${JOB_ID}"
    }, 
    "env": {}, 
    "id": "pbsmrtpipe.tasks.dev_reference_ds_report", 
    "resource_types": [], 
    "task": {
        "cmds": [
            "python -m pbsmrtpipe.pb_tasks.dev  run-rtc   /Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/job_output/tasks/pbsmrtpipe.tasks.dev_reference_ds_report-0/resolved-tool-contract.json"
        ], 
        "input_files": [
            "/Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/referenceset.xml"
        ], 
        "is_distributed": false, 
        "nproc": 3, 
        "options": {
            "pbsmrtpipe.task_options.dev_diagnostic_strict": false
        }, 
        "output_dir": "/Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/job_output/tasks/pbsmrtpipe.tasks.dev_reference_ds_report-0", 
        "output_files": [
            "/Users/mkocher/gh_projects/pbsmrtpipe/testkit-data/dev_diagnostic/job_output/tasks/pbsmrtpipe.tasks.dev_reference_ds_report-0/report.json"
        ], 
        "resources": [], 
        "task_id": "pbsmrtpipe.tasks.dev_reference_ds_report", 
        "task_type_id": "pbsmrtpipe.tasks.dev_reference_ds_report", 
        "uuid": "252d72cd-4617-4a33-9622-606720dec512"
    }, 
    "version": "0.44.2"
}
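
The cluster "start" entry above is a template with ${NAME} placeholders. How pbsmrtpipe fills it in is not described here; the sketch below only illustrates the idea using Python's string.Template, which happens to use the same ${NAME} syntax. The function name render_cluster_start and its arguments are hypothetical.

```python
from string import Template


def render_cluster_start(runnable_task, job_id, stdout_file, stderr_file, cmd):
    """Fill the ${...} placeholders of the cluster "start" template.

    Hypothetical helper; runnable_task is the dict loaded from
    runnable-task.json. NPROC is taken from the task section.
    """
    template = Template(runnable_task["cluster"]["start"])
    # substitute() raises KeyError if the template names a placeholder
    # that is not supplied here
    return template.substitute(
        JOB_ID=job_id,
        STDOUT_FILE=stdout_file,
        STDERR_FILE=stderr_file,
        NPROC=runnable_task["task"]["nproc"],
        CMD=cmd,
    )
```

Printing the rendered string can help when comparing it against the cluster.sh file in the task directory.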

Example task-report.json

After pbtools-runner finishes executing the task, a metadata report of the task is written to the task directory.

task-report.json contains basic metadata about the completed task.

{
    "_changelist": 127707,
    "_version": "2.1",
    "attributes": [
        {
            "id": "workflow_task.host",
            "name": null,
            "value": "mp-f027.nanofluidics.com"
        },
        {
            "id": "workflow_task.run_time",
            "name": null,
            "value": 3563
        },
        {
            "id": "workflow_task.exit_code",
            "name": null,
            "value": 0
        },
        {
            "id": "workflow_task.error_msg",
            "name": null,
            "value": ""
        }
    ],
    "id": "workflow_task",
    "plotGroups": [],
    "tables": []
}
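
The attributes list in task-report.json is easier to work with as a plain dictionary. The following sketch flattens it and checks the exit code, assuming only the structure shown in the example above; the function names are hypothetical.

```python
import json


def task_report_attributes(path):
    """Return {short_attribute_name: value} from a task-report.json file.

    Hypothetical helper. The report id prefix ("workflow_task.") is
    stripped from each attribute id when present.
    """
    with open(path) as f:
        report = json.load(f)
    prefix = report["id"] + "."
    attrs = {}
    for attribute in report["attributes"]:
        key = attribute["id"]
        if key.startswith(prefix):
            key = key[len(prefix):]
        attrs[key] = attribute["value"]
    return attrs


def task_succeeded(path):
    """True if the report records an exit code of 0."""
    return task_report_attributes(path)["exit_code"] == 0
```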