Pipeline inspection

FlowCraft offers an inspect mode for tracking the progress of a nextflow pipeline, either directly in a terminal (overview) or by broadcasting information to the FlowCraft web application (broadcast).

Note

This mode was designed for nextflow pipelines generated by FlowCraft. It should be possible to inspect any nextflow pipeline, provided that the requirements below are met, but compatibility is not guaranteed.

How it works: Simply run flowcraft inspect -m <mode> in the directory where the pipeline is running. In either run mode, FlowCraft will keep running (until you cancel it) and continuously update the progress of the pipeline. If the pipeline is interrupted or fails, FlowCraft should be able to reset the inspection automatically when the pipeline is resumed.

Requirements for inspect

While the inspect mode is running, it will parse the information written into two files that are generated by nextflow:

  • .nextflow.log: The log file that is automatically generated by nextflow.
  • trace file: The trace file that is generated by nextflow when using the -with-trace option. By default, FlowCraft searches for the pipeline_stats.txt file, but this can be changed using the -i option.
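
As a quick pre-flight check, the sketch below tests whether the two files named above are present in a run directory. It is not part of FlowCraft: the function name missing_inspect_inputs and its parameters are hypothetical, and the default trace file name is taken from the description above.

```python
from pathlib import Path


def missing_inspect_inputs(run_dir, trace_file="pipeline_stats.txt"):
    """Return the names of the required nextflow files absent from run_dir.

    Hypothetical helper: it only checks that the files exist, not that
    their contents are usable by the inspect mode.
    """
    run_dir = Path(run_dir)
    required = [".nextflow.log", trace_file]
    return [name for name in required if not (run_dir / name).is_file()]
```

An empty list means both files were found. A non-empty list names what is missing, which usually points to a pipeline that has not started yet or that was launched without -with-trace.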

Trace fields

FlowCraft parses several fields of the trace file, but only a few are mandatory for its execution. If an optional field is missing from the trace file, the corresponding information will simply not appear on the terminal or web app. Nevertheless, to take full advantage of the inspect mode, the following trace fields should be present:

  • Mandatory:
    • tag: The tag of the nextflow process. FlowCraft assumes that this is a string with only the sample name (e.g.: SampleA). While this is not strictly required, providing strings with other information (e.g.: Running bowtie for sampleA) may result in some inconsistencies in the inspection.
    • task_id: The task ID is used to skip entries that have already been parsed.
  • Optional:
    • hash: Used to get the work directory of the process execution.
    • cpus, %cpu, memory, rss, rchar and wchar: Used for statistics of computational resources.

Note

Any additional fields present in the trace file are ignored.
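
To illustrate how these fields might be consumed, here is a minimal sketch of an incremental trace reader. It is not FlowCraft's actual parser: read_new_entries and the two field tuples are hypothetical names, and the code assumes a tab-separated trace with a header row, which is the default layout nextflow writes. It rejects traces that lack the mandatory fields, uses task_id to skip rows seen on an earlier pass, and drops any field not listed above.

```python
import csv
import io

MANDATORY_FIELDS = ("tag", "task_id")
OPTIONAL_FIELDS = ("hash", "cpus", "%cpu", "memory", "rss", "rchar", "wchar")


def read_new_entries(trace_text, seen_task_ids):
    """Return trace rows not parsed before, restricted to the known fields.

    trace_text is the full content of the trace file; seen_task_ids is a
    set that is updated in place so repeated calls skip earlier rows.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    header = reader.fieldnames or []

    missing = [name for name in MANDATORY_FIELDS if name not in header]
    if missing:
        raise ValueError("trace lacks mandatory fields: " + ", ".join(missing))

    # Keep mandatory fields plus whichever optional ones are present;
    # every other column is ignored.
    wanted = [n for n in MANDATORY_FIELDS + OPTIONAL_FIELDS if n in header]

    new_rows = []
    for row in reader:
        task_id = row["task_id"]
        if task_id in seen_task_ids:
            continue  # already parsed on an earlier pass
        seen_task_ids.add(task_id)
        new_rows.append({name: row[name] for name in wanted})
    return new_rows
```

Calling the function repeatedly with the same set and a growing trace returns only the rows appended since the previous call, which mirrors the role of task_id described above.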

Usage

flowcraft inspect --help
usage: flowcraft inspect [-h] [-i TRACE_FILE] [-r REFRESH_RATE]
                         [-m {overview,broadcast}] [-u URL] [--pretty]

optional arguments:
  -h, --help            show this help message and exit
  -i TRACE_FILE         Specify the nextflow trace file.
  -r REFRESH_RATE       Set the refresh frequency for the continuous inspect
                        functions
  -m {overview,broadcast}, --mode {overview,broadcast}
                        Specify the inspection run mode.
  -u URL, --url URL     Specify the URL to where the data should be broadcast
  --pretty              Pretty inspection mode that removes usual reporting
                        processes.
  • -i: Used to specify the path to the trace file that should be parsed. By default, FlowCraft will try to parse the pipeline_stats.txt file in the current working directory.
  • -r: Sets the time interval in seconds between each parsing of the relevant nextflow files. By default it is set to 0.01.
  • -m: The inspection mode. overview is the terminal display while broadcast sends the data to FlowCraft’s web service.
  • -u: The URL of FlowCraft’s web service. By default, it is already set to the main service and you do not need to specify it. It is only useful when the service is running on localhost or on another custom instance.
  • --pretty: By default the inspection shows the progress of all processes in the pipeline. Using this option filters the processes to the most relevant ones of FlowCraft’s pipelines.
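
When the inspection is launched from a script, the options above can be assembled programmatically. The following is a sketch only: build_inspect_command is a hypothetical helper, not part of FlowCraft, and it emits nothing beyond the flags shown in the help text above.

```python
def build_inspect_command(mode, trace_file=None, refresh_rate=None,
                          url=None, pretty=False):
    """Build the argument list for a `flowcraft inspect` invocation.

    Options left at None are omitted so that FlowCraft's own defaults apply.
    """
    if mode not in ("overview", "broadcast"):
        raise ValueError("mode must be 'overview' or 'broadcast'")

    cmd = ["flowcraft", "inspect", "-m", mode]
    if trace_file is not None:
        cmd += ["-i", str(trace_file)]
    if refresh_rate is not None:
        cmd += ["-r", str(refresh_rate)]
    if url is not None:
        cmd += ["-u", url]
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The resulting list can be handed to subprocess.run with cwd set to the directory where the pipeline is running, since the inspection expects to be started there.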