This guide explains how to create an example pipeline that is closer to a typical use case for a Nextflow bioinformatics pipeline.
This page assumes knowledge of how to create and manipulate Nextflow channels using DSL2. For more information, check out the Nextflow reference docs or contact Data Intuitive for a complete Nextflow+Viash course.
To get started with building a pipeline, we provide a template project which already contains a few components. First, create a new repository by clicking the “Use this template” button in the viash_project_template repository.
Then clone the repository using the following command.
The template project already contains three components, which we will use to build the following pipeline:
graph LR
  A(file?.tsv) --> B[/remove_comments/]
  B --> C[/take_column/]
  C --> D[/combine_columns/]
  D --> E(output)
- remove_comments is a Bash script which removes all lines starting with a # from a file.
- take_column is a Python script which extracts one of the columns in a TSV file.
- combine_columns is an R script which combines multiple files into a TSV.

First, we need to build the components into VDSL3 modules.
Exporting combine_columns (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/combine_columns
Exporting take_column (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/combine_columns:dev' with Dockerfile
Exporting remove_comments (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/remove_comments
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/remove_comments:dev' with Dockerfile
Exporting remove_comments (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/remove_comments
Exporting combine_columns (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/combine_columns
Exporting take_column (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/take_column:dev' with Dockerfile
All 6 configs built successfully
Once everything is built, a new target directory is created, containing the executables and modules grouped per platform:
target
├── docker
│   └── demo
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── demo
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        └── take_column
            ├── main.nf
            └── nextflow.config

10 directories, 9 files
Below is a first Nextflow pipeline, which uses just one VDSL3 module and hard-coded input parameters (file1 and file2).
nextflow.enable.dsl=2

include { remove_comments } from "./target/nextflow/demo/remove_comments/main.nf"

workflow {
  // Create a channel with two events
  // Each event contains a string (an identifier) and a file (input)
  Channel.fromList([
      ["file1", file("resources_test/file1.tsv")],
      ["file2", file("resources_test/file2.tsv")]
    ])

    // View channel contents
    | view { tup -> "Input: $tup" }

    // Process the input file using the 'remove_comments' module.
    // This removes comment lines from the input TSV.
    | remove_comments.run(
      directives: [
        publishDir: "output/"
      ]
    )

    // View channel contents
    | view { tup -> "Output: $tup" }
}
It’s important to understand the interface of every VDSL3 module. A VDSL3 module expects its input to be a tuple with the following elements:

- id (String): A unique identifier used for tracking data objects and for ensuring output filenames are unique.
- data (Map[String, Any] or File): A named map (or dictionary) used to pass the module’s input arguments. If the module only has a single input file, the file itself can simply be passed.
- ... (Any*): Any other elements in the tuple simply pass through the module without being altered in any way. For this reason, they are often referred to as the “passthrough” objects.

In turn, a VDSL3 module returns a tuple with the same interface, except that the input data object has been replaced with the output data:

- id (String): The identifier from the input tuple.
- data (Map[String, Any] or File): A named map (or dictionary) containing the module’s output files. Important: If the module only has a single output file, the file itself will be returned.
- ... (Any*): The passthrough objects from the input tuple (if any).

What is .run()?

Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.

The run() function is a feature of every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. In this case, we use it to set the publishDir directive to "output/", so that the output of that step in the pipeline is published to the output/ directory.
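To make the tuple interface described above more concrete, the following plain Groovy sketch mimics how an event passes through a module. It is not Viash code: mockModule is a hypothetical stand-in that only computes an output file name, the argument name input is an assumption, and String paths stand in for Nextflow file objects. The output name follows the id.module.output.tsv pattern used by the pipeline in this guide.

```groovy
// Hypothetical stand-in for a VDSL3 module: it follows the same tuple
// interface, but only computes an output file name instead of running a tool.
def mockModule = { List event ->
  def id = event[0]                // unique identifier
  def data = event[1]              // a map of named arguments, or a single file
  def passthrough = event.drop(2)  // any further elements, left untouched

  // A bare file is shorthand for a map holding one input argument.
  // The key 'input' is an assumed argument name.
  def args = (data instanceof Map) ? data : [input: data]
  println "id=${id} args=${args}"

  // Single output file: return the file itself in place of the data element.
  def output = "${id}.mock_module.output.tsv".toString()
  [id, output] + passthrough
}

// Shape 1: [id, file]
def result1 = mockModule(["file1", "resources_test/file1.tsv"])

// Shape 2: [id, map, passthrough...]
def result2 = mockModule(["file2", [input: "resources_test/file2.tsv"], "keep me"])

println result1
println result2
```

In both cases the identifier comes back first and the data element is replaced by the output, while the extra "keep me" element in the second event is returned unchanged at the end of the tuple.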
Now run the pipeline with Nextflow:
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [agitated_kirch] DSL2 - revision: 111508427e
executor > local (2)
[ad/ae6e3c] process > remove_comments:remove_comm... [100%] 2 of 2 ✔
Input: [file1, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]
Input: [file2, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]
Output: [file1, /home/runner/work/website/website/guide/_viash_project_template/work/0d/a28c575fc3b352f763bec2d3fa70ad/file1.remove_comments.output.tsv]
Output: [file2, /home/runner/work/website/website/guide/_viash_project_template/work/ad/ae6e3c557217bc8292abc79ddddc20/file2.remove_comments.output.tsv]
output
├── file1.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/0d/a28c575fc3b352f763bec2d3fa70ad/file1.remove_comments.output.tsv
└── file2.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/ad/ae6e3c557217bc8292abc79ddddc20/file2.remove_comments.output.tsv
0 directories, 2 files
The above example pipeline serves as the backbone for creating more advanced pipelines. However, for the sake of simplicity it contained several hardcoded elements: