```mermaid
graph LR
  A(file?.tsv) --> B[/remove_comments/]
  B --> C[/take_column/]
  C --> D[/combine_columns/]
  D --> E(output)
```
Create a pipeline
This guide explains how to create an example pipeline that’s closer to a typical use-case of a Nextflow bioinformatics pipeline.
This page assumes knowledge of how to create and manipulate Nextflow channels using DSL2. For more information, check out the Nextflow reference docs or contact Data Intuitive for a complete Nextflow+Viash course.
Get the template project
To get started with building a pipeline, we provide a template project which already contains a few components. First, create a new repository by clicking the "Use this template" button in the viash_project_template repository.
Then clone the repository using the following command.
```bash
git clone https://github.com/youruser/my_first_pipeline.git
```
The pipeline already contains three components with which we will build the following pipeline:
- `remove_comments` is a Bash script which removes all lines starting with a `#` from a file.
- `take_column` is a Python script which extracts one of the columns in a TSV file.
- `combine_columns` is an R script which combines multiple files into a TSV.
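For intuition, here is a minimal sketch (not the template's actual script) of what a component like `remove_comments` boils down to: dropping every line that starts with a `#`.

```shell
#!/usr/bin/env bash
set -e

# Create a small example TSV containing comment lines (sample data, for illustration).
printf '# a comment\none\t0.11\n# another comment\ntwo\t0.23\n' > in.tsv

# Keep only the lines that do not start with '#'.
grep -v '^#' in.tsv > out.tsv

cat out.tsv
```

The actual component additionally wraps this kind of logic with Viash-style argument handling and a container definition, which is what makes it reusable as a module.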
Build the VDSL3 modules
First, we need to build the components into VDSL3 modules.
```bash
viash ns build --setup cachedbuild --parallel
```

```
Exporting remove_comments (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/remove_comments
Exporting combine_columns (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/combine_columns
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/remove_comments:dev' with Dockerfile
Exporting take_column (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/take_column:dev' with Dockerfile
Exporting remove_comments (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/remove_comments
Exporting combine_columns (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/combine_columns
Exporting take_column (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/combine_columns:dev' with Dockerfile
All 6 configs built successfully
```
Once everything is built, a new target directory has been created containing the executables and modules grouped per platform:
```bash
tree target
```

```
target
├── docker
│   └── demo
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── demo
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        └── take_column
            ├── main.nf
            └── nextflow.config

10 directories, 9 files
```
Importing a VDSL3 module
A VDSL3 module can be imported just like any other Nextflow module.

Example:

```groovy
include { mymodule } from 'target/nextflow/mymodule/main.nf'
```
VDSL3 module interface
VDSL3 modules are actually workflows which take one channel and emit one channel. They expect the channel events to be tuples containing an 'id' and a 'state': `[id, state]`, where `id` is a unique String and `state` is a `Map[String, Object]`. The resulting channel then consists of tuples `[id, new_state]`.
Example:

```groovy
workflow {
  Channel.fromList([
      ["myid", [input: file("in.txt")]]
    ])
    | mymodule
}
```
If the input tuple has more than two elements, the elements after the second are passed through to the output tuple. That is, an input tuple `[id, input, ...]` will result in a tuple `[id, output, ...]` after running the module. For example, an input tuple `["foo", [input: file("in.txt")], "bar"]` will result in an output tuple `["foo", [output: file("out.txt")], "bar"]`.
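As a sketch of this pass-through behaviour (assuming a hypothetical module `mymodule` that maps an `input` file to an `output` file):

```groovy
workflow {
  Channel.fromList([
      // the third element (extra metadata) is not touched by the module
      ["foo", [input: file("in.txt")], "bar"]
    ])
    | mymodule
    // each emitted tuple still carries "bar" as its third element
    | view()
}
```

This makes it possible to tuck away per-sample metadata in the tuple and recover it unchanged after any number of module invocations.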
Create a pipeline
Below is a first Nextflow pipeline which uses just one VDSL3 module, with hard-coded input parameters (file1 and file2).
```groovy
nextflow.enable.dsl=2

include { remove_comments } from "./target/nextflow/demo/remove_comments/main.nf"

workflow {
  // Create a channel with two events
  // Each event contains a string (an identifier) and a file (input)
  Channel.fromList([
      ["file1", file("resources_test/file1.tsv")],
      ["file2", file("resources_test/file2.tsv")]
    ])

    // View channel contents
    | view { tup -> "Input: $tup" }

    // Process the input file using the 'remove_comments' module.
    // This removes comment lines from the input TSV.
    | remove_comments.run(
      directives: [
        publishDir: "output/"
      ]
    )

    // View channel contents
    | view { tup -> "Output: $tup" }
}
```
Customizing VDSL3 modules on the fly
Usually, Nextflow processes are quite static objects; changing their directives, for instance, can be quite tricky. The `.run()` function is a unique feature of every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, here we use it to set the `publishDir` directive to `"output/"`, so the output of that step in the pipeline will be stored in the `output` directory. See the reference documentation for a complete list of arguments of `.run()`.
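As an illustration, assuming the `directives` map also accepts other standard Nextflow process directives (the exact set supported is listed in the VDSL3 reference documentation), a resource-tuned invocation might look like this sketch:

```groovy
include { remove_comments } from "./target/nextflow/demo/remove_comments/main.nf"

workflow {
  Channel.fromList([["file1", file("resources_test/file1.tsv")]])
    | remove_comments.run(
      directives: [
        publishDir: "output/",
        // hypothetical extra directives; verify against the reference docs
        cpus: 2,
        memory: "2 GB"
      ]
    )
}
```

The advantage over a plain Nextflow process is that these settings live in the pipeline that uses the module, not in the module itself.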
Run the pipeline
Now run the pipeline with Nextflow:
```bash
nextflow run . \
  -main-script main.nf
```
```
N E X T F L O W  ~  version 23.04.3
Launching `main.nf` [admiring_faggin] DSL2 - revision: 111508427e
Input: [file1, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]
Input: [file2, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]
executor >  local (2)
[fd/d87e76] process > remove_comments:remove_comm... [100%] 2 of 2 ✔
Output: [file2, /home/runner/work/website/website/guide/_viash_project_template/work/2a/6131d1e960add2a44393353781a119/file2.remove_comments.output.tsv]
Output: [file1, /home/runner/work/website/website/guide/_viash_project_template/work/fd/d87e76abdf89aedc6e146ad317ae7e/file1.remove_comments.output.tsv]
```
```bash
tree output
```

```
output
├── combined.combine_columns.output
├── file1.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/fd/d87e76abdf89aedc6e146ad317ae7e/file1.remove_comments.output.tsv
└── file2.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/2a/6131d1e960add2a44393353781a119/file2.remove_comments.output.tsv

0 directories, 3 files
```
```bash
cat output/*
```

```
"1"   0.11
"2"   0.23
"3"   0.35
"4"   0.47
one   0.11  123
two   0.23  456
three 0.35  789
four  0.47  123
eins  0.111 234
zwei  0.222 234
drei  0.333 123
vier  0.444 123
```
Discussion
The example pipeline above serves as a backbone for creating more advanced pipelines. However, for the sake of simplicity it contains several hardcoded elements:
- Input parameters
- Output directory
- VDSL3 module directory
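As a sketch of how the first of these could be lifted out, the input files can be taken from a hypothetical `params.input` glob instead of being hardcoded, using standard Nextflow parameters (the parameter name and default here are illustrative, not part of the template):

```groovy
nextflow.enable.dsl=2

include { remove_comments } from "./target/nextflow/demo/remove_comments/main.nf"

// hypothetical parameter; override on the command line with e.g.
//   nextflow run . --input 'resources_test/*.tsv'
params.input = "resources_test/*.tsv"

workflow {
  Channel.fromPath(params.input)
    // build the [id, file] tuple that VDSL3 modules expect,
    // deriving the id from the file name
    | map { f -> [f.baseName, f] }
    | remove_comments.run(
      directives: [publishDir: "output/"]
    )
}
```

The output directory and module location could be parameterized the same way.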