Create a pipeline

This guide explains how to create an example pipeline that resembles a typical Nextflow bioinformatics pipeline.

Note

This page assumes knowledge of how to create and manipulate Nextflow channels using DSL2. For more information, check out the Nextflow reference docs or contact Data Intuitive for a complete Nextflow+Viash course.

Get the template project

To get started with building a pipeline, we provide a template project which already contains a few components. First, create a new repository by clicking the “Use this template” button in the viash_project_template repository.

Then clone your new repository with the following command, replacing youruser and my_first_pipeline with your own GitHub user and repository name.

git clone https://github.com/youruser/my_first_pipeline.git

The template project already contains three components, which we will combine into the following pipeline:

graph LR
   A(file?.tsv) --> B[/remove_comments/]
   B --> C[/take_column/]
   C --> D[/combine_columns/]
   D --> E(output)

  • remove_comments is a Bash script which removes all lines starting with a # from a file.
  • take_column is a Python script which extracts one of the columns in a TSV file.
  • combine_columns is an R script which combines multiple files into a TSV.
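
To make the data flow concrete, here is a rough shell sketch of what the three steps do conceptually. It uses grep, cut and paste as stand-ins for the actual components, and the sample files and the choice of the second column are assumptions made for illustration. The real remove_comments, take_column and combine_columns scripts in the template may behave differently in detail.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data; these are not the template's resources_test files.
workdir=$(mktemp -d)
printf '# a comment\none\t0.11\t123\ntwo\t0.23\t456\n' > "$workdir/file1.tsv"
printf '# another comment\neins\t0.111\t234\nzwei\t0.222\t234\n' > "$workdir/file2.tsv"

for f in "$workdir"/file?.tsv; do
  name=$(basename "$f" .tsv)
  # Stand-in for remove_comments: drop all lines starting with '#'.
  grep -v '^#' "$f" > "$workdir/$name.nocomments.tsv"
  # Stand-in for take_column: keep a single column (assumed here: the second).
  cut -f2 "$workdir/$name.nocomments.tsv" > "$workdir/$name.column.tsv"
done

# Stand-in for combine_columns: place the extracted columns side by side.
paste "$workdir"/file?.column.tsv > "$workdir/combined.tsv"
cat "$workdir/combined.tsv"
```

Each input file passes through the first two steps independently, and only the last step brings the results together. The same shape appears in the diagram above: several file?.tsv inputs, one combined output.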

Build the VDSL3 modules

First, we need to build the components into VDSL3 modules.

viash ns build --setup cachedbuild --parallel
Exporting remove_comments (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/remove_comments
Exporting combine_columns (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/combine_columns
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/remove_comments:dev' with Dockerfile
Exporting take_column (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/take_column:dev' with Dockerfile
Exporting remove_comments (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/remove_comments
Exporting combine_columns (demo) =docker=> /home/runner/work/website/website/guide/_viash_project_template/target/docker/demo/combine_columns
Exporting take_column (demo) =nextflow=> /home/runner/work/website/website/guide/_viash_project_template/target/nextflow/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo/combine_columns:dev' with Dockerfile
All 6 configs built successfully

Once the build has finished, a new target directory contains the executables and modules, grouped per platform:

tree target
target
├── docker
│   └── demo
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── demo
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        └── take_column
            ├── main.nf
            └── nextflow.config

10 directories, 9 files

Importing a VDSL3 module

After building a VDSL3 module from a component, the module can be imported just like any other Nextflow module.

Example:

include { mymodule } from 'target/nextflow/mymodule/main.nf'

VDSL3 module interface

VDSL3 modules are actually workflows which take one channel and emit one channel. A module expects the channel events to be tuples containing an ‘id’ and a ‘state’: [id, state], where id is a unique String and state is a Map[String, Object]. The resulting channel then consists of tuples [id, new_state].

Example:

workflow {
  Channel.fromList([
    ["myid", [input: file("in.txt")]]
  ])
    | mymodule
}

Note

If the input tuple has more than two elements, the elements after the second element are passed through to the output tuple. That is, an input tuple [id, input, ...] will result in a tuple [id, output, ...] after running the module. For example, an input tuple ["foo", [input: file("in.txt")], "bar"] will result in an output tuple ["foo", [output: file("out.txt")], "bar"].

Create a pipeline

Below is a first Nextflow pipeline which uses just one VDSL3 module, with hard-coded inputs (file1 and file2).

nextflow.enable.dsl=2

include { remove_comments } from "./target/nextflow/demo/remove_comments/main.nf"

workflow {
  // Create a channel with two events
  // Each event contains a string (an identifier) and a file (input)
  Channel.fromList([
    ["file1", file("resources_test/file1.tsv")],
    ["file2", file("resources_test/file2.tsv")]
  ])

    // View channel contents
    | view { tup -> "Input: $tup" }
    
    // Process the input file using the 'remove_comments' module.
    // This removes comment lines from the input TSV.
    | remove_comments.run(
      directives: [
        publishDir: "output/"
      ]
    )

    // View channel contents
    | view { tup -> "Output: $tup" }
}

Customizing VDSL3 modules on the fly

Usually, Nextflow processes are quite static objects. For example, changing their directives can be quite tricky.

The .run() function is available on every VDSL3 module and allows dynamically altering the behaviour of a module from within the pipeline. For example, we use it to set the publishDir directive to "output/" so the output of that step in the pipeline will be published to the output/ directory.

See the reference documentation for a complete list of arguments of .run().

Run the pipeline

Now run the pipeline with Nextflow:

nextflow run . \
  -main-script main.nf
N E X T F L O W  ~  version 23.04.3
Launching `main.nf` [admiring_faggin] DSL2 - revision: 111508427e
[-        ] process > remove_comments:remove_comm... -
Input: [file1, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]
Input: [file2, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]

executor >  local (2)
[fd/d87e76] process > remove_comments:remove_comm... [  0%] 0 of 2
Input: [file1, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]
Input: [file2, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]

executor >  local (2)
[fd/d87e76] process > remove_comments:remove_comm... [100%] 2 of 2 ✔
Input: [file1, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]
Input: [file2, /home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]
Output: [file2, /home/runner/work/website/website/guide/_viash_project_template/work/2a/6131d1e960add2a44393353781a119/file2.remove_comments.output.tsv]
Output: [file1, /home/runner/work/website/website/guide/_viash_project_template/work/fd/d87e76abdf89aedc6e146ad317ae7e/file1.remove_comments.output.tsv]

tree output
output
├── combined.combine_columns.output
├── file1.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/fd/d87e76abdf89aedc6e146ad317ae7e/file1.remove_comments.output.tsv
└── file2.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/2a/6131d1e960add2a44393353781a119/file2.remove_comments.output.tsv

0 directories, 3 files

cat output/*
"1" 0.11
"2" 0.23
"3" 0.35
"4" 0.47
one 0.11    123
two 0.23    456
three   0.35    789
four    0.47    123
eins    0.111   234
zwei    0.222   234
drei    0.333   123
vier    0.444   123

Discussion

The above example pipeline serves as the backbone for creating more advanced pipelines. However, for the sake of simplicity, it contains several hard-coded elements:

  • Input parameters
  • Output directory
  • VDSL3 module directory