Create a pipeline
This guide explains how to create an example pipeline that is closer to a typical use case of a Nextflow bioinformatics pipeline.
Please review the VDSL3 principles section for the necessary background.
Get the template project
To get started with building a pipeline, we provide a template project which already contains a few components. First, create a new repository by clicking the “Use this template” button in the viash_project_template repository.
Then clone the repository using the following command.
git clone https://github.com/youruser/my_first_pipeline.git
The pipeline contains three components and uses two utility components from vsh_utils, with which we will build the following pipeline:
- vsh_flatten is a component that transforms a Channel event containing multiple files (in this case using the glob ?) into multiple Channel events, each containing one file to operate on. It is a Viash-compatible version of the Nextflow flatten operator.
- remove_comments is a Bash script which removes all lines starting with a # from a file.
- take_column is a Python script which extracts one of the columns in a TSV file.
- vsh_toList is a component/module that does the opposite of vsh_flatten: it turns multiple Channel items into one Channel item containing a list.
- combine_columns is an R script which combines multiple files into a TSV.
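To make the data flow through these five steps concrete, here is a minimal Python sketch that mimics what each step does on in-memory text. It is not the code of the actual components (those are Bash, Python and R scripts wrapped by Viash). The function signatures, the 1-based `column` parameter and the abbreviated sample data are assumptions made for illustration only.

```python
from typing import List


def remove_comments(text: str) -> str:
    """Drop every line that starts with '#'."""
    return "\n".join(line for line in text.splitlines() if not line.startswith("#"))


def take_column(text: str, column: int) -> List[str]:
    """Return the 1-based `column` of tab-separated text (hypothetical signature)."""
    return [line.split("\t")[column - 1] for line in text.splitlines()]


def combine_columns(columns: List[List[str]]) -> str:
    """Paste equally long columns side by side as TSV."""
    return "\n".join("\t".join(row) for row in zip(*columns))


# Abbreviated, tab-separated stand-ins for the two example files.
files = {
    "file1.tsv": "# this is a header\none\t0.11\t123\ntwo\t0.23\t456\n",
    "file2.tsv": "# this is not a header\neins\t0.111\t234\nzwei\t0.222\t234\n",
}

# Like vsh_flatten: one event holding many files becomes one event per file.
events = [(name, text) for name, text in files.items()]

# Per-file steps: strip the comments, then extract one column.
columns = [take_column(remove_comments(text), column=2) for _, text in events]

# Like vsh_toList: the per-file results are gathered into one list,
# which combine_columns turns into a single TSV.
combined = combine_columns(columns)
print(combined)
```

In the real pipeline each of these steps runs as a separate Nextflow process, and the events travel through a Channel rather than through Python lists.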
Build the VDSL3 modules and workflow
First, we need to build the components into VDSL3 modules. Since Viash version 0.8.x, this also includes the workflows and subworkflows themselves, because they can be stored under src and built to target/ as well.
viash ns build --setup cachedbuild --parallel
Exporting combine_columns (template) =executable=> target/executable/template/combine_columns
Exporting remove_comments (template) =nextflow=> target/nextflow/template/remove_comments
Exporting remove_comments (template) =executable=> target/executable/template/remove_comments
Exporting take_column (template) =executable=> target/executable/template/take_column
[notice] Building container 'ghcr.io/viash-io/project_template/template/combine_columns:0.3.0' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/remove_comments:0.3.0' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/take_column:0.3.0' with Dockerfile
Exporting workflow (template) =nextflow=> target/nextflow/template/workflow
Exporting take_column (template) =nextflow=> target/nextflow/template/take_column
Exporting combine_columns (template) =nextflow=> target/nextflow/template/combine_columns
All 7 configs built successfully
For more information about the --setup and --parallel arguments, please refer to the reference section.
The output of viash ns build tells us that:
- two dependencies are fetched (from Viash Hub)
- the locally defined components are built into Nextflow modules
- the locally defined workflow template/workflow is built (see later)
- containers are built for the local modules
Once viash ns build is finished, a new target directory has been created containing the executables and modules grouped per platform:
tree target
target
├── executable
│   └── template
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── template
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        ├── take_column
        │   ├── main.nf
        │   └── nextflow.config
        └── workflow
            ├── main.nf
            └── nextflow.config

11 directories, 11 files
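The layout above follows the pattern target/<platform>/<namespace>/<name>, which also appears in the build log. As a small illustration, the Python sketch below derives these locations. The helper names `build_dir` and `nextflow_module` are hypothetical and are not part of Viash.

```python
from pathlib import PurePosixPath


def build_dir(platform: str, namespace: str, name: str, target: str = "target") -> str:
    """Directory a component is built to, following the layout shown above."""
    return str(PurePosixPath(target) / platform / namespace / name)


def nextflow_module(namespace: str, name: str) -> str:
    """Path of the main.nf file of a component built for the nextflow platform."""
    return str(PurePosixPath(build_dir("nextflow", namespace, name)) / "main.nf")


print(build_dir("executable", "template", "take_column"))
print(nextflow_module("template", "remove_comments"))
```

The second path is the one a Nextflow include statement points to when a module is imported by location.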
Import a VDSL3 module
Viash version 0.8 and beyond
This functionality is available since Viash version 0.8.x and assumes the workflow code is encoded as a Viash component with a corresponding config.vsh.yaml config file.
In order to use a module or subworkflow, one simply has to add the module (either local or remote) to the dependencies slot in the Viash config file, for example:
functionality:
  dependencies:
    - name: template/combine_columns
      repository: local
  repositories:
    - name: local
      type: local
After that, the module will be included automatically during the Viash build stage. For more information, please refer to the reference.
All Viash versions
As illustrated by the tree output above, a module can be included by pointing to its location. This approach can be used for any Nextflow module (that exposes a compatible API):
include { remove_comments } from "./target/nextflow/template/remove_comments/main.nf"
Create a pipeline
All Viash versions
We can use a module in a conventional Nextflow pipeline which takes two input files (file1 and file2) and removes the lines that contain comments (lines starting with #) from those files:
include { remove_comments } from "./target/nextflow/template/remove_comments/main.nf"

workflow {
  // Create a channel with two events
  // Each event contains a string (an identifier) and a file (input)
  Channel.fromList([
      ["file1", [ input: file("resources_test/file1.tsv") ] ],
      ["file2", [ input: file("resources_test/file2.tsv") ] ]
    ])
    // View channel contents
    | view { tup -> "Input: $tup" }
    // Process the input file using the 'remove_comments' module.
    // This removes comment lines from the input TSV.
    | remove_comments.run(
      directives: [
        publishDir: "output/"
      ]
    )
    // View channel contents
    | view { tup -> "Output: $tup" }
}
In plain English, the workflow works as follows:
- Create a Channel with 2 items, corresponding to 2 input files.
- Specify the respective input files as corresponding to the --input argument: [ input: ... ].
- Add a view operation for introspection of the Channel.
- Run the remove_comments step and publish the results to output/. No additional fromState and toState arguments are specified because the defaults suffice.
- One more view to show the resulting processed Channel items.
We point the reader to the VDSL3 principles section for more information about how data flow (aka state) is managed in a VDSL3 workflow.
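As a rough mental model of the events flowing through the Channel, the Python sketch below treats each event as an identifier plus a state dictionary, and lets a hypothetical stand-in for the remove_comments step replace the state with its output. This only imitates the shape of the data seen in the view output. It is not how VDSL3 is implemented, and the state here holds file contents instead of file paths to keep the sketch self-contained.

```python
from typing import Dict, List, Tuple

# One channel event: an identifier plus a state (a dict of named values).
Event = Tuple[str, Dict[str, str]]


def view(events: List[Event], label: str) -> List[Event]:
    """Print every event, similar to the view operator, and pass them on unchanged."""
    for event_id, state in events:
        print(f"{label}: [{event_id}, {state}]")
    return events


def remove_comments_step(events: List[Event]) -> List[Event]:
    """Hypothetical stand-in for the module: read `input`, emit a state with `output`."""
    results = []
    for event_id, state in events:
        cleaned = "\n".join(
            line for line in state["input"].splitlines() if not line.startswith("#")
        )
        # The identifier is kept, the state is replaced by the step's output.
        results.append((event_id, {"output": cleaned}))
    return results


channel = [
    ("file1", {"input": "# this is a header\none\t0.11\t123"}),
    ("file2", {"input": "# this is not a header\neins\t0.111\t234"}),
]

result = view(remove_comments_step(view(channel, "Input")), "Output")
```

The identifier travels along unchanged, which is what allows results to be matched back to their inputs further down the pipeline.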
The run() function is available on every VDSL3 module and allows dynamically altering the behaviour of a module from within the pipeline. For example, in the workflow above we use it to set the publishDir directive to "output/" so that the output of that step in the pipeline is published to the output/ directory.
Pipeline as a component
This functionality is available since Viash version 0.8.x.
We can do the same, but this time encoding the pipeline as a Viash component itself:
workflow run_wf {
  take:
    input_ch

  main:
    // Create a channel with two events
    // Each event contains a string (an identifier) and a file (input)
    output_ch = Channel.fromList([
        ["file1", [ input: file("resources_test/file1.tsv") ] ],
        ["file2", [ input: file("resources_test/file2.tsv") ] ]
      ])
      // View channel contents
      | view { tup -> "Input: $tup" }
      // Process the input file using the 'remove_comments' module.
      // This removes comment lines from the input TSV.
      | remove_comments
      // View channel contents
      | view { tup -> "Output: $tup" }

  emit:
    output_ch
      | map{ id, state -> [ "run", state ] }
}
Together with a config file like this one:
functionality:
  name: test
  namespace: template
  description: |
    An example pipeline and project template.
  arguments:
    - name: "--output"
      alternatives: [ "-o" ]
      type: file
      direction: output
      required: true
      description: Output TSV file
      example: output.tsv
  resources:
    - type: nextflow_script
      path: main.nf
      entrypoint: run_wf
  dependencies:
    - name: template/remove_comments
      repository: local
  repositories:
    - name: local
      type: local
platforms:
  - type: nextflow
Run the pipeline
Now run the pipeline with Nextflow:
nextflow run . \
-main-script main.nf
N E X T F L O W ~ version 24.04.4
Launching `main.nf` [drunk_perlman] DSL2 - revision: dc137fbfcf
executor > local (2)
[e5/d2ad78] rem…e_comments_process (file2) | 2 of 2 ✔
Input: [file1, [input:/home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]]
Input: [file2, [input:/home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]]
Output: [file2, [output:/home/runner/work/website/website/guide/_viash_project_template/work/e5/d2ad78c861f3e5a8f15020faadacc8/file2.remove_comments.output.tsv]]
Output: [file1, [output:/home/runner/work/website/website/guide/_viash_project_template/work/db/8c1b9c3557abbcaa9ee9533d46a013/file1.remove_comments.output.tsv]]
On the example data:
cat resources_test/file?.tsv
# this is a header
# this is also a header
one 0.11 123
two 0.23 456
three 0.35 789
four 0.47 123
# this is not a header
# just kidding yes it is
eins 0.111 234
zwei 0.222 234
drei 0.333 123
vier 0.444 123
This results in the following output:
tree output
output
├── file1.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/db/8c1b9c3557abbcaa9ee9533d46a013/file1.remove_comments.output.tsv
└── file2.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/e5/d2ad78c861f3e5a8f15020faadacc8/file2.remove_comments.output.tsv
0 directories, 2 files
cat output/*
one 0.11 123
two 0.23 456
three 0.35 789
four 0.47 123
eins 0.111 234
zwei 0.222 234
drei 0.333 123
vier 0.444 123
Discussion
The above example pipeline serves as the backbone for creating real-life pipelines. However, for the sake of simplicity, it contains several hardcoded elements that should be avoided:
- Input parameters should be provided as an argument to the pipeline or as part of the pipeline configuration
- The output directory should be specified as an argument to the pipeline
As illustrated earlier these come for free when encoding the workflow as a Viash component. One even gets parameter checks with it!
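To illustrate what replacing hardcoded values with checked arguments looks like in general, here is a minimal Python sketch using argparse. It is only an analogy and not how Viash implements argument handling. The --output argument with its -o alternative mirrors the config shown earlier, while the --input argument is an assumption added for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the pipeline arguments instead of hardcoding paths."""
    parser = argparse.ArgumentParser(prog="pipeline")
    # Hypothetical argument: the config shown earlier only declares --output.
    parser.add_argument("--input", nargs="+", required=True, help="Input TSV files")
    parser.add_argument("-o", "--output", required=True, help="Output TSV file")
    return parser


args = build_parser().parse_args(
    ["--input", "resources_test/file1.tsv", "resources_test/file2.tsv", "-o", "output.tsv"]
)
print(args.input, args.output)
```

Because both arguments are declared as required, leaving one out makes the parser stop with an error message, which is the kind of parameter check meant above.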