# Create a pipeline

This guide explains how to create an example pipeline that’s closer to a typical use case of a Nextflow bioinformatics pipeline.

```mermaid
graph TD
  A(file?.tsv) --> X[vsh_flatten]
  X --file1.tsv--> B1[/remove_comments/] --> C1[/take_column/] --> Y
  X --file2.tsv--> B2[/remove_comments/] --> C2[/take_column/] --> Y
  Y[vsh_toList] --> D[/combine_columns/]
  D --> E(output)
```

Please review the VDSL3 principles section for the necessary background.
## Get the template project
To get started with building a pipeline, we provide a template project which already contains a few components. First create a new repository by clicking the “Use this template” button in the viash_project_template repository or clicking the button below.
Then clone the repository using the following command.
```bash
git clone https://github.com/youruser/my_first_pipeline.git
```

The pipeline contains three components and uses two utility components from vsh_utils, with which we will build the following pipeline:
- `vsh_flatten` is a component that transforms a Channel event containing multiple files (in this case matched by a glob) into multiple Channel events, each containing one file to operate on. It is a Viash-compatible version of the Nextflow `flatten` operator.
- `remove_comments` is a Bash script which removes all lines starting with a `#` from a file.
- `take_column` is a Python script which extracts one of the columns in a TSV file.
- `vsh_toList` is a component/module that does the opposite of `vsh_flatten`: it turns multiple Channel items into one Channel item containing a list.
- `combine_columns` is an R script which combines multiple files into a TSV.
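For intuition, the core of the comment-removal step can be sketched in plain shell. This is an illustrative approximation only, not the actual component code; the file names are made up.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the remove_comments logic: drop every line that
# starts with '#'. The actual component is a Viash Bash component.
set -euo pipefail

# Create a small sample TSV (mirrors the example data in spirit)
printf '# this is a header\none\t0.11\t123\ntwo\t0.23\t456\n' > sample.tsv

# '|| true' guards against grep's exit status 1 when every line is a comment
grep -v '^#' sample.tsv > sample.cleaned.tsv || true

cat sample.cleaned.tsv
```

The `take_column` and `combine_columns` steps are conceptually similar one-liners (`cut` and `paste`, respectively), but are implemented as Python and R components in the template.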
## Build the VDSL3 modules and workflow
First, we need to build the components into VDSL3 modules. Since Viash version 0.8.x this includes the workflows and subworkflows themselves, since they too can be stored under `src/` and built to `target/`.
```bash
viash ns build --setup cachedbuild --parallel
```

```text
Exporting workflow (template) =nextflow=> target/nextflow/template/workflow
Exporting combine_columns (template) =executable=> target/executable/template/combine_columns
Exporting take_column (template) =executable=> target/executable/template/take_column
Exporting remove_comments (template) =executable=> target/executable/template/remove_comments
[notice] Building container 'ghcr.io/viash-io/project_template/template/take_column:0.3.1' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/combine_columns:0.3.1' with Dockerfile
[notice] Building container 'ghcr.io/viash-io/project_template/template/remove_comments:0.3.1' with Dockerfile
Exporting take_column (template) =nextflow=> target/nextflow/template/take_column
Exporting combine_columns (template) =nextflow=> target/nextflow/template/combine_columns
Exporting remove_comments (template) =nextflow=> target/nextflow/template/remove_comments
All 7 configs built successfully
```
For more information about the --setup and --parallel arguments, please refer to the reference section.
The output of `viash ns build` tells us that:

- two dependencies are fetched (from Viash Hub)
- the locally defined components are built into Nextflow modules
- the locally defined workflow `template/workflow` is built (see later)
- containers are built for the local modules
Once `viash ns build` is finished, a new `target` directory has been created containing the executables and modules, grouped per platform:
```bash
tree target
```

```text
target
├── executable
│   └── template
│       ├── combine_columns
│       │   └── combine_columns
│       ├── remove_comments
│       │   └── remove_comments
│       └── take_column
│           └── take_column
└── nextflow
    └── template
        ├── combine_columns
        │   ├── main.nf
        │   └── nextflow.config
        ├── remove_comments
        │   ├── main.nf
        │   └── nextflow.config
        ├── take_column
        │   ├── main.nf
        │   └── nextflow.config
        └── workflow
            ├── main.nf
            └── nextflow.config

12 directories, 11 files
```
## Import a VDSL3 module
### Viash version 0.8 and beyond
This functionality is available since Viash version 0.8.x and assumes the workflow code is encoded as a Viash component with a corresponding config.vsh.yaml config file.
In order to use a module or subworkflow one simply has to add the module (either local or remote) to the dependencies slot in the Viash config file, for example:
```yaml
functionality:
  dependencies:
    - name: template/combine_columns
      repository: local
  repositories:
    - name: local
      type: local
```

After that, the module will be included automatically during the Viash build stage. For more information, please refer to the reference.
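Dependencies do not have to live in the same project. As a purely illustrative sketch (the repository name, repo path and tag below are hypothetical; consult the reference for the exact schema), a component hosted on GitHub could be declared as:

```yaml
functionality:
  dependencies:
    - name: template/remove_comments
      # 'repository' refers to an entry in the repositories list below
      repository: my_remote_repo
  repositories:
    # Hypothetical GitHub-hosted repository; adjust repo/tag to a real release
    - name: my_remote_repo
      type: github
      repo: youruser/my_first_pipeline
      tag: v0.1.0
```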
### All Viash versions
As illustrated by the tree output above, a module can be included by pointing to its location. This approach can be used for any Nextflow module (that exposes a compatible API):
```groovy
include { remove_comments } from "./target/nextflow/template/remove_comments/main.nf"
```

## Create a pipeline
### All Viash versions
We can use a module in a conventional Nextflow pipeline which takes two input files (file1 and file2) and removes the lines that contain comments (lines starting with #) from those files:
```groovy
include { remove_comments } from "./target/nextflow/template/remove_comments/main.nf"

workflow {
  // Create a channel with two events
  // Each event contains a string (an identifier) and a file (input)
  Channel.fromList([
    ["file1", [ input: file("resources_test/file1.tsv") ] ],
    ["file2", [ input: file("resources_test/file2.tsv") ] ]
  ])

    // View channel contents
    | view { tup -> "Input: $tup" }

    // Process the input file using the 'remove_comments' module.
    // This removes comment lines from the input TSV.
    | remove_comments.run(
      directives: [
        publishDir: "output/"
      ]
    )

    // View channel contents
    | view { tup -> "Output: $tup" }
}
```

In plain English, the workflow works as follows:
- Create a Channel with 2 items, corresponding to 2 input files.
- Specify the respective input files as corresponding to the `--input` argument: `[ input: ... ]`.
- Add a `view` operation for introspection of the Channel.
- Run the `remove_comments` step and publish the results to `output/`. No additional `fromState` and `toState` arguments are specified because the defaults suffice.
- One more `view` to show the resulting processed Channel items.
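To build intuition for what happens per Channel event, the same logic can be mimicked in plain shell, bypassing Nextflow entirely. This is a sketch only; the file contents and names below are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative sketch: emulate the two-event workflow without Nextflow.
# Each "event" pairs an id with an input file; comments are removed and
# the result is "published" to output/.
set -euo pipefail

mkdir -p output
printf '# this is a header\none\t0.11\t123\n' > file1.tsv
printf '# this is not a header\neins\t0.111\t234\n' > file2.tsv

for id in file1 file2; do
  echo "Input: [$id, [input:$id.tsv]]"
  grep -v '^#' "$id.tsv" > "output/$id.remove_comments.output.tsv"
  echo "Output: [$id, [output:output/$id.remove_comments.output.tsv]]"
done
```

The real workflow additionally tracks the (id, state) pairs through the Channel, so each step knows which event it is operating on.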
We point the reader to the VDSL3 principles section for more information about how data flow (aka state) is managed in a VDSL3 workflow.
## Pipeline as a component
The `run()` function is a unique feature of every VDSL3 module which allows dynamically altering the behaviour of a module from within the pipeline. For example, we use it here to set the `publishDir` directive to `"output/"`, so the output of that step in the pipeline is stored under `output/`.
This functionality is available since Viash version 0.8.x.
We can do the same, but this time encoding the pipeline as a Viash component itself:
```groovy
workflow run_wf {
  take:
    input_ch

  main:
    output_ch =
      // Create a channel with two events
      // Each event contains a string (an identifier) and a file (input)
      Channel.fromList([
        ["file1", [ input: file("resources_test/file1.tsv") ] ],
        ["file2", [ input: file("resources_test/file2.tsv") ] ]
      ])

        // View channel contents
        | view { tup -> "Input: $tup" }

        // Process the input file using the 'remove_comments' module.
        // This removes comment lines from the input TSV.
        | remove_comments

        // View channel contents
        | view { tup -> "Output: $tup" }

  emit:
    output_ch
      | map{ id, state -> [ "run", state ] }
}
```

Together with a config file like this one:
```yaml
functionality:
  name: test
  namespace: template
  description: |
    An example pipeline and project template.
  arguments:
    - name: "--output"
      alternatives: [ "-o" ]
      type: file
      direction: output
      required: true
      description: Output TSV file
      example: output.tsv
  resources:
    - type: nextflow_script
      path: main.nf
      entrypoint: run_wf
  dependencies:
    - name: template/remove_comments
      repository: local
  repositories:
    - name: local
      type: local
platforms:
  - type: nextflow
```

## Run the pipeline
Now run the pipeline with Nextflow:
```bash
nextflow run . \
  -main-script main.nf
```

```text
N E X T F L O W ~ version 25.04.8
Launching `main.nf` [sleepy_gauss] DSL2 - revision: dc137fbfcf
[- ] rem…Wf:remove_comments_process -
Input: [file1, [input:/home/runner/work/website/website/guide/_viash_project_template/resources_test/file1.tsv]]
Input: [file2, [input:/home/runner/work/website/website/guide/_viash_project_template/resources_test/file2.tsv]]
[- ] rem…Wf:remove_comments_process | 0 of 2
executor > local (2)
[f6/8958f6] rem…e_comments_process (file2) | 2 of 2 ✔
Output: [file1, [output:/home/runner/work/website/website/guide/_viash_project_template/work/2d/9973cb056922b6b100fc312f26e1f3/file1.remove_comments.output.tsv]]
Output: [file2, [output:/home/runner/work/website/website/guide/_viash_project_template/work/f6/8958f67b0d1515753657583b5fdb6e/file2.remove_comments.output.tsv]]
```
On the example data:
```bash
cat resources_test/file?.tsv
```

```text
# this is a header
# this is also a header
one 0.11 123
two 0.23 456
three 0.35 789
four 0.47 123
# this is not a header
# just kidding yes it is
eins 0.111 234
zwei 0.222 234
drei 0.333 123
vier 0.444 123
```
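To see what the full pipeline computes on data like this, here is a plain-shell approximation of the flatten / remove_comments / take_column / combine_columns chain. This is illustrative only: the column index and file names are assumptions, and the real pipeline runs each step as a containerized module.

```shell
#!/usr/bin/env bash
# Illustrative plain-shell approximation of the pipeline: per file, strip
# '#' comment lines and extract one column, then paste the per-file
# columns side by side into one combined TSV.
set -euo pipefail

printf '# this is a header\none\t0.11\t123\ntwo\t0.23\t456\n' > file1.tsv
printf '# this is not a header\neins\t0.111\t234\nzwei\t0.222\t234\n' > file2.tsv

for id in file1 file2; do
  # remove_comments + take_column (taking column 2 is an assumption)
  grep -v '^#' "$id.tsv" | cut -f2 > "$id.column.tsv"
done

# combine_columns-style step: join the extracted columns into one TSV
paste file1.column.tsv file2.column.tsv > combined.tsv
cat combined.tsv
```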
This results in the following output:
```bash
tree output
```

```text
output
├── combined.workflow.output.tsv
├── combined.workflow.state.yaml
├── file1.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/2d/9973cb056922b6b100fc312f26e1f3/file1.remove_comments.output.tsv
└── file2.remove_comments.output.tsv -> /home/runner/work/website/website/guide/_viash_project_template/work/f6/8958f67b0d1515753657583b5fdb6e/file2.remove_comments.output.tsv

1 directory, 4 files
```
```bash
cat output/*
```

```text
0.11
0.23
0.35
0.47
id: combined
output: !file 'combined.workflow.output.tsv'
one 0.11 123
two 0.23 456
three 0.35 789
four 0.47 123
eins 0.111 234
zwei 0.222 234
drei 0.333 123
vier 0.444 123
```
## Discussion
The above example pipeline serves as the backbone for creating real-life pipelines. However, for the sake of simplicity it contained several hardcoded elements that should be avoided:
- Input parameters should be provided as an argument to the pipeline or as part of the pipeline configuration
- The output directory should be specified as an argument to the pipeline
As illustrated earlier, these come for free when encoding the workflow as a Viash component. You even get parameter checks with it!
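For instance, instead of hardcoding the two input files, they could be supplied via a parameter file. The sketch below assumes VDSL3's `--param_list` mechanism; the argument names are illustrative and must match the workflow's config:

```yaml
# params.yaml - one entry per workflow invocation (illustrative)
- id: file1
  input: resources_test/file1.tsv
  output: file1.output.tsv
- id: file2
  input: resources_test/file2.tsv
  output: file2.output.tsv
```

Such a file could then be passed along the lines of `nextflow run . -main-script main.nf --param_list params.yaml --publish_dir output/`; see the reference for the exact options.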