viash build config.vsh.yaml -o target --runner nextflow
Create and use a module
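Creating a VDSL3 module is as simple as adding { type: nextflow } to the runners section in the Viash config. Luckily, our previous example already contained such an entry. The config is shown below once for each example language (Bash, C#, JavaScript, Python, R and Scala):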
name: example_bash
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: bash_script
    path: script.sh
engines:
  - type: docker
    image: "bash:4.0"
  - type: native
runners:
  - type: executable
  - type: nextflow
name: example_csharp
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: csharp_script
    path: script.csx
engines:
  - type: docker
    image: "ghcr.io/data-intuitive/dotnet-script:1.3.1"
  - type: native
runners:
  - type: executable
  - type: nextflow
name: example_js
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: javascript_script
    path: script.js
engines:
  - type: docker
    image: "node:19-bullseye-slim"
  - type: native
runners:
  - type: executable
  - type: nextflow
name: example_python
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: python_script
    path: script.py
engines:
  - type: docker
    image: "python:3.10-slim"
  - type: native
runners:
  - type: executable
  - type: nextflow
name: example_r
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: r_script
    path: script.R
engines:
  - type: docker
    image: "eddelbuettel/r2u:22.04"
  - type: native
runners:
  - type: executable
  - type: nextflow
name: example_scala
description: A minimal example component.
arguments:
  - type: file
    name: --input
    example: file.txt
    required: true
  - type: file
    name: --output
    direction: output
    example: output.txt
    required: true
resources:
  - type: scala_script
    path: script.scala
engines:
  - type: docker
    image: "sbtscala/scala-sbt:eclipse-temurin-19_36_1.7.2_2.13.10"
  - type: native
runners:
  - type: executable
  - type: nextflow
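Since the only requirement is a nextflow entry under runners, a quick check before building can save a confusing error later. The helper below is a rough sketch and not part of Viash: the name has_nextflow_runner is hypothetical, and it relies on plain text matching rather than a real YAML parser, so it only handles configs laid out like the ones above.

```shell
# Hypothetical helper (not a Viash command): succeed if the given Viash
# config lists "type: nextflow" inside its top-level "runners" section.
# Assumption: the config is simple block- or flow-style YAML as shown above.
has_nextflow_runner() {
  awk '
    /^[^ \t#-]/ { in_runners = 0 }   # any top-level key closes the section
    /^runners:/ { in_runners = 1 }   # ...unless that key is "runners:" itself
    in_runners && /type:[ \t]*nextflow([^_A-Za-z0-9]|$)/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}
```

For example, has_nextflow_runner config.vsh.yaml && echo "nextflow runner configured" prints the message only when the entry is present.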
Generating a VDSL3 module
We will now turn the Viash component into a VDSL3 module. By default, the viash build command selects the first runner in the list of runners (here, executable). To select the nextflow runner, use the --runner nextflow argument, or -r nextflow for short:
viash build config.vsh.yaml -o target --runner nextflow
This generates a Nextflow module in the target/ directory:
tree target
target
├── main.nf
└── nextflow.config
1 directory, 2 files
This main.nf file is both a standalone Nextflow pipeline and a module that can be imported into another pipeline.
In larger projects, it’s recommended to use the viash ns build command to build all of the components in one go. Give it a try!
Running a module as a standalone pipeline
Unlike typical Nextflow modules, VDSL3 modules can be run as standalone pipelines.
To do so, specify the input parameters and a --publish_dir parameter. You do not need to pass --output: the names of the output files are chosen automatically, and the files are written to the publish directory.
For example, run the module by providing a value for --input and --publish_dir:
nextflow run target/main.nf --input config.vsh.yaml --publish_dir output/
N E X T F L O W ~ version 24.10.3
Launching `target/main.nf` [hopeful_hugle] DSL2 - revision: 06e3f8616d
[- ] exa…essWf:example_bash_process -
[- ] exa…sSimpleWf:publishFilesProc -
executor > local (1)
[58/5018f0] exa…example_bash_process (run) | 0 of 1
[- ] exa…sSimpleWf:publishFilesProc -
[- ] exa…SimpleWf:publishStatesProc -
executor > local (2)
[58/5018f0] exa…example_bash_process (run) | 1 of 1 ✔
[23/cfd88e] exa…eWf:publishFilesProc (run) | 0 of 1
[- ] exa…SimpleWf:publishStatesProc -
executor > local (3)
[58/5018f0] exa…example_bash_process (run) | 1 of 1 ✔
[23/cfd88e] exa…eWf:publishFilesProc (run) | 1 of 1 ✔
[a5/0a51ad] exa…Wf:publishStatesProc (run) | 1 of 1 ✔
This results in the following output:
tree output
output
├── run.example_bash.output.txt
└── run.example_bash.state.yaml
1 directory, 2 files
The pipeline help can be shown by passing the --help parameter (output not shown):
nextflow run target/main.nf --help
Passing a parameter list
Every VDSL3 module can accept a list of parameter sets with which to populate a Nextflow channel.
For example, we create a set of input files that we want to process in parallel:
touch sample1.txt sample2.txt sample3.txt sample4.txt
Next, we create a YAML file param_list.yaml containing an id and an input value for each parameter entry:
- id: sample1
  input: /tmp/RtmpJLlJtE/create-a-module37702a970635/bash/sample1.txt
- id: sample2
  input: /tmp/RtmpJLlJtE/create-a-module37702a970635/bash/sample2.txt
- id: sample3
  input: /tmp/RtmpJLlJtE/create-a-module37702a970635/bash/sample3.txt
- id: sample4
  input: /tmp/RtmpJLlJtE/create-a-module37702a970635/bash/sample4.txt
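The absolute paths above come from the temporary directory in which this guide was rendered, so yours will differ. One way to produce an equivalent file for your own working directory is sketched below; it assumes the four sample files live in the current directory and that their paths contain no characters that need YAML quoting.

```shell
# Create the four empty sample files used in this guide.
touch sample1.txt sample2.txt sample3.txt sample4.txt

# Write one entry per sample: the id is the sample name, the input is
# the absolute path to the matching file in the current directory.
for id in sample1 sample2 sample3 sample4; do
  echo "- id: $id"
  echo "  input: $PWD/$id.txt"
done > param_list.yaml
```

The resulting param_list.yaml has the same structure as the listing above, with $PWD substituted for the temporary directory.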
You can run the pipeline on the list of parameters using the --param_list parameter:
nextflow run target/main.nf --param_list param_list.yaml --publish_dir output2
N E X T F L O W ~ version 24.10.3
Launching `target/main.nf` [fabulous_mendel] DSL2 - revision: 06e3f8616d
[- ] exa…essWf:example_bash_process -
[- ] exa…essWf:example_bash_process -
[- ] exa…sSimpleWf:publishFilesProc -
[- ] exa…SimpleWf:publishStatesProc -
executor > local (4)
[33/b7ed57] exa…ple_bash_process (sample1) | 0 of 4
[- ] exa…sSimpleWf:publishFilesProc -
[- ] exa…SimpleWf:publishStatesProc -
executor > local (6)
[33/b7ed57] exa…ple_bash_process (sample1) | 4 of 4 ✔
[ad/04cd2c] exa…publishFilesProc (sample1) | 0 of 4
[- ] exa…SimpleWf:publishStatesProc -
executor > local (12)
[33/b7ed57] exa…ple_bash_process (sample1) | 4 of 4 ✔
[44/2578ae] exa…publishFilesProc (sample4) | 4 of 4 ✔
[de/0f3a55] exa…ublishStatesProc (sample4) | 4 of 4 ✔
This results in the following outputs:
tree output2
output2
├── sample1.example_bash.output.txt
├── sample1.example_bash.state.yaml
├── sample2.example_bash.output.txt
├── sample2.example_bash.state.yaml
├── sample3.example_bash.output.txt
├── sample3.example_bash.state.yaml
├── sample4.example_bash.output.txt
└── sample4.example_bash.state.yaml
1 directory, 8 files
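Judging from the two listings in this guide, each published file is named after the entry id, the component name and the output (for example sample1.example_bash.output.txt). Based on that observation only, the sketch below checks that every id in a param list produced both files. The function name check_outputs is hypothetical, the output.txt and state.yaml suffixes are taken from the listings rather than from a documented rule, and ids are assumed to contain no whitespace.

```shell
# Hypothetical helper: report published files that are missing for any id
# in a param list of the simple "- id: <name>" form used in this guide.
# Usage: check_outputs PARAM_LIST PUBLISH_DIR COMPONENT_NAME
check_outputs() {
  param_list="$1"
  publish_dir="$2"
  component="$3"
  missing=0
  # Extract the ids from lines that start with "- id:".
  for id in $(sed -n 's/^- id: *//p' "$param_list"); do
    # Suffixes as observed in the directory listings above (an assumption).
    for suffix in output.txt state.yaml; do
      f="$publish_dir/$id.$component.$suffix"
      if [ ! -e "$f" ]; then
        echo "missing: $f"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

After the run above, check_outputs param_list.yaml output2 example_bash should return success without printing anything if all eight files are present.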
Instead of a YAML file, you can also pass a JSON or CSV file to the --param_list parameter.
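As an illustration, the same four entries could be written as CSV and JSON as sketched below. The layouts are assumptions rather than something this guide spells out: the CSV is assumed to need a header row whose column names match the argument names (id, input), and the JSON is assumed to be a list of objects mirroring the YAML entries. Paths are assumed to contain no commas, quotes or backslashes.

```shell
# CSV variant: a header row naming the arguments, then one row per entry.
{
  echo "id,input"
  for id in sample1 sample2 sample3 sample4; do
    echo "$id,$PWD/$id.txt"
  done
} > param_list.csv

# JSON variant: a list of objects with the same keys as the YAML entries.
{
  echo "["
  first=1
  for id in sample1 sample2 sample3 sample4; do
    # Separate entries with a comma, but do not emit one before the first.
    [ "$first" -eq 1 ] || printf ',\n'
    printf '  {"id": "%s", "input": "%s"}' "$id" "$PWD/$id.txt"
    first=0
  done
  printf '\n]\n'
} > param_list.json
```

Either file would then be passed in place of the YAML file, for example --param_list param_list.csv.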
Module as part of a pipeline
This module can also be used as part of a Nextflow pipeline. Below is a short preview of what this looks like.
include { mymodule1 } from 'target/nextflow/mymodule1/main.nf'
include { mymodule2 } from 'target/nextflow/mymodule2/main.nf'

workflow {
  Channel.fromList([
    [
      // a unique identifier for this tuple
      "myid",
      // the state for this tuple
      [
        input: file("in.txt"),
        module1_k: 10,
        module2_k: 4
      ]
    ]
  ])
    | mymodule1.run(
      // use a hashmap to define which part of the state is used to run mymodule1
      fromState: [
        input: "input",
        k: "module1_k"
      ],
      // use a hashmap to define how the output of mymodule1 is stored back into the state
      toState: [
        module1_output: "output"
      ]
    )
    | mymodule2.run(
      // use a closure to define which data is used to run mymodule2
      fromState: { id, state ->
        [
          input: state.module1_output,
          k: state.module2_k
        ]
      },
      // use a closure to return only the output of module2 as a new state
      toState: { id, output, state ->
        output
      },
      auto: [
        publish: true
      ]
    )
}
We will discuss building pipelines with VDSL3 modules in more detail in Create a pipeline.