Create and use a module

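Creating a VDSL3 module is as simple as adding { type: nextflow } to the platforms section in the Viash config. Luckily, our previous example already contained such an entry. The config is shown below for six script languages (Bash, C#, JavaScript, Python, R, and Scala); the commands in the rest of this section use the Bash variant, example_bash: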

functionality:
  name: example_bash
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: bash_script
      path: script.sh
platforms:
  - type: docker
    image: "bash:4.0"
  - type: native
  - type: nextflow
functionality:
  name: example_csharp
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: csharp_script
      path: script.csx
platforms:
  - type: docker
    image: "ghcr.io/data-intuitive/dotnet-script:1.3.1"
  - type: native
  - type: nextflow
functionality:
  name: example_js
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: javascript_script
      path: script.js
platforms:
  - type: docker
    image: "node:19-bullseye-slim"
  - type: native
  - type: nextflow
functionality:
  name: example_python
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: python_script
      path: script.py
platforms:
  - type: docker
    image: "python:3.10-slim"
  - type: native
  - type: nextflow
functionality:
  name: example_r
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: r_script
      path: script.R
platforms:
  - type: docker
    image: "eddelbuettel/r2u:22.04"
  - type: native
  - type: nextflow
functionality:
  name: example_scala
  description: A minimal example component.
  arguments:
    - type: file
      name: --input
      example: file.txt
      required: true
    - type: file
      name: --output
      direction: output
      example: output.txt
      required: true
  resources:
    - type: scala_script
      path: script.scala
platforms:
  - type: docker
    image: "sbtscala/scala-sbt:eclipse-temurin-19_36_1.7.2_2.13.10"
  - type: native
  - type: nextflow

Generating a VDSL3 module

We will now turn the Viash component into a VDSL3 module. By default, the viash build command will select the first platform in the list of platforms. To select the nextflow platform, use the --platform nextflow argument, or -p nextflow for short.

viash build config.vsh.yaml -o target -p nextflow

This will generate a Nextflow module in the target/ directory:

tree target
target
├── main.nf
└── nextflow.config

0 directories, 2 files

This main.nf file is both a standalone Nextflow pipeline and a module which can be imported as part of another pipeline.

Tip

In larger projects it’s recommended to use the viash ns build command to build all of the components in one go. Give it a try!

Running a module as a standalone pipeline

Unlike typical Nextflow modules, VDSL3 modules can also be run as standalone pipelines.

To run a VDSL3 module as a standalone pipeline, you need to specify the input parameters and a --publish_dir parameter. The names of the output files are chosen automatically, so you only need to say where they should be published.

For example, run the module by providing a value for --input and --publish_dir:

nextflow run target/main.nf --input config.vsh.yaml --publish_dir output/
N E X T F L O W  ~  version 23.10.1
Launching `target/main.nf` [insane_noether] DSL2 - revision: 4928277b92
[-        ] process > example_bash:processWf:exam... [  0%] 0 of 1
[-        ] process > example_bash:publishStatesS... -

executor >  local (1)
[bf/7fdc16] process > example_bash:processWf:exam... [100%] 1 of 1 ✔
[-        ] process > example_bash:publishStatesS... -

executor >  local (2)
[bf/7fdc16] process > example_bash:processWf:exam... [100%] 1 of 1 ✔
[cb/2c71f8] process > example_bash:publishStatesS... [100%] 1 of 1 ✔

This results in the following output:

tree output
output
├── run.example_bash.output.txt
└── run.example_bash.state.yaml

0 directories, 2 files
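
The listing above suggests that published files are named `<id>.<component>.<suffix>`, with the prefix `run` used when no id is given. As a minimal sketch under that assumption, the hypothetical helper `check_published` below (not part of Viash or Nextflow) checks that both expected files exist for each id. The two suffixes are copied from this example and would differ for a component with other output arguments.

```shell
# check_published DIR COMPONENT ID...
# Hypothetical helper: returns 0 if every expected file exists, 1 otherwise.
# Assumes the naming pattern <id>.<component>.<suffix> seen in the listing.
check_published() {
  dir="$1"
  component="$2"
  shift 2
  status=0
  for id in "$@"; do
    for suffix in output.txt state.yaml; do
      path="$dir/$id.$component.$suffix"
      if [ ! -f "$path" ]; then
        echo "missing: $path" >&2
        status=1
      fi
    done
  done
  return "$status"
}

# Example call for the standalone run, whose files carry the prefix "run".
if check_published output example_bash run; then
  echo "all outputs present"
else
  echo "some outputs are missing"
fi
```

The helper only checks that the files exist. It does not inspect their contents.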

The pipeline's help text can be shown by passing the --help parameter (output not shown):

nextflow run target/main.nf --help

Passing a parameter list

Every VDSL3 module can accept a list of parameters to populate a Nextflow channel with.

For example, we create a set of input files which we want to process in parallel.

touch sample1.txt sample2.txt sample3.txt sample4.txt

Next, we create a YAML file param_list.yaml containing an id and an input value for each parameter entry.

- id: sample1
  input: /tmp/RtmpRD6U2c/create-a-moduleb9605a3ce9f6/bash/sample1.txt
- id: sample2
  input: /tmp/RtmpRD6U2c/create-a-moduleb9605a3ce9f6/bash/sample2.txt
- id: sample3
  input: /tmp/RtmpRD6U2c/create-a-moduleb9605a3ce9f6/bash/sample3.txt
- id: sample4
  input: /tmp/RtmpRD6U2c/create-a-moduleb9605a3ce9f6/bash/sample4.txt
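
Writing such a file by hand gets tedious as the number of samples grows. The following is a minimal sketch, not an official Viash feature, that generates param_list.yaml with a shell loop. It assumes the sample files live in the current directory and that the file name without its .txt extension makes a suitable id.

```shell
set -eu

# Create the four empty sample files (same as the touch command above).
touch sample1.txt sample2.txt sample3.txt sample4.txt

# Start with an empty parameter list.
: > param_list.yaml

# Append one YAML entry per sample: the id is the file name without ".txt",
# the input is the absolute path to the file.
for f in sample1.txt sample2.txt sample3.txt sample4.txt; do
  id="${f%.txt}"
  printf '%s\n' "- id: $id" "  input: $PWD/$f" >> param_list.yaml
done

cat param_list.yaml
```

The sketch writes absolute paths, as the example file above does. Paths containing characters that are special in YAML would need quoting, which this sketch does not handle.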

You can run the pipeline on the list of parameters using the --param_list parameter.

nextflow run target/main.nf --param_list param_list.yaml --publish_dir output2
N E X T F L O W  ~  version 23.10.1
Launching `target/main.nf` [stoic_raman] DSL2 - revision: 4928277b92
[-        ] process > example_bash:processWf:exam... -
[-        ] process > example_bash:publishStatesS... -

executor >  local (4)
[b6/c244cc] process > example_bash:processWf:exam... [  0%] 0 of 4
[-        ] process > example_bash:publishStatesS... -

executor >  local (7)
[b6/c244cc] process > example_bash:processWf:exam... [ 75%] 3 of 4
[c9/e485ba] process > example_bash:publishStatesS... [  0%] 0 of 3

executor >  local (8)
[d7/496001] process > example_bash:processWf:exam... [100%] 4 of 4 ✔
[39/3d9b19] process > example_bash:publishStatesS... [100%] 4 of 4 ✔

This results in the following outputs:

tree output2
output2
├── sample1.example_bash.output.txt
├── sample1.example_bash.state.yaml
├── sample2.example_bash.output.txt
├── sample2.example_bash.state.yaml
├── sample3.example_bash.output.txt
├── sample3.example_bash.state.yaml
├── sample4.example_bash.output.txt
└── sample4.example_bash.state.yaml

0 directories, 8 files

Tip

Instead of a YAML file, you can also pass a JSON or CSV file to the --param_list parameter.
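
As a rough sketch of the CSV variant, the loop below writes the same four entries to param_list.csv. It assumes that the CSV needs a header row whose column names match the keys used in the YAML file (id and input); check the Viash reference for the exact format before relying on it.

```shell
set -eu

# Create the four empty sample files so the sketch is self-contained.
touch sample1.txt sample2.txt sample3.txt sample4.txt

{
  # Header row: assumed to name the same keys as the YAML entries.
  echo "id,input"
  # One row per sample: id, then the absolute path to the input file.
  for f in sample1.txt sample2.txt sample3.txt sample4.txt; do
    printf '%s,%s\n' "${f%.txt}" "$PWD/$f"
  done
} > param_list.csv

cat param_list.csv
```

The resulting file would then be passed in place of the YAML file, as in --param_list param_list.csv. Paths that contain commas would need CSV quoting, which this sketch does not handle.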

Module as part of a pipeline

This module can also be used as part of a Nextflow pipeline. Below is a short preview of what this looks like.

// relative include paths must start with ./ in Nextflow
include { mymodule1 } from './target/nextflow/mymodule1/main.nf'
include { mymodule2 } from './target/nextflow/mymodule2/main.nf'

workflow {
  Channel.fromList([
    [
      // a unique identifier for this tuple
      "myid", 
      // the state for this tuple
      [
        input: file("in.txt"),
        module1_k: 10,
        module2_k: 4
      ]
    ]
  ])
    | mymodule1.run(
      // use a hashmap to define which part of the state is used to run mymodule1
      fromState: [
        input: "input",
        k: "module1_k"
      ],
      // use a hashmap to define how the output of mymodule1 is stored back into the state
      toState: [
        module1_output: "output"
      ]
    )
    | mymodule2.run(
      // use a closure to define which data is used to run mymodule2
      fromState: { id, state -> 
        [
          input: state.module1_output,
          k: state.module2_k
        ]
      },
      // use a closure to return only the output of module2 as a new state
      toState: { id, output, state ->
        output
      },
      auto: [
        publish: true
      ]
    )
}

We will discuss building pipelines with VDSL3 modules in more detail in Create a pipeline.