Nextflow Platform

Run a Viash component as a Nextflow module.


An example of a Nextflow platform YAML configuration can be found below. Each attribute is explained in more depth in the following sections.

  - type: nextflow
    image: dataintuitive/viash:0.4.0
    tag: 0.4.0
    registry: url-to-registry
    organization: viash-io
    namespace_separator: "+"
    executor: ""
    publish: true
    per_id: false
    path: raw_data
    label: highmem
    labels: [ highmem, highcpu ]
    separate_multiple_outputs: false
    stageInMode: copy
    directive_cpus: 4
    directive_max_forks: 4
    directive_time: ""
    directive_memory: ""
    directive_cache: ""

id [string]

Every platform can be given a specific id that can later be referred to explicitly when running or building the Viash component.
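
As a minimal sketch of how an id might be used, the fragment below defines two Nextflow platforms in one component config and tells them apart by their id. The platforms: key and the id values nextflow_default and nextflow_highmem are assumptions made for illustration; they do not come from this page.

```yaml
# Hypothetical excerpt of a Viash component config.
platforms:
  - type: nextflow
    id: nextflow_default      # hypothetical id
  - type: nextflow
    id: nextflow_highmem      # hypothetical id
    label: highmem            # only this variant carries the highmem label
```

The id would then be passed when running or building the component. The exact command-line option for doing so is not described in this section.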

image [string]

If no image attributes are configured, Viash will use the auto-generated image name from the Docker platform.


The container image in which to run the module can be specified explicitly in different ways:

image: dataintuitive/viash:0.4.0

The same result can be obtained with:

image: dataintuitive/viash
tag: 0.4.0

Specifying the attribute(s) like this will use the container dataintuitive/viash:0.4.0 from the Docker Hub registry.

If no tag is specified, Viash will use functionality.version as the tag.

If no registry is specified, Viash (and Nextflow) will assume the image is available locally or on Docker Hub. In other words, the registry: ... attribute above is superfluous. No other registry is checked automatically due to a limitation of Docker itself.
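
The tag fallback described above can be sketched as follows. This is an illustrative fragment only: the functionality: and platforms: layout and the component name my_component are assumptions, while functionality.version and the image name are taken from the text.

```yaml
# Hypothetical component config; names are for illustration only.
functionality:
  name: my_component            # hypothetical component name
  version: 0.4.0                # used as the image tag when no tag is set
platforms:
  - type: nextflow
    image: dataintuitive/viash  # no tag in the image name, no tag attribute
```

Under these assumptions the module should run in dataintuitive/viash:0.4.0, the same container as in the two explicit variants shown earlier.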

tag [string]

Specify a Docker image based on its tag.


tag: 4.0

registry [string]

The URL to a custom Docker registry.



organization [string]

Name of a container’s organization.


organization: viash-io

namespace_separator [string]

The default namespace separator is "_".


namespace_separator: "+"

publish [boolean]

Nextflow uses autogenerated work directories to manage process I/O under the hood. To make results available outside these work directories, the output of a module or pipeline step can be published. To do this, add publish: true to the config:

  • publish is optional
  • Default value is false

This attribute simply defines whether the output of a component should be published. The output location has to be provided at pipeline launch by means of the option --publishDir ... or as params.publishDir in nextflow.config:

params.publishDir = "..."

per_id [boolean]

By default, a subdirectory is created corresponding to the unique ID that is passed in the triplet. Let us illustrate this with an example. The following code snippet uses the value of --input as an input of a workflow. The input can include a wildcard so that multiple samples can run in parallel. We use the parent directory name (.getParent().baseName) as an identifier for the sample. We pass this as the first entry of the triplet:

Channel.fromPath(params.input) \
    | map{ it -> [ it.getParent().baseName , it ] } \
    | map{ it -> [ it[0] , it[1], params ] } \
    | ...

Say the resulting sample names are SAMPLE1 and SAMPLE2. The output of the next step in the pipeline will then be published (at least by default) under a separate subdirectory for each of these IDs.


These per-ID subdirectories can be avoided by setting:

per_id: false

path [string]

When publish is set to true, this attribute defines where the output is written relative to the params.publishDir setting. For example, path: processed in combination with --output s3://some_bucket/ will store the output of this component under the processed directory relative to that location.


This attribute gives control over the directory structure of the output. For example:

path: raw_data

Or even:

path: raw_data/bcl

Please note that per_id and path can be combined.
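
A minimal sketch of a platform entry that combines the three publishing attributes is shown below. The platforms: key is an assumption about the surrounding config. per_id: true is written out only for clarity, since the per-ID subdirectory is described above as the default behaviour.

```yaml
# Hypothetical excerpt of a Viash component config.
platforms:
  - type: nextflow
    publish: true        # publish the output of this component
    per_id: true         # keep one subdirectory per sample ID
    path: raw_data/bcl   # location relative to params.publishDir
```

How the path and the per-ID subdirectory are nested relative to each other is not stated in this section, so the resulting layout is best checked on a small test run.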

label [string] / labels [list of strings]

When running the module in a cluster context, and depending on the cluster type, Nextflow allows attaching labels to the process. These labels can later be used as selectors for associating resources with this process.

To attach one label to a process/component, use the label: ... attribute. Multiple labels can be added using labels: [ ..., ... ], and the two can even be mixed.

In the main nextflow.config, one can now use this label:

process {
  withLabel: bigmem {
     maxForks = 5
  }
}

Examples:

label: highmem
labels: [ highmem, highcpu ]
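
Since label and labels can be mixed, a platform entry might look like the sketch below. The platforms: key is an assumption about the surrounding config; the label names are the ones used in the examples on this page.

```yaml
# Hypothetical excerpt of a Viash component config.
platforms:
  - type: nextflow
    label: highmem        # single label
    labels: [ highcpu ]   # additional labels given as a list
```

Each of these labels could then be matched by its own withLabel selector in nextflow.config.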

separate_multiple_outputs [boolean]

Separates the outputs generated by a Nextflow component with multiple outputs into separate events on the channel. Default value: true.

separate_multiple_outputs: false

stageInMode [string]

By default, Nextflow will create a symbolic link to the inputs for a process/module and run the tool at hand using those symbolic links. Some applications do not cope well with this strategy. In that case, the files should effectively be copied rather than linked to. This can be achieved by using stageInMode: copy.

This attribute is optional; the default is symlink.

Example: stageInMode: copy

directive_cpus [integer]

directive_max_forks [integer]

directive_time [string]

directive_memory [string]

directive_cache [string]