Nextflow Platform
Run a Viash component as a Nextflow module.
Example
An example of a Nextflow platform YAML configuration is shown below. Each attribute is explained in more depth in the following sections.
...
platforms:
  - type: nextflow
    image: dataintuitive/viash:0.4.0
    tag: 0.4.0
    registry: url-to-registry
    organization: viash-io
    namespace_separator: "+"
    executor: ""
    publish: true
    per_id: false
    path: raw_data
    label: highmem
    labels: [ highmem, highcpu ]
    separate_multiple_outputs: false
    stageInMode: copy
    directive_cpus: 4
    directive_max_forks: 4
    directive_time: ""
    directive_memory: ""
    directive_cache: ""
id [string]
Every platform can be given a specific id that can later be referred to explicitly when running or building the Viash component.
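A minimal sketch of a platform entry with an explicit id is shown here. The value nextflow_highmem is a hypothetical name chosen for illustration; how the id is then selected on the command line depends on the Viash version in use.

```yaml
platforms:
  - type: nextflow
    id: nextflow_highmem   # hypothetical id, used to refer to this platform entry
```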
image [string]
If no image attributes are configured, Viash will use the auto-generated image name from the Docker platform:
[<namespace>/]<name>:<version>
The container image used to run the module can also be specified explicitly, in several ways:
image: dataintuitive/viash:0.4.0
The same result can be obtained with:
image: dataintuitive/viash
registry: index.docker.io/v1/
tag: 0.4.0
Specifying the attributes like this will use the container dataintuitive/viash:0.4.0 from Docker Hub (the registry).
If no tag is specified, Viash will use functionality.version as the tag.
If no registry is specified, Viash (and Nextflow) will assume the image is available locally or on Docker Hub. In other words, the registry: ... attribute above is superfluous. No other registry is checked automatically due to a limitation of Docker itself.
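The tag fallback described above can be sketched as follows. The component name my_component is a hypothetical placeholder; the image and version values mirror the examples in this section.

```yaml
functionality:
  name: my_component             # hypothetical component name
  version: 0.4.0
platforms:
  - type: nextflow
    image: dataintuitive/viash   # no tag given, so functionality.version (0.4.0) is used as the tag
```

Because no registry is given either, the image is assumed to be available locally or on Docker Hub.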
tag [string]
Specify a Docker image based on its tag.
Example:
tag: 4.0
registry [string]
The URL of a custom Docker registry.
Example:
registry: https://my-docker-registry.org
organization [string]
Name of a container’s organization.
Example:
organization: viash-io
namespace_separator [string]
The default namespace separator is "_".
Example:
namespace_separator: "+"
publish [boolean]
Nextflow uses autogenerated work directories to manage process I/O under the hood. To make the results of a module or pipeline step available outside those directories, they have to be published. To do so, add publish: true to the config.
publish is optional. Its default value is false.
This attribute only determines whether the output of a component is published. The output location has to be provided at pipeline launch, either with the option --publishDir ... or as params.publishDir in nextflow.config:
params.publishDir = "..."
per_id [boolean]
By default, a subdirectory is created corresponding to the unique ID that is passed in the triplet. Let us illustrate this with an example. The following code snippet uses the value of --input as the input of a workflow. The input can include a wildcard so that multiple samples can run in parallel. We use the parent directory name (.getParent().baseName) as an identifier for the sample and pass it as the first entry of the triplet:
Channel.fromPath(params.input) \
    | map{ it -> [ it.getParent().baseName , it ] } \
    | map{ it -> [ it[0] , it[1], params ] } \
    | ...
Say the resulting sample names are SAMPLE1 and SAMPLE2. The output of the next step in the pipeline will then be published (at least by default) under:
<publishDir>/SAMPLE1/
<publishDir>/SAMPLE2/
These per-ID subdirectories can be avoided by setting:
per_id: false
path [string]
When publish: true is set, this attribute defines where the output is written relative to the params.publishDir setting. For example, path: processed in combination with --publishDir s3://some_bucket/ will store the output of this component under:
s3://some_bucket/processed/
This attribute gives control over the directory structure of the output. For example:
path: raw_data
Or even:
path: raw_data/bcl
Please note that per_id and path can be combined.
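As a sketch of how the publishing attributes fit together, the fragment below combines publish, per_id, and path. The value processed is taken from the example above; the resulting location assumes that params.publishDir is provided at pipeline launch.

```yaml
platforms:
  - type: nextflow
    publish: true     # publish the output of this component
    per_id: false     # do not create a per-ID subdirectory
    path: processed   # output is written under <publishDir>/processed/
```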
label [string] / labels [list of strings]
When running the module in a cluster context, and depending on the cluster type, Nextflow allows labels to be attached to a process. These labels can later be used as selectors for assigning resources to that process.
To attach a single label to a process/component, use the label: ... attribute. Multiple labels can be added using labels: [ ..., ... ], and the two can be mixed.
In the main nextflow.config, a label can then be used as a selector:
process {
  ...
  withLabel: bigmem {
    maxForks = 5
    ...
  }
}
Examples:
label: highmem
labels: [ highmem, highcpu ]
separate_multiple_outputs [boolean]
Emits the outputs of a Nextflow component that has multiple outputs as separate events on the channel. Default value: true.
Example:
separate_multiple_outputs: false
stageInMode [string]
By default, Nextflow creates symbolic links to the inputs of a process/module and runs the tool at hand using those symbolic links. Some applications do not cope well with this strategy. In that case, the files should be copied rather than linked, which can be achieved by using stageInMode: copy.
This attribute is optional. The default is symlink.
Example:
stageInMode: copy