Quickstart

This tutorial will guide you through using our Viash template project to run a data pipeline.

Requirements

This guide assumes you’ve already installed Viash, Docker and Nextflow.
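
Before continuing, it can help to confirm that the three tools are reachable from your shell. The following is a minimal sketch, assuming the executables are on your PATH under the names viash, docker and nextflow. The helper name check_tool is made up for this example and is not part of Viash.

```shell
# check_tool is a hypothetical helper: reports whether a command is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report on each tool this guide relies on, without stopping at the first miss.
for tool in viash docker nextflow; do
  check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed before you continue with the steps below.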

What is Viash?

Viash wraps your scripts into modular software components that serve as building blocks for (Nextflow) data pipelines. All you need to get started is your script and a metadata file.
Here are a few of Viash’s key features:

  • You can use your preferred scripting language per component, and mix and match scripts between multiple components as you please. Supported languages include: Bash, Python, R, Scala, JS and C#.
  • A custom Docker container is automatically generated based on your dependencies described in your metadata. No expert Docker knowledge is required.
  • Viash generates a Nextflow module from your script. No expert Nextflow knowledge is required.
  • You can combine the generated Nextflow modules in a script to create and run a scalable and reproducible data pipeline.
  • You can test every single module on your local workstation through the built-in development kit.

Quickstart example project

This Quickstart will take you from nothing to a scalable and reproducible Nextflow data pipeline. Here’s the flow of the pipeline you’ll be using:

graph LR
   A(file?.tsv) --> B[/remove_comments/]
   B --> C[/take_column/]
   C --> D[/combine_columns/]
   D --> E(output)

One or more TSV files are taken as input and processed by a series of modules. At the end, the output is written to a folder.
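
To make the data flow concrete, here is a rough approximation of the three stages using standard Unix tools. It is only a sketch under assumptions: the real components live in src/demo and may behave differently, the "#" comment marker and the choice of the second column are guesses, and the sample data is invented for illustration.

```shell
# Work in a throwaway directory with two tiny, made-up TSV files.
workdir="$(mktemp -d)"
printf '# a comment line\nx\t1\ny\t2\n' > "$workdir/file1.tsv"
printf '# another comment\nx\t10\ny\t20\n' > "$workdir/file2.tsv"

for f in "$workdir"/file?.tsv; do
  # remove_comments: drop lines starting with '#' (assumed comment marker).
  # take_column: keep one column per file (column 2 is an assumption).
  grep -v '^#' "$f" | cut -f 2 > "$f.col"
done

# combine_columns: place the extracted columns side by side, tab-separated.
paste "$workdir"/file?.tsv.col > "$workdir/combined.tsv"
cat "$workdir/combined.tsv"
```

Each input file contributes one column to the combined result, which mirrors the shape of the diagram: the first two stages run per file, and the last stage merges their outputs.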

Step 1: Get the template

To get up and running fast, we provide a template project for you to use.

First create a new repository by clicking the “Use this template” button in the viash_project_template repository or clicking the button below.

Use project template

Then clone the repository using the following command.

git clone https://github.com/youruser/my_first_pipeline

Alternatively, click the button below to download a zip file containing the template project. Once downloaded, unzip the file and rename the root directory to my_first_pipeline.

Download template

The template repo contains the following files:

.
├── LICENSE.md                            License information
├── README.qmd                            The source qmd file for this readme
├── README.md                             This readme
├── _viash.yaml                           Global Viash settings
├── resources_test/*.tsv                  Sample files to showcase pipeline and
│   ├── file1.tsv                         run component unit tests.
│   └── file2.tsv
├── src/demo                              Source directory with Viash components
│   ├── combine_columns
│   ├── remove_comments
│   └── take_column
└── workflows
    └── demo_pipeline                     Demo Nextflow pipeline
        ├── main.nf
        └── nextflow.config

Step 2: Build the Viash components

With Viash you can turn the components in src/ into Dockerized Nextflow modules by running:

viash ns build --setup cachedbuild --parallel

During the build, Viash prints output similar to the following:

Exporting take_column (demo) =docker=> target/docker/demo/take_column
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo_take_column:dev' with Dockerfile
Exporting take_column (demo) =nextflow=> target/nextflow/demo/take_column
Exporting remove_comments (demo) =docker=> target/docker/demo/remove_comments
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo_remove_comments:dev' with Dockerfile
Exporting remove_comments (demo) =nextflow=> target/nextflow/demo/remove_comments
Exporting combine_columns (demo) =docker=> target/docker/demo/combine_columns
[notice] Building container 'ghcr.io/viash-io/viash_project_template/demo_combine_columns:dev' with Dockerfile
Exporting combine_columns (demo) =nextflow=> target/nextflow/demo/combine_columns
All 6 configs built successfully

Once everything is built, a new target directory contains the executables and modules, grouped per platform:

target/
├── docker
│   └── demo
│       ├── combine_columns
│       │   ├── combine_columns
│       │   └── viash.yaml
│       ├── remove_comments
│       │   ├── remove_comments
│       │   └── viash.yaml
│       └── take_column
│           ├── take_column
│           └── viash.yaml
└── nextflow
    └── demo
        ├── combine_columns
        │   ├── main.nf
        │   ├── nextflow.config
        │   └── viash.yaml
        ├── remove_comments
        │   ├── main.nf
        │   ├── nextflow.config
        │   └── viash.yaml
        └── take_column
            ├── main.nf
            ├── nextflow.config
            └── viash.yaml
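
A quick way to confirm that the build produced the layout above is to check for the expected files. The sketch below assumes it is run from the project root and uses the component names and paths from the tree above. check_build is a hypothetical helper name, not a Viash command.

```shell
# check_build is a hypothetical helper: verifies that each demo component has
# a Docker executable and a Nextflow module under the given target directory.
check_build() {
  target_dir="$1"
  status=0
  for component in combine_columns remove_comments take_column; do
    for path in \
      "$target_dir/docker/demo/$component/$component" \
      "$target_dir/nextflow/demo/$component/main.nf"; do
      if [ -f "$path" ]; then
        echo "ok: $path"
      else
        echo "missing: $path"
        status=1
      fi
    done
  done
  return "$status"
}

check_build target || echo "Some build outputs are missing; try rebuilding."
```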

Step 3: Run the pipeline

Now run the pipeline with Nextflow:

nextflow run . \
  -main-script workflows/demo_pipeline/main.nf \
  -with-docker \
  --input resources_test/file*.tsv \
  --publishDir temp

This runs the three modules in sequence and stores the final result in a file named combined.combine_columns.output.tsv in a new temp directory:

"1"     0.11
"2"     0.23
"3"     0.35
"4"     0.47

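To inspect the result from the command line, you could locate the output file and print it. This sketch assumes the pipeline was run with --publishDir temp as shown above. show_result is a hypothetical helper, and find is used because the exact nesting inside the publish directory is an unknown here.

```shell
# show_result is a hypothetical helper: prints the combined output file found
# anywhere under the given publish directory.
show_result() {
  publish_dir="$1"
  result="$(find "$publish_dir" -type f \
    -name 'combined.combine_columns.output.tsv' 2>/dev/null | head -n 1)"
  if [ -z "$result" ]; then
    echo "no output file found under $publish_dir" >&2
    return 1
  fi
  cat "$result"
}

# Print the pipeline result if it exists; do nothing fatal if it does not.
show_result temp || true
```
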
What’s next?

Now that you’ve had a taste of what Viash can do for you, take a look at our Guide and Reference pages to learn more about how to use Viash. If you want to start simple, we suggest taking a look at the Native component creation guide.