Creating a Scala Component
Developing a new Viash component using Scala.
In this tutorial, you’ll create a component that does the following:
- Extract all hyperlinks from a markdown file
- Check if every URL is reachable
- Create a text report with the results
The component will be able to run locally and as a Docker container. To create a component, you need two files: a script that provides the functionality and a config file that describes the component.
The files used in this tutorial can be found here:
https://github.com/viash-io/viash_web/tree/main/static/examples/md_url_checker_scala
Prerequisites
To follow along with this tutorial, you need to have this software installed on your machine:
- An installation of Viash.
- A Bash Unix shell.
- An installation of Docker.
- An installation of Scala 2.
We recommend you take a look at the hello world example first to understand how components work.
Write a script in Scala
The first step in developing this component is writing its core
functionality, in this case a Scala script.
Create a new folder named my_viash_component and open it. Now
create a new file named script.scala in there and add this code as
its content:
import scala.util.matching.Regex
import scala.sys.process._
object Main extends App {
  // 1
  // VIASH START
  case class ViashPar(
    inputfile: String,
    domain: String
  )
  val par = ViashPar(
    inputfile = "Testfile.md",
    domain = "https://viash.io"
  )
  // VIASH END
  // Matches markdown links of the form [text](url)
  val MdLinkPattern: Regex = """\[(.*?)\]\((.*?)\)""".r
  // Read the whole markdown file and echo it to the terminal
  val md = scala.io.Source.fromFile(par.inputfile).mkString
  println(md)
  // 2
  val links = MdLinkPattern.findAllMatchIn(md).toList
  // 3
  for ((MdLinkPattern(name, url), index) <- links.zipWithIndex) {
    // 4
    val fullUrl = if (url.startsWith("http")) url else par.domain + url
    println(index + ": " + fullUrl)
    // 5
    // -I sends a HEAD request, -L follows redirects, -s silences progress output.
    // Note that !! throws an exception if curl exits with a non-zero code.
    val cmd = Seq("curl", "-ILs", "--max-redirs", "5", fullUrl)
    val out = cmd.!!
    if (out contains " 200") {
      println("OK")
    } else {
      println("ERROR! URL cannot be reached. ")
    }
  }
}
Note the numbered comments of the form // X scattered through the code. Here’s a breakdown of what each one marks:
1. The variables are placed between // VIASH START and // VIASH END for debugging purposes. Their final values will be generated dynamically by Viash once the script is turned into a component. If you want to skip testing your script, you can leave these out and Viash will create the variables based on the configuration file. There are two variables:
   - inputfile: The markdown file that needs to be parsed.
   - domain: The domain URL that gets inserted before any relative URLs. For example, “/documentation/intro” could be replaced with “https://my-website/documentation/intro” to create a valid URL.
2. The script extracts the hyperlinks from the markdown file.
3. A loop iterates over the hyperlinks.
4. Any relative URLs (or at least those that don’t start with “http”) get the domain added in front.
5. A HEAD request is used to check for a response from the URL. The result is printed to the terminal.
Test the script
Before turning the script into a component, it’s a good idea to test if
it actually works as expected.
As the script expects a markdown file with hyperlinks, create a new file
in the script folder named Testfile.md and paste in the following:
# Test File
This is a simple markdown file with some hyperlinks to test if the check_if_URLS_reachable component works correctly.
Some links to websites:
- [Google](https://www.google.com)
- [Reddit](https://www.reddit.com)
- [A broken link](http://microsoft.com/random-link)
Links that are relative to [viash.io](http://www.viash.io):
- You can [install viash here](/guides/getting_started/installation).
- It all starts with a script and a [config file](/api/config/config) for your components.
Now open a terminal in the folder and execute the following command to run the Scala script:
scala -nc script.scala
The script will now show the following output:
# Test File
This is a simple markdown file with some hyperlinks to test if the check_if_URLS_reachable component works correctly.
Some links to websites:
- [Google](https://www.google.com)
- [Reddit](https://www.reddit.com)
- [A broken link](http://microsoft.com/random-link)
Links that are relative to [viash.io](http://www.viash.io):
- You can [install viash here](/guides/getting_started/installation).
- It all starts with a script and a [config file](/api/config/config) for your components.
0: https://www.google.com
OK
1: https://www.reddit.com
OK
2: http://microsoft.com/random-link
ERROR! URL cannot be reached.
3: http://www.viash.io
OK
4: https://viash.io/guides/getting_started/installation
ERROR! URL cannot be reached.
5: https://viash.io/api/config/config
ERROR! URL cannot be reached.
If you get this same output, the script is working as intended! You might have noticed that you didn’t have to provide any arguments. That’s because the values are hard-coded into the script for debugging purposes.
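One caveat about the OK/ERROR verdict: the script looks for the text “ 200” anywhere in curl’s output, so a header such as content-length: 200 on a failing page could be mistaken for success. A stricter check, sketched here under the assumption that every response block starts with a status line like HTTP/1.1 301 Moved Permanently or HTTP/2 200, only looks at the status code of the last response. StatusCheckSketch, finalStatus and isReachable are hypothetical names, not part of script.scala:

```scala
object StatusCheckSketch {
  // Matches status lines such as "HTTP/1.1 301 Moved Permanently" or "HTTP/2 200".
  private val StatusLine = """^HTTP/\S+\s+(\d{3})""".r

  // Status code of the last response found in `curl -I` style output, if any.
  def finalStatus(headers: String): Option[Int] =
    headers
      .split("\n")
      .toList
      .flatMap(line => StatusLine.findFirstMatchIn(line.trim).map(_.group(1).toInt))
      .lastOption

  // Only a final 200 counts as reachable in this sketch.
  def isReachable(headers: String): Boolean = finalStatus(headers).contains(200)

  def main(args: Array[String]): Unit = {
    val redirected =
      "HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.org/\r\n\r\nHTTP/2 200 \r\n"
    val notFound = "HTTP/1.1 404 Not Found\r\ncontent-length: 200\r\n"
    println(isReachable(redirected)) // true
    println(isReachable(notFound))   // false, although the text contains " 200"
  }
}
```

Whether this extra strictness is worth it depends on your links. For the tutorial’s test file, the simpler check in the script is enough.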
Now that the script has been tested, it’s time to create a config file that describes the component based on it.
Describe the component using YAML
A Viash config file is a YAML file that describes the behaviour and supported platforms of a Viash component. Create a new file named config.vsh.yaml and paste the following template inside of it:
functionality:
  name: NAME
  description: DESCRIPTION
  arguments:
  - type: string
    name: --input
    description: INPUT DESCRIPTION
  resources:
  - type: LANGUAGE_script
    path: SCRIPT
platforms:
  - type: native
Every config file requires these two dictionaries: functionality and platforms. This bare-bones config file makes it easy to “fill in the blanks” for this example. For more information about config files, take a look at the Config section of the API.
Let’s start off by defining the functionality of our component.
Defining the functionality
The functionality dictionary describes what the component does and the resources it needs to do so. The first key is name, which will be the name of the component once it’s built. Replace the NAME value with md_url_checker_scala or any other name of your choosing.
Next up is the description key. Its value will be printed at the top when the --help option is used. Replace DESCRIPTION with “Check if URLs in a markdown are reachable and create a text report with the results.”. You can use multiple lines for a description by starting its value with a pipe (|) and a new line, like so:
functionality:
  name: md_url_checker_scala
  description: |
    This is the first line of my description.
    Here's a second line!
The arguments dictionary contains all of the arguments that are accepted by the component. These arguments will be injected as variables in the script. In the case of the example script, these are the variables we’re working with:
inputfile
domain
To create good arguments, you need to ask yourself a few essential questions about each variable:
- What is the most fitting data type?
- Is it an input or an output?
- Is it required?
Let’s take a closer look at inputfile for starters:
We know it’s a file, as the script needs the path to a markdown file
as its input. It’s also definitely a required variable, as the
script would be pointless without it.
With this in mind, modify the first argument as follows:
- Change type’s value to file.
- Set name’s value to --inputfile. The name of an argument has to match the variable name, as the argument will be injected into the final script. In the case of Scala scripts, the variables are added as fields of a case class instance named par.
- Use “The input markdown file.” for the description value. This description will be included when the --help option is used.
- Add a new key named required and set its value to true. This ensures that the component will not be run without a value for this argument.
- Add another key, name it must_exist and set its value to true. This key is unique to file type arguments, it adds extra logic to the component to check if a file exists before running the component. This saves you from having to do this check yourself in the script.
That’s it for the first argument! The result should look like this:
- type: file
  name: --inputfile
  description: The input markdown file.
  required: true
  must_exist: true
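For comparison, this is roughly the kind of check must_exist saves you from writing by hand. It’s only a sketch of a manual check, not the code Viash generates, and FileCheckSketch and requireExistingFile are hypothetical names:

```scala
import java.nio.file.{Files, Paths}

object FileCheckSketch {
  // Hypothetical helper: Right(path) if a regular file exists at `path`,
  // Left(error message) otherwise.
  def requireExistingFile(path: String): Either[String, String] =
    if (Files.isRegularFile(Paths.get(path))) Right(path)
    else Left(s"Input file '$path' does not exist.")

  def main(args: Array[String]): Unit = {
    requireExistingFile("Testfile.md") match {
      case Right(found) => println(s"Found $found")
      case Left(error)  => println(error)
    }
  }
}
```

With must_exist set in the config, the script itself can stay focused on parsing the markdown.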
Now for domain: this is a simple optional string that gets added
before relative URLs. Make room for a new argument by creating a new
line below must_exist: true and pressing Shift + Tab to back up one
tab, so the cursor is aligned with the start of the first argument. Add
the --domain argument here:
- type: string
  name: --domain
  description: The domain URL that gets inserted before any relative URLs. For example, "/documentation/intro" could be replaced with "https://my-website/documentation/intro" to create a valid URL.
If an argument isn’t required, you can simply omit the required key. Here’s what the arguments dictionary looks like so far:
arguments:
- type: file
  name: --inputfile
  description: The input markdown file.
  required: true
  must_exist: true
- type: string
  name: --domain
  description: The domain URL that gets inserted before any relative URLs. For example, "/documentation/intro" could be replaced with "https://my-website/documentation/intro" to create a valid URL.
With that, there’s just one more part of the functionality to fill in:
the script itself!
Every Viash component has one or more resources, the most important of
which is often the script. The template already contains a resources
dictionary, so replace the following values to point to the script:
- Set the value of type to scala_script. The script used in this case was written in Scala, so the resource type is set accordingly. This tells Viash what flavour of code to generate to create the final component. You can find a full overview of the different resource types on the Functionality page.
- Change the value of path to script.scala. This points to the resource and can be a relative path, an absolute path or even a URL. In this case we keep the script in the same directory as the config file to keep things simple.
That finishes up the functionality side of the component! All that’s left is defining the platforms with their dependencies and then running and building the component.
Defining the platforms
The platforms dictionary specifies the requirements to execute the component on zero or more platforms. The currently supported platforms are Native, Docker, and Nextflow. If no platforms are specified, a native platform is assumed. Here’s a quick overview of the platforms:
- native: The platform for developers that know what they’re doing or for simple components without any dependencies. All dependencies need to be installed on the system the component is run on.
- docker: This platform is recommended for most components. The dependencies are resolved by using Docker containers, either built from scratch or pulled from a Docker repository. This has huge benefits, as the end user doesn’t need to have any of the dependencies installed locally.
- nextflow: This converts the component into a Nextflow module that can be imported into a pipeline.
In this tutorial, we’ll take a look at both the native and docker platforms. The platforms are also defined in the config.vsh.yaml file at the very bottom. The native platform is actually already defined in the template, that one type key with a value of native is enough! Now for adding the docker platform, add a new line below the last and add the following:
  - type: docker
    image: hseeberger/scala-sbt:17.0.0_1.5.5_2.13.6
This tells Viash that this component can be built into a Docker container that uses an image containing Scala as its base. That’s it for the config! Be sure to save it, and let’s move on to actually running the component you’ve created. For reference, you can take a look at the completed config.vsh.yaml file in our GitHub repository.
Run the component
Time to run the component! First off, let’s see what the output of
--help is. To do that, open a terminal in the my_viash_component
folder and execute the following command:
viash run config.vsh.yaml -- --help
This will show the following:
md_url_checker_scala <not versioned>
Check if URLs in a markdown are reachable.
Options:
--inputfile
type: file, required parameter, file must exist
The input markdown file.
--domain
type: string, required parameter
The domain URL that gets inserted before any relative URLs. For example, "/documentation/intro" could be replaced with "https://my-website/documentation/intro" to create a valid URL.
As you can see, the values you entered into the config file are all
here.
Next, let’s run the component natively with some arguments. You can use
one of your own markdown files as the input if you desire. In that case,
replace Testfile.md in the command with the path to your file.
Execute the following command to run the component with the default
platform, in this case native as it’s the first in the platforms
dictionary:
viash run config.vsh.yaml -- --inputfile=Testfile.md --domain=https://viash.io/
If all goes well, you’ll see something like this output in the terminal:
# Test File
This is a simple markdown file with some hyperlinks to test if the check_if_URLS_reachable component works correctly.
Some links to websites:
- [Google](https://www.google.com)
- [Reddit](https://www.reddit.com)
- [A broken link](http://microsoft.com/random-link)
Links that are relative to [viash.io](http://www.viash.io):
- You can [install viash here](/guides/getting_started/installation).
- It all starts with a script and a [config file](/api/config/config) for your components.
0: https://www.google.com
OK
1: https://www.reddit.com
OK
2: http://microsoft.com/random-link
ERROR! URL cannot be reached.
3: http://www.viash.io
OK
4: https://viash.io//guides/getting_started/installation
ERROR! URL cannot be reached.
5: https://viash.io//api/config/config
ERROR! URL cannot be reached.
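Notice the double slash in entries 4 and 5: the domain was passed with a trailing slash and the relative links start with one, and the script simply concatenates the two. If you’d like the script to tolerate both forms, one option is to normalise the seam before joining. The helper below is a hypothetical sketch (UrlJoinSketch and resolve are not part of script.scala):

```scala
object UrlJoinSketch {
  // Hypothetical helper: prefix relative URLs with the domain,
  // leaving exactly one slash at the seam.
  def resolve(domain: String, url: String): String =
    if (url.startsWith("http")) url
    else domain.stripSuffix("/") + "/" + url.stripPrefix("/")

  def main(args: Array[String]): Unit = {
    println(resolve("https://viash.io/", "/api/config/config")) // https://viash.io/api/config/config
    println(resolve("https://viash.io", "/api/config/config"))  // https://viash.io/api/config/config
    println(resolve("https://viash.io/", "https://www.google.com")) // https://www.google.com
  }
}
```

Alternatively, you can simply pass --domain without the trailing slash, as the hard-coded debugging value in the script does.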
For more information on the run command, take a look at the Viash run command page. Great! With that working, the next step is building an executable.
Building an executable
You can generate an executable using either the native or the docker
platform. The former will generate a file that can be run locally, but
depends on your locally installed software packages to work. A docker
executable on the other hand can build and start up a docker container
that handles the dependencies for you.
To create a native build, execute the following command:
viash build config.vsh.yaml
A new folder named output will have been created with an executable inside named md_url_checker_scala. To test it out, execute the following command:
output/md_url_checker_scala --inputfile=Testfile.md --domain=https://viash.io/
The output is the same as when running the component with viash run, but the
executable can be shared easily, accepts arguments directly and includes
a --help option. Not bad!
Next up is the Docker executable. You can specify the platform with the
-p argument and choose an output folder using -o. Apart from that,
it’s the same as the previous build command:
viash build -p docker -o docker_output config.vsh.yaml
You’ll now have a docker_output folder alongside the output one. This folder also contains a file named md_url_checker_scala, but its inner workings are slightly different than before. Run md_url_checker_scala with the full arguments list to test what happens:
docker_output/md_url_checker_scala --inputfile=Testfile.md --domain=https://viash.io/
Here’s what just happened:
- If the docker image wasn’t found, Viash will download it.
- A check is made to see if a container named “md_url_checker_scala” exists. If not, one will be built with the image defined in the config as its base.
- All dependencies defined in the config are taken care of.
- The script is run with the passed arguments and the output is passed to your shell.
For more information about the viash build command, take a look at
its command page. That concludes the building of
executables based on components using Viash!
Writing and running a unit test
To finish off this tutorial, it’s important to talk about unit tests. To ensure that your component works as expected during its development cycle, writing one or more tests is essential. Luckily, writing a unit test for a Viash component is straightforward.
You just need to add test parameters in the config file and write a
script which runs the executable and verifies the output. When running
tests, Viash will automatically build an executable and place it
alongside the other defined resources in a temporary working directory.
To get started, open up the config.vsh.yaml file again and add this at
the end of the functionality dictionary, between the path: script.scala
and platforms: lines:
  tests:
  - type: bash_script
    path: test.sh
  - path: Testfile.md
This test dictionary contains a reference to the test script and all of the files that need to be copied over in order to complete a test. In the case of our example, test.sh will be the test script and Testfile.md is necessary as an input markdown file is required for the script to function. Now create a new file named test.sh in the my_viash_component folder and add this as its content:
set -ex
echo ">>> Checking whether output is correct"
./md_url_checker_scala --inputfile Testfile.md --domain https://viash.io > output.txt
[[ ! -f output.txt ]] && echo "Output file could not be found!" && exit 1
grep -q 'https://www.google.com' output.txt
grep -q 'ERROR! URL cannot be reached' output.txt
echo ">>> Test finished successfully!"
exit 0
This bash script will run the component and perform several checks on its output. A successful test runs all the way down and exits with a 0 exit code. Any other code means a failure:
- set -ex will stop the script as soon as any of the lines fails and will output all commands to the shell with a ‘+’ in front of them.
- ./md_url_checker_scala --inputfile Testfile.md --domain https://viash.io > output.txt runs the component and writes its output to a file.
- All of the grep calls check whether a certain piece of text can be found. Each of these calls can exit the script if the text isn’t found.
- If everything succeeded, exit with a 0 code. Make sure not to forget this final line in your own tests.
Make sure both the config and test files are saved, then run a test by running this command:
viash test config.vsh.yaml
The output will look like this:
Running tests in temporary directory: '/tmp/viash_test_md_url_checker_scala15666090665075196095'
====================================================================
+/tmp/viash_test_md_url_checker_scala15666090665075196095/test_test.sh/test.sh
>>> Checking whether output is correct
+ echo '>>> Checking whether output is correct'
+ ./md_url_checker_scala --inputfile Testfile.md --domain https://viash.io
+ [[ ! -f output.txt ]]
+ grep -q https://www.google.com output.txt
+ grep -q 'ERROR! URL cannot be reached' output.txt
+ echo '>>> Test finished successfully!'
>>> Test finished successfully!
+ exit 0
====================================================================
SUCCESS! All 1 out of 1 test scripts succeeded!
Cleaning up temporary directory
If the test succeeds, it simply writes the full output to the shell. If there are any issues, the script stops and an error message appears in red. For more information on tests, take a look at the viash test command page.
What’s next?
Now you’re ready to use Viash to create components from your own scripts. Check out the rest of our guides and the API section. Here are some good starting points:
- The API section
- An overview of the functionality dictionary of the config file
- More details about the docker platform