Arvados
Arvados is an open source platform for managing, processing, and sharing genomic and other large scientific and biomedical data. With Arvados, bioinformaticians run and scale compute-intensive workflows, developers create biomedical applications, and IT administrators manage large compute and storage resources.
The key components of Arvados are:
- Keep: Keep is the Arvados storage system for managing and storing large
  collections of files. Keep combines content addressing with a
  distributed storage architecture, which gives it both high reliability
  and high throughput. Every file stored in Keep can be verified each
  time it is retrieved. Keep supports the creation of collections as a
  flexible way to define data sets without having to re-organize or
  needlessly copy data. Keep works on a wide range of underlying
  filesystems and object stores.
- Crunch: Crunch is the orchestration system for running Common Workflow
  Language workflows. It is designed to maintain data provenance and
  workflow reproducibility. Crunch automatically tracks data inputs and
  outputs through Keep and executes workflow processes in Docker
  containers. In a cloud environment, Crunch optimizes costs by scaling
  compute on demand.
- Workbench: The Workbench web application allows users to interactively
  access Arvados functionality. It is especially helpful for querying and
  browsing data, visualizing provenance, and tracking the progress of
  workflows.
- Command Line tools: The command line interface (CLI) gives convenient
  access to Arvados functionality from a terminal.
- API and SDKs: Arvados is designed to be integrated with existing
  infrastructure. All the services in Arvados are accessed through a
  RESTful API. SDKs are available for Python, Go, R, Perl, Ruby, and Java.
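
As an illustration of the content addressing that Keep relies on, the following is a minimal sketch of a content-addressed store in Python. It is not the Keep implementation: the class `ContentAddressedStore`, its methods, and the in-memory dictionary are hypothetical, and the locator format (an MD5 digest plus the byte length) is an assumption chosen for illustration. The point is only that when the address is derived from the content, every read can be checked against its own address.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store (hypothetical; not the Keep implementation)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # data always maps to the same locator and is stored only once.
        locator = f"{hashlib.md5(data).hexdigest()}+{len(data)}"
        self._blocks[locator] = data
        return locator

    def get(self, locator: str) -> bytes:
        data = self._blocks[locator]
        digest, size = locator.split("+")[:2]
        # Re-hash on every read: a corrupted block no longer matches
        # the address it was stored under.
        if hashlib.md5(data).hexdigest() != digest or len(data) != int(size):
            raise ValueError("block failed verification")
        return data
```

Storing the same bytes twice returns the same locator, which is why data sets can be defined by reference to content rather than by copying it.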
Quick start
To try out Arvados on your local workstation, you can use Arvbox, which
provides Arvados components pre-installed in a Docker container (requires
Docker 1.9+). After cloning the Arvados git repository:
$ cd arvados/tools/arvbox/bin
$ ./arvbox start localdemo
In this mode you can connect to Arvbox only from the same host. To make
Arvbox accessible over a network, and for other options, see
http://doc.arvados.org/install/arvbox.html.