This site presents work from the KnowARC EU project that allows the bioinformatics workflow suite Taverna to access computational grids. To get started quickly, watch the video guide.
Two different approaches to accessing the grid have been implemented, each with its own advantages and drawbacks. Both draw on the same repository of use case descriptions.
The implementation targets the ARC (Advanced Resource Connector) grid middleware of NorduGrid. The interface comes as a plug-in for Taverna 1.6, 1.7 or 2.0 and runs without further configuration on every operating system that runs Java.
The concepts behind the implementation should be transferable to other grid middlewares. This page first introduces the repository of use cases. If your workflow environment features a functional equivalent to this implementation, you might consider using this interface to add grid computing to your workflows.
Use cases formally describe
In Taverna these show up as ordinary workflow elements. This eases communication with the user of the workflow suite, since commands need not be typed manually, which is a major motivation for using a workflow suite in the first place. Otherwise, a text-based interface would be required. The repository can be accessed at http://taverna.nordugrid.org/sharedRepository/index.php.
The grid middleware underlying this effort is ARC (Advanced Resource Connector), a mature technology that holds together larger efforts such as NorduGrid.
Taverna was originally designed to integrate and interoperate with web services. The first approach uses a web service to mediate access to grid resources. In the second, a plug-in for Taverna accesses the grid directly.
For the web service-based approach the gateway service acts as a mediator. It reads the use case repository and automatically spawns web services for each use case. These web services are in turn used by Taverna and presented to the user in the ordinary way. The implementation of the gateway service is based on Tomcat.
In Taverna, the sources available to workflows are collected by multiple Scavenger modules. A newly developed Scavenger queries the workflow element database and presents the use cases described there as if they were regular web services.
To submit to these instances, a new Processor is provided. It deals with the grid certificate, submits jobs and transfers results. A particular novelty compared with the web service-based approach, besides the dramatically reduced complexity of the overall setup, is that no data is transferred from the grid to the local machine unless that is required. For example, when two jobs are executed consecutively, the second can read the output of the first directly. Also, multiple jobs may refer to a common data source, e.g., a database, that is stored redundantly on the grid.
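The data-locality idea can be illustrated with a small model. The Python sketch below is not the plug-in's code: the Job class, its fields, the two helper functions and the job names are hypothetical, chosen only to show which intermediate results never need to leave the grid when one job reads another's output directly.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical model of a grid job in a workflow (not the plug-in's API)."""
    name: str
    inputs: list = field(default_factory=list)  # names of jobs whose output is read
    user_wants_output: bool = False              # True if the result is needed locally


def outputs_to_download(jobs):
    """Names of jobs whose output must be copied to the local machine."""
    return [job.name for job in jobs if job.user_wants_output]


def outputs_kept_on_grid(jobs):
    """Names of jobs whose output is read by another job but never downloaded."""
    consumed = {source for job in jobs for source in job.inputs}
    return [job.name for job in jobs
            if job.name in consumed and not job.user_wants_output]


# Two consecutive jobs: only the final result is wanted on the local machine.
workflow = [
    Job("align"),
    Job("build_tree", inputs=["align"], user_wants_output=True),
]
```

In this model the output of the first job is consumed on the grid by the second job, and only the second job's result is scheduled for download.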
For most users the plug-in to Taverna will be the preferred means to access the grid.
The mediator web service needs a regular client installation of ARC to access the grid and its information system. It installs as a regular service under Tomcat, and its address must then be given to Taverna. Unless you need the service to be independent of the Taverna version in use, regular users are advised to install only the Taverna plug-in.
The latest version of the Taverna plug-in is available via Taverna's plug-in manager. To use it, add http://taverna.nordugrid.org as a new repository; Taverna will then offer an additional plug-in named ARC Use Cases. It implements a Processor (for computations on the grid and certificate handling) and a Scavenger (retrieval of use cases).
The plug-in needs a grid proxy certificate. It can be created as usual with one of the commands grid-proxy-init or voms-proxy-init. These commands store the proxy in the file /tmp/x509up_u${UID}, where ${UID} is the numerical user id. This file can be used directly by the plug-in.
The plug-in must be able to verify the grid site's certificate. For this it needs the certificate of the corresponding CA. Certificates stored in ${HOME}/.globus/certificates are picked up automatically. For other ways of making them available to the plug-in please refer to the documentation of the Java CoG Kit.
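Before starting Taverna it can help to check that the proxy file and the CA certificate directory are where the plug-in expects them. The following is a minimal Python sketch, not part of the plug-in. The function names are hypothetical, and it assumes a Unix-like system where os.getuid() is available.

```python
import os
from pathlib import Path


def default_proxy_path(uid=None):
    """Return the proxy location /tmp/x509up_u<UID> for the given user id."""
    if uid is None:
        uid = os.getuid()  # numerical user id; Unix-like systems only
    return Path("/tmp") / f"x509up_u{uid}"


def ca_certificate_dir(home=None):
    """Return <HOME>/.globus/certificates for the given home directory."""
    home = Path(home) if home is not None else Path.home()
    return home / ".globus" / "certificates"


def list_ca_files(cert_dir):
    """Return the sorted file names in the CA directory, or [] if it is missing."""
    if not cert_dir.is_dir():
        return []
    return sorted(p.name for p in cert_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    proxy = default_proxy_path()
    if proxy.is_file():
        print(f"Proxy found at {proxy}")
    else:
        print(f"No proxy at {proxy}; run grid-proxy-init or voms-proxy-init first")
    ca_files = list_ca_files(ca_certificate_dir())
    print(f"{len(ca_files)} file(s) in {ca_certificate_dir()}")
```

The check only looks at file locations. It does not validate the proxy's lifetime or the certificates themselves.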
The source of the plug-in is available as a zip file. The source package of the gateway service is still being prepared and will soon be offered here under the terms of the GNU General Public License.
The newest version of the source is always available in the NorduGrid public Subversion repository at http://svn.nordugrid.org/trac/workarea/browser/T2.6. To compile, please use Maven 2. First compile and deploy knowarc-usecases, which is a Taverna-independent pure Java 1.5 client for ARC0 and ARC1. Then you can compile the actual Taverna 2 plug-in, which is called usecase-activity. The old Taverna 1 plug-in is called janitor-taverna-processor; although still completely functional, it is no longer supported.
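The build order can be written down as a short script. This Python sketch rests on assumptions that the text does not confirm: that each module lives in a directory of the same name below the checkout root, and that the Maven goal install (which places the built artifact in the local Maven repository) is sufficient for the second module to find the first. The function names are hypothetical.

```python
import os
import subprocess

# Build order from the text: the ARC client library first, then the Taverna 2 plug-in.
MODULES = ["knowarc-usecases", "usecase-activity"]


def build_commands(modules=MODULES, goal="install"):
    """Return (module_directory, command) pairs in the order they should run."""
    return [(module, ["mvn", goal]) for module in modules]


def run_build(checkout_root="."):
    """Run Maven in each module directory; stop at the first failure."""
    for module, command in build_commands():
        # Assumption: the module directory is named after the module.
        subprocess.run(command, cwd=os.path.join(checkout_root, module), check=True)
```

Calling run_build() from the root of a checkout would build both modules in sequence. build_commands() alone only lists what would be run.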
Taverna is under continuous development. The additions described here work with Taverna versions 1.6.2, 1.7.0 and 1.7.1. The latter two were already prepared for the very different internal structure of Taverna 2.0. The very latest version is compatible with Taverna 2.0.
The emphasis is now on bringing this new technology into practice. The target community is less the grid community than application specialists in bioinformatics. New workflows that serve local needs are therefore being prepared, and a collaboration with the developers of myExperiment is sought to help communicate them.
H. Krabbenhöft, S. Möller, D. Bayer (2008) Integrating ARC Grid Middleware with Taverna Workflows. Bioinformatics, 24(9):1221-1222.
Dr. Steffen Möller and Hajo Krabbenhöft
University of Lübeck
Institute for Neuro- and Bioinformatics
Ratzeburger Allee 160
23538 Lübeck
Germany
T: +49 451 500 5504
F: +49 451 500 5502
E: {moeller,krabbenh} at inb.uni-luebeck.de