Distributed Computing With JAS

Introduction

This document describes a prototype distributed computing package for use with JAS. The package enables data analysis to be performed on a farm of machines, while using the standard JAS client to create analysis modules, control job execution and view plots.  This can be useful for CPU-intensive analysis, where the CPU power of many machines running in parallel can be exploited, or for data-intensive analysis, where the IO power of each machine can be exploited by distributing the data to local disk on the farm machines.

The package described here works with the standard JAS client and server, version 2.2.4 or later. The distributed computing package consists of three main components:

  1. A catalog server which keeps track of what data is available to each machine in the farm.
  2. A protocol module which can be loaded into the JAS server, to cause it to register itself and its data with the catalog server. 
  3. A control server. This server looks like a normal JAS data server to the JAS client, but uses the catalog server to find out what data is available, and distributes jobs submitted to it across the different machines in the farm. The control server is responsible for making a distributed job running on the farm look like a single job to the client. One particularly important function is to collect histograms from each server and present them to the JAS client as a single (summed) histogram.
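The histogram-summing role of the control server can be sketched as follows. This is a hypothetical illustration, not the actual JAS API: the class and method names are invented, and it assumes every node produces a histogram with identical binning.

```java
// Hypothetical sketch of the control server's histogram merging:
// sum the bin contents of equally-binned histograms returned by
// each farm node into one histogram for the client.
// Names are illustrative, not part of the real JAS API.
public class HistogramMerger {

    /** Sums per-node bin arrays; all nodes are assumed to use the same binning. */
    public static double[] merge(double[][] perNodeBins) {
        double[] summed = new double[perNodeBins[0].length];
        for (double[] bins : perNodeBins) {
            for (int i = 0; i < bins.length; i++) {
                summed[i] += bins[i];  // bin-by-bin accumulation
            }
        }
        return summed;
    }
}
```

Because the sum is computed centrally, the client never needs to know how many nodes contributed to the result.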

The following figure illustrates the different components of the distributed computing package.

Future of distributed computing in JAS - working with the GRID

The current implementation of distributed computing for JAS is a prototype to explore design strategies. In the future, many of the services provided by the GRID infrastructure would fit naturally into an improved implementation of this package. In particular:

Combining the JAS distributed analysis package with the GRID would result in a system in which users could exploit the enormous computing and data-access power of the GRID, while still using an interactive graphical user interface.

Limitations of Current Implementation

Implementation limitations - should be fixed soon

Limitations which require more effort to fix

The current implementation uses the standard JAS communication protocol, Java RMI. This works with about 50 nodes, but will not scale up to 1000 nodes. Instead a scalable system would have to use some type of stateless asynchronous messaging protocol, perhaps JMS.
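The asynchronous-messaging idea can be illustrated with a plain in-process queue. This is only a sketch of the decoupling involved: a real replacement for RMI would use a broker such as JMS, and all names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch of asynchronous dispatch: instead of one synchronous
// RMI call per node, tasks go onto a shared queue and workers drain it
// independently. In-process only; a scalable system would put a message
// broker (e.g. JMS) between submitter and workers.
public class AsyncDispatch {

    /** Dispatches tasks to 'workers' worker threads via a shared queue. */
    public static List<String> dispatch(List<String> tasks, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(tasks);
        List<String> done = Collections.synchronizedList(new ArrayList<>());
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                String task;
                while ((task = queue.poll()) != null) {
                    done.add("done:" + task);  // stand-in for real analysis work
                }
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return done;
    }
}
```

The submitter never blocks waiting for a particular node; it only waits for the queue to drain, which is the property that lets the node count grow.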

The current implementation relies on all nodes working correctly. In particular, it will hang if a machine fails to respond to a request, and will abort an operation if a single machine throws an exception. A scalable system would have to be resilient to network or machine failures, and would preferably be able to compensate for them (for example by resubmitting the task that was assigned to the failed machine to a new node).
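The resubmission strategy described above can be sketched as follows. The scheduler, node interface, and failure model are all hypothetical, invented purely to illustrate the idea of retrying a failed task on another node instead of aborting the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of failure-tolerant scheduling: if a node throws
// while processing a task, the task is resubmitted to the next node
// rather than aborting the whole job. Not part of the real JAS package.
public class ResilientScheduler {

    /** Runs each task on some node, retrying failed tasks on the remaining nodes. */
    public static List<String> run(List<String> tasks,
                                   List<Function<String, String>> nodes) {
        List<String> results = new ArrayList<>();
        for (String task : tasks) {
            String result = null;
            for (Function<String, String> node : nodes) {
                try {
                    result = node.apply(task);  // may throw if the node fails
                    break;                      // success: stop retrying
                } catch (RuntimeException failure) {
                    // node failed: fall through and resubmit to the next node
                }
            }
            if (result == null) {
                throw new IllegalStateException("all nodes failed for " + task);
            }
            results.add(result);
        }
        return results;
    }
}
```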

Using the Distributed Computing Package

Setting Things Up

You need the following items to run the distributed computing package:

Since the code is written entirely in Java, it can run on any type of machine (or even a farm of heterogeneous machines). The following instructions use Windows syntax; adapt them as necessary if you are using a different platform.

Define the following environment variables

Set your PATH and CLASSPATH environment variables as follows:

Setting up rmiregistry

rmiregistry must be run on all nodes which will be used as servers, including the node that will run the catalog server, the node that will run the control server, and any nodes that will run JAS servers. Note that CLASSPATH should not be set when running rmiregistry.

set CLASSPATH=
rmiregistry

Setting up the catalog server

The catalog server must be started before starting any JAS servers. The catalog server can be run on any node.

java catalog.implementation.CatalogServer
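Conceptually, the catalog server's core bookkeeping is a map from dataset names to the farm nodes holding segments of that dataset. The following sketch is illustrative only; the class and method names are invented and do not reflect the actual catalog.implementation API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the catalog's bookkeeping: which farm nodes
// hold segments of which dataset. Names are illustrative, not the
// actual catalog.implementation API.
public class Catalog {

    private final Map<String, Set<String>> datasetHosts = new HashMap<>();

    /** Called when a JAS server registers a dataset it can serve. */
    public void register(String dataset, String host) {
        datasetHosts.computeIfAbsent(dataset, k -> new TreeSet<>()).add(host);
    }

    /** Returns the nodes that hold segments of the given dataset. */
    public Set<String> hostsFor(String dataset) {
        return datasetHosts.getOrDefault(dataset, Collections.emptySet());
    }
}
```

The control server would consult this mapping to decide which machines a job should be distributed to.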

Setting up the control server

The control server can be run on any node.

java CSCServer <catalog-host>

where <catalog-host> is the name of the node that is running the catalog server. Note that the control server appears to the JAS client exactly like a normal JAS data server, except that it registers itself with the rmiregistry under the name CSCServer instead of JDSServer. This makes it possible to run a normal JAS data server and a control server on the same node.
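The reason the two servers can coexist is that an RMI client locates a service by a URL built from the host name and the registered service name, so one rmiregistry can hold both names. A minimal sketch of that URL construction (the class name here is invented for illustration):

```java
// Illustrative only: how an RMI lookup URL is formed from host and
// service name. One rmiregistry can hold both "JDSServer" and
// "CSCServer" on the same host because the names differ.
public class ServiceUrl {

    /** Builds the rmi:// URL a client would pass to java.rmi.Naming.lookup. */
    public static String rmiUrl(String host, String serviceName) {
        return "rmi://" + host + "/" + serviceName;
    }
}
```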

Setting up the JAS data servers

Follow the normal instructions for setting up and running a JAS Data Server, except that you add the following line at the end of the config file:

Protocol protocol.CatalogProtocol <catalog-host>

This will cause the server to register itself and its data with the catalog server. The catalog server makes the following assumptions:

A specific example is also available.

Starting up the JAS client and running a distributed job

  1. Start the JAS client
  2. Select Job, New Job
  3. Choose a Remote Job for Data Analysis, Next
  4. Enter the name of the host on which your control server is running. Click Advanced Connection Options, and set the service name to CSCServer (instead of JDSServer). Note that JAS always resets the service name to its default when you start a new job, so you must explicitly set the service name each time you connect to a control server. Click OK, Next.
  5. Choose the dataset you want to analyze. If the dataset is composed of different segments of data on different nodes the catalog server will have inserted an asterisk into the name of the dataset.
  6. Click Finish.

Once you have connected to the control server, the JAS client should function the same as when you are running a local job or a normal client-server job, subject to the limitations described earlier. Here is a specific example.