At the CERN School of Computing in Marathon there were presentations given on JAS (presentation), Root and LHC++/Lizard. Students were given the opportunity to ask questions to compare the different systems. The answers to these questions for JAS are given below.
Provide summary information for each
product stating the basic philosophy or approach to solving the
analysis/reconstruction problem
Leverage the power of Java as much
as possible because:
It is a highly productive language (no time wasted debugging core dumps).
Age of product (how long has it been
in development)
4 years (since Hepvis
96)
Platforms supported
Windows
(95/98/NT/2000), Linux, Solaris, or
any platform with a Java VM.
Number of components and total
number of lines of code produced.
As of JAS 2.2.1 -- 53894 lines of
Java code. Note that this is less than 10% of the number of lines in Root, not
because JAS has only 10% of the features of Root (in fact I think it has a
broadly comparable feature set) but because large amounts of Root code
replicate features already directly supported by Java (GUI, IO, reflection).
Over time it is quite likely the number of lines will go down, as we are
better able to use standard tools (e.g. XML instead of custom parsers). (I believe
a significant fraction of Root code deals with IO, yet I estimate, based on our
initial very simple implementation of Root IO in Java, that the entire Root IO package
for reading and writing Root files (including random access, compression, trees,
automatic splits, pointer following, StreamerInfo) can be implemented in
<1000 lines of Java -- you will have to check back later to see if I am
right.)
I'm not quite sure how to enumerate "components", but as I showed in my talk the system is composed of (maybe 20?) highly modular subcomponents which can be used together or independently.
List of external packages used
Binary distributions of JAS are self-contained
and can be run "out-of-the-box" with no requirements (except
Java itself). Internally JAS uses many
other packages, such as JavaHelp, the jEdit editor and XML parsers, but these are
all freely redistributable and are included in the JAS distribution.
Number of FTEs involved in
development
Approximately 2 full-time, plus contributions
from many others via collaborations (with Wired, LCD, Babar, FreeHEP) and via the "open
source" model.
List of experiments where the
product is in active use
CLEO (online monitoring), Babar
(online monitoring), LCD (reconstruction+analysis), µLAN (online monitoring),
CMS and Atlas (test beam work, evaluation), SLD (mini-dst analysis)
Process by which decisions are made and
feedback handled
Currently most decisions have been
made by the developers, with feedback from people using JAS for specific
experiments (particularly LCD, Babar). We have a mailing list and bug report
page and encourage feedback and suggestions from anyone (negative feedback is
very welcome, especially if accompanied by suggestions for improvements).
Interfacing of product with: Experiment
software
Direct interfacing with C++ code is
currently a weak point of Java, so interfacing directly with C++ experiment code
is currently difficult. We expect more and more experiments to adopt Java as the
huge productivity benefits of using Java become more widely appreciated.
Meanwhile, we are attempting to address this issue via the development of tools
such as JACO, and plan to test this in the context of the Atlas event model.
Interfacing with experiment software in Java (such as LCD) or via some
intermediate storage format (e.g. PAW, Objectivity, ROOT) is comparatively
straightforward.
Common HEP software packages
(including G4)
We have interfaces with Root,
Objectivity, PAW, WIRED, G4, StdHEP, and AIDA. Due to the simple "plugin"
mechanism we expect to develop many more.
Existence within GRID context.
JAS has been designed from the
outset to run in a "client-server" mode, and to support distributed
data analysis. There are Java bindings to many
of the GRID components (e.g. GLOBUS) and we expect that features of the GRID such
as global authentication will be easy to interface to JAS. We believe that the
model of moving the code to the data (rather than vice-versa) is most applicable
to HEP data, and think Java is the best language for exploiting this due to its
high performance and built-in network and code portability features.
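As a rough illustration of what "moving the code to the data" means, here is a minimal in-process sketch. It involves no networking and does not use any JAS class: `DataServer` and `submit` are hypothetical names invented for this example, and a lambda stands in for user analysis code that would in reality be shipped to a remote server.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.stream.DoubleStream;

public class CodeToDataSketch {

    /** Hypothetical stand-in for a server process that sits next to the data. */
    static final class DataServer {
        private final double[] events;

        DataServer(double[] events) {
            this.events = events.clone();
        }

        /**
         * Runs the submitted analysis where the data lives. The caller only
         * ever receives the analysis result, never the event array itself.
         */
        <R> R submit(Function<DoubleStream, R> analysis) {
            return analysis.apply(Arrays.stream(events));
        }
    }

    public static void main(String[] args) {
        DataServer server = new DataServer(new double[] {0.5, 1.5, 2.5, 3.5});

        // Each lambda plays the role of user analysis code sent to the server.
        Long above = server.submit(s -> s.filter(x -> x > 1.0).count());
        Double mean = server.submit(s -> s.average().orElse(Double.NaN));

        System.out.println("events above 1.0: " + above);
        System.out.println("mean: " + mean);
    }
}
```

The point of the design is that only a small summary (a count, a mean, a histogram) crosses the boundary, while the bulk data stays where it is stored.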
Alternative products (e.g. JAS &
ROOT)
The Data Interface Model and Plugin
architecture used by JAS means that unlike other systems it is not tied to a
particular data format and is easily able to inter-operate with other tools.
Alternative components (for GUI,
data storage, fitting etc)
We have designed JAS so you can take
individual components out and use them alone (many people are using our plot
bean by itself, or in the
context of Java servlets). Using C++ components directly in a Java GUI is not
easy. We plan to add an interface to the LHC++ Gemini fitter, which will allow
fitting using either Minuit or NAG fitters.
Explain product capacity for scaling
in the following areas: more concurrent/distributed users access same dataset,
access and processing of very large datasets
To some extent, since the data access
is "external" to JAS, this question is not directly relevant. We
believe the JAS data access model will scale to very large datasets, but this
has not been tested extensively to date.
Use of scripting language with
product
Initially JAS was designed to be
operated by the GUI, and by writing Java programs. We have received many
requests for scripting capabilities and have therefore started to
implement this. We demonstrated the use of BeanShell with JAS during the
CSC talk on JAS. Many other scripting languages for Java are available,
including JPython - a complete implementation of Python in Java. Interfacing
these to JAS is almost trivial. Although I rarely remember to talk about it, JAS
histogramming can be used in a "batch" mode - with no GUI, and
analysis can be written in Java, or in any Java scripting language.
Future directions of development:
new features, major changes, short-term and long-term.
Near term we expect to:
Longer term:
To what extent does Xxx allow a user
or experiment to choose their scripting language (e.g. Java, Python, CINT, etc)?
Can an experiment choose more than one?
First, Java is NOT a scripting
language. Scripting languages are designed differently from compiled languages
such as Java, C++ and Fortran, and to use a compiled language as a scripting
language or vice-versa would be unwise. Having said that, Java does exhibit some
of the advantages sometimes associated with scripting languages, such as a very
fast compile, load, run cycle (especially when using dynamic loading to load
only your analysis routines, as in JAS).
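To make the idea of dynamic loading concrete, here is a minimal sketch using only standard Java reflection (`Class.forName`). It is not how JAS itself is implemented: the `Analyzer` interface, the `MeanAnalyzer` class and the `load` helper are hypothetical names invented for this example. In a real setup the user class would be compiled separately and only its name handed to the host program.

```java
public class DynamicLoadSketch {

    /** Hypothetical contract between a host application and user analysis code. */
    public interface Analyzer {
        void processEvent(double value);
        String summary();
    }

    /** Example user routine; in practice this would be compiled separately. */
    public static final class MeanAnalyzer implements Analyzer {
        private double sum;
        private long count;

        public MeanAnalyzer() { }

        @Override
        public void processEvent(double value) {
            sum += value;
            count++;
        }

        @Override
        public String summary() {
            return count == 0 ? "no events" : "mean=" + (sum / count);
        }
    }

    /** Loads an Analyzer by class name at run time using reflection. */
    static Analyzer load(String className) {
        try {
            Class<? extends Analyzer> type =
                    Class.forName(className).asSubclass(Analyzer.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class to run is chosen by name, so the host needs no recompilation.
        String name = args.length > 0 ? args[0] : MeanAnalyzer.class.getName();
        Analyzer analyzer = load(name);
        for (double v : new double[] {1.0, 2.0, 3.0}) {
            analyzer.processEvent(v);
        }
        System.out.println(analyzer.summary());
    }
}
```

Because only the small analysis class is recompiled and reloaded, the edit-and-rerun cycle stays short even when the host application is large.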
We are currently adding support for scripting languages to JAS; we demoed
BeanShell as a scripting language during the talk. There are many other
scripting languages available for Java, including JPython, a complete and very
fast implementation of Python in Java. Any Java scripting language can be very
easily used with JAS (or any Java program). There is no technical reason why an
experiment should not use more than one.
How does Xxx work with non-native data
storage? If an experiment defines its own storage system, can Xxx use it?
Also, can ROOT/JAS work with HepODBMS/Objectivity? Can JAS/LHC++ work with
ROOT files? ("Work" may not be the right word here - perhaps something
like "What capabilities are lost when using .... data" is a better
phrasing?)
JAS does not have a "native"
data format; it can work with any data format for which a DIM exists. DIMs
already exist for PAW, ROOT and Objectivity and many other formats, and it is
fairly easy to create new DIMs for experiment-specific data.
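The general idea of keeping analysis code independent of the storage format can be sketched as follows. This is not the real DIM API: `TupleSource`, `CsvTupleSource`, `ArrayTupleSource` and `mean` are hypothetical names invented for this example, and the two toy formats only stand in for real ones such as PAW or ROOT files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataSourceSketch {

    /** Hypothetical format-neutral view of tabular data; not the JAS DIM API. */
    interface TupleSource {
        List<String> columnNames();
        int rowCount();
        double value(int row, String column);
    }

    /** Adapter for data that arrives as comma-separated text with a header line. */
    static final class CsvTupleSource implements TupleSource {
        private final List<String> columns;
        private final List<double[]> rows = new ArrayList<>();

        CsvTupleSource(String text) {
            String[] lines = text.trim().split("\\R");
            columns = Arrays.asList(lines[0].trim().split("\\s*,\\s*"));
            for (int i = 1; i < lines.length; i++) {
                String[] cells = lines[i].trim().split("\\s*,\\s*");
                double[] row = new double[columns.size()];
                for (int c = 0; c < row.length; c++) {
                    row[c] = Double.parseDouble(cells[c]);
                }
                rows.add(row);
            }
        }

        @Override public List<String> columnNames() { return columns; }
        @Override public int rowCount() { return rows.size(); }
        @Override public double value(int row, String column) {
            return rows.get(row)[indexOf(columns, column)];
        }
    }

    /** Adapter for data that is already in memory, one array per column. */
    static final class ArrayTupleSource implements TupleSource {
        private final List<String> columns;
        private final double[][] data;

        ArrayTupleSource(String[] names, double[][] columnData) {
            this.columns = Arrays.asList(names.clone());
            this.data = columnData;
        }

        @Override public List<String> columnNames() { return columns; }
        @Override public int rowCount() { return data.length == 0 ? 0 : data[0].length; }
        @Override public double value(int row, String column) {
            return data[indexOf(columns, column)][row];
        }
    }

    private static int indexOf(List<String> columns, String column) {
        int index = columns.indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        return index;
    }

    /** Analysis code sees only TupleSource, so it works with either adapter. */
    static double mean(TupleSource source, String column) {
        double sum = 0.0;
        for (int row = 0; row < source.rowCount(); row++) {
            sum += source.value(row, column);
        }
        return source.rowCount() == 0 ? Double.NaN : sum / source.rowCount();
    }

    public static void main(String[] args) {
        TupleSource csv = new CsvTupleSource("pt,eta\n1.0,0.1\n3.0,0.3\n");
        TupleSource arrays = new ArrayTupleSource(
                new String[] {"pt", "eta"},
                new double[][] {{1.0, 3.0}, {0.1, 0.3}});
        System.out.println(mean(csv, "pt") + " " + mean(arrays, "pt"));
    }
}
```

Supporting a new format then means writing one more adapter, while `mean` and any other analysis code stay unchanged.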
The more detailed question is harder to answer; the specifics depend mainly
on how completely the DIM has been implemented. For example the current
Objectivity DIM is only able to read HEPTuple data from Objectivity databases.
Objectivity does have a Java binding, so writing a more fully functional
interface is possible, although there are some complications arising when
attempting to read data initially stored into Objectivity from C++,
especially if no thought was given to Java access up front.
What will need to be developed in Xxx
to handle the expected size of LHC data analysis? What are the current strengths
and weaknesses of Xxx for storing very large amounts of data?
The strengths of JAS are in its ability
to adapt to whatever data format is eventually decided upon, and to support
access to very large datasets using its distributed client-server mode. There
are some weaknesses in the current java.io package when dealing with large
amounts of binary data, but these will be addressed by the addition of a new
java.nio package in the next release of Java (JDK 1.4 scheduled for release next
summer), after which there is no reason to expect Java IO will be any less
efficient than C++ IO.
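For readers who have not seen the java.nio style of bulk binary IO, here is a minimal sketch. It assumes a flat file of 8-byte big-endian doubles, which is an assumption made for this example and not any real experiment format, and it is written against a current JDK rather than the JDK 1.4 preview. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;

public class NioReadSketch {

    /** Writes the values as consecutive 8-byte big-endian doubles. */
    static void writeDoubles(File file, double[] values) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Double.BYTES);
        // The double view shares storage with 'bytes' but has its own position.
        bytes.asDoubleBuffer().put(values);
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            channel.truncate(0);
            while (bytes.hasRemaining()) {
                channel.write(bytes);
            }
        }
    }

    /** Reads the whole file in bulk and reinterprets it as doubles. */
    static double[] readDoubles(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer bytes = ByteBuffer.allocate((int) channel.size());
            while (bytes.hasRemaining() && channel.read(bytes) >= 0) {
                // keep reading until the buffer is full or end of file
            }
            bytes.flip();
            DoubleBuffer doubles = bytes.asDoubleBuffer();
            double[] values = new double[doubles.remaining()];
            doubles.get(values);
            return values;
        }
    }

    /** Convenience wrapper: write to a temporary file and read it back. */
    static double[] roundTrip(double[] values) {
        try {
            File file = File.createTempFile("nio-sketch", ".bin");
            try {
                writeDoubles(file, values);
                return readDoubles(file);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        double[] back = roundTrip(new double[] {1.5, -2.25, 3.0e10});
        System.out.println(java.util.Arrays.toString(back));
    }
}
```

The difference from stream-based java.io code is that whole blocks of bytes move between the channel and the buffer in one call, and the conversion to doubles happens through a typed view rather than one value at a time.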
How does Xxx work with external
software such as GEANT4? GEANT3? What can you do and not do via Xxx?
We demonstrated the use of JAS with
Geant4 during the workshop. The Geant4 collaboration is considering a proposal
to adopt the AIDA interface as a standard interface to histogramming in Geant4,
meaning that it will be easy for Geant4 to interact with any AIDA compliant
analysis tool.
If an experiment has an existing software package, how do you interface
it, and how much its capability will be available via Xxx?
In principle, using a combination of
plugins and DIMs you should be able to interface any experiment to JAS. In
practice it depends how "Java Friendly" the experiment is (extensive
use of C++ features such as templates tends to make it more difficult). Well
designed, modular experiment software also helps. The person who builds the JAS
interface will need to learn a fair bit about Java and JAS, but once that is
done it should be easy for other collaborators to use the interface.
How does Xxx utilize large parallel
farms for computation?
The "Client-server" model in
JAS was designed to support distributed computing. It has not yet been tested
with very large datasets on large farms, but that will hopefully be done in the
coming year.
If I want to make an improvement in Xxx,
how do I go about it?
JAS is an "Open Source" project. All of the source code is easily
available and we use a Java version of make which allows you to build the system
yourself, in the same way on any platform, using the simple instructions on our
web site. (Building JAS from scratch takes less than one minute, and much less
if you only need to recompile files you have changed.) Any changes or additions
you make are likely to be happily accepted back into the project.
In addition you can often extend JAS without having to learn the internals of the program by writing a plugin which adds the extra functionality you require.
Could you show the min. COMPILED
program to:
The program is given below. Note
that JAS supports many different histogramming algorithms, but to make this test
as comparable to Root and LHC++ as possible I have chosen to use HBOOK style
"fixed" binning in this example. The question did not state how many
bins should be used for the histograms, so the size of the histogram file is
somewhat arbitrary (I took the JAS default of 50 bins for fixed-binned
histograms).
The size of the file was 12667 bytes, and the program took 891 ms to execute on my 300MHz PII laptop (probably dominated by startup time).
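The following is not the JAS program referred to above and does not use the JAS histogram classes. It is only a minimal stand-alone sketch of what HBOOK style "fixed" binning with 50 bins amounts to. The `Histogram1D` class, its methods, the axis range and the Gaussian test data are all hypothetical choices made for this illustration.

```java
import java.util.Random;

public class FixedBinningSketch {

    /** Hypothetical fixed-bin 1D histogram; not a JAS class. */
    static final class Histogram1D {
        private final double min;
        private final double max;
        private final long[] bins;
        private long underflow;
        private long overflow;

        Histogram1D(int nBins, double min, double max) {
            if (nBins <= 0 || !(max > min)) {
                throw new IllegalArgumentException("need nBins > 0 and max > min");
            }
            this.min = min;
            this.max = max;
            this.bins = new long[nBins];
        }

        /** Adds one entry; values outside [min, max) go to under/overflow. */
        void fill(double x) {
            if (Double.isNaN(x)) {
                return; // NaN has no meaningful bin, so it is ignored
            }
            if (x < min) {
                underflow++;
            } else if (x >= max) {
                overflow++;
            } else {
                int i = (int) ((x - min) / (max - min) * bins.length);
                // Guard against rounding pushing a value just below max past the last bin.
                bins[Math.min(i, bins.length - 1)]++;
            }
        }

        long binContent(int i) { return bins[i]; }
        long underflow() { return underflow; }
        long overflow() { return overflow; }

        long entriesInRange() {
            long sum = 0;
            for (long b : bins) {
                sum += b;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        // 50 equal-width bins chosen up front, independent of the data.
        Histogram1D h = new Histogram1D(50, -5.0, 5.0);
        Random random = new Random(42L);
        for (int i = 0; i < 10_000; i++) {
            h.fill(random.nextGaussian());
        }
        System.out.println("in range: " + h.entriesInRange()
                + ", underflow: " + h.underflow()
                + ", overflow: " + h.overflow());
    }
}
```

With fixed binning the bin edges are decided before any data is seen, which is why the number of bins directly determines the size of the stored histogram.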
Could you show the minimum INTERACTIVE program to:
This is very easy to do with the JAS interactive GUI: just open the javahist file,
create a new 1x2 plot page, display the desired histograms, and perform the fits.
Currently JAS supports saving plots in XML or GIF format, but the next release
of JAS will also support Encapsulated PostScript and other vector graphics
formats (using the FreeHEP org.freehep.graphics2d
package). You can generate a PS file today by using "Print" and
then selecting "Print to File".
As explained above, support for scripting languages such as BeanShell is just being added to JAS and is not yet complete; in particular, fitting is not yet (easily) usable from the scripting language. This will be fixed very soon and you will then be able to use the following script.