Root IO for Java
Using JAS to Analyze Root Data
Tony Johnson
July 2001

Contents
Root IO for Java
Goals and Motivation
Using the Package
Demo apps
Accessing Root Files from your own Java programs
Possible uses
Using JAS to analyze Root Data
Future enhancements to JAS
Root IO Implementation
Current Implementation
Limitations of current implementation
Future Plans for ROOT IO
Conclusions

Root IO
A Pure Java Implementation
Part of the FreeHEP Java Library

Goals
Provide a pure-Java package for reading Root files.
Could be extended to writing later
Should work with any Root file
Should not need to know about objects ahead of reading (no need for DLLs or .so files)
Provides access to data from Root file, not methods of C++ objects stored in file.
Suitable for data-centric objects
Mini DST’s, NTuples, Raw Data
User can provide own implementation of object
Easy to use
Efficient (at least in later implementations)
Goal is to be as efficient as native Root IO

Why Bother – Root already exists?
Philosophical Reasons
If we are committing a large amount of HEP data to Root, it is good to know it can be read back even without Root
Java package is currently only 5000 lines of code.
We need it for JAS, Wired etc.
Wrapping C++ or Fortran code with a Java interface is a bigger overhead (cf. PAW, StdHEP).
Brings back all the problems we escaped by using Java
Porting issues (e.g. MacOS)
Crashes
Java Applets etc.
Security considerations may not allow native code

Demo: Root Object Browser

Demo: Root Histogram Browser

Interface Builder
Package can read Root file with no a-priori knowledge of contents
Great for systems which use scripting or reflection to get information about objects read.
To compile code against user-defined objects in a Root file, you must use InterfaceBuilder
java hep.io.root.util.InterfaceBuilder <rootfile>
Builds Java interfaces for all user-defined objects found in file
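
One way to picture how an interface with typed getters can sit on top of objects read without a-priori knowledge is the sketch below. It is only an illustration: the `Track` interface, its members, and the `view` helper are hypothetical and are not output of InterfaceBuilder or part of hep.io.root. It uses the standard `java.lang.reflect.Proxy` to back each getter with a value held in a map.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from hep.io.root.
final class InterfaceOverMapDemo {

    // Stand-in for a generated interface: one getter per data member.
    interface Track {
        double getPx();
        int getCharge();
    }

    // Returns an object implementing 'type' whose getXxx() methods
    // read the member named "Xxx" from the map.
    static <T> T view(Class<T> type, Map<String, Object> members) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.startsWith("get") && members.containsKey(name.substring(3))) {
                return members.get(name.substring(3));
            }
            throw new UnsupportedOperationException(name);
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    public static void main(String[] args) {
        Map<String, Object> members = new HashMap<>();
        members.put("Px", 1.5);
        members.put("Charge", -1);
        Track track = view(Track.class, members);
        System.out.println(track.getPx() + " " + track.getCharge());
    }
}
```

User code compiles against the interface only, so the object behind it can be changed without touching that code.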

Example of Generated Interface

Example of Reading Root File in Java

Possible Uses
Java based online monitoring
Java based Event Display
e.g. WIRED, an experiment-independent event display toolkit written in Java
Web based histogram browser
Applet based (Java runs in browser)
Servlet based (Java runs in server)
Java based data analysis (e.g. JAS)
Script based data analysis
Jython, BeanShell, DynamicJava, etc.

Using JAS to Analyze Root Files
Brief Overview of JAS
Why use JAS for Root Analysis
Analyzing Root file with JAS
Demo
Plugin Manager – download Root extensions
Opening a Root File
Using the Object Browser
Creating and Filling Histograms

Introduction to JAS
Pure Java Analysis Environment
Data Format Independent
Modular/Extensible via Plugins/Data Interface Modules
Rich, easy-to-use GUI
Built in editor/compiler for writing analysis code
Local and Client-Server Operation
Originally targeted at offline analysis – but also used extensively for online monitoring
Written entirely in Java

JAS GUI

JAS Plotter

JAS Editor/Compiler

Extensible via Plugins
Plugins can:
Define experiment specific utilities (event display, analysis utilities, specialized tables).
Define data interfaces to handle new types of data.
Define new plotting routines (e.g. for specialized displays).
Add menus, create control areas, consoles, and output pages.
Plugins will be more flexible in JAS 3.0 (see discussion of FreeHEP application framework, later).

Examples of Plugins

JAS+Wired

Data Format Independent
Unlike
Root (requires Root files)
PAW (requires PAW files)
JAS is “data format independent”
Special type of Plugin, a Data Interface Module (DIM), reads data and makes it available for analysis in JAS
DIMs exist for
PAW, StdHEP, FlatFiles, SQL database (JDBC), Objectivity (HepTuples), Root
Several experiment specific data formats
You can write your own DIM for your data format

Remote Data Access
Rather than transporting petabytes of data to the physicist
Transport the physics analysis code to the data
Transparently - so that it feels just like local data access
Just ship histogram contents back to the physicist's desktop (on demand)
Allows remote analysis with modest network bandwidth
Allows user to “feel” as if using local machine even when accessing remote data.
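
The idea of moving the analysis to the data and returning only histogram contents can be sketched as below. This is an illustration under stated assumptions, not the JAS client-server protocol: `fillNearData` is a hypothetical method that would run next to the data, events are reduced to plain doubles, and only the small array of bin counts would travel back over the network.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch; not the JAS remote-analysis implementation.
final class RemoteHistogramDemo {

    // Runs where the data lives: applies the user's analysis to every event
    // and returns only the bin counts of a fixed-width histogram on [min, max).
    static int[] fillNearData(double[] events, DoubleUnaryOperator analysis,
                              int bins, double min, double max) {
        int[] counts = new int[bins];
        double width = (max - min) / bins;
        for (double event : events) {
            double x = analysis.applyAsDouble(event);
            if (x >= min && x < max) {
                // Clamp to guard against rounding at the upper edge.
                int bin = Math.min(bins - 1, (int) ((x - min) / width));
                counts[bin]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] events = {0.5, 1.5, 1.6, 9.0};
        int[] counts = fillNearData(events, x -> x, 2, 0.0, 2.0);
        System.out.println(Arrays.toString(counts));
    }
}
```

The returned array has a fixed size that depends on the binning, not on the number of events, which is why modest bandwidth is enough.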

Why use JAS for Root Analysis?
Root already has great analysis tools!
Why use JAS?
If you (and your users) are 100% happy with Root
No reason to change or try alternatives
Java is a good alternative to C++
Java is simpler to learn and use than C++
Not everyone who wants to do data analysis is a C++ guru – or wants to become one
The robustness of a scripting language
Errors in Java (or Python, etc.) code raise exceptions rather than crashing the program
The performance of a compiled language
JAS is newer than Root, but more plugins are coming

Using JAS for Root Analysis
Demo
See writeup at
http://java.freehep.org/lib/freehep/doc/root/rootjas.shtml

JAS Plans
Current release is 2.2.4
Expect to continue to release 2.2.5 etc with incremental improvements.
More plugins coming:
Neural network plugin
Multivariate analysis
AIDA – Abstract Interfaces for Data Analysis
Also working on JAS 3
Larger overhaul of JAS architecture/functionality
Scripting support (Jython?)
AIDA Histogramming/Ntuples/etc.
Use FreeHEP application framework
JAS and WIRED will be plugins to the framework.
NTuple explorer

Root IO In Java: Implementation
Current Implementation
Limitations of Current Implementation
Future Plans for Root IO

Methodology
Very little documentation exists on Root internals.
Creating the IO package involves reading the Root code and reverse engineering
Many features – a lot of trial and error, need lots of test files
Dual track approach….

Anatomy of a Root File
Root File is a Random Access Object Store
Objects in file can be looked up by “Key”
Key is a String.
Each key can correspond to a hierarchy of linked objects
TTree objects are special
Can contain multiple branches
Each branch contains
More branches
A set of objects (e.g. Events, Tracks etc).
TTree objects provide random access to events, and allow reading only a subset of branches for efficiency.
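
The "random access object store" idea can be sketched with a directory that maps each string key to a position in the stored bytes. This is a toy model only: `KeyedStoreDemo`, its directory layout, and the byte array standing in for the file are assumptions made for illustration and do not reflect the actual Root file layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; not the Root file format.
final class KeyedStoreDemo {
    private final byte[] storage;                                   // stands in for the file contents
    private final Map<String, int[]> directory = new HashMap<>();  // key -> {offset, length}

    KeyedStoreDemo(byte[] storage) {
        this.storage = storage;
    }

    void register(String key, int offset, int length) {
        directory.put(key, new int[] {offset, length});
    }

    // Random access: only the bytes belonging to the requested key are copied out.
    byte[] read(String key) {
        int[] where = directory.get(key);
        if (where == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return Arrays.copyOfRange(storage, where[0], where[0] + where[1]);
    }

    public static void main(String[] args) {
        KeyedStoreDemo store =
                new KeyedStoreDemo("histAtreeB".getBytes(StandardCharsets.US_ASCII));
        store.register("hist", 0, 5);
        store.register("tree", 5, 5);
        System.out.println(new String(store.read("tree"), StandardCharsets.US_ASCII));
    }
}
```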

Anatomy of a Root File
Starting with Root 3.0 each file contains a special key “StreamerInfo”
Contains a collection of TStreamerInfo objects, which contain information on the data members of all objects in the file. This allows:
Reading root files without the original code
Reading root files with older versions of objects (schema evolution)
Root files are now self describing
This allows Java program to read files without accessing compiled C++ code.
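
What "self describing" buys a reader can be shown with a small sketch: given a list of member names and types, a generic decoder can turn raw bytes into values without any compiled class for the object. The `Member` descriptor, the two-type `Type` enum, and the flat byte layout are hypothetical simplifications, not the real TStreamerInfo structure or Root encoding.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; far simpler than real TStreamerInfo.
final class SelfDescribingDemo {
    enum Type { INT, DOUBLE }

    // One entry of the layout description: a member name and its type.
    static final class Member {
        final String name;
        final Type type;

        Member(String name, Type type) {
            this.name = name;
            this.type = type;
        }
    }

    // Decodes the members in layout order from the buffer's current position.
    static Map<String, Object> decode(List<Member> layout, ByteBuffer data) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Member member : layout) {
            if (member.type == Type.INT) {
                values.put(member.name, data.getInt());
            } else {
                values.put(member.name, data.getDouble());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Member> layout = Arrays.asList(
                new Member("fCharge", Type.INT), new Member("fPx", Type.DOUBLE));
        ByteBuffer data = ByteBuffer.allocate(12);
        data.putInt(-1).putDouble(1.5);
        data.flip();
        System.out.println(decode(layout, data));
    }
}
```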

Implementation
RootFileReader is used to open file.
Understands how to find keys and StreamerInfo in the file
As objects are read from file
Delegate to RootClassFactory to create objects
Normally use DefaultClassFactory
Can be user provided (or extended)
Each object is responsible for reading its own data

RootClassFactory

Representations and Interfaces
Representations are the internal representation of the Root objects created by the RootClassFactory
GenericRootObject is current Representation
Uses a Hashtable to store data – quite inefficient
Easy to debug and fix bugs, add new functionality.
Different objects are created depending on how object is stored in file
Objects stored in TTrees typically create hollow objects
No data is read from file until it is requested by user
Hence no need to say up-front which branches will be read
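
The combination of a hash-table-backed object and "hollow" objects that read nothing until asked can be sketched as follows. `LazyGenericObject` and its loader function are hypothetical names chosen for this illustration; they are not the GenericRootObject class, and the loader merely stands in for "read this member from the file".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the hep.io.root GenericRootObject.
final class LazyGenericObject {
    private final Map<String, Object> values = new HashMap<>();
    private final Function<String, Object> loader; // stands in for a read from the file
    private int loads = 0;

    LazyGenericObject(Function<String, Object> loader) {
        this.loader = loader;
    }

    // Returns the member value, invoking the loader on first access only.
    Object get(String member) {
        if (!values.containsKey(member)) {
            values.put(member, loader.apply(member));
            loads++;
        }
        return values.get(member);
    }

    // Number of members actually loaded so far.
    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        LazyGenericObject track = new LazyGenericObject(name -> name.length());
        System.out.println(track.get("fPx"));
        System.out.println(track.get("fPx"));
        System.out.println("loads: " + track.loadCount());
    }
}
```

Members that are never requested are never loaded, which is why the caller does not have to declare up front which branches will be read. The map lookup on every access is also where the inefficiency noted above comes from.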

Root Class Factory
The DefaultClassFactory looks in the following places to create classes:
For a specific Java class in the package hep.io.root.reps (a SpecificRootObject).
StreamerInfo in the file being read – used to create a “GenericRootObject”
StreamerInfo in the “bootstrap” file, StreamerInfo.properties
Info in the “typedef.properties” file, used to define the Java mapping for Int_t etc.
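
The ordered search above (first source that knows the class wins) can be sketched as a simple chain. This is not the DefaultClassFactory source: `LookupChain`, the source labels, and the string descriptions are hypothetical, and the typedef step is left out because it maps primitive type names rather than classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an ordered lookup; not the DefaultClassFactory.
final class LookupChain {
    // Insertion order is the search order.
    private final Map<String, Map<String, String>> sources = new LinkedHashMap<>();

    LookupChain add(String sourceName, Map<String, String> knownClasses) {
        sources.put(sourceName, knownClasses);
        return this;
    }

    // Returns "<source> -> <description>" from the first source that knows the class.
    String resolve(String className) {
        for (Map.Entry<String, Map<String, String>> source : sources.entrySet()) {
            String description = source.getValue().get(className);
            if (description != null) {
                return source.getKey() + " -> " + description;
            }
        }
        throw new IllegalArgumentException("No description found for " + className);
    }

    public static void main(String[] args) {
        LookupChain chain = new LookupChain()
                .add("specific class", Collections.singletonMap("TH1", "hand-written"))
                .add("file StreamerInfo", Collections.singletonMap("MyEvent", "generic"))
                .add("bootstrap StreamerInfo", Collections.singletonMap("TObject", "generic"));
        System.out.println(chain.resolve("MyEvent"));
    }
}
```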

Status/Limitations
Currently only supports Root 3.0 or later
Could support earlier files too, but is it worth it?
User-defined objects are supported as long as they have StreamerInfo
Small problem with TTree in Root 3.01, fix coming soon.
Aims to support all Root files, including compression, splits, etc.
No support yet for
Chaining files, TTrees split across files, friend TTrees.
Performance
Adequate for testing, event displays, small datasets
Analysis of large datasets will require more efficient implementation of representations
Need more test cases; it is much easier to debug and add new functionality now rather than later.

Future Plans for Root IO
Dynamically build representations
StreamerInfo → Java bytecode → machine code
Different objects depending on:
How object was stored
Version of object in file (schema evolution)
Expect to have this ready October/November
Use java.nio package in Java 1.4 (due end of year)
Provides more efficient IO for large binary files
Provides support for memory mapped IO
Expect to get very good performance
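
A minimal sketch of memory-mapped reading with `java.nio` is shown below. It is not part of the Root IO package: the temporary file, the 4-byte tag, and the helper names are made up for the example, the big-endian byte order is an assumption of the sketch, and for brevity it uses `java.nio.file`, which was added to Java later than the `java.nio` package discussed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; the file layout here is invented for the example.
final class MappedReadDemo {

    // Writes a 4-byte ASCII tag followed by one big-endian int to a temporary file.
    static Path writeSample(int value) {
        try {
            Path file = Files.createTempFile("mapped-demo", ".bin");
            file.toFile().deleteOnExit();
            byte[] bytes = ByteBuffer.allocate(8)
                    .put("demo".getBytes(StandardCharsets.US_ASCII))
                    .putInt(value)
                    .array();
            Files.write(file, bytes);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file read-only and returns the int stored at the given byte offset.
    static int readIntAt(Path file, int offset) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.BIG_ENDIAN);
            return buffer.getInt(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSample(12345);
        System.out.println(readIntAt(file, 4));
    }
}
```

With a mapped buffer, reading at an arbitrary offset is an indexed access rather than a seek followed by a stream read, which suits the random-access pattern of keyed lookups.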

Common Reflection API for C++?
One advantage Java has over C++ is built-in reflection for all classes
Given pointer to object can find out:
What class of object it is
All methods, members, constructors.
Access member values, call methods and constructors
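
The Java capabilities listed above can be demonstrated with the standard `java.lang.reflect` API. The `Track` class is a hypothetical example object; `describe` and `call` are helper names invented for this sketch.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class ReflectionDemo {

    // Hypothetical example class to inspect.
    static final class Track {
        private final double px;

        Track(double px) {
            this.px = px;
        }

        public double getPx() {
            return px;
        }
    }

    // Lists the class name, declared fields and declared methods of any object.
    static List<String> describe(Object target) {
        Class<?> type = target.getClass();
        List<String> lines = new ArrayList<>();
        lines.add("class " + type.getSimpleName());
        for (Field field : type.getDeclaredFields()) {
            lines.add("field " + field.getName());
        }
        for (Method method : type.getDeclaredMethods()) {
            lines.add("method " + method.getName());
        }
        return lines;
    }

    // Calls a public no-argument method by name and returns its result.
    static Object call(Object target, String methodName) {
        try {
            return target.getClass().getMethod(methodName).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Track track = new Track(1.5);
        describe(track).forEach(System.out::println);
        System.out.println(call(track, "getPx"));
    }
}
```

Nothing in `describe` or `call` knows about `Track` at compile time, which is the property a common C++ reflection API would need to provide.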
Recent Analysis Tools Meeting at CERN attended by:
Rene+Fons (Root), myself, Andreas Pfeiffer (Anaphe), Lassi Tuura (Iguana), Guy Barrand (OnX)
Identified common reflection API for C++ as a possible collaborative project
If this existed, and was adopted by Root, would make access from Java to Root files and in-memory objects much easier.

FreeHEP Java Library
Root IO is just one component of the open-source FreeHEP Java library.
Non-HEP specific
Application Framework – base for JAS 3 and Wired 2
JACO – Java access to C++ Objects
2D Vector Graphics – generates .eps, .svg, .pdf
(E)PS viewer
HEP specific
hep.physics package
3-vectors, 4-vectors and utilities
Jet Finding, Event Shape routines
Generator Framework, Diagnostic Event Generator
hep.io – STDHEP, Root
hep.aida – Reference implementation of AIDA classes
Yappi – XML Particle Property Database
HEP3D – Some Java 3D utilities, 3D Plotting, Geant4 shapes
Check it out: http://java.freehep.org

Conclusions
Java IO for Root exists as part of the FreeHEP Java library
Currently suitable for many tasks
Event Display, Object Browser, Histogram Browser, Web access to histograms
JAS plugin makes analysis of Root files possible
Suitable for evaluation and analysis of small data samples.
Needs high performance Root IO for large data volumes
Much higher performance version of Root IO coming before end of year
Want feedback on what features are most needed to make this useful.

Links
Root IO package (hep.io.root)
http://java.freehep.org/lib/freehep/doc/root/
JAS
http://jas.freehep.org/
http://java.freehep.org/lib/freehep/doc/root/rootjas.shtml
FreeHEP