Root IO for Java
Using JAS to Analyze Root Data

Tony Johnson

July 2001

Contents

Root IO for Java

Goals and Motivation

Using the Package

Demo apps

Accessing Root Files from your own Java programs

Possible uses

Using JAS to analyze Root Data

Future enhancements to JAS

Root IO Implementation

Current Implementation

Limitations of current implementation

Future Plans for Root IO

Conclusions

Root IO

A Pure Java Implementation

Part of the FreeHEP Java Library

Goals

Provide a pure-Java package for reading Root files.

Could be extended to writing later

Should work with any Root file

Should not need to know about objects ahead of reading (no need for DLLs or .so files)

Provides access to the data in a Root file, not to the methods of the C++ objects stored in it.

Suitable for data-centric objects

Mini DSTs, NTuples, Raw Data

User can provide own implementation of object

Easy to use

Efficient (at least in later implementations)

Goal is to be as efficient as native Root IO

Why Bother – Root already exists?

Philosophical Reasons

If we are committing a large amount of HEP data to Root, it is good to know it can be read back even without Root

Java package is currently only 5000 lines of code.

We need it for JAS, Wired etc.

Calling C++ or Fortran code through a Java interface is a bigger overhead (cf. PAW, StdHEP).

Brings back all the problems we got away from by using Java

Porting issues (e.g. MacOS)

Crashes

Java Applets etc.

Security considerations may not allow native code

Demo: Root Object Browser

Demo: Root Histogram Browser

Interface Builder

Package can read a Root file with no a priori knowledge of its contents

Great for systems which use scripting or reflection to get information about objects read.

To compile code against user-defined objects in a Root file, you must use InterfaceBuilder

java hep.io.root.util.InterfaceBuilder <rootfile>

Builds Java interfaces for all user-defined objects found in file

Example of Generated Interface
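
As a rough illustration only: the sketch below shows what a generated interface for a user-defined class could look like, and one way such an interface could be bound to data stored by member name. The `Track` interface, the `bind` helper, the `fPx`-style member names and the use of `java.lang.reflect.Proxy` are all assumptions made for this example; they are not taken from the InterfaceBuilder output or from the hep.io.root sources.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class GeneratedInterfaceSketch {

    /** Hypothetical interface of the kind a builder could emit for a user class "Track". */
    public interface Track {
        double getPx();
        double getPy();
        int getCharge();
    }

    /** Backs a getter-only interface with a name -> value map (an assumed design). */
    public static <T> T bind(Class<T> iface, Map<String, Object> members) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.startsWith("get") && name.length() > 3) {
                // getPx -> "fPx" (assumes members are stored under 'f'-prefixed names)
                String key = "f" + name.substring(3);
                if (!members.containsKey(key)) {
                    throw new IllegalStateException("No member " + key);
                }
                return members.get(key);
            }
            throw new UnsupportedOperationException(name);
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    public static void main(String[] args) {
        Map<String, Object> members = new HashMap<>();
        members.put("fPx", 1.5);
        members.put("fPy", -0.5);
        members.put("fCharge", -1);
        Track t = bind(Track.class, members);
        System.out.println(t.getPx() + " " + t.getPy() + " " + t.getCharge());
    }
}
```

Analysis code compiled against `Track` then gets type-checked getters, while the data itself can stay in a generic container.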

Example of Reading Root File in Java

Possible Uses

Java based online monitoring

Java based Event Display

e.g. WIRED – an experiment-independent event display toolkit written in Java

Web based histogram browser

Applet based (Java runs in browser)

Servlet based (Java runs in server)

Java based data analysis (e.g. JAS)

Script based data analysis

Jython, BeanShell, DynamicJava, …

Using JAS to Analyze Root Files

Brief Overview of JAS

Why use JAS for Root Analysis

Analyzing Root file with JAS

Demo

Plugin Manager – download Root extensions

Opening a Root File

Using the Object Browser

Creating and Filling Histograms

Introduction to JAS

Pure Java Analysis Environment

Data Format Independent

Modular/Extensible via Plugins/Data Interface Modules

Rich, easy-to-use GUI

Built in editor/compiler for writing analysis code

Local and Client-Server Operation

Originally targeted at offline analysis – but also used extensively for online monitoring

Written entirely in Java

JAS GUI

JAS Plotter

JAS Editor/Compiler

Extensible via Plugins

Plugins can:

Define experiment specific utilities (event display, analysis utilities, specialized tables).

Define data interfaces to handle new types of data.

Define new plotting routines (e.g. for specialized displays).

Add menus, create control areas, consoles, and output pages.

Plugins will be more flexible in JAS 3.0 (see discussion of FreeHEP application framework, later).

Examples of Plugins

JAS+Wired

Data Format Independent

Unlike

Root (requires Root files)

PAW (requires PAW files)

JAS is “data format independent”

Special type of Plugin, a Data Interface Module (DIM), reads data and makes it available for analysis in JAS

DIMs exist for

PAW, StdHEP, FlatFiles, SQL database (JDBC), Objectivity (HepTuples), Root

Several experiment specific data formats

You can write your own DIM for your data format

Remote Data Access

Rather than transporting petabytes of data to the physicist

Transport the physics analysis code to the data

Transparently - so that it feels just like local data access

Just ship histogram contents back to the physicist's desktop (on demand)

Allows remote analysis with modest network bandwidth

Allows user to “feel” as if using local machine even when accessing remote data.

Why use JAS for Root Analysis?

Root already has great analysis tools!

Why use JAS?

If you (and your users) are 100% happy with Root

No reason to change or try alternatives

Java is a good alternative to C++

Java is simpler to learn and use than C++

Not everyone who wants to do data analysis is a C++ guru – or wants to become one

The robustness of a scripting language

Impossible to crash the program from Java (or Python etc.)

The performance of a compiled language

JAS is still newer than Root, but more plugins are coming

Using JAS for Root Analysis

Demo

See writeup at

http://java.freehep.org/lib/freehep/doc/root/rootjas.shtml

JAS Plans

Current release is 2.2.4

Expect to continue to release 2.2.5 etc with incremental improvements.

More plugins coming:

Neural network plugin

Multivariate analysis

AIDA – Abstract Interfaces for Data Analysis

Also working on JAS 3

Larger overhaul of JAS architecture/functionality

Scripting support (Jython?)

AIDA Histogramming/Ntuples/etc.

Use FreeHEP application framework

JAS and WIRED will be plugins to the framework.

NTuple explorer

Root IO In Java: Implementation

Current Implementation

Limitations of Current Implementation

Future Plans for Root IO

Methodology

Very little documentation exists on Root internals.

Creating the IO package involves reading Root code and reverse engineering

Many features – a lot of trial and error, need lots of test files

Dual-track approach…

Anatomy of a Root File

Root File is a Random Access Object Store

Objects in file can be looked up by “Key”

Key is a String.

Each key can correspond to a hierarchy of linked objects

TTree objects are special

Can contain multiple branches

Each branch contains

More branches

A set of objects (e.g. Events, Tracks etc).

TTree objects provide random access to events, and allow reading only a subset of branches for efficiency.

Anatomy of a Root File

Starting with Root 3.0 each file contains a special key “StreamerInfo”

Contains a collection of TStreamerInfo objects which describe the data members of all objects in the file. Allows:

Reading root files without the original code

Reading root files with older versions of objects (schema evolution)

Root files are now self-describing

This allows a Java program to read files without access to the compiled C++ code.

Implementation

RootFileReader is used to open file.

Understands how to find Keys and StreamerInfo in the file

As objects are read from file

Delegate to RootClassFactory to create objects

Normally use DefaultClassFactory

Can be user provided (or extended)

Each object is responsible for reading its own data
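
The delegation described above can be sketched roughly as follows. This is a toy model, not the hep.io.root code: the `Representation` and `ClassFactory` interfaces, the `Point` class and the record layout (a UTF class name followed by two doubles) are all assumptions chosen to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ReaderDelegationSketch {

    /** An object that knows how to read its own members from the stream. */
    public interface Representation {
        void read(DataInput in) throws IOException;
    }

    /** Creates an empty representation for a class name; could be user provided. */
    public interface ClassFactory {
        Representation create(String className);
    }

    /** Toy representation of a 2D point (hypothetical, not a Root type). */
    public static class Point implements Representation {
        public double x;
        public double y;

        @Override
        public void read(DataInput in) throws IOException {
            x = in.readDouble();
            y = in.readDouble();
        }
    }

    /** Reads one record: the reader finds the class name, the object reads the rest. */
    public static Representation readObject(DataInput in, ClassFactory factory) throws IOException {
        String className = in.readUTF();
        Representation rep = factory.create(className);
        if (rep == null) {
            throw new IOException("Unknown class " + className);
        }
        rep.read(in);
        return rep;
    }

    /** Builds a toy "Point" record in the assumed layout. */
    public static byte[] toyRecord(double x, double y) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF("Point");
        out.writeDouble(x);
        out.writeDouble(y);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ClassFactory factory = name -> "Point".equals(name) ? new Point() : null;
        DataInput in = new DataInputStream(new ByteArrayInputStream(toyRecord(1.0, 2.0)));
        Point p = (Point) readObject(in, factory);
        System.out.println(p.x + ", " + p.y);
    }
}
```

Swapping in a different `ClassFactory` changes which Java objects are created without touching the reading loop.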

RootClassFactory

Representations and Interfaces

Representations are the internal representation of the Root objects created by the RootClassFactory

GenericRootObject is current Representation

Uses a Hashtable to store data – quite inefficient

Easy to debug and fix bugs, add new functionality.

Different objects are created depending on how object is stored in file

Objects stored in TTrees typically create hollow objects

No data is read from file until it is requested by user

Hence no need to say up-front which branches will be read
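
A minimal sketch of the "hollow object" idea, assuming a callback that stands in for reading one branch from the file. The class name, the `branchReader` function and the read counter are hypothetical and exist only to make the lazy behaviour visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class HollowObjectSketch {

    // Stand-in for "read this branch from the file" (an assumption for the example)
    private final Function<String, Object> branchReader;
    private final Map<String, Object> cache = new HashMap<>();
    private int reads = 0;

    public HollowObjectSketch(Function<String, Object> branchReader) {
        this.branchReader = branchReader;
    }

    /** Reads a member on first request only; later requests use the cached value. */
    public Object get(String member) {
        if (!cache.containsKey(member)) {
            reads++;
            cache.put(member, branchReader.apply(member));
        }
        return cache.get(member);
    }

    /** Number of branch reads actually performed so far. */
    public int readsPerformed() {
        return reads;
    }

    public static void main(String[] args) {
        HollowObjectSketch event = new HollowObjectSketch(name -> "data for " + name);
        System.out.println(event.readsPerformed()); // nothing read yet
        event.get("fTracks");
        event.get("fTracks");
        System.out.println(event.readsPerformed()); // only one read for two requests
    }
}
```

Branches that are never requested are never read, so the user does not have to declare a branch list up front.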

Root Class Factory

The DefaultClassFactory looks in the following places to create classes:

For a specific Java class in the package hep.io.root.reps (a SpecificRootObject).

StreamerInfo in the file being read – used to create a “GenericRootObject”

StreamerInfo in the “bootstrap” file – StreamerInfo.properties

Info in the file “typedef.properties” – used to define the Java mapping for Int_t etc.
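
The lookup chain can be pictured with the toy resolver below. It assumes the four places are tried in the order listed, which the slide does not state explicitly. The class names placed in each set and the `Int_t=int` property syntax are illustrative guesses, not the contents of the real StreamerInfo.properties or typedef.properties files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class ClassFactorySketch {

    /** Where a description of the requested class was found. */
    public enum Source {
        SPECIFIC_JAVA_CLASS, FILE_STREAMER_INFO, BOOTSTRAP_STREAMER_INFO, TYPEDEF, NOT_FOUND
    }

    private final Set<String> specific;   // stand-in for classes in hep.io.root.reps
    private final Set<String> fileInfo;   // stand-in for StreamerInfo read from the file
    private final Set<String> bootstrap;  // stand-in for StreamerInfo.properties
    private final Properties typedefs;    // stand-in for typedef.properties

    public ClassFactorySketch(Set<String> specific, Set<String> fileInfo,
                              Set<String> bootstrap, Properties typedefs) {
        this.specific = specific;
        this.fileInfo = fileInfo;
        this.bootstrap = bootstrap;
        this.typedefs = typedefs;
    }

    /** Tries each place in turn and reports the first one that knows the name. */
    public Source resolve(String name) {
        if (specific.contains(name)) return Source.SPECIFIC_JAVA_CLASS;
        if (fileInfo.contains(name)) return Source.FILE_STREAMER_INFO;
        if (bootstrap.contains(name)) return Source.BOOTSTRAP_STREAMER_INFO;
        if (typedefs.containsKey(name)) return Source.TYPEDEF;
        return Source.NOT_FOUND;
    }

    /** Builds a resolver with made-up contents for demonstration. */
    public static ClassFactorySketch example() throws IOException {
        Properties typedefs = new Properties();
        typedefs.load(new StringReader("Int_t=int\nDouble_t=double\n"));
        return new ClassFactorySketch(
                new HashSet<>(Arrays.asList("TH1F")),
                new HashSet<>(Arrays.asList("Event", "Track")),
                new HashSet<>(Arrays.asList("TObject", "TNamed")),
                typedefs);
    }

    public static void main(String[] args) throws IOException {
        ClassFactorySketch factory = example();
        for (String name : new String[] { "TH1F", "Track", "TNamed", "Int_t", "Mystery" }) {
            System.out.println(name + " -> " + factory.resolve(name));
        }
    }
}
```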

Status/Limitations

Currently only supports Root 3.0 or later

Could support earlier files too, but is it worth it?

User-defined objects are supported so long as they have StreamerInfo

Small problem with TTree in Root 3.01, fix coming soon.

Aims to support all Root files, including compression, splits, etc.

No support yet for

Chaining files, TTrees split across files, friend TTrees.

Performance

Adequate for testing, event displays, small datasets

Analysis of large datasets will require more efficient implementation of representations

Need more test cases: it is much easier to debug and add new functionality now rather than later.

Future Plans for Root IO

Dynamically build representations

StreamerInfo → Java bytecode → machine code

Different objects depending on:

How object was stored

Version of object in file (schema evolution)

Expect to have this ready October/November

Use java.nio package in Java 1.4 (due end of year)

Provides more efficient IO for large binary files

Provides support for memory mapped IO

Expect to get very good performance
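
For orientation, memory-mapped reading with java.nio looks roughly like the sketch below. It uses only standard `java.nio` calls, but the 8-byte file it writes is a toy (four ASCII bytes followed by one int), not a real Root file, and the `readIntAt` helper is a name invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadSketch {

    /** Maps the file read-only and returns the int stored at the given byte offset. */
    public static int readIntAt(Path file, int offset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Absolute read; ByteBuffer uses big-endian byte order by default
            return buffer.getInt(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        tmp.toFile().deleteOnExit();

        // Toy content: 4 ASCII bytes followed by one big-endian int
        ByteBuffer out = ByteBuffer.allocate(8);
        out.put("demo".getBytes(StandardCharsets.US_ASCII));
        out.putInt(30100);
        Files.write(tmp, out.array());

        System.out.println(readIntAt(tmp, 4));
    }
}
```

Because the mapping gives random access to any offset without explicit seeks or copies, it suits a format where objects are located by position within a large binary file.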

Common Reflection API for C++?

One advantage Java has over C++ is built-in reflection for all classes

Given a reference to an object, can find out:

What class of object it is

All methods, members, constructors.

Access member values, call methods and constructors

Recent Analysis Tools Meeting at CERN attended by:

Rene+Fons (Root), myself, Andreas Pfeiffer (Anaphe), Lassi Tuura (Iguana), Guy Barrand (OnX)

Identified common reflection API for C++ as a possible collaborative project

If this existed, and was adopted by Root, it would make access from Java to Root files and in-memory objects much easier.
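
For readers less familiar with the Java side of this comparison, the built-in reflection mentioned above can be exercised as in the sketch below. It uses only `java.lang.reflect`; the `Track` class and the helper method names are made up for the example, and nothing here says what a C++ reflection API would look like.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {

    /** Toy class to inspect (hypothetical, not a Root or FreeHEP type). */
    public static class Track {
        public double px = 3.0;
        public double py = 4.0;

        public double pt() {
            return Math.sqrt(px * px + py * py);
        }
    }

    /** What class of object is it? */
    public static String classOf(Object o) {
        return o.getClass().getSimpleName();
    }

    /** Access a public member value by name. */
    public static Object member(Object o, String field) throws ReflectiveOperationException {
        return o.getClass().getField(field).get(o);
    }

    /** Call a public no-argument method by name. */
    public static Object call(Object o, String method) throws ReflectiveOperationException {
        Method m = o.getClass().getMethod(method);
        return m.invoke(o);
    }

    /** Call the public no-argument constructor of a class. */
    public static Object construct(Class<?> c) throws ReflectiveOperationException {
        return c.getConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object o = construct(Track.class);
        System.out.println(classOf(o));
        System.out.println(member(o, "px"));
        System.out.println(call(o, "pt"));
    }
}
```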

FreeHEP Java Library

Root IO is just one component of the open-source FreeHEP Java library.

Non-HEP specific

Application Framework – base for JAS 3 and Wired 2

JACO – Java access to C++ Objects

2D Vector Graphics – generates .eps, .svg, .pdf

(E)PS viewer

HEP specific

hep.physics package

3-vectors, 4-vectors and utilities

Jet Finding, Event Shape routines

Generator Framework, Diagnostic Event Generator

hep.io – STDHEP, Root

hep.aida – Reference implementation of AIDA classes

Yappi – XML Particle Property Database

HEP3D – Some Java 3D utilities, 3D Plotting, Geant4 shapes

Check it out: http://java.freehep.org

Conclusions

Java IO for Root exists as part of the FreeHEP Java library

Currently suitable for many tasks

Event Display, Object Browser, Histogram Browser, Web access to histograms

JAS plugin makes analysis of Root files possible

Suitable for evaluation and analysis of small data samples.

Needs high performance Root IO for large data volumes

Much higher performance version of Root IO coming before end of year

Want feedback on what features are most needed to make this useful.

Links

Root IO package (hep.io.root)

http://java.freehep.org/lib/freehep/doc/root/

JAS

http://jas.freehep.org/

http://java.freehep.org/lib/freehep/doc/root/rootjas.shtml

FreeHEP

http://java.freehep.org

