Histogramming

Introduction

Java Analysis Studio includes a rich set of classes for filling and operating on histograms, as part of the hep.analysis package. These classes have been carefully designed for ease of use, while still providing extensive capabilities for advanced users and allowing users to extend the built-in functionality by writing their own Java classes that inherit from the built-in classes. The classes described here deal only with creating and filling histograms. In the current version of Java Analysis Studio, display and fitting of histograms are done using the GUI described in detail in later chapters. (Future versions will allow programmatic fitting and display of histograms as well.)

Histograms in JAS are all given names and may in addition be grouped into folders. Within each folder each histogram must have a unique name (creating a new histogram with the same name causes the old histogram to be replaced). Folders are used both for logical grouping of histograms and to allow options to be applied to a whole set of histograms by applying them to the folder containing the histograms. The Histogram class includes support for both 1-dimensional and 2-dimensional histograms as well as scatter plots. Binning of data is delegated by the histogram to a Partition class. There are many built-in partitions for common classes of histograms, but user-written partitions are also supported, and provide a simple mechanism for extending the capabilities of histograms to support unusual cases with minimal work.

You may histogram not only floating point numbers, but also dates, times or strings. The histogram class will attempt to "do the right thing" by selecting an appropriate partition the first time you fill a particular histogram, but you can modify this behavior by explicitly setting the type of partition for an individual histogram or for a group of histograms.

There are two main ways to refer to a histogram. The first is to keep a standard Java reference to the histogram, as shown below:

Histogram myhist = new Histogram("My Histogram");
for (int i=0; i<100; i++) myhist.fill(Math.random());

The second is to refer to the histogram by name, as shown below:

for (int i=0; i<100; i++) Histogram.find("My Histogram").fill(Math.random());

The find method of Histogram will create a histogram with the given name if a histogram of that name does not already exist (in the current folder; see the discussion of folders below). The EventHandler class, which you will normally use to fill histograms, provides a convenience method, histogram, which further simplifies the filling of histograms:

for (int i=0; i<100; i++) histogram("My Histogram").fill(Math.random());

The EventHandler.histogram method is logically equivalent to the Histogram.find method, but it contains some optimizations such that its use entails very little overhead in locating the histogram as compared to keeping a direct reference to the histogram. In general, keeping one Java variable for each histogram can become very inconvenient when filling many histograms, so we recommend using the histogram method in EventHandler in most cases, unless speed is at an absolute premium or you are filling histograms outside of an EventHandler.
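The find-or-create behavior described above can be sketched with a plain map keyed by name. This is a minimal illustration only: HistogramRegistrySketch and NamedCounterSketch are hypothetical stand-ins and are not part of hep.analysis, and the optimizations used by EventHandler.histogram are not reproduced because they are not described here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a named histogram; it only counts entries.
class NamedCounterSketch {
    final String name;
    int entries;

    NamedCounterSketch(String name) {
        this.name = name;
    }

    void fill(double value) {
        entries++;
    }
}

// Hypothetical registry illustrating find-or-create lookup by name.
class HistogramRegistrySketch {
    private final Map<String, NamedCounterSketch> byName = new HashMap<>();

    // Returns the entry registered under this name, creating it on first use.
    NamedCounterSketch find(String name) {
        return byName.computeIfAbsent(name, NamedCounterSketch::new);
    }

    int size() {
        return byName.size();
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        HistogramRegistrySketch registry = new HistogramRegistrySketch();
        for (int i = 0; i < 100; i++) {
            registry.find("My Histogram").fill(Math.random());
        }
        // All 100 fills reach the same object, so this prints 100.
        System.out.println(registry.find("My Histogram").entries);
    }
}
```

In this sketch every call pays for one map lookup, which is the kind of per-fill cost that a direct Java reference avoids.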

Histogram Folders

As mentioned above, histograms may be grouped into folders. By default all histograms are created in the "root" histogram folder, and the root histogram folder is set as the default folder whenever EventHandler methods are called. You may easily create new folders, and navigate between folders using the HistogramFolder class. For example:

HistogramFolder f = new HistogramFolder("/Pass 1/xyz");
Histogram myhist = new Histogram("My Histogram",f);

This creates a new histogram called "My Histogram" in the folder "/Pass 1/xyz". Note that Unix filename conventions are used in naming and navigating folders: / is the folder delimiter, paths beginning with / are taken relative to the root folder, paths not beginning with / are interpreted relative to the current folder, and .. can be used to navigate "up" from the current folder. Folder names may contain spaces.
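The path conventions above can be made concrete with a small resolver. This is a sketch under stated assumptions, not the HistogramFolder implementation: FolderPathSketch and its resolve method are hypothetical names, and treating .. at the root as a no-op is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper; the class and method names are illustrative only.
class FolderPathSketch {
    // Resolves 'path' against 'current' (an absolute folder path).
    static String resolve(String current, String path) {
        Deque<String> parts = new ArrayDeque<>();
        if (!path.startsWith("/")) {
            // Relative path: start from the components of the current folder.
            for (String p : current.split("/")) {
                if (!p.isEmpty()) {
                    parts.addLast(p);
                }
            }
        }
        for (String p : path.split("/")) {
            if (p.isEmpty()) {
                continue; // leading or doubled delimiter
            }
            if (p.equals("..")) {
                if (!parts.isEmpty()) {
                    parts.removeLast(); // assumption: ".." at the root stays at the root
                }
            } else {
                parts.addLast(p);
            }
        }
        return "/" + String.join("/", parts);
    }
}

class FolderPathDemo {
    public static void main(String[] args) {
        // Prints /Pass 1/abc
        System.out.println(FolderPathSketch.resolve("/Pass 1/xyz", "../abc"));
    }
}
```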

As with histograms, it is often inconvenient to keep a variable for each folder, so an alternative is to use the static method HistogramFolder.setCurrent. For example:

HistogramFolder.setCurrent("/Pass 1/xyz");
histogram("My Histogram").fill(Math.random());

Remember that the histogram method will create the named histogram in the current folder if it does not already exist.

Finally, folders can also be used to set properties on the entire set of histograms contained in the folder. This is discussed further under partitions below.

Filling

The Histogram class contains methods for filling both one- and two-dimensional histograms, as well as scatter plots. Histograms can be filled using not only Java double variables, but also Date and String objects.

The simplest method of filling histograms is to use the fill method that takes one argument, which may be a double, a Date or a String.
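As a rough illustration of how a one-argument fill can accept different kinds of data, the following sketch overloads fill for double, Date and String and remembers which kind arrived first. FirstFillSketch is a hypothetical stand-in, not the Histogram class, and rejecting a later fill of a different kind is an assumption of the sketch rather than documented behavior.

```java
import java.util.Date;

// Hypothetical stand-in: records which kind of data the first fill supplied.
class FirstFillSketch {
    enum Kind { NONE, NUMERIC, TIME, STRING }

    private Kind kind = Kind.NONE;
    private int entries;

    void fill(double value) {
        accept(Kind.NUMERIC);
    }

    void fill(Date value) {
        accept(Kind.TIME);
    }

    void fill(String value) {
        accept(Kind.STRING);
    }

    // The first fill fixes the kind; assumption: mixing kinds is an error.
    private void accept(Kind k) {
        if (kind == Kind.NONE) {
            kind = k;
        }
        if (kind != k) {
            throw new IllegalArgumentException("already holds " + kind + " data");
        }
        entries++;
    }

    Kind kind() {
        return kind;
    }

    int entries() {
        return entries;
    }
}

class FirstFillDemo {
    public static void main(String[] args) {
        FirstFillSketch energy = new FirstFillSketch();
        energy.fill(1.5);                 // numeric data
        FirstFillSketch runStart = new FirstFillSketch();
        runStart.fill(new Date());        // time data
        FirstFillSketch particle = new FirstFillSketch();
        particle.fill("electron");        // string data
        System.out.println(energy.kind() + " " + runStart.kind() + " " + particle.kind());
    }
}
```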

Use the fillW method to fill a histogram with a weight.

Use the fill method with two arguments to create a profile plot.
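To make the difference between the two kinds of filling concrete, the sketch below keeps a sum of weights per bin for weighted filling, and a sum of y values plus an entry count per x bin for a profile (the mean of y in each x bin). All class names are hypothetical, the method names merely echo the ones mentioned above, and the bookkeeping is an assumption rather than a description of the Histogram class.

```java
// Hypothetical shared helper for equal-width bins on [min, max).
class BinningSketch {
    // Returns the bin index for x, or -1 if x is outside the range (or NaN).
    static int binOf(double x, int bins, double min, double max) {
        if (!(x >= min && x < max)) {
            return -1;
        }
        int i = (int) ((x - min) / (max - min) * bins);
        return Math.min(i, bins - 1); // guard against rounding at the upper edge
    }
}

// Weighted filling: each entry adds its weight, not 1, to the bin content.
class WeightedHistogramSketch {
    final double[] sumOfWeights;
    final double min;
    final double max;

    WeightedHistogramSketch(int bins, double min, double max) {
        this.sumOfWeights = new double[bins];
        this.min = min;
        this.max = max;
    }

    void fillW(double x, double weight) {
        int i = BinningSketch.binOf(x, sumOfWeights.length, min, max);
        if (i >= 0) {
            sumOfWeights[i] += weight;
        }
    }
}

// Profile: for each x bin, track the mean of the y values that fell in it.
class ProfileSketch {
    final double[] sumY;
    final int[] n;
    final double min;
    final double max;

    ProfileSketch(int bins, double min, double max) {
        this.sumY = new double[bins];
        this.n = new int[bins];
        this.min = min;
        this.max = max;
    }

    void fill(double x, double y) {
        int i = BinningSketch.binOf(x, n.length, min, max);
        if (i >= 0) {
            sumY[i] += y;
            n[i]++;
        }
    }

    // Mean of y in bin i, or NaN if the bin is empty.
    double mean(int i) {
        return n[i] == 0 ? Double.NaN : sumY[i] / n[i];
    }
}
```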

Scatter Plots

In the current version of Java Analysis Studio, scatter plots are created using a separate class, ScatterPlot. In a future version, ScatterPlot will become a special case of Histogram, and the functionality of scatter plots and histograms will be more closely integrated, so that it is possible to convert easily from scatter plots to profile plots, and so on.

Extracting information about histograms

Writing Custom Partitions

Partitions

Partition Nomenclature:

A fixed partition is one in which the bin width and the min/max values are specified when the partition is created and cannot subsequently be changed without re-analyzing the data. Fixed partitions correspond to traditional HBOOK and Handypak histograms.
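A minimal sketch of the fixed-partition idea follows. Only per-bin counts are retained, which is why the binning cannot be changed afterwards. FixedPartitionSketch is a hypothetical class, not one of the built-in partitions, and the underflow and overflow counters are an assumption.

```java
// Hypothetical fixed partition: equal-width bins on [min, max), set at creation.
class FixedPartitionSketch {
    private final double min;
    private final double max;
    private final int[] counts;
    private int underflow;
    private int overflow;

    FixedPartitionSketch(int bins, double min, double max) {
        if (bins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        this.min = min;
        this.max = max;
        this.counts = new int[bins];
    }

    // The value is binned immediately and then forgotten.
    void fill(double x) {
        if (Double.isNaN(x)) {
            return; // ignored in this sketch
        }
        if (x < min) {
            underflow++;
        } else if (x >= max) {
            overflow++;
        } else {
            int i = (int) ((x - min) / (max - min) * counts.length);
            counts[Math.min(i, counts.length - 1)]++;
        }
    }

    int count(int bin) {
        return counts[bin];
    }

    int underflow() {
        return underflow;
    }

    int overflow() {
        return overflow;
    }
}
```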

A variable partition is one in which the individual elements are kept in an array, and binned only when a plot is requested. Thus the bin size and the min/max values can easily be changed without rerunning the analysis. While variable partitions are very flexible, they can consume a lot of memory, and so are not suitable for histograms containing many entries.
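The trade-off can be seen in a small sketch: every filled value is stored, and binning happens only on request, so the same data can be rebinned with different settings. VariablePartitionSketch is a hypothetical class, and using a List of boxed values is a simplification chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variable partition: keeps every value, bins on demand.
class VariablePartitionSketch {
    // Memory grows with the number of entries, the cost noted above.
    private final List<Double> values = new ArrayList<>();

    void fill(double x) {
        values.add(x);
    }

    int entries() {
        return values.size();
    }

    // Bins the stored values into equal-width bins on [min, max).
    // May be called repeatedly with different arguments.
    int[] bin(int bins, double min, double max) {
        int[] counts = new int[bins];
        for (double x : values) {
            if (x >= min && x < max) {
                int i = (int) ((x - min) / (max - min) * bins);
                counts[Math.min(i, bins - 1)]++;
            }
        }
        return counts;
    }
}
```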

An automatic partition is a partition which initially acts as a variable partition, but converts itself automatically to a fixed partition after a certain number of entries have been accumulated.
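One way such a conversion could work is sketched below. AutomaticPartitionSketch is hypothetical, and several details are assumptions rather than documented behavior: the range is taken from the minimum and maximum of the buffered values at the moment of conversion, and later values outside that range are dropped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical automatic partition: buffers values, then freezes the binning.
class AutomaticPartitionSketch {
    private final int threshold; // entries to accumulate before converting
    private final int bins;
    private List<Double> buffered = new ArrayList<>(); // non-null while "variable"
    private int[] counts;                              // non-null once "fixed"
    private double min;
    private double max;

    AutomaticPartitionSketch(int threshold, int bins) {
        if (threshold <= 0 || bins <= 0) {
            throw new IllegalArgumentException("threshold and bins must be positive");
        }
        this.threshold = threshold;
        this.bins = bins;
    }

    void fill(double x) {
        if (Double.isNaN(x)) {
            return; // ignored in this sketch
        }
        if (buffered != null) {
            buffered.add(x);
            if (buffered.size() >= threshold) {
                freeze();
            }
        } else {
            addToBin(x);
        }
    }

    boolean isFixed() {
        return counts != null;
    }

    // Returns a copy of the bin contents, or null while still buffering.
    int[] counts() {
        return counts == null ? null : counts.clone();
    }

    // Assumption: the range is chosen from the buffered data.
    private void freeze() {
        min = Collections.min(buffered);
        max = Collections.max(buffered);
        if (max == min) {
            max = min + 1.0; // avoid a zero-width range
        }
        counts = new int[bins];
        List<Double> old = buffered;
        buffered = null; // raw values are discarded after binning
        for (double x : old) {
            addToBin(x);
        }
    }

    private void addToBin(double x) {
        if (x < min || x > max) {
            return; // outside the frozen range: dropped in this sketch
        }
        int i = (int) ((x - min) / (max - min) * bins);
        counts[Math.min(i, bins - 1)]++; // x == max lands in the last bin
    }
}
```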

A time partition is one in which bins represent a certain range of time, rather than a numeric range.
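A time partition can be pictured as fixed bins over elapsed time. The sketch below bins java.util.Date values by their millisecond offset from a start time. TimePartitionSketch is a hypothetical class, and the choice of equal-width bins measured in milliseconds is an assumption.

```java
import java.util.Date;

// Hypothetical time partition: each bin covers binWidthMillis of wall-clock time.
class TimePartitionSketch {
    private final long startMillis;
    private final long binWidthMillis;
    private final int[] counts;

    TimePartitionSketch(Date start, long binWidthMillis, int bins) {
        if (binWidthMillis <= 0 || bins <= 0) {
            throw new IllegalArgumentException("bin width and bin count must be positive");
        }
        this.startMillis = start.getTime();
        this.binWidthMillis = binWidthMillis;
        this.counts = new int[bins];
    }

    void fill(Date t) {
        long offset = t.getTime() - startMillis;
        if (offset < 0) {
            return; // before the start: dropped in this sketch
        }
        long i = offset / binWidthMillis;
        if (i < counts.length) {
            counts[(int) i]++;
        }
    }

    int count(int bin) {
        return counts[bin];
    }
}
```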

A string partition is one in which bins are given explicit names, rather than representing some continuous numeric range. Bins may (for example) represent types of particles, or tracking volumes.
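A string partition amounts to counting entries per label. In the sketch below a bin is created the first time its name is seen. StringPartitionSketch is a hypothetical class, and keeping bins in first-seen order is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical string partition: one bin per distinct label.
class StringPartitionSketch {
    // LinkedHashMap keeps bins in the order their labels first appeared.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    void fill(String label) {
        counts.merge(label, 1, Integer::sum);
    }

    // Count for a label, or 0 if the label has never been filled.
    int count(String label) {
        return counts.getOrDefault(label, 0);
    }

    Iterable<String> labels() {
        return counts.keySet();
    }
}

class StringPartitionDemo {
    public static void main(String[] args) {
        StringPartitionSketch particles = new StringPartitionSketch();
        particles.fill("electron");
        particles.fill("muon");
        particles.fill("electron");
        for (String label : particles.labels()) {
            System.out.println(label + ": " + particles.count(label));
        }
    }
}
```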

An integer partition is one in which bin widths are constrained to be integer (as opposed to real) numbers.
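The integer constraint can be illustrated by choosing a whole-number bin width that covers an inclusive integer range with at most a requested number of bins. IntegerPartitionSketch is a hypothetical class, and this particular way of choosing the width (ceiling division) is an assumption, not a description of the built-in partition.

```java
// Hypothetical integer partition: bins of equal whole-number width over [min, max].
class IntegerPartitionSketch {
    private final int min;
    private final int max;
    private final int width;
    private final int[] counts;

    IntegerPartitionSketch(int min, int max, int maxBins) {
        if (max < min || maxBins <= 0) {
            throw new IllegalArgumentException("need max >= min and maxBins > 0");
        }
        this.min = min;
        this.max = max;
        int span = max - min + 1;                                 // number of integer values
        this.width = Math.max(1, (span + maxBins - 1) / maxBins); // ceiling division
        this.counts = new int[(span + width - 1) / width];
    }

    void fill(int value) {
        if (value < min || value > max) {
            return; // out of range: dropped in this sketch
        }
        counts[(value - min) / width]++;
    }

    int width() {
        return width;
    }

    int binCount() {
        return counts.length;
    }

    int count(int bin) {
        return counts[bin];
    }
}
```

With a range of 0 to 10 and at most 4 bins, this sketch picks a width of 3, so the bins cover 0-2, 3-5, 6-8 and 9-10.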