ClusterID and ClusterAnalysis: Overview

ClusterID is a package developed for the Linear Collider Detector (LCD) studies to address the related issues of evaluating calorimeter designs and reconstructing calorimeter data. The name "ClusterID" is intended to describe the approach taken to solving these problems, as explained below. ClusterAnalysis is a package of tools for evaluating calorimeter reconstruction techniques in a consistent and standardized way.

When the energy frontier of experimental HEP reached the level at which narrow hadronic jets of particles were produced, segmented calorimeters faced a new challenge: reconstructing the particles from the energy they deposited in the calorimeter cells. The standard example of this problem is a tightly collimated set of charged and neutral particles in a jet that passes through a tracker and then enters a calorimeter. The tracker can give excellent separation and resolution of the three-momenta of the charged particles. However, the calorimeter showers of both the charged and neutral particles seem to be spread out enough that they overlap each other, making the calorimeter information difficult to use.

Thus, there are perceived to be two general problems with using calorimeter information in a jet environment. The first is the overlap problem: the showers of different particles overlap each other. The second is implicit in the spread-out nature of the individual particle showers: the fragmentation problem of determining which hit cells belong to a particular particle's shower. The fragmentation problem is most severe in the shower of a hadronic particle. The particle interacts with nuclei of the calorimeter absorber material, sometimes knocking out neutrons, which may travel a large distance through the calorimeter, leaving no ionization trail, before they interact and produce hadronic showers of their own. Associating such a faraway neutron shower with the original hadronic particle is problematic.

An approach to solving the jets problem, called "energy flow", was proposed. An energy flow algorithm seeks to use tracking information for the charged particles and calorimeter information for the neutral particles. The general idea is to swim charged particles through the calorimeter and associate the hit cells along each path with the charged particle. The remaining hit cells are by definition due to neutral particles, and some technique is then required to use these neutral cell hits to reconstruct information about the neutrals in the original jet. Unfortunately, the twin problems of overlap and fragmentation remain to be dealt with in the energy flow approach.
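
As a rough illustration of the charged/neutral split described above, the following Python sketch assigns hit cells to charged particles when they lie close to an extrapolated track path and treats everything else as neutral. It is only a sketch under stated assumptions, not the algorithm used in any particular package: tracks are extrapolated as straight lines (a real swimmer would follow a helix in the magnetic field), and the names split_hits, distance_to_ray and max_distance, as well as the dictionary layout of hits and tracks, are hypothetical.

```python
import math


def distance_to_ray(point, origin, direction):
    """Distance from a point to the ray origin + t * direction, with t >= 0."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    rel = [p - o for p, o in zip(point, origin)]
    # Project onto the ray; clamp so points behind the origin use the origin.
    t = max(0.0, sum(r * u for r, u in zip(rel, unit)))
    closest = [o + t * u for o, u in zip(origin, unit)]
    return math.dist(point, closest)


def split_hits(hits, tracks, max_distance):
    """Split hits into (charged, neutral) lists.

    A hit is called charged if it lies within max_distance of any
    straight-line track extrapolation (an assumption of this sketch).
    All remaining hits are called neutral.
    """
    charged, neutral = [], []
    for hit in hits:
        near_track = any(
            distance_to_ray(hit["pos"], trk["origin"], trk["dir"]) <= max_distance
            for trk in tracks
        )
        (charged if near_track else neutral).append(hit)
    return charged, neutral
```

Both problems show up directly in such a scheme: a neutral shower that overlaps a track path is absorbed into the charged list, while a distant fragment of a charged hadron shower ends up in the neutral list.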

In the ClusterID approach we make a careful quantitative analysis of the overlap and fragmentation problems and deem them surmountable. In particular, we follow the outline of the approach taken in tracking reconstruction. A tracker, like a calorimeter, contains a large number of hits in space. Using pattern matching and fitting techniques, the hits in a tracker are grouped together into "clusters" that are deemed to have been created by a particular particle, and the three-momentum of the particle is reconstructed from this cluster of hits. In the ClusterID approach to calorimeter reconstruction we attempt to do the same thing: group together the hits from each particle that interacts in the calorimeter, and derive from the resulting clusters the three-momentum and the type of the particle that created each cluster.
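
To make the last step concrete, here is a minimal Python sketch of deriving a three-momentum from a cluster of calorimeter hits. It assumes, purely for illustration, that the particle came from the origin, that it is massless (so |p| = E), and that the summed hit energy is already calibrated to the particle energy. The function name and the hit layout are hypothetical and are not taken from the ClusterID code.

```python
import math


def cluster_three_momentum(hits):
    """Return (energy, [px, py, pz]) for a cluster of hits.

    Assumptions of this sketch: the particle originates at (0, 0, 0),
    it is massless, and the sum of hit energies equals its energy.
    The direction is taken from the energy-weighted centroid of the hits.
    """
    total = sum(h["energy"] for h in hits)
    if total <= 0.0:
        raise ValueError("cluster has no energy")
    centroid = [
        sum(h["energy"] * h["pos"][i] for h in hits) / total for i in range(3)
    ]
    r = math.sqrt(sum(c * c for c in centroid))
    if r == 0.0:
        raise ValueError("cluster centroid is at the origin; direction undefined")
    momentum = [total * c / r for c in centroid]
    return total, momentum
```

Identifying the particle type is a separate step that this sketch does not attempt.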

ClusterID is a fully functional package of code that reconstructs particles from simulated calorimeter detector data for use in physics analysis. It is written in the JAS/LCD environment (link here). Although it is fully functional, it has not yet been developed to its highest possible resolution, and much work remains to improve it. As long as it does not correctly reconstruct every particle that interacted in the calorimeter, there is an opportunity to find ways to improve its resolution. ClusterID is designed to make this kind of future study as easy as possible. It provides the entire framework needed to investigate the resolution of each new technique, so the developer can focus on developing the technique rather than on devising ways to evaluate its efficacy. We have included a Projects (link) section for people to make these kinds of studies and include their results in the ClusterID code and web page system.

Detailed studies of the performance of ClusterID are available from the performance page. However, this ClusterID overview page would not be complete without a few plots. Below are two sets of plots of Zs from e+e- events at 91 GeV in the LCD SD detector. (Note: in these events the Zs have not been given their natural resonance width.) The first set measures the reconstructed Z mass by three different techniques. The second set measures the efficiency and purity of identifying gammas and neutral hadrons in terms of the total kinetic energy of the gammas and neutral hadrons, respectively, in the event.

[Plots: reconstructed Z mass from MCTruth, PerfectClustering, and ClusterID]

Briefly, these three plots show the same simulated data reconstructed by different means. In all three plots a fast Monte Carlo is used to create the same set of parameterized tracks for the charged particles. Only the reconstruction of neutrals varies between the plots. (In all cases neutrinos are ignored.) In the MCTruth plot all neutrals are included based on their true momenta and mass. In PerfectClustering all the calorimeter hits from a single neutral particle are gathered together into a cluster, and the direction and energy of the particle are determined from this cluster. Of course, in PerfectClustering we are "cheating" by using Monte Carlo particle information to form the clusters. The ClusterID plot represents realistic reconstruction based on no knowledge of the underlying particles, only hits in the calorimeters.
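
The "cheating" in PerfectClustering amounts to grouping hits by the Monte Carlo particle that produced them. A minimal Python sketch of that idea follows. The "mc_id" field, the function name, and the hit layout are hypothetical. The sketch also assumes each hit is attributed to exactly one particle, which a real simulation need not guarantee.

```python
from collections import defaultdict


def perfect_clusters(hits):
    """Group hits into clusters keyed by the true particle that made them.

    Each hit is assumed to carry an "mc_id" entry naming the Monte Carlo
    particle responsible for it (an assumption of this sketch).
    """
    clusters = defaultdict(list)
    for hit in hits:
        clusters[hit["mc_id"]].append(hit)
    return dict(clusters)


def cluster_energies(clusters):
    """Sum the hit energies of each cluster."""
    return {
        mc_id: sum(h["energy"] for h in members)
        for mc_id, members in clusters.items()
    }
```

A realistic reconstruction has to produce the same kind of grouping without access to the "mc_id" information.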

We may reasonably interpret the PerfectClustering plot as the best we could ever do using the calorimeter information available with the SD detector. The ClusterID plot represents where we are today in our quest to make the best possible use of the calorimeter information. We see that the width of the Z measured by ClusterID is about twice the best possible width measurement with the SD detector. Thus, the reconstruction technique can potentially be improved by up to a factor of two in resolution. The other area where improvements are possible is the design of the calorimeter itself, which is what determines the 4.5 GeV limit in our PerfectClustering plot. For a more detailed interpretation of these plots go to the performance page.

[Plots: efficiency and purity of identifying gammas and neutral hadrons versus their total kinetic energy in the event]

Look on the performance page for a complete discussion of how these plots from the ClusterAnalysis suite are made and what they mean. For present purposes, these plots show the efficiency and purity of identifying gammas and neutral hadrons at the current (early) stage of ClusterID development. On average, 90% of the gamma energy in Z pole events is correctly identified, and 6% of the energy identified as gamma is incorrectly identified. For neutral hadrons, 66% of the energy is correctly identified, and 28% of the energy identified as neutral hadron is incorrectly identified. Thus, at the present stage of development ClusterID is doing quite well with gammas and has lots of room for improvement with neutral hadrons.
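
For readers who want the two quantities spelled out, the Python sketch below shows one plausible energy-weighted definition of efficiency and purity. It is an assumption made for illustration, not the definition used by ClusterAnalysis (the performance page is the authority on that). The function name and the "true", "reco" and "energy" fields are hypothetical.

```python
def efficiency_and_purity(deposits, particle_type):
    """Energy-weighted efficiency and purity for one particle type.

    efficiency = correctly identified energy / true energy of that type
    purity     = correctly identified energy / energy identified as that type
    Each deposit is assumed to carry its true type, its reconstructed
    type, and its energy (an assumption of this sketch).
    """
    true_e = sum(d["energy"] for d in deposits if d["true"] == particle_type)
    reco_e = sum(d["energy"] for d in deposits if d["reco"] == particle_type)
    correct_e = sum(
        d["energy"]
        for d in deposits
        if d["true"] == particle_type and d["reco"] == particle_type
    )
    efficiency = correct_e / true_e if true_e > 0 else float("nan")
    purity = correct_e / reco_e if reco_e > 0 else float("nan")
    return efficiency, purity
```

Under this definition, the fraction of identified energy that is incorrect is one minus the purity, so the figures quoted above would correspond to a gamma purity of about 94% and a neutral hadron purity of about 72%.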

This very brief introduction to ClusterID only scratches the surface of the subject. For a fuller presentation we provide links to a variety of subjects, which we now describe. In passing we note that this web site is aimed at people with a good working knowledge of high energy physics data analysis, but within that category there may be a wide range of experience with calorimetry and with physics at a high energy linear collider. Thus, we apologize to the real experts who may find our documentation rather tedious in its detail.

To get a full description of ClusterID.

To learn how to do physics analysis using ClusterID in event reconstruction.

To learn how to evaluate calorimeter reconstruction algorithms and calorimeter designs using ClusterID and ClusterAnalysis.

For a discussion of the overlap and fragmentation problems.

For a discussion of calorimeter performance and standardized ways of measuring it.

For a discussion of Jets and Calorimetry.

Tutorial Examples.

The ClusterID project. A list of projects to improve calorimeter reconstruction and testing. These are projects yet to be done, in progress or completed. For the projects undertaken this link will lead to people's results. Go here to learn how to develop improvements to ClusterID.

Updates. We will try to summarize improvements and changes as they are released.

Points to remember about ClusterID and ClusterAnalysis. Here we try to summarize items that sometimes cause confusion or problems for users. Come here if you are stuck.