Using the Cluster Cheater to evaluate an algorithm
In JAS/LCD there are cluster builders, each of which uses some algorithm to collect calorimeter cell hits together into sets called clusters. One of the most useful is the SimpleClusterBuilder, whose concept is quite intuitive: gather into one cluster all hits that are spatially contiguous, either directly or through a sequence of contiguous pairs connecting any two hits in the cluster. An algorithm to do this would be to pick any hit and put it in the first cluster. Then include all hits among its 26 neighbors (we are assuming a uniform rectangular array of cells and counting the cells on diagonals as neighbors). Now loop over the hits just added to the cluster and include their neighboring hits. Iterate until there are no new hits to add. Remove all these hits from the "stock" of remaining calorimeter hits and begin another cluster by picking any remaining hit and repeating the process. Keep repeating until every hit belongs to one (and only one) cluster. The SimpleClusterBuilder makes no use of information from any other detector (e.g. the tracker), nor does it make use of any Monte Carlo Truth information. Thus the SimpleClusterBuilder is something that can be applied to real detector data once the linear collider is built and taking data.
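The procedure just described is essentially a flood fill over the hit cells. As a rough illustration only, here is a standalone Python sketch of it. It is not the SimpleClusterBuilder source and uses none of the JAS/LCD classes; the function name `build_clusters` and the representation of a hit as an integer cell index `(i, j, k)` are assumptions made for the example.

```python
from itertools import product

# The 26 offsets around a cell in a 3-D grid: faces, edges and corners.
NEIGHBOR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def build_clusters(hit_cells):
    """Group hit cells into clusters of contiguous cells.

    hit_cells: iterable of (i, j, k) integer indices of cells that have a hit
               (a hypothetical representation, chosen for this sketch).
    Returns a list of sets, one set of cell indices per cluster.
    """
    remaining = set(hit_cells)      # the "stock" of hits not yet in a cluster
    clusters = []
    while remaining:
        seed = remaining.pop()      # any unassigned hit starts a new cluster
        cluster = {seed}
        frontier = [seed]           # hits whose neighbors are still unchecked
        while frontier:
            i, j, k = frontier.pop()
            for di, dj, dk in NEIGHBOR_OFFSETS:
                cell = (i + di, j + dj, k + dk)
                if cell in remaining:
                    # A neighboring hit: move it from the stock to the cluster.
                    remaining.remove(cell)
                    cluster.add(cell)
                    frontier.append(cell)
        clusters.append(cluster)
    return clusters


# Six hits: three chained through diagonal neighbors, two side by side, one alone.
hits = [(0, 0, 0), (1, 1, 1), (2, 1, 1), (5, 5, 5), (5, 6, 5), (9, 9, 9)]
clusters = build_clusters(hits)
```

Because each hit is removed from the stock as soon as it joins a cluster, every hit ends up in exactly one cluster, and the result does not depend on which hit is picked as the seed.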
A second very important cluster builder in JAS/LCD is the Cluster Cheater. As the name implies, there is something not quite kosher about this one: it cheats and uses Monte Carlo information to build its clusters. In particular, it looks at every hit, finds out which Monte Carlo particle created it, and gathers the hits into clusters so that the many hits produced by one MC particle end up together in that particle's cluster. The Cluster Cheater represents the perfect cluster builder. Given a calorimeter and the set of hits in an event, the best any clustering algorithm could ever achieve would be to match the Cluster Cheater. Thus, we can use the clusters from the ClusterCheater as a standard for comparison. (Note: you are probably wondering about the case where a calorimeter cell has energy deposited by two different particles. Which cluster does the hit go into? Answer: both. The hit carries the information on its mixed parentage and, if it matters, algorithms using the Cheater's clusters can divide up the energy as necessary.)
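To make the grouping by parent particle concrete, here is a small standalone Python sketch. It is not the ClusterCheater code and does not use the JAS/LCD API; the hit layout (a dict with a `contributions` map from MC particle id to deposited energy) and the function names `cheat_clusters` and `cluster_energy` are assumptions made for illustration.

```python
from collections import defaultdict


def cheat_clusters(hits):
    """Group hits by the MC particle(s) that deposited energy in them.

    hits: list of dicts, each with a 'cell' index and a 'contributions' map
          {mc_particle_id: energy} (a hypothetical layout for this sketch).
    Returns {mc_particle_id: [hit, ...]}. A hit with mixed parentage appears
    in the cluster of every particle that contributed to it.
    """
    clusters = defaultdict(list)
    for hit in hits:
        for particle_id in hit["contributions"]:
            clusters[particle_id].append(hit)
    return dict(clusters)


def cluster_energy(particle_id, cluster_hits):
    """Sum only the energy this particle deposited, so shared hits are split."""
    return sum(hit["contributions"][particle_id] for hit in cluster_hits)


# Three hits; the middle cell received energy from both particle 1 and particle 2.
hits = [
    {"cell": (0, 0, 0), "contributions": {1: 1.0}},
    {"cell": (0, 0, 1), "contributions": {1: 0.5, 2: 0.25}},
    {"cell": (0, 0, 2), "contributions": {2: 2.0}},
]
clusters = cheat_clusters(hits)
energy_1 = cluster_energy(1, clusters[1])
energy_2 = cluster_energy(2, clusters[2])
```

Here the shared cell sits in both clusters, and `cluster_energy` is one way an algorithm could divide its energy according to parentage rather than counting it twice.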
The ultimate "cheat", of course, is to use the Monte Carlo particle information directly, bypassing the smearing of direction and momentum inherent in reconstruction from detector-level data. Thus, with the SimpleClusterBuilder, the ClusterCheater and the Monte Carlo particles we have three levels of resolution we can use to compare the efficacy of algorithms and designs.
We present here the code used to make the "three level" comparison of Z mass reconstruction shown in the plots in the ClusterID Overview and the Performance section. This code nicely illustrates the flexibility of JAS/LCD for comparative studies. For each simulated Z pole event we can do all three levels of reconstruction (Simple, Cheater and MCTruth) as the event is processed. This allows us to do such things as find the thrust axis for each event using each level of reconstruction and compare the results on an event-by-event basis. Thus, when the code is run in JAS, the histogram folders will include folders for plots comparing the results of a measurement (e.g. thrust axis direction differences) between the three levels. We show below the corresponding plots for the thrust axis difference on an event-by-event basis.
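As a hedged illustration of what an event-by-event thrust axis comparison involves (this is not the JAS/LCD code referred to above), the Python sketch below computes the angle between the thrust axes obtained at two levels of reconstruction. The function names `axis_angle` and `compare_levels`, the level labels, and the sample axis values are all hypothetical. Since a thrust axis is defined only up to an overall sign, the sketch uses the absolute value of the dot product.

```python
import math


def axis_angle(a, b):
    """Angle in radians between two axes given as 3-vectors.

    An axis and its negative describe the same direction, so the result
    lies between 0 and pi/2.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_angle = min(1.0, abs(dot) / norm)   # guard against rounding above 1
    return math.acos(cos_angle)


def compare_levels(axes_by_level):
    """Angle between each reconstructed axis and the MC truth axis.

    axes_by_level: {'simple': axis, 'cheater': axis, 'mctruth': axis}
                   (hypothetical labels for the three levels).
    """
    truth = axes_by_level["mctruth"]
    return {
        level: axis_angle(axis, truth)
        for level, axis in axes_by_level.items()
        if level != "mctruth"
    }


# One made-up event; real axes would come from the three reconstructions.
event_axes = {
    "simple": (1.0, 0.0, 0.0),
    "cheater": (0.0, 0.0, -1.0),
    "mctruth": (0.0, 0.0, 1.0),
}
differences = compare_levels(event_axes)
```

Filling one histogram per pair of levels with these angles, event after event, gives the kind of difference plot described above.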