About Data Formats

Each bit of data in an event must be written in a supported data format. A data format is essentially a C++ class, where a class defines a data structure (a data type with data members). The term data format can refer either to the format of the data written using the class (the data format as a sort of template) or to the instantiated class object itself. The DataFormats package and the SimDataFormats package (for simulated data) in the CMSSW CVS repository contain all the supported data formats that can be written to an Event file. So, for example, if you wish to add data to an Event, your EDProducer module must instantiate one or more of these data format classes.

Data formats (classes) for reconstructed data include, for example, reco::Track, reco::TrackExtra, and many more. See the Reference Manual section RECO data tier for the full listing.
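
To make the idea of a data format as a class more concrete, here is a minimal sketch. It is written in Python rather than C++ purely for brevity, and every name in it (`Track`, `Event`, `put`, `produce_tracks`) is a hypothetical stand-in, not the CMSSW API. A real EDProducer would instantiate classes from the DataFormats packages instead.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical data format: a data type with data members."""
    pt: float      # transverse momentum in GeV (illustrative member)
    eta: float     # pseudorapidity (illustrative member)
    charge: int    # electric charge in units of e (illustrative member)


@dataclass
class Event:
    """Hypothetical event: labelled collections of data-format objects."""
    products: dict = field(default_factory=dict)

    def put(self, label, collection):
        # Only instances of a supported data format may be written.
        if not all(isinstance(obj, Track) for obj in collection):
            raise TypeError("unsupported data format")
        self.products[label] = list(collection)


def produce_tracks(event):
    """Stand-in for a producer module: it instantiates the data format."""
    event.put("tracks", [Track(pt=12.5, eta=0.3, charge=-1),
                         Track(pt=4.1, eta=-1.2, charge=1)])


event = Event()
produce_tracks(event)
```

The sketch shows both senses of the term: `Track` is the template, and each `Track(...)` stored in the event is an instantiated object.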

About Data Tiers

Event information from each step in the simulation and reconstruction chain is logically grouped into what we call a data tier. Examples of data tiers include RAW and RECO and, for Monte Carlo (MC), GEN, SIM and DIGI. For example, the RAW data tier collects detector data after online formatting plus some trigger results, while the RECO tier collects reconstructed objects. A data tier may contain multiple data formats, as mentioned above for reconstructed data. A given dataset may consist of multiple data tiers; e.g., the term GenSimDigi covers the generation (MC), simulation (Geant) and digitization steps. The most important tiers from a physicist's point of view are probably RECO (all reconstructed objects and hits) and AOD (a smaller subset of RECO). The following table gives an overview.

Data Tier Listing

| Event Format | Contents | Purpose | Data Type Ref | Event Size (MB) |
|---|---|---|---|---|
| DAQ-RAW | Detector data from front end electronics + L1 trigger result. | Primary record of physics event. Input to online HLT. | | 1-1.5 |
| RAW | Detector data after online formatting, the L1 trigger result, the result of the HLT selections (HLT trigger bits), potentially some of the higher level quantities calculated during HLT processing. | Input to Tier-0 reconstruction. Primary archive of events at CERN. | | 1.5 |
| RECO | Reconstructed objects (tracks, vertices, jets, electrons, muons, etc.) and reconstructed hits/clusters. | Output of Tier-0 reconstruction and subsequent rereconstruction passes. Supports re-finding of tracks, etc. | RECO & AOD | 0.25 |
| AOD | Subset of RECO. Reconstructed objects (tracks, vertices, jets, electrons, muons, etc.). Possible small quantities of very localised hit information. | Physics analysis, limited refitting of tracks and clusters. | RECO & AOD | 0.05 |
| TAG | Run/event number, high-level physics objects, e.g. used to index events. | Rapid identification of events for further study (event directory). | | 0.01 |
| FEVT | Full Event: Term used to refer to RAW+RECO together (not a distinct format). | multiple | | 1.75 |
| GEN | Generated Monte Carlo event. | - | | - |
| SIM | Energy depositions of MC particles in detector (sim hits). | - | | - |
| DIGI | Sim hits converted into detector response. Basically the same as the RAW output of the detector. | - | | 1.5 |

The Data Type Ref column entries point to the CMSSW Reference Manual, which is not complete.
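
As a quick consistency check on the event sizes in the table, the sketch below recomputes the FEVT size as the sum of RAW and RECO and compares tier sizes. The dictionary and function names are illustrative only; the numbers are copied from the table.

```python
# Per-event sizes in MB, copied from the table above.
EVENT_SIZE_MB = {"RAW": 1.5, "RECO": 0.25, "AOD": 0.05, "TAG": 0.01}


def fevt_size_mb(sizes):
    """FEVT is RAW and RECO together, so its size is their sum."""
    return sizes["RAW"] + sizes["RECO"]


def size_ratio(sizes, larger, smaller):
    """How many times larger one tier is than another, per event."""
    return sizes[larger] / sizes[smaller]


print(fevt_size_mb(EVENT_SIZE_MB))  # 1.75, matching the FEVT row
print(round(size_ratio(EVENT_SIZE_MB, "RECO", "AOD"), 2))
print(round(size_ratio(EVENT_SIZE_MB, "RAW", "AOD"), 2))
```

With these figures an AOD event is about a fifth the size of a RECO event and about a thirtieth the size of a RAW event.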

Data Tiers: Reconstructed (RECO) Data and Analysis Object Data (AOD)

RECO data contains objects from all stages of reconstruction. AOD are derived from the RECO information to provide data for physics analyses in a convenient, compact format. Typically, physics analyses don't require you to rerun the reconstruction process on the data. Most physics analyses can run on AOD data.



RECO is the name of the data-tier which contains objects created by the event reconstruction program. It is derived from RAW data and provides access to reconstructed physics objects for physics analysis in a convenient format. Event reconstruction is structured in several hierarchical steps:

  1. Detector-specific processing: Starting from detector data unpacking and decoding, detector calibration constants are applied and cluster or hit objects are reconstructed.
  2. Tracking: Hits in the silicon and muon detectors are used to reconstruct global tracks. Pattern recognition in the tracker is the most CPU-intensive task.
  3. Vertexing: Reconstructs primary and secondary vertex candidates.
  4. Particle identification: Produces the objects most associated with physics analyses. Using a wide variety of sophisticated algorithms, standard physics object candidates are created (electrons, photons, muons, missing transverse energy and jets; heavy-quarks, tau decay).

The normal completion of the reconstruction task will result in a full set of these reconstructed objects usable by CMS physicists in their analyses. You would only need to rerun these algorithms if your analysis requires you to take account of such things as trial calibrations, novel algorithms etc.
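
The hierarchy of reconstruction steps listed above can be pictured as a chain in which each stage consumes the output of the previous one. The following is a toy sketch only: the function names, the dictionary layout and the trivial "algorithms" are all assumptions made for illustration and bear no relation to the real CMSSW reconstruction code.

```python
def detector_processing(raw):
    """Step 1: unpack the readout and apply a calibration constant to build hits."""
    return [{"position": value * raw["calibration"]} for value in raw["readout"]]


def tracking(hits):
    """Step 2: build tracks from hits (toy: a single track using every hit)."""
    return [{"hits": hits}] if hits else []


def vertexing(tracks):
    """Step 3: build vertex candidates from tracks (toy: one vertex if any track)."""
    return [{"tracks": tracks}] if tracks else []


def particle_identification(tracks, vertices):
    """Step 4: build physics object candidates (toy: one candidate per track)."""
    return [{"type": "candidate", "track": track} for track in tracks]


def reconstruct(raw):
    """Run the four steps in order and keep the objects from every stage."""
    hits = detector_processing(raw)
    tracks = tracking(hits)
    vertices = vertexing(tracks)
    candidates = particle_identification(tracks, vertices)
    return {"hits": hits, "tracks": tracks,
            "vertices": vertices, "candidates": candidates}


reco = reconstruct({"readout": [1.0, 2.0, 3.0], "calibration": 0.5})
```

Keeping the output of every stage in the result mirrors the way RECO holds objects from all stages of reconstruction, not just the final candidates.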

Reconstruction is expensive in terms of CPU and is dominated by tracking. The RECO data-tier will provide compact information for analysis to avoid the necessity to access the RAW data for most analysis. Following the hierarchy of event reconstruction, RECO will contain objects from all stages of reconstruction. At the lowest level it will be reconstructed hits, clusters and segments. Based on these objects reconstructed tracks and vertices are stored. At the highest level reconstructed jets, muons, electrons, b-jets, etc. are stored. A direct reference from high-level objects to low-level objects will be possible, to avoid duplication of information. In addition the RECO format will preserve links to the RAW information.
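
The idea of referencing low-level objects from high-level ones, instead of copying them, can be sketched as follows. The classes `Hit`, `LinkedTrack` and `Muon` and the index-based references are hypothetical and chosen for illustration; they are not the CMSSW reference mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    x: float
    y: float


@dataclass(frozen=True)
class LinkedTrack:
    # Indices into the hit collection: references, not copies of the hits.
    hit_indices: tuple

    def hits(self, hit_collection):
        return [hit_collection[i] for i in self.hit_indices]


@dataclass(frozen=True)
class Muon:
    # Index into the track collection.
    track_index: int

    def track(self, track_collection):
        return track_collection[self.track_index]


hits = [Hit(0.0, 0.1), Hit(1.0, 1.1), Hit(2.0, 2.1)]
tracks = [LinkedTrack(hit_indices=(0, 2))]
muon = Muon(track_index=0)

# Navigate from the high-level object down to the low-level ones.
muon_hits = muon.track(tracks).hits(hits)
```

Each hit is stored once in its own collection, and the muon reaches it through the track, so no information is duplicated.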

The RECO data includes quantities required for typical analysis usage patterns such as: track re-finding, calorimeter reclustering, and jet energy calibration. The RECO event content is documented in the Reference Manual at RECO Event Content, RECO data tier.


AOD are derived from the RECO information to provide data for physics analysis in a convenient, compact format. AOD data are usable directly by physics analyses. AOD data will be produced by the same, or subsequent, processing steps as produce the RECO data; and AOD data will be made easily available at multiple sites to CMS members. The AOD will contain enough information about the event to support all the typical usage patterns of a physics analysis. Thus, it will contain a copy of all the high-level physics objects (such as muons, electrons, taus, etc.), plus a summary of the RECO information sufficient to support typical analysis actions such as track refitting with improved alignment or kinematic constraints, re-evaluation of energy and/or position of ECAL clusters based on analysis-specific corrections. The AOD, because of the limited size that will not allow it to contain all the hits, will typically not support the application of novel pattern recognition techniques, nor the application of new calibration constants, which would typically require the use of RECO or RAW information.

The AOD data tier will contain physics objects: tracks with associated hits, calorimetric clusters with associated hits, vertices, jets and high-level physics objects (electrons, muons, Z boson candidates, and so on).
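
Deriving AOD from RECO can be thought of as selecting a subset of the RECO collections. The sketch below is a deliberate simplification: the collection labels and the `AOD_KEEP` selection are assumptions for illustration, the full hit collection is dropped entirely, and the actual AOD content is defined by the software release.

```python
# Hypothetical RECO event: collection label -> list of objects.
reco_event = {
    "hits": ["hit0", "hit1", "hit2"],
    "clusters": ["cluster0"],
    "tracks": ["track0"],
    "vertices": ["vertex0"],
    "muons": ["muon0"],
}

# Assumed selection for illustration only.
AOD_KEEP = {"clusters", "tracks", "vertices", "muons"}


def derive_aod(reco, keep):
    """Return a new event holding only the collections named in `keep`."""
    return {label: list(objs) for label, objs in reco.items() if label in keep}


aod_event = derive_aod(reco_event, AOD_KEEP)
```

Because the full hit collection is gone, an analysis running on `aod_event` could not rerun pattern recognition, which is the limitation described above.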

Because the AOD data tier is relatively compact, all Tier-1 computing centres are able to keep a full copy of the AOD, while they will hold only a subset of the RAW and RECO data tiers.

Reference Documentation for RECO and AOD Data Format Packages

Doxygen-generated reference documentation on data format packages is provided for every release. It is accessible at the following links:

These links provide a list of all packages related to the RECO and AOD data formats within the CMSSW repository. Links there point to the package documentation. Starting from the CMSSW Documentation Main Page, you should in principle be able to research packages using any of the views presented (Data, Functional, Detector).

-- Reference: WorkBookCMSSW (CMSSWWorkBook)

-- DongHoMoon - 13 Nov 2007

Topic revision: r1 - 2007-12-03 - GaramHahn