Event access

Problem statement

Today, LHC data is stored as ROOT [1] files. Some of the analyses run on these files are relatively simple analytical queries, which are currently implemented as hand-written C++ programs. It has already been proposed that declarative queries would likely be easier for physicists to write [2].

As far as I know, this approach never went beyond a proof-of-concept.

It could be interesting to let physicists query the storage directly for the set of events they are interested in, rather than relying on remote I/O or downloading whole files to their machine or compute node.

Proposed solution

A declarative API running on a storage element (e.g. EOS [3]), so that users can ask directly for the set of events they are interested in, rather than writing complex C++ programs that have to search file by file.

This solution should perform at least as well as a bare C++ program.
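To make the contrast concrete, here is a minimal sketch of what a declarative selection could look like next to its hand-written equivalent. Everything in it is an assumption made for illustration: the event fields (`event_id`, `n_muons`, `missing_et`), the `Condition` and `select` names, and the in-memory list standing in for ROOT files. It is not an existing EOS or ROOT API, and Python is used only for brevity.

```python
from dataclasses import dataclass
import operator

# Hypothetical in-memory stand-in for events that would live in ROOT files.
EVENTS = [
    {"event_id": 1, "n_muons": 2, "missing_et": 35.0},
    {"event_id": 2, "n_muons": 0, "missing_et": 80.5},
    {"event_id": 3, "n_muons": 3, "missing_et": 12.1},
    {"event_id": 4, "n_muons": 2, "missing_et": 5.4},
]

# Comparison operators a query is allowed to use.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Condition:
    """One declarative predicate: <column> <op> <value> (hypothetical type)."""
    column: str
    op: str
    value: float

    def matches(self, event):
        return OPERATORS[self.op](event[self.column], self.value)


def select(events, columns, where):
    """Return the requested columns of every event satisfying all conditions."""
    return [
        {name: event[name] for name in columns}
        for event in events
        if all(cond.matches(event) for cond in where)
    ]


def handwritten_loop(events):
    """Imperative equivalent: the selection logic is baked into the program."""
    selected = []
    for event in events:
        if event["n_muons"] >= 2 and event["missing_et"] > 10.0:
            selected.append({"event_id": event["event_id"]})
    return selected


# The user states *what* they want; how it is scanned is up to the service.
query_result = select(
    EVENTS,
    columns=["event_id"],
    where=[Condition("n_muons", ">=", 2), Condition("missing_et", ">", 10.0)],
)
print(query_result)
```

Because the query is plain data rather than code, a storage-side service would be free to decide how to evaluate it (which files to open, in what order, on which node) without the user changing anything.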

Possible approaches

There have been prototyping efforts to use the CPUs of the EOS disk servers to run user jobs. Part of the system could follow the same approach: run jobs on the EOS disk servers, then aggregate the results.
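The run-then-aggregate idea can be sketched as a simple scatter/gather. This is only an illustration under stated assumptions: the `SERVERS` mapping, the server names, and the `run_local_job` and `scatter_gather` functions are hypothetical, and a thread pool stands in for jobs that would really execute on separate disk servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each disk server holds its own subset of the events.
SERVERS = {
    "server-a": [{"event_id": 1, "n_muons": 2}, {"event_id": 2, "n_muons": 0}],
    "server-b": [{"event_id": 3, "n_muons": 3}],
    "server-c": [{"event_id": 4, "n_muons": 1}, {"event_id": 5, "n_muons": 2}],
}


def run_local_job(server_name, predicate):
    """Runs next to the data: scan local events, return only the matches."""
    return [e["event_id"] for e in SERVERS[server_name] if predicate(e)]


def scatter_gather(predicate):
    """Send the same selection to every server, then merge partial results."""
    with ThreadPoolExecutor() as pool:
        # Scatter: one job per server (threads simulate remote execution).
        partials = pool.map(
            lambda name: run_local_job(name, predicate), sorted(SERVERS)
        )
        # Gather: concatenate the per-server lists of matching event ids.
        merged = [event_id for partial in partials for event_id in partial]
    return sorted(merged)


selected = scatter_gather(lambda e: e["n_muons"] >= 2)
print(selected)
```

Only the matching event identifiers travel back to the aggregation step, which is the point of the approach: the bulk of the data never leaves the server that stores it.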

Difficulties

  1. ROOT - An Object Oriented Data Analysis Framework, Rene Brun and Fons Rademakers

  2. Adaptive query processing on RAW data, Manos Karpathiotakis et al.

  3. Exabyte Scale Storage at CERN, Andreas J. Peters and Lukasz Janyst

  4. Presto