191 changes: 175 additions & 16 deletions io/io/inc/ROOT/RFile.hxx
@@ -10,11 +10,14 @@

#include <ROOT/RError.hxx>

#include <deque>
#include <memory>
#include <iostream>
#include <string_view>
#include <typeinfo>

class TFile;
class TIterator;
class TKey;

namespace ROOT {
@@ -29,6 +32,122 @@ ROOT::RLogChannel &RFileLog();

} // namespace Internal

/// Given a "path-like" string (like foo/bar/baz), returns a pair `{ dirName, baseName }`.
/// `baseName` will be empty if the string ends with '/'.
/// `dirName` will be empty if the string contains no '/'.
/// `dirName`, if not empty, always ends with a '/'.
/// NOTE: this function does no semantic checking or path expansion, nor does it interact with the
/// filesystem in any way (so it won't follow symlinks or anything like that).
/// Moreover it doesn't trim the path in any way, so any leading or trailing whitespace will be preserved.
/// This function does not perform any copy: the returned string_views have the same lifetime as `path`.
std::pair<std::string_view, std::string_view> DecomposePath(std::string_view path);
Member

This one sounds like an internal/utility function. Have you considered moving it to the implementation file directly? If it must stay in the header, I believe it should go in the Internal namespace.

Contributor Author

I think this can be a useful function for users to manipulate paths more easily, but arguably it could be considered "advanced" enough to go into Detail

Member

Ok, then I have two followup comments:

  • If the function is generic enough, i.e. it could be useful on its own even if RFile did not exist, then it should be placed in core/foundation/inc/ROOT/StringUtils.hxx or somewhere close imho.
  • If it's a function that can be situationally useful but only for users of RFile, then I would still put it at least in Detail and then it could be left in this header.
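
For reference while reading this thread, the documented contract of `DecomposePath` can be sketched in a few lines. This is only an illustration written from the doc comment above, not the PR's implementation; the name `DecomposePathSketch` is hypothetical.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical stand-in for DecomposePath, written from its doc comment only.
// No copies are made: both views point into `path`.
std::pair<std::string_view, std::string_view> DecomposePathSketch(std::string_view path)
{
   const auto lastSlash = path.rfind('/');
   if (lastSlash == std::string_view::npos)
      return {std::string_view{}, path}; // no '/': dirName is empty
   // dirName keeps its trailing '/'; baseName is empty if the path ends with '/'.
   return {path.substr(0, lastSlash + 1), path.substr(lastSlash + 1)};
}
```

Under these assumptions, `"foo/bar/baz"` decomposes into `"foo/bar/"` and `"baz"`, and whitespace is preserved as documented. Nothing in the sketch depends on RFile, which is relevant to the question of where the function should live.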


class RFileKeyIterable;

/**
\class ROOT::Experimental::RKeyInfo
\ingroup RFile
\brief Information about an RFile object's Key.

Every object inside a ROOT file has an associated "Key" which contains metadata on the object, such as its name, type
Member

Why the "Key"? I find both the capitalisation and the usage of quotes a bit misleading.

Contributor Author

Because this sentence is defining what a "Key" is in the context of a ROOT file

etc.
Querying this information can be done via RFile::ListKeys(). Reading an object's Key
doesn't deserialize the full object, so it's a relatively lightweight operation.
*/
class RKeyInfo final {
friend class ROOT::Experimental::RFileKeyIterable;

public:
enum class ECategory : std::uint16_t {
kInvalid,
kObject,
kDirectory
};

private:
std::string fPath;
std::string fTitle;
std::string fClassName;
std::uint16_t fCycle = 0;
ECategory fCategory = ECategory::kInvalid;

public:
/// Returns the absolute path of this key, i.e. the directory part plus the object name.
const std::string &GetPath() const { return fPath; }
/// Returns the base name of this key, i.e. the name of the object without the directory part.
std::string GetBaseName() const { return std::string(DecomposePath(fPath).second); }
const std::string &GetTitle() const { return fTitle; }
const std::string &GetClassName() const { return fClassName; }
std::uint16_t GetCycle() const { return fCycle; }
ECategory GetCategory() const { return fCategory; }
};

/// The iterable returned by RFile::ListKeys()
class RFileKeyIterable final {
using Pattern_t = std::string;

TFile *fFile = nullptr;
Pattern_t fPattern;
std::uint32_t fFlags = 0;

public:
class RIterator {
friend class RFileKeyIterable;

struct RIterStackElem {
// This is ugly, but TList returns an (owning) pointer to a polymorphic TIterator...and we need this class
// to be copy-constructible.
std::shared_ptr<TIterator> fIter;
std::string fDirPath;

// Outlined to avoid including TIterator.h
RIterStackElem(TIterator *it, const std::string &path = "");
// Outlined to avoid including TIterator.h
~RIterStackElem();
Member

Ideally I would encourage use of the rule of five, especially since this class is required to be copy-constructible.

Contributor Author

Is it really useful in this case though? The only reason we're defining the dtor explicitly is to outline it in the C++ file, but it's not doing anything special, so I don't think we need to also specify the ctors
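
To make the trade-off in this thread concrete, here is a small sketch of what a user-declared destructor changes for a type holding a `std::shared_ptr`. The two structs are hypothetical stand-ins for `RIterStackElem` (they hold a `shared_ptr<int>` rather than a `TIterator`), so this shows the general language rule, not the PR's code.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in: only the destructor is user-declared.
struct DtorOnly {
   std::shared_ptr<int> fIter;
   explicit DtorOnly(int v) : fIter(std::make_shared<int>(v)) {}
   ~DtorOnly() {} // user-declared: no implicit move operations are generated
};

// Hypothetical stand-in: all five special members are spelled out.
struct RuleOfFive {
   std::shared_ptr<int> fIter;
   explicit RuleOfFive(int v) : fIter(std::make_shared<int>(v)) {}
   ~RuleOfFive() = default;
   RuleOfFive(const RuleOfFive &) = default;
   RuleOfFive &operator=(const RuleOfFive &) = default;
   RuleOfFive(RuleOfFive &&) = default;
   RuleOfFive &operator=(RuleOfFive &&) = default;
};

// Both types stay copy-constructible, which is what the iterator needs.
static_assert(std::is_copy_constructible_v<DtorOnly>);
static_assert(std::is_copy_constructible_v<RuleOfFive>);

// Returns the shared_ptr use count seen by the target after a std::move.
template <typename T>
long UseCountAfterMove()
{
   T a(42);
   T b(std::move(a));
   return b.fIter.use_count();
}
```

With `DtorOnly`, the "move" silently falls back to a copy (use count 2); with `RuleOfFive`, ownership is transferred (use count 1). Copy-constructibility is unaffected either way, so the difference is limited to an extra reference-count update per move.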


// fDirPath doesn't need to be compared because it's implied by fIter.
bool operator==(const RIterStackElem &other) const { return fIter == other.fIter; }
};

// Using a deque to have pointer stability
std::deque<RIterStackElem> fIterStack;
Pattern_t fPattern;
const TKey *fCurKey = nullptr;
std::uint16_t fRootDirNesting = 0;
std::uint32_t fFlags = 0;

void Advance();

// NOTE: `iter` here is an owning pointer (or null)
RIterator(TIterator *iter, Pattern_t pattern, std::uint32_t flags);

public:
using iterator = RIterator;
using iterator_category = std::input_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = RKeyInfo;
using pointer = const value_type *;
using reference = const value_type &;

iterator &operator++()
{
Advance();
return *this;
}
value_type operator*();
bool operator!=(const iterator &rh) const { return !(*this == rh); }
bool operator==(const iterator &rh) const { return fIterStack == rh.fIterStack; }
};

RFileKeyIterable(TFile *file, std::string_view rootDir, std::uint32_t flags)
: fFile(file), fPattern(std::string(rootDir)), fFlags(flags)
{
}

RIterator begin() const;
RIterator end() const;
};
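
On the "Using a deque to have pointer stability" comment in `RIterator`: the property being relied on is that `std::deque::push_back` never invalidates references or pointers to existing elements, unlike `std::vector`, which may reallocate. A minimal standalone sketch follows; the function name and the string payload are illustrative only.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Returns true if a pointer to the first element is still valid and unchanged
// after `n` further push_back calls on the same deque.
bool FirstElementStaysPut(int n)
{
   std::deque<std::string> stack;
   stack.push_back("root/");
   const std::string *first = &stack.front();
   for (int i = 0; i < n; ++i)
      stack.push_back("dir" + std::to_string(i) + "/");
   return first == &stack.front() && *first == "root/";
}
```

The same check with `std::vector` would be undefined behavior once a reallocation occurs, which is presumably why a deque is used for the iterator stack.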

/**
\class ROOT::Experimental::RFile
\ingroup RFile
@@ -37,16 +156,15 @@ ROOT::RLogChannel &RFileLog();
## When and why should you use RFile

RFile is a modern and minimalistic interface to ROOT files, both local and remote, that can be used instead of TFile
when the following conditions are met:
- you want a simple interface that makes it easy to do things right and hard to do things wrong;
- you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities;
- you want more robustness and better error reporting for those operations;
- you want clearer ownership semantics expressed through the type system rather than having objects "automagically"
handled for you via implicit ownership of raw pointers.
when you only need basic Put/Get operations and don't need the more advanced TFile/TDirectory functionalities.
Member

Slightly off-topic, but I would prefer seeing this text in RFile.cxx or in an associated .md in the future. The reasoning is that documentation updates shouldn't recompile the world.

Contributor Author

> The reasoning is that documentation updates shouldn't recompile the world.

In a way that makes sense, but the counter-reasoning is that a user that only has access to headers (e.g. because they installed ROOT via the system package manager) should still be able to read the documentation from them. So I'm not very convinced about it, especially considering that documentation is not expected to change frequently in the long run

Member

I think ROOT wasn't doing this in the past, but it's a good argument to start doing it now ...

Member

This is my completely personal view. If we picked 100 random users and asked them about our docs:

(I'm keeping out search engine/ROOT forum/LLM queries from this count)

So I don't believe the placement between headers or source files will make a difference for our users. On the other hand, if it makes a difference for us, it might still be worth understanding how to approach this best. That being said, we have never made a clear decision on this topic before.

Contributor Author (@silverweed, Oct 17, 2025)

I have always found documentation in the implementation a very weird choice and my personal preference is having it on the declaration.
If we never made a decision and we don't have a clear rule, my preference is to keep it as-is, unless there are strong reasons to do otherwise.

It provides:
- a simple interface that makes it easy to do things right and hard to do things wrong;
- more robustness and better error reporting for those operations;
- clearer ownership semantics expressed through the type system.

RFile doesn't try to cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not
a 1:1 replacement for them. It is meant to simplify the most common use cases and make them easier to handle by
minimizing the amount of ROOT-specific quirks and conforming to more standard C++ practices.
RFile doesn't cover the entirety of use cases covered by TFile/TDirectory/TDirectoryFile and is not
a 1:1 replacement for them. It is meant to simplify the most common use cases by following newer standard C++
practices.

## Ownership model

@@ -65,13 +183,11 @@ file).

## Directories

Differently from TFile, the RFile class itself is not also a "directory". In fact, there is no RDirectory class at all.

Directories are still an existing concept in RFile (since they are a concept in the ROOT binary format),
but they are usually interacted with indirectly, via the use of filesystem-like string-based paths. If you Put an object
in an RFile under the path "path/to/object", "object" will be stored under directory "to" which is in turn stored under
directory "path". This hierarchy is encoded in the ROOT file itself and it can provide some optimization and/or
conveniencies when querying objects.
Even though there is no equivalent of TDirectory in the RFile API, directories are still an existing concept in RFile
(since they are a concept in the ROOT binary format). However they are for now only interacted with indirectly, via the
use of filesystem-like string-based paths. If you Put an object in an RFile under the path "path/to/object", "object"
will be stored under directory "to" which is in turn stored under directory "path". This hierarchy is encoded in the
ROOT file itself and it can provide some optimization and/or conveniences when querying objects.

For the most part, it is convenient to think about RFile in terms of a key-value storage where string-based paths are
used to refer to arbitrary objects. However, given the hierarchical nature of ROOT files, certain filesystem-like
@@ -96,8 +212,11 @@ auto myObj = file->Get<TH1D>("h");
~~~
*/
class RFile final {
/// Flags used in PutInternal()
enum PutFlags {
/// When encountering an object at the specified path, overwrite it with the new one instead of erroring out.
kPutAllowOverwrite = 0x1,
/// When overwriting an object, preserve the existing one and create a new cycle, rather than removing it.
kPutOverwriteKeepCycle = 0x2,
};

@@ -126,6 +245,12 @@ class RFile final {
TKey *GetTKey(std::string_view path) const;

public:
enum EListKeyFlags {
kListObjects = 1 << 0,
kListDirs = 1 << 1,
kListRecursive = 1 << 2,
};

// This is arbitrary, but it's useful to avoid pathological cases
static constexpr int kMaxPathNesting = 1000;

@@ -196,6 +321,40 @@ public:

/// Flushes the RFile if needed and closes it, disallowing any further reading or writing.
void Close();

/// Returns an iterable over all keys of objects and/or directories written into this RFile starting at path
/// `basePath` (defaulting to include the content of all subdirectories).
/// By default, keys referring to directories are not returned: only those referring to leaf objects are.
/// If `basePath` is the path of a leaf object, only `basePath` itself will be returned.
/// `flags` is a bitmask specifying the listing mode.
/// If `(flags & kListObjects) != 0`, the listing will include keys of non-directory objects (default);
/// If `(flags & kListDirs) != 0`, the listing will include keys of directory objects;
/// If `(flags & kListRecursive) != 0`, the listing will recurse on all subdirectories of `basePath` (default),
/// otherwise it will only list immediate children of `basePath`.
///
/// Example usage:
/// ~~~{.cpp}
/// for (RKeyInfo key : file->ListKeys()) {
/// /* iterate over all objects in the RFile */
/// cout << key.GetPath() << ";" << key.GetCycle() << " of type " << key.GetClassName() << "\n";
/// }
/// for (RKeyInfo key : file->ListKeys("", kListDirs|kListObjects|kListRecursive)) {
/// /* iterate over all objects and directories in the RFile */
/// }
/// for (RKeyInfo key : file->ListKeys("a/b", kListObjects)) {
/// /* iterate over all objects that are immediate children of directory "a/b" */
/// }
/// for (RKeyInfo key : file->ListKeys("foo", kListDirs|kListRecursive)) {
/// /* iterate over all directories under directory "foo", recursively */
/// }
/// ~~~
RFileKeyIterable ListKeys(std::string_view basePath = "", std::uint32_t flags = kListObjects | kListRecursive) const
{
return RFileKeyIterable(fFile.get(), basePath, flags);
}
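
The flag semantics documented for `ListKeys` can be modeled on a flat list of paths. The sketch below is a toy model under stated assumptions: `ToyKey` and `ToyListKeys` are hypothetical, the enum is redeclared with the same values as `EListKeyFlags` so the example is standalone, and the special case where `basePath` names a leaf object is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Same values as RFile::EListKeyFlags in the diff, redeclared for the sketch.
enum EListKeyFlags : std::uint32_t {
   kListObjects = 1 << 0,
   kListDirs = 1 << 1,
   kListRecursive = 1 << 2,
};

// Hypothetical toy entry: a path plus whether it names a directory.
struct ToyKey {
   std::string fPath;
   bool fIsDir;
};

// Returns the paths under directory `basePath` selected by `flags`.
std::vector<std::string>
ToyListKeys(const std::vector<ToyKey> &keys, std::string_view basePath, std::uint32_t flags)
{
   std::string prefix(basePath);
   if (!prefix.empty() && prefix.back() != '/')
      prefix += '/';
   std::vector<std::string> out;
   for (const auto &key : keys) {
      if (key.fPath.compare(0, prefix.size(), prefix) != 0)
         continue; // not under basePath
      // A further '/' after the prefix means the key is not an immediate child.
      const bool nested = key.fPath.find('/', prefix.size()) != std::string::npos;
      if (nested && !(flags & kListRecursive))
         continue;
      if (key.fIsDir ? (flags & kListDirs) : (flags & kListObjects))
         out.push_back(key.fPath);
   }
   return out;
}
```

In this model, `kListObjects` and `kListDirs` select which kinds of keys are returned, while `kListRecursive` only controls how deep the listing goes. Passing neither of the first two flags yields an empty listing regardless of recursion.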

/// Prints the internal structure of this RFile to the given stream.
void Print(std::ostream &out = std::cout) const;
Member

Related to Print's API.

Historically we had Print(const char *opt = "") and ls(const char *opt = ""), with the latter usually printing a more compact version of the former.

We may want to consider having a consistent 'experience' for printing information about the object across the new APIs. (Yes, it is no longer imposed by TObject but I believe we should still be consistent).

  • Examples that could lead to the desire to customize the output for RFile:

How are we planning to support the user 'pre-placing' an object hierarchy (currently supported via shared ownership)? At some point, I think, we considered the option of RFile holding a collection of shared pointers. Thus one might want to Print different things about the RFile.

Another reason for wanting customization would be to 'limit' the amount of information printed (for example to the first level).

  • Current RNTuple related interfaces (subset that has Print in the name)
tree/ntuple/inc/ROOT/RNTupleMetrics.hxx:   void Print(std::ostream &output, const std::string &prefix = "") const;
tree/ntuple/inc/ROOT/RNTupleReader.hxx:   void PrintInfo(const ENTupleInfo what = ENTupleInfo::kSummary, std::ostream &output = std::cout) const;
tree/ntuple/inc/ROOT/RNTupleProcessor.hxx:   virtual void PrintStructureImpl(std::ostream &output) const = 0;
tree/ntuple/inc/ROOT/RNTupleProcessor.hxx:   void PrintStructure(std::ostream &output = std::cout) { PrintStructureImpl(output); }
tree/ntuple/inc/ROOT/RFieldVisitor.hxx:   void PrintIndent();
tree/ntuple/inc/ROOT/RFieldVisitor.hxx:   void PrintName(const ROOT::RFieldBase &field);
tree/ntuple/inc/ROOT/RFieldVisitor.hxx:   void PrintCollection(const ROOT::RFieldBase &field);
tree/ntuple/inc/ROOT/RFieldVisitor.hxx:   void PrintRecord(const ROOT::RFieldBase &field);
tree/ntuple/inc/ROOT/RNTupleDescriptor.hxx:   void PrintInfo(std::ostream &output) const;

Contributor Author

I agree that we will need a more customizable Print, but I think it would be the job of a future PR to define that. In this PR it's mostly there for quick debugging and is not intended as a final interface.
We might also want to discuss the interface uniformity issue that you bring up, as the convenience of a single ls() across "all" possible objects is undeniable from the user's perspective (mostly in the context of interactive use).
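
As one possible shape for the customization discussed in this thread, a printing function that takes a `std::ostream &` can gain a depth limit without changing how callers redirect its output. The helper below is purely hypothetical (`PrintPaths` and `maxDepth` are not part of RFile or of this PR) and works on a plain list of paths.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of RFile: prints one path per line, skipping
// paths nested under more than `maxDepth` directories (negative: no limit).
void PrintPaths(const std::vector<std::string> &paths, std::ostream &out = std::cout, int maxDepth = -1)
{
   for (const auto &path : paths) {
      // The number of '/' characters is used as the nesting depth.
      const auto depth = std::count(path.begin(), path.end(), '/');
      if (maxDepth >= 0 && depth > maxDepth)
         continue;
      out << path << '\n';
   }
}
```

Because the stream is a parameter, the output can be captured in a `std::ostringstream` for tests or logging, while the default still targets `std::cout` for interactive use.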

};

} // namespace Experimental