-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[ntuple] fixes to subfield name handling #20629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[ntuple] fixes to subfield name handling #20629
Conversation
For fields that are constructed using existing item subfields, ensure that the subfields are called _0, _1, ... in the final type. This issue has been addressed for untyped collections only in commit bdce4b5 The issue is, however, present in many other field constructors, too.
In RFieldZero and RRecordField, check that the passed subfields have pairwise distinct names.
Test Results 22 files 22 suites 4d 5h 50m 55s ⏱️ Results for commit e8d96e7. |
| std::string expected{std::string("") + | ||
| "$ Field 1 : arrayasrvecfield (ROOT::VecOps::RVec<float>) $\n" + | ||
| "$ Field 1.1 : myfloat (float) $\n"}; | ||
| "$ Field 1.1 : _0 (float) $\n"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a degradation. The user used on line 121 the name myfloat which is now no longer represented in the schema ... is that the intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because the binary format specification says that the item field has name _0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm .... when reading back how will the user properly retrieve myfloat vs myotherfloat if they are both turned into _0 and the name are completely forgotten?
i.e. Is the name really gone (likely a blocker of sort) or is it just an artefact of the printing?
| std::string expected{std::string("") + | ||
| "$ Field 1 : arrayasrvecfield (ROOT::VecOps::RVec<float>) $\n" + | ||
| "$ Field 1.1 : myfloat (float) $\n"}; | ||
| "$ Field 1.1 : _0 (float) $\n"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because the binary format specification says that the item field has name _0
| /// A public version of the Attach method that allows piece-wise construction of the zero field. | ||
| /// Will throw on duplicate subfield names. | ||
| void Attach(std::unique_ptr<RFieldBase> child); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, RFieldBase::Attach is non-virtual so "overriding" it in the base class only works if we statically know the type of the RFieldZero - is that the case?
|
|
||
| void ROOT::RFieldZero::Attach(std::unique_ptr<RFieldBase> child) | ||
| { | ||
| const std::string childName = child->GetFieldName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const std::string childName = child->GetFieldName(); | |
| const std::string &childName = child->GetFieldName(); |
to avoid the copy?
| if (fSubFieldNames.count(childName) > 0) | ||
| throw RException(R__FAIL("duplicate field name: " + childName)); | ||
| RFieldBase::Attach(std::move(child), ""); | ||
| fSubFieldNames.insert(childName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do this in one go:
| if (fSubFieldNames.count(childName) > 0) | |
| throw RException(R__FAIL("duplicate field name: " + childName)); | |
| RFieldBase::Attach(std::move(child), ""); | |
| fSubFieldNames.insert(childName); | |
| if (!fSubFieldNames.insert(childName).second) | |
| throw RException(R__FAIL("duplicate field name: " + childName)); | |
| RFieldBase::Attach(std::move(child), ""); |
| { | ||
| fTraits |= kTraitTrivialType; | ||
| fOffsets.reserve(itemFields.size()); | ||
| std::unordered_set<std::string> fieldNames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the local std::unordered_set, we probably only need std::string_view as the item type and avoid the string copies
| fOffsets.reserve(itemFields.size()); | ||
| std::unordered_set<std::string> fieldNames; | ||
| for (auto &item : itemFields) { | ||
| const auto itemName = item->GetFieldName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will copy the field name
| if (fieldNames.count(itemName) > 0) { | ||
| throw RException(R__FAIL("duplicate field name: " + itemName)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (fieldNames.count(itemName) > 0) { | |
| throw RException(R__FAIL("duplicate field name: " + itemName)); | |
| } | |
| if (!fieldNames.insert(itemName).second) { | |
| throw RException(R__FAIL("duplicate field name: " + itemName)); | |
| } |
The following two fixes to the treatment of subfield names: