[ntuple] fix schema evolution with streamer fields #18451
What do you mean by "fix de-s11n of streamer info records"?
diagRAII.requiredDiag(kError, "TBufferFile::ReadVersion", "Could not find the StreamerInfo with a checksum of",
                      false /* matchFullMessage */);
diagRAII.requiredDiag(kError, "TBufferFile::CheckByteCount", "object of class TemperatureKelvin read too few bytes",
                      false /* matchFullMessage */);
I don't understand those error messages. What is the case being tested here?
I write a TemperatureCelsius and then try to read it as TemperatureKelvin.
Humm ... why isn't the error message closer to "Can not convert (or read) a TemperatureCelsius (in)to a TemperatureKelvin"? Without the corresponding checks the failure could be 'serious'. If I understood correctly, the user could have a class A with version 2 which contains a long, then request the reading of a class B, also with version 2, which contains a pointer. The I/O would succeed but not apply any check or conversion, and thus a random number would be stored in the pointer. Did I miss something?
I think a proper exception is part of the TODO written above. I don't understand your example: how is this different from the case tested here, which even has compatible class layouts?
I am probably mis-reading the actual test but ...
"TBufferFile::ReadVersion", "Could not find the StreamerInfo with a checksum of",
This message should appear only if there is something wrong with the file and the StreamerInfo record has not been properly stored. Seeing this indicates either a file corruption or a serious deficiency in ROOT (or maybe the user's code).
"TBufferFile::CheckByteCount", "object of class TemperatureKelvin read too few bytes",
This message should only appear if there is an unexpected impedance mismatch between the StreamerInfo and the data onfile. Seeing this indicates either a file corruption or a serious deficiency in ROOT (or maybe the user's code).
As far as my example is concerned, let me try to clarify it. With:
class A {
int value;
};
class B {
int value;
};
class C {
int value;
};
class D {
long value;
};
i.e. 3 distinct classes with the exact same layout and one with a slight difference (and, because they have different names, 4 different checksums),
and with a dictionary for all 4 and the following rules:
#pragma read sourceClass="A" targetClass="C";
#pragma read sourceClass="A" targetClass="D" checksum="[correct_value_for_A]" source="int value" \
target="value" code="{ value = onfile.value*100; } "
you have the following consequences:
Reading From / Onfile | Reading Into / In Memory | Outcome |
---|---|---|
A | A | allowed |
A | B | NOT allowed, the user did not say those 2 types were equivalent |
A | C | allowed - need a trivial conversion StreamerInfo |
A | D | allowed (see note (1)) - need a non-trivial conversion StreamerInfo |
In the not-allowed cases, we should get an error message akin to: Can not read a "A" into a "B"
Note (1): since the user specified a checksum, if the file also contains instances of A with a different schema (e.g. class A { long value; } from a different release of the software), the conversion is not allowed for those. In this particular example, the user may or may not have wanted to multiply value when reading those other As.
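The table and note (1) can be modelled with a small, ROOT-independent sketch. Everything in it (ReadRule, ReadOutcome, CheckRead, DescribeFailure) is a hypothetical stand-in chosen for illustration, not ROOT API, and it deliberately ignores same-class schema evolution beyond treating A into A as allowed.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one "#pragma read" rule; this is not a ROOT type.
struct ReadRule {
   std::string sourceClass;
   std::string targetClass;
   // Empty: the rule applies to every on-file schema of sourceClass.
   // Set: the rule applies only to the on-file schema with this checksum.
   std::optional<unsigned> sourceChecksum;
};

enum class ReadOutcome { kSameClass, kConversion, kNotAllowed };

// Decide whether data written as onfileClass (schema checksum onfileChecksum)
// may be read into inMemoryClass, given the user-declared rules.
ReadOutcome CheckRead(const std::string &onfileClass, unsigned onfileChecksum,
                      const std::string &inMemoryClass, const std::vector<ReadRule> &rules)
{
   if (onfileClass == inMemoryClass)
      return ReadOutcome::kSameClass;
   for (const auto &rule : rules) {
      if (rule.sourceClass != onfileClass || rule.targetClass != inMemoryClass)
         continue;
      // A checksum-restricted rule only matches the schema it was written for.
      if (!rule.sourceChecksum || *rule.sourceChecksum == onfileChecksum)
         return ReadOutcome::kConversion;
   }
   // Identical layouts are not enough: without a rule the read is refused.
   return ReadOutcome::kNotAllowed;
}

// Error text in the spirit of the message proposed above.
std::string DescribeFailure(const std::string &onfileClass, const std::string &inMemoryClass)
{
   return "Can not read a \"" + onfileClass + "\" into a \"" + inMemoryClass + "\"";
}
```

With the rules A to C (no checksum) and A to D (restricted to one checksum of A), this sketch refuses A into B, and also refuses A into D for any other on-file schema of A.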
// TODO(jblomer): this should fail with an exception when we connect the page source
humm ... Maybe the TODO and I agree :) ... Assuming that is correct, I strongly suggest that implementing the TODO is actually 'urgent', as the current situation (if I understand correctly) will be very confusing to users: they will likely think that there is a file corruption or a bug in ROOT rather than a bug in their code.
I guess then we are not using the TClass / TBufferFile optimally. Currently, when we read the data from a streamer field, we do
TBufferFile buffer(TBuffer::kRead, nbytes);
// Fill buffer.Buffer() with nbytes of data
fClass->Streamer(to, buffer);
Now, if the TClass instance pointed to by fClass is not compatible with the streamed data on disk, we get the error messages tested for in the test. How can I make TClass report a better error message?
Currently, when we read the data from a streamer field, we do
The check is meant to be done beforehand. RNTuple/TTree know what is stored in the column/branch and what the user is passing, and need to check the compatibility of the two. For TTree, this is done in TTree::CheckBranchAddressType, in particular the lines:
ptrClass->GetSchemaRules()->HasRuleWithSourceClass( expectedClass->GetName() ) ) {
and (for collections)
inmemValueClass->GetSchemaRules()->HasRuleWithSourceClass(onfileValueClass->GetName() ) )
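As a rough illustration of checking before streaming, the following self-contained sketch refuses the read up front and throws with a readable message instead of letting the streamer fail later. RuleSet and ReadChecked are hypothetical stand-ins: in ROOT the lookup would be the HasRuleWithSourceClass calls quoted above, and the callback stands in for fClass->Streamer(to, buffer).

```cpp
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the schema-rule lookup; not a ROOT type.
struct RuleSet {
   // Pairs of (on-file class name, in-memory class name) declared as convertible.
   std::set<std::pair<std::string, std::string>> rules;

   bool HasRule(const std::string &onfileClass, const std::string &inMemoryClass) const
   {
      return rules.count({onfileClass, inMemoryClass}) > 0;
   }
};

// Run the streamer only if the on-file type may be read into the in-memory type;
// otherwise throw before any byte is interpreted.
void ReadChecked(const std::string &onfileClass, const std::string &inMemoryClass,
                 const RuleSet &ruleSet, const std::function<void()> &streamer)
{
   const bool compatible = (onfileClass == inMemoryClass) || ruleSet.HasRule(onfileClass, inMemoryClass);
   if (!compatible) {
      throw std::runtime_error("Can not read a \"" + onfileClass + "\" into a \"" + inMemoryClass + "\"");
   }
   streamer(); // stands in for fClass->Streamer(to, buffer)
}
```

In this sketch, reading a TemperatureCelsius as a TemperatureKelvin without a declared rule throws immediately, so the user sees a type mismatch rather than messages that suggest file corruption.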
int fInt = 0;
int fAdded = 137;
To test the indirect collection of StreamerInfo, there should be a pointer to another class here (possibly initialized with an instance of a derived class).
I think we already test that with
root/tree/ntuple/test/rfield_streamer.cxx, line 126 in 031f6bd:
TEST(RField, StreamerPoly)
Technically the test linked will succeed whether or not the StreamerInfo are stored ... since they are still in memory from the already loaded dictionary.
i.e. like here, to test the feature one needs 2 separate processes and 2 distinct schemas/versions.
Ok, I left the unit tests unchanged but I added this as an integration test.
Test Results: 18 files, 18 suites, 4d 7h 0m 24s. Results for commit 9851b52. This comment has been updated with latest results.
Two bugs are fixed in |
Co-authored-by: Jonas Hahnfeld <[email protected]>
Register streamer infos of the extra type info when connecting streamer field to page source.
b901277
to
a7f514a
Compare
8c6aa8f
to
9851b52
Compare
No description provided.