-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OrcSchemaConversionValidator to avoid infinite recursion in AvroOrcSchemaConverter.getOrcSchema() #3794
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3794 +/- ##
============================================
+ Coverage 47.32% 47.39% +0.06%
- Complexity 10955 10989 +34
============================================
Files 2152 2156 +4
Lines 85114 85179 +65
Branches 9451 9469 +18
============================================
+ Hits 40279 40368 +89
+ Misses 41184 41154 -30
- Partials 3651 3657 +6
... and 19 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
adbc308
to
e0bd51d
Compare
…rcSchemaConverter.getOrcSchema()
19bae2d
to
55b8951
Compare
// Try converting the avro schema to orc schema to check if any errors. | ||
int maxRecursiveDepth = this.state.getPropAsInt(MAX_RECURSIVE_DEPTH_KEY, DEFAULT_MAX_RECURSIVE_DEPTH); | ||
AvroOrcSchemaConverter.tryGetOrcSchema(schema, 0, maxRecursiveDepth); | ||
} catch (StackOverflowError e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you plan to catch the StackOverflowError anyway, why do we even need to introduce the depth here?
public boolean validate(KafkaTopic topic) throws Exception { | ||
LOGGER.debug("Validating ORC schema conversion for topic {}", topic.getName()); | ||
try { | ||
Schema schema = (Schema) this.schemaRegistry.getLatestSchema(topic.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema from schema registry is not the final schema, especially we do have converter to change the schema there and some times remove the recursive schema to enable ingestion, so I believe we should use converted schema to do validation here
int maxRecursiveDepth = this.state.getPropAsInt(MAX_RECURSIVE_DEPTH_KEY, DEFAULT_MAX_RECURSIVE_DEPTH); | ||
AvroOrcSchemaConverter.tryGetOrcSchema(schema, 0, maxRecursiveDepth); | ||
} catch (StackOverflowError e) { | ||
LOGGER.warn("Failed to covert latest schema to ORC schema for topic: {}", topic.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible to detect which filed is recursive and include in this error message? so it will be easier for us to find the problematic filed and re-enable the ingestion when necessary
AvroOrcSchemaConverter.tryGetOrcSchema(schema, 0, maxRecursiveDepth); | ||
} catch (StackOverflowError e) { | ||
LOGGER.warn("Failed to covert latest schema to ORC schema for topic: {}", topic.getName()); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should emit some event to indicate this kind of error to avoid silent failure here. Same for other validator.
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
GobblinOrcWriter
instance, during which the recursive call forAvroOrcSchemaConverter.getOrcSchema()
never ends and causes StackOverflowError.OrcSchemaConversionValidator
which can skip the topic that can cause the above issue.Tests
Commits