-
Notifications
You must be signed in to change notification settings - Fork 2.9k
NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle. #10013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @Freedom9339.
I understand that this uses Spring Boot following the convention of the Standard Content Viewer. However, after evaluating that implementation, this can be implemented using the standard Servlet API. See the following pull request:
The dependency on the Parquet Processors is a non-starter for the Content Viewer itself.
If you can start by reworking these things, this could be considered for initial review.
If you are uncertain about these elements, it would be better to defer this.
|
@exceptionfactory Thank you for the feedback. I've removed the dependency on nifi-parquet-processors and removed the use of springboot. Thank you. |
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @Freedom9339.
The LICENSE and NOTICE files need some work as they appear to be copied from the standard viewer and do not match the actual content of the NAR.
As for as the rendering itself, I'm not sure whether fixed-length table is the best approach, given the Parquet files may contain complex types. One option to consider could be writing the GenericRecord as JSON, which could simplify the implementation. What do you think?
nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-content-viewer/pom.xml
Outdated
Show resolved
Hide resolved
nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-content-viewer/pom.xml
Outdated
Show resolved
Hide resolved
...nifi-parquet-bundle/nifi-parquet-content-viewer/src/main/webapp/META-INF/nifi-content-viewer
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
@exceptionfactory Thank you for the review. I've made changes to the LICENSE and NOTICE files to match the parquet CV bundle. I changed the fixed length table view to a JSON view. |
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @Freedom9339, this looks like it is moving in a good direction. I highlighted several additional areas to adjust.
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...wer/src/main/java/org/apache/nifi/parquet/web/controller/ParquetContentViewerController.java
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
|
Thanks for the updates, the logs should be visible from the Actions view. With the recent release of NiFi 2.5.0, the version in the |
Thank you, I had tried doing that but it wasn't working for me. In any case, I've updated the release version. Thank You! |
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking closer to completion. I noted a few more things regarding packaging and optional query parameters that appear unnecessary.
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
|
|
||
| APACHE NIFI SUBCOMPONENTS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These additional licenses mention old versions, and some of the libraries do not appear to be referenced, so this needs to be reviewed and updated to match what actually ends up in the NAR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, this is the 1st time i deal with licenses. I've added additional libraries and reviewed the versions. Please let me know if I missed something.
nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-content-viewer-nar/pom.xml
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @Freedom9339, there are a few more issues to resolve with writing JSON, but the overall structure looks like it is getting closer to completion.
nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-shared/pom.xml
Outdated
Show resolved
Hide resolved
| response.getOutputStream().write(",\n".getBytes()); | ||
| } | ||
|
|
||
| objectMapper.writerWithDefaultPrettyPrinter().writeValue(response.getOutputStream(), objectMapper.readTree(record.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ObjectWriter instance from writerWithDefaultPrettyPrinter() should be declared once before the while loop and reused. The reference to the response OutputStream should also be declared in a try-with-resources block.
The readTree() call should not be needed, the record should be passed to writeValue()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried removing readTree(), but it does not format the record appropriately without it.
"{\"PassengerId\": 1, \"Survived\": 0,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a snippet of how that looks? I would expect ObjectWriter.writeValue(outputStream, record) to write the record as JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I meant sharing a snippet of the code that is not working as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to use the type provided by the parquet schema to determine if the value is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Freedom9339, I believe more substantive changes are required. This approach only works for scalar values, and would not work for complex types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, The best approach I can think of is how I had it implemented in an earlier iteration where we get the JSON string from the parquet reader and use Jackson to prettify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The round trip required for parsing and serializing JSON for each record is less then optimal in that scenario. It sounds like using the lower-level Jackson writer, then iterating through and writing each field name, along with each value, could be a way forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I haven't had a chance to work on this until now. I've added support for complex types as well.
91f0171 to
b28554b
Compare
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for keeping this moving @Freedom9339, it looks close to completion. Can you rebase again and update all version references to 2.7.0-SNAPSHOT?
...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java
Outdated
Show resolved
Hide resolved
…o file size being loaded. Changed output from table view to json.
…ved unnecessary code.
|
@exceptionfactory I've rebased and made the suggested changes. Though for some reason I don't see the commit in the PR. I see it in the branch in my personal repo from which I submitted the PR from. |
b28554b to
9c4258b
Compare
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for rebasing @Freedom9339. On final review, I noticed a number of duplicative dependencies included in the Content Viewer WAR, many of which come from the parent nifi-hadoop-libraries-nar. After verifying successful processing, I pushed a commit with those changes, along with some minor adjustments to the servlet class for writing records.
I plan to merge soon following successful builds.
exceptionfactory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for sticking with this @Freedom9339! +1 merging
- Moved shared classes to `nifi-parquet-shared` - Reduced size of `nifi-parquet-nar` with provided scope for Hadoop libraries Co-authored-by: David Handermann <[email protected]> Signed-off-by: David Handermann <[email protected]>

Summary
NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle. The viewer is only visible if built with the include-hadoop profile. The viewer will format the parquet file content into a fixed width table.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation