NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle. #10013

Freedom9339 · 2025-06-12T16:32:31Z

Summary

NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle. The viewer is only visible if built with the include-hadoop profile. The viewer formats Parquet file content into a fixed-width table.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Apache NiFi Jira issue created

Pull Request Tracking

Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

Pull Request based on current revision of the main branch
Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

Build completed using mvn clean install -P contrib-check
- JDK 21

Licensing

New dependencies are compatible with the Apache License 2.0 according to the License Policy
New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

Documentation formatting appears as expected in rendered files

exceptionfactory

Thanks for working on this @Freedom9339.

I understand that this uses Spring Boot following the convention of the Standard Content Viewer. However, after evaluating that implementation, this can be implemented using the standard Servlet API. See the following pull request:

#10012

The dependency on the Parquet Processors is a non-starter for the Content Viewer itself.

If you can start by reworking these things, this could be considered for initial review.

If you are uncertain about these elements, it would be better to defer this.

Freedom9339 · 2025-06-16T17:26:26Z

@exceptionfactory Thank you for the feedback. I've removed the dependency on nifi-parquet-processors and the use of Spring Boot.

exceptionfactory

Thanks for the updates @Freedom9339.

The LICENSE and NOTICE files need some work as they appear to be copied from the standard viewer and do not match the actual content of the NAR.

As far as the rendering itself, I'm not sure whether a fixed-length table is the best approach, given that Parquet files may contain complex types. One option to consider could be writing the GenericRecord as JSON, which could simplify the implementation. What do you think?

nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-content-viewer/pom.xml

...nifi-parquet-bundle/nifi-parquet-content-viewer/src/main/webapp/META-INF/nifi-content-viewer

nifi-extension-bundles/nifi-parquet-bundle/pom.xml

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

Freedom9339 · 2025-07-22T15:11:47Z

> Thanks for the updates @Freedom9339.
>
> The LICENSE and NOTICE files need some work as they appear to be copied from the standard viewer and do not match the actual content of the NAR.
>
> As far as the rendering itself, I'm not sure whether a fixed-length table is the best approach, given that Parquet files may contain complex types. One option to consider could be writing the GenericRecord as JSON, which could simplify the implementation. What do you think?

@exceptionfactory Thank you for the review.

I've made changes to the LICENSE and NOTICE files to match the parquet CV bundle.

I changed the fixed length table view to a JSON view.

exceptionfactory

Thanks for the updates @Freedom9339, this looks like it is moving in a good direction. I highlighted several additional areas to adjust.

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

...wer/src/main/java/org/apache/nifi/parquet/web/controller/ParquetContentViewerController.java

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

exceptionfactory · 2025-07-24T15:43:00Z

Thanks for the updates, the logs should be visible from the Actions view. With the recent release of NiFi 2.5.0, the version in the main branch is now 2.6.0-SNAPSHOT, so if you can set that version in the new modules, that should work.

Freedom9339 · 2025-07-24T16:16:20Z

> Thanks for the updates, the logs should be visible from the Actions view. With the recent release of NiFi 2.5.0, the version in the main branch is now 2.6.0-SNAPSHOT, so if you can set that version in the new modules, that should work.

Thank you. I had tried doing that, but it wasn't working for me. In any case, I've updated the release version. Thank you!

exceptionfactory

This is looking closer to completion. I noted a few more things regarding packaging and optional query parameters that appear unnecessary.

nifi-assembly/pom.xml

exceptionfactory · 2025-07-25T18:38:32Z

...dles/nifi-parquet-bundle/nifi-parquet-content-viewer-nar/src/main/resources/META-INF/LICENSE

+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+APACHE NIFI SUBCOMPONENTS:


These additional licenses mention old versions, and some of the libraries do not appear to be referenced, so this needs to be reviewed and updated to match what actually ends up in the NAR.

I apologize, this is the first time I have dealt with licenses. I've added additional libraries and reviewed the versions. Please let me know if I missed something.

nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-content-viewer-nar/pom.xml

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

exceptionfactory

Thanks for the updates @Freedom9339, there are a few more issues to resolve with writing JSON, but the overall structure looks like it is getting closer to completion.

nifi-extension-bundles/nifi-parquet-bundle/nifi-parquet-shared/pom.xml

exceptionfactory · 2025-08-04T19:32:05Z

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

+                    response.getOutputStream().write(",\n".getBytes());
+                }
+
+                objectMapper.writerWithDefaultPrettyPrinter().writeValue(response.getOutputStream(), objectMapper.readTree(record.toString()));


The ObjectWriter instance from writerWithDefaultPrettyPrinter() should be declared once before the while loop and reused. The reference to the response OutputStream should also be declared in a try-with-resources block.

The readTree() call should not be needed; the record should be passed to writeValue() directly.
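
For illustration, the two structural suggestions in this comment (create the writer once before the loop, and scope the output stream with try-with-resources) can be sketched with the JDK alone. This is a sketch under stated assumptions, not the code from the pull request: `RecordFormatter` is a hypothetical stand-in for Jackson's `ObjectWriter`, records are represented as pre-serialized strings, a `ByteArrayOutputStream` stands in for the servlet response stream, and the surrounding array brackets are assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

final class ReusedWriterSketch {

    /** Hypothetical stand-in for a reusable writer such as Jackson's ObjectWriter. */
    interface RecordFormatter {
        void write(OutputStream stream, String record) throws IOException;
    }

    /** Writes all records to the target stream, separated by a comma and newline. */
    static void writeRecords(final OutputStream target, final Iterator<String> records) {
        // Created once, before the loop, instead of once per record
        final RecordFormatter formatter =
                (stream, record) -> stream.write(record.getBytes(StandardCharsets.UTF_8));

        // try-with-resources closes the stream even when writing a record fails
        try (OutputStream out = target) {
            out.write('['); // array delimiters are an assumption of this sketch
            boolean first = true;
            while (records.hasNext()) {
                if (!first) {
                    // Explicit charset avoids depending on the platform default
                    out.write(",\n".getBytes(StandardCharsets.UTF_8));
                }
                formatter.write(out, records.next());
                first = false;
            }
            out.write(']');
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRecords(buffer, Arrays.asList("{\"PassengerId\":1}", "{\"PassengerId\":2}").iterator());
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

One caveat when applying the same shape with Jackson: `writeValue()` on an `OutputStream` closes the target by default (`JsonGenerator.Feature.AUTO_CLOSE_TARGET`), so repeated calls inside a loop may require disabling that feature.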

I've tried removing readTree(), but it does not format the record appropriately without it.
"{\"PassengerId\": 1, \"Survived\": 0,

Can you provide a snippet of how that looks? I would expect ObjectWriter.writeValue(outputStream, record) to write the record as JSON.

This is a sample file I'm using with fake data.

Thanks, I meant sharing a snippet of the code that is not working as expected.

I changed it to use the type provided by the parquet schema to determine if the value is a string.

@Freedom9339, I believe more substantive changes are required. This approach only works for scalar values, and would not work for complex types.

I see. The best approach I can think of is how I had it implemented in an earlier iteration, where we get the JSON string from the parquet reader and use Jackson to prettify it.

The round trip required for parsing and serializing JSON for each record is less than optimal in that scenario. It sounds like using the lower-level Jackson writer, then iterating through and writing each field name, along with each value, could be a way forward.

Sorry, I haven't had a chance to work on this until now. I've added support for complex types as well.
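
The field-by-field approach discussed in this thread can be sketched as follows. This is a minimal illustration, not the merged implementation: it uses plain `java.util.Map` and `Iterable` values as stand-ins for Avro records and arrays, and a hand-written `java.io.Writer` routine in place of Jackson's streaming generator, so that it runs on the JDK alone. The class name and the `Cabin` and `Tags` fields are made up for the example. The point it shows is that each field name and value is written directly, with recursion for complex types, so no per-record parse and re-serialize round trip is needed.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class StreamingRecordWriterSketch {

    /** Writes one value, recursing into maps (records) and iterables (arrays). */
    static void writeValue(final Writer out, final Object value) throws IOException {
        if (value == null) {
            out.write("null");
        } else if (value instanceof Map) {
            // Nested record or map: write each field name followed by its value
            out.write('{');
            boolean first = true;
            for (final Map.Entry<?, ?> field : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    out.write(',');
                }
                writeString(out, String.valueOf(field.getKey()));
                out.write(':');
                writeValue(out, field.getValue());
                first = false;
            }
            out.write('}');
        } else if (value instanceof Iterable) {
            // Array type: write each element in order
            out.write('[');
            boolean first = true;
            for (final Object element : (Iterable<?>) value) {
                if (!first) {
                    out.write(',');
                }
                writeValue(out, element);
                first = false;
            }
            out.write(']');
        } else if (value instanceof Number || value instanceof Boolean) {
            // Scalars that are not quoted; non-finite doubles are not handled here
            out.write(value.toString());
        } else {
            // Everything else is written as a quoted string
            writeString(out, value.toString());
        }
    }

    /** Writes a quoted string, escaping quotes, backslashes, and control characters. */
    static void writeString(final Writer out, final String text) throws IOException {
        out.write('"');
        for (int i = 0; i < text.length(); i++) {
            final char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                out.write('\\');
                out.write(c);
            } else if (c < 0x20) {
                out.write(String.format("\\u%04x", (int) c));
            } else {
                out.write(c);
            }
        }
        out.write('"');
    }

    /** Convenience wrapper that serializes a value to a String. */
    static String toJson(final Object value) {
        final StringWriter out = new StringWriter();
        try {
            writeValue(out, value);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        final Map<String, Object> cabin = new LinkedHashMap<>();
        cabin.put("deck", "C");
        cabin.put("number", 85);
        final Map<String, Object> record = new LinkedHashMap<>();
        record.put("PassengerId", 1);
        record.put("Survived", 0);
        record.put("Cabin", cabin);
        record.put("Tags", Arrays.asList("a", "b"));
        // prints {"PassengerId":1,"Survived":0,"Cabin":{"deck":"C","number":85},"Tags":["a","b"]}
        System.out.println(toJson(record));
    }
}
```

A real viewer would dispatch on the schema type of each field rather than on the Java runtime type, and would need handling for types such as bytes, enums, and logical types, which this sketch leaves out.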

exceptionfactory

Thanks for keeping this moving @Freedom9339, it looks close to completion. Can you rebase again and update all version references to 2.7.0-SNAPSHOT?

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

Freedom9339 · 2025-10-09T15:10:54Z

@exceptionfactory I've rebased and made the suggested changes. For some reason, though, I don't see the commit in the PR. I do see it in the branch in my personal repo from which I submitted the PR.

exceptionfactory

Thanks for rebasing @Freedom9339. On final review, I noticed a number of duplicative dependencies included in the Content Viewer WAR, many of which come from the parent nifi-hadoop-libraries-nar. After verifying successful processing, I pushed a commit with those changes, along with some minor adjustments to the servlet class for writing records.

I plan to merge soon following successful builds.

exceptionfactory

Thanks again for sticking with this @Freedom9339! +1 merging

- Moved shared classes to `nifi-parquet-shared`
- Reduced size of `nifi-parquet-nar` with provided scope for Hadoop libraries

Co-authored-by: David Handermann <[email protected]>
Signed-off-by: David Handermann <[email protected]>

exceptionfactory requested changes Jun 12, 2025

Freedom9339 force-pushed the NIFI-14615 branch from d41133f to 0f825ac Compare June 16, 2025 17:25

exceptionfactory requested changes Jul 9, 2025

Freedom9339 force-pushed the NIFI-14615 branch from 0f825ac to 1a7ffae Compare July 22, 2025 15:01

exceptionfactory requested changes Jul 22, 2025

Freedom9339 force-pushed the NIFI-14615 branch from 2fa6d86 to 68d5844 Compare July 24, 2025 16:15

exceptionfactory requested changes Jul 25, 2025

Freedom9339 force-pushed the NIFI-14615 branch from 5b14082 to 2b931e7 Compare July 31, 2025 14:55

exceptionfactory requested changes Aug 4, 2025

Freedom9339 force-pushed the NIFI-14615 branch from 91f0171 to b28554b Compare September 3, 2025 14:02

exceptionfactory requested changes Oct 8, 2025

...t-viewer/src/main/java/org/apache/parquet/web/controller/ParquetContentViewerController.java

Freedom9339 added 12 commits October 9, 2025 13:29

NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle.

5d24b48

NIFI-14615 - removed processor and springboot dependency

05aa6ff

NIFI-14615 - Moved shared utils to nifi-parquet-shared. Added limit to file size being loaded. Changed output from table view to json.

e029d5c

NIFI-14615 - Refactored to use jackson and ByteArrayOutputStream.

aefaeb9

NIFI-14615 - Refactored to use prettyprint writer.

06a3a68

NIFI-14615 - Changed to 2.6.0-SNAPSHOT

547020d

NIFI-14615 - Missed one version update

a7d1186

NIFI-14615 - Moved parquet processors and cv under a single nar. Removed unnecessary code.

ae8bda2

NIFI-14615 - Removed unnecessary properties, reverted to manually format records.

c128364

NIFI-14615 - Only quote string values

eeeb629

NIFI-14615 - Added logic to handle complex data types.

1d26746

NIFI-14615 - Rebased.

9c4258b

Freedom9339 force-pushed the NIFI-14615 branch from b28554b to 9c4258b Compare October 9, 2025 15:26

NIFI-14615 Streamlined dependencies and record writing

04453ed

exceptionfactory reviewed Oct 11, 2025

NIFI-14615 Set properties for HttpComponent versions

a83f54c

exceptionfactory approved these changes Oct 11, 2025

exceptionfactory merged commit 4911447 into apache:main Oct 11, 2025
7 checks passed
