Skip to content

Conversation

@Freedom9339
Copy link
Contributor

Summary

NIFI-14615: Added a parquet content viewer to the nifi-parquet-bundle. The viewer is only visible if built with the include-hadoop profile. The viewer will format the parquet file content into a fixed width table.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @Freedom9339.

I understand that this uses Spring Boot following the convention of the Standard Content Viewer. However, after evaluating that implementation, this can be implemented using the standard Servlet API. See the following pull request:

#10012

The dependency on the Parquet Processors is a non-starter for the Content Viewer itself.

If you can start by reworking these things, this could be considered for initial review.

If you are uncertain about these elements, it would be better to defer this.

@Freedom9339
Copy link
Contributor Author

@exceptionfactory Thank you for the feedback. I've removed the dependency on nifi-parquet-processors and removed the use of springboot. Thank you.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Freedom9339.

The LICENSE and NOTICE files need some work as they appear to be copied from the standard viewer and do not match the actual content of the NAR.

As for as the rendering itself, I'm not sure whether fixed-length table is the best approach, given the Parquet files may contain complex types. One option to consider could be writing the GenericRecord as JSON, which could simplify the implementation. What do you think?

@Freedom9339
Copy link
Contributor Author

Freedom9339 commented Jul 22, 2025

Thanks for the updates @Freedom9339.

The LICENSE and NOTICE files need some work as they appear to be copied from the standard viewer and do not match the actual content of the NAR.

As for as the rendering itself, I'm not sure whether fixed-length table is the best approach, given the Parquet files may contain complex types. One option to consider could be writing the GenericRecord as JSON, which could simplify the implementation. What do you think?

@exceptionfactory Thank you for the review.

I've made changes to the LICENSE and NOTICE files to match the parquet CV bundle.

I changed the fixed length table view to a JSON view.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Freedom9339, this looks like it is moving in a good direction. I highlighted several additional areas to adjust.

@exceptionfactory
Copy link
Contributor

Thanks for the updates, the logs should be visible from the Actions view. With the recent release of NiFi 2.5.0, the version in the main branch is now 2.6.0-SNAPSHOT, so if you can set that version in the new modules, that should work.

@Freedom9339
Copy link
Contributor Author

Thanks for the updates, the logs should be visible from the Actions view. With the recent release of NiFi 2.5.0, the version in the main branch is now 2.6.0-SNAPSHOT, so if you can set that version in the new modules, that should work.

Thank you, I had tried doing that but it wasn't working for me. In any case, I've updated the release version. Thank You!

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking closer to completion. I noted a few more things regarding packaging and optional query parameters that appear unnecessary.

See the License for the specific language governing permissions and
limitations under the License.

APACHE NIFI SUBCOMPONENTS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These additional licenses mention old versions, and some of the libraries do not appear to be referenced, so this needs to be reviewed and updated to match what actually ends up in the NAR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize, this is the 1st time i deal with licenses. I've added additional libraries and reviewed the versions. Please let me know if I missed something.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Freedom9339, there are a few more issues to resolve with writing JSON, but the overall structure looks like it is getting closer to completion.

response.getOutputStream().write(",\n".getBytes());
}

objectMapper.writerWithDefaultPrettyPrinter().writeValue(response.getOutputStream(), objectMapper.readTree(record.toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ObjectWriter instance from writerWithDefaultPrettyPrinter() should be declared once before the while loop and reused. The reference to the response OutputStream should also be declared in a try-with-resources block.

The readTree() call should not be needed, the record should be passed to writeValue()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried removing readTree(), but it does not format the record appropriately without it.
"{\"PassengerId\": 1, \"Survived\": 0,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a snippet of how that looks? I would expect ObjectWriter.writeValue(outputStream, record) to write the record as JSON.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

This is a sample file I'm using with fake data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I meant sharing a snippet of the code that is not working as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to use the type provided by the parquet schema to determine if the value is a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Freedom9339, I believe more substantive changes are required. This approach only works for scalar values, and would not work for complex types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, The best approach I can think of is how I had it implemented in an earlier iteration where we get the JSON string from the parquet reader and use Jackson to prettify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The round trip required for parsing and serializing JSON for each record is less then optimal in that scenario. It sounds like using the lower-level Jackson writer, then iterating through and writing each field name, along with each value, could be a way forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I haven't had a chance to work on this until now. I've added support for complex types as well.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for keeping this moving @Freedom9339, it looks close to completion. Can you rebase again and update all version references to 2.7.0-SNAPSHOT?

@Freedom9339
Copy link
Contributor Author

@exceptionfactory I've rebased and made the suggested changes. Though for some reason I don't see the commit in the PR. I see it in the branch in my personal repo from which I submitted the PR from.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for rebasing @Freedom9339. On final review, I noticed a number of duplicative dependencies included in the Content Viewer WAR, many of which come from the parent nifi-hadoop-libraries-nar. After verifying successful processing, I pushed a commit with those changes, along with some minor adjustments to the servlet class for writing records.

I plan to merge soon following successful builds.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for sticking with this @Freedom9339! +1 merging

@exceptionfactory exceptionfactory merged commit 4911447 into apache:main Oct 11, 2025
7 checks passed
awelless pushed a commit to awelless/nifi that referenced this pull request Nov 5, 2025
- Moved shared classes to `nifi-parquet-shared`
- Reduced size of `nifi-parquet-nar` with provided scope for Hadoop libraries

Co-authored-by: David Handermann <[email protected]>
Signed-off-by: David Handermann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants