Copilot AI commented Nov 19, 2025

Description

New CLI tool compfilecmp compares two compressed file parts by reading and comparing their block index arrays. Each compressed file contains an index at trailer.indexPos with cumulative expanded sizes per block. The tool sequentially compares these indexes and reports where they diverge.
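The sequential comparison described above can be sketched as follows. This is a minimal illustration rather than the tool's actual code: the function name `countMatchingBlocks` is hypothetical, and the index is assumed to be already loaded into memory as a vector of 64-bit values (standing in for jlib's `offset_t`).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of jlib.
// Returns the number of leading index entries that are identical in both
// parts, which is also the position of the first differing entry. If one
// index is a prefix of the other, the result is the length of the shorter one.
std::size_t countMatchingBlocks(const std::vector<std::uint64_t> &a,
                                const std::vector<std::uint64_t> &b)
{
    const std::size_t limit = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < limit && a[i] == b[i])
        ++i; // entries agree, advance to the next block
    return i;
}
```

A result equal to both index lengths means the two indexes are identical; anything smaller marks the block at which they diverge.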

Implementation

New tool: tools/compfilecmp/

  • Reads CompressedFileTrailer structure from file end (backward compatible with WinCompressedFileTrailer)
  • Extracts block index (array of offset_t at indexPos)
  • Compares indexes entry-by-entry, stops at first difference
  • Reports matching blocks, expanded size, and percentages
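Reading a trailer from the end of a file and then loading the index it points to might look roughly like the sketch below. The `SketchTrailer` layout is an assumption made for illustration and is deliberately simplified; it is not the real `CompressedFileTrailer` from system/jlib/jlzw.cpp. The sketch also uses standard C++ streams instead of jlib's file interfaces, and it assumes the file was written with the host's byte order.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// HYPOTHETICAL trailer layout, for illustration only. The real
// CompressedFileTrailer in system/jlib/jlzw.cpp is not reproduced here.
struct SketchTrailer
{
    std::uint64_t indexPos;     // file offset where the block index starts
    std::uint64_t expandedSize; // total uncompressed size of the part
};

// Reads the trailer from the last sizeof(SketchTrailer) bytes of the file,
// then loads the 64-bit index entries stored between indexPos and the trailer.
std::vector<std::uint64_t> readBlockIndex(const std::string &path, SketchTrailer &trailer)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    const std::uint64_t fileSize = static_cast<std::uint64_t>(in.tellg());
    if (fileSize < sizeof(SketchTrailer))
        throw std::runtime_error("file too small for a trailer: " + path);

    const std::uint64_t trailerPos = fileSize - sizeof(SketchTrailer);
    in.seekg(static_cast<std::streamoff>(trailerPos));
    in.read(reinterpret_cast<char *>(&trailer), sizeof(trailer));
    if (!in)
        throw std::runtime_error("short read on trailer: " + path);

    // The index must sit before the trailer and hold whole 64-bit entries.
    if (trailer.indexPos > trailerPos ||
        (trailerPos - trailer.indexPos) % sizeof(std::uint64_t) != 0)
        throw std::runtime_error("index position inconsistent with file size: " + path);

    std::vector<std::uint64_t> index((trailerPos - trailer.indexPos) / sizeof(std::uint64_t));
    in.seekg(static_cast<std::streamoff>(trailer.indexPos));
    if (!index.empty())
    {
        in.read(reinterpret_cast<char *>(index.data()),
                static_cast<std::streamsize>(index.size() * sizeof(std::uint64_t)));
        if (!in)
            throw std::runtime_error("short read on index: " + path);
    }
    return index;
}
```

Deriving the entry count from the gap between `indexPos` and the trailer is a choice made for this sketch; the actual tool may obtain the block count differently.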

Output example:

File 1: 100 blocks, expanded size: 1048576, index position: 524288
File 2: 100 blocks, expanded size: 1048576, index position: 524288
First difference found at block 50:
  File 1 offset: 524288
  File 2 offset: 524300
Files match up to block 50 out of 100 blocks.

Matching expanded size: 524288 bytes
  Percentage of file 1: 50.00%
  Percentage of file 2: 50.00%
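The percentage lines in the report above follow from dividing the matching expanded size by each file's total expanded size. A minimal sketch of that arithmetic, using hypothetical helper names (`percentOf`, `formatPercentLine`) that are not taken from the tool's source:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: share of a file's expanded size covered by the
// matching prefix, as a percentage. Returns 0 for an empty file to avoid
// dividing by zero.
double percentOf(std::uint64_t matchingBytes, std::uint64_t expandedSize)
{
    if (expandedSize == 0)
        return 0.0;
    return 100.0 * static_cast<double>(matchingBytes) / static_cast<double>(expandedSize);
}

// Hypothetical helper: formats one report line in the style of the
// output example, with two decimal places.
std::string formatPercentLine(unsigned fileNumber, std::uint64_t matchingBytes,
                              std::uint64_t expandedSize)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "  Percentage of file %u: %.2f%%",
                  fileNumber, percentOf(matchingBytes, expandedSize));
    return std::string(buf);
}
```

With the figures from the example, 524288 matching bytes out of 1048576 gives 50.00% for each file.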

Structure alignment: All structures mirror system/jlib/jlzw.cpp exactly. Uses MemoryAttr for index buffers, Owned<> for file handles.

Type of change:

  • This change is a new feature (non-breaking change which adds functionality).

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

The code was validated against the jlzw.cpp implementation. Memory safety rests on smart pointer usage (Owned<>, MemoryAttr), and file I/O operations have error handling. The documentation includes test scenarios for identical files, partial matches, and files of different sizes.

Build and runtime testing requires full HPCC Platform environment with vcpkg dependencies.

Original prompt

Look at the CCompressedFile code in system/jlib and at how the compressed file structure is constructed; the end of the file format includes an offset to every compressed block. Write a new C++ program that, given two physical file parts, opens both of them, reads these lists of offsets, and compares them to each other. If they differ, it stops; if they are the same, it advances and keeps comparing. The result of the program should be a report of how much of the file appears to be the same, based on how far the comparison of the index offsets has reached.



Copilot AI changed the title from "[WIP] Add C++ program to compare compressed file offsets" to "Add compfilecmp tool for comparing compressed file block indexes" Nov 19, 2025
Copilot AI requested a review from jakesmith November 19, 2025 08:48
Copilot finished work on behalf of jakesmith November 19, 2025 08:48