Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a computed CRC32 value did not match the expected value - when CRC is in data_descriptor #141

Open
nipunn1313 opened this issue Jun 27, 2024 · 2 comments

Comments

@nipunn1313
Copy link
Contributor

Related to #122 but slightly different

I'm working with v0.0.17

Here's a zipfile that reproes
external_modules.zip

The zipfile was generated with https://www.npmjs.com/package/archiver - and seems to unzip fine with other tools.

Using the zipdetails tool, it shows this for the first couple entries.

00000 LOCAL HEADER #1       04034B50
00004 Extract Zip Spec      14 '2.0'
00005 Extract OS            00 'MS-DOS'
00006 General Purpose Flag  0008
      [Bits 1-2]            0 'Normal Compression'
      [Bit  3]              1 'Streamed'
00008 Compression Method    0008 'Deflated'
0000A Last Mod Time         58DB8572 'Thu Jun 27 16:43:36 2024'
0000E CRC                   00000000
00012 Compressed Length     00000000
00016 Uncompressed Length   00000000
0001A Filename Length       001F
0001C Extra Length          0000
0001E Filename              'node_modules/.package-lock.json'
0003D PAYLOAD

008D2 STREAMING DATA HEADER 08074B50
008D6 CRC                   528AD14A
008DA Compressed Length     00000895
008DE Uncompressed Length   0000105F

It seems that the data_descriptor includes the CRC.
I did a bit of digging in my repro and found that async-zip is using the 00000000 CRC for comparison rather than the one in the data_descriptor.

https://github.com/Majored/rs-async-zip/blob/main/SPECIFICATION.md#439
Spec seems to indicate that if header.flags.data_descriptor is set, then crc/lengths should be taken from data descriptor rather than local file header.

@nipunn1313
Copy link
Contributor Author

Like @ldanilek mentioned in #122 - Things parsed in 0.0.9 and now no longer work.

there is a workaround which is to manually bypass crc check.

entry_reader.reader_mut().read_to_end(&mut contents).await?;

instead of

            entry_reader
                .reader_mut()
                .read_to_end_checked(&mut contents)
                .await?;

Don't love bypassing the CRC check though for obvious reasons.

@nipunn1313
Copy link
Contributor Author

It looks like 0.0.9 had code like this

    /// Returns true if the computed CRC32 value of all bytes read so far matches the expected value.
    pub fn compare_crc(&mut self) -> bool {
        let hasher = std::mem::take(&mut self.hasher);
        let final_crc = hasher.finalize();

        if self.meta.general_purpose_flag.data_descriptor {
            self.data_descriptor.expect("Data descriptor was not read").0 == final_crc
        } else {
            self.entry.crc32() == final_crc
        }
    }

which on 0.0.17 no longer seems to work.

https://github.com/Majored/rs-async-zip/blob/main/src/base/read/stream.rs#L177

It seems that the CRC is not read out of the data descriptor.

Then, I think you'd want the CRC check to happen in done() - after parsing the data descriptor - rather than in read_to_end_checked, at least for these data descriptor cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant