-
Notifications
You must be signed in to change notification settings - Fork 980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix updating zips whose entries have descriptors. #475
Fix updating zips whose entries have descriptors. #475
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This not entirely correct as the "real" DataDescriptorSize SHOULD be 12 bytes, and not 16.
The reason for setting it to 16 is due to the unofficial signature 0x08074b50 that MAY be prepended to the descriptors.
I think the real solution for this is to change the constant to 12 bytes and update all references to it and making sure they handle both descriptors with and without the signature.
I think that it currently supports reading both, but the updating code just presumes that the signature is there.
I'm not sure if some cases (the ones where entries are deleted and it moves data about during the update) need/want to be copying the descriptor - given that ZipFile knows what the size etc values are at that point, it could possibly drop the descriptors instead? (section 4.3.9.1 of the appnote suggests not including them when the output is seekable, though maybe thats about creation rather than updating?) Otherwise - check if there is a descriptor signature directly after the file data, and then decide how much to copy based on it being present or not? (or maybe add the signature if it isn't there already - section 4.3.9.4 of the appnote says it should be included)
think ZipInputStream might require the signature?
presumes it's there or not there? (if the subtract 4 calculation is so that it copies the amount of data from a descriptor with no signature i mean) |
I created a report from your test: Notice that there is only one local header, this is because the signature for the second file gets written to the second part of the DataDescriptor Uncompressed Size for the previous entry. Regarding #147, if you look at the descriptor uncompressed size in hex it's
One odd thing I noticed was that the zip64 extra data for the local header is |
Some more discoveries regarding descriptors:
|
Without reverting unrelated code this time hopefully
I added my own solution to your branch, it is a bit heavier, but it should also fix those random zip files that skips the descriptor signature. Feel free to revert the changes if you don't think it is necessary, the rest of the library seems to fully disregard that possibility: // In theory this may not be a descriptor according to PKZIP appnote.
// In practise its always there.
if (intValue != ZipConstants.DataDescriptorSignature)
{
throw new ZipException("Data descriptor signature not found");
} |
I'm not sure.
which you could take to mean that in descriptor mode, the regular values get set to 0, and then because they aren't 0xFFFFFFFF the real values won't be in the Zip64 extra data (which might not serve a purpose in this situation)? My comment about 147 was referring to the local header size fields being non-zero: So having the descriptor flag set, and having the local header values being non-zero, and then putting the sizes in the descriptor as well doesn't seem right from that appnote bit? |
Oh, right, yeah it should either clear the local header sizes (0 or -1) or clear the descriptor bit. My interpretation is that when Zip64 is used, the field should still be set to -1 to indicate that the sizes are in 64-bit mode, but that the zero values should be put in the Zip64 Extra Data fields. If you would set the local header sizes to 0 there would be no way to know the lengths of the descriptor fields, right? In the report generated above we can see that it does work for Zip64:
But that is expected since the values are the same for descriptor-aided and regular entries. |
ah, yes, it needs the extra data field to infer zip64 format, which is then used to determine what size the descriptor field would be. if it's possible to clear the descriptor bit and not have to copy the descriptor then that removes the need for working out how big it is etc, which should simplify things? I'll have a look at what that change might require. |
It does simplify it, but it also makes it a bit more complicated. The code that actually caused the corruption was calculating how many bytes to skip to seek to the next entry, so that calculation would still need to be performed, unless we rewrite all entries with non-descriptor versions when doing an update, but then it's less of an update to the archive and more of a total recreation of the archive. Consider the I/O and CPU required to add a 3 byte text file to a 4 GB archive. Just seeking to the end of the last entry, appending it and then writing the new central directory is pretty fast, but moving all the data could be really expensive. |
My thought was that it could perhaps be changed in the
The other arm @
I think that would cover removing the descriptors in cases where the data has to be moved about due to entries being deleted, but in your example of just adding an extra file to the end of an archive which is otherwise unchanged it would leave everything as it does now? There's also the separate case of |
Yeah, that should be fine. It seems like a larger fix, but if you're up to it... 😄 |
I'll have a look. |
Well, the timescale is roughly "when this is done", I think. I figured this would be a good candidate, but we can also skip it and release it with 1.3.1 later |
Ok, i'll try to have another look at it tommorow. |
perhaps best to stick with this one to start with, and think about the descriptor-removal change (#512) seperately? (512 does have a couple of extra unit tests that might be useful though, will have to remind myself which changes were where) |
My initial instinct was that it was safer to do this first. It seems more like fixing a bug than altering the behaviour. But then again, the changes #512 makes more sense than sticking to the descriptors as they are inferior (and we obviously can seek if we are updating a ZipFile (or creating an altered copy)). That's why I figured it was better to just release 1.3 and pick this up in the next version, which hopefully won't take too long to get to. |
- Unit tests for using ZipFile to update file entries that have descriptors - Fix size calculation in GetDescriptorSize - Handle optional descriptor signature when updating
refs #147, and my comments in there - I'm not sure what the -4's in the descriptor size calculation are for, and removing them does appear to fix some issues that occur when updating archives whose entries contain descriptors.
I added one test for adding an entry, and one test for deleting an entry, which could perhaps be done in one go.
The addition test goes through the code path containing the TODO at
SharpZipLib/src/ICSharpCode.SharpZipLib/Zip/ZipFile.cs
Line 2753 in 32920f9
I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.