Missing text #462

Closed
wants to merge 11 commits into from

Conversation

sftse
Contributor

@sftse sftse commented Aug 29, 2024

No description provided.

@sftse
Contributor Author

sftse commented Aug 29, 2024

Based on #457 #463

@sftse sftse mentioned this pull request Aug 29, 2024
@sftse
Contributor Author

sftse commented Aug 29, 2024

This example exhibits a record type that I have not been able to locate in the spec. Do you have any idea what it could be?
It was found by differential testing: .xls files were converted to .xlsx with LibreOffice, and the text calamine extracted from the .xls was compared with the text it extracted from the .xlsx.
The dataset consists of .xls files from the Enron email dataset, dated around 2000.

When reading a PtgStr (2.5.198.89), the cch byte of ShortXLUnicodeString
indicates the number of characters in the string. The error here is twofold:

1. The byte buffer holding the string characters is prematurely truncated
based on cch before calling fn read_unicode_string_no_cch(), although
the correct length in bytes can only be known inside fn read_unicode..()
after checking the fHighByte flag. The fix is to not truncate the buffer at all
and instead pass it in its entirety, so that fn read_unicode..() may decide
how many bytes to read.

2. The offset into the buffer is then advanced based on this erroneous
length, which later leads to crashes.
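
To illustrate the two points above, here is a minimal standalone sketch. It is not calamine's actual code: the function names `read_short_xl_unicode_string` and `read_ptg_str` are hypothetical, and it assumes that a string with fHighByte unset can be decoded byte-for-byte as Latin-1 (a real reader may need to consult the workbook code page). The reader receives the whole remaining buffer, decides the byte length only after checking fHighByte, and returns the number of bytes consumed so the caller can advance its offset correctly.

```rust
/// Decode the body of a ShortXLUnicodeString (everything after the cch byte).
/// `buf` is the whole remaining buffer, deliberately NOT truncated to `cch`.
/// Returns the text and the number of bytes consumed, flag byte included.
fn read_short_xl_unicode_string(buf: &[u8], cch: usize) -> Option<(String, usize)> {
    let (&flags, rest) = buf.split_first()?;
    // Bit 0 is fHighByte: set means two bytes per character (UTF-16LE).
    let high_byte = flags & 0x01 != 0;
    let byte_len = if high_byte { cch * 2 } else { cch };
    // Bounds-checked slice: a short buffer yields None instead of a panic.
    let chars = rest.get(..byte_len)?;
    let text = if high_byte {
        let units: Vec<u16> = chars
            .chunks_exact(2)
            .map(|c| u16::from_le_bytes([c[0], c[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    } else {
        // Assumption: one byte per character, mapped to U+0000..=U+00FF.
        chars.iter().map(|&b| char::from(b)).collect()
    };
    Some((text, 1 + byte_len))
}

/// Parse the payload of a PtgStr token (the bytes after the ptg byte).
/// Returns the text and the total bytes consumed: cch + flags + characters.
fn read_ptg_str(rgce: &[u8]) -> Option<(String, usize)> {
    let (&cch, rest) = rgce.split_first()?;
    let (text, used) = read_short_xl_unicode_string(rest, cch as usize)?;
    Some((text, 1 + used))
}
```

With cch = 2 and fHighByte set, the string occupies four bytes rather than two, so truncating the buffer to cch bytes would cut the text in half and advancing the offset by cch would leave the parser in the middle of the string.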
@sftse sftse changed the title Panics Missing text Aug 30, 2024
@sftse sftse marked this pull request as draft October 7, 2024 08:36
@sftse
Contributor Author

sftse commented Oct 25, 2024

Folded into #463

@sftse sftse closed this Oct 25, 2024
@sftse sftse deleted the panics branch October 27, 2024 12:56