Releases: Unstructured-IO/unstructured

0.18.26

05 Jan 21:41
ae0efca

Fixes

  • Pin deltalake<1.3.0 to fix ARM64 Docker builds (1.3.0 missing Linux ARM64 wheels)

0.18.25

Fixes

  • Security update: Removed the pdfminer.six version constraint and bumped pdfminer.six and urllib3 to address high-severity CVEs

0.18.24

30 Dec 17:54
7f2cb4c

Enhancement

  • Optimize OCRAgentTesseract.extract_word_from_hocr (codeflash)

Fixes

  • Security update: Bumped dependencies to address security vulnerabilities

0.18.22

10 Dec 17:56
afd9118

Fixes

  • fix(deps): Bump fonttools to address cve by @CyMule in #4125

Full Changelog: 0.18.21...0.18.22

0.18.21

24 Nov 14:55
91a9888

Enhancement

  • Update save_elements unit test to check crop box padding behavior

Fixes

  • Update unstructured-inference to 1.1.2 to address CVEs

0.18.20

15 Nov 00:14
7c4d0b9

Enhancement

  • Improve the VoyageAI integration
  • Add voyage-context-3 support
  • Flag extracted elements as such in the metadata for downstream use

0.18.18

07 Nov 01:05
b01d35b

Fixes

  • Prevent path traversal in email MSG attachment filenames: Fixed a security vulnerability (GHSA-gm8q-m8mv-jj5m) where malicious attachment filenames containing path traversal sequences could cause files to be written outside the intended directory. The fix normalizes both Unix and Windows path separators before sanitizing filenames, preventing cross-platform path traversal attacks in partition_msg functions.
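
The idea behind this fix can be illustrated with a minimal sketch. This is not the library's actual code: the helper name `sanitize_attachment_filename` and the `fallback` parameter are hypothetical, and the sketch only shows the general technique of normalizing both separator styles before keeping the final path component.

```python
import posixpath


def sanitize_attachment_filename(filename: str, fallback: str = "attachment") -> str:
    """Reduce an untrusted attachment name to a bare file name.

    Hypothetical helper for illustration; not part of the unstructured API.
    """
    # Treat backslashes as separators too, so Windows-style paths are
    # neutralized even when the code runs on a POSIX system.
    normalized = filename.replace("\\", "/")
    # Keep only the final path component; directory parts are discarded.
    base = posixpath.basename(normalized).replace("\x00", "")
    # Reject names that are empty or that still refer to a directory.
    if base in ("", ".", ".."):
        return fallback
    return base


print(sanitize_attachment_filename("../../etc/passwd"))
print(sanitize_attachment_filename("..\\..\\windows\\evil.dll"))
```

Normalizing the separators first matters because a name such as `..\..\evil.dll` contains no `/` at all, so a POSIX-only basename call would return it unchanged.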

0.18.17

Enhancement

Features

Fixes

0.18.16

Enhancement

  • Speed up function _assign_hash_ids by 34% (codeflash)

0.18.15

17 Sep 14:27
2d44d73

What's Changed

New Contributors

Full Changelog: 0.18.14...0.18.15

0.18.14

26 Aug 13:25
fed8942

Enhancements

  • Speed up function sentence_count by 59% (codeflash)
  • Speed up function check_for_nltk_package by 111% (codeflash)
  • Speed up function under_non_alpha_ratio by 76% (codeflash)

0.18.13

13 Aug 23:41
0d20f6a

Fixes

  • Parse a wider variety of date formats in email headers: The partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents ValueError exceptions when partitioning emails with these date formats.
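
As a rough illustration of this kind of tolerant parsing (not the library's actual implementation), the sketch below tries the standard RFC 2822 parser first and falls back to ISO-8601, rewriting a trailing "Z" as an explicit UTC offset. The function name `parse_email_date` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_email_date(value: str) -> datetime:
    """Parse an email Date header, tolerating ISO-8601 values.

    Hypothetical helper for illustration; not part of the unstructured API.
    """
    value = value.strip()
    try:
        # Standard form, e.g. "Wed, 13 Aug 2025 23:41:00 +0000".
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError here, newer ones ValueError.
        pass
    # datetime.fromisoformat only accepts a "Z" suffix on Python 3.11+,
    # so rewrite it as an explicit UTC offset for portability.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


print(parse_email_date("Wed, 13 Aug 2025 23:41:00 +0000").isoformat())
print(parse_email_date("2025-08-13T23:41:00Z").isoformat())
```

A value that matches neither format still raises ValueError from the final `fromisoformat` call, so callers that need to keep going on bad input would wrap the call themselves.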

0.18.12

28 Jul 19:02
b8c14a7

What's Changed

  • Prevent large file content in encoding exceptions: Replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing entire file content in exception objects, which can cause issues in logging and error-reporting systems when processing large files.
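
The underlying issue is that Python's UnicodeDecodeError keeps a reference to the full input bytes in its `object` attribute. The sketch below shows one way to raise a lightweight error instead. It is an assumption-laden illustration: the `UnprocessableEntityError` class here is a local stand-in defined for the example, not the library's class, and `decode_or_raise` is a hypothetical helper.

```python
class UnprocessableEntityError(Exception):
    """Local stand-in for illustration; not imported from unstructured."""


def decode_or_raise(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes, raising an error that does not retain the payload."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.object references the entire input, so copy out only a
        # short description and let the original exception go.
        message = f"could not decode {len(data)} bytes as {encoding}: {exc.reason}"
    # Raising outside the except block keeps the UnicodeDecodeError from
    # being attached as __context__ on the new exception.
    raise UnprocessableEntityError(message)


try:
    decode_or_raise(b"\xff" + b"A" * 1000)
except UnprocessableEntityError as err:
    print(err)
```

Raising after the `except` block rather than inside it is a deliberate choice: even `raise ... from None` would leave the original exception, and therefore the payload, reachable through `__context__`.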

Full Changelog: 0.18.11...0.18.12