CHANGES.md: 25 additions & 0 deletions
@@ -1,5 +1,30 @@
# Change Log
## Changes in version 0.0.26
### Fixes:
* [282](https://github.com/pymupdf/RAG/issues/282) - Content Duplication with the latest version
* [281](https://github.com/pymupdf/RAG/issues/281) - Latest version of pymupdf4llm.to_markdown returns empty text for some PDFs.
* [280](https://github.com/pymupdf/RAG/issues/280) - Cannot extract text when ignore_images=False, can extract otherwise.
* [278](https://github.com/pymupdf/RAG/issues/278) - Title words are fragmented
* [249](https://github.com/pymupdf/RAG/issues/249) - Title duplication problem in markdown format
* [202](https://github.com/pymupdf/RAG/issues/202) - BAD RECT ISSUE
### Other Changes:
* The table module in package PyMuPDF has been modified: Its method `to_markdown()` will now output markdown-styled cell text. Previously, table cells were extracted as plain text only.
* The class `TocHeaders` is now a top-level import and can be used directly.
* Method `to_markdown` has a new parameter `detect_bg_color=True`, which tries to guess the page's background color. If detection succeeds, vectors with this fill color are ignored (the default). Setting it to `False` causes "fill" vectors to always be considered in vector graphics detection.
* Text written with a `Type 3` font will now always be considered. Previously, this text was always treated as invisible and was hence suppressed.
* The package now contains the GNU Affero GPL 3.0 license file to ease distribution (see LICENSE). The file also clarifies that PyMuPDF4LLM is dual-licensed under GNU AGPL 3.0 and individual commercial licenses.
* There is a new file `versions_file.py` which contains version information. This is used to ensure the presence of a minimum PyMuPDF version at import time.
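
The last item above describes an import-time check for a minimum PyMuPDF version. A minimal sketch of how such a check could work is shown here. The names `parse_version` and `require_minimum` and the version strings in the usage line are hypothetical and are not taken from `versions_file.py`; the sketch also assumes purely numeric, dot-separated version strings.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.26.3" into a tuple of ints.

    Assumes every component is numeric (no "rc1" or similar suffixes).
    """
    return tuple(int(part) for part in text.split("."))


def require_minimum(installed, minimum, package="PyMuPDF"):
    """Raise ImportError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{package} {minimum} or later is required, found {installed}"
        )


# Illustrative values only, not the real minimum version.
require_minimum("1.26.3", "1.26.0")  # passes silently
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.9.0" would sort after "1.26.0", which would let an outdated installation pass the check.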