Open
28 commits
3f2738f
adding initial dependencies
shanbady Dec 9, 2025
7ac79cd
add new settings
shanbady Dec 9, 2025
73ab82e
fixing ocr debug dir
shanbady Dec 9, 2025
0986a96
adjusting utils to use new converter
shanbady Dec 10, 2025
c5a79ca
setting use_ocr to true
shanbady Dec 10, 2025
7e46195
adding opendataloader converter
shanbady Dec 10, 2025
8600fba
refactored class
shanbady Dec 11, 2025
60ae0de
adding deps
shanbady Dec 11, 2025
24e80be
refactort data loader
shanbady Dec 12, 2025
a28ca6d
make output path and debug mode optional
shanbady Dec 13, 2025
3126eaa
fixing tests
shanbady Dec 13, 2025
fd88c6c
update pdf transcription method
shanbady Dec 13, 2025
287b680
adding tests
shanbady Dec 15, 2025
cb8ed3e
formating
shanbady Dec 15, 2025
1e0f45f
moving dep to separate line
shanbady Dec 15, 2025
5c469b2
switch to sample pdf
shanbady Dec 15, 2025
33d76b9
switch to sample pdf
shanbady Dec 15, 2025
870956d
fix tests
shanbady Dec 15, 2025
d745d48
Merge branch 'main' into shanbady/opendataloader-pdf-converter
shanbady Dec 15, 2025
ed0452d
switching default to 10
shanbady Dec 15, 2025
36f1eef
reformat page
shanbady Dec 17, 2025
cc85e05
removed short block detection
shanbady Dec 17, 2025
63c12b7
consolidate math regexes
shanbady Dec 17, 2025
605695c
docstrings
shanbady Dec 17, 2025
157cc79
move math threshold to settings
shanbady Dec 17, 2025
8c63cc4
Merge branch 'main' into shanbady/opendataloader-pdf-converter
shanbady Dec 17, 2025
e03b66c
fix page block id
shanbady Dec 19, 2025
7f9ba3b
tweak prompt
shanbady Dec 19, 2025
1 change: 1 addition & 0 deletions Dockerfile
@@ -12,6 +12,7 @@ RUN apt-get update && \
apt-get install -y --no-install-recommends $(grep -vE "^\s*#" apt.txt | tr "\n" " ") && \
apt-get install libpq-dev postgresql-client -y --no-install-recommends && \
apt-get install poppler-utils -y && \
apt-get install default-jre -y && \
apt-get clean && \
apt-get purge && \
rm -rf /var/lib/apt/lists/*