Is it possible to remove unused images when splitting pages? #1661
Replies: 1 comment
I was finally able to split the pages by range and delete the unused images before exporting the split pages. The function returns the PDF bytes for a range of pages from the original PDF, with unused images stripped. This works for me because I know exactly how I'm creating the original PDF with jsPDF and know its deterministic structure, so it may not be a general solution.
I figured that I needed three things: a way to get at the list of XObject images used in the document, a way to delete them, and a way to get at the raw command stream for each page. From the command stream I could figure out which images are actually used in the split pages and delete the rest.
splitPDF(originalPdf, range) takes a PDFDocument and a {start:startpage, end:endpage} range and returns the bytes of the new split PDF document for saving or other processing. In my case, I just put them in a Blob and save the file (code not provided here).
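To make the "figure out which images are actually used" step concrete, here is a minimal sketch, not the splitPDF code itself. It assumes you already have each page's content stream as decoded text and a list of the image XObject names from the page resources (names like I0 or I1 are only illustrative). The function names usedXObjectNames and unusedXObjectNames are hypothetical. Fetching the decoded stream and deleting the objects through pdf-lib is left out on purpose.

```javascript
// Sketch only: find XObject names painted with the `Do` operator in
// already-decoded content stream text. Assumes simple streams such as
// "/I0 Do"; it does not handle inline images, nested Form XObjects,
// or the text "Do" appearing inside a string literal.
function usedXObjectNames(contentStream) {
  const used = new Set();
  // A PDF name (slash followed by non-delimiter characters), whitespace,
  // then the Do operator as a separate token.
  const doOperator = /\/([^\s\/\[\]<>(){}%]+)\s+Do(?=\s|$)/g;
  for (const match of contentStream.matchAll(doOperator)) {
    used.add(match[1]);
  }
  return used;
}

// Given every image name in the resources and the content streams of the
// pages being kept, return the names that no kept page paints.
function unusedXObjectNames(allNames, contentStreams) {
  const used = new Set();
  for (const stream of contentStreams) {
    for (const name of usedXObjectNames(stream)) {
      used.add(name);
    }
  }
  return allNames.filter((name) => !used.has(name));
}
```

The names returned by unusedXObjectNames are the candidates to drop from the resources before saving. Whether that is safe depends on the document structure, which is why this only worked for me with output I control.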
I am working on splitting a PDF document (a PDF of music scores generated with a music transcription tool I've built) into individual page ranges, using a common pattern I've seen recommended for doing this sort of thing with pdf-lib:
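For reference, a minimal sketch of that kind of pattern (not necessarily the exact code from the post). It assumes the range is 1-based and inclusive, which the thread does not state. The names copyRangeInto and pageIndicesForRange are hypothetical. The only pdf-lib calls used are getPageCount(), copyPages(), addPage() and save().

```javascript
// Convert a {start, end} page range (assumed 1-based, inclusive) into the
// 0-based page indices that copyPages() expects, clamped to the document.
function pageIndicesForRange(range, pageCount) {
  const start = Math.max(1, range.start);
  const end = Math.min(pageCount, range.end);
  const indices = [];
  for (let page = start; page <= end; page++) {
    indices.push(page - 1);
  }
  return indices;
}

// Copy the requested pages from sourceDoc into targetDoc and return the
// saved bytes. Both arguments are pdf-lib PDFDocument instances.
async function copyRangeInto(targetDoc, sourceDoc, range) {
  const indices = pageIndicesForRange(range, sourceDoc.getPageCount());
  const pages = await targetDoc.copyPages(sourceDoc, indices);
  for (const page of pages) {
    targetDoc.addPage(page);
  }
  return targetDoc.save();
}

// Usage with pdf-lib (assumed to be installed), inside an async function:
//   const { PDFDocument } = require('pdf-lib');
//   const source = await PDFDocument.load(originalBytes);
//   const target = await PDFDocument.create();
//   const bytes = await copyRangeInto(target, source, { start: 2, end: 4 });
```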
Unfortunately, what I find in the split files that get written out is that all of the images referenced in the original PDF are present in the split PDF files, and I see entries for them in the context indirectObjects. The split files are essentially all the same size as the original complete PDF.
It looks like copyPages() doesn't filter out the unused images. It just copies the entire set of images referenced in the original PDF document you're copying from.
If I look at the actual operators using a PDF parser, I can see that they only reference the images used in the page range, yet each resulting PDF file is still essentially the size of the original PDF file before the split.
I've seen a few posts about issues with file size using copyPages() to split the files, and I'm guessing this is the root cause.
Anyone have a workaround?
I'd far prefer to have this feature in my tool rather than recommending that users use Adobe Acrobat to both split and optimize the PDF files I export from my tool if they want to split them into individual page ranges.