Error referencing a non-existent page destination when writing PDF #2842

ssjkamei · 2024-09-13T07:19:14Z

PdfWriter.append raises an IndexError when the source PDF has named destinations that point to pages that do not exist.

Environment

Which environment were you using when you encountered the problem?

> python -m platform
Windows-10-10.0.22631-SP0

> python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.3.1, crypt_provider=('cryptography', '41.0.5'), PIL=10.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader, PdfWriter


def test_write_pdf():
    filepath = r"C:\test.pdf"
    with open(filepath, "rb") as f:
        pdf_writer = PdfWriter()
        # The second positional argument of PdfReader is strict
        pdf_reader = PdfReader(f, strict=True)
        print(pdf_reader.metadata)
        print(pdf_reader.named_destinations)
        # This call raises IndexError: sequence index out of range
        pdf_writer.append(pdf_reader)

Sorry, we are unable to provide the PDF at the moment.
We are checking whether we can create a PDF that can be published without any problems.

Traceback

This is the complete traceback I see:

venv\venv\Lib\site-packages\pypdf\_writer.py:2365: in append
    self.merge(
venv\venv\Lib\site-packages\pypdf\_writer.py:2474: in merge
    p = reader.pages[dest["/Page"]]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pypdf._page._VirtualList object at 0x0000025E7C033650>, index = 1

    def __getitem__(
        self, index: Union[int, slice]
    ) -> Union[PageObject, Sequence[PageObject]]:
        if isinstance(index, slice):
            indices = range(*index.indices(len(self)))
            cls = type(self)
            return cls(indices.__len__, lambda idx: self[indices[idx]])
        if not isinstance(index, int):
            raise TypeError("sequence indices must be integers")
        len_self = len(self)
        if index < 0:
            # support negative indexes
            index = len_self + index
        if index < 0 or index >= len_self:
>           raise IndexError("sequence index out of range")
E           IndexError: sequence index out of range

The following may be causing the problem: the catalog references 46 0 R and 20 0 R, but 46 0 obj and 20 0 obj do not exist in the file.
I tried to fix this in Adobe Acrobat, but could not figure out how to remove the references.

30 0 obj
<</AcroForm 46 0 R/Dests 20 0 R/Extensions<</ADBE<</BaseVersion/1.7/ExtensionLevel 8>>>>/Metadata 5 0 R/Names 47 0 R/OCProperties<</D<</OFF[]/Order[]/RBGroups[]>>/OCGs[48 0 R 49 0 R 50 0 R]>>/Pages 18 0 R/StructTreeRoot 14 0 R/Type/Catalog>>
endobj
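
A rough way to confirm that a reference has no matching object is to scan the raw bytes for "N G R" references and "N G obj" definitions and compare the two sets. The following is only a heuristic sketch: `find_dangling_references` is a hypothetical helper, the sample bytes are a synthetic fragment modelled on the catalog above (not the reporter's file), and objects stored inside compressed object streams would be reported as missing even though they exist.

```python
import re
from typing import Set, Tuple

# "N G R" is an indirect reference, "N G obj" starts an object definition.
REFERENCE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
DEFINITION = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def find_dangling_references(data: bytes) -> Set[Tuple[int, int]]:
    """Return (number, generation) pairs that are referenced but never defined."""
    defined = {(int(n), int(g)) for n, g in DEFINITION.findall(data)}
    referenced = {(int(n), int(g)) for n, g in REFERENCE.findall(data)}
    return referenced - defined


# Synthetic, abbreviated fragment for illustration only.
sample = b"""30 0 obj
<</AcroForm 46 0 R/Dests 20 0 R/Pages 18 0 R/Type/Catalog>>
endobj
18 0 obj
<</Type/Pages/Count 1/Kids[19 0 R]>>
endobj
19 0 obj
<</Type/Page/Parent 18 0 R>>
endobj
"""

print(sorted(find_dangling_references(sample)))
```

On a real file the bytes would come from reading the PDF in binary mode; matches inside binary stream data can also produce false results, so the output is a hint rather than proof.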

When the file is read with PdfReader, named_destinations contains the following (the PDF has only one page):

{'/__WKANCHOR_2': {'/Title': '/__WKANCHOR_2', '/Page': 0, '/Type': '/XYZ', '/Left': 36, '/Top': 754, '/Zoom': 0.0},
 '/__WKANCHOR_4': {'/Title': '/__WKANCHOR_4', '/Page': 0, '/Type': '/XYZ', '/Left': 305, '/Top': 754, '/Zoom': 0.0},
 '/__WKANCHOR_6': {'/Title': '/__WKANCHOR_6', '/Page': 0, '/Type': '/XYZ', '/Left': 36, '/Top': 454, '/Zoom': 0.0},
 '/__WKANCHOR_8': {'/Title': '/__WKANCHOR_8', '/Page': 1, '/Type': '/XYZ', '/Left': 61, '/Top': 802, '/Zoom': 0.0},
 '/__WKANCHOR_a': {'/Title': '/__WKANCHOR_a', '/Page': 1, '/Type': '/XYZ', '/Left': 36, '/Top': 425, '/Zoom': 0.0},
 '/__WKANCHOR_c': {'/Title': '/__WKANCHOR_c', '/Page': 2, '/Type': '/XYZ', '/Left': 36, '/Top': 814, '/Zoom': 0.0},
 '/__WKANCHOR_e': {'/Title': '/__WKANCHOR_e', '/Page': 2, '/Type': '/XYZ', '/Left': 36, '/Top': 703, '/Zoom': 0.0}}
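
The mismatch is visible in this output: with a single page only index 0 is valid, yet four destinations point at pages 1 and 2. A minimal sketch for listing such entries, assuming destinations shaped like the dictionary above; `find_invalid_destinations` is a hypothetical helper, not part of pypdf, and it works on plain dicts so it runs without the PDF:

```python
from typing import Any, Dict, List


def find_invalid_destinations(
    named_destinations: Dict[str, Dict[str, Any]], page_count: int
) -> List[str]:
    """Return names of destinations whose integer /Page is outside 0..page_count-1."""
    invalid = []
    for name, dest in named_destinations.items():
        page = dest.get("/Page")
        # Only integer page numbers are checked; other reference types are skipped.
        if isinstance(page, int) and not 0 <= page < page_count:
            invalid.append(name)
    return invalid


# Trimmed copy of the data reported above (only the field the check needs).
destinations = {
    "/__WKANCHOR_2": {"/Page": 0},
    "/__WKANCHOR_4": {"/Page": 0},
    "/__WKANCHOR_6": {"/Page": 0},
    "/__WKANCHOR_8": {"/Page": 1},
    "/__WKANCHOR_a": {"/Page": 1},
    "/__WKANCHOR_c": {"/Page": 2},
    "/__WKANCHOR_e": {"/Page": 2},
}

print(find_invalid_destinations(destinations, page_count=1))
```

With a real file, pdf_reader.named_destinations and len(pdf_reader.pages) would take the place of the hard-coded values.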

I was able to avoid the error by adding the check if len(reader.pages) > dest["/Page"]: on the PdfWriter side. The current code is shown first, followed by the modified version.

pypdf/pypdf/_writer.py

Lines 2471 to 2482 in 8f62120

    
            elif isinstance(dest["/Page"], int):
                # the page reference is a page number normally not a PDF Reference
                # page numbers as int are normally accepted only in external goto
                p = reader.pages[dest["/Page"]]
                assert p.indirect_reference is not None
                try:
                    arr[NumberObject(0)] = NumberObject(
                        srcpages[p.indirect_reference.idnum].page_number
                    )
                    self.add_named_destination_array(dest["/Title"], arr)
                except KeyError:
                    pass

            elif isinstance(dest["/Page"], int):
                # the page reference is a page number normally not a PDF Reference
                # page numbers as int are normally accepted only in external goto
                if len(reader.pages) > dest["/Page"]:
                    p = reader.pages[dest["/Page"]]
                    assert p.indirect_reference is not None
                    try:
                        arr[NumberObject(0)] = NumberObject(
                            srcpages[p.indirect_reference.idnum].page_number
                        )
                        self.add_named_destination_array(dest["/Title"], arr)
                    except KeyError:
                        pass
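
The guard can also be illustrated in isolation. This is a sketch only, using a plain list in place of reader.pages; `get_page_or_none` is a hypothetical name, and unlike the one-line check above it also rejects negative numbers, which is an assumption about what a destination page number should allow:

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def get_page_or_none(pages: Sequence[T], page_number: int) -> Optional[T]:
    """Return pages[page_number], or None when the number does not name a page."""
    if 0 <= page_number < len(pages):
        return pages[page_number]
    return None


pages = ["page 0"]  # stand-in for a one-page document
for number in (0, 1, 2):
    page = get_page_or_none(pages, number)
    if page is None:
        print(f"destination to page {number}: skipped")
    else:
        print(f"destination to page {number}: kept ({page})")
```

Returning None instead of raising lets the caller drop the destination silently, which matches the intent of the modified block: destinations that cannot be resolved are simply not copied.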

With this change, the resulting PDF contains the following Dests entry. Only the three destinations that point to the existing page are written.

15 0 obj
<<
/Dests 16 0 R
>>
endobj
16 0 obj
<<
/Names [ (\057\137\137WKANCHOR\1372) [ 0 /XYZ 36 754 0.0 ] (\057\137\137WKANCHOR\1374) [ 0 /XYZ 305 754 0.0 ] (\057\137\137WKANCHOR\1376) [ 0 /XYZ 36 454 0.0 ] ]
>>
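
The names in this output are PDF literal strings with octal escapes (\057 is "/" and \137 is "_"). A small sketch for decoding them so they can be compared with the named_destinations keys; `decode_octal_escapes` is a hypothetical helper and handles only \ddd escapes, not the other escape sequences a literal string may contain:

```python
import re

# A backslash followed by one to three octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def decode_octal_escapes(text: str) -> str:
    """Replace each backslash-octal escape with the character it encodes."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8) & 0xFF), text)


# The three names as they appear in the /Names array above.
names = [
    r"\057\137\137WKANCHOR\1372",
    r"\057\137\137WKANCHOR\1374",
    r"\057\137\137WKANCHOR\1376",
]
for name in names:
    print(decode_octal_escapes(name))
```

The decoded names are the three destinations whose /Page is 0; the four that pointed at pages 1 and 2 are absent from the output.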

pubpub-zz · 2024-09-13T07:33:17Z

I'm confused about your code: what do you intend to do? I would have passed the PDF stream, or better the PDF file name, as the argument when creating the PdfWriter.
Can you check if this works better?

ssjkamei · 2024-09-13T08:34:03Z

I could not delete the font data, but here is the relevant PDF.
test.pdf

> I'm confused about your code: what do you intend to do? I would have passed the PDF stream, or better the PDF file name, as the argument when creating the PdfWriter.
> Can you check if this works better?

That does not work either.
Originally, I passed the file directly to PdfWriter. I followed the code and confirmed that it is converted to a PdfReader internally.

The error is raised when writing, probably because of the content of named_destinations, but I think the root cause is in PdfReader, which is why I wrote the example this way.

stefan6419846 added the PdfWriter, is-robustness-issue, and needs-pdf labels Sep 13, 2024

stefan6419846 removed the needs-pdf label Sep 13, 2024

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Sep 18, 2024

ROB: merge documents with named destinations with invalid page

ef587dc

closes py-pdf#2842

pubpub-zz mentioned this issue Sep 18, 2024

ROB: merge documents with named destinations with invalid page #2857

Merged

stefan6419846 closed this as completed in #2857 Sep 20, 2024

stefan6419846 closed this as completed in 36e1245 Sep 20, 2024
