PPTX parsing: bullet points not grouped correctly under subheadings #1324

@harskuma

Requested feature

While parsing PPTX files, I came across a grouping issue that could use some enhancement. Specifically, when a slide contains multiple subheadings, each with its own bullet points, the parsed output doesn’t keep the bullet points grouped under their respective subheadings.
Current version: Docling 2.28.4

For example, consider a slide like this:

Image

Currently, the extracted output looks like this:

Image

As shown in the attached screenshot, all bullet points are grouped under the first subheading, and the second subheading appears without its associated content.

Suggested Enhancement:

It would be helpful to enhance the PPTX parsing logic to:

  • Maintain bullet point association with the correct subheading
  • Possibly use text box position, text style, or slide structure hierarchy to infer grouping
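
To make the position-based idea concrete, here is a minimal sketch of one way the grouping could be inferred. It is not Docling's implementation and does not use the Docling API: `TextBlock` and `group_bullets` are hypothetical names, and the sketch assumes each parsed text item already carries the `left` and `top` offset of its text box and a flag saying whether it is a subheading. Each bullet is assigned to the subheading above it that is closest horizontally, with vertical distance as the tie-breaker.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextBlock:
    """Hypothetical stand-in for one parsed text item on a slide."""
    text: str
    left: int                 # horizontal offset of the item's text box
    top: int                  # vertical offset of the item
    is_heading: bool = False  # assumed to be known from text style


def group_bullets(blocks: List[TextBlock]) -> List[Tuple[Optional[str], List[str]]]:
    """Assign each bullet to the nearest subheading located above it."""
    by_position = lambda b: (b.top, b.left)
    headings = sorted((b for b in blocks if b.is_heading), key=by_position)
    bullets = sorted((b for b in blocks if not b.is_heading), key=by_position)

    grouped: List[List[str]] = [[] for _ in headings]
    orphans: List[str] = []  # bullets that appear above every subheading

    for bullet in bullets:
        # Only subheadings at or above the bullet can own it.
        candidates = [i for i, h in enumerate(headings) if h.top <= bullet.top]
        if not candidates:
            orphans.append(bullet.text)
            continue
        # Prefer the horizontally closest subheading, then the vertically closest.
        owner = min(
            candidates,
            key=lambda i: (abs(headings[i].left - bullet.left),
                           bullet.top - headings[i].top),
        )
        grouped[owner].append(bullet.text)

    result: List[Tuple[Optional[str], List[str]]] = [
        (h.text, items) for h, items in zip(headings, grouped)
    ]
    if orphans:
        result.insert(0, (None, orphans))
    return result


if __name__ == "__main__":
    slide = [
        TextBlock("Subheading A", left=0, top=100, is_heading=True),
        TextBlock("Point A1", left=20, top=150),
        TextBlock("Point A2", left=20, top=200),
        TextBlock("Subheading B", left=0, top=300, is_heading=True),
        TextBlock("Point B1", left=20, top=350),
    ]
    for heading, items in group_bullets(slide):
        print(heading, items)
```

Comparing horizontal distance first means the same rule covers subheadings stacked vertically and subheadings placed side by side in separate text boxes. A real fix would also need to decide how subheadings are detected (text style, placeholder type, or indentation level), which this sketch simply takes as given.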

Labels: enhancement (New feature or request), pptx (issue related to pptx backend)