@@ -325,7 +325,9 @@ private static void generateOutputs(String inputPdfName, List<List<IObject>> con
imagesDirectory = config.getImageDir();
} else {
String fileName = Paths.get(inputPdfName).getFileName().toString();
- String baseName = fileName.substring(0, fileName.length() - 4);
+ int dotIndex = fileName.lastIndexOf('.');
+ String rawBaseName = dotIndex > 0 ? fileName.substring(0, dotIndex) : fileName;
+ String baseName = rawBaseName.replace(" ", "_");
imagesDirectory = config.getOutputFolder() + File.separator + baseName + MarkdownSyntax.IMAGES_DIRECTORY_SUFFIX;
Comment on lines 327 to 331

🧹 Nitpick | 🔵 Trivial

Add a regression test for spaced filenames (and multi-dot names).

Please add or update integration coverage that validates the default image directory name for inputs such as "my paper.v1.pdf" (expected: "my_paper.v1_images"). The current ImageDirIntegrationTest only covers 1901.03003.pdf.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@java/opendataloader-pdf-core/src/main/java/org/opendataloader/pdf/processors/DocumentProcessor.java`
around lines 327 - 331, Add an integration test in ImageDirIntegrationTest to
cover filenames with spaces and multiple dots (e.g., "my paper.v1.pdf") and
assert that DocumentProcessor constructs imagesDirectory using the sanitized
base name (DocumentProcessor: variable baseName/imagesDirectory) as
"my_paper.v1_images"; update or add tests to instantiate
DocumentProcessor.process (or the relevant method that sets imagesDirectory)
with such inputPdfName and verify the output folder path matches expected
MarkdownSyntax.IMAGES_DIRECTORY_SUFFIX behavior for both spaced and multi-dot
names.
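
For reference, the naming rule introduced by this diff can be checked in isolation. The following is a minimal sketch, not the project's code: the class `ImageDirNameSketch` and the helper `defaultImagesDirName` are hypothetical names, and the `"_images"` suffix is an assumption inferred from the expected value quoted above (the real constant is `MarkdownSyntax.IMAGES_DIRECTORY_SUFFIX`). It returns only the directory name, without the output folder prefix.

```java
import java.nio.file.Paths;

public class ImageDirNameSketch {
    // Assumption: stands in for MarkdownSyntax.IMAGES_DIRECTORY_SUFFIX,
    // inferred from the expected name "my_paper.v1_images" in the review comment.
    static final String IMAGES_DIRECTORY_SUFFIX = "_images";

    // Hypothetical helper mirroring the three added lines of the diff.
    static String defaultImagesDirName(String inputPdfName) {
        String fileName = Paths.get(inputPdfName).getFileName().toString();
        // Strip only the last extension; a name without a dot (or with only a
        // leading dot) is kept whole.
        int dotIndex = fileName.lastIndexOf('.');
        String rawBaseName = dotIndex > 0 ? fileName.substring(0, dotIndex) : fileName;
        // Replace spaces so the directory name contains no spaces.
        String baseName = rawBaseName.replace(" ", "_");
        return baseName + IMAGES_DIRECTORY_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(defaultImagesDirName("my paper.v1.pdf")); // my_paper.v1_images
        System.out.println(defaultImagesDirName("1901.03003.pdf"));  // 1901.03003_images
        System.out.println(defaultImagesDirName("README"));          // README_images
    }
}
```

An integration test would still need to go through the real entry point to confirm that the directory is created under the configured output folder; the sketch only pins down the expected names for spaced, multi-dot, and extension-less inputs.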

}
StaticLayoutContainers.setImagesDirectory(imagesDirectory);