Merged
47 commits
d1bf4c4
Initial implementation for OCRmyPDF
ZiadAbdElFatah May 11, 2026
af44508
Addressed the review comments
ZiadAbdElFatah May 12, 2026
1170da2
Used pre-existing StreamGobbler for BufferedReader
ZiadAbdElFatah May 12, 2026
f8546b5
Removed unused variables
ZiadAbdElFatah May 12, 2026
349cee5
Reformatted the code to meet JabRef's code guidelines
ZiadAbdElFatah May 12, 2026
4173f5d
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 12, 2026
2cdb734
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 13, 2026
e6ba274
Added GUI to perform OCR
ZiadAbdElFatah May 13, 2026
45a9fbc
Added localized messages
ZiadAbdElFatah May 13, 2026
c08ab99
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 16, 2026
b536820
Merge branch 'JabRef:main' into feature-OCR
ZiadAbdElFatah May 16, 2026
0ea7950
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 18, 2026
9c06e02
Addressed the review comments
ZiadAbdElFatah May 18, 2026
8a92bfe
Reverted unintended change
ZiadAbdElFatah May 18, 2026
8a6e9be
Addressed the review comments
ZiadAbdElFatah May 18, 2026
e0311c6
Solved checkstyle failing check
ZiadAbdElFatah May 18, 2026
d8610e6
Added linking the new OCRed file to the used entry
ZiadAbdElFatah May 18, 2026
b3a58f6
Addressed the review commits
ZiadAbdElFatah May 18, 2026
dd79d9e
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 18, 2026
e0d19e8
Added localized string to JabRef_en.properties
ZiadAbdElFatah May 18, 2026
796b5d4
Added check for OCRMYPDF availability
ZiadAbdElFatah May 18, 2026
1a07f5d
Added a comment for readability
ZiadAbdElFatah May 18, 2026
b3ea8d6
Reduced the time of checking the availability of OCRmyPDF
ZiadAbdElFatah May 18, 2026
9c318d0
Added --skip-text to handle partial searchable pdfs
ZiadAbdElFatah May 18, 2026
9c46aac
Update jabgui/src/main/java/org/jabref/gui/linkedfile/OcrLinkedFileAc…
ZiadAbdElFatah May 18, 2026
f3490ea
Apply suggestion from @InAnYan
ZiadAbdElFatah May 18, 2026
a9dfa18
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 18, 2026
f5ec242
Extracted the wait time in a single variable
ZiadAbdElFatah May 18, 2026
239542d
Apply suggestion from @InAnYan
InAnYan May 18, 2026
1793269
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 18, 2026
8acb6c5
Added the PDF file type for the new OCRed file
ZiadAbdElFatah May 18, 2026
afc4b1d
Added some missing strings to JabRef_en.properties
ZiadAbdElFatah May 18, 2026
9d4bd44
Fix comment formatting for LinkedFile constructor
ZiadAbdElFatah May 18, 2026
72d33fa
Merge branch 'JabRef:main' into feature-OCR
ZiadAbdElFatah May 18, 2026
e843ab8
Addressed some openrewrite comments
ZiadAbdElFatah May 18, 2026
e6b51fe
Added a changelog entry
ZiadAbdElFatah May 18, 2026
11de63b
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 19, 2026
ce515cd
Link OCRed file to entries
ZiadAbdElFatah May 19, 2026
b548fb7
Improve OCRed file linking
ZiadAbdElFatah May 19, 2026
49c9b15
Remove unintended file
ZiadAbdElFatah May 19, 2026
4434905
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 22, 2026
a984da7
Address review comments
ZiadAbdElFatah May 22, 2026
7fd2621
Fix comment formatting in LinkedFile constructor
ZiadAbdElFatah May 22, 2026
c584be5
Fix comment formatting in LinkedFile constructor
ZiadAbdElFatah May 22, 2026
3cca2fb
Merge branch 'main' into feature-OCR
ZiadAbdElFatah May 22, 2026
199a8b4
Address review comment
ZiadAbdElFatah May 22, 2026
84a7f97
fix(ocr): fix failure message
InAnYan May 23, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@
- We grouped the `jabkit` consistency and integrity checks under a new `check` command (`jabkit check consistency`, `jabkit check integrity`). [#15759](https://github.com/JabRef/jabref/pull/15759)
- The `jabkit check consistency` command now supports the `errorformat` output format (`file:line:column: message`), which is the default output format for both `check` subcommands. [#15759](https://github.com/JabRef/jabref/pull/15759)
- The `jabkit check` command now runs both the consistency and integrity checks when given an input file without a subcommand (e.g. `jabkit check references.bib`). [#15759](https://github.com/JabRef/jabref/pull/15759)
- We added an OCR feature that uses OCRmyPDF to extract text from scanned PDFs and create searchable PDFs that include the extracted text. [#15712](https://github.com/JabRef/jabref/pull/15712)

### Changed

@@ -2033,7 +2034,7 @@
- Add/move/remove from group: removed completely (functionality still available through group interface)
- We removed the option to change the column widths in the preferences dialog. [#4546](https://github.com/JabRef/jabref/issues/4546)

## Older versions

Check failure on line 2037 in CHANGELOG.md (GitHub Actions / CHANGELOG.md): Missing ref link (all-h2-contain-a-version)

The changelog of JabRef 4.x is available at the [v4.3.1 tag](https://github.com/JabRef/jabref/blob/v4.3.1/CHANGELOG.md).
The changelog of JabRef 3.x is available at the [v3.8.2 tag](https://github.com/JabRef/jabref/blob/v3.8.2/CHANGELOG.md).
@@ -173,6 +173,7 @@ public enum StandardActions implements Action {
EDIT_FILE_LINK(Localization.lang("Edit"), IconTheme.JabRefIcons.EDIT, KeyBinding.OPEN_CLOSE_ENTRY_EDITOR),
DOWNLOAD_FILE(Localization.lang("Download file(s)"), IconTheme.JabRefIcons.DOWNLOAD_FILE),
REDOWNLOAD_FILE(Localization.lang("Redownload file(s)"), IconTheme.JabRefIcons.DOWNLOAD_FILE),
PERFORM_OCR(Localization.lang("Perform OCR and embed text in a new PDF file"), KeyBinding.PERFORM_OCR),

Collaborator Author
I think we should change "Perform OCR" in the message shown to the user, since most users don't know what OCR is.
"Make the PDF searchable" or something similar would be better.

RENAME_FILE_TO_PATTERN(Localization.lang("Rename file to defined pattern"), IconTheme.JabRefIcons.AUTO_RENAME),
RENAME_FILE_TO_NAME(Localization.lang("Rename file(s) to configured filename format pattern"), IconTheme.JabRefIcons.RENAME, KeyBinding.RENAME_FILE_TO_NAME),
MOVE_FILE_TO_FOLDER(Localization.lang("Move file(s) to directory"), IconTheme.JabRefIcons.MOVE_TO_FOLDER),
@@ -521,6 +521,16 @@ public LinkedFile getFile() {
return linkedFile;
}

public List<BibEntry> getLinkedEntries() {
List<BibEntry> entries = new ArrayList<>();
for (BibEntry entry : databaseContext.getEntries()) {
if (entry.hasFile(linkedFile)) {
entries.add(entry);
}
}
return entries;
}

public ValidationStatus fileExistsValidationStatus() {
return fileExistsValidator.getValidationStatus();
}
@@ -33,6 +33,7 @@

import org.jabref.gui.DialogService;
import org.jabref.gui.DragAndDropDataFormats;
import org.jabref.gui.StateManager;
import org.jabref.gui.actions.StandardActions;
import org.jabref.gui.autocompleter.SuggestionProvider;
import org.jabref.gui.copyfiles.CopyLinkedFilesAction;
@@ -56,6 +57,7 @@
import org.jabref.model.entry.BibEntryTypesManager;
import org.jabref.model.entry.LinkedFile;
import org.jabref.model.entry.field.Field;
import org.jabref.model.util.FileUpdateMonitor;

import com.airhacks.afterburner.views.ViewLoader;
import com.tobiasdiez.easybind.EasyBind;
@@ -88,6 +90,10 @@ public class LinkedFilesEditor extends HBox implements FieldEditorFX {
private TaskExecutor taskExecutor;
@Inject
private UndoManager undoManager;
@Inject
private FileUpdateMonitor fileUpdateMonitor;
@Inject
private StateManager stateManager;

private LinkedFilesEditorViewModel viewModel;

@@ -131,7 +137,11 @@ private void initialize() {
preferences,
databaseContext,
bibEntry,
- viewModel
+ viewModel,
+ taskExecutor,
+ fileUpdateMonitor,
+ undoManager,
+ stateManager
);

new ViewModelListCellFactory<LinkedFileViewModel>()
@@ -4,15 +4,20 @@
import java.util.Objects;
import java.util.Optional;

import javax.swing.undo.UndoManager;

import javafx.collections.ObservableList;
import javafx.scene.control.ContextMenu;

import org.jabref.gui.DialogService;
import org.jabref.gui.StateManager;
import org.jabref.gui.fieldeditors.LinkedFileViewModel;
import org.jabref.gui.fieldeditors.LinkedFilesEditorViewModel;
import org.jabref.gui.preferences.GuiPreferences;
import org.jabref.logic.util.TaskExecutor;
import org.jabref.model.database.BibDatabaseContext;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.util.FileUpdateMonitor;

import com.tobiasdiez.easybind.optional.ObservableOptionalValue;
import org.jspecify.annotations.NonNull;
@@ -25,9 +30,13 @@ public ContextMenuFactory(@NonNull DialogService dialogService,
@NonNull GuiPreferences preferences,
@NonNull BibDatabaseContext databaseContext,
@NonNull ObservableOptionalValue<BibEntry> bibEntry,
- @NonNull LinkedFilesEditorViewModel viewModel) {
+ @NonNull LinkedFilesEditorViewModel viewModel,
+ @NonNull TaskExecutor taskExecutor,
+ @NonNull FileUpdateMonitor fileUpdateMonitor,
+ @NonNull UndoManager undoManager,
+ @NonNull StateManager stateManager) {
this.menuBuilders = List.of(
- new SingleSelectionMenuBuilder(dialogService, databaseContext, bibEntry, preferences, viewModel),
+ new SingleSelectionMenuBuilder(dialogService, databaseContext, bibEntry, preferences, viewModel, taskExecutor, fileUpdateMonitor, undoManager, stateManager),
new MultiSelectionMenuBuilder(dialogService, databaseContext, bibEntry, preferences, viewModel)
);
}
@@ -3,19 +3,25 @@
import java.util.ArrayList;
import java.util.List;

import javax.swing.undo.UndoManager;

import javafx.collections.ObservableList;
import javafx.scene.control.MenuItem;
import javafx.scene.control.SeparatorMenuItem;

import org.jabref.gui.DialogService;
import org.jabref.gui.StateManager;
import org.jabref.gui.actions.ActionFactory;
import org.jabref.gui.actions.StandardActions;
import org.jabref.gui.copyfiles.CopyLinkedFilesAction;
import org.jabref.gui.fieldeditors.LinkedFileViewModel;
import org.jabref.gui.fieldeditors.LinkedFilesEditorViewModel;
import org.jabref.gui.linkedfile.OcrLinkedFileAction;
import org.jabref.gui.preferences.GuiPreferences;
import org.jabref.logic.util.TaskExecutor;
import org.jabref.model.database.BibDatabaseContext;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.util.FileUpdateMonitor;

import com.tobiasdiez.easybind.optional.ObservableOptionalValue;
import org.jspecify.annotations.NonNull;
@@ -25,19 +31,31 @@ record SingleSelectionMenuBuilder(
BibDatabaseContext databaseContext,
ObservableOptionalValue<BibEntry> bibEntry,
GuiPreferences preferences,
- LinkedFilesEditorViewModel viewModel
+ LinkedFilesEditorViewModel viewModel,
+ TaskExecutor taskExecutor,
+ FileUpdateMonitor fileUpdateMonitor,
+ UndoManager undoManager,
+ StateManager stateManager
) implements ContextMenuBuilder {

SingleSelectionMenuBuilder(@NonNull DialogService dialogService,
@NonNull BibDatabaseContext databaseContext,
@NonNull ObservableOptionalValue<BibEntry> bibEntry,
@NonNull GuiPreferences preferences,
- @NonNull LinkedFilesEditorViewModel viewModel) {
+ @NonNull LinkedFilesEditorViewModel viewModel,
+ @NonNull TaskExecutor taskExecutor,
+ @NonNull FileUpdateMonitor fileUpdateMonitor,
+ @NonNull UndoManager undoManager,
+ @NonNull StateManager stateManager) {
this.dialogService = dialogService;
this.databaseContext = databaseContext;
this.bibEntry = bibEntry;
this.preferences = preferences;
this.viewModel = viewModel;
+ this.taskExecutor = taskExecutor;
+ this.fileUpdateMonitor = fileUpdateMonitor;
+ this.undoManager = undoManager;
+ this.stateManager = stateManager;
}

@Override
@@ -75,6 +93,10 @@ public List<MenuItem> buildMenu(@NonNull ObservableList<LinkedFileViewModel> sel
StandardActions.DOWNLOAD_FILE,
new ContextAction(StandardActions.DOWNLOAD_FILE, selectedLinkedFile, databaseContext, bibEntry, preferences, viewModel)));

items.add(factory.createMenuItem(
StandardActions.PERFORM_OCR,
new OcrLinkedFileAction(selectedLinkedFile.getFile(), selectedLinkedFile.getLinkedEntries(), databaseContext, dialogService, preferences, taskExecutor, fileUpdateMonitor, undoManager, stateManager)));

items.add(factory.createMenuItem(
StandardActions.RENAME_FILE_TO_PATTERN,
new ContextAction(StandardActions.RENAME_FILE_TO_PATTERN, selectedLinkedFile, databaseContext, bibEntry, preferences, viewModel)));
@@ -48,6 +48,7 @@ public enum KeyBinding {
// We have to put Entry Editor Previous before, because otherwise the decrease font size is found first
ENTRY_EDITOR_PREVIOUS_PANEL_2("Entry editor, previous panel 2", Localization.lang("Entry editor, previous panel 2"), "shortcut+MINUS", KeyBindingCategory.VIEW),
DELETE_ENTRY("Delete entry", Localization.lang("Delete entry"), "DELETE", KeyBindingCategory.BIBTEX),
PERFORM_OCR("Perform OCR", Localization.lang("Perform OCR"), "", KeyBindingCategory.FILE),

Member
I think this should have an appropriate shortcut. Pinging @Siedlerchr for suggestions.

DEFAULT_DIALOG_ACTION("Execute default action in dialog", Localization.lang("Execute default action in dialog"), "shortcut+ENTER", KeyBindingCategory.VIEW),
DOWNLOAD_FULL_TEXT("Download full text documents", Localization.lang("Download full text documents"), "alt+F7", KeyBindingCategory.QUALITY),
OPEN_CLOSE_ENTRY_EDITOR("Open / close entry editor", Localization.lang("Open / close entry editor"), "shortcut+E", KeyBindingCategory.VIEW),
@@ -0,0 +1,114 @@
package org.jabref.gui.linkedfile;

import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

import javax.swing.undo.UndoManager;

import org.jabref.gui.DialogService;
import org.jabref.gui.StateManager;
import org.jabref.gui.actions.SimpleCommand;
import org.jabref.gui.externalfiles.ImportHandler;
import org.jabref.gui.preferences.GuiPreferences;
import org.jabref.logic.l10n.Localization;
import org.jabref.logic.ocr.OcrEngine;
import org.jabref.logic.ocr.OcrMyPdfEngine;
import org.jabref.logic.ocr.OcrResult;
import org.jabref.logic.util.BackgroundTask;
import org.jabref.logic.util.TaskExecutor;
import org.jabref.model.database.BibDatabaseContext;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.LinkedFile;
import org.jabref.model.util.FileUpdateMonitor;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OcrLinkedFileAction extends SimpleCommand {
private static final Logger LOGGER = LoggerFactory.getLogger(OcrLinkedFileAction.class);

private final LinkedFile linkedFile;
private final BibDatabaseContext databaseContext;
private final DialogService dialogService;
private final GuiPreferences preferences;
private final TaskExecutor taskExecutor;
private final OcrEngine ocrEngine;
private final List<BibEntry> linkedEntries;
private final ImportHandler importHandler;

public OcrLinkedFileAction(LinkedFile linkedFile,
List<BibEntry> bibEntries,
BibDatabaseContext databaseContext,
DialogService dialogService,
GuiPreferences preferences,
TaskExecutor taskExecutor,
FileUpdateMonitor fileUpdateMonitor,
UndoManager undoManager,
StateManager stateManager) {
this.linkedFile = linkedFile;
this.linkedEntries = bibEntries;
this.databaseContext = databaseContext;
this.dialogService = dialogService;
this.preferences = preferences;
this.taskExecutor = taskExecutor;
this.ocrEngine = new OcrMyPdfEngine();
this.importHandler = new ImportHandler(
databaseContext,
preferences,
fileUpdateMonitor,
undoManager,
stateManager,
dialogService,
taskExecutor
);
}

@Override
public void execute() {
Optional<Path> pdfPath = linkedFile.findIn(databaseContext, preferences.getFilePreferences());
if (pdfPath.isEmpty()) {
dialogService.showErrorDialogAndWait(Localization.lang("Could not find a file to OCR"));
return;
}
BackgroundTask<OcrResult> ocrTask = BackgroundTask.wrap(() -> ocrEngine.performOcrAndEmbedText(pdfPath.get()));

ocrTask.titleProperty().set(Localization.lang("Performing OCR"));
ocrTask.showToUser(true);
ocrTask.onSuccess(result -> {
switch (result) {
case OcrResult.Success success -> {
dialogService.notify(Localization.lang("OCR succeeded"));
Path ocredPdf = success.outputFile();
for (BibEntry entry : linkedEntries) {
importHandler.getFileLinker().linkFilesToEntry(entry, List.of(ocredPdf));
}
}
case OcrResult.Failure failure -> {
String failureReason = getFailureResult(failure);
dialogService.showErrorDialogAndWait(Localization.lang("OCR failed"), failureReason);
}
}
});
ocrTask.onFailure(exception -> {
LOGGER.error("Unexpected error during OCR", exception);
dialogService.notify(Localization.lang("OCR failed. See the logs for the details"));
});
taskExecutor.execute(ocrTask);
}

String getFailureResult(OcrResult.Failure failure) {
return switch (failure.reason()) {
case NOT_AVAILABLE ->
Localization.lang("OCRmyPDF is not available");
case TIMEOUT ->
Localization.lang("OCR timed out");
case NON_ZERO_EXIT ->
Localization.lang("OCR process failed");
case IO_ERROR ->
Localization.lang("Could not start OCR process");
case INTERRUPTED ->
Localization.lang("OCR was cancelled");
};
}
}
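
The `switch` in `execute()` has no `default` branch, which suggests that `OcrResult` is a sealed type with exactly two variants. `OcrResult.java` is not shown in this diff, so the following is only a minimal sketch of one shape that would fit the usage above. The names `OcrResultSketch`, `Result`, `FailureReason`, and `describe` are hypothetical and not part of the PR.

```java
import java.nio.file.Path;

public class OcrResultSketch {
    // Hypothetical local copy of the PR's OcrFailureReason so the sketch compiles on its own.
    public enum FailureReason { NOT_AVAILABLE, TIMEOUT, NON_ZERO_EXIT, IO_ERROR, INTERRUPTED }

    // Assumed shape: a sealed interface whose only implementations are the two nested records.
    public sealed interface Result {
        record Success(Path outputFile) implements Result { }

        record Failure(FailureReason reason) implements Result { }
    }

    // Because Result is sealed, the compiler can check that the switch covers every case.
    public static String describe(Result result) {
        return switch (result) {
            case Result.Success success -> "OCR succeeded: " + success.outputFile();
            case Result.Failure failure -> "OCR failed: " + failure.reason();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Result.Success(Path.of("paper_ocr.pdf"))));
        System.out.println(describe(new Result.Failure(FailureReason.TIMEOUT)));
    }
}
```

With this shape, adding a third variant later would turn every non-exhaustive `switch` into a compile error, which is presumably why the action can omit `default`.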
@@ -5,21 +5,26 @@
import java.util.Objects;
import java.util.Optional;

import javax.swing.undo.UndoManager;

import javafx.beans.property.SimpleStringProperty;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.control.ContextMenu;
import javafx.scene.control.MenuItem;

import org.jabref.gui.DialogService;
import org.jabref.gui.StateManager;
import org.jabref.gui.fieldeditors.LinkedFileViewModel;
import org.jabref.gui.fieldeditors.LinkedFilesEditorViewModel;
import org.jabref.gui.preferences.GuiPreferences;
import org.jabref.logic.FilePreferences;
import org.jabref.logic.util.TaskExecutor;
import org.jabref.model.database.BibDatabaseContext;
import org.jabref.model.database.FileDirectories;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.LinkedFile;
import org.jabref.model.util.FileUpdateMonitor;

import com.tobiasdiez.easybind.optional.ObservableOptionalValue;
import org.junit.jupiter.api.BeforeEach;
@@ -64,7 +69,11 @@ public void setUp() {
guiPreferences,
bibDatabaseContext,
bibEntry,
- mock(LinkedFilesEditorViewModel.class)
+ mock(LinkedFilesEditorViewModel.class),
+ mock(TaskExecutor.class),
+ mock(FileUpdateMonitor.class),
+ mock(UndoManager.class),
+ mock(StateManager.class)
);
}

1 change: 1 addition & 0 deletions jablib/src/main/java/module-info.java
@@ -117,6 +117,7 @@
exports org.jabref.logic.git.merge.execution;
exports org.jabref.model.sciteTallies;
exports org.jabref.logic.bibtex.comparator.plausibility;
exports org.jabref.logic.ocr;

// region: AI
exports org.jabref.logic.ai;
25 changes: 25 additions & 0 deletions jablib/src/main/java/org/jabref/logic/ocr/OcrEngine.java
@@ -0,0 +1,25 @@
package org.jabref.logic.ocr;

import java.nio.file.Path;

/// Interface for OCR engines.
///
/// Future OCR engines can implement this interface.
public interface OcrEngine {

/// Performs OCR on the given PDF file and returns the result: either the path of the OCRed file with searchable text, or the reason for the failure.
///
/// @param pdfPath the file to perform OCR on.
/// @return the result of the OCR operation: the searchable output file on success, or the failure reason otherwise.
OcrResult performOcrAndEmbedText(Path pdfPath);

/// Checks if the OCR engine is available for use.
///
/// @return true if the engine is available, false otherwise.
boolean isAvailable();

/// Gets the name of the OCR engine.
///
/// @return the name of the OCR engine (e.g., "OCRmyPDF", "Tesseract").
String getName();
}
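
The `OcrMyPdfEngine` implementation is not shown in this diff. Based on the commit messages (an availability check and the `--skip-text` flag), an engine of this kind might assemble its command line and probe for the executable roughly as sketched below. The class name `OcrCommandSketch`, the method names, and the timeout parameter are assumptions for illustration; only the `ocrmypdf` executable, its `--version` option, and `--skip-text` come from OCRmyPDF itself.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class OcrCommandSketch {
    // Builds the command line. --skip-text tells OCRmyPDF to leave pages that
    // already contain text untouched, which handles partially searchable PDFs.
    public static List<String> buildCommand(Path input, Path output) {
        return List.of("ocrmypdf", "--skip-text", input.toString(), output.toString());
    }

    // Returns true if `ocrmypdf --version` exits with code 0 within the timeout.
    public static boolean isAvailable(long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder("ocrmypdf", "--version")
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // The executable was not found or could not be started.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to the cancellation.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCommand(Path.of("scan.pdf"), Path.of("scan_ocr.pdf")));
        System.out.println("available: " + isAvailable(5));
    }
}
```

Keeping the command construction in a pure function makes it testable without OCRmyPDF installed, while the availability probe stays a thin wrapper around `ProcessBuilder`.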
@@ -0,0 +1,6 @@
package org.jabref.logic.ocr;

/// Reasons why the OCR process can fail.
public enum OcrFailureReason {
NOT_AVAILABLE, TIMEOUT, NON_ZERO_EXIT, IO_ERROR, INTERRUPTED
}
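
Four of these reasons correspond naturally to the ways an external process can go wrong. The sketch below shows one possible mapping and is not the PR's actual code: `OcrFailureMappingSketch`, `Reason`, and `run` are hypothetical names. `NOT_AVAILABLE` is not produced here, on the assumption that it is reported by a separate availability check before the process is started.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

public class OcrFailureMappingSketch {
    // Hypothetical local copy of the PR's OcrFailureReason so the sketch compiles on its own.
    public enum Reason { NOT_AVAILABLE, TIMEOUT, NON_ZERO_EXIT, IO_ERROR, INTERRUPTED }

    // Runs the command and returns the failure reason, or an empty Optional if it exited with 0.
    public static Optional<Reason> run(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // The process did not finish in time; stop it and report a timeout.
                process.destroyForcibly();
                return Optional.of(Reason.TIMEOUT);
            }
            return process.exitValue() == 0 ? Optional.empty() : Optional.of(Reason.NON_ZERO_EXIT);
        } catch (IOException e) {
            // The process could not be started, for example because the executable is missing.
            return Optional.of(Reason.IO_ERROR);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the cancellation.
            Thread.currentThread().interrupt();
            return Optional.of(Reason.INTERRUPTED);
        }
    }

    public static void main(String[] args) {
        // Assumes no executable with this name exists on the PATH.
        System.out.println(run(List.of("no-such-ocr-binary"), 5));
    }
}
```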