JabKit: Add file output option to cititation commands (and refactoring)#15913
JabKit: Add file output option to cititation commands (and refactoring)#15913JohnnyGoodNews wants to merge 45 commits into
Conversation
… streams from CommandLine
|
Hey @JohnnyGoodNews! 👋 Thank you for contributing to JabRef! We have automated checks in place, based on which you will soon get feedback if any of them are failing. We also use Qodo for review assistance. It will update your pull request description with a review help and offer suggestions to improve the pull request. After all automated checks pass, a maintainer will also review your contribution. Once that happens, you can go through their comments in the "Files changed" tab and act on them, or reply to the conversation if you have further inputs. You can read about the whole pull request process in our contribution guide. Please ensure that your pull request is in line with our AI Usage Policy and make necessary disclosures. |
| @@ -0,0 +1,39 @@ | |||
| --- | |||
| nav_order: 0061 | |||
There was a problem hiding this comment.
| nav_order: 0061 | |
| nav_order: 0062 |
Because probably #15907 will be merged first.
(Please also change filename)
There was a problem hiding this comment.
Thanks, good catch. I didn't see that PR.
koppor
left a comment
There was a problem hiding this comment.
I reviewed with RefactoringMiner. Looks good. Just small comments.
Maybe use @Nullmarked at CliExceptionHandler and other (short) classes instead of one @NonNull.
docker run --pull always --rm -p 6789:6789 -v c:\users\koppor\rmc:/diff tsantalis/refactoringminer diff --url
https://github.com/JabRef/jabref/pull/15913
|
|
||
| Commands rarely catch exceptions or check for valid return values. Exceptions are thrown as `CliException` and subclasses (or `JabRefException`) from the service and helper classes but not handled in the command. | ||
|
|
||
| ## Pros and Cons of the Options |
There was a problem hiding this comment.
For a short ADR, you can just use "Decision Outcome" (and maybe the Consequences) to "discuss"
There was a problem hiding this comment.
It was redundant too - I took out the pro/con of the alternatives.. I left the Confirmation in though, because that was helpful for my review agent.
| return Math.max(consistencyExit, integrityExit); | ||
| } | ||
|
|
||
| private int tryIntegrityCheck() { |
There was a problem hiding this comment.
I would not prefix with "try" - why not just integrityCheck (or runIntegrityCheck)?
There was a problem hiding this comment.
The idea is to handle the exceptions and continue processing. Maybe handleIntegrityCheck or runAndContinueIntegrityCheck? (If there were 3+ subcommands I would probably use something like runAndHandle(ThrowingIntFunction subcommand) - but that seemed overly complex)
There was a problem hiding this comment.
checkIntegrity?
(try-catch is a common way of handling, it doesn't need to be documented in method names)
There was a problem hiding this comment.
I followed your suggestions 👍.
It's probably just my old habit of wrapping impure functions and prefixing with tryFoo when it's only about the exception handling/reducing to allowed ranges.
…dundant info, too)
Code Review by Qodo
1.
|
Review Summary by QodoAdd file output options to citation commands with service layer refactoring
WalkthroughsDescription• Add file output and format options to get-cited-works and get-citing-works commands • Centralize import/export functionality into dedicated service layer (ImportService, ExportService) • Introduce CliException and CliExceptionHandler to reduce try-catch boilerplate • Create CitationFetcherFactory service to encapsulate citation fetcher creation • Refactor commands to use services instead of static methods for better testability Diagramflowchart LR
A["Commands<br/>get-cited-works<br/>get-citing-works"] -->|use| B["ExportService"]
A -->|use| C["CitationFetcherFactory"]
D["Other Commands<br/>Convert, Search, etc."] -->|use| B
D -->|use| E["ImportService"]
B -->|throws| F["ExportServiceException"]
E -->|throws| G["ImportServiceException"]
F -->|extends| H["CliException"]
G -->|extends| H
H -->|handled by| I["CliExceptionHandler"]
File Changes1. jabkit/src/main/java/org/jabref/toolkit/service/ExportService.java
|
| public CitationFetcher getCitationFetcher(CitationFetcherType citationFetcherType) { | ||
| ChatModel chatModel = ChatModelFactory.create(cliPreferences.getAiPreferences()); | ||
| return CitationFetcherType.getCitationFetcher( | ||
| citationFetcherType, | ||
| cliPreferences.getImporterPreferences(), | ||
| cliPreferences.getImportFormatPreferences(), | ||
| cliPreferences.getCitationKeyPatternPreferences(), | ||
| cliPreferences.getGrobidPreferences(), | ||
| cliPreferences.getAiPreferences(), | ||
| chatModel | ||
| ); | ||
| } |
There was a problem hiding this comment.
4. Chatmodel resource leak 🐞 Bug ☼ Reliability
CitationFetcherFactory.getCitationFetcher(...) creates a ChatModel but never closes it; since ChatModel is AutoCloseable, repeated use can leak resources such as threads or HTTP connections.
Agent Prompt
## Issue description
`CitationFetcherFactory.getCitationFetcher(...)` constructs a `ChatModel` but does not close it. `ChatModel` implements `AutoCloseable`, so this is a resource leak.
## Issue Context
Previously, commands used try-with-resources around `ChatModel`. After refactoring, the lifecycle is lost.
A robust fix is to return a handle/wrapper that implements `AutoCloseable` and closes the `ChatModel` (and any other resources) when done, and to use it with try-with-resources in `GetCitedWorks`/`GetCitingWorks`.
## Fix Focus Areas
- jabkit/src/main/java/org/jabref/toolkit/service/CitationFetcherFactory.java[24-35]
- jabkit/src/main/java/org/jabref/toolkit/commands/GetCitedWorks.java[60-80]
- jabkit/src/main/java/org/jabref/toolkit/commands/GetCitingWorks.java[60-80]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
|
Thank you both for your feedback and time. I addressed the mentioned issues. Should I mark those points as 'resolve conversation'? (Or is that for the reviewers) |
Yeah you can leave them as-is. If it starts getting too cluttered we'll resolve them ourselves. We keep them open until them to track changes over subsequent review iterations. |
| - The `jabkit check` command now runs both the consistency and integrity checks when given an input file without a subcommand (e.g. `jabkit check references.bib`). [#15759](https://github.com/JabRef/jabref/pull/15759) | ||
| - We added OCR feature using OCRmyPDF to extract text from scanned PDFs and create searchable PDFs including the extracted text. [#15712](https://github.com/JabRef/jabref/pull/15712) | ||
| - We Added generic CSV export filter that exports all standard BibTeX fields [#15711](https://github.com/JabRef/jabref/issues/15711) | ||
| - We added support to the JabKit commands `get-cited-works` and `get-citing-works` to output to files in various export formats |
|
|
||
| @Override | ||
| public Integer call() { | ||
| public Integer call() throws ImportServiceException, ExportServiceException { |
There was a problem hiding this comment.
@calixtus This is one "pattern" to investigate. Even thow the handling of "expected" excpetional cases is now non-local, I tend to like it --> more Java'ish control flow (exceptions --> BibTeX file not parseable is rather an exception than normal flow, isn't it?)
There was a problem hiding this comment.
We discussed this and came to the following conclusion:
- Effective Java (J. Bloch) states, that Exceptions should only be thrown for exceptional states. We try to follow the design principles of EJ.
Demonstration by a file not found example: - An unexpected exception may happen if either the path given by the user was not tested on entry (robustness) or was removed during operation (bc eg. cloud connection broke midway).
- If the internal api recieves information, it should expect the information to be valid. File errors are then unexpected an throwing an exception is correct, if the data given was validated before.
@koppor is this correctly summarized?
There was a problem hiding this comment.
is this correctly summarized?
Yeah - and the consequence is that before the existance and access check should be made? -- is this "nice" to do using PicoCLI?
There was a problem hiding this comment.
Just to chime in: PicoCLI support JSR-380 BeanValidation which is nice and reusable (and de-facto industry standard). (https://picocli.info/#_validation) The docs suggest throwing ParameterException in reaction to invalid user input.
Treating the Command classes as "driving/inbound ports" like RestControllers where Validation/Cleanup happens before the actual work is delegated to service/use-case/interaction logic seems appropriate to me.
|
Open from Qodo review:
|
I would be very "naive" here:
Very OK - no req needed. |
| ## Considered Options | ||
|
|
||
| * Use specialized Utility classes for importing/exporting/fetching with static methods | ||
| * Introduce shared JabKit services to separate Commands and logic | ||
| * Keep import/export logic and fetcher creation inside each command |
There was a problem hiding this comment.
Nitpick: Normally we list pros and cons for every possible option, but is no blocker
There was a problem hiding this comment.
Kind of the "slim" MADR format for "easier" ADRs.
Related issues and pull requests
Closes
None, just an adoption of the existing output format capability for the
get-cited/citing-worksCommand in JabKit CLI.PR Description
The added feature is a result of a discussion with @koppor in the search for a good first issue/feature for me to get a better grasp on the codebase. However I would enjoy feedback on the implementation and patterns.
The implementation adds the options for file output and file format to the Commands
get-cited-worksandget-citing-works.Two main refactorings/patterns emerged for me when I struggled with unit testing and comprehension due to quite some boilerplate (
if-elseandtry-catches) in the commands:public staticmethods inJabKit) - see ADR-61CliExceptions andCliExceptionHandlerto wrap errors and their exit codes to reduce the amount oftry-catchblocks that can't really be handled anyways - see ADR-62Intent: Better 3rd party tools integration & workflows by supporting multiple output formats. (The Impact of the feature might be still low until some concrete requirements for integration are established - read: I'm not sure how users are integrating JabKit already)
The Impact of the refactoring is a decrease of duplications and it got a bit easier to read the actual flow in the commands. Also some commands had subtly different behavior which should be better now. I was able to reduce the amount of
Optional/Either-type return values in favor of plain return types and Exception in error cases (I might be biased here in my preference).Screenshot of different output formats:

Single entry library:

jabref-contrib-policy:4.2:reviewed:okSteps to test
get-cited-works:
./gradlew :jabkit:run --args="get-cited-works 10.1109/ICSA59870.2024.00011 --output=get-cited-works.csv --output-format=CSV"get-citing-works:
./gradlew :jabkit:run --args="get-citing-works 10.1109/ICSA59870.2024.00011 --output=get-citing-works.html --output-format=html"(and check the resulting output files)
AI usage
Claude Code: (mostly to challenge my ideas for simpler solutions)
no AI generated code in this PR
Checklist
JabKitclass with arg-vector but not with a bundled os binary (I might need help there)CHANGELOG.mdin a way that can be understood by the average user (if change is visible to the user)