Feature parse MeSH terms in PubMed MEDLINE records #15529
LoayTarek5 wants to merge 10 commits into JabRef:main
…ajor topic flag and update qualifier name handling
…e keyword handling for descriptors and qualifiers
…ualifierName record for improved MeSH term handling
Review Summary by Qodo
Parse MeSH terms into individual heading/qualifier pairs
Walkthroughs
Description
• Parse MeSH terms into individual heading/qualifier pairs with major topic markers
• Update MeshHeading record to track descriptor major flag and qualifier details
• Refactor keyword formatting to use slash-separated heading/qualifier syntax
• Add test coverage for MeSH term parsing in plain text importer
Diagram
```mermaid
flowchart LR
    A["MeSH Term Input<br/>e.g. *Kidney/diagnosis"] --> B["Parse Descriptor<br/>and Qualifiers"]
    B --> C["Extract Major<br/>Topic Flags"]
    C --> D["Format Keywords<br/>heading/qualifier*"]
    D --> E["Individual Keyword<br/>Chips in UI"]
```
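A minimal sketch of the parsing step in the diagram, assuming that MEDLINE plain-text `MH` values separate the descriptor from its qualifiers with `/` and mark major topics with a leading `*` (e.g. `Graves Disease/*radiotherapy`). The class name `MeshTermParser` is hypothetical, and the method only mirrors the name of the `parseMeshTerm` call in the diff; this is not the PR's implementation. How a starred descriptor that also has qualifiers should be rendered is an assumption here (every pair is marked major).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PR's code: one raw MH value -> "heading" or "heading/qualifier" keywords.
public final class MeshTermParser {

    private MeshTermParser() {
    }

    /** Splits one raw MH value into keywords, moving the '*' major-topic marker to the end. */
    public static List<String> parseMeshTerm(String value) {
        String[] parts = value.trim().split("/");
        // A leading '*' on the descriptor marks the heading itself as a major topic
        boolean descriptorMajor = isStarred(parts[0]);
        String descriptor = stripStar(parts[0]);

        List<String> keywords = new ArrayList<>();
        if (parts.length == 1) {
            // Heading without qualifiers, e.g. "Humans" or "*Weight Gain"
            keywords.add(descriptorMajor ? descriptor + "*" : descriptor);
            return keywords;
        }
        for (int i = 1; i < parts.length; i++) {
            // Assumption: a starred descriptor marks every heading/qualifier pair as major
            boolean major = descriptorMajor || isStarred(parts[i]);
            keywords.add(descriptor + "/" + stripStar(parts[i]) + (major ? "*" : ""));
        }
        return keywords;
    }

    private static boolean isStarred(String part) {
        return part.trim().startsWith("*");
    }

    private static String stripStar(String part) {
        String trimmed = part.trim();
        return trimmed.startsWith("*") ? trimmed.substring(1).trim() : trimmed;
    }
}
```

For `Iodine Radioisotopes/adverse effects/*therapeutic use` this yields `Iodine Radioisotopes/adverse effects` and `Iodine Radioisotopes/therapeutic use*`, the same two entries that appear in the keyword list of the test record.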
File Changes
1. jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlineImporter.java
Code Review by Qodo
```java
case "MH" -> {
    // Split one MH line into heading/qualifier keywords (major topics end in '*')
    List<String> meshKeywords = parseMeshTerm(value);
    Character separator = importFormatPreferences.bibEntryPreferences().getKeywordSeparator();
    String meshString = String.join(separator + " ", meshKeywords);
    // Append to the keywords already collected from earlier MH lines
    fieldConversionMap.merge(StandardField.KEYWORDS, meshString,
            (existing, newVal) -> existing + separator + " " + newVal);
}
```
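The `merge` call above is what lets several `MH` lines accumulate in a single keywords field. A stand-alone sketch of that behaviour, assuming a plain `Map<String, String>` in place of JabRef's field map and the string `"keywords"` in place of `StandardField.KEYWORDS` (both are stand-ins, not JabRef types):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class KeywordMergeDemo {

    // Stand-in for StandardField.KEYWORDS; JabRef's Field type is not used here
    static final String KEYWORDS = "keywords";

    static void addMeshKeywords(Map<String, String> fields, List<String> meshKeywords, char separator) {
        String meshString = String.join(separator + " ", meshKeywords);
        // The first MH line creates the entry; later lines are appended with the same separator
        fields.merge(KEYWORDS, meshString, (existing, newVal) -> existing + separator + " " + newVal);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        addMeshKeywords(fields, List.of("Female"), ';');
        addMeshKeywords(fields, List.of("Iodine Radioisotopes/adverse effects", "Iodine Radioisotopes/therapeutic use*"), ';');
        // Prints: Female; Iodine Radioisotopes/adverse effects; Iodine Radioisotopes/therapeutic use*
        System.out.println(fields.get(KEYWORDS));
    }
}
```

Because the separator is also the only boundary marker in the resulting string, any separator character already present inside an MH value becomes indistinguishable from a boundary.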
1. No MH separator conflict check 📎 Requirement gap ≡ Correctness
The updated MH import path serializes parsed MeSH keywords using the configured keyword separator without detecting whether that separator occurs inside the raw MH value. If the separator appears in an MH line, the resulting keywords field can become ambiguous or corrupted without any warning/substitution.
Agent Prompt
## Issue description
The MEDLINE plain-text importer serializes `MH`-derived keywords using the configured keyword separator but does not detect when that same separator character appears inside the raw `MH` value, which can lead to ambiguous/corrupted keyword boundaries.
## Issue Context
Compliance requires checking `MH - ...` lines for the user-defined keyword separator and either warning the user or applying a safe substitution/escaping strategy before serialization.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[193-199]
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[407-435]
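One possible fix, sketched under assumptions: check each raw `MH` value for the configured separator before it is parsed and serialized, log a warning, and replace the character with a caller-chosen substitute. `MeshSeparatorCheck`, `sanitize`, and the substitution policy are hypothetical, not code from the PR or from JabRef (which logs through SLF4J rather than `java.util.logging`). MeSH headings can themselves contain commas (for example `Receptors, Thyrotropin`), so a comma separator is a realistic trigger.

```java
import java.util.logging.Logger;

// Hypothetical helper; whether to warn, substitute, or escape is left open by the finding.
public final class MeshSeparatorCheck {

    private static final Logger LOGGER = Logger.getLogger(MeshSeparatorCheck.class.getName());

    /** Cleaned MH value plus whether the separator had to be replaced. */
    public record Sanitized(String value, boolean hadConflict) {
    }

    private MeshSeparatorCheck() {
    }

    /** Replaces every occurrence of the keyword separator in a raw MH value with the substitute. */
    public static Sanitized sanitize(String rawMh, char separator, String substitute) {
        // A substitute that contains the separator would reintroduce the ambiguity
        if (substitute.indexOf(separator) >= 0) {
            throw new IllegalArgumentException("Substitute must not contain the separator");
        }
        if (rawMh.indexOf(separator) < 0) {
            return new Sanitized(rawMh, false);
        }
        LOGGER.warning(() -> "MH value contains keyword separator '" + separator + "': " + rawMh);
        return new Sanitized(rawMh.replace(String.valueOf(separator), substitute), true);
    }
}
```

Running the check before `parseMeshTerm` keeps the parser unaware of the user's separator setting; the `hadConflict` flag lets the importer decide whether to surface a warning in the UI.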
Can you take a look @ryan-carpenter?

You could add a test for the XML importer's MeSH parsing. Everything else looks good to me.
Related issues and pull requests
Closes #12532
PR Description
Parse MeSH terms in Medline/PubMed importers (XML and Plain Text) into individual heading/qualifier pairs with major topic markers, matching PubMed's display format.
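On the XML side, PubMed records carry the same information as `MeshHeading` elements whose `DescriptorName` and `QualifierName` children have a `MajorTopicYN` attribute. A minimal DOM-based sketch of mapping them to the same `heading/qualifier*` strings, assuming the JDK's built-in `javax.xml.parsers` rather than whatever XML API JabRef's `MedlineImporter` uses; `MeshXmlSketch` is a hypothetical name, and the "starred descriptor marks every pair" rule is the same assumption as in the plain-text case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class MeshXmlSketch {

    private MeshXmlSketch() {
    }

    public static List<String> parseMeshHeadingList(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                                             .newDocumentBuilder()
                                             .parse(new InputSource(new StringReader(xml)));
        List<String> keywords = new ArrayList<>();
        NodeList headings = doc.getElementsByTagName("MeshHeading");
        for (int i = 0; i < headings.getLength(); i++) {
            Element heading = (Element) headings.item(i);
            // Every MeshHeading is expected to have exactly one DescriptorName
            Element descriptor = (Element) heading.getElementsByTagName("DescriptorName").item(0);
            String name = descriptor.getTextContent().trim();
            boolean descriptorMajor = "Y".equals(descriptor.getAttribute("MajorTopicYN"));

            NodeList qualifiers = heading.getElementsByTagName("QualifierName");
            if (qualifiers.getLength() == 0) {
                keywords.add(name + (descriptorMajor ? "*" : ""));
                continue;
            }
            for (int j = 0; j < qualifiers.getLength(); j++) {
                Element qualifier = (Element) qualifiers.item(j);
                boolean major = descriptorMajor || "Y".equals(qualifier.getAttribute("MajorTopicYN"));
                keywords.add(name + "/" + qualifier.getTextContent().trim() + (major ? "*" : ""));
            }
        }
        return keywords;
    }
}
```

A `MeshHeadingList` fragment like the one in the checks below is also a convenient fixture for the XML importer test suggested in the review.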
Steps to test
1- Download a record from PubMed in PubMed format and save it as a .txt file
2- In JabRef, go to File -> Import and select the saved file
3- Click the imported entry, go to the General tab, and check the Keywords field
You should see individual keyword chips matching PubMed's display format.
4- For the XML import, open for example https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23633646&retmode=xml, then right-click -> Save as .xml
5- In JabRef, go to File -> Import and select the file (choose the "Medline/PubMed" format)
6- Go to the General tab and check the Keywords field
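The test files can also be fetched programmatically from NCBI's E-utilities `efetch` endpoint used in step 4. A small sketch with `java.net.http.HttpClient`; the class and method names are hypothetical. It relies on the documented `efetch` parameters: `retmode=xml` for the XML record, and `rettype=medline&retmode=text` for the MEDLINE plain-text format that PubMed's "PubMed" download produces.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public final class EfetchSample {

    private static final String EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    private EfetchSample() {
    }

    /** URI for the PubMed XML record of one PMID. */
    static URI xmlUri(String pmid) {
        return URI.create(EFETCH + "?db=pubmed&id=" + requireDigits(pmid) + "&retmode=xml");
    }

    /** URI for the MEDLINE plain-text record of one PMID. */
    static URI medlineUri(String pmid) {
        return URI.create(EFETCH + "?db=pubmed&id=" + requireDigits(pmid) + "&rettype=medline&retmode=text");
    }

    /** Downloads the given URI into a file and returns the file's path. */
    static Path download(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofFile(target)).body();
    }

    private static String requireDigits(String pmid) {
        // PMIDs are numeric; rejecting anything else avoids building malformed query strings
        if (!pmid.matches("\\d+")) {
            throw new IllegalArgumentException("Not a PMID: " + pmid);
        }
        return pmid;
    }
}
```

For example, `EfetchSample.download(EfetchSample.xmlUri("23633646"), Path.of("23633646.xml"))` saves the same XML as step 4.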
Result of the .txt import: [screenshot]

Result of the .xml import: [screenshot]
The full keywords:
Female; Graves Disease/radiotherapy*; Humans; Hypothyroidism/etiology*; Iodine Radioisotopes/adverse effects; Iodine Radioisotopes/therapeutic use*; Retrospective Studies; Thyrotoxicosis/radiotherapy*; Thyroxine/blood; Treatment Failure; Weight Gain
Checklist
CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)