Feature parse MeSH terms in PubMed MEDLINE records #15529

Open
LoayTarek5 wants to merge 10 commits into JabRef:main from LoayTarek5:feature-Parse-MeSH-terms-in-PubMed-MEDLINE-records

@LoayTarek5
Contributor

Related issues and pull requests

Closes #12532

PR Description

Parse MeSH terms in Medline/PubMed importers (XML and Plain Text) into individual heading/qualifier pairs with major topic markers, matching PubMed's display format.

Steps to test

1- Download a record from PubMed in PubMed format and save it as a .txt file
2- In JabRef, go to File -> Import and select the saved file
3- Click the imported entry, go to the General tab, and check the Keywords field
You should see individual keyword chips matching PubMed's display format.
4- For the XML import, open for example https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23633646&retmode=xml -> right-click -> Save as .xml
5- In JabRef, go to File -> Import and select the file (choose the "Medline/PubMed" format)
6- Go to the General tab and check the Keywords field

.txt
Capture

.xml
capture_260319_114050
The full keywords: Female; Graves Disease/radiotherapy*; Humans; Hypothyroidism/etiology*; Iodine Radioisotopes/adverse effects; Iodine Radioisotopes/therapeutic use*; Retrospective Studies; Thyrotoxicosis/radiotherapy*; Thyroxine/blood; Treatment Failure; Weight Gain

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • I added screenshots in the PR description (if change is visible to the user)
  • I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • [/] I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Parse MeSH terms into individual heading/qualifier pairs

✨ Enhancement

Walkthroughs

Description
• Parse MeSH terms into individual heading/qualifier pairs with major topic markers
• Update MeshHeading record to track descriptor major flag and qualifier details
• Refactor keyword formatting to use slash-separated heading/qualifier syntax
• Add test coverage for MeSH term parsing in plain text importer
Diagram
flowchart LR
  A["MeSH Term Input<br/>e.g. *Kidney/diagnosis"] --> B["Parse Descriptor<br/>and Qualifiers"]
  B --> C["Extract Major<br/>Topic Flags"]
  C --> D["Format Keywords<br/>heading/qualifier*"]
  D --> E["Individual Keyword<br/>Chips in UI"]

File Changes

1. jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlineImporter.java ✨ Enhancement +17/-8

Enhanced MeSH term parsing with major topic flags

• Extract descriptorMajor flag from DescriptorName MajorTopicYN attribute
• Create MeshHeading.QualifierName records to store qualifier name and major flag
• Refactor addMeshHeading() to format keywords as descriptor/qualifier* pairs
• Add asterisk suffix for major topic descriptors and qualifiers
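
For illustration, the XML side of this could look roughly like the sketch below. It reads a single `<MeshHeading>` fragment with the JDK's StAX reader and uses the `MajorTopicYN` attribute to decide where the asterisk goes. The class and method names (`MeshXmlSketch`, `keywordsFrom`) are hypothetical and this is not the code in `MedlineImporter`; it only assumes the `DescriptorName`/`QualifierName` elements mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only: class and method names are hypothetical, not JabRef's.
class MeshXmlSketch {

    /** Reads one MeshHeading fragment and returns "descriptor/qualifier" keywords. */
    static List<String> keywordsFrom(String meshHeadingXml) {
        String descriptor = "";
        List<String> qualifiers = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(meshHeadingXml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String element = reader.getLocalName();
                boolean isDescriptor = "DescriptorName".equals(element);
                if (!isDescriptor && !"QualifierName".equals(element)) {
                    continue;
                }
                // Read the attribute before getElementText(), which advances the reader.
                boolean major = "Y".equals(reader.getAttributeValue(null, "MajorTopicYN"));
                String text = reader.getElementText().trim() + (major ? "*" : "");
                if (isDescriptor) {
                    descriptor = text;
                } else {
                    qualifiers.add(text);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed MeshHeading fragment", e);
        }
        if (qualifiers.isEmpty()) {
            return List.of(descriptor);
        }
        // One keyword per qualifier, each repeating the descriptor.
        List<String> keywords = new ArrayList<>();
        for (String qualifier : qualifiers) {
            keywords.add(descriptor + "/" + qualifier);
        }
        return keywords;
    }
}
```

A heading without qualifiers yields a single keyword that carries the descriptor's own asterisk when the descriptor is flagged as a major topic.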


2. jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java ✨ Enhancement +45/-7

Add MeSH term parsing for plain text format

• Implement parseMeshTerm() method to split compound MeSH terms into individual keywords
• Handle major topic markers (asterisk prefix) for descriptors and qualifiers
• Separate handling of MH (MeSH) and OT (other terms) fields
• Format keywords as descriptor/qualifier* pairs matching PubMed display format
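
The splitting described in these bullets might be sketched as follows. This is a hypothetical stand-in, not the PR's actual `parseMeshTerm()`: it assumes an `MH` value has the shape `Descriptor/qualifier/qualifier`, with a leading `*` marking a major topic, and it moves that marker to the end of the name to match the expected keywords in the PR's test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not the PR's actual parseMeshTerm() implementation.
class MeshTermSketch {

    /** Splits "Descriptor/qual1/qual2" into one "Descriptor/qual" keyword per qualifier. */
    static List<String> parseMeshTerm(String value) {
        List<String> keywords = new ArrayList<>();
        if (value == null || value.isBlank()) {
            return keywords;
        }
        String[] parts = value.trim().split("/");
        String descriptor = moveMajorMarker(parts[0]);
        if (parts.length == 1) {
            // No qualifiers: the descriptor alone is the keyword.
            keywords.add(descriptor);
            return keywords;
        }
        for (int i = 1; i < parts.length; i++) {
            keywords.add(descriptor + "/" + moveMajorMarker(parts[i]));
        }
        return keywords;
    }

    /** Turns a leading "*" (major topic) into a trailing one: "*Kidney" becomes "Kidney*". */
    private static String moveMajorMarker(String part) {
        String trimmed = part.trim();
        return trimmed.startsWith("*") ? trimmed.substring(1).trim() + "*" : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(parseMeshTerm("*Kidney Diseases/diagnosis/epidemiology"));
        System.out.println(parseMeshTerm("Graves Disease/*radiotherapy"));
        System.out.println(parseMeshTerm("Female"));
    }
}
```

Repeating the descriptor for every qualifier makes each keyword self-contained, which is what lets each one be shown as its own chip.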


3. jablib/src/main/java/org/jabref/logic/importer/fileformat/medline/MeshHeading.java ✨ Enhancement +4/-1

Extend MeshHeading record with major topic tracking

• Add descriptorMajor boolean field to track major topic flag
• Change qualifierNames from List<String> to List<QualifierName>
• Introduce nested QualifierName record with name and major flag fields
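
As a rough picture of the record shape described here, the sketch below uses the names from the bullets (`descriptorMajor`, `qualifierNames`, nested `QualifierName`). The record name `MeshHeadingSketch`, the component name `descriptorName`, the `major` component name, and the `toKeywords()` helper are assumptions made for illustration; the real `MeshHeading` may differ.

```java
import java.util.List;

// Sketch only: "descriptorName", "major" and toKeywords() are assumed names.
record MeshHeadingSketch(String descriptorName, boolean descriptorMajor, List<QualifierName> qualifierNames) {

    /** A qualifier together with its own major-topic flag. */
    record QualifierName(String name, boolean major) {
    }

    /** Formats the heading as "descriptor/qualifier" keywords, with "*" marking major topics. */
    List<String> toKeywords() {
        String descriptor = descriptorName + (descriptorMajor ? "*" : "");
        if (qualifierNames.isEmpty()) {
            return List.of(descriptor);
        }
        return qualifierNames.stream()
                .map(q -> descriptor + "/" + q.name() + (q.major() ? "*" : ""))
                .toList();
    }
}
```

Keeping the flag on each qualifier, rather than one flag per heading, is what allows a result such as `Iodine Radioisotopes/adverse effects` next to `Iodine Radioisotopes/therapeutic use*`.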

4. jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java 🧪 Tests +18/-0

Add test for MeSH term parsing functionality

• Add test meshTermsAreParsedIntoIndividualKeywords() to verify MeSH term parsing
• Test compound term splitting with major topic markers
• Verify correct keyword formatting with slash separators and asterisks


5. jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlineImporterTestNbib.bib 🧪 Tests +1/-1

Update test fixture for new keyword format

• Update expected keywords to use new slash-separated format with major topic markers
• Change from comma-separated qualifiers to heading/qualifier* syntax


6. jablib/src/test/resources/org/jabref/logic/importer/fileformat/MedlinePlainImporterTestCompleteEntry.bib 🧪 Tests +1/-1

Update test fixture for new keyword format

• Update expected keywords to match new slash-separated format
• Adjust major topic asterisk placement to qualifier position


7. CHANGELOG.md 📝 Documentation +1/-0

Document MeSH term parsing improvement

• Add entry documenting improved MeSH term parsing in Medline/PubMed importers
• Reference issue #12532


@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects bot commented Apr 11, 2026

Code Review by Qodo

🐞 Bugs (0)   📘 Rule violations (0)   📎 Requirement gaps (1)   🎨 UX Issues (0)
📎 ≡ Correctness (1)


Action required

1. No MH separator conflict check 📎
Description
The updated MH import path serializes parsed MeSH keywords using the configured keyword separator
without detecting whether that separator occurs inside the raw MH value. If the separator appears
in an MH line, the resulting keywords field can become ambiguous or corrupted without any
warning/substitution.
Code

jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[R193-199]

+                    case "MH" -> {
+                        List<String> meshKeywords = parseMeshTerm(value);
+                        Character separator = importFormatPreferences.bibEntryPreferences().getKeywordSeparator();
+                        String meshString = String.join(separator + " ", meshKeywords);
+                        fieldConversionMap.merge(StandardField.KEYWORDS, meshString,
+                                (existing, newVal) -> existing + separator + " " + newVal);
+                    }
Evidence
PR Compliance ID 5 requires detecting conflicts when the user-defined keyword separator appears in
MH input lines. The new MH handling joins parsed tokens using the configured separator but
neither checks value for the separator nor applies any warning/substitution strategy in
parseMeshTerm.
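
One possible shape for such a check is sketched below, purely as an illustration. `SeparatorConflictSketch`, `hasConflict`, and `joinSafely` are hypothetical names, and replacing the separator with another character is only one of the strategies the requirement allows (warning the user is the other). MeSH headings such as "Diabetes Mellitus, Type 2" contain a comma, so a comma separator is a realistic trigger.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of JabRef. The replacement character is an arbitrary choice.
class SeparatorConflictSketch {

    /** True if the raw MH value contains the configured keyword separator. */
    static boolean hasConflict(String rawMhValue, char keywordSeparator) {
        return rawMhValue.indexOf(keywordSeparator) >= 0;
    }

    /** Joins keywords with the separator after substituting it inside each keyword. */
    static String joinSafely(List<String> keywords, char keywordSeparator, char replacement) {
        if (keywordSeparator == replacement) {
            throw new IllegalArgumentException("Replacement must differ from the separator");
        }
        return keywords.stream()
                .map(keyword -> keyword.replace(keywordSeparator, replacement))
                .collect(Collectors.joining(keywordSeparator + " "));
    }
}
```

Substitution changes the heading text as stored, so a warning may be the more conservative choice where the exact MeSH wording matters.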

Detect and handle conflicts when the user-defined keyword separator appears in PubMed MH input lines
jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[193-199]
jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[407-435]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The MEDLINE plain-text importer serializes `MH`-derived keywords using the configured keyword separator but does not detect when that same separator character appears inside the raw `MH` value, which can lead to ambiguous/corrupted keyword boundaries.
## Issue Context
Compliance requires checking `MH  - ...` lines for the user-defined keyword separator and either warning the user or applying a safe substitution/escaping strategy before serialization.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[193-199]
- jablib/src/main/java/org/jabref/logic/importer/fileformat/MedlinePlainImporter.java[407-435]


2. Test uses setField 📘
Description
The newly added test constructs the expected BibEntry using mutable setField calls rather than
the preferred withField withers. This violates the project’s test construction conventions and
reduces consistency with the rest of the test suite.
Code

jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[R127-131]

+            BibEntry expectedEntry = new BibEntry();
+            expectedEntry.setField(StandardField.PMID, "12345678");
+            expectedEntry.setField(StandardField.KEYWORDS,
+                    "Kidney Diseases*/diagnosis, Kidney Diseases*/epidemiology, Female, some other term");
+
Evidence
PR Compliance ID 40 requires using BibEntry withers (withField) instead of setField in
tests/construction. The added test uses expectedEntry.setField(...) for both PMID and
KEYWORDS.

AGENTS.md
jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[127-131]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A newly added unit test constructs `BibEntry` using `setField`, but project conventions require using `withField` for test construction/modification.
## Issue Context
Using `withField` keeps tests consistent with JabRef’s preferred immutable-style `BibEntry` usage.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/fileformat/MedlinePlainImporterTest.java[127-131]


Comment on lines +193 to +199
case "MH" -> {
List<String> meshKeywords = parseMeshTerm(value);
Character separator = importFormatPreferences.bibEntryPreferences().getKeywordSeparator();
String meshString = String.join(separator + " ", meshKeywords);
fieldConversionMap.merge(StandardField.KEYWORDS, meshString,
(existing, newVal) -> existing + separator + " " + newVal);
}

Action required

1. No MH separator conflict check 📎 Requirement gap ≡ Correctness

The updated MH import path serializes parsed MeSH keywords using the configured keyword separator
without detecting whether that separator occurs inside the raw MH value. If the separator appears
in an MH line, the resulting keywords field can become ambiguous or corrupted without any
warning/substitution.

Contributor Author

This was intentionally scoped out in the discussion of issue #12532.
Keyword separator escaping is handled in #12810 (resolved by PR #14637).

@github-actions bot added the `status: changes-required` label (Pull requests that are not yet complete) on Apr 11, 2026
@github-actions bot added the `status: no-bot-comments` label and removed the `status: changes-required` label (Pull requests that are not yet complete) on Apr 11, 2026
@calixtus
Member

Can you take a look @ryan-carpenter?

@faneeshh
Contributor

You could add a test for the XML importer's MeSH parsing. Everything else looks good to me.

Development

Successfully merging this pull request may close these issues.

Parse MeSH terms in PubMed MEDLINE records

3 participants