This project builds a research interest summarization tool using Latent Dirichlet Allocation (LDA). It is implemented in R with the 'RISmed' and 'textmineR' packages.
Process Flow
- Abstract Extraction from PubMed for a given author ('RISmed' package)
- MeSH dictionary creation: MeSH term extraction and aggregation from all abstracts
- Topic Identification: Unsupervised Topic Modelling with MeSH terms as input (LDA modeling)
- Reverse Mapping to MeSH Dictionary: fuzzy string matching maps the bigrams and trigrams in the LDA output back to entries in the MeSH dictionary
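
The MeSH dictionary step in the flow above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's R code. The record layout (a list of dicts with a "mesh" key), the function name `build_mesh_dictionary`, and the sample records are all assumptions made for the example. In the actual pipeline the MeSH headings would come from the PubMed records retrieved in the first step.

```python
from collections import Counter


def build_mesh_dictionary(records):
    """Aggregate MeSH terms across abstracts into a term -> count mapping.

    `records` is a hypothetical list of dicts, one per abstract, each with a
    "mesh" key holding that abstract's MeSH headings.
    """
    counts = Counter()
    for record in records:
        # Normalise case and whitespace; the set counts a term once per abstract.
        terms = {t.strip().lower() for t in record.get("mesh", []) if t.strip()}
        counts.update(terms)
    return dict(counts)


# Made-up example records (not real PubMed data).
records = [
    {"pmid": "0001", "mesh": ["Breast Neoplasms", "Gene Expression Profiling"]},
    {"pmid": "0002", "mesh": ["Breast Neoplasms", "Machine Learning"]},
    {"pmid": "0003", "mesh": ["machine learning ", "Natural Language Processing"]},
]

mesh_dictionary = build_mesh_dictionary(records)

# List terms by how many abstracts mention them, most frequent first.
for term, n in sorted(mesh_dictionary.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{n}  {term}")
```

Counting each term once per abstract keeps a single heavily indexed paper from dominating the dictionary. Whether the original tool counts this way is not stated, so treat it as a design choice of this sketch.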
Innovation
- Use of MeSH terms for research interest summarization
- Labelling of LDA algorithm outputs using standard concepts (MeSH term reverse mapping)
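
The reverse-mapping idea can be illustrated with a small fuzzy matcher. This is a sketch in Python under stated assumptions, not the project's R implementation. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy matcher the tool uses. It assumes topic n-grams arrive underscore-joined, and the 0.8 similarity threshold is an arbitrary illustrative value. The function name `map_to_mesh` and the sample terms are hypothetical.

```python
from difflib import SequenceMatcher


def map_to_mesh(phrase, mesh_terms, threshold=0.8):
    """Return (best_term, score), or (None, score) if no term is close enough.

    `phrase` is a bigram or trigram from a topic, assumed underscore-joined.
    `threshold` is an illustrative cut-off, not a value taken from the project.
    """
    cleaned = phrase.replace("_", " ").strip().lower()
    best_term, best_score = None, 0.0
    for term in mesh_terms:
        # ratio() is a similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, cleaned, term.lower()).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score < threshold:
        return None, best_score
    return best_term, best_score


# Made-up dictionary entries and topic phrases for illustration.
mesh_terms = ["Breast Neoplasms", "Gene Expression Profiling", "Machine Learning"]
topic_phrases = ["breast_neoplasm", "gene_expression_profiling", "quantum_gravity"]

labels = {p: map_to_mesh(p, mesh_terms)[0] for p in topic_phrases}
for phrase, label in labels.items():
    print(f"{phrase} -> {label}")
```

Returning no label for phrases below the threshold avoids attaching a misleading MeSH concept to a topic term that has no close counterpart in the dictionary.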