Matching nodes in a knowledge graph using Graph Convolutional Networks and investigating the interplay between formal semantics and GCNs.
A detailed description of the motivation and the algorithms is available in the related paper.
When citing, please use the following reference:
Pierre Monnin, Chedy Raïssi, Amedeo Napoli, Adrien Coulet: Discovering alignment relations with Graph Convolutional Networks: A biomedical case study. Semantic Web 13(3): 379-398 (2022)
@article{monninRNC22,
author = {Pierre Monnin and
Chedy Ra{\"{\i}}ssi and
Amedeo Napoli and
Adrien Coulet},
title = {Discovering alignment relations with Graph Convolutional Networks:
{A} biomedical case study},
journal = {Semantic Web},
volume = {13},
number = {3},
pages = {379--398},
year = {2022},
url = {https://doi.org/10.3233/SW-210452},
doi = {10.3233/SW-210452}
}
- In `query_simset.py`:
  - Retrieve the individuals to match (instances of the classes listed in `individuals-classes` in the JSON configuration file)
  - Retrieve the similarity links between these individuals (used to build the train/validation/test sets)
    - Similarity links are described in `similarity-links` in the JSON configuration file
    - When the link `(url1, url2)` is retrieved, `(url2, url1)` is not added for symmetric predicates, to avoid a symmetry bias in training
- In `query_graph.py`:
  - Retrieve the adjacency of the RDF graph (excluding the similarity links previously retrieved by `query_simset.py`)
  - Retrieve the predicates and their inverses (or whether they are symmetric)
  - Must be used with the cache manager produced by the previous step
- In `similarity_analysis.py`:
  - Output PDF files with histograms depicting the size and number of similarity clusters for each model (computed from the similarity links considered by each model)
  - The similarity clusters of a model are computed over all the similarity links it considers, without distinguishing between their predicates, and treating these links as undirected (symmetric) and transitive
- In `n_fold_split.py`:
  - Output an n-fold split of the similarity links (after shuffling)
- In `transform_graph.py`:
  - Output a DGL graph built from the given RDF graph by applying one of the following transformations:
    - G0: RDF graph with an abstract inverse added for each predicate
    - G1: RDF graph after contraction of `owl:sameAs` edges (only canonical nodes are kept)
    - G2: RDF graph taking inverse predicates and symmetry into account (to avoid adding abstract inverses when they are not needed)
    - G3: RDF graph with links added based on the predicate hierarchy: if `(a, rel1, b)` and `(rel1, subPropertyOf, rel2)`, we add `(a, rel2, b)`
    - G4: RDF graph with `rdf:type` links added based on the class hierarchy: if `(a, type, b)` and `(b, subClassOf, c)`, we add `(a, type, c)`
    - G5: all the transformations of G1 to G4 combined
  - The graph is restricted to the neighborhood of the individuals to match, whose size depends on the number of GCN layers
- In `learning.py`:
  - Output a Python dict whose keys are the indices of the test folds; each value contains:
    - `logits_history`: Python list associating an epoch with its logits
    - `train_loss_history`: Python list associating an epoch with its training loss
    - `val_loss_history`: Python list associating an epoch with its validation loss
    - `test_loss_history`: Python list associating an epoch with its test loss
    - `temperature_history`: Python list associating an epoch with its temperature
    - `model`: Python list associating an epoch with the parameters of the GCN model
- In `clustering_analysis.py`:
  - Output for each fold:
    - A distance analysis based on all links, and on the links whose nodes are in the training, validation, and test sets
    - A UMAP projection computed on all nodes and displayed for all nodes, and for the nodes in the training, validation, and test sets. Only the `--umap-colors` biggest similarity clusters are colored, and only the similarity clusters containing more than `--umap-size` nodes are displayed (0 to disable)
    - A plot of the training, validation, and test losses
    - A plot of the temperature
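The symmetry rule applied by `query_simset.py` can be sketched in a few lines of Python. This is an illustration, not the repository's code: it assumes similarity links are handled as `(url1, predicate, url2)` triples, and `deduplicate_symmetric_links` and `SYMMETRIC` are hypothetical names (the set mirrors the S entries of the properties row in the models table).

```python
from typing import Iterable, List, Set, Tuple

Link = Tuple[str, str, str]  # (url1, predicate, url2)

# Hypothetical set, mirroring the "S" entries of the properties row in the models table
SYMMETRIC: Set[str] = {"owl:sameAs", "skos:closeMatch", "skos:relatedMatch", "skos:related"}


def deduplicate_symmetric_links(links: Iterable[Link], symmetric: Set[str] = SYMMETRIC) -> List[Link]:
    """Keep (url1, p, url2) but skip (url2, p, url1) when p is symmetric."""
    kept: List[Link] = []
    seen: Set[Link] = set()
    for url1, pred, url2 in links:
        if (url1, pred, url2) in seen:
            continue  # exact duplicate
        if pred in symmetric and (url2, pred, url1) in seen:
            continue  # reverse of a symmetric link already kept
        seen.add((url1, pred, url2))
        kept.append((url1, pred, url2))
    return kept
```

For a non-symmetric predicate such as `skos:broadMatch`, both directions are kept, since they carry different meanings.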
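A minimal sketch of an n-fold split after shuffling, in the spirit of `n_fold_split.py`. The helpers `n_fold_split` and `fold_roles` are hypothetical; in particular, using the fold after the test fold as the validation fold is an assumption made for illustration, since the README only states that results are keyed by the test fold index.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def n_fold_split(links: Sequence[T], n: int, seed: int = 0) -> List[List[T]]:
    """Shuffle the links, then deal them round-robin into n folds of near-equal size."""
    if n < 2:
        raise ValueError("n must be at least 2")
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)  # seeded local RNG for reproducibility
    return [shuffled[i::n] for i in range(n)]


def fold_roles(folds: List[List[T]], test_idx: int) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical role assignment: test fold, next fold as validation, the rest as training."""
    n = len(folds)
    if n < 3:
        raise ValueError("at least 3 folds are needed for train/validation/test")
    val_idx = (test_idx + 1) % n
    train = [x for i, fold in enumerate(folds) if i not in (test_idx, val_idx) for x in fold]
    return train, folds[val_idx], folds[test_idx]
```

Dealing the shuffled links round-robin keeps fold sizes within one element of each other.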
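The G0, G3, and G4 transformations of `transform_graph.py` amount to rewrites of a set of triples. The sketch below works on plain Python tuples and skips the conversion to a DGL graph; the function names, the `_inv` suffix for abstract inverses, and applying the hierarchy rules transitively (until no new triple appears) are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


def ancestors(hierarchy: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each child to all of its transitive parents, given (child, parent) pairs."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in hierarchy:
        parents.setdefault(child, set()).add(parent)
    closure: Dict[str, Set[str]] = {}
    for start, direct in parents.items():
        seen: Set[str] = set()
        stack = list(direct)
        while stack:  # depth-first walk; `seen` guards against cycles
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        closure[start] = seen
    return closure


def add_abstract_inverses(triples: Set[Triple]) -> Set[Triple]:
    """G0: add (b, p_inv, a) for every (a, p, b); the `_inv` naming is hypothetical."""
    return triples | {(o, p + "_inv", s) for s, p, o in triples}


def add_superproperty_links(triples: Set[Triple], sub_property_of: Set[Tuple[str, str]]) -> Set[Triple]:
    """G3: if (a, rel1, b) and rel1 is a (transitive) sub-property of rel2, add (a, rel2, b)."""
    supers = ancestors(sub_property_of)
    return triples | {(s, sup, o) for s, p, o in triples for sup in supers.get(p, ())}


def add_inherited_types(triples: Set[Triple], sub_class_of: Set[Tuple[str, str]]) -> Set[Triple]:
    """G4: if (a, rdf:type, b) and b is a (transitive) subclass of c, add (a, rdf:type, c)."""
    supers = ancestors(sub_class_of)
    return triples | {
        (s, RDF_TYPE, sup) for s, p, o in triples if p == RDF_TYPE for sup in supers.get(o, ())
    }
```

Each function returns a new set, so the rewrites can be chained, as G5 does with G1 to G4.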
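Restricting the graph to the neighborhood of the individuals to match can be sketched as a breadth-first search bounded by the number of layers, since a GCN with k layers aggregates information from at most k hops away. The function names are hypothetical, and ignoring edge direction is an assumption (it matches a graph where inverse edges are present, as in G0).

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def k_hop_neighborhood(edges: Iterable[Edge], seeds: Iterable[str], k: int) -> Set[str]:
    """Return the nodes at most k hops away from any seed, ignoring edge direction."""
    adjacency: Dict[str, List[str]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    depth: Dict[str, int] = {s: 0 for s in seeds}
    queue = deque(depth)  # start from every seed at depth 0
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth)


def restrict_edges(edges: Iterable[Edge], nodes: Set[str]) -> List[Edge]:
    """Keep only the edges whose two endpoints lie in the neighborhood."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]
```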
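Assuming each `*_history` list is indexed by epoch (element `i` holds the value for epoch `i`), the dict output by `learning.py` could be inspected as follows, for instance to find, for each test fold, the epoch with the lowest validation loss and the test loss at that epoch. The helper names are hypothetical, and how the dict is saved to disk is not covered here.

```python
from typing import Any, Dict

Results = Dict[int, Dict[str, Any]]  # test fold index -> histories


def best_epochs(results: Results) -> Dict[int, int]:
    """For each test fold, return the epoch with the lowest validation loss."""
    best: Dict[int, int] = {}
    for fold, history in results.items():
        val_losses = history["val_loss_history"]
        # float() also handles 0-d tensors if the losses were stored as tensors
        best[fold] = min(range(len(val_losses)), key=lambda epoch: float(val_losses[epoch]))
    return best


def selected_test_losses(results: Results) -> Dict[int, float]:
    """Test loss of each fold at the epoch selected on the validation loss."""
    return {
        fold: float(results[fold]["test_loss_history"][epoch])
        for fold, epoch in best_epochs(results).items()
    }
```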
- Python 3.7
- tqdm
- requests
- pytorch
- dgl
- matplotlib
- scikit-learn
- umap-learn
- pynndescent
Models M0 to M6 (called gold clusterings in the preprint) differ in the similarity links they consider:
Similarity links | owl:sameAs | skos:closeMatch | skos:relatedMatch | skos:related | skos:broadMatch |
---|---|---|---|---|---|
Properties | T / S | T / S | nT / S | nT / S | T / nS |
M0 | X | X | X | X | X |
M1 | X | X | X | X | |
M2 | X | ||||
M3 | X | ||||
M4 | X | ||||
M5 | X | ||||
M6 | X |
- T: transitivity
- S: symmetry
- nT: non-transitivity
- nS: non-symmetry
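Since every similarity link considered by a model is treated as undirected and transitive, the similarity clusters of that model are the connected components of the graph formed by these links. The union-find sketch below illustrates this; `MODEL_PREDICATES` only transcribes the M0 and M1 rows of the table as an example, and both function names are hypothetical. Individuals without any considered link are not returned as singleton clusters here, which is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[str, str, str]  # (url1, predicate, url2)

# Example transcription of the M0 and M1 rows of the models table
MODEL_PREDICATES: Dict[str, Set[str]] = {
    "M0": {"owl:sameAs", "skos:closeMatch", "skos:relatedMatch", "skos:related", "skos:broadMatch"},
    "M1": {"owl:sameAs", "skos:closeMatch", "skos:relatedMatch", "skos:related"},
}


def similarity_clusters(links: Iterable[Link], predicates: Set[str]) -> List[Set[str]]:
    """Connected components over the links whose predicate the model considers."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for url1, pred, url2 in links:
        if pred in predicates:  # direction is ignored: each link merges two clusters
            root1, root2 = find(url1), find(url2)
            if root1 != root2:
                parent[root1] = root2
    clusters: Dict[str, Set[str]] = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


def cluster_size_counts(clusters: List[Set[str]]) -> Counter:
    """Cluster size -> number of clusters of that size (the data behind a histogram)."""
    return Counter(len(cluster) for cluster in clusters)
```

Dropping `skos:broadMatch` (M1 instead of M0) can split clusters, which is the kind of difference the per-model histograms make visible.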