[Datahub]: Change related datasets suggestions to be more relevant #1082

Open
wants to merge 5 commits into main from DH-relevant-similar-datasets-suggestions

Conversation

Angi-Kinas
Collaborator

@Angi-Kinas commented Jan 21, 2025

Description

This PR changes the Elasticsearch service so that related-dataset suggestions are based on the title, abstract and keywords of the current dataset.
This should make the suggestions more relevant and more similar to the current dataset.
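
For reviewers less familiar with Elasticsearch, here is a minimal sketch of what a `more_like_this` query built from the current record could look like. It is not the code in this PR: the `DatasetRecord` shape and the `buildSimilarQuery` helper are hypothetical, the field names `resourceTitleObject` and `resourceAbstractObject` are assumed, and the `min_term_freq` / `max_query_terms` values are illustrative only.

```typescript
// Hypothetical, simplified record shape (assumption, not the gn-ui model).
interface DatasetRecord {
  title: string
  abstract: string
  keywords: { label: string }[]
}

// Builds a more_like_this query from an artificial document made of the
// record's title, abstract and keyword labels.
// Field names and the default index name are assumptions for illustration.
function buildSimilarQuery(record: DatasetRecord, index = 'gn-records') {
  return {
    query: {
      more_like_this: {
        fields: [
          'resourceTitleObject.default',
          'resourceAbstractObject.default',
          'allKeywords',
        ],
        like: [
          {
            _index: index,
            doc: {
              resourceTitleObject: { default: record.title },
              resourceAbstractObject: { default: record.abstract },
              allKeywords: record.keywords.map((keyword) => keyword.label),
            },
          },
        ],
        // Illustrative tuning values, not taken from this PR.
        min_term_freq: 1,
        max_query_terms: 12,
      },
    },
  }
}
```

The returned object would be sent as the body of a search request; excluding the current record from the results is left out of this sketch.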

[For Review]: Please make sure you visit different datasets and verify that the suggestions are relevant to the dataset. I tested it with the dev.geo2france backend.

Quality Assurance Checklist

  • Commit history is devoid of any merge commits and readable to facilitate reviews
  • If new logic ⚙️ is introduced: unit tests were added
  • If new user stories 🤏 are introduced: E2E tests were added
  • If new UI components 🕹️ are introduced: corresponding stories in Storybook were created
  • If breaking changes 🪚 are introduced: add the breaking change label
  • If bugs 🐞 are fixed: add the backport <release branch> label
  • The documentation website 📚 has received the love it deserves

Contributor

github-actions bot commented Jan 21, 2025

Affected libs: api-repository
Affected apps: metadata-editor

  • 🚀 Build and deploy storybook and demo on GitHub Pages
  • 📦 Build and push affected docker images

Contributor

github-actions bot commented Jan 21, 2025

📷 Screenshots are here!

  ],
  like: [
    {
      _index: 'gn-records',
Collaborator Author

Not sure if this should stay hardcoded here. I found it in the Docker file and in one other place, but not as an environment variable.

Collaborator

This can vary across deployments and should not be hardcoded; unfortunately, the gn-ui apps don't really have access to this information :/
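
One possible direction, sketched under the assumption that an optional setting could be exposed to the apps (no such option exists according to the comment above): resolve the index name from configuration and fall back to the current hardcoded value. `SearchSettings`, `recordsIndexName` and `resolveRecordsIndex` are hypothetical names.

```typescript
// Hypothetical settings object; this option is an assumption, not an
// existing gn-ui setting.
interface SearchSettings {
  recordsIndexName?: string
}

// The value currently hardcoded in the diff.
const DEFAULT_RECORDS_INDEX = 'gn-records'

// Returns the configured index name, or the default when the setting is
// missing or blank.
function resolveRecordsIndex(settings: SearchSettings = {}): string {
  const name = settings.recordsIndexName?.trim()
  return name ? name : DEFAULT_RECORDS_INDEX
}
```

With this, deployments that keep the default index need no configuration, while others could override it in one place.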

coveralls commented Jan 21, 2025

Coverage: 84.495% (first build) when pulling 6d1db07 on DH-relevant-similar-datasets-suggestions into a1ec7cf on main.

@Angi-Kinas force-pushed the DH-relevant-similar-datasets-suggestions branch from 1ea7aa3 to 18f48e5 on January 27, 2025 09:47
dataset detail page to use more similar datasets.
Now ES is based on title, abstract and keywords.
@Angi-Kinas force-pushed the DH-relevant-similar-datasets-suggestions branch from 4c26e6d to 6d1db07 on February 11, 2025 09:51
    default: record.abstract,
  },
  allKeywords: record.keywords.map(
    (keyword) => keyword.label
Collaborator

@cmoinier commented Feb 12, 2025

Does the order here matter? I tested with the MEL instance (only around 50 records, so I don't know if that's enough), and some records have relevant suggestions IMO, but others are too biased by the title. For instance, a record about wind turbines in Roubaix had suggestions like "women history in Roubaix", "trees in Roubaix" and "citizen crowdfunding in Roubaix": their only common point is Roubaix. I checked their keywords and abstracts, and they have nothing in common. What if a datahub only has datasets with city names in their titles? Will it suggest random records like this all the time?
I don't know much about Elasticsearch, but could it check that the title matches AND something else matches, not just the title?
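
To make the "title AND something else" idea concrete, here is a rough sketch using a `bool` query: the title similarity is a `must` clause, and at least one of the abstract or keyword similarities has to match as well (`minimum_should_match: 1`). This is only one way to express it and has not been tried against a real index; `SimilarityInput`, `mlt` and `buildStricterQuery` are hypothetical names, and the field names are assumed.

```typescript
// Hypothetical flattened input (assumption, not the gn-ui record model).
interface SimilarityInput {
  title: string
  abstract: string
  keywords: string[]
}

// One more_like_this clause restricted to a single field, using free text.
function mlt(field: string, text: string) {
  return {
    more_like_this: { fields: [field], like: text, min_term_freq: 1 },
  }
}

// Requires a title match AND at least one of abstract / keywords to match.
// Field names are assumptions for illustration.
function buildStricterQuery(input: SimilarityInput) {
  return {
    query: {
      bool: {
        must: [mlt('resourceTitleObject.default', input.title)],
        should: [
          mlt('resourceAbstractObject.default', input.abstract),
          mlt('allKeywords', input.keywords.join(' ')),
        ],
        minimum_should_match: 1,
      },
    },
  }
}
```

In the Roubaix example, records sharing only the city name in the title would no longer qualify unless their abstract or keywords also overlap. Whether this is too strict for small catalogs would need testing.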

4 participants