Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MBS-12364, MBS-12367: Make use of AC MBIDs in website and webservice #2755

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

yvanzo
Copy link
Contributor

@yvanzo yvanzo commented Nov 29, 2022

Problem

MBS-11456 added GIDs to artist credits in the database and MBS back-end.
Neither the website nor the webservice made these usable so far.

Solution

MBS-12364

For the website:

  • Access the artist-credit page using MBID instead of row ID.
  • Redirect row ID and old MBIDs to the canonical/current MBID.

It has some side changes as well:

  • Redirect from /mbid/<MBID> to /artist-credit/<MBID> if applicable
  • Support [artist-credit:<mbid>] link syntax in annotation
  • Count ACs for MBIDs in statistics
  • Move ACs cache keys to use MBIDs

MBS-12367

For the API lookup/browse results:

  • Add id attribute to artist-credit XML element
  • Add artist-credit-id JSON key as a sibling of artist-credit array

Test hints

For manually testing webpages:

  • /mbid/94eda81b-abcc-3365-9a3c-10633d06fc71

  • /artist-credit/94eda81b-abcc-3365-9a3c-10633d06fc71

  • /artist-credit/1599741

  • /artist-credit/37f3d5f8-0bb7-48e2-8f82-99b73f005fcc

    after having added an arbitrary GID redirect with:

    INSERT INTO artist_credit_gid_redirect (gid, new_id)
        VALUES ('37f3d5f8-0bb7-48e2-8f82-99b73f005fcc', 1599741);

For manually testing changes in API JSON results:

  • /ws/2/recording/52bc6f21-2c6d-48d0-be24-6ddb06a8461c?fmt=json&inc=artists
  • /ws/2/recording/52bc6f21-2c6d-48d0-be24-6ddb06a8461c?fmt=json&inc=artist-credits
  • /ws/2/release/a7034f6f-d038-405a-8692-59f7530f6958?fmt=json&inc=artists
  • /ws/2/release/a7034f6f-d038-405a-8692-59f7530f6958?fmt=json&inc=artist-credits
  • /ws/2/release-group/7c4cab8d-dead-3870-b501-93c90fd0a580?fmt=json&inc=artists
  • /ws/2/release-group/7c4cab8d-dead-3870-b501-93c90fd0a580?fmt=json&inc=artist-credits

For manually testing changes in API XML results:

  • /ws/2/recording/52bc6f21-2c6d-48d0-be24-6ddb06a8461c?fmt=xml&inc=artists
  • /ws/2/recording/52bc6f21-2c6d-48d0-be24-6ddb06a8461c?fmt=xml&inc=artist-credits
  • /ws/2/release/a7034f6f-d038-405a-8692-59f7530f6958?fmt=xml&inc=artists
  • /ws/2/release/a7034f6f-d038-405a-8692-59f7530f6958?fmt=xml&inc=artist-credits
  • /ws/2/release-group/7c4cab8d-dead-3870-b501-93c90fd0a580?fmt=xml&inc=artists
  • /ws/2/release-group/7c4cab8d-dead-3870-b501-93c90fd0a580?fmt=xml&inc=artist-credits

I didn’t manually test any browse request, however they use the same code as lookup requests to serialize artist credits, and CI tests have been updated for both lookup and browse requests.

Prospective questions to reviewers

  1. Should artist credits be added to the sitemap?
  2. Should we start logging artist credits on deletion?

Todolist for author

  • Have a release plan for SEARCH-690 (which is half implemented already)
  • Update MB_SOLR_TAG to have an mmd-schema version that validates XML output in CI tests
  • Display “Artist credits” count in statistics (since AC MBIDs are now counted)
  • Add “Details” tab (which likely conflicts with refactoring for MBS-12552)
  • Include AC GIDs (and redirects) in sample data dumps (through EntityDump package)
  • Update the MusicBrainz API (lookup/browse) examples in XML and JSON
  • Update the MusicBrainz API Search examples in XML and JSON

Actions after merge

  1. Release and deploy changes for SEARCH-690 prior to releasing MBS
  2. Make updated WikiDocs pages visible on MusicBrainz website

Since MBS-11456, artist credits can have multiple MBIDs
(through GID redirects) but can still not have relationships.

Update entities.mjs then regenerate entities.json with:

    ./script/generate_entities_json.mjs > entities.json

Setting `mbid` makes:
- annotation to support `[artist_credit:<mbid>]` link notation;
- `/mbid/<mbid>` page to redirect to `/artist-credit/<mbid>` too;
- statistics to include artist credits when counting MBIDs
  (even though artist credit count isn’t displayed, for now).

Setting `mbid:multiple` will make:
- `Data::Role::Load` able to load GID redirects,
  once `Data::ArtistCredit` stopped overriding `_load` method;
- `Data::Role::GIDRedirect` applicable to `Data::ArtistCredit`.

Notes:
- Setting `mbid:relatable` (to `false`) doesn’t make anything
  but is generally set when `mbid` exists;
- `MusicBrainz::Script::EntityDump` doesn’t refer to multiple MBIDs
  when dumping artist credits, for now.
Amend commit 8a841d6 which defined the
method `load_gid_redirects` without importing `object_to_ids`.

This bug was inactive because this method was (is) dead code (for now).
Split the method `get_by_ids` of `Data::ArtistCredit`
(which overwrites the same method of `Data::Entity` it extends),
so that the second part can be used for getting ACs by GID too.

The new subroutine for this second part has the distinctive name
`_get_hash_by_keys` because it doesn’t return an array like
`_get_by_keys` of `Data::Entity` does but it is the same otherwise.

Update the POD accordingly.
In `Data::ArtistCredit`, the method `_get_by_keys` was inherited from
`Data::Entity` but wasn’t used so far and wouldn’t work as-is.

This patch overrides this method so that it works as it is required
by the role `GetByGID` added to `Data::ArtistCredit` in commit
8a841d6 without making use of it.

Update the POD accordingly.
Redirect both row ID and old GID to the canonical/current GID/MBID for
accessing artist credit page.

This patch also changes artist credit cache to use the GID for keys.
This cache is short-lived anyway: 1h on mirrors, 1 day on main server.

It also loads GID and GID redirects in ArtistCredit Moose entity model.

By loading GID it breaks JSON WS/2 which is fixed in separate commits.

Roles are ordered by (even weak) dependency, then alphabetically.

Update tests to differentiate invalid ID into non-existent ID (for
example a too large integer) and incorrectly formatted ID (which is
neither an UUID nor a positive integer). In the latter case, returning
400 (Bad Request) is actually the right thing to do.

Further tests are added in separate commits.
Test that the user is redirected to the artist credit’s page with the
current/canonical MBID when using either its row ID or an old MBID.

Update the basic test to use the canonical MBID instead of the row ID.
Since `'artist-credit'` is a JSON array of artist credit names, adding
artist credit’s MBID under it would require to turn it into an object
which would be a breaking change for everything that uses the WS/2 API.

Instead this patch serialize artist credit’s MBID and add it under the
new key `'artist-credit-id'` at the same level as `artist-credit`.

Tests are updated in a separate commit.
Add an `id` attribute to the `artist-credit` XML elements in WS/2
lookup/browse results, and set it to the artist credit’s MBID.

Tests are updated in a separate commit.
@yvanzo yvanzo added the New feature Non urgent new stuff label Nov 29, 2022
@reosarevok
Copy link
Member

reosarevok commented Nov 30, 2022

Should artist credits be added to the sitemap?

Are you adding any JSON-LD to them? If not, probably not? But let's see what @mwiencek thinks.

Should we start logging artist credits on deletion?

I honestly didn't even remember we have deleted_entity since we barely ever use it - but I don't see why we wouldn't do something like this for ACs too. Looking at a sample row, I imagine the same thing could be done with ACs - unsure what we'd store, but at least the gid, type and full text name, if not artist gids too?:
1e4690df-5693-48c2-a80c-eb1551d7a867 | {"entity_gid": "1e4690df-5693-48c2-a80c-eb1551d7a867", "entity_type": "artist", "last_known_name": "Ernst Rolff", "last_known_comment": ""} | 2013-07-22 13:00:18.742424+00

Should artist credit MBIDs (and redirects) be dumped too?

Aren't they already dumped? The artist_credit_gid_redirect table is already in @CORE_TABLE_LIST. If you mean in EntityDump then I guess so? I see no reason not to either do a handle_rows for redirect or just pass ACs through the core_entity process too.

@reosarevok
Copy link
Member

For _get_hash_by_keys, would it make sense to just not return a hash? find_by_ids does values %{ $self->get_by_ids(@$ids) } which is the same reasoning @mwiencek had in 895e36e when he made _get_by_keys for Data::Entity return a list.

Copy link
Member

@reosarevok reosarevok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a typo "seralization" in the commit message for 26aba5b

Thanks for working on this - it looks sensible to me so far (tested manually based on your suggestions too).

with 'MusicBrainz::Server::Data::Role::EntityModelClass';
with 'MusicBrainz::Server::Data::Role::GetByGID';
with 'MusicBrainz::Server::Data::Role::MainTable';
# Follows GetByGID and MainTable as it requires 'get_by_gid' and '_main_table';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried just consuming them all in one with call instead? I did that for all of Data:: in #2751 and it didn't seem to cause issues (while avoiding the need to order things in weird and specific ways) :)

Copy link
Contributor Author

@yvanzo yvanzo Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I just followed the current coding style for now even though it isn’t perfect. Changing it isn’t straightforward either. For example “the need to order things” is still there when two roles like GIDEntityCache and GIDRedirect goes around the same method get_by_gid.

Since the commit 33f470b the class
`MusicBrainz::Server::WebService::Serializer::JSON::2::ArtistCredit`
is unusable as the method `serialize_id` from `Utils` cannot set `id`
key into `artist-credit` array. That’s why artist credit serialization
has been replaced in commit 4491cf3.

However this class could still be useful if `artist-credit` gets added
to API resources in which case it wouldn’t be a breaking change.
@yvanzo
Copy link
Contributor Author

yvanzo commented Nov 30, 2022

Thanks for your comments I moved the third prospective question about sample data dump to my todolist, and fixed the typo in the last commit message.

Are you adding any JSON-LD to them?

I didn’t think about it.

About JSON-LD, I checked that our currently produced JSON-LD doesn’t include any artist credit data and thus doesn’t need to be updated with AC MBIDs.

For _get_hash_by_keys, would it make sense to just not return a hash?

There are wo reasons for returning a hash:

  • The hash is directly used and returned by get_by_ids .
  • The current implementation builds a hash anyway to set artist count for each artist credit (which isn’t needed in the default _get_by_keys).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
New feature Non urgent new stuff
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants