Drop Python 3.7, fix genre shenanigans, parse 'with' and 'w/' #55

Merged 40 commits on Apr 28, 2024
Commits
4a98ba3
internal: fix coverage setup
snejus Mar 17, 2024
8f72e27
genre: handle absence of 'keywords' metadata field
snejus Aug 21, 2023
6920d9f
Update build
snejus Aug 31, 2023
f7b91e1
internal: improve test_lib
snejus Oct 5, 2023
73d987a
Relax typing requirements
snejus Oct 5, 2023
91bca85
Cleanup: remove duplicate constant definitions
snejus Oct 5, 2023
042c530
internal: tidy up album patterns
snejus Oct 5, 2023
2cf5a43
title: Include w/ as featuring artist keyword
snejus Dec 21, 2023
b129ed6
catalognum: allow format like Dystopian LP01
snejus Apr 27, 2024
e97fc3e
album: handle EP/LP when followed by relevant data
snejus Dec 21, 2023
f1dec75
album: keep artist name when followed by underscore
snejus Apr 28, 2024
4778e34
album: handle apostrophes more reliably
snejus Mar 16, 2024
d08df44
album/title: remove 'Various -'
snejus Dec 26, 2023
efa9150
internal: remove underscore from modules
snejus Dec 26, 2023
3527a68
internal: fix methods order in tracks.py
snejus Dec 26, 2023
f1e8451
album: handle album sent to us by the devil himself
snejus Dec 27, 2023
cfc4851
title: remove 'bonus -'
snejus Dec 27, 2023
a5657df
album: do not remove VA when followed by a word or a number
snejus Dec 27, 2023
4d034c1
Update dependencies
snejus Mar 15, 2024
d805393
Drop support for Python 3.7
snejus Mar 15, 2024
c6f59ab
internal: clarify track name usage
snejus Mar 15, 2024
98267ad
internal: simplify some of the ridiculous tracks logic
snejus Mar 15, 2024
9873f59
internal: normalize track delimiter in tracks.py
snejus Mar 15, 2024
d7ace39
Fix Match[str] issue on Python3.8
snejus Mar 18, 2024
109ebc9
internal: fix regex performance
snejus Mar 16, 2024
11153e5
internal: move album in track titles logic to Tracks
snejus Apr 28, 2024
aeb7100
internal: clean up Album logic
snejus Apr 28, 2024
547a85b
internal: split release track names parsing into TrackNames class
snejus Mar 18, 2024
83fe8d2
catalognum: parse a range
snejus Apr 6, 2024
d69f630
internal: clarify the way ft is being handled
snejus Apr 22, 2024
0f0dc4d
comments: return value None when there is no comment
snejus Apr 22, 2024
98e8bc0
Synchronise test JSON files
snejus Apr 22, 2024
43b91f1
internal: standardize types in __init__ and clarify things a bit
snejus Apr 26, 2024
a1f8d01
label: obtain for single releases
snejus Apr 26, 2024
17dcd81
genre: apply matching rules to genres delimited by a dash
snejus Apr 27, 2024
38b12e0
beets: add support for albumtypes list, but constrain beets dependenc…
snejus Apr 28, 2024
1feb71f
Add a few new releases to test
snejus Apr 28, 2024
0649df3
Update dependencies
snejus Apr 28, 2024
f4c73c5
fixup! internal: clean up Album logic
snejus Apr 28, 2024
6b0efdd
Bump the version
snejus Apr 28, 2024
29 changes: 23 additions & 6 deletions .github/workflows/build.yml
@@ -7,7 +7,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python: ["3.7", "3.8", "3.9", "3.10"]
python: ["3.8", "3.9", "3.10", "3.11", "3.12"]
beets: ["1.4.9", "1.5.0", "1.6.0"]
steps:
- uses: actions/checkout@v3
@@ -36,17 +36,34 @@ jobs:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
COVERALLS_FLAG_NAME: python${{ matrix.python }}_beets${{ matrix.beets }}
COVERALLS_PARALLEL: true
- name: Flake8
- name: Lint flake8
run: flake8 . --output-file flake.log --exit-zero
- name: Mypy
run: mypy --strict >> flake.log || true
- name: Pylint
- name: Lint mypy
run: mypy > mypy.log || true
- name: Lint pylint
run: pylint --output pylint.log --exit-zero $(git ls-files '*.py')
- name: Set project version
run: echo PROJECT_VERSION="$(git describe --tags | sed 's/-[^-]*$//')" >> $GITHUB_ENV
- name: SonarCloud Scan
if: ${{ matrix.beets == '1.5.0' && matrix.python == '3.8' }}
uses: SonarSource/sonarcloud-github-action@master
with:
args: -Dsonar.branch.name=${{ github.ref_name }}
args: >
-Dsonar.branch.name=${{ github.ref_name }}
-Dsonar.organization=snejus
-Dsonar.projectKey=snejus_beetcamp
-Dsonar.projectVersion=${{ env.PROJECT_VERSION }}
-Dsonar.coverage.exclusions=tests/*
-Dsonar.exclusions=tests/*
-Dsonar.python.coverage.reportPaths=.reports/coverage.xml
-Dsonar.python.flake8.reportPaths=flake.log
-Dsonar.python.pylint.reportPaths=pylint.log
-Dsonar.python.mypy.reportPaths=mypy.log
-Dsonar.python.version=3.8
-Dsonar.python.xunit.reportPath=.reports/test-report.xml
-Dsonar.sources=beetsplug/bandcamp
-Dsonar.tests=tests
-Dsonar.test.inclusions=tests/*
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
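
Note on the `Set project version` step: it trims the last dash-separated field from the `git describe --tags` output before handing it to SonarCloud. As a rough illustration of what that `sed` expression does, here is a Python equivalent. The function name `strip_last_field` and the sample strings are made up for this note and do not come from the repository.

```python
import re


def strip_last_field(described: str) -> str:
    """Mimic sed 's/-[^-]*$//': drop the final '-suffix' if there is one."""
    return re.sub(r"-[^-]*$", "", described)


# Hypothetical `git describe --tags` outputs, used only for illustration.
print(strip_last_field("0.17.2-40-g6b0efdd"))  # 0.17.2-40
print(strip_last_field("0.18.0"))  # 0.18.0
```

A string without any dash passes through unchanged. A tag name that itself contains a dash would lose its last segment, so the expression assumes dash-free tags.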
48 changes: 48 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,51 @@
## Unreleased


## [0.18.0] 2024-04-28

### Removed

- Dropped support for `python 3.7`.

### Fixed

- (#52) genre: do not fail parsing a release without any keywords, for example
https://amniote-editions.bandcamp.com/album/ae-mj-0011-the-collective-capsule-vol-1
- (#54) Ensure that our genre matching rules also apply to genres delimited by a dash, not
  only by a space.

### Updated

- `album`:
  - handle some edge cases when string **EP** or **LP** is followed by data relevant to
    the album
  - do not remove artist or label when it is preceded by **` x `** or followed by characters
    **`'`**, **`_`** and **`&`**, or words **EP**, **LP** and **deluxe**
  - handle apostrophes more reliably
  - do not remove **VA** or **V/A** from the beginning when followed by a word or a number

- `album` / `title`:
  - Remove **`Various -`** from album and track names
  - Handle this [album sent to us by the devil himself]

- `catalognum`:
  - allow catalogue numbers like **Dystopian LP01**
  - parse a _range_ of catalogue numbers when it is present, for example
    **TFT013SR - TFT-016SR**

- `comments`: use value `None` when there are no comments. Unlike an empty string, `None`
  lets beets keep the previous comment on the track during import when the Bandcamp
  release does not have a description.

- `label`: label is now correctly obtained for single releases when it is available.

- `title`:
  - consider **with** and **w/** as markers for collaborating artists
  - remove **`bonus -`**
    - `Artist - Title (bonus - something)` -> **`Artist - Title (something)`**
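
Note on the `title` entry: the two rules can be pictured with a pair of regular expressions. This is only a sketch under stated assumptions. The patterns `BONUS` and `COLLAB` and both helper functions are hypothetical, and the sketch only looks at parenthesised markers, whereas the plugin's real patterns may cover more cases.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the plugin's own regexes may differ.
BONUS = re.compile(r"\(bonus - ", re.IGNORECASE)
COLLAB = re.compile(r"\((?:with|w/)\s+([^)]+)\)", re.IGNORECASE)


def clean_title(title: str) -> str:
    """Drop the 'bonus - ' prefix inside parentheses, keeping the rest."""
    return BONUS.sub("(", title)


def find_collaborator(title: str) -> Optional[str]:
    """Return the name following '(with ' or '(w/ ', if there is one."""
    m = COLLAB.search(title)
    return m.group(1).strip() if m else None


print(clean_title("Artist - Title (bonus - something)"))
print(find_collaborator("Title (w/ Someone)"))
print(find_collaborator("Title (without you)"))
```

Requiring whitespace after the marker keeps words such as "without" from being read as a collaboration.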

[album sent to us by the devil himself]: https://examine-archive.bandcamp.com/album/va-examine-archive-international-sampler-xmn01
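
Note on the `comments` entry: the difference between `None` and an empty string only matters if the importing side skips `None` values when it applies new metadata. The toy model below assumes exactly that. `apply_fields` is a hypothetical helper written for this note and is not beets' own function.

```python
from typing import Any, Dict


def apply_fields(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Copy incoming values over existing ones, skipping fields set to None."""
    merged = dict(existing)
    for field, value in incoming.items():
        if value is None:
            continue  # nothing new to say about this field, keep the old value
        merged[field] = value
    return merged


track = {"title": "Old title", "comments": "My own note"}
print(apply_fields(track, {"title": "New title", "comments": None}))
print(apply_fields(track, {"title": "New title", "comments": ""}))
```

With `None` the existing comment survives, while the empty string overwrites it.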

## [0.17.2] 2023-08-09

### Fixed
138 changes: 79 additions & 59 deletions beetsplug/bandcamp/__init__.py
@@ -15,25 +15,31 @@
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

"""Adds bandcamp album search support to the autotagger."""
from __future__ import absolute_import, division, print_function, unicode_literals

from __future__ import annotations

import logging
import re
from contextlib import contextmanager
from functools import lru_cache, partial
from html import unescape
from operator import itemgetter, truth
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Sequence, Union
from itertools import chain
from operator import itemgetter
from typing import TYPE_CHECKING, Any, Dict, Iterable, Iterator, List, Literal, Sequence

import requests
from beets import IncludeLazyConfig, __version__, config, library, plugins
from beets.autotag.hooks import AlbumInfo, TrackInfo

from beetsplug import fetchart # type: ignore[attr-defined]

from ._metaguru import Metaguru
from ._search import search_bandcamp
from .metaguru import Metaguru
from .search import search_bandcamp

if TYPE_CHECKING:
from beets.autotag.hooks import AlbumInfo, TrackInfo

JSONDict = Dict[str, Any]
CandidateType = Literal["album", "track"]

DEFAULT_CONFIG: JSONDict = {
"include_digital_only_tracks": True,
@@ -50,6 +56,7 @@
}

ALBUM_URL_IN_TRACK = re.compile(r'<a id="buyAlbumLink" href="([^"]+)')
LABEL_URL_IN_COMMENT = re.compile(r"Visit (https:[\w/.-]+com)")
USER_AGENT = f"beets/{__version__} +http://beets.radbox.org/"


@@ -80,22 +87,24 @@ def _get(self, url: str) -> str:
return ""
return unescape(response.text)

def guru(self, url, attr):
# type: (str, str) -> Optional[Union[TrackInfo, List[AlbumInfo]]]
def guru(self, url: str) -> Metaguru:
return Metaguru.from_html(self._get(url), config=self.config.flatten())

@contextmanager
def handle_error(self, url: str) -> Iterator[Any]:
"""Return Metaguru for the given URL."""
config = self.config.flatten() if hasattr(self, "config") else DEFAULT_CONFIG
try:
return getattr(Metaguru.from_html(self._get(url), config=config), attr)
yield
except (KeyError, ValueError, AttributeError, IndexError):
self._info("Failed obtaining {} from {}", attr, url)
self._info("Failed obtaining {}", url)
except Exception: # pylint: disable=broad-except
i_url = "https://github.com/snejus/beetcamp/issues/new"
self._exc("Unexpected error obtaining {}, please report at {}", url, i_url)
return None


def _from_bandcamp(clue: str) -> bool:
"""Check if the clue is likely to be a bandcamp url.

We could check whether 'bandcamp' is found in the url, however, we would be ignoring
cases where the publisher uses their own domain (for example https://eaux.ro) which
in reality points to their Bandcamp page. Historically, we found that regardless
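
Note on the hunk above: `guru` now only builds the `Metaguru` object, and error handling moves into a `handle_error` context manager that callers wrap around their own attribute access. The pattern can be sketched in isolation as follows. Everything here apart from the name `handle_error` and the exception tuple is an assumption made for the sketch: `failed_urls` stands in for the plugin's logger and `first_track` is a hypothetical caller.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional

failed_urls: List[str] = []  # stands in for the plugin's logger


@contextmanager
def handle_error(url: str) -> Iterator[None]:
    """Swallow the expected parsing errors and record the URL instead."""
    try:
        yield
    except (KeyError, ValueError, AttributeError, IndexError):
        failed_urls.append(url)


def first_track(release: Dict[str, Any], url: str) -> Optional[str]:
    with handle_error(url):
        return release["tracks"][0]
    # Reached only when the context manager suppressed an exception.
    return None


print(first_track({"tracks": ["Intro"]}, "https://example.com/album/ok"))
print(first_track({}, "https://example.com/album/broken"))
print(failed_urls)
```

Because the context manager suppresses the exception, execution continues after the `with` block, so a failed lookup yields `None` rather than propagating.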
@@ -108,18 +117,24 @@ def _from_bandcamp(clue: str) -> bool:
class BandcampAlbumArt(BandcampRequestsHandler, fetchart.RemoteArtSource):
NAME = "Bandcamp"

def get(self, album, plugin, paths):
# type: (AlbumInfo, plugins.BeetsPlugin, List[str]) -> Iterable[fetchart.Candidate] # noqa
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
self.config = self._config

def get(self, album: AlbumInfo, *_: Any) -> Iterable[fetchart.Candidate]:
"""Return the url for the cover from the bandcamp album page.

This only returns cover art urls for bandcamp albums (by id).
"""
url = album.mb_albumid
if not _from_bandcamp(url):
self._info("Not fetching art for a non-bandcamp album URL")
else:
image = self.guru(url, "image")
if image:
yield self._candidate(url=image, match=fetchart.Candidate.MATCH_EXACT)
with self.handle_error(url):
if image := self.guru(url).image:
yield self._candidate(
url=image, match=fetchart.Candidate.MATCH_EXACT
)


def urlify(pretty_string: str) -> str:
@@ -154,8 +169,12 @@ def loaded(self) -> None:
plugin.sources = [bandcamp_fetchart, *plugin.sources]
break

def _find_url(self, item: library.Item, name: str, _type: str) -> str:
"""If the item has previously been imported, `mb_albumid` (or `mb_trackid`
def _find_url_in_item(
self, item: library.Item, name: str, _type: CandidateType
) -> str:
"""Try to extract release URL from the library item.

If the item has previously been imported, `mb_albumid` (or `mb_trackid`
for singletons) contains the release url.

As of 2022 April, Bandcamp purchases (at least in FLAC format) contain string
@@ -175,24 +194,23 @@ def _find_url(self, item: library.Item, name: str, _type: str) -> str:
self._info("Fetching the URL attached to the first item, {}", url)
return url

m = re.match(r"Visit (https:[\w/.-]+com)", item.comments)
urlified_name = urlify(name)
if m and urlified_name:
label = m.expand(r"\1")
url = "/".join([label, _type, urlified_name])
if (m := LABEL_URL_IN_COMMENT.match(item.comments)) and (
urlified_name := urlify(name)
):
label = m.group(1)
url = f"{label}/{_type}/{urlified_name}"
self._info("Trying our guess {} before searching", url)
return url
return ""

def candidates(self, items, artist, album, va_likely, extra_tags=None):
# type: (List[library.Item], str, str, bool, Any) -> Iterable[AlbumInfo]
"""Return a sequence of AlbumInfo objects that match the
album whose items are provided or are being searched.
"""
def candidates(
self, items: List[library.Item], artist: str, album: str, *_: Any, **__: Any
) -> Iterable[AlbumInfo]:
"""Return a sequence of album candidates matching given artist and album."""
label = ""
if items and album == items[0].album and artist == items[0].albumartist:
label = items[0].label
url = self._find_url(items[0], album, "album")
url = self._find_url_in_item(items[0], album, "album")
if url:
initial_guess = self.get_album_info(url)
if initial_guess:
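
Note on the URL guess in `_find_url_in_item`: the comment regex is now the module-level constant `LABEL_URL_IN_COMMENT`, and the match plus the urlified name are checked in one walrus condition. A standalone sketch of that step is below. The regex is copied from the diff, but `guess_url` is a hypothetical wrapper and this `urlify` is a simplified stand-in, since the plugin's own implementation is not shown in this diff.

```python
import re

LABEL_URL_IN_COMMENT = re.compile(r"Visit (https:[\w/.-]+com)")


def urlify(pretty_string: str) -> str:
    """Simplified stand-in: lowercase and join alphanumeric runs with dashes."""
    return re.sub(r"[^a-z0-9]+", "-", pretty_string.lower()).strip("-")


def guess_url(comments: str, name: str, _type: str) -> str:
    """Build '<label url>/<type>/<urlified name>' or return an empty string."""
    if (m := LABEL_URL_IN_COMMENT.match(comments)) and (
        urlified_name := urlify(name)
    ):
        return f"{m.group(1)}/{_type}/{urlified_name}"
    return ""


print(guess_url("Visit https://label.bandcamp.com", "Some Album!", "album"))
print(guess_url("no link here", "Some Album!", "album"))
```

Since `match` anchors at the start of the string, a comment that mentions the label URL later in the text produces no guess.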
@@ -204,17 +222,13 @@ def candidates(self, items, artist, album, va_likely, extra_tags=None):

search = {"query": album, "artist": artist, "label": label, "search_type": "a"}
results = map(itemgetter("url"), self._search(search))
for res in filter(truth, map(self.get_album_info, results)):
yield from res or [None]

def item_candidates(self, item, artist, title):
# type: (library.Item, str, str) -> Iterable[TrackInfo]
"""Return a sequence of TrackInfo objects that match the provided item.
If the track was downloaded directly from bandcamp, it should contain
a comment saying 'Visit <label-url>' - we look at this first by converting
title into the format that Bandcamp use.
"""
url = self._find_url(item, title, "track")
yield from chain.from_iterable(filter(None, map(self.get_album_info, results)))

def item_candidates(
self, item: library.Item, artist: str, title: str
) -> Iterable[TrackInfo]:
"""Return a sequence of singleton candidates matching given artist and title."""
url = self._find_url_in_item(item, title, "track")
label = ""
if item and title == item.title and artist == item.artist:
label = item.label
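
Note on the new `candidates` body: each search result maps to either a list of albums or `None`, so the results are filtered and then flattened with `chain.from_iterable`. A small sketch of that idiom follows. The `get_album_info` here is a hypothetical lookup over made-up data and only mirrors the return shape used in the diff.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Optional


def get_album_info(url: str) -> Optional[List[str]]:
    """Hypothetical lookup: a list of editions, or None when parsing failed."""
    fake = {"a": ["a-digital", "a-vinyl"], "b": None, "c": ["c-digital"]}
    return fake.get(url)


def candidates(urls: Iterable[str]) -> Iterator[str]:
    # filter(None, ...) drops None and empty lists, chain flattens the rest.
    yield from chain.from_iterable(filter(None, map(get_album_info, urls)))


print(list(candidates(["a", "b", "c"])))  # ['a-digital', 'a-vinyl', 'c-digital']
```

Everything stays lazy, so a lookup only runs when the consumer asks for the next candidate.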
@@ -225,9 +239,9 @@ def item_candidates(self, item, artist, title):

search = {"query": title, "artist": artist, "label": label, "search_type": "t"}
results = map(itemgetter("url"), self._search(search))
yield from filter(truth, map(self.get_track_info, results))
yield from filter(None, map(self.get_track_info, results))

def album_for_id(self, album_id: str) -> Optional[AlbumInfo]:
def album_for_id(self, album_id: str) -> AlbumInfo | None:
"""Fetch an album by its bandcamp ID."""
if not _from_bandcamp(album_id):
self._info("Not a bandcamp URL, skipping")
@@ -244,28 +258,32 @@ def album_for_id(self, album_id: str) -> Optional[AlbumInfo]:
albums = sorted(albums, key=lambda x: pref_to_idx.get(x.media, 100))
return albums[0]

def track_for_id(self, track_id: str) -> Optional[TrackInfo]:
def track_for_id(self, track_id: str) -> TrackInfo | None:
"""Fetch a track by its bandcamp ID."""
if _from_bandcamp(track_id):
return self.get_track_info(track_id)

self._info("Not a bandcamp URL, skipping")
return None

def get_album_info(self, url: str) -> Optional[List[AlbumInfo]]:
def get_album_info(self, url: str) -> List[AlbumInfo] | None:
"""Return an AlbumInfo object for a bandcamp album page.

If track url is given by mistake, find and fetch the album url instead.
"""
html = self._get(url)
if html and "/track/" in url:
m = ALBUM_URL_IN_TRACK.search(html)
if m:
url = re.sub(r"/track/.*", m.expand(r"\1"), url)
return self.guru(url, "albums")

def get_track_info(self, url: str) -> Optional[TrackInfo]:
"""Returns a TrackInfo object for a bandcamp track page."""
return self.guru(url, "singleton")
with self.handle_error(url):
return self.guru(url).albums

def get_track_info(self, url: str) -> TrackInfo | None:
"""Return a TrackInfo object for a bandcamp track page."""
with self.handle_error(url):
return self.guru(url).singleton

def _search(self, data: JSONDict) -> Iterable[JSONDict]:
"""Return a list of track/album URLs of type search_type matching the query."""
@@ -275,7 +293,7 @@ def _search(self, data: JSONDict) -> Iterable[JSONDict]:
return results[: self.config["search_max"].as_number()]


def get_args(args: List[str]) -> Any:
def get_args() -> Any:
from argparse import Action, ArgumentParser

if TYPE_CHECKING:
@@ -292,8 +310,13 @@ def get_args(args: List[str]) -> Any:
)

class UrlOrQueryAction(Action):
def __call__(self, parser, namespace, values, option_string=None):
# type: (Any, Namespace, Any, Any) -> None
def __call__(
self,
parser: ArgumentParser,
namespace: Namespace,
values: Any,
option_string: str | None = None,
) -> None:
if values:
if values.startswith("https://"):
target = "release_url"
@@ -302,7 +325,7 @@ def __call__(self, parser, namespace, values, option_string=None):
del namespace.release_url
setattr(namespace, target, values)

exclusive = parser.add_mutually_exclusive_group()
exclusive = parser.add_mutually_exclusive_group(required=True)
exclusive.add_argument(
"release_url",
action=UrlOrQueryAction,
@@ -327,17 +350,14 @@ def __call__(self, parser, namespace, values, option_string=None):
type=int,
help="Open search result indexed by INDEX in the browser",
)
if not args:
parser.print_help()
parser.exit()
return parser.parse_args(args=args)

return parser.parse_args()


def main() -> None:
import json
import sys

args = get_args(sys.argv[1:])
args = get_args()

search_vars = vars(args)
index = search_vars.pop("index", None)
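
Note on the `get_args` change: the manual `if not args: parser.print_help()` check is gone, and the mutually exclusive group is created with `required=True` so that `argparse` itself rejects an empty command line. The effect can be sketched with a reduced parser. The `--url` and `--query` options and the `demo` program name are assumptions for this sketch, whereas the plugin uses a positional `release_url` with a custom action.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    parser = ArgumentParser(prog="demo")
    # required=True makes argparse fail when neither option is supplied.
    exclusive = parser.add_mutually_exclusive_group(required=True)
    exclusive.add_argument("--url", help="release URL to parse")
    exclusive.add_argument("--query", help="free-text search query")
    return parser


args = build_parser().parse_args(["--query", "some album"])
print(args.query, args.url)
```

One behavioural difference to keep in mind: with no arguments, `argparse` prints the usage line plus an error and exits with status 2, rather than printing the full help text.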