Release v3.6.0: New span finder component and pipelines for Slovenian · explosion/spaCy

✨ New features and improvements

- NEW: `span_finder` pipeline component to identify overlapping, unlabeled spans (#12507).
- Language updates:
  - Add initial support for Malay (#12602).
  - Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
- Add an option to return scores separately, keyed by component name, with `spacy evaluate --per-component`, `Language.evaluate(per_component=True)` and `Scorer.score(per_component=True)` (#12540).
- Support a custom token/lexeme attribute for vectors (#12625).
- Support `spancat_singlelabel` in the `spacy debug data` CLI (#12749).
- Typing updates for `PhraseMatcher` and `SpanGroup` (#12642, #12714).
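These notes only name the `span_finder` component. As a rough illustration of how overlapping, unlabeled spans can come out of a token-level model, the sketch below assumes a model has already produced a (start, end) probability pair for each token, and pairs every predicted start with every predicted end at or after it. The function `spans_from_boundary_scores` and its parameters are hypothetical. My understanding is that spaCy's component decodes its scores in a broadly similar way (with `threshold`, `min_length` and `max_length` settings, storing results in `doc.spans`), but treat this as a standalone sketch, not spaCy's code.

```python
from typing import List, Optional, Sequence, Tuple


def spans_from_boundary_scores(
    scores: Sequence[Tuple[float, float]],
    threshold: float = 0.5,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Pair predicted start tokens with predicted end tokens (hypothetical helper).

    `scores[i]` is an assumed (start_prob, end_prob) pair for token i.
    Returns (start, end) offsets with an exclusive end, like doc[start:end].
    """
    # A token is a candidate boundary if its probability reaches the threshold.
    starts = [i for i, (start_prob, _) in enumerate(scores) if start_prob >= threshold]
    ends = [i for i, (_, end_prob) in enumerate(scores) if end_prob >= threshold]

    spans: List[Tuple[int, int]] = []
    for start in starts:
        for end in ends:
            length = end + 1 - start  # `end` is an inclusive token index here
            if length < 1:
                continue  # the end token comes before the start token
            if min_length is not None and length < min_length:
                continue
            if max_length is not None and length > max_length:
                continue
            spans.append((start, end + 1))
    return spans


# Usage with made-up scores for "New York City is big".
words = ["New", "York", "City", "is", "big"]
scores = [(0.9, 0.1), (0.8, 0.7), (0.1, 0.9), (0.0, 0.0), (0.0, 0.0)]
for start, end in spans_from_boundary_scores(scores):
    print(" ".join(words[start:end]))
# New York / New York City / York / York City (one per line)
```

Both "New York" and "New York City" are returned, so the candidates overlap, and none of them carries a label. As I understand it, the intended use is to hand such candidates to a labeling component such as `spancat`.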
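The notes do not say why per-component scores are useful. One plausible reason is metric-name collisions: if the default result merges every component's score dict into a single dict, two components that report the same key (for example two entity recognizers both reporting `ents_f`) leave only one value behind. The helpers `merge_flat` and `merge_per_component` below are hypothetical and assume each component's scores are a plain `dict`; they mimic the shape of the two result styles, not spaCy's `Scorer` itself. In spaCy, the keyed style is requested with `nlp.evaluate(examples, per_component=True)` or `spacy evaluate --per-component`.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def merge_flat(component_scores: List[Tuple[str, Scores]]) -> Scores:
    """One flat dict: a later component overwrites any metric name it shares."""
    merged: Scores = {}
    for _name, scores in component_scores:
        merged.update(scores)
    return merged


def merge_per_component(component_scores: List[Tuple[str, Scores]]) -> Dict[str, Scores]:
    """One dict per component, keyed by the component's name."""
    return {name: dict(scores) for name, scores in component_scores}


# Hypothetical pipeline with two NER components that report the same metric names.
# The numbers are made up for illustration.
results = [
    ("ner", {"ents_f": 0.85, "ents_p": 0.86}),
    ("ner_extra", {"ents_f": 0.70, "ents_p": 0.72}),
]
print(merge_flat(results))
# {'ents_f': 0.7, 'ents_p': 0.72}
print(merge_per_component(results))
# {'ner': {'ents_f': 0.85, 'ents_p': 0.86}, 'ner_extra': {'ents_f': 0.7, 'ents_p': 0.72}}
```

In the flat result the scores of `ner` are silently lost; keying by component name keeps both.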

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

| Package | UPOS | Parser LAS | NER F |
| --- | ---: | ---: | ---: |
| `sl_core_news_sm` | 96.9 | 82.1 | 62.9 |
| `sl_core_news_md` | 97.6 | 84.3 | 73.5 |
| `sl_core_news_lg` | 97.7 | 84.3 | 79.0 |
| `sl_core_news_trf` | 99.0 | 91.7 | 90.0 |

The English pipelines have been updated to better handle contractions written with different apostrophe characters and to lemmatize "get" as a passive auxiliary.

The Danish pipeline `da_core_news_trf` has been updated to use `vesteinn/DanskBERT`, with performance improvements across the board.

All spans in a `SpanGroup` must now come from the same doc. When a `SpanGroup` is initialized, a new check verifies that every added span refers to the group's doc. Without this check, mixing spans from different docs could lead to `StringStore` errors and other failures.
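A minimal sketch of the kind of init-time check described above, using hypothetical `ToyDoc`, `ToySpan` and `ToySpanGroup` classes rather than spaCy's own. It assumes that "refers to the current doc" means the very same doc object (an identity check), so a second doc with identical text is still rejected. The exception type and message are illustrative, not spaCy's.

```python
from dataclasses import dataclass
from typing import Iterable, List


class ToyDoc:
    """Hypothetical stand-in for a Doc: just holds its words."""

    def __init__(self, words: Iterable[str]) -> None:
        self.words = list(words)


@dataclass
class ToySpan:
    """Hypothetical stand-in for a Span: a token range in one specific ToyDoc."""

    doc: ToyDoc
    start: int
    end: int


class ToySpanGroup:
    """Hypothetical span group that rejects spans from any other doc at init time."""

    def __init__(self, doc: ToyDoc, spans: Iterable[ToySpan] = ()) -> None:
        span_list = list(spans)
        for span in span_list:
            # Identity check: a different doc object with identical text still fails.
            if span.doc is not doc:
                raise ValueError("all spans must refer to the span group's doc")
        self.doc = doc
        self.spans: List[ToySpan] = span_list


doc_a = ToyDoc(["Hello", "world"])
doc_b = ToyDoc(["Hello", "world"])  # same text, different object

group = ToySpanGroup(doc_a, [ToySpan(doc_a, 0, 1)])  # accepted
try:
    ToySpanGroup(doc_a, [ToySpan(doc_b, 0, 1)])
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early at construction time surfaces the mistake where it is made, instead of later as an unrelated-looking lookup error.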