Release v3.2.1: doc_cleaner component, new Matcher attributes, bug fixes and more · explosion/spaCy

✨ New features and improvements

NEW: doc_cleaner component for removing doc.tensor, doc._.trf_data or other Doc attributes at the end of the pipeline to reduce the size of output docs.
NEW: Add ENT_ID and ENT_KB_ID to Matcher pattern attributes.
Support kb_id for entities in displaCy from Doc input.
Add Span.sents property for spans that cover more than one sentence.
Add EntityRuler.remove to remove patterns by id.
Make the Tagger neg_prefix configurable.
Use Language.pipe in Language.evaluate for more efficient processing.
Test suite updates: move regression tests into core test modules with pytest markers for issue numbers, extend tests for languages with alpha support.
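
Several of the additions above can be tried together on a blank pipeline. The following is a minimal sketch, assuming spaCy v3.2.1 or a later 3.x release; the example text, the pattern ids ("explosion", "berlin") and the match key "BERLIN" are made up for illustration and do not come from the release itself.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: rule-based components only, no trained model needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Entity patterns with ids, so that they can later be removed by id.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [
        {"label": "ORG", "pattern": "Explosion", "id": "explosion"},
        {"label": "GPE", "pattern": "Berlin", "id": "berlin"},
    ]
)

# doc_cleaner goes last; "attrs" maps attribute names to replacement values.
nlp.add_pipe("doc_cleaner", config={"attrs": {"tensor": None}})

doc = nlp("Explosion is based in Berlin. It maintains spaCy.")
print(doc.tensor)  # replaced by the doc_cleaner component

# ENT_ID as a Matcher pattern attribute: match tokens by their entity id.
matcher = Matcher(nlp.vocab)
matcher.add("BERLIN", [[{"ENT_ID": "berlin"}]])
matches = matcher(doc, as_spans=True)
print([span.text for span in matches])

# Span.sents: sentences touched by a span that crosses a sentence boundary.
span = doc[3:8]  # starts in the first sentence, ends in the second
sentences = list(span.sents)
print(len(sentences))

# EntityRuler.remove: drop every pattern registered under the given id.
ruler.remove("berlin")
doc_after = nlp("Explosion is based in Berlin.")
print([ent.text for ent in doc_after.ents])
```

Because `ruler` is the same object that sits in the pipeline, removing a pattern takes effect on the next call to `nlp` without rebuilding anything. The doc_cleaner config here only resets `doc.tensor`; pipelines with a transformer would typically also list `_.trf_data`.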

🔴 Bug fixes

Fix issue #9638: Make JsonlCorpus path optional again.
Fix issue #9654: Fix spancat for empty docs and zero suggestions.
Fix issue #9658: Improve error message for incorrect .jsonl paths in EntityRuler.
Fix issue #9674: Fix language-specific factory handling in package CLI.
Fix issue #9694: Convert labels to strings for README in package CLI.
Fix issue #9697: Exclude strings from source vector checks.
Fix issue #9701: Allow Scorer.score_spans to handle predicted docs with missing annotation.
Fix issue #9722: Initialize parser from reference parse rather than aligned example.
Fix issue #9764: Set annotations more efficiently in tagger and morphologizer.

📖 Documentation and examples

Various documentation updates: init_tok2vec after pretraining, batch contract for listeners.
New additions to the spaCy universe:
- eng-spacysentiment: Sentiment analysis for English.
- Applied Language Technology course: NLP for newcomers using spaCy and Stanza.