Releases · explosion/spaCy

28 Jun 16:17

adrianeboyd

v3.5.4

7a2833b

v3.5.4: Bug fixes for overrides with registered functions and sourced components with listeners

✨ New features and improvements

Extend Typer support to v0.9 (#12631).

🔴 Bug fixes

#12701: Fix issues with component names and listeners for sourced components.
#12623: Support overrides for registered functions in configs.

👥 Contributors

@adrianeboyd, @bdura, @honnibal, @ines, @svlandeg

25 May 08:37

adrianeboyd

v3.3.3

fa9d24e

v3.3.3: Bug fixes for Pydantic and pip

This bug fix release is primarily to address Pydantic incompatibility with typing_extensions>=4.6.0.

✨ New features and improvements

Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).

🔴 Bug fixes

Add typing_extensions requirement due to Pydantic incompatibility with typing_extensions>=4.6.0.
Remove #egg from download URLs due to future deprecation in pip.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

25 May 11:16

adrianeboyd

v3.2.6

0fc87f6

v3.2.6: Bug fixes for Pydantic and pip

This bug fix release is primarily to address Pydantic incompatibility with typing_extensions>=4.6.0.

✨ New features and improvements

Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).

🔴 Bug fixes

Add typing_extensions requirement due to Pydantic incompatibility with typing_extensions>=4.6.0.
Remove #egg from download URLs due to future deprecation in pip.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

15 May 09:59

adrianeboyd

v3.5.3

512241e

v3.5.3: Speed improvements, bug fixes and more

✨ New features and improvements

Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).
Improve speed for child operators (>+, >-, >++, >--) for the dependency matcher (#12528).
Improve loading speed for tokenizers with a large number of exceptions (#12553).
Support doc.spans for displaCy output in spacy benchmark accuracy / spacy evaluate (#12575).
Add MorphAnalysis.get(default=) argument for user-provided default values similar to dict (#12545).
Only perform vectors checks during initialization if there are sourced components (#12607).

🔴 Bug fixes

#12567: Remove #egg from download URLs due to future deprecation in pip.

📖 Documentation and examples

Various documentation corrections and updates.
New additions to spaCy Universe:
- LatinCy
- parsigs
- spaCysee
- spacy-wasm

👥 Contributors

@adrianeboyd, @andyjessen, @bdura, @davidberenstein1957, @diyclassics, @honnibal, @ines, @kadarakos, @KennethEnevoldsen, @ljvmiranda921, @moxley01, @royashcenazi, @svlandeg, @tanloong, @victorialslocum

12 Apr 07:50

adrianeboyd

v3.5.2

aea4a96

v3.5.2: Pretraining improvements, bug fixes for spans and spancat and more

✨ New features and improvements

Add support for floret vectors in spacy pretrain (#12435).
Save final model as model-last.bin for spacy pretrain (#12459).
Support Span input for displacy.parse_deps (#12477).
Extend support to CuPy 12.0 for cupy install extras.

🔴 Bug fixes

#12398: Fix entity linker failure on sentence-crossing entities.
#12405: Fix sentence indexing bug in Span.sents.
#12469: Fix scores attribute for spancat_singlelabel.
#12484: Fix Span.sents when the final sentence is the last token in a Doc.
#12486: Fix pickle for the ngram suggester.
#12493: Include Span.kb_id and Span.id strings in Doc and DocBin serialization.

📖 Documentation and examples

Various documentation corrections and updates.
New addition to spaCy Universe:
- Sentimental Onix

👥 Contributors

@adrianeboyd, @BLKSerene, @honnibal, @ines, @kadarakos, @prajakta-1527, @rmitsch, @shadeMe, @sloev, @svlandeg, @thomashacker, @willfrey

10 Mar 09:02

adrianeboyd

v3.5.1

8153bd5

v3.5.1: spancat for multi-class labeling, fixes for textcat+transformers and more

💥 We'd love to hear more about your experience with spaCy! Take our survey here.

✨ New features and improvements

NEW: spancat_singlelabel pipeline component for multi-class and non-overlapping span classification. The spancat_singlelabel component predicts at most one label for each suggested span and adds a new setting allow_overlap to restrict the output to non-overlapping spans (#11365).
Extend to mypy v1.0 (#12245).
Use transformer + CNN for efficient GPU textcat with spacy init config (#11900).
Support trainable lemmatizer in spacy debug data (#11419).
Add new operators to dependency matcher for left/right immediate child/parent nodes (>+, >-, <+, <-) (#12334).
Add spacy.PlainTextCorpusReader.v1 for plain text input (#12122).
Add alignment_mode and span_id to Span.char_span() (#12145, #12196).
Use string formatting types in logging calls (#12215).
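
The effect of restricting output to non-overlapping spans, as described for allow_overlap above, can be sketched in plain Python. This is only an illustration under stated assumptions: the tuple layout, the helper name filter_non_overlapping and the greedy highest-score-first strategy are all hypothetical and are not taken from spaCy's implementation.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label, score), end exclusive.
ScoredSpan = Tuple[int, int, str, float]


def filter_non_overlapping(spans: List[ScoredSpan]) -> List[ScoredSpan]:
    """Keep a set of mutually non-overlapping spans, preferring higher scores.

    Assumption: a greedy pass over spans sorted by descending score.
    """
    kept: List[ScoredSpan] = []
    for span in sorted(spans, key=lambda s: s[3], reverse=True):
        start, end = span[0], span[1]
        # A span is kept only if it does not intersect any span kept so far.
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: (s[0], s[1]))


candidates = [(0, 2, "ORG", 0.9), (1, 3, "PERSON", 0.8), (3, 5, "GPE", 0.7)]
result = filter_non_overlapping(candidates)
```

Here the second candidate is dropped because it overlaps a higher-scoring span, while the third is kept because it starts after the first one ends.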

🔴 Bug fixes

#12017: Improve speed for top_k>1 in trainable lemmatizer.
#12048: Make test_cli_find_threshold() test more robust.
#12227: Fix return type of registry.find().
#12272: Fix speed regression for Matcher patterns with extension attributes.
#12287: Add grc to languages with lexeme norms in spacy-lookups-data.
#12320: Make generation of empty KnowledgeBase instances configurable.
#12343: Fix error message for displacy auto_select_port.
#12347: Fix length check for knowledge base in entity linker, add InMemoryLookupKB.is_empty.
#12365: Fix types for Lexeme.orth and Lexeme.lower.
#12366: Raise error for non-default vectors with PretrainVectors.
#12368: Partially address pending deprecation of pkg_resources.
Various improvements and fixes for the test suite (#12148, #12157, #12210, #12303, #12372).

📖 Documentation and examples

Many website updates to improve accessibility.
Various documentation corrections and updates.
New projects:
- Span labeling datasets
- Comparing embedding layers in spaCy from the technical report Multi hash embeddings in spaCy

👥 Contributors

@adrianeboyd, @andyjessen, @danieldk, @essenmitsosse, @honnibal, @ines, @itssimon, @kadarakos, @kwhumphreys, @ljvmiranda921, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @shadeMe, @svlandeg, @tanloong, @thomashacker, @victorialslocum

20 Jan 09:56

adrianeboyd

v3.5.0

dec8150

v3.5.0: New CLI commands, language updates, bug fixes and much more

✨ New features and improvements

NEW: New apply CLI command to annotate new documents with a trained pipeline (#11376).
NEW: New benchmark CLI command to benchmark pipelines. The new benchmark speed subcommand measures the speed of a pipeline, and the benchmark accuracy subcommand is a new alias for evaluate (#11902).
NEW: New find-threshold CLI command to identify an optimal threshold for classification models (#11280).
NEW: New FUZZY Matcher operator for fuzzy matches based on Levenshtein edit distance. In addition, the FUZZY and REGEX operators are now supported in combination with IN/NOT_IN (#11359).
Language updates for Ancient Greek, Dutch, Russian, Slovenian and Ukrainian (#11345, #11162, #11426, #11753, #11811, #11997, more details below).
Allow up to typer v0.7.x (#11720), mypy 0.990 (#11801) and typing_extensions v4.4.x (#12036).
New spacy.ConsoleLogger.v3 with expanded progress tracking (#11972).
Improved scoring behavior for textcat with spacy.textcat_scorer.v2 (#11696 and #11971) and spacy.textcat_multilabel_scorer.v2 (#11820).
Improved customizability of the knowledge base used for entity linking, with the default implementation being the new InMemoryLookupKB (#11268).
Optional before_update callback that is invoked at the start of each training step (#11739).
Improve performance of SpanGroup (#11380).
Improve UX around displacy.serve when the default port is in use (#11948).
Patch a security vulnerability in extracting tar files (#11746).
Add equality definition for vectors (#11806).
Allow interpolation of variables in directory names in projects (#11235).
Update default component configs to use the latest tok2vec version (#11618).
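
For readers unfamiliar with the Levenshtein edit distance that the FUZZY operator above is based on, a minimal pure-Python sketch follows. It is not spaCy's implementation; the function names and the max_edits parameter are hypothetical and serve only to show how a distance threshold turns into a yes/no fuzzy match.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between two strings."""
    # Only the previous row of the dynamic-programming table is kept.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_equal(text: str, pattern: str, max_edits: int) -> bool:
    """Hypothetical helper: match if the distance is within max_edits."""
    return levenshtein(text, pattern) <= max_edits
```

With this sketch, fuzzy_equal("definately", "definitely", 1) is true because a single substitution separates the two strings, whereas fuzzy_equal("spacy", "space", 0) is false.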

🔴 Bug fixes

#11382: Fix lookup behavior for the French and Catalan lemmatizers.
#11385: Ensure that downstream components can train properly on a frozen tok2vec or transformer layer.
#11762: Support local file system remotes for projects.
#11763: Raise an error when unsupported values are used for textcat.
#11834: Ensure Vocab.to_disk respects the exclude setting for lookups and vectors.
#12009: Fix a few typing issues for SpanGroup and Span objects.
#12098: Correctly handle missing annotations in the edit tree lemmatizer.

⚠️ Backwards incompatibilities and model updates

The following changes may require you to update code that is using the relevant functionality:

An error is now raised when unsupported values are given as input to train a textcat or textcat_multilabel model - ensure that values are 0.0 or 1.0 as explained in the docs.
As KnowledgeBase is now an abstract class, you should call the constructor of the new InMemoryLookupKB instead when you want to use spaCy's default KB implementation. If you've written a custom KB that inherits from KnowledgeBase, you'll need to implement its abstract methods, or alternatively inherit from InMemoryLookupKB instead.
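
To catch unsupported textcat values before training, a pre-check along these lines could be run over your own annotations. This is a sketch only: validate_cats is a hypothetical helper, not part of spaCy, and it assumes the categories are available as a plain dict mapping labels to numeric values.

```python
from typing import Dict


def validate_cats(cats: Dict[str, float]) -> None:
    """Raise ValueError unless every category value equals 0.0 or 1.0."""
    for label, value in cats.items():
        # The integers 0 and 1 compare equal to 0.0 and 1.0 and also pass.
        if value not in (0.0, 1.0):
            raise ValueError(
                f"Unsupported value {value!r} for label {label!r}: "
                "expected 0.0 or 1.0"
            )


# Passes silently: both values are supported.
validate_cats({"POSITIVE": 1.0, "NEGATIVE": 0.0})
```

A call such as validate_cats({"POSITIVE": 0.5}) raises ValueError, which mirrors the kind of input that now triggers an error during training.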

The following changes may influence the output of your language pipeline or trained models:

Updates to language defaults:
- Extended support for Slovenian (#11162).
- Switch Russian and Ukrainian lemmatizers to pymorphy3 (#11345, #11811).
- Support for editorial punctuation in Ancient Greek (#11426).
- Update to Russian tokenizer exceptions (#11753).
- Small fix in the list of Dutch stop words (#11997).
Updates to model defaults:
- Use the latest tok2vec defaults in all components (#11618).
- Improve the default attributes used for the textcat and textcat_multilabel components (#11698).
- Update the default scorer for textcat and textcat_multilabel to fix a bug related to threshold for textcat and to make it possible to score multiple textcat/textcat_multilabel components in a single pipeline with custom scorers. If no custom scorers are used, the cat_p/r/f scores will now only reflect the final component's labels and performance (#11696, #11820).
- Correct the token_acc score to report the intended measure (number of correct tokens / number of predicted tokens, the same as in spaCy v2). The token_acc scores for v3.5 will be lower for the same performance because they were incorrectly inflated in v3.0-v3.4. The token_p/r/f scores should remain unchanged (#12073).
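
As a worked illustration of the corrected token_acc measure (correct tokens divided by predicted tokens), the sketch below compares token boundaries given as character offsets. It is a simplification under stated assumptions: the function name and the offset representation are hypothetical, and the real scorer may decide which predicted tokens count as correct in a different way.

```python
from typing import List, Tuple

Offsets = Tuple[int, int]  # (start_char, end_char), end exclusive


def token_accuracy(gold: List[Offsets], predicted: List[Offsets]) -> float:
    """Return correct predicted tokens / all predicted tokens.

    Assumption: a predicted token is correct if a gold token has
    exactly the same character offsets.
    """
    if not predicted:
        return 0.0
    gold_set = set(gold)
    correct = sum(1 for token in predicted if token in gold_set)
    return correct / len(predicted)


# Text: "I can't go"
gold = [(0, 1), (2, 4), (4, 7), (8, 10)]   # I | ca | n't | go
predicted = [(0, 1), (2, 7), (8, 10)]      # I | can't | go
accuracy = token_accuracy(gold, predicted)
```

In this example two of the three predicted tokens match gold boundaries, so the accuracy is 2/3.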

The following functionality will be changed in the near future - so it's best to start updating your scripts now to make them more generic:

From v4 onwards, we'll rename the master branch to main.

📦 Trained pipelines updates

The CNN pipelines add IS_SPACE as a tok2vec feature for tagger and morphologizer components to improve tagging of non-whitespace vs. whitespace tokens.
The transformer pipelines require spacy-transformers v1.2, which uses the exact alignment from tokenizers for fast tokenizers instead of the heuristic alignment from spacy-alignments. For all trained pipelines except ja_core_news_trf, the alignments between spaCy tokens and transformer tokens may be slightly different. See the spacy-transformers v1.2.0 release notes for more details about these changes.

📖 Documentation and examples

We've ported our website from Gatsby to Next 🥳
Updated the documentation on supported languages.
Added a note about experimental M1 GPU support to the installation quickstart.
Included documentation for the biluo_to_iob and iob_to_biluo functions.
Fixed model links in the v3.4 usage documentation.
Removed "new" tags from functionality introduced in spaCy v2.x.
Various small additions, spelling and typo fixes.
spaCy Universe additions:
- greCy: Providing Ancient Greek models
- spacy-pythainlp: Add Thai support for spaCy
New projects:
- Accelerate NER with Speedster (experimental)

👥 Contributors

@aaronzipp, @adrianeboyd, @albertvillanova, @ArchiDevil, @cfuerbachersparks, @damian-romero, @danieldk, @darigovresearch, @DSLituiev, @essenmitsosse, @gremur, @honnibal, @ines, @jmyerston, @JosPolfliet, @kadarakos, @koaning, @kwhumphreys, @ljvmiranda921, @MarcoGorelli, @orglce, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @ryndaniels, @shadeMe, @svlandeg, @thomashacker, @TrellixVulnTeam, @wannaphong, @zhiiw, @zrpxx

16 Dec 07:53

adrianeboyd

v3.0.9

c83dfa2

v3.0.9: Bug fixes and future NumPy compatibility

This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.

🔴 Bug fixes

#11331, #11701: Clean up warnings in spaCy and its test suite.
#11845: Don't raise an error in displaCy for unset spans keys.
#11864: Add smart_open requirement and update deprecated options.
#11899: Fix spacy init config --gpu for environments without spacy-transformers.
#11933: Update for compatibility with NumPy v1.24+ integer conversions.
#11935: Restore missing error messages for beam search.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @polm, @svlandeg

16 Dec 07:55

adrianeboyd

v2.3.9

a70b5c1

v2.3.9: Compatibility with NumPy v1.24+

This release addresses future compatibility with NumPy v1.24+.

🔴 Bug fixes

#11940: Update for compatibility with NumPy v1.24+ integer conversions.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @svlandeg

14 Dec 15:06

adrianeboyd

v3.4.4

77833bf

v3.4.4: Bug fixes and future NumPy compatibility

This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.

🔴 Bug fixes

#11845: Don't raise an error in displaCy for unset spans keys.
#11860: Fix spancat for docs with zero suggestions.
#11864: Add smart_open requirement and update deprecated options.
#11899: Fix spacy init config --gpu for environments without spacy-transformers.
#11933: Update for compatibility with NumPy v1.24+ integer conversions.
#11934: Add strings when initializing from labels in EditTreeLemmatizer.
#11935: Restore missing error messages for beam search.

👥 Contributors

@adrianeboyd, @danieldk, @honnibal, @ines, @polm, @svlandeg
