ERROR: Aborting and saving the final best model. Encountered exception: MemoryError((4572849, 288), dtype('float32')) #9290
Replies: 8 comments 5 replies
-
It sounds like you just ran out of memory. How much RAM do you have?
-
16266 MB of RAM
-
My training data is about 12.0 MB.
On Sun, Sep 26, 2021 at 09:37, polm ***@***.*** wrote:
… OK. How much training data do you have? Are you sure you didn't just run
out of memory?
If you have more data than you can fit in memory you can use a custom
corpus reader
<https://spacy.io/usage/training#custom-code-readers-batchers> to work
around that.
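For reference, a custom corpus reader along those lines might look roughly like this. This is a minimal sketch, assuming a JSONL file with character-offset entity annotations; the registered name and the record format are illustrative, not from this thread:

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator

import spacy
from spacy.language import Language
from spacy.training import Example


@spacy.registry.readers("corpora.jsonl_reader.v1")  # hypothetical name
def create_jsonl_reader(path: Path) -> Callable[[Language], Iterable[Example]]:
    def read_examples(nlp: Language) -> Iterator[Example]:
        # Stream one record at a time so the whole corpus never has to fit in memory.
        with open(path, encoding="utf8") as f:
            for line in f:
                record = json.loads(line)
                # Assumed record shape: {"text": ..., "entities": [[start, end, label], ...]}
                doc = nlp.make_doc(record["text"])
                ents = [tuple(ent) for ent in record["entities"]]
                yield Example.from_dict(doc, {"entities": ents})

    return read_examples
```

Such a reader could then replace `spacy.Corpus.v1` in the config, e.g. `[corpora.train]` with `@readers = "corpora.jsonl_reader.v1"` and a `path` argument.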
-
I think I have the same problem here.
-
There was a post about using batch sizes to control resource utilization (so you don't run out of memory).
Using this technique I was able to reduce NER GPU memory usage enough to train on my 6 GB GPU. The problem is that reducing batch sizes increases overhead: my GPU run took longer than a CPU run with the default batcher sizes. With 'spancat' it gets even worse, since memory usage goes up with the span sizes. Because I use span sizes 1-14, my runs eat memory like crazy.
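For what it's worth, here is a sketch of that kind of batch-size adjustment in the config; the numbers are illustrative, not a recommendation. Lowering `start` and `stop` in `[training.batcher.size]` reduces peak memory at the cost of more update steps, and `discard_oversize = true` additionally drops documents too large to fit in a single batch:

```
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = true
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 32
stop = 256
compound = 1.001
t = 0.0
```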
-
Okay, here is my training config:
```
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
update_with_oracle_cut_size = 100
[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null
[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"
[components.tok2vec]
factory = "tok2vec"
[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"
[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]
include_static_vectors = false
[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 96
depth = 4
window_size = 1
maxout_pieces = 3
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
```
Also: my training data was based on several big docs and their annotations.
I changed the training data to individual sentences and their annotations and it worked, the model is trained now. But I'm still confused about the error I had at first, because with spaCy 2.2.4 I trained on the same data and didn't have this problem.
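For anyone doing the same kind of split, here is a minimal sketch, assuming the long gold docs are already stored in a DocBin (the file names are made up; entities that cross a sentence boundary would be dropped):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
sentencizer = nlp.add_pipe("sentencizer")

# Assumed input/output paths.
db_in = DocBin().from_disk("train_long.spacy")
db_out = DocBin()

for doc in db_in.get_docs(nlp.vocab):
    sentencizer(doc)               # add sentence boundaries; gold entities stay intact
    for sent in doc.sents:
        db_out.add(sent.as_doc())  # entities fully inside the sentence are carried over

db_out.to_disk("train_sentences.spacy")
```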
-
The size of my new batch in v2 was 1110 KB (1,136,398 bytes).
I used long texts because, as far as I know, text without entities is good for teaching the model that a token is not an entity... Maybe I should add some sentences with no entities to the training data. (This is what I know from the spaCy 2 training data format.)
On Mon, Sep 27, 2021 at 06:55, polm ***@***.*** wrote:
… @taghouti-ghofrane <https://github.com/taghouti-ghofrane> OK, glad you
figured out how to train! A lot has changed between 2.2.4 and 3.1.2, so
it's hard to say exactly why it doesn't work anymore. What was your batch
size in v2? Maybe 1000 is too large if you have very long documents?
I will say that in general very long documents (more than a few thousand
words) do cause problems some times, and we haven't tested them as
thoroughly as shorter paragraph-length documents. A big part of that is
that there's not much benefit to using really long documents as context, so
if there's a natural way to make your documents shorter that may be better
even if you figured out how to train on long documents.
-
Got it, thank you for your time ^^
On Mon, Sep 27, 2021 at 10:11, polm ***@***.*** wrote:
… It is important to have examples without entities, assuming your real data
will have examples without entities (which is usually the case). However
that doesn't really have anything to do with document length - you can have
lots of short documents, some with entities and some without.
The main thing is that 1. your training data should be as much like your
real data as possible 2. for performance reasons, it's easier if you cut
your inputs down to not be larger than paragraph size or so.
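A small sketch of mixing no-entity sentences into the training corpus when building the DocBin (the sentences and the output file name are made up):

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
db = DocBin()

# A sentence with an entity.
doc = nlp.make_doc("Apple is looking at buying a U.K. startup.")
doc.ents = [doc.char_span(0, 5, label="ORG")]
db.add(doc)

# A sentence with no entities at all: an explicit empty list tells the
# model that none of these tokens belong to an entity.
doc = nlp.make_doc("The weather was nice yesterday.")
doc.ents = []
db.add(doc)

db.to_disk("train.spacy")
```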
-
How to reproduce the behaviour
I created training data and was trying to train the model.
An error that I don't know how to fix is:
```
thinc.backends.numpy_ops.NumpyOps.seq2col
File "thinc\backends\numpy_ops.pyx", line 80, in thinc.backends.numpy_ops.NumpyOps.alloc
numpy.core._exceptions.MemoryError: Unable to allocate 4.91 GiB for an array with shape (4572849, 288) and data type float32
⚠ Aborting and saving the final best model. Encountered exception:
MemoryError((4572849, 288), dtype('float32'))
```
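For reference, that single buffer alone is 4,572,849 × 288 float32 values, i.e. roughly 4,572,849 × 288 × 4 bytes ≈ 4.91 GiB, so on a machine with about 16 GB of RAM the allocation can easily fail once the rest of the pipeline's memory use is counted.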
Your Environment