memory leak #32

lancecarlson · 2013-05-07T13:11:41Z

I have not been able to figure out where exactly in the chain I get this memory leak, but whenever I pipe request's JSONStream from couchdb _all_docs (about 85k docs), it eats up my memory like crazy. Any one else experience this? From my limited experience creating custom stream objects, the memory should stay fairly consistent as it is parsing I think?

Example code:

var down = JSONStream.parse(["rows", true, "doc"])
request.pipe(down)

The text was updated successfully, but these errors were encountered:

dominictarr · 2013-05-07T15:33:30Z

cool, I was doing this with npm data set, which is only 30k, and at first it didn't work then I fixed it so that it did.

https://github.com/dominictarr/JSONStream/blob/master/index.js#L76

made it work, but to be honest - I was only guessing.
maybe you can make a similar hack that solves your case and still passes the tests?

lancecarlson · 2013-05-07T17:08:57Z

I'm currently running 0.6.4 which includes that line, but I'm still having serious memory leak issues.

I did run an interesting experiment though. I think the biggest culprit is the underlying jsonparse library. When running the same data set against a custom streamer plus the parser, I observed similar memory leak behavior:

request = require('request'),
Stream = require('stream').Stream,
Parser = require('jsonparse');

var p = new Parser();
p.onValue = function(value) {
  console.log(value)
}

var down = new Stream()
down.writable = true
down.write = function(data) {
  p.write(data)
  return true
}
down.end = function() {
  console.log('end')
}

var host = process.env.DB_HOST
var path = '_all_docs?include_docs=true'
var url = host + '/' + dbName + '/' + path
request(url).pipe(down)

Memory leak steadily increases from 50 to 260MB's, then midway through it jumps to 500-600MB's. I feel like the json parser queues up data for something then does something else with the queued data afterwards.

I'm going to post a ticket with that project, but do you have any ideas why this might be happening?

dominictarr · 2013-05-07T17:16:31Z

well, that is by design actually! it's json-parse is trying to collect the whole object!

we usually want to throw out the other stuff...
I don't really have time to work on this, perhaps we can run a https://www.bountysource.com/ bounty on this problem?

lancecarlson · 2013-05-07T17:21:57Z

Do you know why it was implemented this way?

Bountysource.com looks really awesome! Browsing it now.. :)

dominictarr · 2013-05-07T18:12:30Z

hmm, I guess that just seemed like the right idea at the time...

lancecarlson · 2013-05-07T18:14:26Z

BTW, I'm looking at the Sax parser gist referenced on the jsonparse project README. From my initial testing, it has much better performance characteristics. Do you think it would be possible to port JSONStream to using the sax parser instead?

https://gist.github.com/creationix/1821394

dominictarr · 2013-05-07T18:19:55Z

hmm, you'd just have to match the paths... noting when you go into or come out of an object, and whether that is into a path your are interested in... (and only collecting a whole object when it matches the end of your path!)

seems reasonable - If that had been exposed at the time I probably would have done it like that...

Will be happy to merge if you where to undertake this rewrite!

lancecarlson · 2013-05-07T18:25:32Z

Looking into this right now. Will keep you posted!

lancecarlson · 2013-05-07T19:07:12Z

BTW, do you think it would be useful to have another fixture that has maybe 20k objects or more to more easily expose big data memory leaks?

dominictarr · 2013-05-07T20:06:12Z

absolutely, but it's best to generate it, rather than check it in though.
also checkout this module : https://npm.im/memwatch

this script generates a large test object:

https://github.com/dominictarr/JSONStream/blob/master/test/gen.js

nickpoorman · 2013-08-21T17:16:40Z

I just updated all the dependencies and devDependencies in package.json and ran the gen.js test at 5 million and it passed without running out of memory.

alessioalex · 2013-11-30T17:34:09Z

@nickpoorman would you mind making a pull request with the updated dependencies? thanks

dominictarr · 2014-08-22T21:14:26Z

GOOD NEWS.

I have a fix to the memory leak on the memory-leak branch. To test,
download all npm docs (currently a 800mb json object)

curl 'http://isaacs.iriscouch.com/registry/_all_docs?include_docs=true' > npm.json

then, if you pipe that to JSONStream, and filter out a deep path (this part is important)

cd JSONStream
./index.js rows.*.doc.versions.*._npmUser < npm.json

and then use top to monitor memory usage...
on the master branch, this used 30% of memory and then eventually became very sluggish (GC) and then crashed.
on the memory-leak branch this sits at 1% of memory and is fast!

The only problem is that the tests for @PaulMougel 's #33
break, so I we need to get that working before I publish this.

felipesabino · 2014-12-16T22:39:25Z

Is this still an issue with JSONStream?

dominictarr · 2014-12-17T20:32:03Z

there is still a little bit of work to do here, as described in my last comment.

PaulMougel · 2014-12-18T08:32:19Z

Sorry about that, let me look into it.

PaulMougel · 2014-12-18T09:00:21Z

It's been a long time since I looked at this chunk of code; I don't quite remember why j would be greater than the stack's length.

This dumb diff

diff --git a/index.js b/index.js
index 6ecd87b..9d1bd5c 100755
--- a/index.js
+++ b/index.js
@@ -68,7 +68,8 @@ exports.parse = function (path, map) {
           if (!c) return
           if (check(nextKey, c.key)) {
             i++;
-            this.stack[j].value = null
+            if (j < this.stack.length)
+              this.stack[j].value = null
             break
           }
           j++

makes the tests on doubledot1.js and doubledot2.js pass, but multiple_objects.js fail anyways:

*** test/multiple_objects.js ***
END

equal: 0 == 3

@dominictarr any idea?

davidrapin · 2015-02-10T08:07:55Z

I just had this issue, any workarounds more than welcome!

davidrapin · 2015-02-10T09:38:55Z

Since this issue is quite old, still not fixed, and caused by a badly-documented and buggy dependency (creationix/jsonparse#8), I would recommend to use a more tested library, like https://www.npmjs.com/package/clarinet or https://www.npmjs.com/package/stream-json

dominictarr · 2015-02-10T21:04:04Z

Okay, so I have decided that the memory leak is more important than some of the little use-cases (multiple objects and decent operator) people have contributed. In the interest of parsing large objects I have merged my memory leak fixing changes, but disabled the tests for the features #33 and #19, and bumped the major version. Of course I would be happy to merge a pull request that brings these features back.

dominictarr · 2015-02-10T21:05:50Z

Oh never mind... I can't publish a new npm version because of npm/npm#7260
Apparently this should be fixed in the next version of npm though.

santigimeno · 2015-02-11T12:40:39Z

@dominictarr I've been investigating why multiple_objects.js test was failing in the memory-leak branch and the problem is not with the functionality itself, that it is still working, but with the root event. After the changes the root event is not emitted anymore in most of cases. For example if we add a root listener to the parser in test/test.js, running it in master the event is emitted but not in the memory-leak branch.
I can for the moment provide a patch that removes the checking of the number of times the root event has been called, so multiple_objects.js test passes again.

dominictarr · 2015-02-11T20:55:11Z

@santigimeno great work. I think we should drop features (like root object) if we need to solve the memory problem. People who need those features can just use JSONStream@0

santigimeno · 2015-02-12T10:22:14Z

@dominictarr agreed

- After releasing v1.0.0 this functionality is not working correctly anymore. - As commented in dominictarr#32 (comment) people needing this feature should stick to [email protected] versions.

Macil · 2016-09-16T02:02:00Z

I tried to start using JSONStream in an application where I'm passing ~100mb of json around, but I would always run into out of memory crashes. I've switched away from JSONStream by changing the json to be newline-separated and used readline and JSON.parse on each line and haven't had any more issues.

Wait for writes to finish before doing more writes.

vvo · 2016-12-17T12:55:38Z

Hey @dominictarr any news on this? I would love to be able to keep using JSONStream but it's unfortunately not feasible because of the memory leak. It seems like everything is almost in place to be able to have a better memory footprint? Or should we all move to https://github.com/uhop/stream-json?

dominictarr · 2016-12-18T13:10:45Z

@vvo this is fixed as well as it can be. it depends on the path being used, it's not really a memory leak if you ask it to parse literally everything, so there are still certain ways to run out of memory with JSONStream, but it's not JSONStream's fault.

JSONStream doesn't even parse anything, it just wraps a nicer api around json-parse. There have been other streaming parsers implemented since then - as linked above, but I don't actually know wether they have better memory/cpu performance.

What I would love to see is to make JSONStream take pluggable back ends (json-parse, clarinet, stream-json would be options) then we could actually do a side-by-side comparison, and choose the best one.

Would very gladly take a PR for this!

vvo · 2016-12-19T09:46:25Z

@dominictarr I was completely doing the "wrong" thing on my end. Basically I have an http request that I pipe into JSONStream, then for every object I would queue it to be synced to a backend. But the backend is slower than the http request so I quickly run out of memory.

I guess I would have to wrap my mind around how to handle this difference of processing power using streams.

Anyway, thanks a lot for this module, very efficient!

dominictarr · 2016-12-19T16:15:06Z

@vvo sounds like you need some backpressure! this is what streams are all about! you just need to pause the stream when there are too many items in the processing queue. I usually use https://github.com/pull-stream/pull-paramap for something like that (although sorry to drop a completely different stream api on you, I've switched to using pull-streams because they make things like back pressure somewhat easier to reason about). I'm sure you can find a node-stream version if you dig around a bit.

jcrugzz · 2016-12-19T19:38:18Z

@vvo https://github.com/mafintosh/parallel-transform should work for your use case. You just have to be wary of what the highWaterMark is set to but I believe the default for object streams is sane these days.

hayes · 2017-01-10T04:16:52Z

This may be relevant here, its not the same leak, but there does seem to be an excessive memory use bug in jsonparse that might be affecting some people reading this thread. see creationix/jsonparse#31

lancecarlson mentioned this issue May 7, 2013

Memory leak creationix/jsonparse#8

Open

santigimeno mentioned this issue Mar 9, 2015

Remove root event #54

Merged

Macil added a commit to Macil/browserify-hmr that referenced this issue Sep 16, 2016

Drop JSONStream because of memory leak issue: dominictarr/JSONStream#32

6a4a4bf

Wait for writes to finish before doing more writes.

This was referenced Jan 7, 2021

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#15

Open

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#16

Open

snyk-bot mentioned this issue May 5, 2021

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#18

Open

snyk-bot mentioned this issue May 27, 2021

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#19

Open

snyk-bot mentioned this issue Sep 3, 2021

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#20

Open

joseroubert08 mentioned this issue Nov 15, 2022

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#23

Open

snyk-bot mentioned this issue Nov 22, 2022

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#24

Open

joseroubert08 mentioned this issue Jan 10, 2023

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#25

Open

joseroubert08 mentioned this issue Nov 28, 2023

[Snyk] Security upgrade browserify-hmr from 0.2.2 to 0.3.6 joseroubert08/browserify-handbook#27

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memory leak #32

memory leak #32

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

nickpoorman commented Aug 21, 2013

alessioalex commented Nov 30, 2013

dominictarr commented Aug 22, 2014

felipesabino commented Dec 16, 2014

dominictarr commented Dec 17, 2014

PaulMougel commented Dec 18, 2014

PaulMougel commented Dec 18, 2014

davidrapin commented Feb 10, 2015

davidrapin commented Feb 10, 2015

dominictarr commented Feb 10, 2015

dominictarr commented Feb 10, 2015

santigimeno commented Feb 11, 2015

dominictarr commented Feb 11, 2015

santigimeno commented Feb 12, 2015

Macil commented Sep 16, 2016

vvo commented Dec 17, 2016

dominictarr commented Dec 18, 2016

vvo commented Dec 19, 2016

dominictarr commented Dec 19, 2016

jcrugzz commented Dec 19, 2016 •

edited

Loading

hayes commented Jan 10, 2017

memory leak #32

memory leak #32

Comments

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

lancecarlson commented May 7, 2013

lancecarlson commented May 7, 2013

dominictarr commented May 7, 2013

nickpoorman commented Aug 21, 2013

alessioalex commented Nov 30, 2013

dominictarr commented Aug 22, 2014

felipesabino commented Dec 16, 2014

dominictarr commented Dec 17, 2014

PaulMougel commented Dec 18, 2014

PaulMougel commented Dec 18, 2014

davidrapin commented Feb 10, 2015

davidrapin commented Feb 10, 2015

dominictarr commented Feb 10, 2015

dominictarr commented Feb 10, 2015

santigimeno commented Feb 11, 2015

dominictarr commented Feb 11, 2015

santigimeno commented Feb 12, 2015

Macil commented Sep 16, 2016

vvo commented Dec 17, 2016

dominictarr commented Dec 18, 2016

vvo commented Dec 19, 2016

dominictarr commented Dec 19, 2016

jcrugzz commented Dec 19, 2016 • edited Loading

hayes commented Jan 10, 2017

jcrugzz commented Dec 19, 2016 •

edited

Loading