How should LMDB be used in a Flask app #301

Closed
katadonutskuri opened this issue May 22, 2021 · 1 comment

katadonutskuri commented May 22, 2021

Affected Operating Systems

  • Linux

Affected py-lmdb Version

1.2.1

py-lmdb Installation Method

pipenv install ...

Describe Your Problem

Hi, I was looking through the open issues, and it seems LMDB has some kinks with multiprocessing and multithreading.

In a Python Flask app running on Gunicorn with multiple workers, what's the recommended way to use LMDB? According to issue #230, there would be a problem if multiple requests hit the LMDB endpoint at once. Am I correct in this assumption? Let's say I have a class:

class LMDBProvider:
    def __init__(self):
        # set the LMDB directory path
        # open the env, txn, etc.
        pass

Then I store an instance of it in Flask's g object (https://flask.palletsprojects.com/en/2.0.x/appcontext/):

def init_lmdb():
    if 'lmdb' not in g:
        g.lmdb = LMDBProvider()
    return g.lmdb

Would this setup be an issue? I currently have something similar, and very occasionally I get an error from lmdb.open(path, readonly=True, max_dbs=2). The error suggests the path is invalid, even though the same path opened successfully before. It happens when I use multiprocessing, roughly 1 time in 20.

... lmdb.open(
lmdb.InvalidParameterError: mdb_txn_begin: Invalid argument

I haven't confirmed whether it's an LMDB issue or just the way I have it set up in Flask. Any advice would be appreciated. Thanks!

jnwatson (Owner) commented

This looks like you're initializing lmdb correctly. #269 only matters if you're setting the mapsize dynamically. #230 only matters for multi-threading.

It isn't clear to me whether init_lmdb can be run from more than one thread at the same time. If it can, then the if statement should be surrounded by a lock.
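As a rough sketch of what "surrounded by a lock" could look like: the version below keeps the provider in a module-level variable instead of Flask's g, and takes the constructor as a `factory` argument. Both of those are assumptions made to keep the example self-contained; they are not taken from the original code.

```python
import threading

_init_lock = threading.Lock()
_provider = None  # hypothetical module-level cache, one per process


def init_lmdb(factory):
    """Return the shared provider, calling factory() at most once."""
    global _provider
    with _init_lock:  # the lock surrounds the `if` check and the assignment
        if _provider is None:
            _provider = factory()
    return _provider
```

In the app from the question, `factory` would presumably be `LMDBProvider`, so concurrent first requests on different threads would still end up sharing one instance.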

With multiprocessing, if you're using a process pool, the pool reuses worker processes to avoid process-creation overhead. If you don't close the LMDB environment when your task is done, you might end up with the same environment open twice in the same process. This is catastrophic.
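One way to make sure the environment is closed at the end of every pool task is a small wrapper with try/finally. This is only a sketch: `run_task`, `open_env`, and `work` are hypothetical names, and the only thing assumed about the environment object is that it has a `close()` method, which `lmdb.Environment` does.

```python
def run_task(open_env, work):
    """Open an environment, run work(env), and always close the environment."""
    env = open_env()
    try:
        return work(env)
    finally:
        # Runs on success and on error, so a reused pool worker
        # never carries an open environment into its next task.
        env.close()
```

With py-lmdb, `open_env` might be something like `lambda: lmdb.open(path, readonly=True, max_dbs=2)`, where `path` is whatever directory your app uses.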
