Multithreading issue with vcrpy - Inconsistent recording of requests #849

stanBienaives · 2024-07-02T14:59:36Z

Description:

I am encountering an issue with vcrpy when using it in a multithreaded context. The cassette file does not record all the requests when using ThreadPoolExecutor with more than one worker. The issue is not present when using single-threaded execution.

Test Code:

import vcr  # type: ignore
from concurrent.futures import ThreadPoolExecutor
import requests

vcr = vcr.VCR(record_mode='once', filter_headers=['authorization'], ignore_hosts=['api.smith.langchain.com'])

def test_multithreading():
    with vcr.use_cassette('./test_multithread.yaml', match_on=['method', 'scheme', 'host', 'port', 'path', 'query', 'body']):
        def get_google(num):
            return requests.get(f"https://www.google.com?q={num}")
        with ThreadPoolExecutor(max_workers=2) as executor:
            responses = executor.map(get_google, range(2))

    with open('./test_multithread.yaml', 'r') as f:
        data = f.read()
        assert data.count('https://www.google.com') == 2

def test_singlethreading():
    with vcr.use_cassette('./test_singlethread.yaml', match_on=['method', 'scheme', 'host', 'port', 'path', 'query', 'body']):
        def get_google(num):
            return requests.get(f"https://www.google.com?q={num}")
        with ThreadPoolExecutor(max_workers=1) as executor:
            responses = executor.map(get_google, range(2))

    with open('./test_singlethread.yaml', 'r') as f:
        data = f.read()
        assert data.count('https://www.google.com') == 2

Observed Behavior:

test_singlethreading passes as expected, with the cassette recording both requests.
test_multithreading fails because the cassette file does not contain both requests, so the count assertion fails.

Environment:

Python 3.12.3
vcrpy==6.0.1
requests==2.31.0

Steps to Reproduce:

Run the provided test code.
Observe that test_multithreading fails while test_singlethreading passes.

Expected Behavior:

Both test_multithreading and test_singlethreading should pass, with the cassette files correctly recording all the requests made during the tests.


stanBienaives · 2024-07-02T16:47:43Z

Interestingly enough, adding a delay makes the test pass:

import time

def test_multithreading_record_with_delay():
    with vcr.use_cassette('./test_multithread_with_delay.yaml', match_on=['method', 'scheme', 'host', 'port', 'path', 'query', 'body']):
        def get_google(num):
            # Stagger the workers (0 s and 1 s) so the two requests do not overlap.
            time.sleep(num)
            return requests.get(f"https://www.google.com?q={num}")
        with ThreadPoolExecutor(max_workers=2) as executor:
            responses = executor.map(get_google, range(2))

    with open('./test_multithread_with_delay.yaml', 'r') as f:
        data = f.read()
        assert data.count('https://www.google.com') == 2

simon-weber · 2024-07-16T21:00:49Z

I suspect this has to do with race conditions involving force_reset. For example, vcr's getresponse will unpatch, call the underlying getresponse, then repatch. I'm guessing what's happening is a timeline like:

thread A tries getresponse
thread A needs to pass through the request and unpatches
thread B tries getresponse. the patches aren't currently applied, so the request goes through unnoticed.
thread A repatches

Afterwards, you're left with a recorded request from thread A and a missing one from thread B.
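
The timeline above can be reproduced with a toy model. This is only a sketch of the suspected mechanism, not vcrpy's actual code: `Transport`, `recording_send`, and the two events are made-up names, and the events exist only to force thread B to run while thread A has the patch removed.

```python
import threading

class Transport:
    """Hypothetical stand-in for a patched connection class (not a vcrpy type)."""
    def send(self, name):
        return f"real response for {name}"

def run_race():
    recorded = []                    # what the toy "cassette" captured
    real_send = Transport.send       # the original, unpatched method
    a_unpatched = threading.Event()  # set once thread A has removed the patch
    b_done = threading.Event()       # set once thread B has finished its call

    def recording_send(self, name):
        recorded.append(name)
        Transport.send = real_send   # unpatch: global, visible to every thread
        a_unpatched.set()
        b_done.wait(timeout=5)       # hold the window open so B runs inside it
        try:
            return real_send(self, name)
        finally:
            Transport.send = recording_send  # repatch

    def thread_b():
        a_unpatched.wait(timeout=5)
        Transport().send("B")        # patch is gone right now, so nothing is recorded
        b_done.set()

    Transport.send = recording_send
    try:
        threads = [
            threading.Thread(target=lambda: Transport().send("A")),
            threading.Thread(target=thread_b),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        Transport.send = real_send   # leave the class clean afterwards
    return recorded

print(run_race())
```

Only thread A's call ends up in `recorded`, which matches the missing request in the cassette.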

kevin1024 · 2024-07-16T23:48:31Z

@simon-weber I think that’s exactly what’s happening.

I guess we could put a mutex around the unpatch, but you would lose some of the benefit of multithreading to begin with.

simon-weber · 2024-07-17T19:02:29Z

I'm not sure there's an easy locking fix inside vcrpy. Locking in force_reset doesn't work since it's not called if patches aren't applied.
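
To illustrate why a lock inside the patched code path may not help, here is a toy sketch (again, not vcrpy's code; `Transport`, `reset_lock`, and the events are invented for the example). The lock is taken only by callers that reach the patched method, and thread B never does.

```python
import threading

class Transport:
    """Hypothetical stand-in for a patched connection class (not a vcrpy type)."""
    def send(self, name):
        return f"real response for {name}"

def run_with_lock():
    recorded = []                 # calls the toy recorder saw
    lock_entries = []             # calls that actually acquired the lock
    real_send = Transport.send
    reset_lock = threading.Lock()
    unpatched = threading.Event()
    b_done = threading.Event()

    def recording_send(self, name):
        with reset_lock:          # only reached while the patch is installed
            lock_entries.append(name)
            recorded.append(name)
            Transport.send = real_send       # unpatch
            unpatched.set()
            b_done.wait(timeout=5)           # force B to run inside the window
            try:
                return real_send(self, name)
            finally:
                Transport.send = recording_send  # repatch

    def thread_b():
        unpatched.wait(timeout=5)
        Transport().send("B")     # resolves to real_send; never touches reset_lock
        b_done.set()

    Transport.send = recording_send
    try:
        threads = [
            threading.Thread(target=lambda: Transport().send("A")),
            threading.Thread(target=thread_b),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        Transport.send = real_send
    return recorded, lock_entries

print(run_with_lock())
```

Thread B's request still goes unrecorded, and it never contends for the lock at all.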

What if the patches were left in place but dynamically disabled with a thread-local? wrapt has a way to do this and seems like it could pretty easily replace mock.patch.object.
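
A rough sketch of the thread-local idea, using only the standard library rather than wrapt, so it should be read as an illustration of the approach and not as a proposed patch. `Transport`, `state.bypass`, and the events are hypothetical names; the `name == "A"` branch is test scaffolding that forces thread B to run while thread A is mid pass-through.

```python
import threading

class Transport:
    """Hypothetical stand-in for a patched connection class (not a vcrpy type)."""
    def send(self, name):
        return f"real response for {name}"

def run_with_threadlocal():
    recorded = []
    real_send = Transport.send
    state = threading.local()       # per-thread "bypass the recorder" flag
    a_inside = threading.Event()
    b_done = threading.Event()

    def recording_send(self, name):
        if getattr(state, "bypass", False):
            return real_send(self, name)   # this thread is mid pass-through
        recorded.append(name)
        state.bypass = True                # disable the patch for this thread only
        try:
            if name == "A":                # scaffolding: let B run right now
                a_inside.set()
                b_done.wait(timeout=5)
            # Re-enter through the still-installed patch, as a real pass-through would.
            return Transport.send(self, name)
        finally:
            state.bypass = False

    def thread_b():
        a_inside.wait(timeout=5)
        Transport().send("B")       # patch is still installed, so B is recorded
        b_done.set()

    Transport.send = recording_send  # installed once, never removed mid-request
    try:
        threads = [
            threading.Thread(target=lambda: Transport().send("A")),
            threading.Thread(target=thread_b),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        Transport.send = real_send
    return recorded

print(run_with_threadlocal())
```

Because the patch is never removed globally, both calls are seen by the recorder even though they overlap, and no lock serializes the requests.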

kevin1024 · 2024-07-17T23:18:49Z

Oh good point. Yes, I think that strategy could work.

ddorian added a commit to ddorian/vcrpy that referenced this issue Nov 13, 2024

Add mutex lock for multi-threading environments kevin1024#849

05f3fa2
