Commit 853fba0

Enforce 25 KB limit for infinite transcription
The current implementation breaks when a new stream is created, even under the 5-minute limit. This is due to missing logic to handle the 25 KB stream-size limit [1]. Updated the 'generator' function to yield data as soon as the API limit is reached.

[1] - GoogleCloudPlatform#12053
1 parent 974b758 commit 853fba0
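
For a rough sense of scale (not part of the commit; assuming the sample's usual 16 kHz, 16-bit LINEAR16 microphone input): each 100 ms chunk is CHUNK_SIZE = 1600 frames, roughly 3,200 bytes, so the 25 * 1024 = 25,600-byte limit is reached after about 8 buffered chunks (~800 ms of audio). Whenever the local queue backs up past that, a single `yield b"".join(data)` could exceed the per-request limit, which is what the change below splits up.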

1 file changed: +19 -1 lines changed


Diff for: speech/microphone/transcribe_streaming_infinite_v2.py

@@ -40,6 +40,8 @@
 STREAMING_LIMIT = 240000  # 4 minutes
 SAMPLE_RATE = 16000
 CHUNK_SIZE = int(SAMPLE_RATE / 10)  # 100ms
+# 25 KB API limit. Increasing this will throw an error
+MAX_STREAMING_CHUNK = 25 * 1024

 RED = "\033[0;31m"
 GREEN = "\033[0;32m"
@@ -213,7 +215,23 @@ def generator(self: object) -> object:
             except queue.Empty:
                 break

-            yield b"".join(data)
+            # Enforce the max streaming chunk size supported by the API
+            combined_size = sum(len(chunk) for chunk in data)
+            if combined_size <= MAX_STREAMING_CHUNK:
+                yield b"".join(data)
+            else:
+                run_chunks = []
+                run_size = 0
+                for chunk in data:
+                    if len(chunk) + run_size > MAX_STREAMING_CHUNK:
+                        yield b"".join(run_chunks)
+                        run_chunks = [chunk]
+                        run_size = len(chunk)
+                    else:
+                        run_chunks.append(chunk)
+                        run_size += len(chunk)
+                if run_chunks:
+                    yield b"".join(run_chunks)


 def listen_print_loop(responses: object, stream: object) -> None:
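
The regrouping loop can be exercised on its own with synthetic byte strings. The sketch below mirrors the logic added in the diff, but it is a hypothetical standalone helper (split_chunks is not part of the sample), shown only to illustrate how buffered chunks are regrouped under the 25 KB cap.

    # Standalone sketch of the regrouping logic above; `split_chunks` is a
    # hypothetical helper for illustration, not part of the sample.
    MAX_STREAMING_CHUNK = 25 * 1024  # 25 KB per streaming request


    def split_chunks(data: list[bytes], limit: int = MAX_STREAMING_CHUNK):
        """Yield joined byte strings, each holding as many whole chunks as fit under `limit`."""
        run_chunks: list[bytes] = []
        run_size = 0
        for chunk in data:
            if run_chunks and run_size + len(chunk) > limit:
                yield b"".join(run_chunks)
                run_chunks, run_size = [], 0
            run_chunks.append(chunk)
            run_size += len(chunk)
        if run_chunks:
            yield b"".join(run_chunks)


    if __name__ == "__main__":
        # Twelve 3,200-byte chunks (~1.2 s of 16 kHz, 16-bit audio) regroup into
        # one 25,600-byte request (8 chunks) followed by a 12,800-byte request (4 chunks).
        chunks = [b"\x00" * 3200 for _ in range(12)]
        print([len(part) for part in split_chunks(chunks)])  # [25600, 12800]

Grouping whole chunks, rather than slicing bytes mid-chunk, keeps each yielded request on the sample's 100 ms chunk boundaries, matching what the diff does.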
