We are experiencing a significant performance drop when running Tesseract OCR in a container with the gVisor runtime "runsc".
When Tesseract is run with the standard Docker "runc" runtime, the run time is as follows:
real 0m 3.83s
user 0m 10.35s
sys 0m 0.20s
However, the same Tesseract run with the gVisor "runsc" runtime is about 5x slower:
real 0m 19.60s
user 1m 4.21s
sys 0m 0.46s
We have found that the run time with the "runsc" runtime depends on the parallelism level of OpenMP (a library used by Tesseract), which can be set with the OMP_THREAD_LIMIT environment variable:
OMP_THREAD_LIMIT=1:
real 0m 7.44s
user 0m 6.38s
sys 0m 0.71s
OMP_THREAD_LIMIT=2:
real 0m 6.83s
user 0m 10.56s
sys 0m 0.43s
OMP_THREAD_LIMIT=3:
real 0m 6.86s
user 0m 14.91s
sys 0m 0.38s
OMP_THREAD_LIMIT=4:
real 0m 19.60s
user 1m 3.79s
sys 0m 0.50s
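The per-limit timings above could be collected with a small loop like the following. This is only a sketch: `bench_threads` is a hypothetical helper name, and it assumes a POSIX `sh` such as the one in the Alpine image, where `time` is available as a regular command.

```shell
#!/bin/sh
# Hypothetical helper: run the given command once for each
# OMP_THREAD_LIMIT value from 1 to 4, printing a label before each run.
bench_threads() {
  for n in 1 2 3 4; do
    echo "OMP_THREAD_LIMIT=$n:"
    # The assignment prefix sets the variable only for this one command.
    OMP_THREAD_LIMIT="$n" "$@"
  done
}

# Example (inside the container):
#   bench_threads time tesseract test-0.png - -l eng
```

In shells where `time` is a keyword rather than a command (for example bash), the example call needs an external `time` binary on the PATH.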
It seems that Tesseract runs fastest when OpenMP uses 2 threads. However, the resource utilization is not optimal: the small decrease in run time comes with a significant increase in total CPU time (user time of 6.38s with 1 thread vs 10.56s with 2 threads). The CPU on this machine has 4 CPU threads.
Expected behaviour in this case would be for gVisor to minimally impact the run time and resource utilization of Tesseract OCR.
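To make the utilization comparison concrete, the "runsc" timings can be reduced to two ratios per run: user time divided by real time (roughly how many cores were busy on average) and user time relative to the single-thread run. A minimal sketch, assuming the figures are typed in by hand as seconds; `summarize` is a hypothetical helper, not part of any tool mentioned here.

```shell
#!/bin/sh
# Input columns: OMP_THREAD_LIMIT, real seconds, user seconds.
# The first row is used as the baseline for the CPU-time ratio.
summarize() {
  awk 'NR == 1 { base = $3 }
       { printf "%d threads: %.2f cores busy, %.2fx CPU time of first row\n",
                $1, $3 / $2, $3 / base }'
}

# runsc timings from this report (1m 3.79s written as 63.79 seconds).
summarize <<'EOF'
1 7.44 6.38
2 6.83 10.56
3 6.86 14.91
4 19.60 63.79
EOF
```

With 2 threads, the wall-clock time improves by well under a second while the total user CPU time grows by roughly two thirds.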
Steps to reproduce
Configure docker to utilize "runsc" runtime.
Provision a container with Tesseract OCR installed. Dockerfile:
FROM alpine:latest
RUN apk add --no-cache qpdf tesseract-ocr tesseract-ocr-data-slk tesseract-ocr-data-eng
ARG user=app
ARG group=app
ARG uid=1000
ARG gid=1000
RUN addgroup -g ${gid} ${group}
RUN adduser -G ${group} -u ${uid} -s /bin/sh -h /app ${user} -D
WORKDIR /app
# Switch to user
USER ${uid}:${gid}
ENTRYPOINT ["/bin/sh"]
Execute Tesseract OCR within the container using the following command line:
OMP_THREAD_LIMIT=4 time tesseract test-0.png - -l eng
Note: test-0.png can be any screenshot containing English text.
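The steps above can be sketched as a single helper that runs the same command under either runtime. The image name `tesseract-bench` and the read-only mount of the current directory onto /app are assumptions for illustration, not details from this report; `--runtime` is the standard `docker run` flag for selecting a runtime registered with the daemon.

```shell
#!/bin/sh
# Assumed names: adjust IMAGE to whatever tag the Dockerfile was built with.
DOCKER="${DOCKER:-docker}"          # set DOCKER=echo for a dry run
IMAGE="${IMAGE:-tesseract-bench}"   # hypothetical image tag

# Run the benchmark command under the given runtime ("runc" or "runsc").
# The image's ENTRYPOINT is /bin/sh, so the arguments are passed to sh.
run_with_runtime() {
  runtime="$1"
  "$DOCKER" run --rm --runtime="$runtime" \
    -v "$PWD:/app:ro" \
    "$IMAGE" -c 'OMP_THREAD_LIMIT=4 time tesseract test-0.png - -l eng'
}
```

After building the image (for example `docker build -t tesseract-bench .`) and placing test-0.png in the current directory, `run_with_runtime runc` and `run_with_runtime runsc` give the two timings to compare.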
runsc version
runsc version release-20250319.0
spec: 1.1.0-rc.1
docker version (if using docker)
Client: Docker Engine - Community
Version: 27.5.1
API version: 1.47
Go version: go1.22.11
Git commit: 9f9e405
Built: Wed Jan 22 13:41:48 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.5.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.11
Git commit: 4c9b3b0
Built: Wed Jan 22 13:41:48 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.25
GitCommit: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
runc:
Version: 1.2.4
GitCommit: v1.2.4-0-g6c52b3f
docker-init:
Version: 0.19.0
GitCommit: de40ad0
uname
Linux docker 6.8.0-55-generic #57-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 12 23:42:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
It looks like the KVM platform is behaving as expected, so this is indeed the same issue as #11431. The fact that for KVM OMP_THREAD_LIMIT at 4 is slightly slower than 3 is not necessarily a bug -- the gVisor sentry may be doing things in the background while user threads are executing, which means more than 4 threads will be trying to share 4 CPU cores, which loses efficiency.
Please chime in and help out in #11431, I've gone into more detail there about what I believe the issue to be related to.