Facelandmarker slow on Android apps but fast with TFLite model benchmark tool #5872

Open
arrufat opened this issue Feb 28, 2025 · 0 comments
Labels
gpu MediaPipe GPU related issues platform::android Android Solutions task:face landmarker Issues related to Face Landmarker: Identify facial features for visual effects and avatars. type:performance Execution Time and memory heap, stackoverflow and garbage collection related

arrufat commented Feb 28, 2025

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Android 15

MediaPipe Tasks SDK version

0.10.21

Task name (e.g. Image classification, Gesture recognition etc.)

Face landmark detection

Programming Language and version (e.g. C++, Python, Java)

Kotlin

Describe the actual behavior

The face landmarker model takes about 30-70 ms per frame on a Pixel 9 Pro.

Describe the expected behaviour

I would expect the model to run in real time (on desktop, using Wasm, it takes 15-20 ms).

Standalone code/steps you may have used to try to get what you need

I tried the sample code from https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/face_landmarker/android

Other info / Complete Logs

To check how fast the models can run, I used the TFLite Model Benchmark Tool.
I unzipped face_landmarker.task and benchmarked both the face_detector.tflite and the face_landmarks_detector.tflite models, first on GPU and then on CPU:

~ $ adb shell am start -S \
          -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
          --es args '"--graph=/data/local/tmp/face_detector.tflite \
      --use_gpu=true"'
~ $ adb shell am start -S \
          -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
          --es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
      --use_gpu=true"'

~ $ adb shell am start -S \
          -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
          --es args '"--graph=/data/local/tmp/face_detector.tflite \
      --use_gpu=false"'
~ $ adb shell am start -S \
          -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
          --es args '"--graph=/data/local/tmp/face_landmarks_detector.tflite \
      --use_gpu=false"'

Results:

adb logcat | grep "Inference timings in us"
02-28 23:15:03.904 30233 30233 I tflite  : Inference timings in us: Init: 2770755, First inference: 19349, Warmup (avg): 4108.97, Inference (avg): 3611.28
02-28 23:15:22.704 30277 30277 I tflite  : Inference timings in us: Init: 4607458, First inference: 31031, Warmup (avg): 12077.7, Inference (avg): 11406.

02-28 23:15:43.189 30337 30337 I tflite  : Inference timings in us: Init: 16248, First inference: 10541, Warmup (avg): 11986.1, Inference (avg): 15703.2
02-28 23:15:53.231 30379 30379 I tflite  : Inference timings in us: Init: 86527, First inference: 59826, Warmup (avg): 60634.6, Inference (avg): 72700.4
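
To compare the four runs programmatically, each `Inference timings in us` line can be parsed into its fields. This is a minimal sketch in Kotlin; `BenchmarkTimings` and `parseTimings` are hypothetical names for illustration and are not part of the benchmark tool.

```kotlin
// Hypothetical helper (not part of TFLite or MediaPipe): holds the four
// values printed on one "Inference timings in us" log line, in microseconds.
data class BenchmarkTimings(
    val initUs: Double,
    val firstInferenceUs: Double,
    val warmupAvgUs: Double,
    val inferenceAvgUs: Double,
)

// Matches the part of the line after "Inference timings in us:".
val timingsRegex = Regex(
    """Init: ([\d.]+), First inference: ([\d.]+), Warmup \(avg\): ([\d.]+), Inference \(avg\): ([\d.]+)"""
)

// Returns the parsed timings, or null if the line has a different shape.
fun parseTimings(line: String): BenchmarkTimings? {
    val match = timingsRegex.find(line) ?: return null
    val (init, first, warmup, inference) = match.destructured
    return BenchmarkTimings(
        init.toDouble(),
        first.toDouble(),
        warmup.toDouble(),
        inference.toDouble(),
    )
}
```

Each log line above yields one `BenchmarkTimings`; the `inferenceAvgUs` field is the steady-state number used in the comparison.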

So, on GPU:

  • face detector: 3611.28 µs
  • face landmarker: 11406.9 µs
  • total = 15018.18 µs

And on CPU:

  • face detector: 15703.2 µs
  • face landmarker: 72700.4 µs
  • total = 88403.6 µs
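
The totals can be turned into milliseconds and an upper bound on frame rate. This is a rough sketch that assumes the two models run back to back on every frame with no other per-frame cost; `totalMs` and `maxFps` are hypothetical helper names.

```kotlin
// Average per-inference times in microseconds, copied from the benchmark
// results: face detector first, then face landmarks detector.
val gpuUs = listOf(3611.28, 11406.9)
val cpuUs = listOf(15703.2, 72700.4)

// Sum of the per-model averages, converted from microseconds to milliseconds.
fun totalMs(timingsUs: List<Double>): Double = timingsUs.sum() / 1000.0

// Upper bound on frames per second if inference were the only per-frame cost.
fun maxFps(totalMs: Double): Double = 1000.0 / totalMs
```

Under that assumption, `totalMs(gpuUs)` is about 15 ms (roughly 66 frames per second at best) and `totalMs(cpuUs)` is about 88 ms (roughly 11 frames per second at best).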

In my app, I am initializing like this:

        val baseOptions = BaseOptions.builder()
            .setModelAssetPath("face_landmarker.task")  // path of the model inside the app's assets
            .setDelegate(Delegate.GPU)
            .build()
        val options = FaceLandmarker.FaceLandmarkerOptions.builder()
            .setBaseOptions(baseOptions)
            .setMinFaceDetectionConfidence(0.5f)
            .setMinTrackingConfidence(0.5f)
            .setMinFacePresenceConfidence(0.5f)
            .setNumFaces(1)
            .setOutputFacialTransformationMatrixes(false)
            .setOutputFaceBlendshapes(false)
            .setRunningMode(RunningMode.VIDEO)
            .build()
        val faceLandmarker = FaceLandmarker.createFromOptions(context, options)

And then using it like this:

        val bitmap = image.toBitmap()  // where image is an `ImageProxy` from the camera
        val mpImage = BitmapImageBuilder(bitmap).build()
        val timestampMs = SystemClock.uptimeMillis()
        val result = faceLandmarker.detectForVideo(mpImage, timestampMs)
        val detectTimeMs = SystemClock.uptimeMillis() - timestampMs
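
The benchmark tool reports a warm-up average separately from the steady-state average, while a single `detectTimeMs` sample mixes the two. For a closer comparison, the in-app call could be timed the same way. This is a minimal sketch in plain Kotlin; `LatencyStats` and `measureLatency` are hypothetical names, not MediaPipe APIs.

```kotlin
// Hypothetical helper, not a MediaPipe API: summary of the timed runs.
data class LatencyStats(
    val runs: Int,
    val avgMs: Double,
    val minMs: Double,
    val maxMs: Double,
)

// Calls `block` warmupRuns times without timing it, then times timedRuns
// further calls with System.nanoTime() and summarizes them in milliseconds.
fun measureLatency(warmupRuns: Int, timedRuns: Int, block: () -> Unit): LatencyStats {
    require(warmupRuns >= 0) { "warmupRuns must not be negative" }
    require(timedRuns > 0) { "timedRuns must be positive" }
    repeat(warmupRuns) { block() }
    val samplesMs = DoubleArray(timedRuns) {
        val start = System.nanoTime()
        block()
        (System.nanoTime() - start) / 1_000_000.0
    }
    val sorted = samplesMs.sorted()
    return LatencyStats(timedRuns, samplesMs.average(), sorted.first(), sorted.last())
}
```

In the app, the lambda passed as `block` would wrap the `faceLandmarker.detectForVideo(mpImage, timestampMs)` call with a fresh, increasing timestamp on each invocation. That usage is an untested assumption about how the snippet above would be adapted.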

I would expect to see a bit more than 15 ms because the mpImage is 640×480 and needs to be resized to 192×192 and 256×256 for the detector and the landmarker, respectively.
However, the gap between the TFLite Model Benchmark Tool (15 ms) and the actual app (30-70 ms) seems too large.
Am I initializing the model properly, or is there something I am missing that is hampering performance?

Thanks in advance.

@kuaashish kuaashish added platform::android Android Solutions task:face landmarker Issues related to Face Landmarker: Identify facial features for visual effects and avatars. gpu MediaPipe GPU related issues type:performance Execution Time and memory heap, stackoverflow and garbage collection related labels Mar 3, 2025