Output values are not changing for different inputs #64

Open
vmelentev opened this issue May 8, 2024 · 13 comments

Comments

@vmelentev

vmelentev commented May 8, 2024

Hi, I am using a MoveNet model from tfhub.dev with a VisionCamera FrameProcessor to apply human pose estimation to a person. It doesn't appear to be tracking my movements, as the outputs in the console are always the same. This is the case with every model I try to use.

Here is the link to the model

Here is the code I am using to resize the frame:

// Assumed module-level cache key (not shown in the original snippet)
const CACHE_ID = '_resizeCache';

function getArrayFromCache(size) {
  'worklet'
  // Reuse one Uint8Array across frames instead of allocating a new one per frame
  if (global[CACHE_ID] == null || global[CACHE_ID].length !== size) {
    global[CACHE_ID] = new Uint8Array(size);
  }
  return global[CACHE_ID];
}

function resize(frame, width, height) {
  'worklet'
  const inputWidth = frame.width;
  const inputHeight = frame.height;
  const arrayData = frame.toArrayBuffer();

  const outputSize = width * height * 3; // 3 for RGB
  const outputFrame = getArrayFromCache(outputSize);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Find closest pixel from the source image
      const srcX = Math.floor((x / width) * inputWidth);
      const srcY = Math.floor((y / height) * inputHeight);

      // Compute the source and destination index
      const srcIndex = (srcY * inputWidth + srcX) * 4; // 4 for BGRA
      const destIndex = (y * width + x) * 3;           // 3 for RGB

      // Convert from BGRA to RGB
      outputFrame[destIndex] = arrayData[srcIndex + 2];     // R
      outputFrame[destIndex + 1] = arrayData[srcIndex + 1]; // G
      outputFrame[destIndex + 2] = arrayData[srcIndex];     // B
    }
  }

  return outputFrame;
}

Here is my frame processor function:

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  if (model == null) return

  const newFrame = resize(frame, 192, 192)

  const outputs = model.runSync([newFrame])
  const output = outputs[0]
  console.log(output[1])
}, [model])

Here is the output in the console:


 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904
 LOG  0.46377456188201904

For each frame the camera sees the result is always the same.

Does anyone know how to resolve this issue?

Thank you

@mrousavy
Owner

mrousavy commented May 8, 2024

Please format your code properly.

@willadamskeane

I had a similar issue - in my case, the input size didn't match what the model was expecting. I'd also check that the model accepts uint8 input.
You can verify on https://netron.app

@vmelentev
Author

> I had a similar issue - in my case, the input size didn't match what the model was expecting. I'd also check that the model accepts uint8 input. You can verify on https://netron.app

Hi, the frame input size and type (uint8) are correct. If they weren't, I wouldn't get the console outputs above; I would get errors such as 'Invalid input size/type'.

My issue is that the output does not change regardless of the input. If I understand correctly, this model is meant to detect different features of the human body (nose, eyes, elbows, knees, etc.) and output values based on where they appear on the screen, which doesn't appear to be happening, as the output values are always the same.
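As a point of reference, assuming the single-pose (Lightning/Thunder) MoveNet variant, whose output tensor is [1, 1, 17, 3] with each keypoint encoded as normalized y, x and a confidence score, the flat array returned by runSync can be unpacked roughly like this. This is an illustrative sketch, not code from this thread:

// Sketch: unpack a MoveNet single-pose output (assumes 17 keypoints * 3 values = 51 floats)
const KEYPOINT_NAMES = [
  'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
  'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
  'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
  'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
];

function parseKeypoints(output) {
  'worklet'
  const keypoints = [];
  for (let i = 0; i < 17; i++) {
    keypoints.push({
      name: KEYPOINT_NAMES[i],
      y: output[i * 3],         // normalized 0..1
      x: output[i * 3 + 1],     // normalized 0..1
      score: output[i * 3 + 2], // confidence
    });
  }
  return keypoints;
}

If the camera sees a moving person, the y/x values of these keypoints should change from frame to frame; constant values suggest the input buffer itself is not changing.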

@mrousavy
Owner

Does your newFrame contain new data each time?

@Silvan-M

Hi! I seem to have the same problem. The resized image does change, but the output of the TFLite model does not.
I get the same behaviour when running the /example in this repo, with the following output:

 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 LOG  Running inference on 640 x 480 yuv Frame
 LOG  Result: 25
 ...

@mrousavy
Owner

Well, if the resized image changes but the output values don't, then it might be an issue with your TFLite model? I'm not sure this is an issue in this library...

@Silvan-M

OK, I can confirm it was an issue with the input size, as @willadamskeane suggested. For some reason, it does not output an error on a wrong input size (e.g. 151x150 instead of 150x150 px using the vision-camera-resize-plugin); see the sanity-check sketch below.

If this is considered expected behaviour, the issue can be closed from my end.
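A minimal sketch of such a sanity check, assuming the loaded model exposes its input metadata via model.inputs[0].shape (as described in the library's README); the helper name is illustrative:

// Sketch: warn when the resized buffer doesn't match the model's expected input size
function checkInputSize(model, resized) {
  'worklet'
  const expected = model.inputs[0].shape.reduce((a, b) => a * b, 1);
  if (resized.length !== expected) {
    console.log('Input size mismatch: got ' + resized.length + ' elements, expected ' + expected);
  }
}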

@s54BU

s54BU commented May 28, 2024

Hi all, after some experimentation it appears that my code for resizing the frame does not work properly and does not put the frame into the correct format, yet for some reason it wasn't throwing an error. I have resolved this issue by switching to the vision-camera-resize-plugin which @Silvan-M suggested, and it now works. Thank you for your help.
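For anyone hitting the same thing, a minimal sketch of that approach, assuming a single-pose MoveNet model with a 192x192 uint8 RGB input; the asset path is hypothetical, and the resize() options follow the same object used later in this thread:

import { useTensorflowModel } from 'react-native-fast-tflite';
import { useFrameProcessor } from 'react-native-vision-camera';
import { useResizePlugin } from 'vision-camera-resize-plugin';

// Inside a React component:
const plugin = useTensorflowModel(require('./assets/movenet_singlepose.tflite')); // hypothetical path
const model = plugin.state === 'loaded' ? plugin.model : undefined;
const { resize } = useResizePlugin();

const frameProcessor = useFrameProcessor((frame) => {
  'worklet'
  if (model == null) return;

  // Resize and convert the camera frame to the model's 192x192 uint8 RGB input
  const input = resize(frame, {
    scale: { width: 192, height: 192 },
    pixelFormat: 'rgb',
    dataType: 'uint8',
  });

  const outputs = model.runSync([input]);
  console.log(outputs[0][1]); // for MoveNet single-pose this would be the nose keypoint's x coordinate
}, [model]);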

@JEF1056

JEF1056 commented Jun 10, 2024

@Silvan-M @s54BU Do either of you mind sharing your working code? I'm encountering the same behavior where the frame is updating but the results aren't. I've been using this model, which should be the same as yours, and I'm already using vision-camera-resize-plugin.
My code is more or less as follows:

// ... imagine some model loading code here, poseModel is set in state somewhere
const poseModel = await loadTensorflowModel(
  require("../assets/movenet_multipose.tflite")
);

const maxPixels = 512;
// The longer side of the frame is resized to maxPixels, while maintaining the aspect ratio of the original frame.
let width, height;

if (frame.width > frame.height) {
  width = maxPixels;
  height = (frame.height / frame.width) * maxPixels;
} else {
  height = maxPixels;
  width = (frame.width / frame.height) * maxPixels;
}

// Resize the frame for more efficient inference
const resized = resize(frame, {
  scale: {
    width: width,
    height: height
  },
  pixelFormat: "rgb",
  dataType: "uint8"
});

const inference = poseModel.runSync([resized]);

According to the model page, the expected input shape is [1, height, width, 3], but the data output by resize() and pushed into poseModel.runSync() is effectively [1, height * width * 3].

@Silvan-M

Silvan-M commented Jun 10, 2024

@JEF1056, sure, no problem! But I basically just took the /example from this repo and changed it so that it runs as a standalone application (importing the library instead of living inside the repo).

Also, I don't use MoveNet; I used EfficientDet. Looking at MoveNet MultiPose, I find it interesting that it doesn't require a specific input size (only a multiple of 32), and when I put it into Netron I get 1x1x1x3 as the input shape (see the screenshot below). I'm not sure how this works; maybe someone else here has an idea whether it works with this plugin.

You mentioned that the plugin returns [1, height * width * 3]; that shouldn't be a problem, since the same is true for the example, whose model expects [1, 320, 320, 3], and it seems to work well.

Your code looks good; however, I can see one problem: the model requires a width and height that are multiples of 32. In your code this only holds for the longer side of the image, and the other side is likely not a multiple of 32, so make sure that both width and height are multiples of 32 (see the sketch below).
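A small sketch of one way to do that, keeping the maxPixels idea from the code above and snapping both sides down to the nearest multiple of 32 (variable names are illustrative):

const maxPixels = 512;

// Scale the longer side towards maxPixels, then snap both sides to multiples of 32
const scaleFactor = maxPixels / Math.max(frame.width, frame.height);
const toMultipleOf32 = (v) => Math.max(32, Math.floor(v / 32) * 32);

const width = toMultipleOf32(frame.width * scaleFactor);
const height = toMultipleOf32(frame.height * scaleFactor);

const resized = resize(frame, {
  scale: { width: width, height: height },
  pixelFormat: 'rgb',
  dataType: 'uint8',
});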

My example (adapted from /example): example-tflite.zip

[Screenshot: Netron showing the MoveNet MultiPose input tensor with shape 1x1x1x3]

@JEF1056

JEF1056 commented Jun 10, 2024

After taking a look, it seems that the 1x1x1x3 refers to a dynamic shape in TensorFlow Lite (i.e. 1 x null x null x 3), which would require resizing the input tensor first. It seems there was an issue thread here about adding support for this kind of behavior in this library, but no specific API was made available for it.

Unfortunately, I don't have the C++ / native code experience to write a cohesive API around TfLiteInterpreterResizeInputTensor specifically for this library. I might take a shot at writing some kind of wrapper with the patch in that PR; any thoughts, @mrousavy? (I'd be happy to sponsor you to get a small change for this in.)

@mrousavy
Owner

Hey - yeah, I can add automatic tensor resizing if you tell me when that method needs to be called. Should be like 4-8 hours of effort max.

@JEF1056

JEF1056 commented Jun 11, 2024

Thanks for the quick response! It needs to be called after model loading but before memory allocation and model inference, so likely just before this line: https://github.com/mrousavy/react-native-fast-tflite/blob/main/cpp/TensorflowPlugin.cpp#L171

The best way to expose it as an API would probably be

loadTensorflowModel(source: ModelSource, delegate?: TensorflowModelDelegate, inputShape?: number[])

where, if inputShape is undefined, no resize is performed at all.

Alternatively, the entire memory allocation could be done after the inputBuffer is created, like just before this line:
https://github.com/mrousavy/react-native-fast-tflite/blob/main/cpp/TensorflowPlugin.cpp#L236
Though I'm not exactly sure how you would infer the input size from a buffer directly (since an image has 3 dimensions).
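Purely as a usage sketch of that proposed (not yet implemented) third parameter, reusing the loading code from earlier in this thread; the shape value is illustrative:

// Hypothetical: inputShape is the proposed optional parameter, not a shipped API
const poseModel = await loadTensorflowModel(
  require('../assets/movenet_multipose.tflite'),
  undefined,         // delegate: fall back to the default (CPU)
  [1, 256, 256, 3]   // resize the dynamic input tensor before tensors are allocated
);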
