Add florence 2 #560
Conversation
…into add-florence-2
inference/core/workflows/core_steps/models/foundation/florence_2.py
@@ -41,8 +41,10 @@ def serialise_sv_detections(detections: sv.Detections) -> dict:
    detection_dict[X_KEY] = x1 + detection_dict[WIDTH_KEY] / 2
The nature of this change makes me worried about the integrity of the whole thing after the change.
If you don't have confidence, then the output is not really sv.Detections; the same goes for the class name being an empty string. Those problems must be addressed at the block level.
ok, just realised that there is this from_lmm(...) constructor for sv.Detections, but still - objects produced by the block will be incompatible with other blocks due to the optional properties
The issue is that different vision tasks carry different information. I don't really see how this is different from an sv.Detections for bounding boxes having a box vs. a detection for classification or for segmentation. We already support optional properties in supervision, and multimodal models are bound to produce these incomplete detections, so I don't really see a way around it. I think all blocks should assume incompleteness, and we may consider a separate method for validating supervision detections for necessary info on the receiving end rather than the generation end.
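The "validate on the receiving end" idea above can be sketched as follows. This is a hypothetical illustration, not the block's actual code: the function name and the field names (modelled on common sv.Detections data keys) are assumptions.

```python
# Hypothetical sketch: a downstream block declares which detection fields it
# needs and rejects incomplete payloads with a clear message, instead of
# failing mid-pipeline. Field names are assumptions for illustration.

def validate_detections(detections: dict, required_fields: set) -> list:
    """Return a sorted list of required fields missing from a detections payload."""
    present = {key for key, value in detections.items() if value is not None}
    return sorted(required_fields - present)

# A grounding task may yield boxes without class ids or confidence:
florence_output = {"xyxy": [[10, 20, 110, 220]], "class_id": None, "confidence": None}

missing = validate_detections(florence_output, {"xyxy", "class_id"})
# missing == ["class_id"]; the consuming block can now raise a descriptive error.
```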
]


class BlockManifest(WorkflowBlockManifest):
This must be adjusted to be aligned with main - we got rid of asyncio and introduced versioning patterns. Please apply them (take a look at other blocks; if in doubt, I can help).
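For readers unfamiliar with the versioning pattern mentioned above, block type identifiers carry a version suffix. The exact identifier for this block is an assumption; the sketch only illustrates the naming convention.

```python
# Sketch of the versioned block-type naming convention
# ("<namespace>/<block_name>@v<version>"). The identifier used below is a
# hypothetical example, not necessarily the one this block will use.

def parse_block_type(type_identifier: str) -> tuple:
    """Split a versioned block type into (name, version number)."""
    name, _, version = type_identifier.rpartition("@")
    if not name or not version.startswith("v"):
        raise ValueError(f"not a versioned block type: {type_identifier!r}")
    return name, int(version[1:])

name, version = parse_block_type("roboflow_core/florence_2@v1")  # hypothetical id
```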
Q: What happens when LMM output does not match expectations - is any error handling possible, would an error be raised, or would empty output be yielded?
Not sure what you mean by "doesn't match expectations". Microsoft has already handled output validation and error handling, so we can assume their preprocessor returns valid output data or empty data.
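The "valid output or empty output" behaviour discussed above can be sketched as a defensive fallback. This is an illustrative pattern, not the actual Florence-2 processor; the function name and the JSON-shaped payload are assumptions.

```python
import json

# Hedged sketch of a defensive fallback for LMM text output: if the generated
# text cannot be parsed into the expected structure, yield an empty result
# rather than crashing the workflow. Names and payload shape are illustrative.

def parse_lmm_output(raw_text: str) -> dict:
    """Best-effort parse of model output; empty dict on failure."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}

parse_lmm_output('{"bboxes": [[0, 0, 10, 10]], "labels": ["cat"]}')  # parsed dict
parse_lmm_output("<loc_12><loc_34> malformed")  # → {}
```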
            return await self.run_locally(
                images=images, vision_task=vision_task, prompt=prompt, model_id=model_id
            )
        elif self._step_execution_mode is StepExecutionMode.REMOTE:
I am not up to date with the new models that are to be supported on the platform. In private, please provide me with info about the models to be hosted on the platform and how we want to host them - in particular, are we going to let this model run only locally, do we let people train the model, etc.?
Like PaliGemma, Florence-2 requires intensive compute to generate results in a reasonable amount of time, so it's only supported on local/dedicated deployments. We don't have a dedicated endpoint at this time. We currently support training LoRAs outside of Roboflow and uploading them to the platform (and loading them through this block).
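The local-only behaviour described above amounts to a dispatch on execution mode. A minimal sketch, assuming names modelled on the quoted diff (`StepExecutionMode`, `run_locally`); the real block's control flow may differ.

```python
from enum import Enum

# Minimal sketch of execution-mode dispatch: run locally, and fail fast for
# remote mode since no hosted endpoint exists yet. Class and function names
# are assumptions modelled on the quoted diff.

class StepExecutionMode(Enum):
    LOCAL = "local"
    REMOTE = "remote"

def run(mode: StepExecutionMode) -> str:
    if mode is StepExecutionMode.LOCAL:
        return "ran locally"
    if mode is StepExecutionMode.REMOTE:
        raise NotImplementedError(
            "Florence-2 is only supported on local/dedicated deployments"
        )
    raise ValueError(f"unknown execution mode: {mode}")

run(StepExecutionMode.LOCAL)  # → "ran locally"
```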
        return [
            OutputDefinition(name="parent_id", kind=[BATCH_OF_PARENT_ID_KIND]),
            OutputDefinition(name="root_parent_id", kind=[BATCH_OF_PARENT_ID_KIND]),
            OutputDefinition(name="image", kind=[BATCH_OF_IMAGE_METADATA_KIND]),
this output is no longer used given that sv.Detections are available under predictions
Raw output is the raw text from the model, structured output is the formatted dict generated by the Microsoft-provided output text processor, and predictions is sv.Detections produced by our from_lmm method in supervision.
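The three outputs described above could be assembled roughly like this. A sketch with placeholder values; the exact output key names in the block manifest are assumptions, and the sv.Detections value is stubbed with None.

```python
# Sketch of the block's three outputs. Key names mirror the discussion above
# but are assumptions; `detections` would be sv.Detections in the real block
# and is stubbed with None here to keep the example self-contained.

def build_block_output(raw_text: str, structured: dict, detections) -> dict:
    return {
        "raw_output": raw_text,            # raw text from the model
        "structured_output": structured,   # dict from Microsoft's text processor
        "predictions": detections,         # sv.Detections via from_lmm (stubbed)
    }

output = build_block_output(
    raw_text="<OD> cat<loc_1><loc_2><loc_3><loc_4>",
    structured={"bboxes": [[1, 2, 3, 4]], "labels": ["cat"]},
    detections=None,
)
```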
@@ -313,6 +313,24 @@ def __hash__(self) -> int:
    docs=DETECTION_KIND_DOCS,
)


DETECTION_KIND_DOCS = """
could you explain why this additional kind is needed?
Each sv.Detections won't necessarily fall exactly into one of the current kinds (i.e. it may be missing class, class_id, etc.), so I created the batch-of-detection kind as a catch-all.
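The catch-all idea can be illustrated as kind inference with a generic fallback. The kind names and required-field sets below are assumptions for illustration, not the repository's actual kind taxonomy.

```python
# Illustrative sketch of why a catch-all kind helps: decide which established
# kind a payload satisfies, and fall back to a generic detection kind when
# required fields are missing. Kind names and field sets are assumptions.

def infer_kind(fields: set) -> str:
    if {"xyxy", "class_id", "mask"} <= fields:
        return "instance_segmentation_prediction"
    if {"xyxy", "class_id"} <= fields:
        return "object_detection_prediction"
    return "detection"  # catch-all for incomplete multimodal outputs

infer_kind({"xyxy"})  # grounding output without class ids → "detection"
```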
For the PR to be approved:
- integration tests to be created - as a must-have, I would like to see how the output is practically used by other blocks
- there is a high chance that UQL operations on detections are incompatible with the sv.Detections produced as output of this step
- deployment on the hosted platform to be clarified
Hi there,
shall we close in favour of #661?
Description
This PR adds support for Florence-2 in workflows. It allows for a wide variety of computer vision tasks with prompts and finetuned LoRAs. The node returns both raw outputs from the model and parsed supervision detections.
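For orientation, a workflow step using this block might look roughly like the fragment below. The `type` value and `vision_task` name are assumptions for illustration; the parameter names (`images`, `vision_task`, `prompt`, `model_id`) come from the `run_locally` call quoted in the review.

```json
{
  "type": "Florence2Model",
  "name": "florence_2",
  "images": "$inputs.image",
  "vision_task": "object-detection",
  "prompt": "$inputs.prompt",
  "model_id": "florence-2-base"
}
```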
Type of change
How has this change been tested, please provide a testcase or example of how you tested the change?