[RFC] Propose a new GenAIExample - visual search and QA #352

Open
wants to merge 1 commit into
base: main


@llin60 llin60 commented Apr 15, 2025

This RFC proposes a new GenAIExample that integrates a multi-modal search engine with a visual QA (VQA) assistant, so that the assistant can give better answers by using the search results as visual context. The search engine and the VQA assistant can also work independently.

This application targets industries such as surveillance and smart cities, as well as other domains that require efficient analysis of large-scale visual data.

@yinghu5 yinghu5 (Collaborator) commented Apr 17, 2025

Hi @llin60,
Thank you very much for the new example. I will escalate it for further discussion.
For reference:
Visual QnA example: https://github.com/opea-project/GenAIExamples/tree/main/VisualQnA
MultimodalQnA: https://github.com/opea-project/GenAIExamples/tree/main/MultimodalQnA, which can understand a mix of textual, visual, and audio information drawn from document contents.

@yinghu5 yinghu5 requested review from Copilot and yinghu5 April 17, 2025 01:19
@Copilot Copilot AI (Contributor) left a comment

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines +77 to +80
curl http://localhost:6000/v1/embeddings
-X POST
-d '{"input":"traffic jam"}'
-H 'Content-Type: application/json'

Copilot AI Apr 17, 2025

The curl example for the embeddings endpoint is split across several lines without trailing backslashes, so the shell would treat '-X POST', '-d', and '-H' as separate commands rather than as flags of the same request. Consider adding line continuations so the example can be pasted and run as a single command.

Suggested change
- curl http://localhost:6000/v1/embeddings
- -X POST
- -d '{"input":"traffic jam"}'
- -H 'Content-Type: application/json'
+ curl -X POST http://localhost:6000/v1/embeddings \
+ -H "Content-Type: application/json" \
+ -d '{"input":"traffic jam"}'

Copilot uses AI. Check for mistakes.

Comment on lines +86 to +91
curl http://localhost:7000/v1/retrieval
-X POST
-d "{"embedding":${text_embedding},"search_type":"similarity", "k":4}"
-H 'Content-Type: application/json'
```


Copilot AI Apr 17, 2025

The curl command for the retrieval endpoint is split across lines without trailing backslashes, and its JSON payload nests double quotes inside a double-quoted string, so the shell strips the inner quotes and the request body is no longer valid JSON. Please revise the example to fix both issues.

Suggested change
- curl http://localhost:7000/v1/retrieval
- -X POST
- -d "{"embedding":${text_embedding},"search_type":"similarity", "k":4}"
- -H 'Content-Type: application/json'
- ```
+ curl http://localhost:7000/v1/retrieval \
+ -X POST \
+ -d '{"embedding":"<text_embedding_placeholder>","search_type":"similarity","k":4}' \
+ -H 'Content-Type: application/json'
+ # Replace <text_embedding_placeholder> with the actual text embedding value.
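
One way to keep the `${text_embedding}` variable from the original example while still producing valid JSON is to build the request body with `printf`. The following is a minimal sketch, not the PR author's method. It assumes the embedding is already available as a JSON array string, as the unquoted `${text_embedding}` in the original suggests. The helper names `build_retrieval_payload` and `query_retrieval` and the sample vector are hypothetical. The URL and field names are copied from the example under review.

```shell
# Sketch only: helper names and the sample embedding are illustrative assumptions.

# Build the retrieval request body. The embedding is spliced in as raw JSON
# (an array), so it is deliberately not wrapped in quotes.
build_retrieval_payload() {
  local embedding_json="$1"   # e.g. '[0.1,0.2,0.3]'
  local k="${2:-4}"           # number of results, defaults to 4 as in the example
  printf '{"embedding":%s,"search_type":"similarity","k":%s}' "$embedding_json" "$k"
}

# Send the request. Only meaningful when the retrieval service is running.
query_retrieval() {
  curl -X POST http://localhost:7000/v1/retrieval \
    -H 'Content-Type: application/json' \
    -d "$(build_retrieval_payload "$1" "${2:-4}")"
}

text_embedding='[0.1,0.2,0.3]'   # placeholder value, not a real embedding
build_retrieval_payload "$text_embedding"
```

Keeping the payload construction separate from the curl call makes it possible to inspect the body before sending it.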


Comment on lines +96 to +99
curl http://localhost:8888/v1/dbsearch_qna
-X POST
-d '{"text":"traffic jam"}'
-H 'Content-Type: application/json'

Copilot AI Apr 17, 2025

The curl example for the combined search and Q&A endpoint is split across lines without trailing backslashes, so it would not run as a single command. Consider adding line continuations and a sample payload to match the other examples.

Suggested change
- curl http://localhost:8888/v1/dbsearch_qna
- -X POST
- -d '{"text":"traffic jam"}'
- -H 'Content-Type: application/json'
+ curl http://localhost:8888/v1/dbsearch_qna \
+ -X POST \
+ -d '{"text":"traffic jam", "context_images": ["image1.jpg", "image2.jpg"], "k": 5}' \
+ -H 'Content-Type: application/json'


@llin60 llin60 (Author) commented Apr 17, 2025

Hi, thank you for the info. I've studied the existing examples for multi-modal applications. It seems that they process visual data by converting it to text. However, the application we are proposing needs to store the visual data in its original form, because the original images and videos are the targets of interest. Details can be found in the documentation.

@yinghu5 yinghu5 requested review from lvliang-intel and Spycsh and removed request for tomlenth April 18, 2025 03:21
@yinghu5 yinghu5 added the A0 need to scrub label Apr 22, 2025