Feat/package and device compatibility#3

Open
paulasquin wants to merge 23 commits into apple:main from paulasquin:feat/package_and_device_compatibility

@paulasquin

@paulasquin paulasquin commented Feb 8, 2024

Refactoring, Packaging & Apple Silicon compatibility

  • Add poetry-style packaging
  • Refactor the code into an object-oriented design
  • Add typing
  • Add tests
  • Add mps compatibility (tested on an M3 Max with 64 GB)
  • Add gradio app
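
For readers wondering what mps compatibility involves in practice, here is a minimal sketch of device selection. It is not the code from this PR: the function names pick_device and detect_device are hypothetical, and the preference order (CUDA, then MPS, then CPU) is an assumption.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name, preferring CUDA, then MPS, then CPU.

    The preference order is an assumption for illustration only.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the mps backend module at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a function that takes plain booleans makes it testable on machines without a GPU or without PyTorch installed.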

To squash before merge

Solved issues

Nonsense inner thoughts

On Apple Silicon, we were getting nonsense output from the model.generate method.

Payload

  • Instruction: make the frame red
  • Image:
glasses

Expected:

  • Out:
If the frame of the glasses in the image were made red, the overall appearance of the scene would change significantly.The red frame would draw more attention to the glass and create a stronger contrast with the black frame.
  • Res:
glasses

Obtained

  • Out
Pres flash togful calledgot At commitilli split sent supports fir card projects course bunch mixture enc halery racc developed curves enjoydog memory seek Inside Wh sam closure served supports fir tripifest towardinn household finishing exact meaning ordinary treat drop whose invert Rem follow til Otherwise stal frames sequence lifted accomp entire variation government carriage uses eratrim condition Wild throne phys mutong B woods racc developed Le rename Ada laugh applying dess squ cit reference rad type refresh spr rud embedded agricult foot ax steps God close These
  • Res:
    ~same as input

Fix

The latest LLaVA weights, which you can get from Hugging Face with git clone https://huggingface.co/liuhaotian/LLaVA-Lightning-7B-delta-v1-1, simply do not work.
Solved by using the saved weights from tsujuifu, stored on Google Drive.
-> A lot of time was lost on this. Could it be due to delta vs. full LLaVA weights?
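
On the delta-vs-full question: a delta checkpoint stores only the difference from a base model, so it normally has to be added to the base weights before use, and loading it directly would be expected to produce meaningless output. The thread does not confirm that this is the cause here. The sketch below only illustrates the idea on plain nested lists; apply_delta and the tensor names are hypothetical, and the handling of larger delta tensors is an assumption about how extended embedding tables are usually merged.

```python
from typing import Dict, List

Matrix = List[List[float]]


def apply_delta(base: Dict[str, Matrix], delta: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Return full weights computed as base + delta for every tensor in delta.

    Tensors that exist only in delta are copied unchanged. If a delta tensor
    is larger than its base tensor (for example, extra rows appended to an
    embedding table), base values are added to the overlapping block only.
    """
    full: Dict[str, Matrix] = {}
    for name, delta_rows in delta.items():
        merged = [row[:] for row in delta_rows]  # copy, do not mutate inputs
        base_rows = base.get(name)
        if base_rows is not None:
            too_big = len(base_rows) > len(merged) or any(
                len(b) > len(m) for b, m in zip(base_rows, merged)
            )
            if too_big:
                raise ValueError(f"base tensor {name!r} is larger than its delta")
            for i, base_row in enumerate(base_rows):
                for j, value in enumerate(base_row):
                    merged[i][j] += value
        full[name] = merged
    return full


if __name__ == "__main__":
    # Made-up toy values, not real model weights.
    base = {"embed": [[1.0, 2.0]]}
    delta = {"embed": [[0.5, 0.5], [9.0, 9.0]], "proj": [[3.0]]}
    print(apply_delta(base, delta))
```

If a checkpoint is already a full set of weights, no such step is needed, which would be consistent with the Google Drive weights working out of the box.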

  • Out
The image would feature a close-up view of a pair of black eyeglasses with a gold or metallic frame, placed on a gray background.The frame would be red, drawing attention to the glasses and making them the focal point of the image.
  • Res
glasses

@paulasquin paulasquin marked this pull request as ready for review February 10, 2024 11:42
@xiaoqian-shen

I also ran into issues when trying to reproduce the results. Although no errors were displayed, the editing quality was not as good as in the paper. Could you please share the environment file so I can verify the versions of the critical packages?

@xiaoqian-shen

I fixed the problem by using the checkpoint you provided on Google Drive. Thanks!

@paulasquin
Author

Hello @xiaoqian-shen

Indeed, I suggest using the models from my Hugging Face account, which come from Tsu-Jui Fu's Google Drive link.
I do not have a clear understanding of why the original package weights aren't working.

Even if you no longer need this, here are the package versions in case they help others.
I'm sharing the output of poetry run python -m pip freeze instead of the poetry.lock file, for readability:

freeze.txt
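
For anyone who wants to produce a comparable listing from inside Python rather than through pip, here is a minimal sketch using only the standard library. The function name freeze_lines is hypothetical, and the output only approximates what pip freeze prints (for example, it does not handle editable or VCS installs specially).

```python
from importlib import metadata
from typing import List


def freeze_lines() -> List[str]:
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            continue  # skip installs whose metadata cannot be read
        if name:
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```

Comparing two such listings line by line is a quick way to spot version differences in critical packages between two environments.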

@xiaoqian-shen

Thanks for your reply! May I ask whether you are able to reproduce the MagicBrush results in Table 2?

@GitHub1712

GitHub1712 commented Feb 17, 2024

My trained mgie_7b is also not working. I was able to train and export mllm.pt and unet.pt, but when running the demo, the ckpt has no 'emb' key and my ckpt's 'model.embed_tokens.weight' has a different tensor size. So training ran, but the resulting model does not work. With tsujuifu's weights, the demo works.
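
A mismatch like this can be narrowed down by comparing key names and tensor shapes between a working checkpoint and the self-trained one. The sketch below is an illustration only: diff_checkpoints is a hypothetical helper, it works on plain name-to-shape mappings, and the shapes in the example are made up.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_checkpoints(
    reference: Dict[str, Shape], candidate: Dict[str, Shape]
) -> Dict[str, List]:
    """Report keys missing from candidate, extra keys, and shape mismatches."""
    missing = sorted(k for k in reference if k not in candidate)
    unexpected = sorted(k for k in candidate if k not in reference)
    mismatched = sorted(
        (k, reference[k], candidate[k])
        for k in reference
        if k in candidate and reference[k] != candidate[k]
    )
    return {
        "missing": missing,
        "unexpected": unexpected,
        "shape_mismatch": mismatched,
    }


if __name__ == "__main__":
    # Shapes below are invented for illustration, not read from real files.
    working = {"emb": (8, 4096), "model.embed_tokens.weight": (32008, 4096)}
    trained = {"model.embed_tokens.weight": (32000, 4096)}
    print(diff_checkpoints(working, trained))
```

With PyTorch, and assuming each .pt file holds a flat state dict, the shape mappings could be built with {k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}.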

@paulasquin
Author

Thanks for your reply! May I ask whether you are able to reproduce the MagicBrush results in Table 2?

Hello @xiaoqian-shen
I sometimes see slight differences, but I mostly get the same level of quality. A few times I got ugly results (mainly on the phone and beach photos).

Here are my before/after results on the demo images:

[Images: before/after pairs for the 20 demo images, 0-in/0-out through 19-in/19-out]

@lzw-lzw

lzw-lzw commented Mar 18, 2024

Thank you for your contribution. Where can I find the ipr2ipr.pkl/tsv data referenced in the code, that is, the summarized image-text pairs? Or do I need to construct it myself?


4 participants