Skip to content

Conversation

@tukwila
Copy link
Contributor

@tukwila tukwila commented Sep 8, 2025

Summary

Details

I hope data file can support ShareGPT as benchmark test data such as: ShareGPT_V3_unfiltered_cleaned_split.json; In this PR, user can abstract testing prompts from origin file and filter human prompts (10 < words < 1000) to save into local file, refer to:

image image
  • [ ]

Test Plan

Related Issues

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@tukwila tukwila changed the title support ShareGPT dataset as data file draft: support ShareGPT dataset as data file Sep 8, 2025
@tukwila tukwila changed the title draft: support ShareGPT dataset as data file support ShareGPT dataset as data file Sep 8, 2025
@tukwila tukwila force-pushed the support_sharegpt branch 3 times, most recently from 1cf7e56 to e98bd0e Compare September 9, 2025 04:21
@tukwila
Copy link
Contributor Author

tukwila commented Sep 9, 2025

@sjmonson
Copy link
Collaborator

This seems external to the GuideLLM. Can you please move all code and documentation to /contrib/sharegpt_preprocess.

@tukwila
Copy link
Contributor Author

tukwila commented Sep 12, 2025

This seems external to the GuideLLM. Can you please move all code and documentation to /contrib/sharegpt_preprocess.

Done

@sjmonson
Copy link
Collaborator

Sorry I forgot about this PR due to the sudden flurry of new PRs. Can you also move the changes in docs/datasets.md to contrib/sharegpt_preprocess/README.md.

@tukwila
Copy link
Contributor Author

tukwila commented Sep 16, 2025

Sorry I forgot about this PR due to the sudden flurry of new PRs. Can you also move the changes in docs/datasets.md to contrib/sharegpt_preprocess/README.md.

Done

Copy link
Collaborator

@jaredoconnell jaredoconnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the requirements.txt supposed to include all dependencies? I had to install datasets and transformers for it to work.

It may be beneficial to also note that you need to run it with the HF_TOKEN value set.

Once I addressed these it appears to have worked.

@tukwila
Copy link
Contributor Author

tukwila commented Sep 17, 2025

Is the requirements.txt supposed to include all dependencies? I had to install datasets and transformers for it to work.

It may be beneficial to also note that you need to run it with the HF_TOKEN value set.

Once I addressed these it appears to have worked.

yes, i updated and retest it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants