How many samples were used to train Flan-T5? #92

@rohan-mehta

Hey all, possibly a silly question. I see that the Hugging Face collection has many millions of samples, and the Google blog post suggests that the collection has 15M samples: https://ai.googleblog.com/2023/02/the-flan-collection-advancing-open.html

On the other hand, mixtures.py suggests that ~350K samples is the default maximum: https://github.com/google-research/FLAN/blob/main/flan/v2/mixtures.py#L27

How many samples were actually used to fine-tune T5 and produce Flan-T5?

Thanks!
