You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: datasheet.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -19,7 +19,7 @@ Organization: Georgia Institute of Technology
19
19
20
20
1.**For what purpose was the dataset created?** Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.
21
21
22
-
The DiffusionDB project was inspired by important needs in research focused on diffusion models and prompt engineering. As large text-to-image models are relatively new, there is a pressing need to understand how these models work, how to write effective prompts, and how to design tools to help users generate images. To tackle these critical challenges, we present DiffusionDB, the first large-scale prompt dataset with 2 million real prompt-image pairs.
22
+
The DiffusionDB project was inspired by important needs in research focused on diffusion models and prompt engineering. As large text-to-image models are relatively new, there is a pressing need to understand how these models work, how to write effective prompts, and how to design tools to help users generate images. To tackle these critical challenges, we present DiffusionDB, the first large-scale prompt dataset with 14 million real prompt-image pairs.
23
23
24
24
2.**Who created this dataset (e.g. which team, research group) and on behalf of which entity (e.g. company, institution, organization)**?
25
25
@@ -45,7 +45,7 @@ of Technology.
45
45
46
46
2.**How many instances are there in total (of each type, if appropriate)?**
47
47
48
-
There are 2 million instances in total in the dataset.
48
+
There are 14 million instances in total in the dataset.
49
49
50
50
3.**Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?** If the dataset is a sample, then what is the larger set? Is the sample representative of the larger set (e.g. geographic coverage)? If so, please describe how this representativeness was validated/verified. If it is not representative of the larger set, please describe why not (e.g. to cover a more diverse range of instances, because instances were withheld or unavailable).
0 commit comments