Commit 944f96d

Update index file
Signed-off-by: Jay Wang <jay@zijie.wang>
1 parent c2bbddf commit 944f96d

2 files changed: +102 -43 lines changed

README.md

Lines changed: 9 additions & 7 deletions
@@ -55,9 +55,9 @@ We use a modularized file structure to distribute DiffusionDB. The 2 million ima
 ./
 ├── diffusiondb-large-part-1
 │   ├── part-000001
-│   │   ├── 3bfcd9cf-26ea-4303-bbe1-b095853f5360.png
-│   │   ├── 5f47c66c-51d4-4f2c-a872-a68518f44adb.png
-│   │   ├── 66b428b9-55dc-4907-b116-55aaa887de30.png
+│   │   ├── 0a8dc864-1616-4961-ac18-3fcdf76d3b08.webp
+│   │   ├── 0a25cacb-5d91-4f27-b18a-bd423762f811.webp
+│   │   ├── 0a52d584-4211-43a0-99ef-f5640ee2fc8c.webp
 │   │   ├── [...]
 │   │   └── part-000001.json
 │   ├── part-000002
@@ -66,9 +66,9 @@ We use a modularized file structure to distribute DiffusionDB. The 2 million ima
 │   └── part-010000
 ├── diffusiondb-large-part-2
 │   ├── part-010001
-│   │   ├── 3bfcd9cf-26ea-4303-bbe1-b095853f5360.png
-│   │   ├── 5f47c66c-51d4-4f2c-a872-a68518f44adb.png
-│   │   ├── 66b428b9-55dc-4907-b116-55aaa887de30.png
+│   │   ├── 0a68f671-3776-424c-91b6-c09a0dd6fc2d.webp
+│   │   ├── 0a0756e9-1249-4fe2-a21a-12c43656c7a3.webp
+│   │   ├── 0aa48f3d-f2d9-40a8-a800-c2c651ebba06.webp
 │   │   ├── [...]
 │   │   └── part-000001.json
 │   ├── part-010002
@@ -107,7 +107,9 @@ The data fields are:
107107

108108
To help you easily access prompts and other attributes of images without downloading all the Zip files, we include two metadata tables `metadata.parquet` and `metadata-large.parquet` for DiffusionDB 2M and DiffusionDB Large, respectively.
109109

110-
The shape of `metadata.parquet` is (2000000, 13) and the shape of `metatable-large.parquet` is (14000000, 13). Two tables share the same schema, and each row represents an image. We store these tables in the Parquet format because Parquet is column-based: you can efficiently query individual columns (e.g., prompts) without reading the entire table. Below are three random rows from `metadata.parquet`.
110+
The shape of `metadata.parquet` is (2000000, 13) and the shape of `metatable-large.parquet` is (14000000, 13). Two tables share the same schema, and each row represents an image. We store these tables in the Parquet format because Parquet is column-based: you can efficiently query individual columns (e.g., prompts) without reading the entire table.
111+
112+
Below are three random rows from `metadata.parquet`.
111113

112114
| image_name | prompt | part_id | seed | step | cfg | sampler | width | height | user_name | timestamp | image_nsfw | prompt_nsfw |
113115
|:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------:|-----------:|-------:|------:|----------:|--------:|---------:|:-----------------------------------------------------------------|:--------------------------|-------------:|--------------:|

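The directory tree in the first two hunks suggests that part folders use six-digit zero-padded names, with `part-000001` through `part-010000` under `diffusiondb-large-part-1` and `part-010001` onward under `diffusiondb-large-part-2`. The sketch below turns that reading into a path helper. The function name `large_part_dir` and the constant are hypothetical, and the split at 10000 is inferred only from the tree shown in this diff, so treat it as an assumption.

```python
# Assumption: last part folder listed under diffusiondb-large-part-1 in the tree.
LARGE_PART_1_LAST = 10_000


def large_part_dir(part_id: int) -> str:
    """Return the relative folder for a DiffusionDB Large part (hypothetical helper)."""
    if part_id < 1:
        raise ValueError("part_id must be a positive integer")
    if part_id <= LARGE_PART_1_LAST:
        top = "diffusiondb-large-part-1"
    else:
        top = "diffusiondb-large-part-2"
    # Folder names in the tree are zero-padded to six digits.
    return f"{top}/part-{part_id:06d}"


print(large_part_dir(1))
print(large_part_dir(10001))
```

The upper bound of `diffusiondb-large-part-2` is not visible in this diff, so the helper does not check it.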