The dataset includes model outputs from three variants (Small-p, Small-i, Large-i) on three public datasets: IMGUR5K, IAM, and HierText.
Each sample includes:
- Original input image
- Generated digital ink in .inkml format
- Results from three different inference modes
Download the dataset with the following command:
git clone https://huggingface.co/datasets/Derendering/InkSight-Derenderings
The structure of the dataset is as follows:
├── <Dataset>                     # IMGUR5K, IAM, or HierText
│   └── images_sample/            # Original input images
│       ├── <exampleId>.png
│       └── ...
├── <model>_<dataset>_inkml/      # Model outputs (e.g. large-i_IMGUR5K_inkml)
│   ├── d+t/                      # Derender with Text mode
│   ├── vanilla/                  # Vanilla Derendering mode
│   └── r+d/                      # Recognize and Derender mode
where <exampleId> is the unique identifier of the image and <model> is one of small-p, small-i, or large-i, and <dataset> is one of IMGUR5K, IAM, or HierText.
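As a rough illustration of the layout above, the following Python sketch builds the expected paths for one model, dataset, and inference mode. The helper names `inkml_dir` and `image_path` are hypothetical and not part of the dataset's utilities; only the directory naming scheme is taken from the structure shown above.

```python
from pathlib import Path

# Allowed values, taken from the directory layout described above.
MODELS = ("small-p", "small-i", "large-i")
DATASETS = ("IMGUR5K", "IAM", "HierText")
MODES = ("d+t", "vanilla", "r+d")


def inkml_dir(root, model, dataset, mode):
    """Return <root>/<model>_<dataset>_inkml/<mode> (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return Path(root) / f"{model}_{dataset}_inkml" / mode


def image_path(root, dataset, example_id):
    """Return <root>/<dataset>/images_sample/<exampleId>.png (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return Path(root) / dataset / "images_sample" / f"{example_id}.png"
```

For example, `inkml_dir("InkSight-Derenderings", "large-i", "IMGUR5K", "d+t")` points at the Derender with Text outputs of the Large-i model on IMGUR5K inside the cloned repository.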
The digital ink traces are stored in InkML (.inkml) format, a W3C standard for representing digital ink data. The files include the following annotation fields:
- application:
  - Value: "InkSight"
  - Indicates the model/system that generated the ink
- sourceDataset:
  - Values: "HierText", "IMGUR5K", or "IAM"
  - Original dataset where the input image comes from
- inferenceMode:
  - Values: "Derender with Text", "Vanilla", or "Recognize and Derender"
  - Indicates which model inference mode was used:
    - "Derender with Text": Uses OCR text for guidance
    - "Vanilla": Direct derendering without text input
    - "Recognize and Derender": Combines recognition and derendering
- exampleId:
  - Example: "0d44ea16f816055c_76"
  - Unique identifier matching the original dataset
- textField:
  - Contains the text content
  - For "Derender with Text" mode: OCR text used for guidance
  - For "Recognize and Derender": Recognized text from the model
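A minimal sketch of reading these fields with Python's standard library is given below. It assumes the fields are stored as standard InkML `<annotation type="...">` elements and the strokes as `<trace>` elements with comma-separated points; the exact layout of the files in this dataset may differ. `SAMPLE` is a hand-written illustration, not a file from the dataset, and `parse_inkml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration of an InkML document; NOT a file from the dataset.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <annotation type="application">InkSight</annotation>
  <annotation type="sourceDataset">HierText</annotation>
  <annotation type="inferenceMode">Vanilla</annotation>
  <annotation type="exampleId">0d44ea16f816055c_76</annotation>
  <trace>10 20, 11 22, 13 25</trace>
  <trace>30 20, 31 24</trace>
</ink>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def parse_inkml(text):
    """Return (annotations dict, list of traces) from an InkML string."""
    root = ET.fromstring(text)
    annotations = {}
    traces = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "annotation":
            annotations[elem.get("type")] = (elem.text or "").strip()
        elif name == "trace":
            points = []
            # Points are comma-separated; coordinates are whitespace-separated.
            for point in (elem.text or "").split(","):
                coords = point.split()
                if len(coords) >= 2:
                    points.append((float(coords[0]), float(coords[1])))
            traces.append(points)
    return annotations, traces


annotations, traces = parse_inkml(SAMPLE)
```

Matching on the local tag name lets the same function handle files with or without the InkML namespace declaration.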
The three inference modes work as follows:
- d+t (Derender with Text): Uses OCR (Google Cloud Vision API) to recognize text in the image before derendering and feeds it to the model as guidance.
- vanilla: Direct derendering without text guidance.
- r+d (Recognize and Derender): Asks the model to recognize text in the image and then derender the text.
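The directory names and the inferenceMode annotation values refer to the same three modes, so one possible sanity check is to map one to the other. The sketch below assumes the correspondence implied by the names in this document; `MODE_LABELS` and `mode_matches` are hypothetical and not part of the provided utilities.

```python
# Directory name -> expected inferenceMode annotation value (assumed mapping).
MODE_LABELS = {
    "d+t": "Derender with Text",
    "vanilla": "Vanilla",
    "r+d": "Recognize and Derender",
}


def mode_matches(directory, inference_mode):
    """Return True if an annotation value agrees with the directory it sits in."""
    return MODE_LABELS.get(directory) == inference_mode
```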
Examples are provided in the colab notebook, and the utility functions are organized in the utils folder. To visualize samples from the dataset with different inference modes, use the demo script visualize_dataset.py:
# Show 3 samples from HierText dataset using Small-i model
python visualize_dataset.py --dataset HierText --num_samples 3 --model Small-i
If you use this dataset, please cite:
@article{mitrevski2024inksight,
title={InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
author={Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
journal={arXiv preprint arXiv:2402.05804},
year={2024}
}