InkSight Dataset Documentation

Dataset Overview

The dataset includes model outputs from three variants (Small-p, Small-i, Large-i) on three public datasets:

IMGUR5K
IAM
HierText

Each sample includes:

Original input image
Generated digital ink in .inkml format
Results from three different inference modes

Download the Dataset

Download the dataset with the following command:

git clone https://huggingface.co/datasets/Derendering/InkSight-Derenderings
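As an alternative to git clone, the same repository can probably be fetched from Python. The sketch below assumes the huggingface_hub package is installed and uses its snapshot_download function. The helper name download_inksight and the default target directory are hypothetical and not part of the InkSight tooling.

```python
REPO_ID = "Derendering/InkSight-Derenderings"  # repository name taken from the URL above


def download_inksight(local_dir="InkSight-Derenderings"):
    """Download a snapshot of the dataset repository and return its local path.

    Hypothetical helper: the name and default directory are illustrative only.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repository lives under huggingface.co/datasets/
        local_dir=local_dir,
    )
```

Calling download_inksight() requires network access. This may be convenient on machines where Git LFS is not set up.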

Dataset Structure and Format

The structure of the dataset is as follows:

├── <Dataset>                      # IMGUR5K, IAM, or HierText
│   └── images_sample/            # Original input images
│       ├── <exampleId>.png
│       └── ...
├── <model>_<dataset>_inkml/      # Model outputs (e.g. large-i_IMGUR5K_inkml)
│   ├── d+t/                      # Derender with Text mode
│   ├── vanilla/                  # Vanilla Derendering mode
│   └── r+d/                      # Recognize and Derender mode

Here <exampleId> is the unique identifier of the image, <model> is one of small-p, small-i, or large-i, and <dataset> is one of IMGUR5K, IAM, or HierText.
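The layout above can be turned into paths with a few lines of Python. This is a minimal sketch, not part of the repository: the helper names image_path and ink_dir are hypothetical, and root stands for wherever the dataset was cloned. The names of the files inside each mode directory are not documented here, so the sketch stops at the directory level.

```python
from pathlib import Path

MODELS = ("small-p", "small-i", "large-i")
DATASETS = ("IMGUR5K", "IAM", "HierText")
MODES = ("d+t", "vanilla", "r+d")


def image_path(root, dataset, example_id):
    """Path of the original input image for one example (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return Path(root) / dataset / "images_sample" / f"{example_id}.png"


def ink_dir(root, model, dataset, mode):
    """Directory holding the .inkml outputs of one model, dataset, and mode."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown inference mode: {mode!r}")
    return Path(root) / f"{model}_{dataset}_inkml" / mode
```

For example, ink_dir("InkSight-Derenderings", "large-i", "IMGUR5K", "d+t") points at the large-i_IMGUR5K_inkml/d+t directory, whose contents can then be listed with Path.glob("*.inkml").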

The digital ink traces are stored in InkML (.inkml), a W3C standard XML format for representing digital ink data. Each file includes the following annotation fields:

application:
- Value: "InkSight"
- Indicates the model/system that generated the ink
sourceDataset:
- Values: "HierText", "IMGUR5K", or "IAM"
- Original dataset where the input image comes from
inferenceMode:
- Values: "Derender with Text", "Vanilla", or "Recognize and Derender"
- Indicates which model inference mode was used:
  - "Derender with Text": Uses OCR text for guidance
  - "Vanilla": Direct derendering without text input
  - "Recognize and Derender": Combines recognition and derendering
exampleId:
- Example: "0d44ea16f816055c_76"
- Unique identifier matching the original dataset
textField:
- Contains the text content
- For "Derender with Text" mode: OCR text used for guidance
- For "Recognize and Derender": Recognized text from the model
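The annotation fields and traces can be read with the Python standard library. The sketch below assumes that the fields are stored as standard InkML annotation elements with a type attribute, and that each trace element holds comma-separated points written as "x y". Neither detail is confirmed by this document, so check a real file before relying on it. The function parse_inkml and the SAMPLE string are illustrative, and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


def parse_inkml(xml_text):
    """Return (annotations, strokes) parsed from an InkML string.

    annotations maps each annotation type to its text; strokes is a list of
    strokes, each a list of (x, y) float tuples. Assumes plain "x y" points.
    """
    root = ET.fromstring(xml_text)
    annotations = {}
    strokes = []
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "annotation":
            annotations[elem.get("type")] = (elem.text or "").strip()
        elif name == "trace":
            points = []
            for point in (elem.text or "").split(","):
                coords = point.split()
                if len(coords) >= 2:
                    points.append((float(coords[0]), float(coords[1])))
            strokes.append(points)
    return annotations, strokes


# Made-up sample for illustration only; real files may differ in detail.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <annotation type="application">InkSight</annotation>
  <annotation type="inferenceMode">Vanilla</annotation>
  <annotation type="exampleId">0d44ea16f816055c_76</annotation>
  <trace>10 20, 11 22, 13 25</trace>
  <trace>30 5, 31 6</trace>
</ink>"""

annotations, strokes = parse_inkml(SAMPLE)
```

To read a file from disk, pass its contents to the same function, for example parse_inkml(Path(p).read_text()).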

Inference Modes

d+t (Derender with Text): Uses OCR (Google Cloud Vision API) to recognize the text in the image before derendering and feeds it to the model as guidance.
vanilla: Direct derendering without text guidance.
r+d (Recognize and Derender): Asks the model to first recognize the text in the image and then derender it.
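The directory names and the inferenceMode annotation values describe the same three modes, so a small lookup table can tie them together. This is a sketch: the names MODE_DIR_TO_ANNOTATION and mode_matches are hypothetical, and the check assumes the annotations have already been read into a dictionary keyed by annotation type.

```python
# Directory name -> value of the inferenceMode annotation, as listed above.
MODE_DIR_TO_ANNOTATION = {
    "d+t": "Derender with Text",
    "vanilla": "Vanilla",
    "r+d": "Recognize and Derender",
}


def mode_matches(mode_dir, annotations):
    """True if a file's inferenceMode annotation agrees with its directory."""
    expected = MODE_DIR_TO_ANNOTATION.get(mode_dir)
    return expected is not None and annotations.get("inferenceMode") == expected
```

A check like this may help catch files that were moved into the wrong mode directory.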

Usage

Examples are provided in the Colab notebook, and the utility functions are organized in the utils folder. To visualize samples from the dataset under the different inference modes, use the demo script visualize_dataset.py:

# Show 3 samples from HierText dataset using Small-i model
python visualize_dataset.py --dataset HierText --num_samples 3 --model Small-i

Citation

If you use this dataset, please cite:

@article{mitrevski2024inksight,
  title={InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write},
  author={Mitrevski, Blagoj and Rak, Arina and Schnitzler, Julian and Li, Chengkun and Maksai, Andrii and Berent, Jesse and Musat, Claudiu},
  journal={arXiv preprint arXiv:2402.05804},
  year={2024}
}
