kinescript is a lightweight CLI tool that turns local screen interactions into a structured dataset for multimodal AI training. It records your on-screen actions (mouse and keyboard) and captures screenshots, then processes them with OCR and optional LLM steps to produce JSONL examples.
Creating high-quality multimodal datasets from real desktop workflows is often painful:
- Manual and repetitive: Taking screenshots and writing notes for each click/keypress is tedious and error-prone.
- Inconsistent formats: Logs, images, and annotations end up in different places and formats.
- Heavy alternatives: Full-fledged RPA/UI tools or custom capture apps are complex to set up when you just need reproducible traces.
kinescript provides a simpler path:
- Simple & focused: A small CLI with two commands: `record` and `process`.
- Event-driven capture: Listens for mouse/keyboard events and snapshots the screen on each event (sketched below).
- Structured outputs: Stores screenshots and an `actions.jsonl` log you can post-process deterministically.
- OCR + LLM-ready: Extracts text with Tesseract OCR and leaves a clean placeholder for LLM-based generation (OpenAI, Gemini, Ollama later).

Under the hood, the main components are:

- CLI (Typer): Clear UX for the `record` and `process` commands.
- Recording (mss, pynput): Captures screenshots and logs mouse/keyboard actions.
- Processing (Pillow, pytesseract, pandas): Runs OCR on images and joins it with the action logs.
- Dataset output: Produces a final `dataset.jsonl` suitable for multimodal training pipelines.
- Pluggable LLM step: Placeholder designed for provider-agnostic integration (OpenAI/Gemini/Ollama) later.
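To make the event-driven capture concrete, here is a minimal sketch of the idea using pynput and mss. It is illustrative only: the callback wiring, file naming, and JSON field names are assumptions, not kinescript's actual implementation.

```python
# Sketch only (not kinescript's code): take a full-screen screenshot on every
# mouse click and append one JSON line per event. Field names are illustrative.
import json
import time
from pathlib import Path

import mss
from pynput import mouse

session = Path("./sessions/session-001")
(session / "images").mkdir(parents=True, exist_ok=True)
log_path = session / "actions.jsonl"
counter = 0

def snapshot() -> str:
    """Save a full-screen screenshot and return its file name."""
    global counter
    counter += 1
    name = f"{counter:04d}.png"
    with mss.mss() as sct:
        sct.shot(output=str(session / "images" / name))
    return name

def on_click(x, y, button, pressed):
    if not pressed:
        return
    event = {
        "timestamp": time.time(),
        "type": "mouse_click",
        "x": x,
        "y": y,
        "button": str(button),
        "image": snapshot(),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

# Blocks until the listener is stopped (e.g. with Ctrl+C).
with mouse.Listener(on_click=on_click) as listener:
    listener.join()
```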
kinescript runs locally as a Python process that (1) records events and screenshots, and (2) processes recorded sessions into structured data.
```
+------------------------------+               +-----------------------------+
|        Recording Step        |               |       Processing Step       |
|     (kinescript record)      |               |     (kinescript process)    |
+------------------------------+               +-----------------------------+
| - Mouse/keyboard listeners   |               | - Load actions.jsonl        |
| - Screenshot on each event   |   images →    | - OCR (pytesseract)         |
| - Write actions.jsonl        |   actions →   | - Join image+action+OCR     |
+--------------+---------------+               +---------------+-------------+
               |                                               |
               v                                               v
       session directory                                 dataset.jsonl
```
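The processing step can be pictured with the rough sketch below, under the assumption that each `actions.jsonl` record stores the screenshot file name in an `image` field; the real processor may structure this differently.

```python
# Sketch only: load the action log, OCR each referenced screenshot, and write
# a joined dataset.jsonl. Column names ("image", "ocr_text") are assumptions.
from pathlib import Path

import pandas as pd
import pytesseract
from PIL import Image

session = Path("./sessions/session-001")
actions = pd.read_json(session / "actions.jsonl", lines=True)

# Attach the OCR text of each screenshot to its action row.
actions["ocr_text"] = actions["image"].map(
    lambda name: pytesseract.image_to_string(Image.open(session / "images" / name))
)

# One JSON object per line, ready for downstream training pipelines.
actions.to_json("./dataset.jsonl", orient="records", lines=True, force_ascii=False)
```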
Requirements:

- Python 3.9+
- Tesseract OCR installed on your system
  - macOS: `brew install tesseract`
  - Ubuntu/Debian: `sudo apt-get install tesseract-ocr`
  - Windows: install Tesseract and ensure it is on PATH (see project docs)
Using uv (recommended):

```bash
# create a virtualenv
uv venv

# activate it
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# editable install
uv pip install -e .
```

Using pip:

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .
```

Record a session:

```bash
kinescript record --output-dir ./sessions/session-001
# Stop recording with ESC key
```

Process a recorded session:

```bash
kinescript process \
  --input-dir ./sessions/session-001 \
  --output-file ./dataset.jsonl
```

You can also run via the module entry point:

```bash
python -m kinescript record --output-dir ./sessions/session-001
python -m kinescript process --input-dir ./sessions/session-001 --output-file ./dataset.jsonl
```

A recorded session directory looks like this:

```
sessions/session-001/
├── images/
│   ├── 0001.png
│   ├── 0002.png
│   └── ...
└── actions.jsonl
```
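For orientation, a single line in `actions.jsonl` might look roughly like the example below; the exact fields depend on the recorder and are shown purely as an illustration, not a guaranteed schema.

```
{"timestamp": 1718000000.123, "type": "mouse_click", "x": 512, "y": 300, "button": "Button.left", "image": "0001.png"}
```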
Planned features:

- Selective window capture and region filters
- Configurable event filters and sampling strategies
- Annotation/preview UI (optional)
- LLM integration (provider-agnostic; OpenAI/Gemini/Ollama)
- Export adapters for popular training formats
We welcome contributions! Please see CONTRIBUTING.md for guidelines, development setup, and workflow.
Licensed under the Apache License, Version 2.0. See LICENSE for full text.
You can generate Q/A locally using Ollama.
Install Ollama:

- macOS: `brew install ollama`
- Linux: follow your distro guide or `curl -fsSL https://ollama.com/install.sh | sh`
- Windows: use the official installer

```bash
ollama pull llama3.1
ollama serve   # if needed (often runs automatically in the background)
```

The default endpoint is http://localhost:11434.
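As a quick sanity check that the local server responds, something along these lines works against Ollama's standard /api/generate endpoint. The prompt wording and the `draft_qa` helper are placeholders for illustration, not kinescript's internal code.

```python
# Sketch: ask a local Ollama server to draft Q/A pairs from OCR text via the
# public /api/generate endpoint. Prompt and function name are illustrative.
import requests

def draft_qa(ocr_text: str,
             base_url: str = "http://localhost:11434",
             model: str = "llama3.1",
             timeout: int = 30) -> str:
    prompt = (
        "From the following on-screen text, write up to 3 question/answer "
        f"pairs as a JSON list:\n{ocr_text}"
    )
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()["response"]  # raw model output; validate before use

print(draft_qa("File  Edit  View  Help"))
```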
You can tweak defaults via environment variables. Place a .env file at the project root and it will be loaded automatically.
Example (.env):

```
# Provider: ollama | openai | gemini
KINESCRIPT_LLM_PROVIDER=ollama

# Ollama settings
KINESCRIPT_OLLAMA_BASE_URL=http://localhost:11434
KINESCRIPT_OLLAMA_MODEL=llama3.1
KINESCRIPT_OLLAMA_TIMEOUT=30

# Max number of Q/A pairs
KINESCRIPT_QA_MAX_PAIRS=3
```

Then run processing with Q/A generation:

```bash
kinescript process \
  --input-dir ./sessions/session-001 \
  --output-file ./dataset.jsonl
```

processor.py lets you choose an LLM provider via environment variables. A .env file at the repo root will be picked up automatically.
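If you want to read the same variables from your own scripts, a minimal approach looks like the sketch below (assuming python-dotenv is installed; this mirrors the behavior described above but is not necessarily identical to what processor.py does, and the fallback defaults are assumptions).

```python
# Sketch: load .env from the working directory, then read the variables with
# fallbacks. The default values shown here are assumptions, not guarantees.
import os

from dotenv import load_dotenv

load_dotenv()  # picks up a .env file in the current directory, if present

provider = os.getenv("KINESCRIPT_LLM_PROVIDER", "ollama")
base_url = os.getenv("KINESCRIPT_OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("KINESCRIPT_OLLAMA_MODEL", "llama3.1")
timeout = int(os.getenv("KINESCRIPT_OLLAMA_TIMEOUT", "30"))
max_pairs = int(os.getenv("KINESCRIPT_QA_MAX_PAIRS", "3"))

print(provider, model, max_pairs)
```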
Ollama:

```
KINESCRIPT_LLM_PROVIDER=ollama   # or openai, gemini
```

OpenAI:

```
KINESCRIPT_LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...            # required
KINESCRIPT_OPENAI_MODEL=gpt-4o-mini
KINESCRIPT_OPENAI_TIMEOUT=30

# Optional: shared temperature for all providers
KINESCRIPT_LLM_TEMPERATURE=0.2
```

Gemini:

```
KINESCRIPT_LLM_PROVIDER=gemini
GOOGLE_API_KEY=AIza...           # required
KINESCRIPT_GEMINI_MODEL=gemini-1.5-flash
KINESCRIPT_GEMINI_TIMEOUT=30

# Optional: shared temperature
KINESCRIPT_LLM_TEMPERATURE=0.2
```

Notes:

- `.env` → environment variables → `KINESCRIPT_LLM_PROVIDER` decides which provider to call (Ollama/OpenAI/Gemini).
- If the response is not valid JSON, the parser safely returns an empty list.
- Number of Q/A pairs is limited by `KINESCRIPT_QA_MAX_PAIRS`.
- When using OpenAI/Gemini, ensure you understand platform costs and rate limits.
- In corporate networks or behind proxies, requests may fail due to network restrictions.
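The "invalid JSON → empty list" behavior and the `KINESCRIPT_QA_MAX_PAIRS` cap can be approximated with a defensive parser like the sketch below; this illustrates the general pattern rather than the project's exact code, and the `question`/`answer` field names are assumptions.

```python
# Sketch of defensive parsing: accept only a JSON list of Q/A dicts, cap it at
# max_pairs, and fall back to an empty list on anything malformed.
import json

def parse_qa(raw: str, max_pairs: int = 3) -> list[dict]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    pairs = [p for p in data
             if isinstance(p, dict) and "question" in p and "answer" in p]
    return pairs[:max_pairs]

print(parse_qa('[{"question": "Which app is open?", "answer": "A text editor"}]'))
print(parse_qa("not json at all"))  # -> []
```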