adithyaakrishna/moondream-ts

Latest commit

a5ec2d3 · Nov 16, 2024

Moondream TypeScript Client

A lightweight TypeScript client for the Moondream vision-language model. It provides a small interface for image captioning and visual question answering, with optional streaming of responses.

Features

  • Image captioning
  • Visual question answering
  • Streaming support for real-time responses
  • Support for multiple image input types (ImageData, HTMLImageElement, File)
  • Configurable settings via environment variables or constructor options
  • Both CommonJS and ESM builds
  • TypeScript support out of the box

Installation

Clone the repository:

git clone https://github.com/adithyaakrishna/moondream-ts.git
cd moondream-ts

# Using pnpm (recommended)
pnpm install

# Build the project
pnpm build

Usage

Basic Usage

import { VL } from './dist';

// Initialize the client
const vl = new VL();

// Generate a caption for an image
const captionResult = await vl.caption(imageFile);
console.log(captionResult.caption);

// Ask a question about an image
const queryResult = await vl.query(imageFile, "What is in this image?");
console.log(queryResult.answer);

Streaming Responses

// Stream caption tokens
const streamResult = await vl.caption(imageFile, 'normal', true);
for await (const token of streamResult.caption) {
  process.stdout.write(token);
}

// Stream query response
const queryStream = await vl.query(
  imageFile, 
  "What is in this image?", 
  true
);
for await (const token of queryStream.answer) {
  process.stdout.write(token);
}
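
Because `caption` and `answer` are typed as either a plain string or an async generator (see Types), code that does not know whether streaming was requested has to narrow the type before using the value. The following is a minimal sketch of one way to do that. `collectText` and `fakeStream` are hypothetical names introduced for illustration and are not part of the client.

```typescript
// Hypothetical helper, not part of moondream-ts: normalizes either output
// shape (plain string or token stream) into a single string.
type TextOrStream = string | AsyncGenerator<string, void, unknown>;

async function collectText(
  output: TextOrStream,
  onToken?: (token: string) => void
): Promise<string> {
  // Non-streaming calls already hold the full text.
  if (typeof output === 'string') {
    return output;
  }
  // Streaming calls yield tokens; forward each one and accumulate them.
  let text = '';
  for await (const token of output) {
    onToken?.(token);
    text += token;
  }
  return text;
}

// Stand-in for a streamed response, used only to exercise the helper.
async function* fakeStream(
  tokens: string[]
): AsyncGenerator<string, void, unknown> {
  for (const token of tokens) {
    yield token;
  }
}
```

With the client, this could be called as `await collectText(streamResult.caption, (t) => process.stdout.write(t))`, which prints tokens as they arrive and still returns the full caption at the end.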

Configuration

You can configure the client either through environment variables or constructor options.

Environment Variables

Create a .env file in your project root:

MOONDREAM_BASE_URL=http://localhost:3000
MOONDREAM_MAX_TOKENS=2048

Constructor Options

const vl = new VL({
  baseUrl: 'http://localhost:3000',
  timeout: 5000
});
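
The README does not say how the two configuration sources are combined. The sketch below shows one plausible resolution order, under the assumption that constructor options take precedence over environment variables, which in turn take precedence over defaults. `resolveConfig`, `ResolvedConfig`, and the default values are all hypothetical; the client's actual behavior may differ.

```typescript
// Sketch only: the real client's resolution logic and defaults may differ.
interface ClientConfig {
  baseUrl?: string;
  timeout?: number;
}

interface ResolvedConfig {
  baseUrl: string;
  timeout: number;
  maxTokens: number;
}

// Assumed fallback values; the README does not state the real defaults.
const ASSUMED_DEFAULTS: ResolvedConfig = {
  baseUrl: 'http://localhost:3000',
  timeout: 5000,
  maxTokens: 2048,
};

function resolveConfig(
  options: ClientConfig = {},
  env: Record<string, string | undefined> = {}
): ResolvedConfig {
  const envMaxTokens = Number.parseInt(env['MOONDREAM_MAX_TOKENS'] ?? '', 10);
  return {
    // Assumed precedence: constructor option, then environment, then default.
    baseUrl:
      options.baseUrl ?? env['MOONDREAM_BASE_URL'] ?? ASSUMED_DEFAULTS.baseUrl,
    timeout: options.timeout ?? ASSUMED_DEFAULTS.timeout,
    // Ignore a missing or non-numeric MOONDREAM_MAX_TOKENS value.
    maxTokens: Number.isNaN(envMaxTokens)
      ? ASSUMED_DEFAULTS.maxTokens
      : envMaxTokens,
  };
}
```

Taking the environment as a parameter keeps the function usable in both Node (pass `process.env`) and the browser (pass an empty object).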

Advanced Usage

// Custom sampling settings
const result = await vl.caption(imageFile, 'normal', false, {
  maxTokens: 100
});

// Pre-encode image for multiple queries
const encodedImage = await vl.encodeImage(imageFile);
const caption = await vl.caption(encodedImage);
const answer = await vl.query(encodedImage, "What colors do you see?");
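
The pre-encoding pattern above extends naturally to a batch of questions about one image. The sketch below encodes once and reuses the result for every query. `askAll` and `QueryClient` are hypothetical names; `QueryClient` is a minimal structural interface that describes only the two methods the helper needs, so it can be exercised without a running Moondream server.

```typescript
// Hypothetical helper, not part of moondream-ts.
type Answer = string | AsyncGenerator<string, void, unknown>;

// Minimal structural view of a client: only the methods this sketch uses.
interface QueryClient<Image, Encoded> {
  encodeImage(image: Image): Promise<Encoded>;
  query(image: Encoded, question: string): Promise<{ answer: Answer }>;
}

async function askAll<Image, Encoded>(
  client: QueryClient<Image, Encoded>,
  image: Image,
  questions: string[]
): Promise<Map<string, string>> {
  // Encode once, then reuse the encoded image for every question.
  const encoded = await client.encodeImage(image);
  const answers = new Map<string, string>();
  for (const question of questions) {
    const { answer } = await client.query(encoded, question);
    // Non-streaming calls are expected to return a string; skip anything else.
    if (typeof answer === 'string') {
      answers.set(question, answer);
    }
  }
  return answers;
}
```

The questions are sent sequentially to keep the example simple; whether the server handles concurrent queries well is not stated in this README.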

Development

Setup Development Environment

  1. Clone and install dependencies:
git clone https://github.com/adithyaakrishna/moondream-ts.git
cd moondream-ts
pnpm install
  2. Start development:
pnpm dev

Running Tests

# Run tests once
pnpm test

# Run tests in watch mode
pnpm test:watch

Linting and Formatting

# Run ESLint
pnpm lint

# Format code with Prettier
pnpm format

API Reference

VL Class

Constructor

new VL(config?: ClientConfig)

Methods

caption()

async caption(
  image: ImageData | HTMLImageElement | File | EncodedImage,
  length?: string,
  stream?: boolean,
  settings?: SamplingSettings
): Promise<CaptionOutput>

query()

async query(
  image: ImageData | HTMLImageElement | File | EncodedImage,
  question: string,
  stream?: boolean,
  settings?: SamplingSettings
): Promise<QueryOutput>

encodeImage()

async encodeImage(
  image: ImageData | HTMLImageElement | File | EncodedImage
): Promise<EncodedImage>

Types

interface ClientConfig {
  baseUrl?: string;
  timeout?: number;
}

interface SamplingSettings {
  maxTokens?: number;
}

interface CaptionOutput {
  caption: string | AsyncGenerator<string, void, unknown>;
}

interface QueryOutput {
  answer: string | AsyncGenerator<string, void, unknown>;
}
