Skip to content

Unified data models and interfaces for syntactic and semantic frame ontologies.

License

aaronstevenwhite/glazing

Repository files navigation

Glazing

PyPI version Python versions CI Documentation License DOI

Unified data models and interfaces for syntactic and semantic frame ontologies.

Features

  • πŸš€ One-command setup: glazing init downloads and prepares all datasets
  • πŸ“¦ Type-safe models: Pydantic v2 validation for all data structures
  • πŸ” Unified search: Query across all datasets with consistent API
  • πŸ”— Cross-references: Automatic mapping between resources with confidence scores
  • 🎯 Fuzzy search: Find data with typos, spelling variants, and inconsistencies
  • 🐳 Docker support: Use via Docker without local installation
  • πŸ’Ύ Efficient storage: JSON Lines format with streaming support
  • 🐍 Modern Python: Full type hints, Python 3.13+ support

Installation

Via pip

pip install glazing

Via Docker

Build and run Glazing in a containerized environment:

# Build the image
git clone https://github.com/aaronstevenwhite/glazing.git
cd glazing
docker build -t glazing:latest .

# Initialize datasets (persisted in volume)
docker run --rm -v glazing-data:/data glazing:latest init

# Use the CLI
docker run --rm -v glazing-data:/data glazing:latest search query "give"
docker run --rm -v glazing-data:/data glazing:latest search query "transfer" --fuzzy

# Interactive Python session
docker run --rm -it -v glazing-data:/data --entrypoint python glazing:latest

See the installation docs for more Docker usage examples.

Quick Start

Initialize all datasets (one-time setup, ~54MB download):

glazing init

Then start using the data:

from glazing.search import UnifiedSearch

# Automatically uses default data directory after 'glazing init'
search = UnifiedSearch()
results = search.search("give")

for result in results[:5]:
    print(f"{result.dataset}: {result.name} - {result.description}")

CLI Usage

Search across datasets:

# Search all datasets
glazing search query "abandon"

# Search specific dataset
glazing search query "run" --dataset verbnet

# Find data with typos or spelling variants
glazing search query "realize" --fuzzy
glazing search query "organize" --fuzzy --threshold 0.8

Resolve cross-references:

# Extract cross-reference index (one-time setup)
glazing xref extract

# Find cross-references
glazing xref resolve "give.01" --source propbank
glazing xref resolve "give-13.1" --source verbnet

# Find data with variations or inconsistencies
glazing xref resolve "realize.01" --source propbank --fuzzy

Python API

Load and work with individual datasets:

from glazing.framenet.loader import FrameNetLoader
from glazing.verbnet.loader import VerbNetLoader

# Loaders automatically use default paths and load data after 'glazing init'
fn_loader = FrameNetLoader()  # Data is already loaded
frames = fn_loader.frames

vn_loader = VerbNetLoader()  # Data is already loaded
verb_classes = list(vn_loader.classes.values())

Cross-reference resolution:

from glazing.references.index import CrossReferenceIndex

# Automatic extraction on first use (cached for future runs)
xref = CrossReferenceIndex()

# Resolve references for a PropBank roleset
refs = xref.resolve("give.01", source="propbank")
print(f"VerbNet classes: {refs['verbnet_classes']}")
print(f"Confidence scores: {refs['confidence_scores']}")

# Find data with variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
print(f"Found match with fuzzy search: {refs['verbnet_classes']}")

Fuzzy search in Python:

from glazing.search import UnifiedSearch

# Find data with typos or spelling variants
search = UnifiedSearch()
results = search.search_with_fuzzy("organize", fuzzy_threshold=0.8)

for result in results[:5]:
    print(f"{result.dataset}: {result.name} (score: {result.score:.2f})")

Supported Datasets

  • FrameNet 1.7: Semantic frames and frame elements
  • PropBank 3.4: Predicate-argument structures
  • VerbNet 3.4: Verb classes with thematic roles
  • WordNet 3.1: Synsets and lexical relations

Documentation

Full documentation available at https://glazing.readthedocs.io.

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/aaronstevenwhite/glazing
cd glazing
pip install -e ".[dev]"

Citation

If you use Glazing in your research, please cite:

@software{glazing2025,
  author = {White, Aaron Steven},
  title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
  year = {2025},
  url = {https://github.com/aaronstevenwhite/glazing},
  doi = {10.5281/zenodo.17467082}
}

License

This package is licensed under an MIT License. See LICENSE file for details.

Links

Acknowledgments

This project was funded by a National Science Foundation (BCS-2040831) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.