A Python library for building and querying property graphs using DuckDB.
PQG provides a middle ground between full-featured property graph databases (like Neo4j) and fully decomposed RDF stores. It uses a single-table design backed by DuckDB, making it fast, lightweight, and easy to use.
Key Features:
- Simple & Fast: Single table design with DuckDB backend
- Python Native: Work with Python dataclasses - no query language needed
- Automatic Decomposition: Complex nested objects automatically become nodes and edges
- Spatial Support: Built-in geographic data handling and GeoJSON export
- Multiple Formats: Export to Parquet, Graphviz, GeoJSON
- Type-Safe: Leverages Python type hints for data validation
import duckdb
from pqg import PQG, Base, Edge
from dataclasses import dataclass
from typing import Optional
# Define your data model
@dataclass
class Person(Base):
    name: Optional[str] = None
    age: Optional[int] = None
# Create a graph
db = duckdb.connect()
graph = PQG(db, source="my_graph")
graph.registerType(Person)
graph.initialize()
# Add nodes
alice = Person(pid="alice", name="Alice", age=30)
bob = Person(pid="bob", name="Bob", age=25)
graph.addNode(alice)
graph.addNode(bob)
# Add relationships
friendship = Edge(s="alice", p="knows", o=["bob"])
graph.addEdge(friendship)
db.commit()
# Query
for subject, predicate, obj in graph.getRelations():
    print(f"{subject} {predicate} {obj}")

- Getting Started Guide - Installation and basic concepts
- Tutorial 1: Basic Usage - Creating nodes and edges
- Tutorial 2: Complex Objects - Working with nested data
- Tutorial 3: Querying - Powerful graph queries
- Tutorial 4: Visualization - Creating visualizations
- User Guide - Comprehensive reference for all features
- CLI Reference - Command-line tool documentation
- iSamples Format - Domain-specific implementation
Every node has these properties:
- row_id - Auto-incrementing integer primary key
- pid - Unique identifier (globally unique, but not the primary key)
- otype - Type/class of the node
- label - Human-readable name
- description - Longer description
- altids - Alternative identifiers
- Plus any custom properties you define
Edges follow the Subject-Predicate-Object pattern:
- s - Subject row_id (integer reference to the source node)
- p - Predicate (relationship type)
- o - Object row_ids (integer array referencing the target nodes)
- n - Named graph (optional)
Note: The API accepts PIDs (strings) when creating edges, but internally PQG uses integer row_id references for performance. This conversion is automatic and transparent.
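To make the PID-to-row_id conversion concrete, here is a minimal pure-Python sketch. It is not PQG's implementation: `TinyGraph`, `add_node`, `add_edge`, the `_edge_` marker, and the edge PID naming are all hypothetical, and a plain list stands in for the DuckDB table.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TinyGraph:
    """Hypothetical stand-in for a graph store; not part of PQG."""

    rows: list[dict[str, Any]] = field(default_factory=list)
    pid_to_row_id: dict[str, int] = field(default_factory=dict)

    def _next_row_id(self) -> int:
        # Mimics an auto-incrementing sequence that starts at 1.
        return len(self.rows) + 1

    def add_node(self, pid: str, otype: str, **props: Any) -> int:
        if pid in self.pid_to_row_id:
            raise ValueError(f"duplicate pid: {pid}")
        row_id = self._next_row_id()
        self.rows.append({"row_id": row_id, "pid": pid, "otype": otype, **props})
        self.pid_to_row_id[pid] = row_id
        return row_id

    def add_edge(self, s: str, p: str, o: list[str]) -> int:
        # Callers pass PIDs; the stored row holds integer row_ids.
        row_id = self._next_row_id()
        self.rows.append({
            "row_id": row_id,
            "pid": f"edge_{row_id}",  # assumed naming scheme
            "otype": "_edge_",        # assumed marker for edge rows
            "s": self.pid_to_row_id[s],
            "p": p,
            "o": [self.pid_to_row_id[target] for target in o],
        })
        return row_id


tiny = TinyGraph()
tiny.add_node("alice", "Person", name="Alice")
tiny.add_node("bob", "Person", name="Bob")
tiny.add_edge("alice", "knows", ["bob"])
edge = tiny.rows[-1]
print(edge["s"], edge["p"], edge["o"])
```

Node and edge rows share one list here, which mirrors the single-table idea: an edge is just another row whose `s`, `p`, and `o` fields are populated.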
All data lives in one table, making queries fast and storage efficient:
CREATE TABLE node (
    row_id INTEGER PRIMARY KEY DEFAULT nextval('row_id_sequence'),
    pid VARCHAR UNIQUE NOT NULL,
    otype VARCHAR,
    label VARCHAR,
    -- Edge fields
    s INTEGER,
    p VARCHAR,
    o INTEGER[],
    -- Your custom fields
    name VARCHAR,
    age INTEGER,
    ...
);

When you add complex nested objects, PQG automatically:
- Separates them into individual nodes
- Creates edges to represent relationships
- Uses property names as relationship types
This means you work with natural Python objects while PQG handles the graph structure!
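The decomposition steps listed above can be sketched in plain Python. This is an illustration of the general idea only, not PQG's actual code: `Node`, `decompose`, and the sample classes are hypothetical names, and real PQG behavior (for example, how lists or missing PIDs are handled) may differ.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical base class standing in for PQG's Base."""
    pid: str


@dataclass
class Person(Node):
    name: Optional[str] = None


@dataclass
class Sample(Node):
    label: Optional[str] = None
    collector: Optional[Person] = None


def decompose(
    obj: Node,
    nodes: dict[str, dict[str, Any]],
    edges: list[tuple[str, str, str]],
) -> None:
    """Flatten obj into node records plus (subject, predicate, object) edges."""
    record: dict[str, Any] = {"otype": type(obj).__name__}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value) and not isinstance(value, type):
            # Nested object: store it as its own node, then link to it
            # using the property name as the relationship type.
            decompose(value, nodes, edges)
            edges.append((obj.pid, f.name, value.pid))
        else:
            record[f.name] = value
    nodes[obj.pid] = record


nodes: dict[str, dict[str, Any]] = {}
edges: list[tuple[str, str, str]] = []
sample = Sample(pid="s1", label="Basalt core", collector=Person(pid="p1", name="Alice"))
decompose(sample, nodes, edges)
print(sorted(nodes))
print(edges)
```

In this sketch the nested `Person` ends up as a separate node, and the `collector` property becomes an edge from `s1` to `p1` rather than a column on the sample record.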
git clone https://github.com/isamplesorg/pqg.git
cd pqg
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
pqg --help

Alternatively, using uv:
git clone https://github.com/isamplesorg/pqg.git
cd pqg
uv sync
uv run pqg --help

Requirements: Python >= 3.11
Build interconnected knowledge bases with typed entities and relationships:
@dataclass
class Concept(Base):
    term: Optional[str] = None
    definition: Optional[str] = None
concept1 = Concept(pid="ml", term="Machine Learning")
concept2 = Concept(pid="ai", term="Artificial Intelligence")
# Add the nodes so the edge below can reference them by PID
graph.addNode(concept1)
graph.addNode(concept2)
# Link concepts
graph.addEdge(Edge(s="ml", p="part_of", o=["ai"]))

Track samples, experiments, and publications:
@dataclass
class Sample(Base):
    sample_id: Optional[str] = None
    collection_date: Optional[str] = None
    collector: Optional[str] = None  # PID of Person (not Person object)
@dataclass
class Publication(Base):
    title: Optional[str] = None
    doi: Optional[str] = None

Track how data flows through transformations:
@dataclass
class Dataset(Base):
    name: Optional[str] = None
    version: Optional[str] = None
# Show lineage
graph.addEdge(Edge(s="processed_data", p="derived_from", o=["raw_data"]))

Work with geographic data and relationships:
from pathlib import Path
@dataclass
class Location(Base):
    name: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
# Export to Parquet, then convert to GeoJSON with the CLI
graph.asParquet(Path("locations.parquet"))
# Then: pqg geo locations.parquet > map.geojson

PQG includes CLI tools for exploring graphs:
# List node types
pqg types my_graph.parquet
# List relationship types
pqg predicates my_graph.parquet
# View a specific node
pqg node my_graph.parquet node_001
# View with all connections
pqg node my_graph.parquet node_001 --expand
# Show as tree
pqg tree my_graph.parquet node_001
# Export spatial data
pqg geo my_graph.parquet > output.geojson

See the CLI Reference for complete documentation.
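For readers unfamiliar with GeoJSON, the sketch below shows one way latitude/longitude records like the Location example could be turned into a FeatureCollection. It uses only the standard library and is not the output format of `pqg geo`, which may differ; `to_feature_collection` and the sample records are hypothetical.

```python
import json
from typing import Any


def to_feature_collection(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a GeoJSON FeatureCollection of Point features."""
    features = []
    for rec in records:
        if rec.get("latitude") is None or rec.get("longitude") is None:
            continue  # skip records without coordinates
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON positions are ordered [longitude, latitude].
                "coordinates": [rec["longitude"], rec["latitude"]],
            },
            "properties": {"pid": rec["pid"], "name": rec.get("name")},
        })
    return {"type": "FeatureCollection", "features": features}


records = [
    {"pid": "loc_1", "name": "Site A", "latitude": 32.2, "longitude": -110.9},
    {"pid": "loc_2", "name": "No coordinates", "latitude": None, "longitude": None},
]
collection = to_feature_collection(records)
print(json.dumps(collection, indent=2))
```

The longitude-first coordinate order is a common source of bugs when latitude and longitude are stored as separate fields, so it is worth checking whichever exporter you use.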
vs. Neo4j / Full Graph Databases:
- Lighter weight, no server required
- Easier deployment and maintenance
- File-based storage (Parquet)
- Well suited to analytics and data science workloads
vs. RDF / Triple Stores:
- More intuitive Python API
- Better performance for analytical queries
- Simpler data model
- Native support for complex datatypes
vs. Relational Databases:
- Natural representation of connected data
- Easier to query relationships
- Automatic handling of nested objects
- Better for exploratory analysis
PQG is designed for analytical workloads:
- Fast reads: Columnar storage with DuckDB
- Efficient storage: Parquet compression
- Scalable: Handles millions of nodes and edges
- In-memory or on-disk: Flexible deployment
For production applications requiring high transaction throughput, consider a dedicated graph database.
Contributions are welcome! Please:
- Check existing issues or create a new one
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
See the LICENSE file for details.
If you use PQG in your research, please cite:
@software{pqg,
  title = {PQG: Property Graph in DuckDB},
  author = {iSamples Contributors},
  url = {https://github.com/isamplesorg/pqg},
  year = {2024}
}

PQG is developed as part of the iSamples project.
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions