Backends

At this time, all backends support all features. You do not need to interact directly with a backend beyond specifying one for your Grand Graph objects:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend())

There are currently six supported backends:

NetworkXBackend

import grand
from grand.backends import NetworkXBackend

grand.Graph(backend=NetworkXBackend())

If you're aiming for performance, you can most likely ignore this one; this is a backend that simply passes all operations through to a NetworkX graph.
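
For example, here is a minimal sketch that builds and queries a graph through the NetworkX dialect (G.nx, as shown in the main README):

import grand
from grand.backends import NetworkXBackend

# Every call below is passed straight through to an in-memory networkx graph.
G = grand.Graph(backend=NetworkXBackend())

G.nx.add_node("A", weight=1)
G.nx.add_node("B", weight=2)
G.nx.add_edge("A", "B")

print(G.nx.degree("A"))  # 1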

Best For:

This is most relevant for performance benchmarking and feature-consistency checks.

Warnings:

None

DataFrameBackend

import grand
from grand.backends import DataFrameBackend

grand.Graph(backend=DataFrameBackend())

Data operations are performed on pandas-like dataframes.

Best For:

Because the data never leave dataframe format, you can swap in dask.dataframe or modin.pandas dataframes to take advantage of distributed computing.
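
For example (a hypothetical sketch: the node_df and edge_df keyword arguments below are illustrative only, not the backend's documented constructor; check the backend source for the real way to supply your own dataframes):

import grand
from grand.backends import DataFrameBackend

# modin.pandas is a drop-in replacement for pandas; dask.dataframe is similar.
import modin.pandas as pd

# Hypothetical keyword arguments -- shown only to illustrate backing the graph
# with a distributed dataframe, not the backend's real signature.
nodes = pd.DataFrame()
edges = pd.DataFrame()

G = grand.Graph(backend=DataFrameBackend(node_df=nodes, edge_df=edges))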

Warnings:

None

NetworkitBackend

import grand
from grand.backends.networkit import NetworkitBackend

grand.Graph(backend=NetworkitBackend())

This backend uses the Networkit library as its underlying graph store. Because networkit does not support metadata or named nodes, a separate metadata store is employed to keep track of this information.

Best For:

Manipulating or creating a graph using other library dialects, then ejecting the backend graph to run high-performance algorithms natively in networkit.
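
A sketch of that workflow (the _nk_graph attribute is a hypothetical placeholder; check the backend source for the actual handle to the underlying networkit.Graph):

import grand
import networkit as nk
from grand.backends.networkit import NetworkitBackend

backend = NetworkitBackend()
G = grand.Graph(backend=backend)

# Build the graph through the familiar NetworkX dialect...
G.nx.add_edge("A", "B")
G.nx.add_edge("B", "C")
G.nx.add_edge("C", "A")

# ...then hand the raw networkit graph to networkit's own algorithms.
# `_nk_graph` is a hypothetical name for the backend's internal graph object.
nk_graph = backend._nk_graph
print(nk.centrality.Betweenness(nk_graph).run().ranking())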

Warnings:

Networkit has no native metadata support, so there is some overhead when creating nodes and edges, since they must also be indexed in a separate datastore.

IGraphBackend

import grand
from grand.backends.igraph import IGraphBackend

grand.Graph(backend=IGraphBackend())

This backend uses the python-igraph library as its underlying graph store.

Best For:

Manipulating or creating a graph using other library dialects, then ejecting the backend graph to run high-performance algorithms natively in igraph.

Warnings:

Note that metadata and named-node support in igraph is imperfect, and you may encounter unexpected behavior if multiple nodes share the same name. There is also a slight overhead for indexing operations on named nodes.

SQLBackend

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend())

This backend relays operations to a SQL database. Interestingly, it is faster to ingest data into the SQLBackend than into the NetworkXBackend, so for large data ingests from an edgelist, it may be advantageous to use a SQLBackend instead of vanilla NetworkX, even if you don't care about other Grand features.
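
For example, a small edgelist ingest through the NetworkX dialect looks like this (a minimal sketch, assuming the G.nx dialect as above):

import grand
from grand.backends import SQLBackend

G = grand.Graph(backend=SQLBackend())  # in-memory SQLite by default

# Each edge insert is written to the backing SQL tables.
edgelist = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
for u, v in edgelist:
    G.nx.add_edge(u, v)

print(G.nx.degree("A"))  # 2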

Best For:

Quick data ingest, and fast operations on the structure of a graph without holding it in memory.

Warnings:

Data IO is slower here and depends heavily on where the SQL database lives. If you're using a SQLite file on disk:

import grand
from grand.backends import SQLBackend

grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))

...you may find that operations are slower than with a true SQL database service or an in-memory SQLite database (the default when no connection string is passed to the SQLBackend constructor).
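
For comparison (the PostgreSQL URL is illustrative only; any SQLAlchemy-style connection string works the same way):

import grand
from grand.backends import SQLBackend

# In-memory SQLite: fastest for a single process, nothing persists on exit.
in_memory = grand.Graph(backend=SQLBackend())

# SQLite file on disk: persistent, but every operation pays for disk IO.
on_disk = grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))

# Hosted database service (illustrative SQLAlchemy-style URL):
# hosted = grand.Graph(backend=SQLBackend("postgresql://user:password@host:5432/grand"))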

DynamoDBBackend

import grand
from grand.backends import DynamoDBBackend

grand.Graph(backend=DynamoDBBackend())

This backend relays operations to a DynamoDB database. All metadata attributes are "promoted" to top-level attributes in the table, so DynamoDB scan and query operations work on any metadata attribute in your nodes or edges. This means that even on Very Large Graphs, attribute queries are still quite speedy.
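
As a sketch of what that promotion enables (boto3 and AWS credentials assumed; the table and attribute names are illustrative, not the backend's actual schema):

import boto3
from boto3.dynamodb.conditions import Attr

# Illustrative table and attribute names -- the real table names are managed
# by the backend, not by this example.
dynamodb = boto3.resource("dynamodb")
nodes_table = dynamodb.Table("grand_nodes")

# Because node metadata is promoted to top-level attributes, a plain scan can
# filter on it directly, with no nested blob to unpack.
response = nodes_table.scan(FilterExpression=Attr("cell_type").eq("pyramidal"))
print(response["Items"])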

Best For:

Extremely large graphs (tens of gigabytes to terabytes). Also a good fit for GrandIso-Cloud, which is arguably the fastest subgraph-monomorphism library for graphs of this size.

Warnings:

All data IO happens in individual atomic calls to the server, so adding a billion edges takes a long time. Fixes for this are currently under experimentation.