Backends
At this time, all backends support all features. You do not need to interact directly with a backend beyond specifying one for your Grand Graph objects:
import grand
from grand.backends import SQLBackend
grand.Graph(backend=SQLBackend())
There are currently six supported backends:
NetworkXBackend
import grand
from grand.backends import NetworkXBackend
grand.Graph(backend=NetworkXBackend())
If you're aiming for performance, you can most likely ignore this one; it is a backend that simply passes all operations through to a NetworkX graph.
Best use-cases: Performance benchmarking and feature-consistency checks.
Weaknesses: None.
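As a quick sanity check of the pass-through behavior, here is a minimal sketch. It assumes the NetworkX-style dialect exposed as G.nx mirrors networkx's add_node, add_edge, and neighbors signatures; adjust the calls if your version of Grand differs.
import grand
from grand.backends import NetworkXBackend

# All of these operations are forwarded to an in-memory NetworkX graph.
G = grand.Graph(backend=NetworkXBackend())

G.nx.add_node("A", role="start")
G.nx.add_node("B", role="end")
G.nx.add_edge("A", "B", weight=1.0)

# NetworkX-style reads are assumed to work the same way here as they
# would against any other backend.
print(list(G.nx.neighbors("A")))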
DataFrameBackend
import grand
from grand.backends import DataFrameBackend
grand.Graph(backend=DataFrameBackend())
Data operations are performed on pandas-like dataframes. Because the data never leave dataframe format, you can use dask.dataframe or modin.pandas dataframes to improve distributed-computing capabilities.
Weaknesses: None.
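The same NetworkX-style code runs unchanged against the dataframe backend. The sketch below assumes the default pandas-backed DataFrameBackend and the G.nx dialect shown above; wiring in dask or modin dataframes is left to the backend's constructor options and is not shown here.
import grand
from grand.backends import DataFrameBackend

# Node and edge data live in pandas-like dataframes under the hood.
G = grand.Graph(backend=DataFrameBackend())

G.nx.add_node("A", kind="sensor")
G.nx.add_node("B", kind="sensor")
G.nx.add_edge("A", "B", weight=0.5)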
NetworkitBackend
import grand
from grand.backends.networkit import NetworkitBackend
grand.Graph(backend=NetworkitBackend())
This backend uses the Networkit library as the basis for the graph. Because Networkit does not support metadata or named nodes, a separate metadata store is employed to keep track of this information.
Best use-cases: Manipulating or creating a graph using other library dialects, then ejecting the backend graph to run high-speed, performant graph algorithms.
Weaknesses: No native metadata support, so there is some overhead when generating edges and nodes in order to index them in a separate datastore.
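For example, you might build the graph through the NetworkX-style dialect and then hand the native Networkit graph to one of Networkit's algorithms. This is only a sketch: it assumes the backend object is reachable as G.backend, and the _nk_graph attribute used to reach the underlying networkit graph is a guess, so check the NetworkitBackend source for the real accessor.
import grand
import networkit as nk
from grand.backends.networkit import NetworkitBackend

G = grand.Graph(backend=NetworkitBackend())

# Build the graph with NetworkX-style calls...
G.nx.add_edge("A", "B")
G.nx.add_edge("B", "C")

# ...then eject the native networkit graph for fast algorithms.
# NOTE: _nk_graph is a hypothetical attribute name; consult the
# NetworkitBackend source for the actual handle.
nk_graph = G.backend._nk_graph
bc = nk.centrality.Betweenness(nk_graph)
bc.run()
print(bc.scores())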
IGraphBackend
import grand
from grand.backends.igraph import IGraphBackend
grand.Graph(backend=IGraphBackend())
This backend uses the python-igraph library as the basis for the graph.
Best use-cases: Manipulating or creating a graph using other library dialects, then ejecting the backend graph to run high-speed, performant graph algorithms.
Weaknesses: Metadata and named-node support in igraph is not perfect, and you may encounter bizarre behavior if many of your nodes share the same name. There is a slight overhead for indexing operations on named nodes.
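The ejection workflow looks much the same here; again, this is a sketch in which the _ig attribute is a hypothetical name for the underlying igraph.Graph, so check the IGraphBackend source for the actual handle.
import grand
from grand.backends.igraph import IGraphBackend

G = grand.Graph(backend=IGraphBackend())

G.nx.add_edge("A", "B")
G.nx.add_edge("B", "C")

# NOTE: _ig is a hypothetical attribute name for the native igraph.Graph.
ig_graph = G.backend._ig
print(ig_graph.pagerank())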
SQLBackend
import grand
from grand.backends import SQLBackend
grand.Graph(backend=SQLBackend())
This backend relays operations to a SQL database. Interestingly, it is faster to ingest data into the SQLBackend than into the NetworkXBackend, so for large ingests from an edgelist it may be advantageous to use a SQLBackend instead of vanilla NetworkX, even if you don't care about other Grand features.
Best use-cases: Quick ingests of data and fast operations on the structure of a graph, out of memory.
Weaknesses: Data IO is slower and highly dependent upon where the SQL database lives. If you're using a file on disk (sqlite):
import grand
from grand.backends import SQLBackend
grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))
...you may find that operations are slower than with a true SQL database service or with in-memory sqlite (the default when no connection string is passed to the SQLBackend constructor).
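As a rough sketch of that trade-off (again assuming the G.nx dialect behaves as above), you can keep the default in-memory sqlite for speed or point the backend at a sqlite file for persistence:
import grand
from grand.backends import SQLBackend

# In-memory sqlite (the default): fastest option for throwaway work.
fast = grand.Graph(backend=SQLBackend())

# File-backed sqlite: persistent, but expect slower operations.
durable = grand.Graph(backend=SQLBackend("sqlite:///my-file.db"))

# Ingesting a small edgelist, one edge at a time:
edgelist = [("A", "B"), ("B", "C"), ("C", "A")]
for u, v in edgelist:
    fast.nx.add_edge(u, v)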
DynamoDBBackend
import grand
from grand.backends import DynamoDBBackend
grand.Graph(backend=DynamoDBBackend())
This backend relays operations to a DynamoDB database. All metadata attributes are "promoted" to top-level attributes in the table, so DynamoDB scan and query operations work on any metadata attribute of your nodes or edges. This means that even on very large graphs, attribute queries are still quite speedy.
Best use-cases: Extremely large graphs (tens of GB to TB). Also perfect for compatibility with GrandIso-Cloud, which is arguably the fastest subgraph-monomorphism library for graphs of this size.
Weaknesses: All data IO is done in single atomic calls to the server, so adding a billion edges takes a long time. Fixes for this are currently under experimentation.
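A minimal sketch follows. It assumes AWS credentials (or a local DynamoDB endpoint) are already configured in your environment and that the G.nx dialect behaves as above; the keyword attributes are the metadata that gets promoted to top-level table attributes.
import grand
from grand.backends import DynamoDBBackend

# Requires AWS credentials or a local DynamoDB endpoint to be configured.
G = grand.Graph(backend=DynamoDBBackend())

# Metadata attributes are promoted to top-level DynamoDB attributes,
# so scan/query operations can target them directly.
G.nx.add_node("A", species="mouse", region="V1")
G.nx.add_node("B", species="mouse", region="M1")
G.nx.add_edge("A", "B", synapse_count=12)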