Succinct is a data store that enables queries directly on a compressed representation of data. This repository maintains the Java implementations of Succinct's core algorithms, and applications that exploit them, such as an Apache Spark binding for Succinct.
Succinct is built using Apache Maven. To build Succinct and its component modules, run:
mvn clean package
The Succinct-Core module contains the Java implementation of Succinct's core algorithms. See a more detailed description of the core module here.
To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:
<dependency>
<groupId>amplab</groupId>
<artifactId>succinct-core</artifactId>
<version>0.1.8</version>
</dependency>
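For intuition about what "queries directly on the data" means, the following is a minimal sketch of count, search, and extract queries over a text, written against a plain, uncompressed suffix array. It is a stand-in chosen for illustration only: it does not use Succinct's API and does not compress anything, and the class name `NaiveSuffixIndex` and its method names are hypothetical. Succinct answers queries of this kind on a compressed representation instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; NOT part of Succinct's API and NOT compressed.
class NaiveSuffixIndex {
    private final String text;
    // Start offsets of all suffixes of `text`, ordered by the suffix they point to.
    private final Integer[] suffixes;

    NaiveSuffixIndex(String text) {
        this.text = text;
        this.suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction by sorting suffix strings; acceptable only for toy inputs.
        Arrays.sort(suffixes, Comparator.comparing((Integer i) -> text.substring(i)));
    }

    // First `length` characters of the suffix starting at `offset` (shorter near the end).
    private String prefixAt(int offset, int length) {
        return text.substring(offset, Math.min(text.length(), offset + length));
    }

    // Binary search over the sorted suffixes.
    // strict == false: first position whose prefix is >= query.
    // strict == true:  first position whose prefix is >  query.
    private int bound(String query, boolean strict) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = prefixAt(suffixes[mid], query.length()).compareTo(query);
            if (c < 0 || (strict && c == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Number of occurrences of `query` in the text.
    int count(String query) {
        return bound(query, true) - bound(query, false);
    }

    // Offsets of all occurrences of `query`, in increasing order.
    List<Integer> search(String query) {
        int from = bound(query, false);
        int to = bound(query, true);
        List<Integer> offsets = new ArrayList<>();
        for (int i = from; i < to; i++) {
            offsets.add(suffixes[i]);
        }
        Collections.sort(offsets);
        return offsets;
    }

    // Up to `length` characters of the text starting at `offset`.
    String extract(int offset, int length) {
        return prefixAt(offset, length);
    }
}
```

For example, with `new NaiveSuffixIndex("banana")`, `count("ana")` is 2, `search("ana")` yields offsets 1 and 3, and `extract(1, 3)` yields `"ana"`.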
We provide Apache Spark and Apache Spark SQL interfaces for Succinct. These
expose SuccinctRDD, a compressed, queryable RDD for manipulating unstructured
data, and SuccinctKVRDD for querying semi-structured data (key-value pairs,
text and JSON documents, etc.). Succinct is also exposed as a DataSource in
Apache Spark SQL as an experimental feature. More details on the integration
with Apache Spark can be found here.
To build your application to run with Succinct on Apache Spark, you can link against this library using Apache Maven by adding the following dependency information to your pom.xml file:
<dependency>
<groupId>amplab</groupId>
<artifactId>succinct-spark</artifactId>
<version>0.1.8</version>
</dependency>
For an SBT project, add the following to build.sbt (see the Spark Packages
listing for spark-submit and Maven instructions):
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "amplab" % "succinct" % "0.1.8"
The succinct-spark jar file can also be added to a Spark shell using the
--jars command line option. For example, to include it when starting the
Spark shell:
$ bin/spark-shell --jars succinct-0.1.8.jar