Succinct-Core is a Java implementation of Succinct's core algorithms, as described in the NSDI'15 paper.
This library has no external requirements.
To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:
<dependency>
  <groupId>amplab</groupId>
  <artifactId>succinct-core</artifactId>
  <version>0.1.8</version>
</dependency>
The Succinct-Core library exposes Succinct in three layers:
SuccinctCore
SuccinctFile
SuccinctIndexedFile
SuccinctCore exposes the basic construction primitive for all internal data structures, along with accessors to the core data structures (e.g., NPA, SA and ISA, which the paper calls NextCharIdx, AoS2Input and Input2AoS, respectively). An implementation of this interface is provided in SuccinctBuffer.
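To make the three arrays concrete, here is a minimal sketch that builds SA, ISA and NPA for a short string using their textbook definitions. It is not how Succinct-Core constructs or stores them: NaiveSuffixArrays is a hypothetical class, the arrays are kept uncompressed, the suffix sort is deliberately naive, and the trailing sentinel character is an assumption of this sketch.

```java
import java.util.Arrays;

// Illustrative only: NaiveSuffixArrays is a hypothetical name, not a class in Succinct-Core.
final class NaiveSuffixArrays {
    final int[] sa;   // sa[i]  = start offset of the i-th smallest suffix
    final int[] isa;  // isa[j] = rank of the suffix that starts at offset j
    final int[] npa;  // npa[i] = isa[(sa[i] + 1) mod n], i.e. rank of the next suffix in the text

    NaiveSuffixArrays(String input) {
        // Assumption of this sketch: append a sentinel that sorts before every other character.
        final String text = input + '\0';
        final int n = text.length();

        // Naive O(n^2 log n) suffix sort; acceptable only for tiny demo inputs.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> text.substring(a).compareTo(text.substring(b)));

        sa = new int[n];
        isa = new int[n];
        npa = new int[n];
        for (int i = 0; i < n; i++) {
            sa[i] = order[i];
            isa[sa[i]] = i;   // ISA is the inverse permutation of SA
        }
        for (int i = 0; i < n; i++) {
            npa[i] = isa[(sa[i] + 1) % n];
        }
    }

    public static void main(String[] args) {
        NaiveSuffixArrays a = new NaiveSuffixArrays("banana");
        System.out.println("SA  = " + Arrays.toString(a.sa));
        System.out.println("ISA = " + Arrays.toString(a.isa));
        System.out.println("NPA = " + Arrays.toString(a.npa));
    }
}
```

Following npa from the rank of offset 0 visits the suffixes in text order, which is the property that lets the input be recovered without storing it explicitly.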
SuccinctFile builds on top of SuccinctCore and exposes the interface for three main functionalities:
byte[] extract(int offset, int length)
long[] search(byte[] query)
long count(byte[] query)
These primitives allow random access (extract) and search (count, search) directly on the compressed representation of flat-file (i.e., unstructured) data. SuccinctFileBuffer is a ByteBuffer-based implementation of SuccinctFile. Look at this example to see how SuccinctFileBuffer can be used.
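The intended semantics of the three calls can be pinned down with a reference model that works on raw, uncompressed bytes. This is only a sketch: NaiveFlatFile is a hypothetical class that copies the three signatures listed above, it does none of the compression that SuccinctFileBuffer performs, and returning search offsets in ascending order is an assumption of the sketch rather than a documented guarantee of the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reference model, not part of Succinct-Core.
final class NaiveFlatFile {
    private final byte[] data;

    NaiveFlatFile(byte[] data) {
        this.data = data.clone();
    }

    // Returns up to `length` bytes starting at `offset` (truncated at end of data).
    byte[] extract(int offset, int length) {
        int end = offset + Math.min(length, data.length - offset);
        return Arrays.copyOfRange(data, offset, end);
    }

    // Returns the start offset of every occurrence of `query`, in ascending order.
    long[] search(byte[] query) {
        List<Long> hits = new ArrayList<>();
        for (int i = 0; i + query.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + query.length), query)) {
                hits.add((long) i);
            }
        }
        long[] out = new long[hits.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = hits.get(i);
        }
        return out;
    }

    // Returns the number of occurrences of `query`.
    long count(byte[] query) {
        return search(query).length;
    }

    public static void main(String[] args) {
        NaiveFlatFile file = new NaiveFlatFile("banana".getBytes(StandardCharsets.US_ASCII));
        byte[] query = "ana".getBytes(StandardCharsets.US_ASCII);
        System.out.println("count   = " + file.count(query));
        System.out.println("search  = " + Arrays.toString(file.search(query)));
        System.out.println("extract = " + new String(file.extract(1, 3), StandardCharsets.US_ASCII));
    }
}
```

A model like this can serve as an oracle when testing: for the same input, a compressed implementation should report the same count and the same set of offsets.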
Finally, SuccinctIndexedFile builds on the functionality of both SuccinctCore and SuccinctFile to expose a record buffer, i.e., a collection of records. This interface finds applications in the Succinct on Apache Spark interfaces, particularly in the SuccinctRDD and SuccinctTableRDD implementations.
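One way to picture a record buffer is as a flat byte array plus an array of record start offsets, so that a match offset can be mapped back to the record that contains it. The sketch below illustrates that idea only. NaiveRecordBuffer and its methods record and recordsContaining are hypothetical names, the newline delimiter is an assumption, and nothing here reflects the actual SuccinctIndexedFile API or its compressed layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration, not part of Succinct-Core.
final class NaiveRecordBuffer {
    private final byte[] data;    // all records, each terminated by '\n' (assumed delimiter)
    private final int[] offsets;  // offsets[i] = start of record i within data

    NaiveRecordBuffer(String... records) {
        offsets = new int[records.length];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records.length; i++) {
            offsets[i] = out.size();
            byte[] bytes = records[i].getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        data = out.toByteArray();
    }

    // Returns the bytes of record `id`, without the trailing delimiter.
    byte[] record(int id) {
        int next = (id + 1 < offsets.length) ? offsets[id + 1] : data.length;
        return Arrays.copyOfRange(data, offsets[id], next - 1);
    }

    // Returns the ids of all records containing `query` (assumed non-empty, no '\n').
    int[] recordsContaining(byte[] query) {
        TreeSet<Integer> ids = new TreeSet<>();
        if (query.length == 0) {
            return new int[0];
        }
        for (int i = 0; i + query.length <= data.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(data, i, i + query.length), query)) {
                // Binary search for the last record start that is <= i.
                int pos = Arrays.binarySearch(offsets, i);
                ids.add(pos >= 0 ? pos : -pos - 2);
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        NaiveRecordBuffer buf = new NaiveRecordBuffer("apple pie", "banana split", "pineapple");
        byte[] query = "apple".getBytes(StandardCharsets.UTF_8);
        System.out.println("ids    = " + Arrays.toString(buf.recordsContaining(query)));
        System.out.println("record = " + new String(buf.record(1), StandardCharsets.UTF_8));
    }
}
```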
We provide an example program that outlines the usage of the count, search and extract functionalities of SuccinctFile. A convenience script is included in the bin/ directory to run the example. The usage of the script is as follows:
./bin/succinct-shell <file-name>
where <file-name> is the name of the file being analyzed.