
DDF with Flink

This project depends on DDF and uses the Apache Flink engine.

DDF

Distributed DataFrame: Productivity = Power x Simplicity For Big Data Scientists & Engineers


Getting Started

This project depends on DDF v1.4.0-SNAPSHOT, which must be installed locally before this project can run. To get DDF 1.4.0-SNAPSHOT, clone the DDF repository and check out the tuplejump-integration branch:

$ git clone git@github.com:ddf-project/DDF.git
$ cd DDF
$ git fetch
$ git checkout tuplejump-integration

No changes are required when installing DDF with Maven.

Before installing DDF with SBT, add a new line after line 482 in project/RootBuild.scala, and don't forget to add a comma at the end of line 482:

  ),
publishArtifact in (Compile, packageDoc) := false

This stops SBT from publishing the documentation (packageDoc) artifact, which avoids the error SBT otherwise raises while publishing the docs.

Then install DDF:

$ bin/run-once.sh
# using Maven
$ mvn package install -DskipTests
# or using SBT
$ sbt publishLocal

Install ddf-with-flink as follows:

$ git clone git@github.com:tuplejump/ddf-with-flink.git
$ cd ddf-with-flink
$ bin/run-once.sh
$ mvn package install -DskipTests

Running tests

Tests can be run with either SBT or Maven:

$ sbt test
$ mvn test

To run a single test suite, pass a name pattern to SBT's testOnly, or the fully qualified suite name to Maven:

$ sbt "testOnly *FlinkDDFManagerSpec*"

$ mvn test -Dsuites='io.ddf.flink.FlinkDDFManagerSpec'
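The two commands select the suite differently. SBT's testOnly accepts wildcard patterns, where * stands for any run of characters, so *FlinkDDFManagerSpec* matches the fully qualified name io.ddf.flink.FlinkDDFManagerSpec; the Maven command spells the name out in full. The snippet below is a simplified model of that wildcard matching, written only to illustrate why the surrounding * characters matter. It is an assumption-based sketch, not SBT's implementation, and the object TestFilterSketch is hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical, simplified model of glob-style test selection (not SBT's code):
// '*' matches any run of characters, everything else must match literally.
object TestFilterSketch {

  // Turn a glob such as "*FlinkDDFManagerSpec*" into a regular expression.
  def globToRegex(glob: String): String =
    glob
      .split("\\*", -1) // limit -1 keeps empty parts for leading/trailing '*'
      .map(part => if (part.isEmpty) "" else Pattern.quote(part)) // '.' etc. stay literal
      .mkString(".*")

  // String.matches requires the whole name to match the pattern.
  def matches(glob: String, fullyQualifiedName: String): Boolean =
    fullyQualifiedName.matches(globToRegex(glob))
}
```

In this model, the bare name FlinkDDFManagerSpec does not match io.ddf.flink.FlinkDDFManagerSpec, while *FlinkDDFManagerSpec* and the full name both do.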

Starting ddf-shell with the Flink engine

Run the following only after ddf-with-flink has been installed:

$ sbt package
$ bin/ddf-shell

Running sbt package is required because it generates the lib_managed directory, which the scripts need in order to run.

To run the example:

$ sbt package
$ bin/run-flink-example io.ddf.flink.examples.FlinkDDFExample

As with ddf-shell, the sbt package step is required because it generates the lib_managed directory that bin/run-flink-example needs.

Todo

  1. Test the ML method getConfusionMatrix
  2. Implement transformPython and flattenDDF for TransformationHandler and also test the R functions.
  3. Implement the methods r2score, residuals, roc and rmse for MLMetricsSupporter
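The metrics in the Todo list follow standard definitions, so a small reference implementation can help when writing tests for them. Below is a minimal sketch over plain Scala collections, not the MLMetricsSupporter or getConfusionMatrix API: the object name MetricsSketch, the signatures, the 0/1 label encoding, the confusion-matrix layout and the rule that a score above the threshold counts as positive are all assumptions. roc is not covered.

```scala
// Hypothetical reference implementations, NOT the DDF MLMetricsSupporter API.
// They operate on in-memory Seq[Double] values rather than on a DDF.
object MetricsSketch {

  // Residuals: observed value minus predicted value, element by element.
  def residuals(observed: Seq[Double], predicted: Seq[Double]): Seq[Double] = {
    require(observed.length == predicted.length, "inputs must have the same length")
    observed.zip(predicted).map { case (y, yHat) => y - yHat }
  }

  // Root mean squared error: sqrt(sum(residual^2) / n).
  def rmse(observed: Seq[Double], predicted: Seq[Double]): Double = {
    val r = residuals(observed, predicted)
    require(r.nonEmpty, "inputs must not be empty")
    math.sqrt(r.map(e => e * e).sum / r.length)
  }

  // Coefficient of determination: 1 - SS_res / SS_tot, where SS_tot is the sum
  // of squared deviations of the observed values from their mean.
  def r2score(observed: Seq[Double], predicted: Seq[Double]): Double = {
    val r = residuals(observed, predicted)
    require(r.nonEmpty, "inputs must not be empty")
    val mean = observed.sum / observed.length
    val ssRes = r.map(e => e * e).sum
    val ssTot = observed.map(y => (y - mean) * (y - mean)).sum
    require(ssTot > 0.0, "observed values must not all be equal")
    1.0 - ssRes / ssTot
  }

  // Binary confusion matrix (assumed layout): rows are actual labels (0, 1),
  // columns are predicted labels (0, 1); a score above `threshold` counts as 1.
  def confusionMatrix(actual: Seq[Double], scores: Seq[Double], threshold: Double): Array[Array[Long]] = {
    require(actual.length == scores.length, "inputs must have the same length")
    val matrix = Array.ofDim[Long](2, 2)
    actual.zip(scores).foreach { case (label, score) =>
      require(label == 0.0 || label == 1.0, s"label must be 0 or 1, got $label")
      val row = label.toInt
      val col = if (score > threshold) 1 else 0
      matrix(row)(col) += 1
    }
    matrix
  }
}
```

A test for getConfusionMatrix could, for example, compare DDF's result on a small dataset with a matrix computed this way, after confirming which layout and threshold convention DDF actually uses.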
