<!DOCTYPE html>
<html class="no-js" lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="keywords" content="DDF, Distributed DataFrame, productivity, distributed, java, data science, R, Python, scala" />
<meta name="robots" content="index,follow,noarchive" />
<title>Distributed DataFrame</title>
<!-- Hammer reload -->
<script>
setInterval(function(){
try {
if(typeof ws != 'undefined' && ws.readyState == 1){return true;}
ws = new WebSocket('ws://'+(location.host || 'localhost').split(':')[0]+':35353')
ws.onopen = function(){ws.onclose = function(){document.location.reload()}}
ws.onmessage = function(){
var links = document.getElementsByTagName('link');
for (var i = 0; i < links.length;i++) {
var link = links[i];
if (link.rel === 'stylesheet' && !link.href.match(/typekit/)) {
href = link.href.replace(/((&|\?)hammer=)[^&]+/,'');
link.href = href + (href.indexOf('?')>=0?'&':'?') + 'hammer='+(new Date().valueOf());
}
}
}
}catch(e){}
}, 1000)
</script>
<!-- /Hammer reload -->
<link rel='stylesheet' href='assets/css/normalize.css'>
<link rel='stylesheet' href='assets/js/modernizr/test/caniuse_files/style.css'>
<link rel='stylesheet' href='assets/scss/app.css'>
<link rel='stylesheet' href='assets/js/prismjs/prism.css'>
<link rel='stylesheet' href='assets/css/extra.css'>
<!-- <link href='http://fonts.googleapis.com/css?family=Arimo:400,700|Open+Sans:300|Roboto:400,900' rel='stylesheet' type='text/css'>
-->
<link href='http://fonts.googleapis.com/css?family=Ubuntu:300,400|Tenor+Sans' rel='stylesheet' type='text/css'>
<link href="//maxcdn.bootstrapcdn.com/font-awesome/4.1.0/css/font-awesome.min.css" rel="stylesheet">
<link href="assets/favicon.ico" rel="shortcut icon">
<link href="assets/apple-touch-icon.png" rel="apple-touch-icon">
</head>
<body class="home">
<div class="wrapper">
<header class="desktop-menu">
<ul>
<li class="logo"><a href="/">Home</a></li>
<li class="nav-quickstart"><a href="quickstart.html">Getting Started</a></li>
<li class="nav-intro"><a href="introduction.html">Introduction</a></li>
<!-- <li class="nav-design"><a href="design.html">Design Document</a></li> -->
<li class="nav-api"><a href="https://github.com/ddf-project/DDF" target="_blank">GitHub</a></li>
<!-- <li class="nav-api"><a href="http://ddf.io/quickstart.html#markdown/downloads.html" target="_blank">Download</a></li> -->
<li class="nav-people"><a href="quickstart.html#markdown/developers.html">Contributors</a></li>
<li class="nav-people"><a href="https://groups.google.com/forum/#!forum/ddf-project">Community</a></li>
<!-- <li class="nav-plans"><a href="plans.html">Plans</a></li> -->
<li class="nav-contribute">
<a href="http://ddf.io/contribution.html">Contribute</a>
</li>
</ul>
</header>
<header class="mobile-menu">
<ul>
<li class="logo">
<a href="/">Home</a>
<span class="menu-button">
<i class="fa fa-bars fa-lg"></i>
</span>
</li>
<li class="nav-quickstart"><a href="quickstart.html">Getting Started</a></li>
<li class="nav-intro"><a href="introduction.html">Introduction</a></li>
<!-- <li class="nav-design"><a href="design.html">Design Document</a></li> -->
<li class="nav-api"><a href="https://github.com/ddf-project/DDF" target="_blank">GitHub</a></li>
<!-- <li class="nav-api"><a href="http://ddf.io/quickstart.html#markdown/downloads.html" target="_blank">Download</a></li> -->
<li class="nav-people"><a href="quickstart.html#markdown/developers.html">Contributors</a></li>
<li class="nav-people"><a href="https://groups.google.com/forum/#!forum/ddf-project">Community</a></li>
<!-- <li class="nav-plans"><a href="plans.html">Plans</a></li> -->
<li class="nav-contribute-mobile">
<a href="http://ddf.io/contribution.html">Contribute</a>
</li>
</ul>
</header>
<section class="row" id="ddf-intro">
<div class="content">
<h1>
Distributed DataFrame</h1>
<h3>Simplify Analytics on Disparate Data Sources <br>via a Uniform API Across Engines</h3>
<div class="cta">
<a href="quickstart.html" class="call-to-action button get-started radius expand">Get Started</a>
</div>
<div class="secondary-cta">
<a href="https://s3.amazonaws.com/ddf-project/binary/ddf_binary.zip" class="">Download Binary</a>
   |   
<a href="https://github.com/ddf-project/DDF/archive/master.zip" class="">Download Source Code</a>
</div>
<div class="large-12 columns">
<div class="row" id="section1">
<div class="large-5 columns">
<ul class="benefits">
<font size="5">
A simple yet powerful API above and across multiple data and compute engines. You can now:
</font>
<br>
<!--<li class="fa fa-check"> Table-like abstraction on top of big data</li>-->
<li><i class="fa fa-space-shuttle"></i> Process data at-source</li>
<li><i class="fa fa-space-shuttle"></i> Bypass the absolute requirement for a Hadoop data lake</li>
<li><i class="fa fa-space-shuttle"></i> Future-proof your analytics applications against a rapidly changing landscape of data and compute engines</li>
</ul>
</div>
<div class="large-7 columns">
<div class="terminal">
<ul class="tabs" data-tab>
<li class="tab-title active"><a href="#ddf-java"><h5>DDF with Java</h5></a></li>
<li class="tab-title"><a href="#ddf-r"><h5>DDF with R</h5></a></li>
<li class="tab-title"><a href="#ddf-python"><h5>DDF with Python</h5></a></li>
</ul>
<div class="tabs-content">
<div class="content active" id="ddf-java">
<pre class="language-clike">
<code class="language-clike">//To start working with a DDF-on-Spark cluster:
DDFManager smanager = DDFManager.get("spark");
//Then, data can be loaded into a SparkDDF as follows:
DDF table = smanager.sql2ddf("select * from airline", false);
/* ETL, transform */
table = table.transform("dist= round(distance/2, 2)");
/* Run Machine learning using MLlib, then run prediction */
IModel model = table.ML.train("kmeans", 5, 5);
KMeansModel kmeansModel = (KMeansModel) model.getRawModel();
DDF predictions = table.ML.applyModel(model, false, true);
// To start working with Flink:
DDFManager fmanager = DDFManager.get("flink");
// Data can be loaded into FlinkDDF as follows:
DDF flinkTable = fmanager.sql2ddf("select * from airline", false);
/* ETL, SQL query */
flinkTable.sql("select * from @this", "Error in SQL");
</code>
</pre>
</div>
<div class="content" id="ddf-r">
<pre class="language-clike">
<code class="language-clike"># Create a DDF manager that runs on the Spark engine
dm &lt;- DDFManager("spark")
# Create a DDF from a table
ddf &lt;- sql2ddf(dm, "select * from mtcars")
# Basic stats
# Return the number of columns and rows
ncol(ddf)
nrow(ddf)
# Run the standard summary on the DDF
summary(ddf)</code>
</pre>
</div>
<div class="content" id="ddf-python">
<pre class="language-clike">
<code class="language-clike"># Create a DDF manager that runs on the Spark engine
dm = DDFManager("spark")
# Create a DDF from a table
ddf = dm.sql2ddf("select * from airline_na")
# Clean data
# Drop rows containing NA values
ddf.dropNA()
# Basic stats
# Get the number of rows and the number of columns
ddf.getNumRows()
ddf.getNumColumns()</code>
</pre>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="row" id="implementation">
<div class="content">
<div class="large-12 columns">
<h2>Implemented on</h2>
<ul>
<li>
<img src="images/spark.png" alt="Apache Spark">
</li>
<li>
<img src="images/flink.png" alt="Apache Flink">
</li>
<li>
<img src="images/jdbc.png" alt="JDBC">
</li>
</ul>
</div>
</div>
</section>
<section class="row" id="ddf-principles">
<div class="content">
<div class="large-12 columns">
<h2>DDF Principles</h2>
</div>
<div class="large-4 columns">
<h4>The ease of app development on RDBMS</h4>
<p>
The SQL abstraction has boosted app developer productivity tremendously, hiding away all the complexity and diversity of the database engines underneath.
</p>
</div>
<div class="large-4 columns">
<h4>
The sophistication of R</h4>
<p>
For decades, data analysis idioms and packages have evolved around the powerful concept of the data.frame, from basic data transformation, filtering and projection, to advanced data mining and machine learning.
</p>
</div>
<div class="large-4 columns">
<h4>
The scale of parallel, distributed computing</h4>
<p>
Thanks to technologies like Hadoop MapReduce, <a href="http://spark.apache.org/" target="_blank">Apache Spark</a>, and other parallel computing frameworks, big compute capabilities have become widely available.
</p>
</div>
<br>
</div>
</section>
<section id="testimonials">
<div class="content">
<h2>What People Are Saying About DDF</h2>
<ul class="testimonials">
<li>
I'm working on making pandas work better with Spark, and it seems like wrapping pandas around DDF would be really cool.<span>@holdenkarau</span>
</li>
<li><strong>@adataoinc</strong> is going to open-source their amazing work with distributed data frames, amazing! #sparksummit.<span>@davidbgonzalez</span></li>
<li>We are a team .. that would be very interested in getting this to open source release. We have ideas to build on top of a smoother abstraction than RDD and Schema RDD.<span>@siditweet</span></li>
<li>I really think that Spark (and in general Scala and Java) was lacking precisely the data frame concept that is so handy when you do R and data analysis.<span>@carlosfuertes</span></li>
</ul>
</div>
</section>
<div class="push"></div>
</div>
<footer>
<div class="content">
Project supported by <a href="http://www.adatao.com" target="_blank">Adatao</a>
<div class="social">
<a class="github-button" href="https://github.com/ddf-project/DDF" data-count-href="/ddf-project/DDF/stargazers" data-count-api="/repos/ddf-project/DDF#stargazers_count" data-count-aria-label="# stargazers on GitHub" aria-label="Star ddf-project/DDF on GitHub">Star</a>
<a class="github-button" href="https://github.com/ddf-project/DDF/fork" data-count-href="/ddf-project/DDF/network" data-count-api="/repos/ddf-project/DDF#forks_count" data-count-aria-label="# forks on GitHub" aria-label="Fork ddf-project/DDF on GitHub">Fork</a>
</div>
</div>
</footer>
<script src='assets/js/jquery-1.8.3.min.js'></script>
<script src='assets/js/modernizr/modernizr.js'></script>
<script src='assets/js/foundation/js/foundation.js'></script>
<script src='assets/js/app.js?v=20151112'></script>
<script src='assets/js/prismjs/prism.js'></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-55967251-1', 'auto');
ga('send', 'pageview');
</script>
<script async defer id="github-bjs" src="https://buttons.github.io/buttons.js"></script>
</body>
</html>