MapReduce

MapReduce Practice

Overview

| Practice | Detail | Reference |
| --- | --- | --- |
| Word count | Simple word count and generator version examples | link |
| Daily exchange rate | Output the day-to-day change in currency exchange rates as a percentage | link |
| Word count and Text mining | Two examples: word count with a combine step, and simple NLP | link |
| Sparse Matrix Multiplication | Calculate sparse matrix multiplication | |
| Document Similarity Comparison | Calculate similarity and keep values above the threshold | |
| Big Database Table Join | Mechanism of a NoSQL database join | |

Word Count

General Version - Simple example

Steps:

## Example 1
# Go to project directory
$ cd GeneralWordCount
# Try mapper
$ echo "foo foo quux labs foo bar quux" | ./mapper.py
foo     1
foo     1
quux    1
labs    1
foo     1
bar     1
quux    1
# Try entire procedure
$ echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py
bar     1
foo     3
labs    1
quux    2

Example 2

## Example 2
# Go to project directory
$ cd GeneralWordCount
# Try mapper
$ printf "X B B\nC B A\nX A C\n" | ./mapper.py
X       1
B       1
B       1
C       1
B       1
A       1
X       1
A       1
C       1
# Try entire procedure
$ printf "X B B\nC B A\nX A C\n" | ./mapper.py | sort -k1,1 | ./reducer.py
A       2
B       3
C       2
X       2
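
The two examples above follow the usual streaming pattern: the mapper emits one `word<TAB>1` record per word, `sort` groups identical words together, and the reducer sums each run. The repository's `mapper.py` and `reducer.py` are not shown here, so the following is only a minimal sketch of that pattern; the function names `map_line` and `reduce_sorted` are hypothetical.

```python
def map_line(line):
    """Return one "word<TAB>1" record per whitespace-separated word."""
    return [f"{word}\t1" for word in line.split()]


def reduce_sorted(records):
    """Sum counts over records that are already sorted by word."""
    results = []
    current_word, current_count = None, 0
    for record in records:
        word, count = record.split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            # A new word starts: flush the finished run first.
            if current_word is not None:
                results.append((current_word, current_count))
            current_word, current_count = word, int(count)
    # Flush the last run, if any input was seen.
    if current_word is not None:
        results.append((current_word, current_count))
    return results


if __name__ == "__main__":
    mapped = map_line("foo foo quux labs foo bar quux")
    # sorted() stands in for `sort -k1,1` between the two scripts.
    for word, total in reduce_sorted(sorted(mapped)):
        print(f"{word}\t{total}")
```

The reducer only compares each record with the previous one, which is why the sort step between mapper and reducer matters.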

Generator Version - Ebook example

Steps without Hadoop:

# Go to project directory
$ cd GeneratorWordCount
# Download data (optional; already included in data/)
$ bash download_data.sh
# Analysis
$ cat data/*.txt | ./mapper.py | ./reducer.py
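
A generator-based word count typically streams records lazily instead of building lists. The sketch below assumes that approach, using `itertools.groupby`; it is not the repository's actual code, and `read_words` and `count_words` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter


def read_words(lines):
    """Lazily yield (word, 1) pairs, one line at a time."""
    for line in lines:
        for word in line.split():
            yield word, 1


def count_words(sorted_pairs):
    """Lazily yield (word, total) from pairs sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    lines = ["X B B", "C B A", "X A C"]
    # groupby only merges adjacent keys, so the pairs are sorted first.
    pairs = sorted(read_words(lines))
    for word, total in count_words(pairs):
        print(f"{word}\t{total}")
```

Because `groupby` merges only adjacent equal keys, this sketch sorts before reducing; whether the repository's reducer expects sorted input is not stated here.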

Daily Exchange Rate

Steps without Hadoop:

# Go to project directory
$ cd DailyExchangeRate
# Download data (optional; already included in data/)
$ bash download_data.sh
# Analysis
$ cat data/daily.csv | python3 mapper.py | python3 reducer.py
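
The day-to-day percentage change can be expressed as a map step keyed by currency and a reduce step that compares consecutive days. The layout of `data/daily.csv` is not shown here, so the sketch below assumes hypothetical `(date, currency, rate)` rows with ISO-formatted dates; the function names are also illustrative.

```python
from collections import defaultdict


def mapper(rows):
    """Emit (currency, (date, rate)) for each (date, currency, rate) row."""
    for date, currency, rate in rows:
        yield currency, (date, float(rate))


def reducer(pairs):
    """Return {currency: [(date, percent_change), ...]} in date order."""
    by_currency = defaultdict(list)
    for currency, dated_rate in pairs:
        by_currency[currency].append(dated_rate)

    changes = {}
    for currency, series in by_currency.items():
        series.sort()  # ISO dates sort chronologically as strings
        changes[currency] = [
            (date, (rate - prev_rate) / prev_rate * 100.0)
            for (_, prev_rate), (date, rate) in zip(series, series[1:])
        ]
    return changes


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the real data file.
    rows = [
        ("2020-01-02", "EUR", "2.5"),
        ("2020-01-01", "EUR", "2.0"),
        ("2020-01-03", "EUR", "2.0"),
    ]
    for currency, series in reducer(mapper(rows)).items():
        for date, change in series:
            print(f"{currency}\t{date}\t{change:+.2f}%")
```

The first day of each currency has no previous rate, so it produces no output row.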

Word Count and Text Mining

Word Count with Combine Step

Steps without Hadoop:

# Go to project directory
$ cd WordCountCombine
# Download data (optional; already included in data/)
$ bash download_data.sh
# Analysis (without combine)
$ cat data/44604.txt.utf-8 | python3 mapper.py | sort -k1,1 | python3 reducer.py
# Analysis (with combine)
# TBD
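
A combine step pre-aggregates counts on the mapper side so that fewer records reach the reducer. The following is a minimal sketch of that idea using in-mapper combining with `collections.Counter`; it does not fill in the command left as TBD above, and `mapper_with_combine` is a hypothetical name.

```python
from collections import Counter


def mapper_with_combine(lines):
    """Count words locally, then emit one (word, partial_count) per word."""
    local_counts = Counter()
    for line in lines:
        local_counts.update(line.split())
    return sorted(local_counts.items())


def reducer(partials):
    """Merge partial counts coming from any number of mappers."""
    totals = Counter()
    for word, count in partials:
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    # Two hypothetical input splits, each handled by its own mapper.
    split_a = ["foo foo quux", "labs foo"]
    split_b = ["bar quux"]
    partials = mapper_with_combine(split_a) + mapper_with_combine(split_b)
    for word, total in reducer(partials):
        print(f"{word}\t{total}")
```

Without the combine step, the first split would emit five records (one per word); with it, the split emits three, one per distinct word.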

Text Mining

Steps without Hadoop:

# Go to project directory
$ cd TextMining
# Download NLTK
$ bash download_NLTK.sh
# Analysis (reuses the data from the WordCountCombine example above)
$ cat ../WordCountCombine/data/44604.txt.utf-8 | python3 mapper.py | python3 reducer.py

Sparse Matrix Multiplication

Matrix to Sparse matrix util