Practice | Detail | Reference |
---|---|---|
Word count | Simple word count and a generator-based version | link |
Daily exchange rate | Output the day-to-day change in currency exchange rates as a percentage | link |
Word count and Text mining | Two examples: word count with a combine step and simple NLP | link |
Sparse Matrix Multiplication | Calculate sparse matrix multiplication | |
Document Similarity Comparison | Calculate similarity and keep values above the threshold | |
Big Database Table Join | Mechanism of NoSQL database join | |
Steps:
## Example 1
# Go to project directory
$ cd GeneralWordCount
# Try mapper
$ echo "foo foo quux labs foo bar quux" | ./mapper.py
foo 1
foo 1
quux 1
labs 1
foo 1
bar 1
quux 1
# Try entire procedure
$ echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py
bar 1
foo 3
labs 1
quux 2
## Example 2
# Go to project directory
$ cd GeneralWordCount
# Try mapper
$ printf "X B B\nC B A\nX A C\n" | ./mapper.py
X 1
B 1
B 1
C 1
B 1
A 1
X 1
A 1
C 1
# Try entire procedure
$ printf "X B B\nC B A\nX A C\n" | ./mapper.py | sort -k1,1 | ./reducer.py
A 2
B 3
C 2
X 2
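The mapper, sort, and reducer pipeline above can be sketched in plain Python. This is a minimal illustration and not the repository's actual `mapper.py` / `reducer.py`; the function names `map_words` and `reduce_counts` are hypothetical, and `sorted()` stands in for the shell `sort -k1,1` step.

```python
def map_words(lines):
    """Return a list of (word, 1) pairs, one per whitespace-separated token."""
    pairs = []
    for line in lines:
        for word in line.split():
            pairs.append((word, 1))
    return pairs


def reduce_counts(sorted_pairs):
    """Sum counts for consecutive equal keys; input must be sorted by word."""
    results = []
    current_word, current_count = None, 0
    for word, count in sorted_pairs:
        if word == current_word:
            current_count += count
        else:
            # Key changed: flush the finished word before starting the next one
            if current_word is not None:
                results.append((current_word, current_count))
            current_word, current_count = word, count
    if current_word is not None:
        results.append((current_word, current_count))
    return results


if __name__ == "__main__":
    lines = ["foo foo quux labs foo bar quux"]
    # sorted() plays the role of `sort -k1,1` between mapper and reducer
    for word, total in reduce_counts(sorted(map_words(lines))):
        print(word, total)
```

The reducer relies on equal keys arriving next to each other, which is why the shell pipeline sorts the mapper output before reducing.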
Steps (without Hadoop):
# Go to project directory
$ cd GeneratorWordCount
# Download Data (already downloaded in data/)
$ bash download_data.sh
# Analysis
$ cat data/*.txt | ./mapper.py | ./reducer.py
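A generator-based word count might look like the sketch below. It is an assumption about the general technique, not a copy of the scripts in `GeneratorWordCount`; `read_words` and `count_words` are hypothetical names. Because `itertools.groupby` merges only adjacent equal keys, this sketch sorts the pairs before grouping.

```python
from itertools import groupby
from operator import itemgetter


def read_words(lines):
    """Lazily yield (word, 1) pairs without building a list in memory."""
    for line in lines:
        for word in line.split():
            yield word, 1


def count_words(pairs):
    """Lazily yield (word, total); groupby merges only adjacent equal keys."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    lines = ["X B B", "C B A", "X A C"]
    # sorted() makes equal words adjacent so groupby sees each word once
    for word, total in count_words(sorted(read_words(lines))):
        print(word, total)
```

Generators keep memory use low on large inputs because each pair is produced only when the next stage asks for it.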
Steps (without Hadoop):
# Go to project directory
$ cd DailyExchangeRate
# Download Data (already downloaded in data/)
$ bash download_data.sh
# Analysis
$ cat data/daily.csv | python3 mapper.py | python3 reducer.py
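The day-to-day percentage change can be illustrated with the sketch below. The row layout `(date, currency, rate)`, the function name `daily_pct_change`, and the sample values are all assumptions made for illustration; the real column layout of `data/daily.csv` may differ.

```python
from collections import defaultdict


def daily_pct_change(rows):
    """Return {currency: [(date, pct_change), ...]} from (date, currency, rate) rows."""
    by_currency = defaultdict(list)
    for date, currency, rate in rows:
        by_currency[currency].append((date, float(rate)))

    changes = {}
    for currency, series in by_currency.items():
        series.sort()  # ISO-formatted dates sort chronologically as strings
        # Pair each day with the previous day and compute the relative change
        changes[currency] = [
            (date, (rate - prev_rate) / prev_rate * 100.0)
            for (_, prev_rate), (date, rate) in zip(series, series[1:])
        ]
    return changes


if __name__ == "__main__":
    # Made-up sample rows, not taken from data/daily.csv
    rows = [
        ("2024-01-01", "EUR", "1.00"),
        ("2024-01-02", "EUR", "1.25"),
        ("2024-01-03", "EUR", "1.00"),
    ]
    for currency, series in daily_pct_change(rows).items():
        for date, pct in series:
            print(f"{currency}\t{date}\t{pct:+.2f}%")
```

The first day of each currency has no previous value, so it produces no output row.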
- mapper.py
- combiner.py -> TBD
- reducer.py
Steps (without Hadoop):
# Go to project directory
$ cd WordCountCombine
# Download Data (already downloaded in data/)
$ bash download_data.sh
# Analysis (without combine)
$ cat data/44604.txt.utf-8 | python3 mapper.py | sort -k1,1 | python3 reducer.py
# Analysis (with combine)
# TBD
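The combine step is still marked TBD here, so the following is only a general illustration of what a combiner does, not the planned `combiner.py`. A combiner pre-aggregates each mapper's output locally so that fewer pairs reach the reducer. The function names and the two simulated input splits are hypothetical.

```python
from collections import Counter


def map_words(lines):
    """Yield a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combine(pairs):
    """Pre-aggregate one mapper's output so fewer pairs reach the reducer."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def reduce_counts(partials):
    """Merge partial counts from all combiners into final totals."""
    total = Counter()
    for word, count in partials:
        total[word] += count
    return sorted(total.items())


if __name__ == "__main__":
    # Two input splits, each handled by its own simulated mapper and combiner
    splits = [["foo foo bar"], ["foo quux bar"]]
    partials = [pair for split in splits for pair in combine(map_words(split))]
    print(partials)
    print(reduce_counts(partials))
```

Word count suits a combiner because addition is associative and commutative, so summing partial sums gives the same totals as summing the raw pairs.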
Steps (without Hadoop):
# Go to project directory
$ cd TextMining
# Download NLTK
$ bash download_NLTK.sh
# Analysis (reuses the data downloaded in the WordCountCombine example above)
$ cat ../WordCountCombine/data/44604.txt.utf-8 | python3 mapper.py | python3 reducer.py