
Super-computing-for-big-data

standard-readme compliant

📖 This repository contains the lab work for the course Supercomputing for Big Data: an evacuee model built with big data techniques. It includes installation and run instructions for each application and a demonstration of the results.

Table of Contents

Background
Task 1 - local application
Task 2 - cluster application
Task 3 - streaming application
Related Efforts
Contributors

Background

This project builds an evacuee model for rising sea levels using big data techniques: a Spark application run locally (Task 1), a Spark application run on an AWS EMR cluster against the Planet data set (Task 2), and a Kafka-based streaming transformer that processes per-city refugee events (Task 3).

Task 1 - local application

For more details on this task, see task1-report.

Compile and Run

After cloning the repository, navigate to its root directory by running the following command in a terminal:

cd directory_to_Lab1/lab-1-group-09

Then start the sbt container in the root folder; this launches an interactive sbt session, in which we can compile the sources with the compile command.

docker run -it --rm -v "`pwd`":/root sbt sbt
sbt:Lab1> compile

Now we are set up to run our program! The application takes a single integer argument: the height of the rising sea level (in meters).
Use the run height command to start the process. This way of running the Spark application is mainly used for local testing.

sbt:Lab1> run 5
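
For reference, a minimal sketch of what such an entry point could look like; the object name, argument handling, and processing step are assumptions for illustration, not the actual Lab 1 code:

import org.apache.spark.sql.SparkSession

object Lab1 {
  def main(args: Array[String]): Unit = {
    // The first argument is the rising sea level in meters,
    // so "run 5" arrives here as args(0) == "5".
    val height = args(0).toInt

    val spark = SparkSession
      .builder()
      .appName("Lab1")
      .master("local[*]") // local mode for testing; usually omitted so spark-submit can set it
      .getOrCreate()

    // ... load the input data set, keep the places that end up below
    // `height`, and write out the resulting evacuation figures ...

    spark.stop()
  }
}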

Next, we introduce another way of building and running this Spark application, which lets developers inspect the event log on the Spark History Server.

Using the spark-submit command, we run the application on a local Spark "cluster". Since the JAR has already been built, all we need to do is run the command below, replacing height with the desired sea-level rise in meters:

docker run -it --rm -v "`pwd`":/io -v "`pwd`"/spark-events:/spark-events spark-submit --packages 'com.uber:h3:3.7.0' target/scala-2.12/lab-1_2.12-1.0.jar height
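
The spark-events mount is where the event log ends up so that the Spark History Server can read it. If event logging is not already enabled by the container's default configuration, it can also be switched on from the application itself; a minimal sketch, with the directory assumed to match the mount above:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Lab1")
  .config("spark.eventLog.enabled", "true")      // write an event log
  .config("spark.eventLog.dir", "/spark-events") // assumed to match the mounted directory
  .getOrCreate()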

Results

Task 2 - cluster application

For more details on this task, see task2-report.

Compile and Run

The application is executed via Planet.jar, a fat JAR file packaged by the sbt-assembly plugin; a sketch of the plugin setup is shown below.
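
sbt-assembly is enabled through project/plugins.sbt; the plugin version and the renamed-JAR setting below are assumptions and should be adapted to the project's sbt setup:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt -- optional: give the fat JAR a fixed name
assembly / assemblyJarName := "Planet.jar"

Running assembly in the sbt shell then produces the fat JAR under target/scala-2.12/, ready to be uploaded to an S3 bucket so that EMR can reach it.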

To deploy it, first log in to your AWS account. This brings you to the AWS Management Console, which gives access to the wide range of services AWS provides; the one we use to run our Spark application is Elastic MapReduce (EMR). Type EMR in the search bar and click the first result. Here, we can run our application in Spark cluster mode.

Next, create a cluster. The types and numbers of node instances we used to run our application on the Planet data set are listed below:

Node type      Instance type    Number of instances
Master node    c5.2xlarge       1
Core node      c5.24xlarge      4

The final step is to add a step to the cluster. Choose "Spark application" as the step type, then copy the following options into the "Spark-submit options" field; they disable automatic broadcast joins, raise the broadcast and YARN application-master wait timeouts, and limit the application to a single attempt. Set "Application location" to the JAR in the S3 bucket and enter an integer under "Arguments" to represent the rising sea level. Click "Add" and everything is set!

--conf "spark.sql.autoBroadcastJoinThreshold=-1"  
--conf "spark.sql.broadcastTimeout=36000" 
--conf "spark.yarn.am.waitTime=36000"
--conf "spark.yarn.maxAppAttempts=1" 

Results

Task 3 - streaming application

For more details on this task, see task3-report.

Compile and Run

The following library dependencies should be added to transformer/build.sbt:

libraryDependencies ++= Seq(
    "io.circe" %% "circe-core" % "0.14.1",
    "io.circe" %% "circe-generic" % "0.14.1",
    "io.circe" %% "circe-parser" % "0.14.1"
)
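
These bring in circe for JSON handling. As an illustration only, decoding a single input record could look like the sketch below; the case class name is an assumption, while its fields mirror the sample events shown further down:

import io.circe.generic.auto._ // derives a Decoder for the case class
import io.circe.parser.decode

// Hypothetical record type matching the producer's JSON output.
case class CityEvent(timestamp: Long, city_id: Long, city_name: String, refugees: Long)

val json = """{"timestamp":1634500171928,"city_id":1810821,"city_name":"Fuzhou","refugees":2953}"""

decode[CityEvent](json) match {
  case Right(event) => println(s"${event.city_name}: ${event.refugees} refugees")
  case Left(error)  => println(s"could not parse record: $error")
}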

Open a terminal, clone the project repository (for example to /home), and run docker-compose up in the project root directory. Once all the services have started, open a second terminal and run docker-compose exec transformer sbt from the same root directory to open an interactive sbt shell. The producer should start writing a steady input stream to the "events" topic, similar to the following:

events_1            | 1810821	{"timestamp":1634500171928,"city_id":1810821,"city_name":"Fuzhou","refugees":2953}
events_1            | 5506956	{"timestamp":1634500172053,"city_id":5506956,"city_name":"Las Vegas","refugees":99498}
events_1            | 1793505	{"timestamp":1634500172548,"city_id":1793505,"city_name":"Taizhou","refugees":12371}
events_1            | 1627896	{"timestamp":1634500172679,"city_id":1627896,"city_name":"Semarang","refugees":114243}
events_1            | 1258662	{"timestamp":1634500173010,"city_id":1258662,"city_name":"Rāmgundam","refugees":0}

Compile the code:

sbt:Transformer> compile

Run the transformer by passing the window size N (in seconds) as a command-line argument; here the window size is set to 2 seconds. Note that the argument must be a positive integer, otherwise it is rejected by the type check and the Kafka context is shut down.

sbt:Transformer> run 2

The transformer will start transforming the input stream and write the results to the output topic "updates".
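
For illustration, the positive-integer check mentioned above could be sketched as follows; the helper name is an assumption and the actual shutdown of the Kafka context is not shown:

import scala.util.Try

// Accept the window size only if it parses as a strictly positive integer.
def parseWindowSize(arg: String): Option[Int] =
  Try(arg.toInt).toOption.filter(_ > 0)

parseWindowSize("2")   // Some(2)
parseWindowSize("-3")  // None: report the error and shut down the Kafka context
parseWindowSize("abc") // None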

Results

Related Efforts

Contributors

Feel free to dive in! Open an issue or submit PRs.

