Yardstick Apache Spark is a set of Apache Spark benchmarks written on top of the Yardstick framework.
Visit the Yardstick repository for detailed information on how to run Yardstick benchmarks and generate graphs.
The documentation below describes configuration parameters that apply in addition to the standard Yardstick parameters.
- Create a local clone of Yardstick Apache Spark repository
- Import Yardstick Apache Spark POM file into your project
- Run the mvn package command
The following benchmarks are provided:
- SparkSqlQueryBenchmark: benchmarks SQL query operations.
- SparkQueryDSLBenchmark: benchmarks query DSL operations.
All benchmarks extend the SparkAbstractBenchmark class. A new benchmark should also extend this abstract class and implement the test method. This is the method that is actually benchmarked.
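The extension pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the project's code: SparkAbstractBenchmarkStandIn is a hypothetical stand-in for SparkAbstractBenchmark, and the test signature (a context map in, a boolean out) is an assumption, since the real class and its signature are not shown here. SumBenchmarkSketch and BenchmarkLoopSketch are likewise invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the project's SparkAbstractBenchmark.
// The real class is not shown in this document; this signature is assumed.
abstract class SparkAbstractBenchmarkStandIn {
    // One benchmarked iteration. Returning false asks the loop to stop
    // (an assumed contract for this sketch).
    public abstract boolean test(Map<Object, Object> ctx);
}

// Hypothetical benchmark: the work inside test() is what gets measured.
class SumBenchmarkSketch extends SparkAbstractBenchmarkStandIn {
    private final int[] data = {1, 2, 3, 4};
    long lastSum;

    @Override
    public boolean test(Map<Object, Object> ctx) {
        long sum = 0;
        for (int v : data) {
            sum += v;
        }
        lastSum = sum;
        return true;
    }
}

// Minimal driver loop standing in for the framework that calls test() repeatedly.
class BenchmarkLoopSketch {
    static int run(SparkAbstractBenchmarkStandIn benchmark, int iterations) {
        Map<Object, Object> ctx = new HashMap<>();
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            if (!benchmark.test(ctx)) {
                break;
            }
            completed++;
        }
        return completed;
    }
}
```

In an actual benchmark, the body of test would run a Spark operation, and the Yardstick scripts rather than a hand-written loop would drive the iterations.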
Before running Apache Spark benchmarks, run the mvn package command. This command compiles the project and unpacks the scripts from the yardstick-resources.zip file into the bin directory.
Note that this section describes only the configuration parameters specific to Apache Spark benchmarks, not those of the Yardstick framework. To run Apache Spark benchmarks and generate graphs, use the Yardstick framework scripts in the bin folder.
Refer to the Yardstick documentation for common Yardstick properties and command line arguments for running Yardstick scripts.
The following benchmark properties can be defined in the benchmark configuration:
- -b or --backups: Set the storage level to MEMORY_ONLY_2 (replicate each partition on two cluster nodes). The default is MEMORY_ONLY.
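The effect of this option can be illustrated with a small sketch. It shows only the mapping from the flag to a storage level name; how the benchmarks actually parse their arguments is not shown in this document, so StorageLevelOptionSketch and its storageLevelName method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps the presence of -b / --backups to a storage level name.
class StorageLevelOptionSketch {
    static String storageLevelName(String... args) {
        List<String> argList = Arrays.asList(args);
        boolean backups = argList.contains("-b") || argList.contains("--backups");
        // With backups, each partition is replicated on two cluster nodes.
        // In Spark's Java API these names correspond to the constants
        // StorageLevels.MEMORY_ONLY_2 and StorageLevels.MEMORY_ONLY.
        return backups ? "MEMORY_ONLY_2" : "MEMORY_ONLY";
    }
}
```

For example, storageLevelName("--backups") yields the replicated level, while calling it with no arguments yields the default.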
Yardstick Apache Spark is available under the Apache 2.0 open source license.