Tests for Spark #105

Open
bkmgit opened this issue May 3, 2021 · 2 comments

Comments

bkmgit commented May 3, 2021

Maybe helpful to add tests for Spark using R, Java and Scala interfaces.

boegel (Contributor) commented May 4, 2021

@bkmgit Any suggestions for tests to use?

bkmgit (Author) commented May 4, 2021

The default setup seems to install Spark in standalone mode. Spark has its own job submission script format; however, to use more than one node on a cluster with a job scheduler, one first needs to obtain the resources from the scheduler and then pass information about the provisioned nodes to Spark.
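That scheduler-to-Spark handoff could be sketched roughly as below: a hypothetical helper that turns a list of allocated nodes into the commands for launching a standalone master and workers. The `SPARK_HOME` path, the ssh-based launch, and the hostnames are all assumptions, and `start-worker.sh` is the Spark 3.x script name (older releases call it `start-slave.sh`).

```python
# Hypothetical sketch: build the commands for starting a Spark standalone
# cluster on nodes provisioned by a job scheduler. The spark_home default
# and ssh-based launch are assumptions, not a tested recipe.

def spark_standalone_commands(nodes, spark_home="/opt/spark", port=7077):
    """Return (master_cmd, worker_cmds) for a standalone cluster.

    The first allocated node hosts the master; every allocated node
    (including the master's) runs a worker that connects back to it.
    """
    master_url = f"spark://{nodes[0]}:{port}"
    master_cmd = f"ssh {nodes[0]} {spark_home}/sbin/start-master.sh"
    worker_cmds = [
        f"ssh {node} {spark_home}/sbin/start-worker.sh {master_url}"
        for node in nodes
    ]
    return master_cmd, worker_cmds

if __name__ == "__main__":
    # Inside a Slurm job the node list would come from the allocation,
    # e.g. via `scontrol show hostnames "$SLURM_JOB_NODELIST"`.
    master_cmd, worker_cmds = spark_standalone_commands(["node001", "node002"])
    print(master_cmd)
    print(*worker_cmds, sep="\n")
```

A test along these lines could check that the generated commands point every worker at the master's URL before attempting a real multi-node launch.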

Sparkhpc makes it easy to use PySpark, in particular by integrating with an HPC scheduler.
Example scripts can be found in the spark-on-hpc repository.

There are some tutorial notes on installing Spark with EasyBuild and then using that installation on a single node.

Spark does include a number of tests that can be run with sbt, Maven and Python. Each major Spark release uses one version of Scala, so it may be helpful to install the matching Scala version along with Java 8.
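On top of those built-in suites, a lightweight functional smoke test could run the classic Monte Carlo pi estimate through each interface. A minimal Python sketch is below; the counting logic is kept as a plain function so it can be checked without a running cluster, and the PySpark part (shown in the comment) assumes a locally installed `pyspark`.

```python
# Sketch of a functional smoke test for a Spark installation: estimate pi
# by sampling random points in the unit square. The pure-Python functions
# below carry the logic; the PySpark call that would distribute it is
# shown in the comment and assumes pyspark is installed.
import random

def inside_unit_circle(_):
    """Return 1 if a random point in [-1, 1]^2 lands inside the unit circle."""
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    return 1 if x * x + y * y <= 1 else 0

def estimate_pi(hits, samples):
    """Convert a hit count into a pi estimate (area ratio times 4)."""
    return 4.0 * hits / samples

if __name__ == "__main__":
    # With PySpark available, the same logic would run on the cluster:
    #   from pyspark.sql import SparkSession
    #   spark = SparkSession.builder.appName("pi-smoke-test").getOrCreate()
    #   n = 1_000_000
    #   hits = spark.sparkContext.parallelize(range(n)) \
    #               .map(inside_unit_circle).sum()
    #   assert abs(estimate_pi(hits, n) - 3.14159) < 0.01
    n = 100_000
    hits = sum(inside_unit_circle(i) for i in range(n))
    print(f"pi ~ {estimate_pi(hits, n):.3f}")
```

The same check ports naturally to the R, Java and Scala interfaces, which would exercise each language binding of an EasyBuild-installed Spark.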

trz42 added a commit to trz42/software-layer that referenced this issue Apr 28, 2023
…erl-OpenMPI-GCC-10.3.0

Adding CMake/3.20.1, Perl/5.32.1 and OpenMPI/4.1.1 with GCC/10.3.0 to NESSI/2023.04