-
Notifications
You must be signed in to change notification settings - Fork 5
Implement event watcher TLS and auth token generation form custom binary #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
scripts/submitArmadaSpark.sh
Outdated
| AUTH_ARG=" --conf spark.armada.auth.token=$ARMADA_AUTH_TOKEN" | ||
| else | ||
| AUTH_ARG="" | ||
| AUTH_ARGS+=("--conf" "spark.armada.auth.token=$ARMADA_AUTH_TOKEN") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to have the token on the command args? Should we document that this is discouraged and signin binaries should be used for secure setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved documentation to warn users
| --conf spark.armada.container.image=$IMAGE_NAME | ||
| --conf spark.armada.queue=$ARMADA_QUEUE | ||
| --conf spark.armada.lookouturl=${ARMADA_LOOKOUT_URL:-http://localhost:30000} | ||
| --conf spark.kubernetes.file.upload.path=/tmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want /tmp to be configurable?
| --deploy-mode $DEPLOY_MODE | ||
| --name $NAME | ||
| $CLASS_PROMPT $CLASS_ARG | ||
| --conf spark.home=/opt/spark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/opt/spark should be configurable, same below for /opt/spark/bin/spark-submit. Can be done in a separate PR, though. SPARK_HOME should be used to find Spark binaris, which might fallback to /opt/spark like ${SPARK_HOME:-/opt/spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will implement in a separate PR, tracking in G-Research/spark#165
| val ARMADA_AUTH_SIGNIN_ARGS: OptionalConfigEntry[String] = | ||
| ConfigBuilder("spark.armada.auth.signin.args") | ||
| .doc( | ||
| "Space-separated arguments to pass to the signin binary when obtaining an authentication token." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can arguments be quoted? Do we need escaping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the parsing logic and docs
795f577 to
3e698f3
Compare
|
This approach seems a little to specific to the signin binary and our internal resources. armada-spark is an open source project after all. What if we just searched for an optional script called "/opt/spark/extraFiles/getAuth.sh"? If it exists we run it and get the token. If not, we assume no auth is required. |
@GeorgeJahad Sounds reasonable to me. After all, not everyone will have a similar binary and might need extra steps to obtain the token. |
Ok then please move forward with that. Also, @EnricoMi made a good point about: "spark.armada.auth.token=$ARMADA_AUTH_TOKEN" What do you think of removing "spark.armada.auth.token" entirely? If the getAuth.sh script exists, it creates the token when needed. |
Agree, and I am just implementing that |
Signed-off-by: Sudipto Baral <[email protected]> # Conflicts: # src/main/scala/org/apache/spark/deploy/armada/Config.scala
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
a2b85e1 to
f13e088
Compare
…d of configurable options Signed-off-by: Sudipto Baral <[email protected]>
f13e088 to
919c378
Compare
|
I'd prefer a config option over a script in a magic place. All logic should be configurable in one place, not a mixture of config and filesystem. But I agree, it should be a single option pointing to a script without any arguments. |
Signed-off-by: Sudipto Baral <[email protected]>
| They can be set in the [conf](../conf/spark-defaults.conf) file. | ||
|
|
||
| - `spark.armada.auth.token` - Armada auth token, (specific to the OIDC server being used.) | ||
| - `ARMADA_AUTH_TOKEN` - Environment variable for Armada auth token (for initial submission). For further authentication between driver and executors, configure `spark.armada.auth.script.path` to point to an authentication script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more precise to say:
Environment variable for Armada auth token (for cases when authentication script is unavailable). Otherwise configure `spark.armada.auth.script.path` to point to an authentication script.
scripts/config.sh.example
Outdated
| @@ -0,0 +1,47 @@ | |||
| #!/bin/bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I want to include this script in this PR. We will eventually have to do a better job of documenting things, but I don't want to do it here.
| conf.flatMap(_.get(Config.ARMADA_AUTH_SCRIPT_PATH)) match { | ||
| case Some(scriptPath) => | ||
| val authScript = new java.io.File(scriptPath) | ||
| if (authScript.exists() && authScript.canExecute) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should throw an error if the script is not executable.
| val authScript = new java.io.File(scriptPath) | ||
| if (authScript.exists() && authScript.canExecute) { | ||
| try { | ||
| val token = Seq("sh", authScript.getAbsolutePath).!!.trim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the script is executable, do you need the "sh" arg?
| if (authScript.exists() && authScript.canExecute) { | ||
| try { | ||
| val token = Seq("sh", authScript.getAbsolutePath).!!.trim | ||
| if (token.nonEmpty) Some(token) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would throw an error if the token is empty
| } | ||
|
|
||
| test("getAuthToken returns None when script path config not set") { | ||
| val token = org.apache.spark.deploy.armada.submit.ArmadaUtils.getAuthToken(Some(sparkConf)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe first that the config is not set, just to be sure.
|
@GeorgeJahad In this implementation
I did not pass Scenario 1: If we pass it, the driver pod would always see Scenario 2: If we don't set So I chose this design so that Any thoughts on a different approach? |
So the problem is that our internal environment is a bit weird because the signin exec doesn't work when the docker image is running on the devpod, (so we have to use the env var there,) but we don't want to pass that env var to the driver, (because that exposes it to the lookout ui.) So we want the driver to use in the signin executable. And we don't want to have to explain that wierdness to our non-internal users. So how about this: Let's remove all reference to the ARMADA_AUTH_TOKEN from the scala code, and move it into the script. So our internal script will check the env var first, and then call signin if it doesn't exist. But from the perspective of the scala code, everyone has to use the "spark.armada.auth.script.path" to get the token. What do you think? |
| val token = Seq("sh", authScript.getAbsolutePath).!!.trim | ||
| if (token.nonEmpty) Some(token) else None | ||
| } catch { | ||
| case e: Exception => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would throw an error if there is an exception, or if the script returns a non 0 exit code
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
Signed-off-by: Sudipto Baral <[email protected]>
|
@GeorgeJahad Thanks for the review. I updated the PR addressing the comments. Let me know how it looks. |
| echo ' PYTHON_SCRIPT=/opt/spark/examples/src/main/python/pi.py' | ||
| echo ' SCALA_CLASS=org.apache.spark.examples.SparkPi' | ||
| echo ' CLASS_PATH=local:///opt/spark/extraFiles/spark-examples_2.12-3.5.3.jar' | ||
| echo ' # Auth: Set ARMADA_AUTH_SCRIPT_PATH for authentication' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would remove this line, cause it can't really be set in config.sh, can it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can configure it in config.sh,
armada-spark/scripts/submitArmadaSpark.sh
Lines 125 to 128 in a035a90
| # Add auth script path if configured | |
| if [ "$ARMADA_AUTH_SCRIPT_PATH" != "" ]; then | |
| SPARK_SUBMIT_ARGS+=("--conf" "spark.armada.auth.script.path=$ARMADA_AUTH_SCRIPT_PATH") | |
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, ok, thanks!
GeorgeJahad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, (i found one last nit in init.sh)
fixes G-Research/spark#163