Spark Client StorageID support #8918
base: master
if (storageConfigList.isEmpty || storageConfigList.size() == 1) {
  cfg.getStorageConfig
} else {
In case of a non empty list, we still need to pick the configuration from the list - right?
In such case, it'll be returned in the non-list part of the response.
Same handling as in the other clients.
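The selection rule discussed in this thread can be sketched roughly as follows. This is a minimal sketch under assumptions: `StorageConfig`, `Config`, and `StorageConfigSelector` are hypothetical stand-ins rather than the lakeFS SDK types, and returning `None` for an unknown storage ID is an illustrative choice, not necessarily what the client does.

```scala
// Hypothetical stand-ins for the server's config response (not the real SDK types).
final case class StorageConfig(storageId: String, blockstoreType: String)
final case class Config(
    storageConfig: Option[StorageConfig],   // the "non-list" part of the response
    storageConfigList: List[StorageConfig]  // populated only for multi-storage setups
)

object StorageConfigSelector {
  // With zero or one list entries, use the non-list config and ignore the storage ID.
  // Only when the list has several entries is it searched by storage ID.
  def select(cfg: Config, storageId: String): Option[StorageConfig] =
    if (cfg.storageConfigList.size <= 1) cfg.storageConfig
    else cfg.storageConfigList.find(_.storageId == storageId)
}
```

Under this rule the list is consulted only in actual multi-storage cases, so a single-storage server behaves the same whether or not it also returns a one-element list.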
@@ -168,7 +168,7 @@ class ApiClient private (conf: APIConfigurations) {
     val storageNamespace = key.storageClientType match {
       case StorageClientType.HadoopFS =>
         ApiClient
-          .translateURI(URI.create(repo.getStorageNamespace), getBlockstoreType)
+          .translateURI(URI.create(repo.getStorageNamespace), getBlockstoreType(repo.getStorageId))
The storage namespace information is cached in this client.
The blockstore type is not cached, which means we call the server multiple times.
I suggest adding this information to the cache, or caching the configuration at the same level.
I'm adding a getRepo() call to the server, but it's not here, and only for the GC.
What cache are you suggesting here?
I was looking at the namespace cache and missed the key-to-namespace cache. You can ignore my comment.
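For reference, the caching idea raised in this thread could look roughly like the sketch below, should it ever be needed. `BlockstoreTypeCache` and `fetch` are hypothetical names: `fetch` stands in for whatever server call returns the blockstore type for a storage ID, and nothing here is taken from the actual client code.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical per-storage-ID cache; `fetch` represents the server round trip.
final class BlockstoreTypeCache(fetch: String => String) {
  private val cache = TrieMap.empty[String, String]

  // Return the cached blockstore type, calling `fetch` only on a miss.
  // Under contention `fetch` may run more than once for the same key,
  // but only one result is stored and returned to later callers.
  def get(storageId: String): String =
    cache.getOrElseUpdate(storageId, fetch(storageId))
}
```

Keying by storage ID keeps one entry per blockstore, which matches the multi-storage case where different repositories can resolve to different blockstore types.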
Thanks, looks like a worthwhile change! I'm surprised its scope is so small. Let's add a note on how we tested this.
@@ -42,7 +42,7 @@ buildInfoPackage := "io.treeverse.clients"
 enablePlugins(S3Plugin, BuildInfoPlugin)
 
 libraryDependencies ++= Seq(
-  "io.lakefs" % "sdk" % "1.0.0",
+  "io.lakefs" % "sdk" % "1.53.1",
Let's have this as a separate PR! It's not that I'm afraid; the lakeFS compatibility guarantees mean this is a perfectly safe change. But I'd still rather be careful, because I'm actually scared by this change.
I don't mind separating this, but since the CI tests pass here, I'm curious:
what difference will it make?
If we merge the lib bump and then the MSB changes right after it,
how will that give more confidence?
Because we squash merges, 2 PRs give a more detailed commit history. That makes it easier to git bisect and/or roll back some changes. But not critical.
-    cfg.getStorageConfig
+    val storageConfigList = cfg.getStorageConfigList
+    if (storageConfigList.isEmpty || storageConfigList.size() == 1) {
+      cfg.getStorageConfig
We could still check storageID when the list has size 1 and specifies a storage ID, no?
I'm complying with the other clients (WebUI, lakectl, Everest); we're doing the same check there.
It's about making the distinction, and using the list only for actual MSB cases.
Thanks @arielshaqed and @nopcoder for your review.
As written, I'm still working on testing this 😐
Will update afterwards.
Meanwhile, addressed your comments.
Nothing real to add, and my only comment is not critical. Still LGTM.
lgtm!
Supporting Spark Client usage for lakeFS servers with multiple blockstores.
Also includes GC support.
Required updating lakeFS to the latest version.
Still working on testing it, but this can be reviewed.