-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark 3.5: Fix NotSerializableException when migrating Spark tables #11157
base: main
Are you sure you want to change the base?
Conversation
999c1ba
to
e4aeb31
Compare
e4aeb31
to
005114b
Compare
@nastra could you help take a look? |
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
Show resolved
Hide resolved
@manuzhang Can you summerize the usage of ExecutorService on the Spark Executors? It looks like the current fix involves making a new Executor service per task and i'm not sure that's what we want to do. I wonder if it makes more sense to pass in a Supplier so we don't have to implement a wrapper class. But before we do that I want to make sure we are using the ExecutorService for the right reasons in the code path. |
|
I meant specifically, are we using this in listPartitions or in the FileIndex implementation? The title of this issue says this is an issue for "partitioned tables" so why does it work in that case but not in this case? Is it because the listPartitions code is using the executor service or what? |
Yes, it's used in |
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java
Show resolved
Hide resolved
After thinking about this for a while, I think you are probably right that we need to build a specific LazyExecutorService like you did originally. I'm sorry I lend you on a goose chase here, Let's make sure it is Spark specific and doesn't touch any of the other implementations. |
@RussellSpitzer I can revert to previous commit and this is Spark specific, but can you elaborate on why I submitted #11417 to add warning in the doc since this PR can't get into 1.7.0 |
Main reason was that the API that is specified in the API Module allows withExecutorService(executor service) so unless we want to break that api (which we could) we need to stick with just passing through an executor service. We could alternatively just change the API if you think that's warrrented. If we did that, I'd probably remove "executor service" all together |
ac414d8
to
005114b
Compare
@RussellSpitzer I've reverted to lazy executor service. Please check again. Thanks. |
This fixes #11147
cc @nastra @aokolnychyi @zzeekk