What may be causing `StreamingQueryException: Gave up after 3 retries while fetching MetaData`, and how can I work around it? #110

Environment: Spark 3.1.1 on AWS EMR 6.3.0, Python 3.7.2.

I'm getting the following error:
```
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/streaming.py", line 101, in awaitTermination
  File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.StreamingQueryException: Gave up after 3 retries while fetching MetaData, last exception: 
=== Streaming Query ===
Identifier: [id = e825addf-9c21-4e9d-a05b-581ae8911f29, runId = e2ea753f-d2dc-42ea-bec2-17a516faadf7]
Current Committed Offsets: {KinesisSource[events-prod]: {"shardId-000000000035":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000041":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000044":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000038":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000032":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000043":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"metadata":{"streamName":"events-prod","batchId":"0"},"shardId-000000000031":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000034":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000040":{"iteratorType":"AT_TIMESTAMP","iteratorPosition":"1647283749833"},"shardId-000000000037":
.................................................................
```

I have tried increasing the maximum number of retries and the retry intervals, e.g.:

```
MAX_NUM_RETRIES = 10  # default is 3
RETRY_INTERVAL_MS = 3000  # default is 1000
MAX_RETRY_INTERVAL_MS = 30000  # default is 10000

# `spark` is the active SparkSession; `pctx` is a config object defined elsewhere.
# The outer parentheses let the method chain span multiple lines.
df = (
    spark.readStream.format("kinesis")
    .option("streamName", pctx.stream_name)
    .option("endpointUrl", pctx.endpoint_url)
    .option("region", pctx.region_name)
    .option("checkpointLocation", pctx.checkpoint_path)
    .option("startingposition", "LATEST")
    .option("kinesis.client.numRetries", MAX_NUM_RETRIES)
    .option("kinesis.client.retryIntervalMs", RETRY_INTERVAL_MS)
    .option("kinesis.client.maxRetryIntervalMs", MAX_RETRY_INTERVAL_MS)
    .load()
)
```

However, the error message still reports the default of 3 retries, so these settings do not seem to take effect.

Any ideas, anyone?
- What may be causing this issue?
- How can I work around it? Would it be reasonable to set `failondataloss=false`, or is that a bad idea?
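
For reference, the fallback I am considering is to restart the query from its checkpoint whenever `awaitTermination()` raises. This is only a minimal sketch: `run_with_restarts` and `start_query` are hypothetical names of mine (not part of Spark or the connector), and `start_query` is assumed to be a zero-argument callable that builds the stream and returns the started query. The delays mirror the values I tried above, in seconds.

```python
import time


def run_with_restarts(start_query, max_restarts=5, base_delay_s=3.0,
                      max_delay_s=30.0, sleep=time.sleep):
    """Run a streaming query, restarting it when awaitTermination() raises.

    start_query is a hypothetical zero-argument callable that returns a
    started query object exposing awaitTermination().
    Returns the number of restarts that were needed.
    """
    delay = base_delay_s
    for attempt in range(max_restarts + 1):
        query = start_query()
        try:
            query.awaitTermination()
            return attempt  # the query stopped without raising
        except Exception as exc:
            # In the real job this would be narrowed to
            # pyspark.sql.utils.StreamingQueryException.
            if attempt == max_restarts:
                raise  # out of restarts: surface the last failure
            print(f"query failed ({exc}); restart {attempt + 1}/{max_restarts} in {delay}s")
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # double the wait, capped
```

This would only hide the symptom, so I would still like to understand the root cause.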

Thanks
