Bug Report:
Actual Behavior
We have a rather large streaming DataFrame (~42,000,000 rows) that we want to send to our Azure Event Hub. The Event Hub is scaled to 15 throughput units (TUs).
However, every run that tries to send this data fails due to Event Hub throttling. The exception that is shown is:
StreamingQueryException: [STREAM_FAILED] Query [id = ..., runId = ...] terminated with exception: Job aborted due to stage failure: Task XX in stage 9.0 failed 4 times, most recent failure: Lost task 61.3 in stage 9.0 (TID 1963) (10.179.0.21 executor 7): com.microsoft.azure.eventhubs.ServerBusyException: The request was terminated because the entity is being throttled. Error code : 50002. Sub error : 101. Please wait 4 seconds and try again. To know more visit https://aka.ms/sbResourceMgrExceptions and https://aka.ms/ServiceBusThrottling
We tried to lower the send rate with the following options:
- maxEventsPerTrigger (e.g. set to 100)
- eventhubs.threadPoolSize (e.g. set to 1)
- eventhubs.operationTimeout (e.g. set to 15 minutes)
However, none of these had any measurable impact on the send rate to the Event Hub; a minimal sketch of how we set them is shown below.
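For reference, this is roughly how the options were passed through `EventHubsConf`; the connection string, hub name, and the exact setter calls are illustrative placeholders rather than our production code:

```scala
import java.time.Duration
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}

// Placeholder connection string and hub name.
val connectionString = ConnectionStringBuilder(
    "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<keyName>;SharedAccessKey=<key>")
  .setEventHubName("<eventhub-name>")
  .build

// The three options we experimented with to slow the producer down.
val ehConf = EventHubsConf(connectionString)
  .setMaxEventsPerTrigger(100)                  // maxEventsPerTrigger
  .setThreadPoolSize(1)                         // eventhubs.threadPoolSize
  .setOperationTimeout(Duration.ofMinutes(15))  // eventhubs.operationTimeout
```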
Additional Info:
We stream from a Delta table; each version usually has ~42,000,000 added rows.
We use the AvailableNow trigger and try to checkpoint; however, the job usually fails before reaching any checkpoint. A condensed sketch of the query is shown below.
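The sketch below shows the overall shape of the query; the table name, checkpoint path, and the JSON body mapping are placeholders, and `ehConf` refers to the configuration from the previous sketch:

```scala
import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.sql.streaming.Trigger

// Stream the Delta table, serialize each row into the single "body"
// column expected by the eventhubs sink, then write with the
// AvailableNow trigger and a checkpoint location.
// ehConf is the EventHubsConf shown in the earlier sketch.
val query = spark.readStream
  .format("delta")
  .table("<source_delta_table>")
  .select(to_json(struct(col("*"))).alias("body"))
  .writeStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .option("checkpointLocation", "/mnt/checkpoints/<job-name>")
  .trigger(Trigger.AvailableNow())
  .start()

query.awaitTermination()
```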
Expected behavior
Adjusting these settings should decrease or increase the throughput when writing to Azure Event Hub.
Please let us know how to configure the Event Hubs writer so that we can send large datasets without failing due to throttling.
Configuration
- Databricks/Spark version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
- spark-eventhubs artifactId and version: com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.22