Description
I have a stream that is receiving about 50 messages per second per shard, and Benthos processes them fast enough that it makes more than 5 GetRecords calls per second per shard (the AWS per-shard rate limit). Despite what the Amazon documentation says, the AWS Go SDK's GetRecords function does not return a ProvisionedThroughputExceededException (or any error) when the API rate limit is hit; it simply returns zero records. The issue is therefore not obvious from the client side, but it can be seen in the CloudWatch metrics.
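For illustration, here is a minimal probe (assuming the AWS SDK for Go v2; `probeShard` is a hypothetical helper, and `client` and `iterator` are assumed to exist) showing why the throttling is invisible client-side:

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/service/kinesis"
)

// probeShard illustrates the ambiguity: under per-shard throttling,
// GetRecords can return err == nil with an empty Records slice, which
// looks exactly like an idle shard.
func probeShard(ctx context.Context, client *kinesis.Client, iterator *string) error {
	out, err := client.GetRecords(ctx, &kinesis.GetRecordsInput{
		ShardIterator: iterator,
	})
	if err != nil {
		// Per the docs, a ProvisionedThroughputExceededException should
		// surface here when throttled -- in practice it often does not.
		return err
	}
	if len(out.Records) == 0 {
		// Empty shard or throttled? Indistinguishable from here; only the
		// ReadProvisionedThroughputExceeded CloudWatch metric tells you.
		fmt.Println("no records returned")
	}
	return nil
}
```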
I have tried various permutations of batch periods and sleep durations in the aws_kinesis batch policy, but I can't find a combination that consistently avoids hitting the rate limit while maintaining throughput.
I believe the aws_kinesis input already backs off when GetRecords returns zero messages, but in this scenario the rate limit has already been hit by that point.
One potential way to handle this is to add a configurable delay before each GetRecords call, defaulting to no delay since it is not always necessary. Typically, adding a 200-250 ms delay should ensure that fewer than 5 GetRecords API calls are made per second for a shard, assuming we are the only consumer. This would increase latency slightly but should not impact throughput.
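As a rough sketch of what I mean, again assuming the AWS SDK for Go v2 and a hypothetical `getRecordsDelay` option (default 0, i.e. today's behavior), not the actual input implementation:

```go
package main

import (
	"context"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/kinesis"
)

// pollShard sleeps for getRecordsDelay before each GetRecords call so
// that a single consumer stays under the 5 calls/second/shard API limit.
// A delay of 200-250 ms yields at most 4-5 calls per second.
func pollShard(ctx context.Context, client *kinesis.Client, iterator *string, getRecordsDelay time.Duration) error {
	for iterator != nil {
		if getRecordsDelay > 0 {
			select {
			case <-time.After(getRecordsDelay):
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		out, err := client.GetRecords(ctx, &kinesis.GetRecordsInput{
			ShardIterator: iterator,
		})
		if err != nil {
			return err
		}
		// ... hand out.Records to the pipeline here ...
		_ = out.Records
		iterator = out.NextShardIterator
	}
	return nil
}
```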