What determines Kafka consumer offset?
Furthermore, there's offsets.retention.minutes. If the time since the last commit is > offsets.retention.minutes, then auto.offset.reset also kicks in.
I am relatively new to Kafka. I have done a bit of experimenting with it, but a few things are unclear to me regarding consumer offset. From what I have understood so far, when a consumer starts, the offset it will start reading from is determined by the configuration setting auto.offset.reset (correct me if I am wrong).
Now say for example that there are 10 messages (offsets 0 to 9) in the topic, and a consumer happened to consume 5 of them before it went down (or before I killed the consumer). Then say I restart that consumer process. My questions are:
1. If auto.offset.reset is set to smallest, is it always going to start consuming from offset 0?
2. If auto.offset.reset is set to largest, is it going to start consuming from offset 5?
3. Is the behaviour regarding this kind of scenario always deterministic?
Please don't hesitate to comment if anything in my question is unclear. Thanks in advance.
It is a bit more complex than you described. The auto.offset.reset config kicks in ONLY if your consumer group does not have a valid offset committed somewhere (the two offset stores currently supported are Kafka and ZooKeeper). It also depends on what sort of consumer you use.
If you use a high-level Java consumer, then imagine the following scenarios:
1. You have a consumer in a consumer group group1 that has consumed 5 messages and died. Next time you start this consumer, it won't even use the auto.offset.reset config and will continue from the place where it died, because it will just fetch the stored offset from the offset storage (Kafka or ZK, as I mentioned).
2. You have messages in a topic (like you described) and you start a consumer in a new consumer group group2. There is no offset stored anywhere, and this time the auto.offset.reset config will decide whether to start from the beginning of the topic (smallest) or from the end of the topic (largest).
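The two scenarios above can be sketched as a small decision function. This is only a simplified model, not Kafka's actual code: the class StartOffsetModel and the method startOffset are hypothetical names, and it assumes group1 committed its offset (5, the next message to read) before it died.

```java
import java.util.OptionalLong;

// Simplified model of the decision described above; not Kafka's real implementation.
final class StartOffsetModel {
    // Hypothetical helper: picks the offset a consumer in a group would start from.
    // logStart is the first available offset, logEnd is the offset the next message will get.
    static long startOffset(OptionalLong committed, String autoOffsetReset,
                            long logStart, long logEnd) {
        if (committed.isPresent()) {
            // A stored offset wins; auto.offset.reset is not consulted at all.
            return committed.getAsLong();
        }
        switch (autoOffsetReset) {
            case "smallest": return logStart; // beginning of the topic
            case "largest":  return logEnd;   // end of the topic
            default: throw new IllegalArgumentException("unknown policy: " + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        // The topic holds offsets 0..9, so the log starts at 0 and the next offset is 10.
        System.out.println(startOffset(OptionalLong.of(5), "smallest", 0, 10));   // group1: 5
        System.out.println(startOffset(OptionalLong.empty(), "smallest", 0, 10)); // group2: 0
        System.out.println(startOffset(OptionalLong.empty(), "largest", 0, 10));  // group2: 10
    }
}
```

Note that with largest, a new group starts at offset 10 and only sees messages produced after it joined, not the existing ones.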
One more thing that affects which offset values the smallest and largest configs correspond to is the log retention policy. Imagine you have a topic with retention configured to 1 hour. You produce 5 messages, and then an hour later you post 5 more messages. The largest offset will still be the same as in the previous example, but the smallest one can no longer be 0, because Kafka will already have removed those messages, and thus the smallest available offset will be 5.
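The retention example can be illustrated with a toy model. It is a deliberate simplification: real Kafka deletes whole log segments some time after they expire, not individual messages, and the class RetentionModel and the method smallestAvailable are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of time-based retention; real Kafka works on log segments, not single messages.
final class RetentionModel {
    // Hypothetical helper: returns the smallest offset still available at time nowMs.
    // timestampsMs.get(i) is the produce time of the message at offset i.
    static long smallestAvailable(List<Long> timestampsMs, long retentionMs, long nowMs) {
        for (int offset = 0; offset < timestampsMs.size(); offset++) {
            if (nowMs - timestampsMs.get(offset) < retentionMs) {
                return offset; // first message that is still young enough to be retained
            }
        }
        return timestampsMs.size(); // everything expired: log start equals log end
    }

    public static void main(String[] args) {
        long hour = 60 * 60 * 1000L;
        List<Long> produced = new ArrayList<>();
        for (int i = 0; i < 5; i++) produced.add(0L);       // offsets 0..4 at t = 0
        for (int i = 0; i < 5; i++) produced.add(hour + 1); // offsets 5..9 just over an hour later
        long now = hour + 2;
        System.out.println(smallestAvailable(produced, hour, now)); // 5
        System.out.println(produced.size());                         // 10, the end is unchanged
    }
}
```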
Everything mentioned above does not apply to SimpleConsumer: every time you run it, it will decide where to start from using the auto.offset.reset config.
Just an update: from Kafka 0.9 onward, Kafka uses a new Java version of the consumer, and the values of the auto.offset.reset parameter have changed. From the manual:
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest : automatically reset the offset to the earliest offset
latest : automatically reset the offset to the latest offset
none : throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
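As a rough sketch of what this means for configuration, the snippet below builds plain java.util.Properties with the new consumer's config keys and translates the old value names to the new ones. The class NewConsumerConfigSketch and its methods are hypothetical names, and the broker address and group name are placeholders.

```java
import java.util.Properties;

final class NewConsumerConfigSketch {
    // Maps the pre-0.9 value names to the ones quoted above.
    static String toNewResetValue(String oldValue) {
        switch (oldValue) {
            case "smallest": return "earliest";
            case "largest":  return "latest";
            default:         return oldValue; // already a new-style value, e.g. "none"
        }
    }

    // Builds consumer properties; the broker address and group id are placeholders.
    static Properties consumerProps(String resetPolicy) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "group2");
        props.setProperty("auto.offset.reset", toNewResetValue(resetPolicy));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps("smallest").getProperty("auto.offset.reset")); // earliest
    }
}
```

To actually consume, these properties would also need key.deserializer and value.deserializer entries and would be passed to a KafkaConsumer from the kafka-clients library, which is left out here so the sketch runs with the JDK alone.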
It took me some time to find this after checking the accepted answer, so I thought it might be useful for the community to post it.