Consumer/Producer Migration Strategies
Introduction
Migrating systems brings an entirely different set of challenges: doing it without downtime, keeping data consistency, and ensuring a seamless experience for customers in production.
This is often where the value of solid engineering practices becomes visible. Reliable testing builds confidence in avoiding regressions, while idempotency can be a great facilitator for a smooth migration.
In this article, I’ll cover common messaging-related migration scenarios that involve Kafka consumers and producers, along with their caveats.
Scenario 1: The Happy Path
The happy path is when a contract changes in a backward-compatible way. In this case, you don’t need to declare a new version of the contract, and the same topic can continue to be used.
Because the change is backward-compatible, existing consumers won’t break when handling the new contract, and they can be upgraded at their own pace.
Common backward-compatible changes include:
- Adding a new field: outdated consumers simply ignore it.
- Adding new values to an enum: outdated consumers ignore them or fall back to defaults.
- Making optional fields mandatory: outdated consumers are already coded to handle the field whether it has a value or is null.
In these cases, consumers aren’t required to upgrade immediately. For example, a service might not care about the new field and can continue ignoring it.
When consumers do upgrade, reusing the same consumer group ID ensures they resume exactly where the previous consumer left off. Kafka’s consumer group metadata tracks the last consumed offset, preventing both reprocessing and message loss.
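To make the offset bookkeeping concrete, here is a minimal in-memory sketch (no real Kafka client or broker; names are illustrative) of how committed offsets per consumer group let an upgraded consumer resume exactly where the old one stopped:

```python
# In-memory stand-in for Kafka's per-group committed offsets.
committed = {}  # (group_id, topic, partition) -> next offset to read

def poll(group_id, topic, partition, log):
    """Read all remaining records for this group, then commit the new position."""
    start = committed.get((group_id, topic, partition), 0)
    records = log[start:]
    committed[(group_id, topic, partition)] = len(log)  # commit new position
    return records

log = ["msg-0", "msg-1", "msg-2"]

# Old consumer (group "orders") processes everything, then is shut down.
assert poll("orders", "order-created", 0, log) == ["msg-0", "msg-1", "msg-2"]

# Two new messages arrive while the upgraded consumer is being deployed.
log += ["msg-3", "msg-4"]

# The upgraded consumer reuses the same group id and resumes at the
# committed offset: no reprocessing, no message loss.
assert poll("orders", "order-created", 0, log) == ["msg-3", "msg-4"]
```

In a real client this is the effect of configuring the same `group.id`; the broker stores the committed offsets, not the consumer process.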
Scenario 2: Breaking Changes
Sometimes breaking changes to a contract are unavoidable. In this scenario, a new version of the contract must be introduced.
Typical breaking changes include:
- Deleting a mandatory field: outdated consumers fail to deserialize.
- Renaming a mandatory field: effectively the same as deleting the old one and adding a new one.
- Changing field types: outdated consumers deserialize incorrectly.
Producing Both Versions
A common approach is to apply the Parallel Change Pattern, making producers emit both the old and new contracts to the same topic during a migration period.
For example, a producer may publish both versions of a message, OrderCreatedV1 and OrderCreatedV2, to an order-created topic. Old consumers continue processing OrderCreatedV1 messages while consumers are incrementally upgraded to handle OrderCreatedV2 messages.
Eventually the producer switches to publishing only OrderCreatedV2 messages, once all consumers are upgraded.
When the new consumer reuses the same consumer group, old and new consumers operate as Competing Consumers.
For each pair of v1 and v2 messages, both messages share the same partition key, so Kafka places them in the same partition, where they are processed by the same consumer instance.
This guarantees that each consumer instance processes only the contract version it supports and ignores the other, so idempotency is not required for deduplication.
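The following sketch (no real Kafka; a simple byte-sum stands in for the client's murmur2 partitioner, and the version fields are illustrative) shows why a shared partition key keeps each v1/v2 pair together, letting the assigned consumer skip the version it doesn't support:

```python
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def produce(key, message):
    # Stand-in for Kafka's key-based partitioner (murmur2 in real clients).
    partitions[sum(key.encode()) % NUM_PARTITIONS].append(message)

def publish_order_created(order_id):
    produce(order_id, {"version": 1, "orderId": order_id})        # OrderCreatedV1
    produce(order_id, {"version": 2, "order": {"id": order_id}})  # OrderCreatedV2

publish_order_created("order-42")

# Both versions of the pair landed in the same partition...
assert sorted(len(p) for p in partitions) == [0, 0, 2]

# ...so the single consumer instance owning that partition sees both
# and processes only the contract version it supports.
SUPPORTED_VERSION = 2
processed = [m for p in partitions for m in p if m["version"] == SUPPORTED_VERSION]
assert processed == [{"version": 2, "order": {"id": "order-42"}}]
```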
⚠️ Caveat: Replaying
Replaying such a topic in the future requires consumers to handle all historic contract versions present in the topic, and to do so idempotently, to avoid redundant processing of different versions of the same message.
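A sketch of that replay scenario, assuming both contract versions carry a shared idempotency key (here a hypothetical `eventId` field):

```python
processed_keys = set()
side_effects = []

def handle(message):
    key = message["eventId"]  # same key for every version of one logical event
    if key in processed_keys:
        return  # already handled under another contract version
    processed_keys.add(key)
    side_effects.append(key)

# A replayed topic containing both historic versions of the same event.
replayed_topic = [
    {"eventId": "evt-1", "version": 1},
    {"eventId": "evt-1", "version": 2},  # same event, newer contract
    {"eventId": "evt-2", "version": 2},
]
for message in replayed_topic:
    handle(message)

assert side_effects == ["evt-1", "evt-2"]  # each logical event processed once
```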
⚠️ Caveat: Dual Writes
Producing two messages for the same event is a form of dual write (read more in the Dual Writes post). Consider this scenario:
- One version may be produced successfully while the other fails.
- Retrying the failed write can cause the previously successful message version to be produced to the topic again.
- Exactly-once guarantees may be compromised unless consumers are idempotent.
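This failure mode can be sketched as follows (no real Kafka; the flaky broker and message shapes are illustrative): v2 fails, the whole publish is naively retried, v1 lands on the topic twice, and an idempotent consumer absorbs the duplicate.

```python
topic = []
v2_attempts = {"count": 0}

def produce(message, fail=False):
    if fail:
        raise RuntimeError("broker timeout")
    topic.append(message)

def publish_both(order_id):
    produce({"v": 1, "id": order_id})
    v2_attempts["count"] += 1
    produce({"v": 2, "id": order_id}, fail=v2_attempts["count"] == 1)

try:
    publish_both("order-7")   # v1 succeeds, v2 fails
except RuntimeError:
    publish_both("order-7")   # naive retry re-produces v1 as well

assert [m["v"] for m in topic] == [1, 1, 2]  # duplicate v1 on the topic

# Idempotent consumer: deduplicate on (version, id) before applying side effects.
seen, applied = set(), []
for m in topic:
    key = (m["v"], m["id"])
    if key not in seen:
        seen.add(key)
        applied.append(m)
assert [m["v"] for m in applied] == [1, 2]
```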
When dual writes become an issue, the alternative is a Producer Hard-switch strategy.
Extra Challenge: Introducing a New Consumer Group
If a new consumer group is introduced, the system behaves as Publisher-Subscriber. In this mode, non-idempotent consumers will redundantly process both versions of messages. If exactly-once guarantees are required, consider either a Consumer Hard-switch or a Producer Hard-switch.
Also be mindful of deployment strategies like Kubernetes Rolling Updates, which can temporarily run old and new consumers in parallel, reproducing this scenario.
Scenario 3: A New Topic Has to Be Introduced
Sometimes it isn’t possible to keep multiple contract versions in the same topic, and a new topic must be created, bringing versioning to topics.
This is necessary when:
- The Kafka Schema Registry subject naming strategy disallows multiple contracts per topic.
- Consumers are external, unclear, or not under your control (e.g., third-party or customer integrations), making it hard to ensure they can handle multiple contract versions.
- The new schema represents a fundamentally different model, and the old topic is no longer suitable.
- The partition key changes. While it's technically possible to change the partition key used for a topic, previously produced messages won't be re-partitioned, so messages can be processed out of order. A new topic is usually recommended.
- The partition count changes. Again, this is technically possible, but unless your system can tolerate out-of-order messages, a new topic should be created.
In these cases, versioning applies to the topics themselves (e.g., order-created vs order-created-v2), and upgrading consumers requires a multi-topic migration plan.
When Consumers are Idempotent
Idempotent consumers allow both topics to be consumed in parallel by their respective consumers, using an idempotency key to detect when a message was already processed, and enabling a gradual migration.
Old consumers continue processing messages from the old topic, while the new consumers are already consuming from the new source.
This approach allows for dark launches to give confidence in the transition. For instance, the new consumer may run in parallel with feature flags to disable side effects, just logging outputs to compare side-by-side old vs. new flows.
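A minimal sketch of such a dark launch (the flag and handler names are hypothetical): the new consumer runs in parallel, but a feature flag disables its side effects, so its output is only recorded for offline comparison against the old flow.

```python
SIDE_EFFECTS_ENABLED = False  # feature flag for the new (v2) consumer

audit_log = []   # dark-launch output, compared side-by-side with the old flow
orders_db = {}   # real side-effect target

def handle_order_created_v2(message):
    result = {"orderId": message["order"]["id"], "status": "created"}
    if SIDE_EFFECTS_ENABLED:
        orders_db[result["orderId"]] = result  # real write, once confident
    else:
        audit_log.append(result)  # observe only; don't touch production state

handle_order_created_v2({"order": {"id": "order-99"}})

assert orders_db == {}  # no production writes while the flag is off
assert audit_log == [{"orderId": "order-99", "status": "created"}]
```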
Lastly, the old topic's consumers can be removed as part of a housekeeping effort. The steps typically look like this:
- Phase out the old topic’s producer so no new messages are added to the old topic.
- Wait until the old topic dries out and all messages are consumed.
- Remove the old topic’s consumers.
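The "dries out" check in the steps above amounts to verifying that consumer lag is zero on every partition. A sketch, with plain numbers standing in for a real admin-API offset lookup:

```python
def is_drained(end_offsets, committed_offsets):
    """True when the group's committed offset has caught up with the end
    offset on every partition, i.e. lag is zero everywhere."""
    return all(
        committed_offsets.get(partition, 0) >= end
        for partition, end in end_offsets.items()
    )

end_offsets = {0: 120, 1: 98, 2: 77}  # per-partition log end offsets

# Still consuming: partition 1 has lag, so keep the old consumers running.
assert not is_drained(end_offsets, {0: 120, 1: 90, 2: 77})

# Fully caught up: safe to remove the old topic's consumers.
assert is_drained(end_offsets, {0: 120, 1: 98, 2: 77})
```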
When Consumers Aren’t Idempotent
Without idempotency, consumers must perform a hard switch. The challenge is ensuring the new consumer group starts processing from the equivalent offset of the old group, without reprocessing or skipping messages.
Read more in the Hard Switches post (coming soon…).
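One possible shape of such a hard switch (a sketch, not the approach from that post: group names are illustrative, and dicts stand in for the Kafka admin API) is to stop the old group, copy its committed offsets to the new group, and only then start the new consumers:

```python
committed = {
    ("old-group", "order-created", 0): 120,
    ("old-group", "order-created", 1): 98,
}

def seed_new_group(old_group, new_group):
    """Copy the old group's committed offsets so the new group resumes there."""
    for (group, topic, partition), offset in list(committed.items()):
        if group == old_group:
            committed[(new_group, topic, partition)] = offset

# 1. Old consumers are stopped first (no further commits for old-group).
# 2. Seed the new group from the old group's last committed positions.
seed_new_group("old-group", "new-group")

# 3. New consumers start exactly where the old group left off:
#    no reprocessing, no skipped messages.
assert committed[("new-group", "order-created", 0)] == 120
assert committed[("new-group", "order-created", 1)] == 98
```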
Rollbacks
Every migration plan should also include a rollback plan in case issues are discovered. Rollback plans should be as solid and well-tested as the migration plans themselves, since they can be just as complex.
For example, if a Scenario 3 migration involves:
- Switching producers to a new topic
- Switching consumers to the new topic
Then a rollback requires the reverse:
- Switching producers back to the old topic
- Switching consumers back to the old topic
Delivery guarantees also apply in reverse. Before rolling consumers back, you may need to wait until the new topic is drained and all messages produced to it are processed, ensuring no loss.
Conclusion
Idempotency is a powerful tool for simplifying migrations, particularly for highly available systems where downtime is a major concern.
It allows for parallel processing, safe retries, gradual migrations, and more resilient rollbacks. Without it, migrations may require hard switches, which involve a risky choreography of producer and consumer cutovers and usually lead to downtime.
If you're designing a highly available system today, particularly one that requires stronger consistency, bake idempotency in from the start; your future self will thank you when migrations inevitably arrive.