
[Debezium] Rebalancing infinite loop issue

by 연습장이 2024. 3. 1.

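Situation

I was running two Debezium Kafka Connect workers on a single node.

The two workers, Debezium A and B, had group.id set to 1 and 2 respectively, and the three internal topics (status, config, offset) were likewise set to 1 and 2 for each.

I unified group.id into a single value, but the internal topic names were still different for each worker.

In that state I took the Debezium A worker down and brought it back up, and the log below kept repeating.

Error log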

[2024-02-14 16:59:59,982] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2024-02-14 16:59:59,982] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
[2024-02-14 16:59:59,983] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully joined group with generation Generation{generationId=1356445, memberId='connect-1-526e6b7a-79a0-466c-9955-8dcbf8803068', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:596)
[2024-02-14 16:59:59,984] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully synced group in generation Generation{generationId=1356445, memberId='connect-1-526e6b7a-79a0-466c-9955-8dcbf8803068', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:760)
[2024-02-14 16:59:59,984] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Joined group at generation 1356445 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-47d0514f-b9e6-4176-b4e9-7d9fbbf19ef9', leaderUrl='http://xxx.xxx.xxx.xxx:0000/', offset=1291, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1699)
[2024-02-14 16:59:59,984] WARN [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2024-02-14 16:59:59,985] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset 1232 is behind group assignment 1291, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2024-02-14 16:59:59,988] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: 1232 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2024-02-14 16:59:59,988] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset 1232 does not match group assignment 1291. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)
[2024-02-14 16:59:59,988] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2024-02-14 16:59:59,988] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
[2024-02-14 16:59:59,990] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully joined group with generation Generation{generationId=1356445, memberId='connect-1-526e6b7a-79a0-466c-9955-8dcbf8803068', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:596)
[2024-02-14 16:59:59,992] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully synced group in generation Generation{generationId=1356445, memberId='connect-1-526e6b7a-79a0-466c-9955-8dcbf8803068', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:760)
[2024-02-14 16:59:59,992] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Joined group at generation 1356445 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-47d0514f-b9e6-4176-b4e9-7d9fbbf19ef9', leaderUrl='http://xxx.xxx.xxx.xxx:0000/', offset=1291, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1699)
[2024-02-14 16:59:59,992] WARN [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2024-02-14 16:59:59,992] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset 1232 is behind group assignment 1291, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2024-02-14 16:59:59,996] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: 1232 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2024-02-14 16:59:59,996] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset 1232 does not match group assignment 1291. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)

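That seemed odd, so I took B down and brought it back up, and this time the same thing happened on B. The log above just kept repeating. At first I assumed a rebalance was simply in progress, but if this were a normal rebalance, shouldn't some of the numbers change? Above all, the message below looked strange.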

 

Current config state offset 1232 does not match group assignment 1291.

 

In other words, the assignment handed out to the group is at config offset 1291, while this worker is at 1232. Googling suggested switching to a fresh config topic, so I changed it from config2 to config3.

[2024-02-14 17:14:07,538] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
[2024-02-14 17:14:07,539] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully joined group with generation Generation{generationId=1356448, memberId='connect-1-e67f2cde-48bb-40ee-8ed4-36d7c9154c0c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:596)
[2024-02-14 17:14:07,540] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully synced group in generation Generation{generationId=1356448, memberId='connect-1-e67f2cde-48bb-40ee-8ed4-36d7c9154c0c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:760)
[2024-02-14 17:14:07,541] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Joined group at generation 1356448 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-47d0514f-b9e6-4176-b4e9-7d9fbbf19ef9', leaderUrl='http://192.168.224.204:8083/', offset=1291, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1699)
[2024-02-14 17:14:07,541] WARN [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2024-02-14 17:14:07,541] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset -1 is behind group assignment 1291, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2024-02-14 17:14:07,543] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2024-02-14 17:14:07,544] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset -1 does not match group assignment 1291. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)
[2024-02-14 17:14:07,544] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2024-02-14 17:14:07,544] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)
[2024-02-14 17:14:07,545] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully joined group with generation Generation{generationId=1356448, memberId='connect-1-e67f2cde-48bb-40ee-8ed4-36d7c9154c0c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:596)
[2024-02-14 17:14:07,546] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully synced group in generation Generation{generationId=1356448, memberId='connect-1-e67f2cde-48bb-40ee-8ed4-36d7c9154c0c', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:760)
[2024-02-14 17:14:07,547] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Joined group at generation 1356448 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-47d0514f-b9e6-4176-b4e9-7d9fbbf19ef9', leaderUrl='http://192.168.224.204:8083/', offset=1291, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1699)
[2024-02-14 17:14:07,547] WARN [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2024-02-14 17:14:07,547] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset -1 is behind group assignment 1291, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2024-02-14 17:14:07,549] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2024-02-14 17:14:07,549] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset -1 does not match group assignment 1291. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)
[2024-02-14 17:14:07,549] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Rebalance started (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator:221)
[2024-02-14 17:14:07,549] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:540)

Before the change the offset was 1232; this time it is -1.
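The stuck state in both log excerpts can be spotted mechanically: the same pair of offsets shows up again and again in the "Forcing rebalance" message. Below is a minimal sketch, assuming the worker log is available as a list of lines. The function name `find_stuck_offsets` and the `min_repeats` threshold are made up for illustration; only the message text comes from the logs above.

```python
import re
from collections import Counter

# Matches the DistributedHerder message seen in the logs above.
MISMATCH = re.compile(
    r"Current config state offset (-?\d+) does not match "
    r"group assignment (-?\d+)\. Forcing rebalance"
)


def find_stuck_offsets(log_lines, min_repeats=2):
    """Return {(local_offset, group_offset): count} for every offset pair
    that appears in at least `min_repeats` 'Forcing rebalance' messages."""
    counts = Counter()
    for line in log_lines:
        m = MISMATCH.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_repeats}


# Two lines shaped like the ones in the first log excerpt.
sample = [
    "[2024-02-14 16:59:59,988] INFO [Worker clientId=connect-1, "
    "groupId=testpc-source-connect-cluster] Current config state offset 1232 "
    "does not match group assignment 1291. Forcing rebalance. "
    "(org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)",
    "[2024-02-14 16:59:59,996] INFO [Worker clientId=connect-1, "
    "groupId=testpc-source-connect-cluster] Current config state offset 1232 "
    "does not match group assignment 1291. Forcing rebalance. "
    "(org.apache.kafka.connect.runtime.distributed.DistributedHerder:1154)",
]

print(find_stuck_offsets(sample))  # {(1232, 1291): 2}
```

A pair that keeps repeating without the local offset moving (1232 staying 1232, or -1 staying -1) is the sign that the worker is not making progress toward the group's offset.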

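That's when it clicked. Ah... workers in the same group have to share the offset of the same config topic, but since they were using different config topics they could never agree! The internal topics have to be unified as well!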

So I set all three of worker B's internal topics to the same names as worker A's.
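This kind of mismatch can be caught before starting the workers by comparing their property files. The sketch below is only an illustration: the four keys are the standard Kafka Connect distributed worker settings, but the topic names in the sample are hypothetical (the post does not give the real ones), and the parser only handles plain `key=value` lines.

```python
# Settings that every worker in one Connect group must agree on.
SHARED_KEYS = (
    "group.id",
    "config.storage.topic",
    "offset.storage.topic",
    "status.storage.topic",
)


def parse_properties(text):
    """Parse simple key=value lines; blank lines and comments are skipped.
    (Simplified: the full .properties format allows other separators too.)"""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def mismatched_keys(props_a, props_b):
    """Return the shared keys whose values differ between two workers."""
    return [k for k in SHARED_KEYS if props_a.get(k) != props_b.get(k)]


# Hypothetical topic names; only the group.id value appears in the logs.
worker_a = """
group.id=testpc-source-connect-cluster
config.storage.topic=connect-configs-1
offset.storage.topic=connect-offsets-1
status.storage.topic=connect-status-1
"""
worker_b = """
group.id=testpc-source-connect-cluster
config.storage.topic=connect-configs-2
offset.storage.topic=connect-offsets-2
status.storage.topic=connect-status-2
"""

print(mismatched_keys(parse_properties(worker_a), parse_properties(worker_b)))
# ['config.storage.topic', 'offset.storage.topic', 'status.storage.topic']
```

An empty list means the two workers agree on the group and on all three internal topics, which is the state reached in the next log.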

[2024-02-14 17:14:51,480] INFO REST resources initialized; server is started and ready to handle requests (org.apache.kafka.connect.runtime.rest.RestServer:319)
[2024-02-14 17:14:51,480] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:57)
[2024-02-14 17:14:54,798] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully joined group with generation Generation{generationId=1356449, memberId='connect-1-567a6326-0077-4ddc-8342-4889adfef84f', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:596)
[2024-02-14 17:14:54,801] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Successfully synced group in generation Generation{generationId=1356449, memberId='connect-1-567a6326-0077-4ddc-8342-4889adfef84f', protocol='sessioned'} (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:760)
[2024-02-14 17:14:54,803] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Joined group at generation 1356449 with protocol version 2 and got assignment: Assignment{error=0, leader='connect-1-47d0514f-b9e6-4176-b4e9-7d9fbbf19ef9', leaderUrl='http://xxx.xxx.xxx:xxxx/', offset=1291, connectorIds=[], taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1699)
[2024-02-14 17:14:54,804] WARN [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1119)
[2024-02-14 17:14:54,804] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Current config state offset -1 is behind group assignment 1291, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1183)
[2024-02-14 17:14:54,810] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: 1291 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1190)
[2024-02-14 17:14:54,810] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Starting connectors and tasks using config offset 1291 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1244)
[2024-02-14 17:14:54,810] INFO [Worker clientId=connect-1, groupId=testpc-source-connect-cluster] Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1272)

This time the worker caught up to offset 1291 without any conflict.

What happens if this isn't fixed?

 

You can't send commands to any connector in that Connect group. Every request ends in a timeout.
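One way to notice this from the outside is to probe the Connect REST API with a short client-side timeout. The following is a rough sketch, not a monitoring tool: it assumes the worker's REST listener is reachable at a base URL such as http://localhost:8083 (a placeholder), and the function name and the `fetch` parameter are hypothetical, added so the check can be exercised without a live cluster.

```python
import urllib.request


def connect_rest_responds(base_url, timeout_s=5.0, fetch=urllib.request.urlopen):
    """Return True if GET {base_url}/connectors answers HTTP 200 within
    timeout_s seconds, False on a timeout, connection error, or HTTP error."""
    url = base_url.rstrip("/") + "/connectors"
    try:
        with fetch(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Example call (placeholder address, not executed here):
#   connect_rest_responds("http://localhost:8083")
```

A worker that is stuck in the loop above would be expected to return False here, while a healthy one returns True.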

 

In short, this is an outage, so it needs to be fixed quickly.
