테키테크 TEKITECH
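Notes on the trouble I ran into while installing ZooKeeper and Kafka with Ansible
1. First error
I used the Ansible scripts provided with the book <실전 카프카 개발부터 운영까지>. I had built all the servers that ZooKeeper would be installed on and confirmed that ssh connections with the rsa key worked, yet the run ended with an UNREACHABLE! error, as shown below.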
[yt.lim@teki-ansible01 ansible_playbook]$ ansible-playbook -i hosts zookeeper.yml
PLAY [zkhosts] ********************************************************************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ************************************************************************************************************************************************************************************************************************************************************
The authenticity of host 'teki-zk01.foo.bar (10.128.0.19)' can't be established.
ECDSA key fingerprint is SHA256:FxmUkVb6V+MZAe3ejfuY3Ct8nKkh14NdGwDSJdSUQNU.
ECDSA key fingerprint is MD5:a5:eb:fa:ba:9f:de:ee:2a:c8:5e:c2:b8:55:48:25:b3.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'teki-zk02.foo.bar (10.128.0.20)' can't be established.
ECDSA key fingerprint is SHA256:vqByGQ5aS473lgjxxTredOfvKREh1mNLxmDrDRsdKeE.
ECDSA key fingerprint is MD5:c8:b3:27:ce:bc:e4:72:8f:61:df:a8:e3:6a:21:0a:d3.
Are you sure you want to continue connecting (yes/no)? The authenticity of host 'teki-zk03.foo.bar (10.128.0.21)' can't be established.
ECDSA key fingerprint is SHA256:9aayJTRBt0DKjcrLoXY9ST7t4kxPdL9oPQuP5Za43qM.
ECDSA key fingerprint is MD5:a4:ca:81:86:26:1a:2a:f1:aa:3d:46:69:d1:79:f5:58.
Are you sure you want to continue connecting (yes/no)? yes
Please type 'yes' or 'no': yes
Please type 'yes' or 'no': yes
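The error message reads "Failed to connect to the host via ssh: Host key verification failed." I had registered the hosts and the ssh keys and had even tested the connections, so I could not tell what was wrong. Looking at the output again, I noticed the messages printed right after the run started. None of the servers were registered in known hosts yet, so each one needed a yes typed in, but all three servers prompted at the same time and the input apparently did not reach the prompts properly. I simply ran the playbook two or three more times, registering the hosts as I went, until every host came back ok. I thought everything was solved at that point.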
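One way to avoid the interleaved yes/no prompts is to register every host key before the first playbook run. The sketch below is only an illustration: it assumes the hosts file is an INI-style inventory with one host per line, and inventory_hosts is a made-up helper name, not something from the book's scripts.

```shell
# Hypothetical helper: print the unique host names found in an INI-style
# Ansible inventory. Blank lines, [group] headers, and # or ; comments are
# skipped. Assumption: the inventory has no [group:vars] sections, since
# their key=value lines would be printed as if they were hosts.
inventory_hosts() {
  # $1 = path to the inventory file
  awk 'NF && $1 !~ /^(\[|#|;)/ { print $1 }' "$1" | sort -u
}

# Usage (not executed here): fetch each host key once, up front.
#   ssh-keyscan -H $(inventory_hosts hosts) >> ~/.ssh/known_hosts
```

The commented ssh-keyscan line accepts whatever key each server presents, so it is worth comparing the fingerprints with the ones shown in the prompts above. Setting ANSIBLE_HOST_KEY_CHECKING=False would also silence the prompts, but it turns host key verification off entirely.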
2. Second error
Things were going well until an error suddenly said that the zookeeper-server service could not be found on the host.
fatal: [teki-zk03.foo.bar]: FAILED! => {"changed": false, "msg": "Could not find the requested service zookeeper-server: host"}
I tried a daemon reload, but that did not fix it; the error message just gained one more sentence. It now says to check systemctl status zookeeper-server.service and journalctl -xe to see the logs.
[yt.lim@teki-ansible01 ansible_playbook]$ systemctl daemon-reload
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: root
Password:
==== AUTHENTICATION COMPLETE ===
[yt.lim@teki-ansible01 ansible_playbook]$
[yt.lim@teki-ansible01 ansible_playbook]$
[yt.lim@teki-ansible01 ansible_playbook]$ ansible-playbook -i hosts zookeeper.yml
...
TASK [zookeeper : make sure a service is running] ***************************************************************************************************************************************************************************************************************************************************
fatal: [teki-zk03.foo.bar]: FAILED! => {"changed": false, "msg": "Unable to start service zookeeper-server: Job for zookeeper-server.service failed because the control process exited with error code. See \"systemctl status zookeeper-server.service\" and \"journalctl -xe\" for details.\n"}
fatal: [teki-zk01.foo.bar]: FAILED! => {"changed": false, "msg": "Unable to start service zookeeper-server: Job for zookeeper-server.service failed. See \"systemctl status zookeeper-server.service\" and \"journalctl -xe\" for details.\n"}
fatal: [teki-zk02.foo.bar]: FAILED! => {"changed": false, "msg": "Unable to start service zookeeper-server: Job for zookeeper-server.service failed. See \"systemctl status zookeeper-server.service\" and \"journalctl -xe\" for details.\n"}
PLAY RECAP ******************************************************************************************************************************************************************************************************************************************************************************************
teki-zk01.foo.bar : ok=17 changed=4 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
teki-zk02.foo.bar : ok=17 changed=4 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
teki-zk03.foo.bar : ok=17 changed=4 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
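I looked at the logs as the message told me to, and they said PAM authentication had failed with '암호 서비스로 사전 확인 실패' (Failed preliminary check by password service). On a hunch I looked through the Ansible script folder and found that the domain and host names were set incorrectly. The book names its servers peter-zk01 and so on, and I had changed them to names like teki-zk01, so they no longer matched. I had edited the hosts file and a few yml files but missed some occurrences, and that is what caused the problem. I searched carefully with grep and replaced every one.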
3월 04 07:20:00 teki-ansible01 su[19839]: (to root) yt.lim on pts/0
3월 04 07:20:00 teki-ansible01 su[19839]: pam_unix(su:session): session opened for user root by yt.lim(uid=0)
3월 04 07:20:09 teki-ansible01 passwd[19851]: pam_pwquality(passwd:chauthtok): pam_get_authtok_verify returned error: 암호 서비스로 사전 확인 실패
3월 04 07:20:13 teki-ansible01 passwd[19851]: pam_unix(passwd:chauthtok): password changed for root
3월 04 07:20:23 teki-ansible01 passwd[19855]: pam_unix(passwd:chauthtok): password changed for root
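The grep pass mentioned above could look roughly like the following. This is a sketch under assumptions: find_leftover_names is a hypothetical helper, and peter- is taken as the old prefix because the book names its servers peter-zk01 and so on.

```shell
# Hypothetical helper: list every line under a directory that still contains
# the old host name prefix. Output is "file:line:text". The exit status is
# non-zero when no occurrence is left, which is the state we want to reach.
find_leftover_names() {
  # $1 = old prefix to look for, $2 = directory to search (default: .)
  grep -rn -- "$1" "${2:-.}"
}

# Usage (not executed here), from inside the playbook directory:
#   find_leftover_names 'peter-' .
```

Running it again after each round of edits gives a quick check that the hosts file, the yml files, and any templates all use the new names.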
This time it really worked! That wrapped up the ZooKeeper installation. Installing Kafka follows the same process, so I hoped it would finish without any problems.
Full output
[yt.lim@teki-ansible01 ansible_playbook]$ ansible-playbook -i hosts zookeeper.yml
PLAY [zkhosts] **************************************************************************************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk03.foo.bar]
TASK [common : Set timezone to Asia/Seoul] **********************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
TASK [common : install Java and tools] **************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk01.foo.bar]
TASK [common : copy krb5 conf] **********************************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
changed: [teki-zk03.foo.bar]
TASK [add the group zookeeper] **********************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk01.foo.bar]
TASK [add the user zookeeper] ***********************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk01.foo.bar]
TASK [stop zookeeper-server] ************************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk01.foo.bar]
TASK [zookeeper : remove directory zk] **************************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk03.foo.bar]
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
TASK [make dir zookeeper] ***************************************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
changed: [teki-zk03.foo.bar]
TASK [download zookeeper from web] ******************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
TASK [unarchive zookeeper] **************************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk03.foo.bar]
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
TASK [setup link zookeeper] *************************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk03.foo.bar]
TASK [copy zookeeper server conf files] *************************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk02.foo.bar]
changed: [teki-zk01.foo.bar]
changed: [teki-zk03.foo.bar]
TASK [zookeeper : create myid] **********************************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk03.foo.bar]
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
TASK [zookeeper : change file ownership, group and permissions] *************************************************************************************************************************************************************************************************************************************
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
changed: [teki-zk03.foo.bar]
TASK [copy zookeeper server in systemd] *************************************************************************************************************************************************************************************************************************************************************
ok: [teki-zk01.foo.bar]
ok: [teki-zk02.foo.bar]
ok: [teki-zk03.foo.bar]
TASK [zookeeper : just force systemd to reload configs] *********************************************************************************************************************************************************************************************************************************************
ok: [teki-zk01.foo.bar]
ok: [teki-zk03.foo.bar]
ok: [teki-zk02.foo.bar]
TASK [zookeeper : make sure a service is running] ***************************************************************************************************************************************************************************************************************************************************
changed: [teki-zk01.foo.bar]
changed: [teki-zk02.foo.bar]
changed: [teki-zk03.foo.bar]
PLAY RECAP ******************************************************************************************************************************************************************************************************************************************************************************************
teki-zk01.foo.bar : ok=18 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
teki-zk02.foo.bar : ok=18 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
teki-zk03.foo.bar : ok=18 changed=7 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
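3. I thought I was done, but here came a third error
Having dug myself out of a hole once already, I started the Kafka installation expecting it to go smoothly. Partway through, it again reported that the service could not be found on the host. Worse, the task was counted as ignored rather than failed, which left me uneasy. Just in case, I went to the Kafka servers to check, and Kafka had not been installed.
# Error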
TASK [stop kafka-server] ****************************************************************************************************************************************************************************************************************************************************************************
fatal: [teki-kafka02.foo.bar]: FAILED! => {"changed": false, "msg": "Could not find the requested service kafka-server: host"}
...ignoring
fatal: [teki-kafka03.foo.bar]: FAILED! => {"changed": false, "msg": "Could not find the requested service kafka-server: host"}
...ignoring
fatal: [teki-kafka01.foo.bar]: FAILED! => {"changed": false, "msg": "Could not find the requested service kafka-server: host"}
...ignoring
# Result
PLAY RECAP ******************************************************************************************************************************************************************************************************************************************************************************************
teki-kafka01.foo.bar : ok=15 changed=10 unreachable=0 failed=0 skipped=0 rescued=0 ignored=1
teki-kafka02.foo.bar : ok=15 changed=10 unreachable=0 failed=0 skipped=0 rescued=0 ignored=1
teki-kafka03.foo.bar : ok=15 changed=10 unreachable=0 failed=0 skipped=0 rescued=0 ignored=1
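Because an ignored failure still leaves failed=0 in the recap, it is easy to miss. As a rough sketch, the saved playbook output can be scanned for any host whose failed, unreachable, or ignored counter is above zero. recap_problems is a hypothetical helper name, and kafka.yml and run.log in the usage comment are assumed file names.

```shell
# Hypothetical helper: read ansible-playbook output on stdin and print
# "host counter=value" for every non-zero failed, unreachable, or ignored
# counter in the PLAY RECAP section. Prints nothing when the run was clean.
recap_problems() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && NF {
      host = $1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable" || kv[1] == "ignored") && kv[2] + 0 > 0)
          print host, $i
      }
    }
  '
}

# Usage (not executed here; kafka.yml and run.log are assumed names):
#   ansible-playbook -i hosts kafka.yml | tee run.log
#   recap_problems < run.log
```

For the recap above, this would flag ignored=1 on each of the three Kafka hosts.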
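Googling turned up a Stack Overflow answer saying that a daemon reload fixes this. I ran it and tried again, and the problem was gone. My guess is that the earlier configuration changes, together with installing ZooKeeper, had left the settings in a tangled state.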
[yt.lim@teki-ansible01 ansible_playbook]$ systemctl daemon-reload
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: root
Password:
==== AUTHENTICATION COMPLETE ===
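The transcript above runs the reload on teki-ansible01, the control node. If the stale unit state is suspected to be on the servers themselves, an Ansible ad-hoc call to the systemd module can reload every host in a group at once. This is a sketch, not part of the book's scripts: reload_systemd_on_group is a hypothetical helper, and it assumes the remote user is allowed to become root.

```shell
# Hypothetical helper: run "systemctl daemon-reload" on every host of an
# inventory group through Ansible's systemd module (-b escalates to root).
# With DRY_RUN=1 the command is only printed, not executed.
reload_systemd_on_group() {
  group="$1"                 # inventory group, e.g. zkhosts
  inventory="${2:-hosts}"    # inventory file, defaults to ./hosts
  set -- ansible "$group" -i "$inventory" -b -m systemd -a "daemon_reload=yes"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (not executed here):
#   DRY_RUN=1 reload_systemd_on_group zkhosts
```

The group name for the Kafka servers does not appear in the output above, so substitute whatever name the inventory actually uses.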
From now on, even when it is a hassle, I will check everything properly so I do not end up doing the work twice T_T