
Hadoop HDFS(3.3)+Spark(3.1.1)! Just Follow Along #2

mini_world 2021. 4. 25. 20:33

 

This post continues from the previous one. 😘

Hadoop HDFS(3.3)+Spark(3.1.1)! Just Follow Along #1

 



 

In the previous post we created a single EC2 instance, installed all the required software on it, and edited the environment variables and configuration files.
We then turned that instance into an AMI image and cloned it, giving us four instances in total!

In this post we configure each instance for its role (Master or Worker) and then start the cluster!!

 

1. Configuring the Master server

In this step we change the Master server's hostname and edit its hosts file.
Do this only on the one server that will act as the Master.

1) Changing the hostname

Set the host's name to master.

$ sudo hostnamectl set-hostname master

 

2) Editing the hosts file

Edit the hosts file. It must define the IP address and hostname of every node: Master and Worker01~03.

$ sudo vim /etc/hosts
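For illustration, a complete set of entries might look like the sketch below. The 10.0.0.x addresses are made-up placeholders (they do not come from this setup), so substitute the private IPs of your own four instances. The snippet only writes to a temporary file, so the entries can be reviewed before anything is appended to /etc/hosts.

```shell
# Hypothetical private IPs: replace them with your instances' real addresses.
hosts_fragment="$(mktemp)"
cat > "$hosts_fragment" <<'EOF'
10.0.0.10 master
10.0.0.11 worker01
10.0.0.12 worker02
10.0.0.13 worker03
EOF

# Review the entries before applying them.
cat "$hosts_fragment"

# To apply (commented out on purpose):
# sudo tee -a /etc/hosts < "$hosts_fragment"
```

Keeping the block in one file also makes it easy to paste the identical entries onto every node.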

 


2. Configuring the Worker 1~3 servers

In this step we change the hostnames of the Worker01~03 servers and edit their hosts files.

1) Changing the hostname

I'm on a Mac and use the iTerm terminal to send commands to several sessions at once.
If you're on Windows and it's for personal use, you can download a tool like Xshell and work the same way.

# Enter worker01, worker02, or worker03 on the matching host!
$ sudo hostnamectl set-hostname worker01

 
2) Editing the hosts file

$ sudo vim /etc/hosts
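Since every node needs the same four names, a quick check can catch a missed entry. The helper below is a hypothetical sketch (missing_hosts is not a standard tool): it scans a hosts-format file and prints any of the given names that have no entry.

```shell
# missing_hosts FILE NAME...: print each NAME that has no entry in FILE.
missing_hosts() {
  file="$1"
  shift
  for name in "$@"; do
    # Skip comment lines; match NAME exactly against the hostname columns.
    if ! awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit found ? 0 : 1 }
    ' "$file"; then
      echo "$name"
    fi
  done
}

# Usage on each node (prints nothing when all four names are defined):
# missing_hosts /etc/hosts master worker01 worker02 worker03
```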

 


3. Exchanging SSH keys

Perform this step identically on every node.
※ Caution: to keep the setup simple, this step applies several settings that weaken security.
In a real production environment, the root account's protection and the SSH key exchange done here can be weak points, so take care with these settings.

1) Changing the SSH configuration

Before starting, switch to the root account.

$ sudo su

Now edit the SSH configuration file.

# vim /etc/ssh/sshd_config

 

Uncomment "PermitRootLogin" on line 38.

Change "PasswordAuthentication" on line 65 to yes. (The line numbers may differ slightly depending on your image.)
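If you would rather script the two edits than make them by hand in vim, a sed-based sketch could look like this. It assumes the uncommented PermitRootLogin value should be yes, and the function name is hypothetical. It prints the modified text instead of changing the file, and is tried here on a small sample rather than the live sshd_config.

```shell
# enable_root_password_login FILE: print FILE with both settings switched on.
enable_root_password_login() {
  sed -E \
    -e 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
    -e 's/^#?[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' \
    "$1"
}

# Try it on a small sample instead of the live file.
sample="$(mktemp)"
printf '%s\n' 'Port 22' '#PermitRootLogin yes' 'PasswordAuthentication no' > "$sample"
enable_root_password_login "$sample"
```

On a real node you could redirect the output to a temporary copy, compare it with the original, and check it with sshd -t -f <copy> before putting it in place.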

After saving, restart the ssh daemon so it reloads the configuration file, then set a password for the root account.

# systemctl restart sshd
# passwd

 

2) Generating an SSH key

Generate the SSH key. (Just press Enter at every prompt!)

# ssh-keygen
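Pressing Enter at every prompt accepts the default key path and an empty passphrase. A scripted equivalent, as a rough sketch, passes those choices with -N and -f. The throwaway directory and the explicit rsa key type are assumptions made for this example so that no existing key is overwritten.

```shell
# Generate a key pair with an empty passphrase, without any prompts.
# A throwaway directory is used here so an existing ~/.ssh key is never touched.
key_dir="$(mktemp -d)"
ssh-keygen -q -t rsa -N "" -f "$key_dir/id_rsa"

# The private key and its .pub counterpart should both exist now.
ls "$key_dir"
```

On the cluster nodes the target would be root's default location (for example /root/.ssh/id_rsa) instead of the temporary directory.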

 

3) Exchanging the SSH keys

Now exchange the SSH keys. Run the same commands on all four nodes; each one asks for the target's root password once.

# ssh-copy-id root@master
# ssh-copy-id root@worker01
# ssh-copy-id root@worker02
# ssh-copy-id root@worker03

If everything is set up correctly, connecting with something like ssh worker01 should log you straight in without asking for a password.
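To check all four targets in one go, a small loop along these lines may help. It is only a sketch: verify_passwordless is a hypothetical helper name, and the 5-second timeout is an arbitrary choice. BatchMode=yes makes ssh fail instead of falling back to a password prompt, so a missing key shows up as a failure.

```shell
# verify_passwordless HOST...: report whether root can log in to each HOST by key alone.
verify_passwordless() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so only key-based login can succeed.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true; then
      echo "ok: ${host}"
    else
      echo "FAILED: ${host}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on each node:
# verify_passwordless master worker01 worker02 worker03
```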
If you've made it this far, move on to the next step!

 


4. Formatting HDFS

Just as a new disk's file system is formatted for the OS before first use, we format HDFS before starting it :)
The NameNode format runs on the Master server (and on worker01, which hosts the Secondary NameNode), and the DataNode format runs on worker01, worker02, and worker03!

1) NameNode format

Run the command below on the master node and on the worker01 node to start the NameNode format.
(The worker01 node operates as the Secondary NameNode.)

[root@master ~]# /usr/local/hadoop-3.3.0/bin/hdfs namenode -format /hdfs_dir
[root@worker01 ~]# /usr/local/hadoop-3.3.0/bin/hdfs namenode -format /hdfs_dir

(output omitted)
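A formatted NameNode directory contains a current/VERSION file, so one way to confirm the format succeeded is to look for it. The helper below is a hypothetical sketch; it assumes a single name directory is configured and accepts either a plain path or a file:// URI.

```shell
# namenode_is_formatted DIR: succeed if DIR looks like a formatted NameNode directory.
namenode_is_formatted() {
  dir="${1#file://}"   # tolerate a file:// URI as well as a plain path
  [ -f "${dir}/current/VERSION" ]
}

# Usage on the master (assumes a single directory is configured):
# name_dir="$(/usr/local/hadoop-3.3.0/bin/hdfs getconf -confKey dfs.namenode.name.dir)"
# namenode_is_formatted "$name_dir" && echo "formatted" || echo "not formatted"
```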

 

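2) DataNode format

Run the command below on all three data nodes (worker01, worker02, worker03) as well.
Note that -format is not among the options documented for hdfs datanode; DataNodes normally initialize their storage directories the first time they start.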

[root@worker02 ~]# /usr/local/hadoop-3.3.0/bin/hdfs datanode -format /hdfs_dir/

(output omitted)

Once you've done the same on worker01, worker02, and worker03, move on to the next step!!

 


5. Starting HDFS & YARN

Now let's start HDFS & YARN.

※ You only need to run the start command on the Master node! 👍🏻

Before starting, first edit one configuration file on the Master node.
The workers file edited below lists the hosts on which the start scripts launch the worker daemons, that is, the data nodes that actually store data in HDFS.

[root@master ~]# vim /usr/local/hadoop-3.3.0/etc/hadoop/workers
worker01
worker02
worker03

 

With the data nodes specified, let's start HDFS & YARN!!

Run the command on the master node only.

[root@master ~]# /usr/local/hadoop-3.3.0/sbin/start-all.sh

You can use the jps command to check that HDFS & YARN started properly.

[root@master ~]# jps
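Reading the jps output by eye works, but a small filter can make a missing daemon obvious. This is a hedged sketch: missing_daemons is a hypothetical helper, and the expected daemon names per node (NameNode and ResourceManager on the master; DataNode and NodeManager on the workers; SecondaryNameNode on worker01) are assumptions based on the layout described in this post.

```shell
# missing_daemons NAME...: read jps output on stdin, print each NAME that is not running.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class column exactly so that
    # NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { f = 1 } END { exit f ? 0 : 1 }'; then
      echo "$daemon"
    fi
  done
}

# Usage (prints nothing when everything expected is up):
# jps | missing_daemons NameNode ResourceManager          # on master
# jps | missing_daemons DataNode NodeManager              # on worker02, worker03
```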

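* HDFS GUI : http://<masterIP>:50070
* YARN GUI : http://<masterIP>:8080
(If these ports were not set explicitly in your configuration, the Hadoop 3.x defaults are 9870 for the NameNode UI and 8088 for the ResourceManager UI.)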

 


6. Starting Spark

Now let's start Spark.

※ You only need to run the start command on the Master node! 👍🏻

As with Hadoop, first edit one configuration file on the Master node.
The workers file edited below is the cluster configuration file that lists the Spark worker hosts.

[root@master ~]# vim /usr/local/spark-3.1.1-bin-hadoop3.2/conf/workers
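Spark's conf/workers file takes one hostname per line. Assuming the same three workers as in the Hadoop workers file (an assumption for this sketch, since the file contents are not listed here), it could be prepared like this. The snippet writes to a temporary file first so it can be checked before being copied into place.

```shell
# Assumed worker hostnames, one per line, matching the Hadoop workers file.
spark_workers="$(mktemp)"
printf '%s\n' worker01 worker02 worker03 > "$spark_workers"

# Review the contents before applying them.
cat "$spark_workers"

# To apply (commented out on purpose):
# cp "$spark_workers" /usr/local/spark-3.1.1-bin-hadoop3.2/conf/workers
```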

Now let's start it up!

[root@master ~]# /usr/local/spark-3.1.1-bin-hadoop3.2/sbin/start-all.sh

 

Let's open the Spark web console.
All three worker nodes show up :)

 

That's it! If you've followed along this far,
the HDFS + YARN + Spark setup is complete :)

In the next post we'll start Jupyter Notebook and run some test jobs!!

 

Poof! 👻

 


Go straight to the next post~ 👍🏻

Hadoop HDFS(3.3)+Spark(3.1.1)! Just Follow Along #3