
Hadoop HDFS (3.3) + Spark (3.1.1)! Just Follow Along #1

mini_world 2021. 4. 25. 18:04

 

Hello 😁😁😁😁!

In the previous post, I installed an older version of Hadoop HDFS (2.0).
In this post, I'll install Hadoop 3.3, the latest version at the time of writing, and put Spark on top of it.

Hadoop 3.3 requires Java 1.8 (Java 8) or later. ^.^
(Apache Hadoop 3.3 and upper supports Java 8 and Java 11)

If you follow this post and the ones after it from start to finish, you'll end up with a complete HDFS + YARN + Spark setup,
and at the end I plan to make Jupyter Notebook usable on it as well. 👍🏻

[Software to install]
1. Java 1.8
2. HDFS 3.3
3. Scala 2.13.5
4. Spark 3.1.1

Alright, let's get installing!

 

 


1. Creating the EC2 instance!

In this step, we create an EC2 instance.
The OS will be Amazon Linux 2!


Log in to the AWS web console and select the EC2 service.
On the first screen, find the "Launch instance" button and click it.

First, step 1: choose an AMI.
I'm going to use Amazon Linux 2, so I pick the AMI that shows up right on the first page.

Step 2: choose an instance type. I select c5.large and click "Next: Configure Instance Details".

Step 3: configure the instance.
If you need VPC settings or an EC2 role, this is the step to add them.
I have nothing special to configure, so I leave everything at the defaults and move on to "Next: Add Storage".

Step 4: add storage.
Here I added a 50 GiB gp3 volume. (This will become the HDFS storage area.)

Once it's added, click "Review and Launch".

Step 7: review. Look everything over down to the bottom, then click Launch.
In the key pair pop-up, select a key pair you already have or create a new one.

After a few minutes the instance is ready to use.
I found the newly created instance in the instance list and renamed it to HDFS with Spark.
Also take note of its public IP address!

Now connect to the instance over SSH.

ssh -i <path-to-key-pair> ec2-user@<public-IP>

Once you're logged in to the instance, this step is complete :)
On to the next step!

 


2. Formatting the disk that will become HDFS storage!


Above, we attached a separate 50 GiB EBS volume.
You can see the disk with the lsblk command.

[ec2-user@ip-172-31-3-64 install_dir]$ lsblk

Format this disk as XFS!

[ec2-user@ip-172-31-3-64 install_dir]$ sudo mkfs.xfs /dev/nvme1n1

Create a path to mount the disk on, and edit the fstab configuration file.
※ You could mount it with the mount command, but that mount is lost on reboot, so we set the mount point in fstab instead. (mount command = temporary mount, fstab = mounted at OS boot)
※ Once all the setup is done, this instance will be cloned and reused.
※ Caution: in a real production environment, look up the disk's UUID and mount by UUID. (Command to check the UUID: ls -l /dev/disk/by-uuid | grep nvme1n1)

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vi /etc/fstab

------- add -------------------------------------------------------------
/dev/nvme1n1	/hdfs_dir	xfs	defaults,nofail	0	2
-------------------------------------------------------------------------
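The caution above recommends mounting by UUID in production. As a rough sketch of what that could look like, the helper below prints an fstab line keyed on a UUID instead of a device name. fstab_entry is a made-up name, not part of any tool, and the defaults,nofail and 0 2 fields are assumptions based on what is commonly suggested for secondary EBS volumes.

```shell
# Hypothetical helper: print a UUID-based /etc/fstab entry.
# Arguments: filesystem UUID, mount point, filesystem type.
fstab_entry() {
  local uuid="$1" mount_point="$2" fs_type="$3"
  # nofail lets the instance boot even if the volume is missing;
  # "0 2" = no dump, fsck after the root filesystem (assumed values).
  printf 'UUID=%s\t%s\t%s\tdefaults,nofail\t0\t2\n' "$uuid" "$mount_point" "$fs_type"
}

# Possible usage on the instance (device name taken from this post):
#   uuid=$(sudo blkid -s UUID -o value /dev/nvme1n1)
#   fstab_entry "$uuid" /hdfs_dir xfs | sudo tee -a /etc/fstab
```

A UUID stays the same even if the device shows up under a different name, which can matter once the instance is cloned.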

Then create the "/hdfs_dir" directory and run mount -a to mount the disk.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo mkdir /hdfs_dir
[ec2-user@ip-172-31-3-64 install_dir]$ sudo mount -a
[ec2-user@ip-172-31-3-64 install_dir]$ df -h

Once you've confirmed in the df -h output that the /dev/nvme1n1 disk is mounted at /hdfs_dir, this step is done!
On to the next step.
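If you'd rather script the mount check than read the df -h output by eye, something like the following could work. is_mounted is a hypothetical helper; it assumes a Linux-style mounts table such as /proc/mounts, where the second column is the mount point.

```shell
# Hypothetical helper: succeed if MOUNT_POINT appears in the mounts table.
# Usage: is_mounted MOUNT_POINT [MOUNTS_FILE]   (default table: /proc/mounts)
is_mounted() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  # Column 2 of each line is the mount point; exit 0 only if one matches exactly.
  awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Possible usage:
#   is_mounted /hdfs_dir && echo "mounted" || echo "NOT mounted"
```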

 


3. Installing the software (Java, HDFS, Scala, Spark)


1) yum update

Before starting, run "yum update" to apply the latest OS updates.

[ec2-user@ip-172-31-3-64 ~]$ sudo yum update -y

 


2) Installing Java 1.8

Java is available from the yum repository, which you can confirm with the search command below.
From the list of installable packages, we'll install java-1.8.0-openjdk.

[ec2-user@ip-172-31-3-64 ~]$ sudo yum search java 1.8

Install java-1.8.0 with the command below.

[ec2-user@ip-172-31-3-64 ~]$ sudo yum install java-1.8.0-openjdk -y

When the installation finishes, check the installed version with java -version.
If it reports version 1.8, the installation succeeded!

[ec2-user@ip-172-31-3-64 ~]$ java -version

 


3) Installing Hadoop 3.3

Before installing, create a separate directory, /install_dir, for the installation files.
Then download the Hadoop 3.3 binary archive into it.

[ec2-user@ip-172-31-3-64 ~]$ sudo mkdir /install_dir
[ec2-user@ip-172-31-3-64 ~]$ cd /install_dir/
[ec2-user@ip-172-31-3-64 install_dir]$ sudo wget https://mirror.navercorp.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Check that the download went through.

[ec2-user@ip-172-31-3-64 install_dir]$ ls -al
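ls only shows that the file exists. To also check that it wasn't corrupted in transit, you could compare its SHA-512 digest against the value published on the Apache download page. A minimal sketch, assuming GNU sha512sum is available; verify_sha512 is a hypothetical helper, and the expected digest is something you copy from Apache yourself.

```shell
# Hypothetical helper: succeed only if FILE's SHA-512 matches EXPECTED_HEX.
verify_sha512() {
  local file="$1" expected="$2" actual
  actual=$(sha512sum "$file" | awk '{ print $1 }')
  # Compare case-insensitively, since published digests are sometimes upper case.
  [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}

# Possible usage (paste the digest published for hadoop-3.3.0.tar.gz):
#   verify_sha512 hadoop-3.3.0.tar.gz "<published sha512>" && echo OK
```

If the mirror no longer carries 3.3.0, older releases are generally kept on archive.apache.org.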

Now extract the archive under /usr/local so it can be used, then check the path.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo tar -zxvf hadoop-3.3.0.tar.gz -C /usr/local/
[ec2-user@ip-172-31-3-64 install_dir]$ ls -al /usr/local/hadoop-3.3.0/

Then change the ownership of every directory and file under this path.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo chown root:root -R /usr/local/hadoop-3.3.0/
[ec2-user@ip-172-31-3-64 install_dir]$ ls -al /usr/local/hadoop-3.3.0/

 


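4) Installing Scala 2.13.5

Install Scala 2.13.5. (Scala 2.13.5 requires Java 1.8 or later!!) Note that the prebuilt Spark 3.1.1 package bundles its own Scala 2.12 libraries, so Spark itself does not run on this standalone Scala.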

[ec2-user@ip-172-31-3-64 install_dir]$ sudo wget https://downloads.lightbend.com/scala/2.13.5/scala-2.13.5.tgz
[ec2-user@ip-172-31-3-64 install_dir]$ ls -al

Extract the downloaded file into /usr/local.
Then, as before, change the ownership of the files and directories.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo tar -xzvf scala-2.13.5.tgz  -C /usr/local/
[ec2-user@ip-172-31-3-64 install_dir]$ sudo chown -R root:root /usr/local/scala-2.13.5/

To check that Scala is installed and set up correctly, you can launch it as shown below.

[ec2-user@ip-172-31-3-64 install_dir]$ /usr/local/scala-2.13.5/bin/scala

If Scala was installed correctly, you'll see the Scala REPL start up.

 


5) Installing Spark 3.1.1

This is the last installation step.
Copy and paste the command below to download the file!

[ec2-user@ip-172-31-3-64 install_dir]$ sudo wget https://mirror.navercorp.com/apache/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Then extract the downloaded file into /usr/local and change its ownership.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo tar -xzvf spark-3.1.1-bin-hadoop3.2.tgz -C /usr/local/
[ec2-user@ip-172-31-3-64 install_dir]$ sudo chown -R root:root /usr/local/spark-3.1.1-bin-hadoop3.2/

To check the installation, run the command below to start spark-shell.
Later on, Spark will run in cluster mode :)

[ec2-user@ip-172-31-3-64 install_dir]$ /usr/local/spark-3.1.1-bin-hadoop3.2/bin/spark-shell
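Besides opening spark-shell interactively, a non-interactive smoke test can be handy. The sketch below submits the SparkPi example that ships with the Spark distribution, in local mode. run_spark_pi and the DRY_RUN switch are hypothetical names; the jar path assumes the standard layout of the spark-3.1.1-bin-hadoop3.2 package, which is built against Scala 2.12.

```shell
# Hypothetical smoke test: run the bundled SparkPi example on 2 local threads.
# Set DRY_RUN=1 to print the command instead of running it.
run_spark_pi() {
  local spark_home="${SPARK_HOME:-/usr/local/spark-3.1.1-bin-hadoop3.2}"
  ${DRY_RUN:+echo} "$spark_home/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master 'local[2]' \
    "$spark_home/examples/jars/spark-examples_2.12-3.1.1.jar" 10
}

# Possible usage:
#   run_spark_pi             # the output should include a "Pi is roughly ..." line
#   DRY_RUN=1 run_spark_pi   # just show the command
```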

 

With that, all of the software is installed.
In the next step, we'll set the environment variables and edit the configuration files!

 


4. Setting environment variables and editing the HDFS configuration files!


1) Environment variables

Get the environment variables wrong and Hadoop and Spark will throw errors all over the place, so set them carefully... 😂😂😂
Open /etc/profile and add the environment variables as shown below.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/usr/local/spark-3.1.1-bin-hadoop3.2
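The JAVA_HOME path above contains an exact build number (1.8.0.282.b08-1), so it may differ on your instance, especially after yum update. One way to find the right value is to resolve the java binary's symlinks and strip the trailing bin/java. A sketch, assuming GNU readlink; java_home_of is a hypothetical helper.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from a java executable.
java_home_of() {
  local java_bin real
  java_bin="$1"
  real=$(readlink -f "$java_bin")   # follow symlinks such as /etc/alternatives/java
  dirname "$(dirname "$real")"      # strip the trailing /bin/java
}

# Possible usage:
#   java_home_of "$(command -v java)"
```

After editing /etc/profile, run source /etc/profile (or log in again) so the current shell picks up the new variables.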

 


2) Hadoop configuration files

- Hadoop configuration file path: /usr/local/hadoop-3.3.0/etc/hadoop/*.xml
- core-site.xml : common settings
- hdfs-*.xml : HDFS settings
- mapred-*.xml : MapReduce settings
- yarn-*.xml : YARN settings

   2-1) core-site.xml

ํ•˜๋‘ก ์‹œ์Šคํ…œ ์„ค์ • ํŒŒ์ผ๋กœ, ๋กœ๊ทธํŒŒ์ผ, ๋„คํŠธ์›Œํฌ ํŠœ๋‹, I/OํŠœ๋‹, ํŒŒ์ผ์‹œ์Šคํ…œํŠœ๋‹, ์••์ถ• ๋“ฑ ์‹œ์Šคํ…œ ์„ค์ • ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.
HDFS์™€ ๋งต๋ฆฌ๋“€์Šค์—์„œ ๊ณตํ†ต์ ์œผ๋กœ ์‚ฌ์šฉํ•  ํ™˜๊ฒฝ์ •๋ณด๋ฅผ ์ž…๋ ฅํ•˜๊ฒŒ ๋˜๋ฉฐ, core-default.xml์ด ๊ธฐ๋ณธ ๊ฐ’์ด๋ฉฐ, core-site.xml์— ์„ค์ •๊ฐ’์ด ์—†๋Š” ๊ฒฝ์šฐ ๊ธฐ๋ณธ ๊ฐ’์„ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. 

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/hadoop-3.3.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
       <value>hdfs://master:9000</value>
    </property>
</configuration>
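If you plan to repeat this setup, the same file can be generated from a script instead of edited in vim. A minimal sketch: write_core_site is a hypothetical helper that writes a core-site.xml containing only the fs.defaultFS property into a directory of your choice. The host name master and port 9000 come from the file above; the host name still has to resolve on every node, which is assumed to be handled elsewhere.

```shell
# Hypothetical helper: write a minimal core-site.xml into CONF_DIR.
# Usage: write_core_site CONF_DIR NAMENODE_HOST [PORT]   (PORT defaults to 9000)
write_core_site() {
  local conf_dir="$1" namenode_host="$2" port="${3:-9000}"
  # Unquoted EOF: ${namenode_host} and ${port} are expanded while writing.
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${namenode_host}:${port}</value>
    </property>
</configuration>
EOF
}

# Possible usage (from a root shell, since the target is owned by root):
#   write_core_site /usr/local/hadoop-3.3.0/etc/hadoop master
```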

 

   2-2) hdfs-site.xml

Sets the configuration used by HDFS. hdfs-default.xml supplies the defaults, and any value not set in hdfs-site.xml falls back to its default.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/hadoop-3.3.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///hdfs_dir/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///hdfs_dir/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>worker01:50090</value>
    </property>
</configuration>

 

   2-3) yarn-site.xml

Defines the configuration for the Resource Manager and Node Manager.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/hadoop-3.3.0/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>file:///hdfs_dir/yarn/local</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>file:///hdfs_dir/yarn/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
         <value>master</value>
    </property>
</configuration>
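hdfs-site.xml and yarn-site.xml both point at directories under /hdfs_dir. Hadoop may create some of these itself on first start, but creating them up front makes permission problems easier to spot. A sketch; make_hadoop_dirs is a hypothetical helper, and the subdirectory names are the ones used in the two files above.

```shell
# Hypothetical helper: create the local directories referenced by
# hdfs-site.xml (namenode, datanode) and yarn-site.xml (yarn/local, yarn/logs).
make_hadoop_dirs() {
  local base="${1:-/hdfs_dir}"
  mkdir -p "$base/namenode" "$base/datanode" "$base/yarn/local" "$base/yarn/logs"
}

# Possible usage (from a root shell, e.g. after "sudo -i"):
#   make_hadoop_dirs /hdfs_dir
```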

 

   2-4) mapred-site.xml

This is the configuration file for MapReduce applications.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/hadoop-3.3.0/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

 

   2-5) hadoop-env.sh

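Set the Java path in the Hadoop environment file. Leaving this out also gave me errors. The *_USER variables name the account each daemon runs as; Hadoop 3's start scripts refuse to launch the daemons as root unless they are set.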

[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/hadoop-3.3.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

 


3) Spark configuration files

The Spark configuration files live in /usr/local/spark-3.1.1-bin-hadoop3.2/conf.
They are shipped as templates, so just copy them with cp and edit the copies.

  3-1) spark-defaults.conf

Copy the spark-defaults.conf template and add three lines at the very bottom.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo cp /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-defaults.conf.template  /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-defaults.conf
[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark_enginelog
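Appending lines by hand works, but re-running the setup can leave duplicate keys in the file. The hypothetical helper below (set_spark_conf is not a Spark tool) drops any existing line for a key before appending the new value, so it can be run more than once.

```shell
# Hypothetical helper: set KEY to VALUE in a spark-defaults.conf style file.
set_spark_conf() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  # Keep every line whose first field is not exactly this key
  # (commented template lines start with "#", so they are kept).
  awk -v k="$key" '$1 != k' "$file" > "$file.tmp"
  printf '%s %s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}

# Possible usage (from a root shell):
#   conf=/usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-defaults.conf
#   set_spark_conf "$conf" spark.master yarn
#   set_spark_conf "$conf" spark.eventLog.enabled true
```

Spark expects the directory named in spark.eventLog.dir to exist already, so once HDFS is up it has to be created, for example with hdfs dfs -mkdir -p /spark_enginelog.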

 

  3-2) spark-env.sh

Copy the spark-env.sh template and add the five lines below.

[ec2-user@ip-172-31-3-64 install_dir]$ sudo cp /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-env.sh.template /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-env.sh
[ec2-user@ip-172-31-3-64 install_dir]$ sudo vim /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-env.sh
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64
export SPARK_MASTER_HOST=master
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
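When lines like these are appended from a script rather than typed in vim, the $ has to reach the file unexpanded. A heredoc with a quoted delimiter does that without any backslash escaping. A sketch; append_spark_env is a hypothetical helper, and the paths are the ones used in this post.

```shell
# Hypothetical helper: append Hadoop-related exports to a spark-env.sh file.
append_spark_env() {
  local file="$1"
  # The quoted 'EOF' stops the shell from expanding $HADOOP_HOME now;
  # it is expanded later, when spark-env.sh itself is sourced.
  cat >> "$file" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
}

# Possible usage (from a root shell):
#   append_spark_env /usr/local/spark-3.1.1-bin-hadoop3.2/conf/spark-env.sh
```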

 

Once that's done, on to the next step!

 


5. Cloning the EC2 instance

Now let's make three more copies of this fully set-up EC2 instance!!!


First, go to the AWS EC2 console.
Select the instance and click Actions > Image and templates > Create image.

On the Create image page, just fill in the image name and click "Create image".

In the EC2 console, click AMIs in the left navigation bar.
After a little while, you'll see the status change to available.
Now select this image and click Actions > Launch at the top.

Choose the instance type.
I use c5.large. Then click "Next: Configure Instance Details".

Change the number of instances to 3.
You can create several instances at once. Once it's set to 3, click Review and Launch.

This is the review page shown before launch.
Check once more for mistakes, then click "Launch".

A little while after launch, all four instances are in the running state.
I renamed them (the Name field) so they're easy to tell apart.
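The same cloning can be done from the command line if the AWS CLI is installed and configured, which this post does not cover. A sketch: launch_workers is a hypothetical wrapper around aws ec2 run-instances, DRY_RUN is a made-up switch, and the AMI ID and key pair name are placeholders to replace with your own.

```shell
# Hypothetical wrapper: launch COUNT copies of an AMI as c5.large instances.
# Set DRY_RUN=1 to print the command instead of calling AWS.
launch_workers() {
  local image_id="$1" key_name="$2" count="${3:-3}"
  ${DRY_RUN:+echo} aws ec2 run-instances \
    --image-id "$image_id" \
    --instance-type c5.large \
    --count "$count" \
    --key-name "$key_name"
}

# The image itself could be created and waited on first, for example:
#   aws ec2 create-image --instance-id <instance-id> --name hdfs-with-spark
#   aws ec2 wait image-available --image-ids <ami-id>
#   launch_workers <ami-id> <key-pair-name>
```

By default create-image reboots the source instance, and run-instances without further options falls back to the default VPC and security group, so adjust as needed.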

Alright, if you've made it this far, it's time to move on to the next post!

 

 

๋‚˜๋จธ์ง€๋Š” ๋‹ค์Œํฌ์ŠคํŒ…์—์„œ ์ด์–ด์ง‘๋‹ˆ๋‹ค!! ๋ฟ…! ๐Ÿค—๐Ÿค—๐Ÿค—