How to set up the Hadoop 2.2.0 (HDFS) stable release on CentOS
A step-by-step guide to setting up the Hadoop 2.2.0 (HDFS) stable release on CentOS, and a welcome to the world of big data.
1) Requirements:
First of all, make sure your CentOS machine has a Java JDK installed. To check whether Java is present:
$ java -version
java version "1.7.0_25"
If Java is missing, install the OpenJDK development package:
$ yum install java-1.7.0-openjdk-devel.x86_64
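You will need the JDK path for JAVA_HOME later. One common way to locate it on CentOS is to resolve the java binary (the output below is just an example; your path may differ):
$ readlink -f $(which java)
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.25.x86_64/jre/bin/java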
2) Hadoop user & group:
Create a hadoop group and an hdfs user that belongs to it:
$ groupadd hadoop
$ adduser -g hadoop hdfs
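To confirm the account was created with the right group:
$ id hdfs
uid=500(hdfs) gid=501(hadoop) groups=501(hadoop)
(The numeric IDs on your system may differ.)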
3) Set up SSH keys
Generate a key pair and authorize it for passwordless login. Run these as the hdfs user, because the Hadoop start/stop scripts will SSH to localhost as that user:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
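To confirm that passwordless SSH works (the start-dfs.sh and start-yarn.sh scripts used later depend on it), log in to localhost; you should not be prompted for a password:
$ ssh localhost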
4) Set up Hadoop
All releases are listed at http://hadoop.apache.org/releases.html#Download
Switch to the hdfs user:
$ su hdfs
$ cd /home/hdfs
Download the 2.2.0 tarball (the mirror below is one example; any Apache mirror works):
$ wget http://mirrors.digipower.vn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ tar xvzf hadoop-2.2.0.tar.gz
$ mv hadoop-2.2.0 hadoop
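Optionally, verify the download before extracting; Apache publishes checksums alongside each release tarball, so you can compare against the value listed on the download page:
$ sha1sum hadoop-2.2.0.tar.gz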
5) Hadoop environment variables:
Exit back to the root user, open /root/.bashrc, and append the following lines (add them to /home/hdfs/.bashrc as well, so the hdfs user can find the hadoop commands on its PATH):
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0/
export HADOOP_INSTALL=/home/hdfs/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
###end of paste
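Reload the file so the variables take effect in your current shell:
$ source /root/.bashrc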
Go back to the hdfs user, open /home/hdfs/hadoop/etc/hadoop/hadoop-env.sh, and set JAVA_HOME there as well:
export JAVA_HOME=/usr/lib/jvm/jre-1.7.0/
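If you prefer a one-liner to hand-editing, something like the following sed command should work (it assumes the stock export JAVA_HOME=${JAVA_HOME} line that ships in hadoop-env.sh):
$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/jre-1.7.0/|' /home/hdfs/hadoop/etc/hadoop/hadoop-env.sh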
6) Verify the Hadoop installation:
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/hdfs/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar
7) Hadoop configuration:
$ su hdfs
$ cd /home/hdfs
Edit core-site.xml so clients know where the namenode runs (fs.default.name is the older alias of fs.defaultFS; Hadoop 2.2.0 accepts both):
$ vi /home/hdfs/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit yarn-site.xml to enable the MapReduce shuffle service in the NodeManager:
$ vi /home/hdfs/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Edit mapred-site.xml to tell MapReduce jobs to run on YARN. Hadoop 2.2.0 ships only a mapred-site.xml.template, so copy it first:
$ cp /home/hdfs/hadoop/etc/hadoop/mapred-site.xml.template /home/hdfs/hadoop/etc/hadoop/mapred-site.xml
$ vi /home/hdfs/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
8) Namenode & Datanode:
Create folders for the namenode and datanode data (still as the hdfs user, so the daemons own them):
$ mkdir -p /home/hdfs/namenode
$ mkdir -p /home/hdfs/datanode
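It is worth double-checking that both directories are owned by the hdfs user (they will be if you created them while logged in as hdfs), since the namenode and datanode daemons must be able to write to them:
$ ls -ld /home/hdfs/namenode /home/hdfs/datanode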
9) Namenode & Datanode configuration:
Edit hdfs-site.xml to point HDFS at those folders; dfs.replication is set to 1 because this is a single-node setup:
$ vi /home/hdfs/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hdfs/datanode</value>
  </property>
</configuration>
10) Format Namenode:
Easy task, almost done: format the namenode for HDFS (the Hadoop Distributed File System):
$ cd /home/hdfs/hadoop
$ bin/hadoop namenode -format
(The hadoop namenode command is deprecated in 2.x; $ bin/hdfs namenode -format is the newer equivalent and does the same thing.)
If it succeeds, the output includes lines like these:
14/04/11 23:52:00 INFO common.Storage: Storage directory /home/hdfs/namenode has been successfully formatted.
14/04/11 23:52:00 INFO namenode.FSImage: Saving image file /home/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
14/04/11 23:52:00 INFO namenode.FSImage: Image file /home/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 196 bytes saved in 0 seconds.
11) Start Hadoop services:
$ start-dfs.sh
$ start-yarn.sh
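Both scripts live in /home/hdfs/hadoop/sbin, which step 5 put on PATH. The matching stop scripts are there too, for when you want to shut everything down:
$ stop-yarn.sh
$ stop-dfs.sh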
12) Check Namenode & Datanode:
If everything is OK, the jps command will list all five Hadoop daemons:
$ jps
1913 NodeManager
1822 ResourceManager
1680 SecondaryNameNode
2299 Jps
1431 NameNode
1519 DataNode
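As a final smoke test, create and list a directory in HDFS (hadoop fs is the generic file-system client that ships with 2.2.0):
$ hadoop fs -mkdir /test
$ hadoop fs -ls /
You can also browse the namenode web UI at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088, the default ports in Hadoop 2.2.0.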
We will publish a follow-up article with a Hadoop WordCount example as soon as possible; Face4store.com will discuss this subject with you then.
Thanks for reading, and feel free to leave a comment.