The following steps install a single-node Hadoop 2.4.0 setup on Ubuntu.
Prerequisites:
- JDK 1.7.x installed (if not, see this guide: Install JDK 1.7.x)
A. System Configuration
1. Add a Hadoop system user (optional)
This step is optional, but we recommend creating a dedicated Hadoop system user so that Hadoop stays separate from other software on the machine.
- Add a new group
root@ubuntu:~# addgroup hadoop
- Add a new user in the hadoop group
root@ubuntu:~# adduser --ingroup hadoop hduser
2. Configure SSH access
Prerequisites:
- Make sure that an SSH server is up and running on your machine and that it is configured to allow public key authentication.
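If an SSH server is not installed yet, on Ubuntu it can typically be added with the openssh-server package (offered here as a suggestion; the package name may differ on other distributions):
root@ubuntu:~# apt-get install openssh-server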
Generate an SSH key for the hduser:
root@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$
Enable SSH access with the newly created key above:
hduser@ubuntu:~$ cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
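If key-based SSH login is refused later, the usual cause is over-permissive permissions on the .ssh directory; tightening them as below is a common fix, though it is not part of the original steps:
hduser@ubuntu:~$ chmod 700 /home/hduser/.ssh
hduser@ubuntu:~$ chmod 600 /home/hduser/.ssh/authorized_keys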
Make sure that SSH access works for the hduser user:
root@ubuntu:~# ssh hduser@localhost
hduser@localhost's password:
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic i686)

 * Documentation:  https://help.ubuntu.com/

  System information as of Tue May 13 11:36:31 ICT 2014

  System load:  0.08               Processes:           85
  Usage of /:   5.3% of 37.04GB    Users logged in:     2
  Memory usage: 2%                 IP address for eth0: 192.168.1.101
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

Last login: Tue May 13 11:36:32 2014 from localhost
hduser@ubuntu:~$
3. Disable IPv6
"Apache Hadoop is not currently supported on IPv6 networks. It has only been tested and developed on IPv4 stacks. Hadoop needs IPv4 to work, and only IPv4 clients can talk to the cluster. If your organisation moves to IPv6 only, you will encounter problems."
As the root user, edit /etc/sysctl.conf to disable IPv6:
root@ubuntu:~# vi /etc/sysctl.conf
Add the following lines to the end of the file
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Reboot the system so that the new settings take effect:
root@ubuntu:~# reboot
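After the reboot, you can verify that IPv6 is disabled; the kernel flag below should read 1 (this is an extra verification step, not part of the original write-up):
root@ubuntu:~# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1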
B. Hadoop Installation
1. Download hadoop-2.4.0.tar.gz
root@ubuntu:~# wget http://apache.mirrors.pair.com/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
--2014-05-13 11:16:02--  http://apache.mirrors.pair.com/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
Resolving apache.mirrors.pair.com (apache.mirrors.pair.com)... 216.92.2.131
Connecting to apache.mirrors.pair.com (apache.mirrors.pair.com)|216.92.2.131|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 138943699 (133M) [application/x-gzip]
Saving to: hadoop-2.4.0.tar.gz

100%[======================================>] 138,943,699  247KB/s   in 9m 19s

2014-05-13 11:25:22 (243 KB/s) - hadoop-2.4.0.tar.gz saved [138943699/138943699]

root@ubuntu:~#
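If the mirror above no longer carries this release, the Apache archive normally keeps old versions; the URL below is a likely alternative rather than one verified in this guide:
root@ubuntu:~# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz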
2. Move the downloaded file to /usr/local
root@ubuntu:~# mv hadoop-2.4.0.tar.gz /usr/local/
3. Extract the file
root@ubuntu:~# cd /usr/local/
root@ubuntu:/usr/local# tar xzf hadoop-2.4.0.tar.gz
4. Rename the hadoop-2.4.0 folder to hadoop
root@ubuntu:/usr/local# mv hadoop-2.4.0 hadoop
5. Change the owner of all files in the hadoop folder
root@ubuntu:/usr/local# chown -R hduser:hadoop hadoop
6. Configure the Hadoop files for a single node:
hduser@ubuntu:/usr/local# cd hadoop/etc/hadoop/
a. Modify yarn-site.xml
hduser@ubuntu:/usr/local/hadoop/etc/hadoop# vi yarn-site.xml
Insert the code below:
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
b. Modify core-site.xml
hduser@ubuntu:/usr/local/hadoop/etc/hadoop# vi core-site.xml
Insert the code below:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
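Note: fs.default.name still works in Hadoop 2.x but is deprecated; if you prefer the current property name, the equivalent setting is:
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>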
c. Create mapred-site.xml
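This file does not exist by default; Hadoop 2.4.0 ships a mapred-site.xml.template in the same directory, so you can copy it first (optional, since vi will also create the file):
hduser@ubuntu:/usr/local/hadoop/etc/hadoop# cp mapred-site.xml.template mapred-site.xml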
hduser@ubuntu:/usr/local/hadoop/etc/hadoop# vi mapred-site.xml
Insert the code below:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
d. Modify hdfs-site.xml
hduser@ubuntu:/usr/local/hadoop/etc/hadoop# vi hdfs-site.xml
Insert the code below:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
  </property>
</configuration>
7. Create the folders that will hold the NameNode and DataNode data (the paths must match those configured in hdfs-site.xml)
hduser@ubuntu:~$ mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode
hduser@ubuntu:~$ mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
8. Update $HOME/.bashrc of the hduser user with the following lines
# Hadoop variables
export JAVA_HOME=/usr/local/java/jdk1.7.0_55/
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
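To pick up the new variables without logging out (the reboot in step 10 also works), reload the file and check one of the values; the JAVA_HOME path above is the one assumed by this guide and may differ on your machine:
hduser@ubuntu:~$ source ~/.bashrc
hduser@ubuntu:~$ echo $HADOOP_HOME
/usr/local/hadoop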
9. Open and modify hadoop-env.sh
hduser@ubuntu:~$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Set JAVA_HOME as follows (adjust the path to match your JDK installation):
export JAVA_HOME=/usr/local/java/jdk1.7.0_55/
10. Reboot the system to apply the new configuration
root@ubuntu:~# reboot
11. After rebooting, verify the installed Hadoop version with the following command in the terminal:
hduser@ubuntu:~$ hadoop version
Hadoop 2.4.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1583262
Compiled by jenkins on 2014-03-31T08:29Z
Compiled with protoc 2.5.0
From source with checksum 375b2832a6641759c6eaf6e3e998147
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.4.0.jar
hduser@ubuntu:~$
C. Hadoop Start-up
1. Format the HDFS filesystem via the NameNode
The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster” (which includes only your local machine if you followed this tutorial). You need to do this the first time you set up a Hadoop cluster.
Do not format a running Hadoop filesystem as you will lose all the data currently in the cluster (in HDFS)!
hduser@ubuntu:~$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/05/14 09:36:44 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.0
STARTUP_MSG:   classpath = ........................................
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common -r 1583262; compiled by 'jenkins' on 2014-03-31T08:29Z
STARTUP_MSG:   java = 1.7.0_55
************************************************************/
14/05/14 09:36:44 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
14/05/14 09:36:44 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-e9602272-9da3-414e-a3b2-0003f94927eb
14/05/14 09:36:45 INFO namenode.FSNamesystem: fsLock is fair:true
14/05/14 09:36:45 INFO namenode.HostFileManager: read includes: HostSet( )
14/05/14 09:36:45 INFO namenode.HostFileManager: read excludes: HostSet( )
14/05/14 09:36:45 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/05/14 09:36:45 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
14/05/14 09:36:45 INFO util.GSet: Computing capacity for map BlocksMap
14/05/14 09:36:45 INFO util.GSet: VM type       = 32-bit
14/05/14 09:36:45 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
14/05/14 09:36:45 INFO util.GSet: capacity      = 2^22 = 4194304 entries
14/05/14 09:36:45 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/05/14 09:36:45 INFO blockmanagement.BlockManager: defaultReplication         = 1
14/05/14 09:36:45 INFO blockmanagement.BlockManager: maxReplication             = 512
14/05/14 09:36:45 INFO blockmanagement.BlockManager: minReplication             = 1
14/05/14 09:36:45 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
14/05/14 09:36:45 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
14/05/14 09:36:45 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/05/14 09:36:45 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
14/05/14 09:36:45 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
14/05/14 09:36:45 INFO namenode.FSNamesystem: fsOwner             = hduser (auth:SIMPLE)
14/05/14 09:36:45 INFO namenode.FSNamesystem: supergroup          = supergroup
14/05/14 09:36:45 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/05/14 09:36:45 INFO namenode.FSNamesystem: HA Enabled: false
14/05/14 09:36:45 INFO namenode.FSNamesystem: Append Enabled: true
14/05/14 09:36:45 INFO util.GSet: Computing capacity for map INodeMap
14/05/14 09:36:45 INFO util.GSet: VM type       = 32-bit
14/05/14 09:36:45 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
14/05/14 09:36:45 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/05/14 09:36:45 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/05/14 09:36:45 INFO util.GSet: Computing capacity for map cachedBlocks
14/05/14 09:36:45 INFO util.GSet: VM type       = 32-bit
14/05/14 09:36:45 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
14/05/14 09:36:45 INFO util.GSet: capacity      = 2^19 = 524288 entries
14/05/14 09:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/05/14 09:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/05/14 09:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
14/05/14 09:36:45 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/05/14 09:36:45 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/05/14 09:36:45 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/05/14 09:36:45 INFO util.GSet: VM type       = 32-bit
14/05/14 09:36:45 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
14/05/14 09:36:45 INFO util.GSet: capacity      = 2^16 = 65536 entries
14/05/14 09:36:45 INFO namenode.AclConfigFlag: ACLs enabled? false
Re-format filesystem in Storage Directory /usr/local/hadoop/yarn_data/hdfs/namenode ? (Y or N)
14/05/14 09:38:10 INFO namenode.FSImage: Allocated new BlockPoolId: BP-140501697-127.0.1.1-1400035090698
14/05/14 09:38:10 INFO common.Storage: Storage directory /usr/local/hadoop/yarn_data/hdfs/namenode has been successfully formatted.
14/05/14 09:38:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/05/14 09:38:11 INFO util.ExitUtil: Exiting with status 0
14/05/14 09:38:11 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:~$
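As the warning at the top of the output says, calling the hadoop script for this is deprecated; the equivalent command using the hdfs tool is:
hduser@ubuntu:~$ hdfs namenode -format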
2. Start Single-Node Cluster
hduser@ubuntu:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 64:0d:71:90:1b:7a:af:93:55:39:45:5d:ec:16:c7:44.
Are you sure you want to continue connecting (yes/no)? no
0.0.0.0: Host key verification failed.
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-ubuntu.out
hduser@ubuntu:~$
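In the run above, the host key prompt for 0.0.0.0 was answered with "no", so the SecondaryNameNode was not started (it is also missing from the jps output in the next step). Answering "yes", or pre-accepting the key with ssh-keyscan as sketched below (a common workaround, not part of the original steps), avoids this; start-dfs.sh and start-yarn.sh are the non-deprecated scripts mentioned in the output:
hduser@ubuntu:~$ ssh-keyscan -H 0.0.0.0 >> /home/hduser/.ssh/known_hosts
hduser@ubuntu:~$ start-dfs.sh
hduser@ubuntu:~$ start-yarn.sh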
3. Check whether the Hadoop processes started successfully
hduser@ubuntu:~$ jps
1898 NodeManager
1407 NameNode
1556 DataNode
2179 Jps
1769 ResourceManager
hduser@ubuntu:~$
4. Stop Single-Node Cluster
hduser@ubuntu:~$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 64:0d:71:90:1b:7a:af:93:55:39:45:5d:ec:16:c7:44.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: no secondarynamenode to stop
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
hduser@ubuntu:~$
5. View Hadoop Web-Interface
Go to http://localhost:50070 or http://ubuntu_ip_address:50070 to open the Hadoop web interface.
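Port 50070 is the NameNode UI; in Hadoop 2.x the YARN ResourceManager serves its own UI on port 8088 by default. A quick reachability check from the terminal (assuming curl is installed) looks like this; expect an HTTP status code such as 200 when the daemons are up:
hduser@ubuntu:~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
hduser@ubuntu:~$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088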