For more information about Hadoop, please watch https://www.youtube.com/watch?v=d2xeNpfzsYI. HBase on top of Hadoop provides a powerful, extremely high-throughput data store (backed by Hadoop HDFS) with secondary indexing, automatic sharding of data, and Map-Reduce. I'm going to try to keep this guide as simple as possible for our future reference. I hope you find it useful!
The hardware topology in our case is 13 Dell R420 1RU webservers, each with 143GB of SSD storage in RAID 1, 2 x Intel E5-2450 CPUs @ 2.10GHz (20MB cache) for 16 cores (32 with HT), and 32GB of RAM. These servers sit in 2 racks connected by 2 x 1Gb Foundry switches split into 2 VLANs: frontend and backend.
N.B., Hadoop 1.0.x-stable requires one NameNode for filesystem metadata (see Hadoop 2.0 for NameNode HA and HDFS Federation), at least three DataNodes (the default replication factor is 3), exactly one JobTracker, and many TaskTrackers.
Footnote: In Hadoop 2.0.x, the Map-Reduce JobTracker has been split into two components: the ResourceManager and the ApplicationMaster. Apache puts it this way: “the new ResourceManager manages the global assignment of compute resources to applications and the per-application ApplicationMaster manages the application’s scheduling and coordination. An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs. The ResourceManager and per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric. The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.” For more information see the YARN documentation.
In our case, we will set up 1 master NameNode with hostname www3 and 1 master JobTracker with hostname www4 for Map/Reduce jobs. The rest of the servers will be slaves, and as such will be DataNodes running TaskTrackers; these have hostnames www5 through www15. www2 will be a hot spare for www3 in the event of system failure. We achieve this by specifying a secondary NFS path for dfs.name.dir, which will be mounted on our failover server and replayed in the event of www3 failure. The operating system on all of these servers is CentOS 6.x x86_64.
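That secondary dfs.name.dir path (/nfs_storage/hadoop/namenode_backup, configured in hdfs-site.xml below) has to be an NFS mount visible to both www3 and the hot spare www2. A minimal sketch, assuming a hypothetical NFS server named nfs1 exporting /exports/hadoop (substitute your real server and export path):

# run on www3 and on the hot spare www2; nfs1 and the export path are placeholders
mkdir -p /nfs_storage/hadoop
mount -t nfs nfs1:/exports/hadoop /nfs_storage/hadoop
# add a matching /etc/fstab entry so the mount comes back after a reboot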
- Create a new Unix user hdfs that will be used for the HDFS daemons on each node:
useradd hdfs
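Every node needs this account, so one convenient option (a sketch, assuming root ssh access to all of the boxes; hostnames per the topology above) is to create it in a loop:

for h in www{2..15}; do
  ssh root@$h 'id hdfs >/dev/null 2>&1 || useradd hdfs'
done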
- Download Hadoop 1.0.x stable. In our case, we fetch it from a mirror:
wget http://apache.petsads.us/hadoop/common/stable/hadoop-1.0.4.tar.gz
- Extract the archive to /usr/local and create a convenience symlink:
tar -C /usr/local -zxvf hadoop-1.0.4.tar.gz
ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop
Then make sure it is owned by the hdfs user:
chown -R hdfs /usr/local/hadoop-1.0.4
- Make sure Java is installed:
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
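To confirm the install and the path that JAVA_HOME will point at later (the /usr/lib/jvm/java symlink is the usual CentOS location, but verify it on your own boxes):

java -version
readlink -f $(which java)    # should resolve somewhere under /usr/lib/jvm
ls -ld /usr/lib/jvm/java     # the symlink this guide uses for JAVA_HOME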
- First we set up the NameNode on www3. Open /usr/local/hadoop/conf/core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <!-- URI of the NameNode (master metadata HDFS node) -->
    <name>fs.default.name</name>
    <value>hdfs://www3/</value>
  </property>
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>200</value>
    <!-- Larger amount of memory allocated for the in-memory file-system used to merge map outputs at the reducers. -->
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
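Because fs.default.name here (and mapred.job.tracker later) use bare hostnames, every node must resolve www2 through www15 consistently, whether via DNS or /etc/hosts. A quick sanity check you can run from any node (a sketch, not part of the official setup):

for h in www{2..15}; do
  getent hosts $h >/dev/null || echo "cannot resolve $h"
done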
- Next create folders for the NameNode metadata, DataNode blocks, and MapReduce temp data:
mkdir /usr/local/hadoop/namenode
mkdir /usr/local/hadoop/datanode
mkdir -p /var/hadoop/temp
chown hdfs /usr/local/hadoop/namenode
chown hdfs /usr/local/hadoop/datanode
chown -R hdfs /var/hadoop/temp
- Edit /usr/local/hadoop/conf/hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/namenode,/nfs_storage/hadoop/namenode_backup</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <!-- 128MB -->
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <!-- # RPC threads from datanodes -->
    <value>40</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
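dfs.datanode.max.xcievers of 4096 only helps if the hdfs user is also allowed that many open files by the OS, so the HBase/Hadoop documentation usually pairs it with a raised nofile limit. A hedged sketch for CentOS (the 32768 value is illustrative):

# run as root on every DataNode
cat >> /etc/security/limits.conf <<'EOF'
hdfs  soft  nofile  32768
hdfs  hard  nofile  32768
EOF
su - hdfs -c 'ulimit -n'    # verify the new limit on the next login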
- Next modify /usr/local/hadoop/conf/mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>www4:8021</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system/</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/var/hadoop/temp</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,rooms</value>
  </property>
  <property>
    <name>mapred.acls.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
    <!-- Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps. -->
  </property>
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapred.task.tracker.task-controller</name>
    <value>org.apache.hadoop.mapred.DefaultTaskController</value>
  </property>
</configuration>
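It is worth sanity-checking those slot counts against the 32GB of RAM per node; a rough worst-case heap budget with the settings above:

#   20 map slots    x 512 MB = 10 GB
# + 20 reduce slots x 512 MB = 10 GB
# = ~20 GB of task heap in the worst case, leaving roughly 12 GB for the
#   DataNode, TaskTracker, OS, and page cache on a 32 GB node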
- Next modify your masters and slaves files, /usr/local/hadoop/conf/masters and /usr/local/hadoop/conf/slaves:

[root@www3 conf]# cat masters
www3
www4
[root@www3 conf]# cat slaves
www5
www6
www7
www8
www9
www10
www11
www12
www13
www14
www15
- We now need to set up the environment on the NameNode www3. Open /usr/local/hadoop/conf/hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
export HADOOP_HEAPSIZE="1000"
export JAVA_HOME=/usr/lib/jvm/java
You will also want to put
export JAVA_HOME=/usr/lib/jvm/java
in the hdfs user's ~/.bashrc.
After www3 (the master NameNode) has been set up as above, we just have to copy the conf files over to www4 and the other nodes and then fire up Hadoop. Here are the steps after syncing the conf files and making the appropriate (meta)data directories such as /usr/local/hadoop/namenode or /usr/local/hadoop/datanode (a sync loop is sketched below).
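A minimal sketch of that sync, run as root from www3 (hostnames per the topology above; adjust to taste):

for h in www{4..15}; do
  rsync -av /usr/local/hadoop/conf/ root@$h:/usr/local/hadoop/conf/
done
for h in www{5..15}; do
  ssh root@$h 'mkdir -p /usr/local/hadoop/datanode /var/hadoop/temp && chown -R hdfs /usr/local/hadoop/datanode /var/hadoop/temp'
done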
- To start the Hadoop cluster you will need to start both the HDFS and the Map/Reduce daemons. First, format a new distributed filesystem as the hdfs user on www3:
$ bin/hadoop namenode -format
- You will get output like:

[hdfs@www3 hadoop]$ bin/hadoop namenode -format
12/12/29 02:00:05 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = www3/10.23.23.12
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
Re-format filesystem in /usr/local/hadoop/namenode ? (Y or N)
12/12/29 02:15:39 INFO util.GSet: VM type = 64-bit
12/12/29 02:15:39 INFO util.GSet: 2% max memory = 17.77875 MB
12/12/29 02:15:39 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/12/29 02:15:39 INFO util.GSet: recommended=2097152, actual=2097152
12/12/29 02:15:40 INFO namenode.FSNamesystem: fsOwner=hdfs
12/12/29 02:15:40 INFO namenode.FSNamesystem: supergroup=supergroup
12/12/29 02:15:40 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/12/29 02:15:40 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/12/29 02:15:40 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/12/29 02:15:40 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/12/29 02:15:40 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/12/29 02:15:40 INFO common.Storage: Storage directory /usr/local/hadoop/namenode has been successfully formatted.
12/12/29 02:15:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at www3/10.23.23.12
************************************************************/
- The hdfs user needs passwordless ssh (public-key) access to all of the slaves, and to itself, from both the NameNode and the JobTracker node. Generate a key pair on each of those two (ssh into the NameNode and the JobTracker to do this) and add it to every node's ssh authorized_keys file; a convenience loop is sketched below:
sudo su - hdfs
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub
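One way to push that key out, assuming you can still authenticate as hdfs (by password) or as root on each host, is ssh-copy-id; a sketch using the topology above:

for h in www{3..15}; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hdfs@$h
done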
However you distribute it, the public key must end up in the ~hdfs/.ssh/authorized_keys file on the slaves and on the master NameNode.
- Start HDFS with the following command *** after the slaves are configured/installed *** (see Part 2), running it on the designated NameNode:
$ bin/start-dfs.sh
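Once the DataNodes have checked in, a quick way to confirm HDFS is healthy (standard Hadoop 1.x commands):

bin/hadoop dfsadmin -report    # should report 11 live DataNodes (www5-www15)
bin/hadoop fs -ls /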
- Start the JobTracker node (*** after Part 2! ***):
$ bin/start-mapred.sh
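With both layers up, a simple end-to-end smoke test is the pi estimator from the examples jar shipped in the 1.0.4 tarball (the jar name may differ slightly between releases):

cd /usr/local/hadoop
bin/hadoop jar hadoop-examples-1.0.4.jar pi 10 100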