
Setup Hadoop on Ubuntu 11.04 64-bit

The official Hadoop documentation already gives a clear walkthrough of setting up Hadoop on Linux. In this entry I want to make the same process simpler and shorter, tailored to Ubuntu 11.04 64-bit.

1. Install Sun JDK

Sun JDK is unavailable in the official Ubuntu Software Center repositories. What a shame! Let’s resort to an external PPA (Personal Package Archive) instead. Launch the Terminal and run the following commands (you will be prompted to accept Sun’s license during installation):

sudo add-apt-repository ppa:ferramroberto/java
sudo apt-get update
sudo apt-get install sun-java6-bin
sudo apt-get install sun-java6-jdk

Add the JAVA_HOME variable:

sudo gedit /etc/environment

Append a new line to the file (note that /etc/environment is read by pam_env at login, not executed by a shell, so no export prefix is needed):

JAVA_HOME="/usr/lib/jvm/java-6-sun-1.6.0.26"

You will need to log out and back in for the change to take effect.
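Since /etc/environment expects plain KEY="value" lines rather than shell commands, it is worth checking the syntax of the line before touching the real file. A small sketch, run against a temporary file standing in for /etc/environment (which needs root to edit):

```shell
# Sketch: validate the KEY="value" syntax expected by /etc/environment.
# A temporary file stands in for the real one, which needs root to edit.
envfile=$(mktemp)
echo 'JAVA_HOME="/usr/lib/jvm/java-6-sun-1.6.0.26"' >> "$envfile"
grep -Eq '^JAVA_HOME="[^"]+"$' "$envfile" && echo "syntax ok"
```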

Verify that the installation succeeded:

java -version

2. Check SSH Setting

ssh localhost

If it says “connection refused”, install the OpenSSH server and client:

sudo apt-get install openssh-server openssh-client

If you cannot ssh to localhost without a passphrase, generate a passphrase-less key and authorize it:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

After this, ssh localhost should log you in without prompting for a password.
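One thing that often bites here: sshd refuses to honor authorized_keys when ~/.ssh or the file itself is group- or world-writable (StrictModes is on by default). The usual fix is sketched below against a temporary directory, so it is safe to run anywhere; substitute ~/.ssh in real use:

```shell
# Sketch: the permissions sshd expects on the key directory and file.
# A temp dir stands in for ~/.ssh so this runs without side effects.
sshdir=$(mktemp -d)
touch "$sshdir/authorized_keys"
chmod 700 "$sshdir"                   # only the owner may enter the directory
chmod 600 "$sshdir/authorized_keys"   # only the owner may read/write the keys
ls -ld "$sshdir" "$sshdir/authorized_keys"
```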

3. Setup Hadoop

Download a recent stable release and unpack it. Then edit conf/hadoop-env.sh to set JAVA_HOME:

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.26

Pseudo-Distributed Operation:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Switch to the Hadoop root directory and format a new distributed filesystem:

bin/hadoop namenode -format

You’ll get a message like “Storage directory /tmp/hadoop-jasper/dfs/name has been successfully formatted.” Note that this is a local path where the NameNode stores its filesystem metadata; it is not a directory inside HDFS itself.

Start and stop hadoop daemons:

bin/start-all.sh 
bin/stop-all.sh
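To check that the daemons actually came up, jps (shipped with the JDK) should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker in pseudo-distributed mode. A guarded sketch, since jps is only available when the JDK’s bin directory is on your PATH:

```shell
# Sketch: list running Hadoop daemons with jps, with a fallback message
# when jps is not on the PATH (e.g. JAVA_HOME/bin not exported).
if command -v jps >/dev/null 2>&1; then
    jps
else
    echo "jps not found; check that \$JAVA_HOME/bin is on your PATH"
fi
```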

Web interfaces for the NameNode and the JobTracker are available by default at http://localhost:50070/ and http://localhost:50030/ respectively.

4. Deploy An Example Map-Reduce Job

Let’s run the WordCount example job, which ships with the Hadoop release. Put some text files in a local directory, e.g. “/home/jasper/mapreduce/wordcount/”. Then copy these files from the local directory into HDFS and list them. Note that HDFS paths are independent of local filesystem paths; a relative HDFS path like “wordcount” resolves to /user/<your-username>/wordcount:

bin/hadoop dfs -copyFromLocal /home/jasper/mapreduce/wordcount wordcount

bin/hadoop dfs -ls wordcount

Run the job:

bin/hadoop jar hadoop*examples*.jar wordcount wordcount wordcount-output

If the job completes without errors, merge the output from HDFS back into the local directory:

bin/hadoop dfs -getmerge wordcount-output /home/jasper/mapreduce/wordcount/

Now you can open the output file in your local directory to view the results.
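If you want to sanity-check the counts, the same computation can be done locally with plain coreutils. A sketch on a small sample file (the file name and contents here are illustrative, not the actual job input):

```shell
# Sketch: a local word count with tr/sort/uniq, handy for cross-checking
# WordCount's output on small inputs. The sample data is illustrative.
printf 'hello hadoop\nhello world\n' > /tmp/wordcount-sample.txt
tr -s '[:space:]' '\n' < /tmp/wordcount-sample.txt | sort | uniq -c | sort -rn
```

The pipeline splits the text on whitespace, sorts the words, and counts duplicates; in this sample “hello” appears twice and the other words once.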
