Tuesday, October 8, 2013

Main configuration files in Hadoop pseudo-distributed mode

Hadoop single-node cluster configuration files and settings:

In any Hadoop 1.x release, such as 1.0.4, 1.2.0, or 1.2.1, setting up a single-node cluster means configuring four main files in Hadoop.

They are

1) hadoop-env.sh
2) core-site.xml
3) hdfs-site.xml
4) mapred-site.xml

Note: Commands and file names on Linux are case sensitive, so be careful when typing them or when adding environment variables to .bashrc or .profile.

The typical contents of these files are given below.

1) In hadoop-env.sh, uncomment the JAVA_HOME line and set it to the correct Java home path.
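For step 1, the edited line in hadoop-env.sh might look like the sketch below. The JDK path shown is only an assumption; it differs between distributions and Java versions, so substitute the directory where Java is actually installed.

```shell
# hadoop-env.sh (fragment)
# The JAVA_HOME line is commented out by default; remove the leading "#"
# and point it at the local JDK. The path below is an assumed example.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```

Note that the variable name must be written exactly as JAVA_HOME, in upper case.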

2) In core-site.xml, add a property tag inside the configuration tag, and inside each property write a name tag followed by a value tag, like this:
 
       <configuration>
           <property>
               <name>fs.default.name</name>
               <value>hdfs://localhost:9000</value>
           </property>
           <property>
               <name>hadoop.tmp.dir</name>
               <value>/home/hadoop/tmp</value>
           </property>
       </configuration>

3) In hdfs-site.xml, add the property in the same way as in step 2:

       <configuration>
           <property>
               <name>dfs.replication</name>
               <value>1</value>
           </property>
       </configuration>

4) Similarly, in mapred-site.xml:

       <configuration>
           <property>
               <name>mapred.job.tracker</name>
               <value>localhost:9001</value>
           </property>
       </configuration>
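After editing the three XML files, it can help to read a value back and confirm it landed where expected. The helper below is a hypothetical sketch, not a Hadoop command: the name get_prop is made up for illustration, and it assumes each value tag sits on the line directly after its name tag, as in the snippets above.

```shell
# get_prop FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop site file.
# Assumes the <value> line comes immediately after the <name> line.
get_prop() {
  # -F treats the name as a literal string, -A1 also prints the next line
  grep -F -A1 "<name>$2</name>" "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, running get_prop core-site.xml fs.default.name from the directory that holds the file should print hdfs://localhost:9000 if step 2 was done as shown. This is a plain text match, not a real XML parser, so it will miss values laid out differently.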



To summarize, these are the main configuration settings needed to run a single-node Hadoop cluster:

1) hadoop-env.sh: JAVA_HOME
2) core-site.xml: fs.default.name = hdfs://localhost:9000
    hadoop.tmp.dir = /home/hadoop/tmp    (This is a custom location; the user running Hadoop must have sufficient permissions on it)
3) hdfs-site.xml: dfs.replication = 1
4) mapred-site.xml: mapred.job.tracker = localhost:9001
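
Because hadoop.tmp.dir points at a custom location, the directory has to exist and be accessible to the user that runs Hadoop. The following is a minimal sketch under assumptions: the function name prepare_tmp_dir is hypothetical, mode 750 is just one reasonable choice, and it is assumed to be run as the user that will start the Hadoop daemons.

```shell
# prepare_tmp_dir DIR
# Creates DIR (and any missing parents) and restricts it to owner and group.
prepare_tmp_dir() {
  mkdir -p "$1" && chmod 750 "$1"
}
```

It would be called as prepare_tmp_dir /home/hadoop/tmp, matching the value set in core-site.xml. If the directory is created by another user such as root, its ownership would also need to be changed to the Hadoop user.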
