Wednesday, October 9, 2013

All about primitives and characters... conversions... etc.

A low-level discussion of primitive data types and their representations:


byte notation    bit notation    combinations                              range

1 byte   ==   8 bit    ==   pow(2,8)  = 256                                -128 to +127
2 byte   ==   16 bit   ==   pow(2,16) = 65,536                             -32,768 to +32,767
4 byte   ==   32 bit   ==   pow(2,32) = 4,294,967,296                      -2,147,483,648 to +2,147,483,647
8 byte   ==   64 bit   ==   pow(2,64) = 18,446,744,073,709,551,616         -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807


The Java primitive types:

byte  - The byte data type is an 8-bit signed two's complement integer.
short - The short data type is a 16-bit signed two's complement integer.
int    -  The int data type is a 32-bit signed two's complement integer.
long - The long data type is a 64-bit signed two's complement integer.

float - The float data type is a single-precision 32-bit IEEE 754 floating point.
double - The double data type is a double-precision 64-bit IEEE 754 floating point.

char - The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000'  (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
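
If you want to double-check these sizes and ranges yourself, the wrapper classes expose them as constants. A small stand-alone sketch (the class name is mine, just for illustration):

code snippet:
public class PrimitiveRanges {
    public static void main(String[] args) {
        // each wrapper class exposes the SIZE (in bits), MIN_VALUE and MAX_VALUE of its primitive
        System.out.println("byte : " + Byte.SIZE + " bits, " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.SIZE + " bits, " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int  : " + Integer.SIZE + " bits, " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long : " + Long.SIZE + " bits, " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // char is an unsigned 16-bit type: 0 to 65,535
        System.out.println("char : " + Character.SIZE + " bits, " + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);
    }
}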

Here char is a 16-bit Unicode type, so its value can be written in hexadecimal notation like '\uffff'.


Hexadecimal digits start at '0' and end at 'f'.

So the numbering is as below:

0    - 0
1    - 1
2    - 2
3    - 3
4    - 4
5    - 5
6    - 6
7    - 7
8    - 8
9    - 9
10  - A
11  - B
12  - C
13  - D
14  - E
15  - F

In total there are 16 digits (0 through 15), which is why it is called base 16.
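
Java's standard library can do this digit mapping for you. A small illustrative sketch (the class name is mine):

code snippet:
public class HexDigits {
    public static void main(String[] args) {
        // Integer.toHexString() converts a decimal value to its base-16 form
        for (int n = 0; n <= 15; n++) {
            System.out.println(n + " - " + Integer.toHexString(n).toUpperCase());
        }
        // Character.digit() goes the other way: hex digit -> decimal value
        System.out.println("'F' has the value " + Character.digit('F', 16));   // prints 15
    }
}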

char clarification :
    Literals of types char and String may contain any Unicode (UTF-16) characters. If your editor and file system allow it, you can use such characters directly in your code. If not, you can use a "Unicode escape" such as '\u0061' (Latin small letter a).

The code below will give you clarity.

code snippet:
public class CharDemo {
    public static void main(String[] args) {
        char i = '\u0061';   // Unicode escape notation
        char j = 97;         // decimal value
        char k = 'a';        // character literal

        System.out.println(i);
        System.out.println((int) i);
        System.out.println((char) i);
        System.out.println(j);
        System.out.println((int) j);
        System.out.println((char) j);
        System.out.println(k);
        System.out.println((int) k);
        System.out.println((char) k);
    }
}

output:
a
97
a
a
97
a
a
97
a



Value conversions: 

From the above code, let us discuss the char 'a'.

\u0061  -  this is the hexadecimal (Unicode escape) notation -- 16 bits -- 2 bytes -- pow(2,16) possible combinations in the binary representation.

0000000001100001 is the binary representation of the character 'a' (16 bits in total). You can use a binary-to-decimal converter (see the sources below) to check the value of this binary code.

Convert the hexadecimal 0061 into decimal like this:

0*pow(16,3)+0*pow(16,2)+6*pow(16,1)+1*pow(16,0) =  0+0+6*16+1*1 = 0+0+96+1 =  97 

So, 97 is the decimal notation of character 'a'.
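
You can confirm the same conversion with the standard library. A short sketch (the class name is mine):

code snippet:
public class HexToChar {
    public static void main(String[] args) {
        // parse the hex string "0061" as a base-16 number
        int value = Integer.parseInt("0061", 16);
        System.out.println(value);                          // 97
        System.out.println(Integer.toBinaryString(value));  // 1100001 (leading zeros are dropped)
        System.out.println((char) value);                   // a
        // and back again: decimal -> hexadecimal
        System.out.println(Integer.toHexString(97));        // 61
    }
}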


..
..
..
..
To be continued.......


sources:
http://www.binaryhexconverter.com/decimal-to-binary-converter
http://docs.oracle.com/javase/7/docs/api/java/io/DataInput.html
http://en.wikipedia.org/wiki/List_of_Unicode_characters
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
http://docs.oracle.com/javase/tutorial/i18n/text/string.html


How to connect to any machine without a password using SSH? And how to configure it? 9/10

Hi All,

Here I am going to show you how to connect to any machine using SSH without a password.

Here we consider two machines  ( both are Linux here)
1) Master
2) Slave

Now the scenario is, I want to access Master machine from Slave Machine.

What to do:
Steps...

- Install openssh-server on both of these machines.
                       Use the command 'sudo apt-get install openssh-server'
- SSH configuration files are kept in the ' ~/.ssh ' directory.
- Since the Slave wants to connect to the Master, the Master needs the Slave machine's public key in it (the key pair is generated on the machine you connect from, and its public half is placed on the machine you connect to).

Note: Assume that the machine named nagarjuna@nagarjuna-Aspire-4736 is the Master machine.

pic1:


- Generate a key pair using the ssh-keygen command on the Slave machine (the machine you will connect from).

pic2:

- We get files like the ones below; you can see them using the tree command.

pic3:

- Here we need the file named ' id_rsa.pub ' (the Slave's public key) to be copied to the Master machine.


Key copying process to the Master machine:

- Create an empty file with the name ' authorized_keys ' under the ~/.ssh folder on the Master machine (using the touch command), if it does not already exist.
- Now append the Slave's ' id_rsa.pub ' content to the ' authorized_keys ' file using the command below
  ' cat id_rsa.pub >> authorized_keys ' .
- Now, from the Slave machine, connect to the Master using the command ' ssh nagarjuna@nagarjuna-Aspire-4736 '.


Troubleshooting :

- If you are unable to connect to the Master machine, follow the steps below:
1) The above process works directly only when the Master's hostname (nagarjuna-Aspire-4736 here) resolves to an IP address, for example through DNS or an /etc/hosts entry.
2) Otherwise, connect using the Master's IP address instead of the hostname, i.e. replace nagarjuna-Aspire-4736 with the IP address of the Master machine in the ssh command (or add the hostname to /etc/hosts on the Slave).
3) Now try to connect.

If you want to know the IP address of the Master machine, use the command ' ifconfig '; you will see the IP in its output.


If you are still facing problems while connecting, try restarting the SSH server service on both machines using
sudo /etc/init.d/ssh restart


sources:
https://help.ubuntu.com/10.04/serverguide/openssh-server.html
http://www.cyberciti.biz/faq/howto-start-stop-ssh-server/




Tuesday, October 8, 2013

Main configuration files in Hadoop pseudo-distributed mode ..

Hadoop single node cluster configuration files and settings...:

In any Hadoop release like 1.0.4, 1.2.0, or 1.2.1, to configure a single-node cluster we have to configure four main files in Hadoop.

They are

1) hadoop-env.sh
2) core-site.xml
3) hdfs-site.xml
4) mapred-site.xml

Note: Commands and file names in a Linux operating system are fully case sensitive, so be careful while typing or when adding environment variables to .bashrc or .profile.

Below is the typical content for each of the above files.

For each file:
1) hadoop-env.sh: uncomment JAVA_HOME and set it to the correct Java home path.

2) core-site.xml: add a property tag under the configuration tag, with a name tag and then a value tag, like:
 
       <configuration>
                  <property>
                            <name>fs.default.name</name>
                            <value>hdfs://localhost:9000</value>
                 </property>
                 <property>
                            <name>hadoop.tmp.dir</name>
                            <value>/home/hadoop/tmp</value>
                 </property>
      </configuration>

3) hdfs-site.xml: the same pattern as in step 2:

        <configuration>
                    <property>
                              <name>dfs.replication</name>
                              <value>1</value>
                    </property>
       </configuration>

4) mapred-site.xml: similarly:

         <configuration>
                      <property>
                                <name>mapred.job.tracker</name>
                                <value>localhost:9001</value>
                     </property>
        </configuration>



These are the main configuration settings needed to run a single-node Hadoop cluster:

1) JAVA_HOME
2) fs.default.name = hdfs://localhost:9000
    hadoop.tmp.dir = /home/hadoop/tmp    (This is custom location and must have sufficient permissions)
3) dfs.replication = 1
4) mapred.job.tracker = localhost:9001

Monday, October 7, 2013

namenode not getting started in hadoop1.2.1 ?


Hi All,

Today I faced a problem while configuring a Hadoop single-node cluster using the hadoop-1.2.1 version.

I had just configured all the xml files in the conf directory and tried to start HDFS using start-dfs.sh, but I was unable to start the namenode.

Solution:

Go to core-site.xml and change the property name "dfs.default.name" to "fs.default.name". Then start the HDFS services again using the same command, start-dfs.sh.

Check the running services using jps.


http://stackoverflow.com/questions/8076439/namenode-not-getting-started/19227253#19227253

Thursday, September 19, 2013

What are the domains we do projects on (BFSI)?

BFSI: 

       Banking, Financial services and Insurance (BFSI) is an industry term for companies that provide a range of financial products and services, such as universal banks.
       Banking may include core banking, retail, private, corporate, investment, cards and the like. Financial Services may include stock-broking, payment gateways, mutual funds etc. Insurance covers both life and non-life.
       This term is commonly used by information technology (IT)/Information technology enabled services (ITES)/business process outsourcing (BPO) companies and technical/professional services firms that manage data processing, application testing and software development activities in this domain.

UDF in Hive..


Writing a UDF:

      To illustrate the process of writing and using a UDF, we'll write a simple UDF to trim
characters from the ends of strings. Hive already has a built-in function called trim, so
we'll call ours strip. The code for the Strip Java class is shown below.

A UDF for stripping characters from the ends of strings

package com.hadoopbook.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {

    private Text result = new Text();

    // strip whitespace from both ends of the string
    public Text evaluate(Text str) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString()));
        return result;
    }

    // strip the given set of characters from both ends of the string
    public Text evaluate(Text str, String stripChars) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString(), stripChars));
        return result;
    }
}

A UDF must satisfy the following two properties:

1. A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.
2. A UDF must implement at least one evaluate() method.

To use the UDF in Hive, we need to package the compiled Java class in a JAR file and register the file with Hive:

ADD JAR /path/to/hive-examples.jar;

We also need to create an alias for the Java classname:

CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';

The TEMPORARY keyword here highlights the fact that UDFs are only defined for the duration of the Hive session (they are not persisted in the metastore). In practice, this means you need to add the JAR file, and define the function at the beginning of each script or session.

Note:

As an alternative to calling ADD JAR, you can specify—at launch time— a path where Hive looks for auxiliary JAR files to put on its classpath (including the MapReduce classpath). This technique is useful for automatically adding your own library of UDFs every time you run Hive.

There are two ways of specifying the path, either passing the --auxpath option to the hive command:

% hive --auxpath /path/to/hive-examples.jar

or by setting the HIVE_AUX_JARS_PATH environment variable before invoking Hive. The auxiliary path may be a comma-separated list of JAR file paths or a directory containing JAR files.

Alternatively, you can set a hive.aux.jars.path property in $HIVE_HOME/conf/hive-site.xml. Either way, you need to do this before starting Hive.


The UDF is now ready to be used, just like a built-in function:

hive> SELECT strip(' bee ') FROM dummy;
bee

hive> SELECT strip('banana', 'ab') FROM dummy;
nan

Notice that the UDF’s name is not case-sensitive:

hive> SELECT STRIP(' bee ') FROM dummy;
bee
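
Because evaluate() is just an ordinary Java method, you can also sanity-check the UDF outside Hive before packaging it into the JAR. A minimal sketch, assuming the Hadoop, Hive, and commons-lang JARs are on the classpath (the StripTest class name is mine):

package com.hadoopbook.hive;

import org.apache.hadoop.io.Text;

public class StripTest {
    public static void main(String[] args) {
        Strip strip = new Strip();
        // trims whitespace from both ends
        System.out.println(strip.evaluate(new Text(" bee ")));         // bee
        // strips the characters 'a' and 'b' from both ends
        System.out.println(strip.evaluate(new Text("banana"), "ab"));  // nan
    }
}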

Analysing JSON (JavaScript Object Notation) Document in Hive


Analysing JSON (JavaScript Object Notation) Document:

       You can store the raw JSON documents in a plain TEXTFILE table, one document per line, and then parse them either with a JSON SerDe or, as in the examples below, with Hive's built-in get_json_object() function.

Example 1:

1: Create a test file in JSON format

$ cat > jsont.txt

{"a" :10, "b" :11, "c" :15}
{"a" :20, "b" :21, "c" :25}
{"a" :30, "b" :31, "c" :35}
{"a" :40, "b" :41, "c" :45}
{"a" :50, "b" :51, "c" :55}
{"a" :60, "b" :61, "c" :65}
^d

2: Create a hive table

hive> create table jsont1(str string);

3: Load data into the Hive table from the local JSON file

hive> load data local inpath 'jsont.txt' into table jsont1;

4: Create another table to extract the json data

hive>create table jsont2(a int, b int, c int);

5: Insert jsont1 table data into jsont2 table

hive>insert overwrite table jsont2 select get_json_object(str, '$.a'), get_json_object(str, '$.b'), get_json_object(str, '$.c') from jsont1;



Example2 :

  1. $ cat > jsonex.txt

{ "top" : [
{"table":"user",
"data":{
"name":"John Doe","userid":"2036586","age":"74","code":"297994","status":1}},
{"table":"user",
"data":{
"name":"Mary Ann","userid":"14294734","age":"64","code":"142798","status":1}},
{"table":"user",
"data":{
"name":"Carl Smith","userid":"13998600","age":"36","code":"32866","status":1}},
{"table":"user",
"data":{
"name":"Anil Kumar":"2614012","age":"69","code":"208672","status":1}},
{"table":"user",
"data":{
"name":"Kim Lee","userid":"10471190","age":"53","code":"79365","status":1}}
]}
^d

  2. CREATE TABLE user (line string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS TEXTFILE

  3. LOAD DATA LOCAL INPATH 'jsonex.txt' OVERWRITE INTO TABLE user;

  4. SELECT get_json_object(col0, '$.name') as name, get_json_object(col0, '$.userid') as uid,
get_json_object(col0, '$.age') as age, get_json_object(col0, '$.code') as code,
get_json_object(col0, '$.status') as status
FROM
(SELECT get_json_object(user.line, '$.data') as col0
FROM user
WHERE get_json_object(user.line, '$.data') is not null) temp;



Note: A string like $.user.id means to take each record, represented by $, find the user key, which is assumed to be a JSON map in this case, and finally extract the value for the id key inside the user. This value for the id is used as the value for the user_id column.

Analyzing XML Data in Hive.. 9/10


Analyzing XML Data: 

  1. To create the XML test data file in a local directory

$ cat > xmltestfile.txt
<emp><ename>Kiran</ename><sal>10000</sal></emp>
<emp><ename>Seshu</ename><sal>20000</sal></emp>
<emp><ename>Ramu</ename><sal>30000</sal></emp>
<emp><ename>Rama</ename><sal>40000</sal></emp>
<emp><ename>Srinu</ename><sal>50000</sal></emp>
<emp><ename>Ravi</ename><sal>60000</sal></emp>
<emp><ename>Sandhya</ename><sal>70000</sal></emp>
^d

  2. To create a Hive table

hive> create table xmldata(str string);

  3. To load data into the Hive table from the local XML file

hive> load data local inpath 'xmltestfile.txt' into table xmldata;

  4. To create another table to extract the XML data

hive>create table xmld1(ename array<string>, sal array<string>);

  5. To insert xmldata table data into the xmld1 table

hive>insert overwrite table xmld1 select xpath(str, 'emp/ename/text()'), xpath(str, 'emp/sal/text()') from xmldata;

  6. To create another table to convert the data from array type to normal type

hive> create table xmld2(ename string, sal int);

  7. To insert the data into xmld2 from xmld1

hive> insert overwrite table xmld2 select ename[0],sal[0] from xmld1;


XPath UDFs:

Hive ships with a family of XPath UDFs: xpath (which returns an array of strings), xpath_string, xpath_boolean, xpath_short, xpath_int, xpath_long, xpath_float, and xpath_double (also available as xpath_number).

XPath expressions:

Each of these functions takes the XML string as its first argument and an XPath expression as its second argument.

Ex:


hive> SELECT xpath('<a><b id="foo">b1</b><b id="bar">b2</b></a>', '//@id')
    > FROM src LIMIT 1;

["foo","bar"]

hive> SELECT xpath('<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c>
<c>c2</c></a>', 'a/*[@class="bb"]/text()')
    > FROM src LIMIT 1;

["b1","c1"]
(The long XML string was wrapped for space.)

hive> SELECT xpath_double('<a><b>2</b><c>4</c></a>', 'a/b + a/c')
    > FROM src LIMIT 1;

6.0


Weblog Data Analysis..9/10

Weblog data analysis: 


Input text: hivelog.txt

89.151.85.133 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/127Hours HTTP/1.1" 200 766
212.76.137.2 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/BlackSwan HTTP/1.1" 200 766
74.125.113.104 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/TheFighter HTTP/1.1" 200 766
212.76.137.2 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/Inception HTTP/1.1" 200 766
127.0.0.1 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/TrueGrit HTTP/1.1" 200 766
10.0.12.1 - - [23/Jun/2009:10:39:11 +0300] "GET /movie/WintersBone HTTP/1.1" 200 766


hive> CREATE TABLE hive_log (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" =
"([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
) STORED AS TEXTFILE;



Then you load the data from the log file into the Hive table:

hive> load data local inpath 'hivelog.txt' into table hive_log;


A quick test will tell you if the data’s being correctly handled by the SerDe. Since the RegexSerDe class is part of the Hive contrib, you’ll need to register the JAR so that it’s copied into the distributed cache and can be loaded by the MapReduce tasks:

hive> add jar $HIVE_HOME/lib/hive-contrib-0.7.1-cdh3u2.jar;
hive> SELECT host, request FROM hive_log LIMIT 10;

89.151.85.133 "GET /movie/127Hours HTTP/1.1"
212.76.137.2 "GET /movie/BlackSwan HTTP/1.1"
74.125.113.104 "GET /movie/TheFighter HTTP/1.1"
212.76.137.2 "GET /movie/Inception HTTP/1.1"
127.0.0.1 "GET /movie/TrueGrit HTTP/1.1"
10.0.12.1 "GET /movie/WintersBone HTTP/1.1"

If you're seeing nothing but NULL values in the output, it's probably because your regular expression does not match the log format, for example because of a missing space.
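
A quick way to debug such a mismatch is to try the same regular expression against one sample log line in plain Java, outside Hive. A small sketch (the class name is mine; the regex literal is character-for-character the same as the one in the SERDEPROPERTIES above):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexTest {
    public static void main(String[] args) {
        String regex = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)";
        String line = "89.151.85.133 - - [23/Jun/2009:10:39:11 +0300] \"GET /movie/127Hours HTTP/1.1\" 200 766";

        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.matches()) {
            System.out.println("host    = " + m.group(1));  // 89.151.85.133
            System.out.println("time    = " + m.group(4));  // [23/Jun/2009:10:39:11 +0300]
            System.out.println("request = " + m.group(5));  // "GET /movie/127Hours HTTP/1.1"
        } else {
            System.out.println("No match - Hive would give you NULL columns for this line");
        }
    }
}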





Hive Storage formats . 9/10

Storage Formats:
------------------------

There are two dimensions that govern table storage in Hive: the row format and the file format. The row format dictates how rows, and the fields in a particular row, are stored. In Hive parlance, the row format is defined by a SerDe.

When acting as a deserializer, which is the case when querying a table, a SerDe will deserialize a row of data from the bytes in the file to objects used internally by Hive to operate on that row of data.

The default storage format: Delimited text

When you create a table with no ROW FORMAT or STORED AS clauses, the default format is delimited text, with a row per line.

The default field delimiter is Control-A (^A); the octal form of the delimiter character can also be used: \001 for Control-A.
The default collection item delimiter is Control-B (^B, \002).
The default map key delimiter is Control-C (^C, \003).
Rows in a table are delimited by a newline character.


Example:

John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
Todd Jones^A70000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
Bill King^A60000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

Here is what the first record (for John Doe) would look like in JavaScript Object Notation (JSON), where we have also inserted the names from the table schema:
{
  "name": "John Doe",
  "salary": 100000.0,
  "subordinates": ["Mary Smith", "Todd Jones"],
  "deductions": {
    "Federal Taxes": .2,
    "State Taxes": .05,
    "Insurance": .1
  },
  "address": {
    "street": "1 Michigan Ave.",
    "city": "Chicago",
    "state": "IL",
    "zip": 60600
  }
}
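
To make the invisible ^A/^B/^C delimiters a bit more concrete, here is a small Java sketch (the class name and the simplified record are mine) that builds a record with those control characters and splits it again on the field delimiter:

public class DelimiterDemo {
    public static void main(String[] args) {
        // Control-A, Control-B and Control-C are the characters \001, \002 and \003
        char fieldSep = '\001';       // ^A : separates fields
        char collectionSep = '\002';  // ^B : separates items inside a collection
        char mapKeySep = '\003';      // ^C : separates a map key from its value

        // a simplified record: name ^A salary ^A deductions map
        String record = "Todd Jones" + fieldSep + "70000.0" + fieldSep
                + "Federal Taxes" + mapKeySep + "0.15" + collectionSep
                + "State Taxes" + mapKeySep + "0.03";

        // splitting on the field delimiter gives back the three fields
        // (the third field still contains the ^B and ^C separators)
        for (String field : record.split("\001")) {
            System.out.println(field);
        }
    }
}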


Note:
Binary SerDes should not be used with the default TEXTFILE format (or with an explicit STORED AS TEXTFILE clause). There is always the possibility that a binary row will contain a newline character, which would cause Hive to truncate the row and fail at deserialization time.

What data types does Hive support?



Data Types:

Hive supports the usual primitive data types: TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP, and BINARY.

Collection Data Types:

Hive also supports collection types: ARRAY (an ordered list of values of the same type), MAP (key-value pairs), and STRUCT (a record with named fields).


Sunday, September 15, 2013

hadoop-tutorial-hbase-part-6-key-design

hadoop-tutorial-hbase-part-5-java-client-api-advanced-topics

hadoop-tutorial-hbase-part-4-java-admin-api

hadoop-tutorial-hbase-part-3-java-client-api

hadoop-tutorial-hbase-part-2-installation-and-shell

The ppt below is very good for learning HBase (part 2).



hadoop-tutorial-hbase-part-1-overview

The ppt below is very good for understanding HBase.



Thursday, September 12, 2013

Installation of Hive in Single Node Hadoop Cluster Machine..

Hive Installation: 

Hi all,

Here, I am going to show you how to install Hive on a Hadoop single-node cluster using a tarball (offline).


Prerequisites:

- Hadoop must be installed; to check, type  " $ echo $HADOOP_HOME ".
- If HADOOP_HOME is not set, then set it immediately, because Hive and other Hadoop-related applications always look for this variable on the current machine.

Note: Commands and file names in a Linux operating system are fully case sensitive, so be careful while typing or when adding environment variables to .bashrc or .profile.

Here I am using 'hduser' as the default user to run the Hadoop cluster, so I install Hive as this user. Don't be confused by the different names hduser, nagarjuna, and sudo.

First, download a stable version of Hive from the Apache website. It must match the currently installed Hadoop version, otherwise you may run into errors or bugs.

Installation Steps: 

Here I am using Hive version 0.9.0 for a Hadoop 1.0.4 cluster.
Download hive-0.9.0-bin.tar.gz, and do not download hive-0.9.0.tar.gz.

See the directory structure I have on my machine below.

- In the above diagram, I have copied the hivexxx.tar.gz file to /usr/local/. Observe that it has no permissions for access by users other than root/sudo.
- So give permission on this file using the chmod command with sudo, like below.



Then extract that tar file using the tar command like below. Here hduser may not have permission to extract into that folder, so use sudo to extract.

Now you will see the hivexxx folder, like below.

Till now, we have only extracted the hivexxx tar.gz file to a location, using the required permissions.
Okay, now we have to set system variables to run Hive.

- We need the HIVE_HOME and PATH system variables.
- Here we use user-level system variables, by placing a few lines in the .bashrc or .profile file under hduser's home directory (these files are hidden).

- You can edit these files with whichever editor you like; I use nano or gedit.

For example


Add the HIVE_HOME and PATH export lines to the end of the .bashrc or .profile file.


Now log out and log in again (re-login) as hduser. Then check $HIVE_HOME; if it shows the Hive home directory, then Hive is ready to use.

check like below screen


Note: 
   From the above screen, the hive shell will start even though Hadoop is not running. But to run any SQL queries in the hive shell, you must have Hadoop running; otherwise you will get connection errors.



Please let me know if there are any mistakes in this post; your valuable feedback is welcome.
contact at nagarjuna.lingala@gmail.com

You can also find me at javaojava.blogspot.com.

Wednesday, September 11, 2013

What does the Secondary NameNode do? And what is its use?


Secondary NameNode :


- The Secondary NameNode is not a hot backup of the NameNode. It cannot be used in the event of a NameNode failure.

- This daemon periodically checkpoints the NameNode metadata. During this process, the Secondary NameNode retrieves the current NameNode image and edit logs, merges them together, and then sends the merged image back to the NameNode.


what is Data Locality ?

Data Locality :

- Applications using HDFS can achieve high throughput because the Hadoop framework was designed to move computation to the data.

- Applications can run on the nodes where the data resides instead  of moving the data to the applications.

Tuesday, September 10, 2013

Important things about TaskTracker and mapred-site.xml configuration....

 
Tasks in TaskTracker: 
 
    For each input split, a map task is created that runs the user-supplied map function on
each record in the split. Map tasks are executed in parallel. This means each chunk of
the input dataset is being processed at the same time by various machines that make
up the cluster. It’s fine if there are more map tasks to execute than the cluster can handle.
They’re simply queued and executed in whatever order the framework deems best.
The map function takes a key-value pair as input and produces zero or more intermediate key-value pairs.
The input format is responsible for turning each record into its key-value pair representation.
 
 
 
There is always a single tasktracker on each worker node.
Both tasktrackers and datanodes run on the same machines, which makes each node
both a compute node and a storage node, respectively. Each tasktracker is configured
with a specific number of map and reduce task slots that indicate how many of each
type of task it is capable of executing in parallel. A task slot is exactly what it sounds
like; it is an allocation of available resources on a worker node to which a task may be
assigned, in which case it is executed. A tasktracker executes some number of map
tasks and reduce tasks in parallel, so there is concurrency both within a worker where
many tasks run, and at the cluster level where many workers exist. Map and reduce
slots are configured separately because they consume resources differently
It is common that tasktrackers allow more map tasks than reduce tasks to execute in parallel.
Upon receiving a task assignment from the jobtracker, the tasktracker executes an
attempt of the task in a separate process.
 
 
Difference between Task and Task attempt (Task instance):
 
- A task is the logical unit of work, while a task attempt is a specific, physical instance 
of that task being executed.
 


mapred-site.xml
 
<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>

<property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>

Saturday, August 24, 2013

What is a Timestamp? In the computing domain...

      
  TimeStamp...

       A timestamp is a sequence of characters or encoded information identifying when a certain event occurred, usually giving date and time of day, sometimes accurate to a small fraction of a second. The term derives from rubber stamps used in offices to stamp the current date, and sometimes time, in ink on paper documents, to record when the document was received. A common example of this type of timestamp is a postmark on a letter. However, in modern times usage of the term has expanded to refer to digital date and time information attached to digital data. For example, computer files contain timestamps that tell when the file was last modified, and digital cameras add timestamps to the pictures they take, recording the date and time the picture was taken.




or


        A timestamp is the current time of an event that is recorded by a computer.
Timestamps are employed extensively within computers and over networks for various types of synchronization. For example, they are assigned to packets in some network protocols in order to facilitate the reassembly of the data (e.g., human speech) in the proper sequence by the receiving host (i.e., computer). Also, they are used by database management systems (DBMS) to determine the transaction order in the event of a system failure (e.g., a computer crash caused by a loss of electrical power or disk failure).
Timestamps are also routinely used to provide information about files, including when they were created and last accessed or modified. This information is included in the inode, which is a data structure on a filesystem on a Unix-like operating system that stores all the information about a file except its name and its actual data.
Another important application is events that are recorded in system log files. The timestamps in such files can be extremely useful for monitoring system security and for forensic purposes.
The time as recorded by timestamps can be measured in terms of the time of day or relative to some starting point. And it is measured with high precision in small fractions of a second.
The accuracy of the time is maintained through a variety of mechanisms, including the high-precision clocks built into computers and the network time protocol (NTP). NTP uses coordinated universal time (UTC) to synchronize computer clock times to a millisecond (and sometimes to a fraction of a millisecond) and uses UDP (user datagram protocol), one of the core Internet protocols, as its transport mechanism.

sources: http://www.linfo.org/timestamp.html
                 http://en.wikipedia.org/wiki/Timestamp