Thursday, September 19, 2013

Analyzing XML Data in Hive.. 9/10


Analyzing XML Data: 

  1. To load the xml data in local directory

$cat>xmltestfile.txt
<emp><ename>Kiran</ename><sal>10000</sal></emp>
<emp><ename>Seshu</ename><sal>20000</sal></emp>
<emp><ename>Ramu</ename><sal>30000</sal></emp>
<emp><ename>Rama</ename><sal>40000</sal></emp>
<emp><ename>Srinu</ename><sal>50000</sal></emp>
<emp><ename>Ravi</ename><sal>60000</sal></emp>
<emp><ename>Sandhya</ename><sal>70000</sal></emp>
^d

  1. To create hive table

Hive>create table xmldata(str string);

  1. To load data into hive table from local xml file

hive> load data local inpath 'xmltestfile.txt' into table xmldata;

  1. To create another table to extract the xml data

hive>create table xmld1(ename array<string>, sal array<string>);

  1. To insert xmld table data into xmld1 table

hive>insert overwrite table xmld1 select xpath(str, 'emp/ename/text()'), xpath(str, 'emp/sal/text()') from xmldata;

  1. To create another table to convert the data from array type to normal type

hive> create table xmld2(ename string, sal int);

  1. To insert the data to xmld2 from xmld1

hive> insert overwrite table xmld2 select ename[0],sal[0] from xmld1;


XPath UDFs:





XPath expressions :


Ex:


hive> SELECT xpath(\'<a><b id="foo">b1</b><b id="bar">b2</b></a>\',\'//@id\')
> FROM src LIMIT 1;

[foo","bar]

hive> SELECT xpath (\'<a><b class="bb">b1</b><b>b2</b><b>b3</b><c class="bb">c1</c>
<c>c2</c></a>\', \'a/*[@class="bb"]/text()\')
> FROM src LIMIT 1;

[b1","c1]
(The long XML string was wrapped for space.)

hive> SELECT xpath_double (\'<a><b>2</b><c>4</c></a>\', \'a/b + a/c\')
> FROM src LIMIT 1;

6.0


No comments:

Post a Comment