Getting Started with Hive
In this tutorial, you'll create a Hive table, load data from a tab-delimited text file, and run a couple of basic queries against the table. For details on setting up HiveServer2 and starting BeeLine, see Connecting to HiveServer2.
Take a look at the source data
First, take a look at the contents of the file using the terminal:
- Save the following data to a text file named sample-table.txt. If you're working on the MapR Virtual Machine, the file is loaded from the virtual machine's local file system (not the cluster storage layer), so save it in the MapR Home directory (the LOAD DATA command later in this tutorial reads /home/mapr/sample-table.txt):
1320352532	1001	http://www.mapr.com/doc	http://www.mapr.com	192.168.10.1
1320352533	1002	http://www.mapr.com	http://www.example.com	192.168.10.10
1320352546	1001	http://www.mapr.com	http://www.mapr.com/doc	192.168.10.1
- Make sure you are in the Home directory where you saved sample-table.txt (type cd ~ if you are not sure).
- Type cat sample-table.txt to display the following output.
mapr@mapr-desktop:~$ cat sample-table.txt
1320352532	1001	http://www.mapr.com/doc	http://www.mapr.com	192.168.10.1
1320352533	1002	http://www.mapr.com	http://www.example.com	192.168.10.10
1320352546	1001	http://www.mapr.com	http://www.mapr.com/doc	192.168.10.1
Notice that the file consists of only three lines, each of which contains a row of data fields separated by the TAB character. The data in the file represents a web log.
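Text copied from a web page often arrives with spaces where the TAB characters should be, and Hive would then fail to split the rows into columns. The following is a minimal sketch, assuming a POSIX shell with printf and awk available, that writes the three rows with real TAB separators and checks that every line has exactly five fields. The variable name bad_lines is only illustrative.

```shell
# Write the three sample rows with real TAB separators.
# printf reuses the format string for each group of five arguments.
printf '%s\t%s\t%s\t%s\t%s\n' \
  1320352532 1001 http://www.mapr.com/doc http://www.mapr.com 192.168.10.1 \
  1320352533 1002 http://www.mapr.com http://www.example.com 192.168.10.10 \
  1320352546 1001 http://www.mapr.com http://www.mapr.com/doc 192.168.10.1 \
  > sample-table.txt

# Count the lines whose number of TAB-separated fields is not 5.
bad_lines=$(awk -F'\t' 'NF != 5 { n++ } END { print n + 0 }' sample-table.txt)
echo "lines without exactly 5 fields: $bad_lines"
```

A count of 0 suggests the file matches the five-column layout that the table definition in the next section expects.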
Create a table in Hive and load the source data:
- Type the following command to start the Hive shell, using tab-completion to expand the
- At the hive> prompt, type the following command to create the table:
CREATE TABLE web_log(viewTime INT, userid BIGINT, url STRING, referrer STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
- Type the following command to load the data from sample-table.txt into the table:
LOAD DATA LOCAL INPATH '/home/mapr/sample-table.txt' INTO TABLE web_log;
Run basic queries against the table:
- Try the simplest query, one that displays all the data in the table:
SELECT web_log.* FROM web_log;
This query would be inadvisable with a large table, but with the small sample table it returns very quickly.
- Try a simple SELECT to extract only the rows that match a desired condition, here URLs that end in doc. This query launches a MapReduce job to filter the data:
SELECT web_log.* FROM web_log WHERE web_log.url LIKE '%doc';
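To cross-check what the filtered query should return, the same condition can be applied to the source file outside Hive. This is a rough sketch, assuming a POSIX shell with printf, awk, and mktemp. It recreates the sample rows in a scratch file so that it runs on its own; the variable names tmp and matches are illustrative. The awk pattern /doc$/ on the third column (url) stands in for LIKE '%doc', which matches values that end in doc.

```shell
# Recreate the sample rows in a scratch file so this check is self-contained.
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  1320352532 1001 http://www.mapr.com/doc http://www.mapr.com 192.168.10.1 \
  1320352533 1002 http://www.mapr.com http://www.example.com 192.168.10.10 \
  1320352546 1001 http://www.mapr.com http://www.mapr.com/doc 192.168.10.1 \
  > "$tmp"

# Column 3 is url; keep only rows whose url ends in "doc".
matches=$(awk -F'\t' '$3 ~ /doc$/' "$tmp")
echo "$matches"

# Remove the scratch file.
rm -f "$tmp"
```

Only the first row has a url ending in doc. The third row has /doc in its referrer column, not its url column, so it is not selected.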