Importing Data into Hive & HBase
1. Start the Environment
1.1 Switch to the hadoop User
sudo su - hadoop
1.2 HDFS + YARN
hdfs namenode -format   # first-time setup only: reformatting wipes existing HDFS metadata
/opt/hadoop-2.7.3/sbin/start-dfs.sh
/opt/hadoop-2.7.3/sbin/start-yarn.sh
/opt/zookeeper-3.4.6/bin/zkServer.sh restart
jps
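If the daemons came up cleanly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and QuorumPeerMain (ZooKeeper) alongside their PIDs on this single-node setup; a missing entry usually means the corresponding start script failed and its log is worth checking.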
1.3 HBase
/opt/hbase-1.2.6/bin/start-hbase.sh
jps
hbase shell
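With the shell open, status gives a one-line health summary (number of region servers and average load); exit before moving on to the Hive steps. A minimal check:
hbase(main):001:0> status
hbase(main):002:0> exit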
1.4 Hive
sudo service mysql start
/opt/apache-hive-1.2.1-bin/bin/hive --service metastore &
/opt/apache-hive-1.2.1-bin/bin/hive --service hiveserver2 &
ps -ef | grep hive
jps
hive
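Once the hive> prompt appears, a minimal smoke test confirms the CLI can reach the metastore:
hive> show databases;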
2. Import a Dataset into HBase
2.1 Prepare the Dataset
$ su -l hadoop
$ wget https://labfile.oss.aliyuncs.com/courses/567/log.csv
$ hdfs namenode -format   # skip if HDFS was already formatted in step 1.2
$ start-dfs.sh            # skip if HDFS is already running
$ start-yarn.sh           # skip if YARN is already running
$ hdfs dfs -mkdir -p /user/hadoop/
$ hdfs dfs -put /home/hadoop/log.csv /user/hadoop/log.csv
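Before importing, it is worth confirming the upload and peeking at the file layout, since the column order seen here drives the -Dimporttsv.columns mapping used below. A quick check, assuming the file sits in the hadoop home directory:
$ head -3 /home/hadoop/log.csv
$ hdfs dfs -ls /user/hadoop/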
2.2 Create the HBase Table
$ start-hbase.sh
$ hbase shell
hbase(main):001:0> create 'access_log', 'cf1', 'cf2'
hbase(main):002:0> exit
2.3 Import the Data
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator="," \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:date,cf1:id,cf2:url,cf2:pre_url,cf2:ip,cf2:country \
    access_log /user/hadoop/log.csv
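Note that ImportTsv splits on the separator literally and does not honor CSV quoting, so a row count and a sample row are worth checking after the MapReduce job finishes. A minimal verification:
$ hbase shell
hbase(main):001:0> count 'access_log'
hbase(main):002:0> scan 'access_log', {LIMIT => 1}
hbase(main):003:0> exit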
3. Import a Dataset into Hive
3.1 Prepare the Dataset
$ su hadoop
$ cd ~
$ wget https://labfile.oss.aliyuncs.com/courses/567/log_example.csv
$ hdfs namenode -format   # skip if HDFS was already formatted in step 1.2
$ start-dfs.sh            # skip if HDFS is already running
$ start-yarn.sh           # skip if YARN is already running
$ wget https://labfile.oss.aliyuncs.com/courses/1136/mysql-connector-java-5.1.46.tar.gz
$ tar zxvf mysql-connector-java-5.1.46.tar.gz
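Hive needs the MySQL JDBC driver on its classpath before schematool or the metastore can reach MySQL, so the extracted driver jar should be copied into Hive's lib directory. The jar name below matches the usual layout of the 5.1.46 release; verify it against what the tarball actually extracts:
$ cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar /opt/apache-hive-1.2.1-bin/lib/   # adjust the name if the extracted directory differs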
3.2 Create the Hive Table
$ sudo service mysql start
$ schematool -initSchema -dbType mysql   # initialize the metastore schema (first run only)
$ hive
hive> CREATE TABLE access(time varchar(40), id varchar(10), url varchar(100), pre_url varchar(100), ip varchar(32), country varchar(10))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;
3.3 Import the Data
hive> LOAD DATA LOCAL INPATH '/home/hadoop/log_example.csv' OVERWRITE INTO TABLE access;
hive> SELECT COUNT(*) FROM access WHERE url = '/downloads/product_1' AND country = 'us';
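A quick eyeball of a few rows confirms the columns landed where the DDL expects them. A minimal check:
hive> SELECT * FROM access LIMIT 3;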
4. Link Hive to HBase
4.1 Prepare the Dataset
$ wget https://labfile.oss.aliyuncs.com/courses/2629/test.csv
$ awk '$0=NR","$0' test.csv > woqu.csv   # prepend a line-number column (plus a comma) to serve as the HBase rowkey
$ hdfs dfs -put /home/hadoop/woqu.csv /user/hadoop/woqu.csv
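The awk one-liner above rewrites each record as <line-number>,<original line>, which ImportTsv will then read as the rowkey followed by the data columns. A tiny illustration with made-up input:
$ printf 'a,b\nc,d\n' | awk '$0=NR","$0'
1,a,b
2,c,d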
4.2 Create the HBase Table & Import the Data
$ hbase shell
hbase(main):001:0> create 'woqu', 'cf1'
hbase(main):002:0> exit
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator="," \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:userid,cf1:goodsid,cf1:catid,cf1:sellerid,cf1:brandid,cf1:month,cf1:day,cf1:action,cf1:agelevel,cf1:gender,cf1:province \
    woqu /user/hadoop/woqu.csv
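As with the access_log import, a one-row scan verifies that the column mapping lines up with the file. A minimal check:
$ hbase shell
hbase(main):001:0> scan 'woqu', {LIMIT => 1}
hbase(main):002:0> exit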
4.3 Create the Hive Table & Map It to the HBase Table
hive> CREATE EXTERNAL TABLE `woqu`(
          `key` varchar(255),
          `userid` varchar(255),
          `goodsid` varchar(255),
          `catid` varchar(255),
          `sellerid` varchar(255),
          `brandid` varchar(255),
          `month` varchar(255),
          `day` varchar(255),
          `action` varchar(255),
          `agelevel` varchar(255),
          `gender` varchar(255),
          `province` varchar(255))
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:userid,cf1:goodsid,cf1:catid,cf1:sellerid,cf1:brandid,cf1:month,cf1:day,cf1:action,cf1:agelevel,cf1:gender,cf1:province")
      TBLPROPERTIES ("hbase.table.name" = "woqu");
hive> select count(*) from woqu;
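Because the table is EXTERNAL and backed by HBaseStorageHandler, Hive reads HBase directly: dropping the Hive table leaves the HBase table intact, and new HBase writes show up in Hive queries immediately. A quick demonstration of the live mapping (the rowkey 99999 and value below are made up for illustration):
hbase(main):001:0> put 'woqu', '99999', 'cf1:userid', 'u_demo'
hive> select key, userid from woqu where key = '99999';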