Hands-On Hive Operations

Installing and configuring MySQL:

  • Install mysql / mysql-server
yum install -y mysql
yum install -y mysql-server
/etc/init.d/mysqld start
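  • If the metastore account has to be set up by hand, a minimal sketch (this assumes the root/root credentials that hive-site.xml below uses; substitute your own user and password; GRANT ... IDENTIFIED BY is MySQL 5.x syntax):
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'root'@'localhost' IDENTIFIED BY 'root';
mysql> FLUSH PRIVILEGES;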
  • Install Hive
wget http://archive.apache.org/dist/hive/hive-0.13.0/apache-hive-0.13.0-bin.tar.gz
tar -zxvf apache-hive-0.13.0-bin.tar.gz
cd /usr/local/src/apache-hive-0.13.0-bin/conf
  • vi hive-site.xml
<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
         <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>root</value>
        </property>
        <property>
                  <name>hive.metastore.schema.verification</name>
                  <value>false</value>
        </property>

</configuration>
  • vi ~/.bashrc
### append the following at the bottom of the file
### java
export JAVA_HOME=/usr/local/src/jdk1.6.0_45
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
# export PATH=$PATH:$JAVA_HOME/bin

### hadoop
export HADOOP_HOME=/usr/local/src/hadoop-1.2.1


### Hbase
export HBASE_HOME=/usr/local/src/hbase-0.98.0-hadoop1
export HBASE_CLASSPATH=$HBASE_HOME/conf
export HBASE_LOG_DIR=$HBASE_HOME/logs


### hive conf
export HIVE_HOME=/usr/local/src/apache-hive-0.13.0-bin
# export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$PATH

export PATH=$PATH:$JAVA_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin
  • source ~/.bashrc
  • Copy the MySQL JDBC driver into Hive's lib directory
cp mysql-connector-java-5.1.41-bin.jar /usr/local/src/apache-hive-0.13.0-bin/lib/
  • Test Hive
hive
  • Create tables
create EXTERNAL TABLE w_a
(
    usrid STRING,
    age STRING,
    sex STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';

create EXTERNAL TABLE w_b
(
    usrid STRING,
    active STRING,
    time STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n';

  • List tables
show tables;
  • Inspect a table's schema
desc w_a;
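  • desc formatted additionally shows the table's HDFS location and whether it is EXTERNAL:
desc formatted w_a;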
  • Load data
    • a.txt
user1   27  1
user2   28  1
user3   29  0
...
  • b.txt
user1   100 20170301
user3   101 20170302
user4   102 20170303
...
  • Load commands
hive -e "LOAD DATA LOCAL INPATH './a.txt' OVERWRITE INTO TABLE w_a"
hive -e "LOAD DATA LOCAL INPATH './b.txt' OVERWRITE INTO TABLE w_b"
  • Verify the loaded data
select * from w_a;
  • Join query across the two tables
select a.*, b.* from w_a a join w_b b on a.usrid=b.usrid;
hive> select a.*, b.* from w_a a join w_b b on a.usrid=b.usrid;
Total jobs = 1
Execution log at: /tmp/root/root_20180311051515_f78a507f-805c-4360-8ff4-976cb05a798f.log
2018-03-11 05:15:19 Starting to launch local task to process map join;  maximum memory = 1013645312
2018-03-11 05:15:20 Dump the side-table into file: file:/tmp/root/hive_2018-03-11_05-15-15_970_401222296198383944-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable
2018-03-11 05:15:20 Uploaded 1 File to: file:/tmp/root/hive_2018-03-11_05-15-15_970_401222296198383944-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable (543 bytes)
2018-03-11 05:15:20 End of local task; Time Taken: 0.674 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201803110434_0001, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201803110434_0001
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201803110434_0001
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-03-11 05:15:27,738 Stage-3 map = 0%,  reduce = 0%
2018-03-11 05:15:29,755 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.73 sec
2018-03-11 05:15:32,768 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 0.73 sec
MapReduce Total cumulative CPU time: 730 msec
Ended Job = job_201803110434_0001
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.73 sec   HDFS Read: 324 HDFS Write: 180 SUCCESS
Total MapReduce CPU Time Spent: 730 msec
OK
user1   27  1   user1   100 20170301
user3   29  0   user3   101 20170302
user4   30  1   user4   102 20170303
user5   31  0   user5   103 20170304
user7   33  1   user7   104 20170305
user8   34  0   user8   105 20170306
Time taken: 16.83 seconds, Fetched: 6 row(s)
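  • Note that user2 and user6 are missing above: an inner join drops w_a rows with no match in w_b. To keep them, with NULLs on the right-hand side, use an outer join:
select a.*, b.* from w_a a left outer join w_b b on a.usrid=b.usrid;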
  • Load from HDFS
LOAD DATA INPATH '/user_name.data.utf.txt' OVERWRITE INTO TABLE u_info
  • OVERWRITE means the loaded data replaces whatever the table already contains
  • Compare the local-filesystem variant: LOAD DATA LOCAL INPATH

  • Loading data with INSERT
    • Example:
    insert into table table1 select usrid, age from w_a limit 3;
    
    • Dynamic-partition inserts are also supported (session settings shown below):
    insert into table test1 partition(c) select * from test2;
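    The dynamic-partition insert needs these session settings first (standard Hive properties; the default strict mode rejects an insert whose partition keys are all dynamic):
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;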
    
  • Create a table directly from a query (CTAS)
    create table test2 as select * from test1;
    
  • Exporting data (to a local file)

insert overwrite local directory '/home/developer/hive_test/1.txt' select usrid, sex from w_a;
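  • Despite its name, '/home/developer/hive_test/1.txt' is created as a directory; Hive writes the query's output files inside it, named by the usual MapReduce convention, e.g.:
cat /home/developer/hive_test/1.txt/000000_0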
  • Exporting data (to HDFS)

    insert overwrite directory '/hive_output' select * from w_b;
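  • By default the exported files separate fields with the \001 (^A) control character; from Hive 0.11 on, a row format may be given on export, for example:
    insert overwrite directory '/hive_output' row format delimited fields terminated by '\t' select * from w_b;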
    

Partition

  • A partition is a mechanism Hive provides: by specifying one or more partition keys, the user decides how the data is laid out, which in turn speeds up queries that filter on those keys. A table may declare several partition keys, and each partition exists as a directory in Hive's storage.

  • Create the table

create TABLE p_t
(
    usrid STRING,
    age STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

  • p1.txt
user2   28  20170302
user4   30  20170302
user6   32  20170302
user8   34  20170302
  • Load data
LOAD DATA LOCAL INPATH  '/root/hive_test/p1.txt' OVERWRITE INTO TABLE p_t partition(dt='20180311');
## dropping LOCAL loads from HDFS instead of the local filesystem; see the failed attempt below
hive> LOAD DATA INPATH  '/root/hive_test/p1.txt' OVERWRITE INTO TABLE p_t partition(dt='20180311');
FAILED: SemanticException Line 1:18 Invalid path ''/root/hive_test/p1.txt'': No files matching path hdfs://192.168.57.110:9000/root/hive_test/p1.txt
hive> LOAD DATA LOCAL INPATH  '/root/hive_test/p1.txt' OVERWRITE INTO TABLE p_t partition(dt='20180311');
Copying data from file:/root/hive_test/p1.txt
Copying file: file:/root/hive_test/p1.txt
Loading data to table default.p_t partition (dt=20180311)
Partition default.p_t{dt=20180311} stats: [numFiles=1, numRows=0, totalSize=72, rawDataSize=0]
OK
Time taken: 0.721 seconds
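  • For the non-LOCAL form to work, the file must first be uploaded to HDFS; note that LOAD DATA INPATH then moves the file into the table's directory rather than copying it. A sketch, with an illustrative HDFS path:
hadoop fs -put /root/hive_test/p1.txt /hive_test/p1.txt
hive -e "LOAD DATA INPATH '/hive_test/p1.txt' OVERWRITE INTO TABLE p_t partition(dt='20180311');"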
  • Query the partition
select * from p_t where dt='20180311';
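  • Each partition is a directory under the table's path; list them with:
show partitions p_t;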

Transform

  • TRANSFORM can often be replaced by a UDF, but a UDF only operates on the current row; when an output field depends on results beyond the row being processed, TRANSFORM with a custom script is the tool to reach for.
  • The drawback of TRANSFORM is that it is somewhat less efficient.
  • vi transform.awk
{
    print $1"_"$2
}
hive> add file ./transform.awk;
hive>  select transform(usrid,age) using "awk -f transform.awk" as (uuu) from w_a;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201803110434_0002, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201803110434_0002
Kill Command = /usr/local/src/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201803110434_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-03-11 05:38:15,036 Stage-1 map = 0%,  reduce = 0%
2018-03-11 05:38:18,051 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.69 sec
2018-03-11 05:38:20,060 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.69 sec
MapReduce Total cumulative CPU time: 690 msec
Ended Job = job_201803110434_0002
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.69 sec   HDFS Read: 298 HDFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 690 msec
OK
user1_27
user2_28
user3_29
user4_30
user5_31
user6_32
user7_33
user8_34
Time taken: 10.161 seconds, Fetched: 8 row(s)
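  • As the first bullet of this section says, a purely row-local transform like this one could also use a built-in function instead of an external script:
select concat(usrid, '_', age) from w_a;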

Integrating Hive with HBase

  • Create the HBase table (in the hbase shell):
    • create 'classes','user'
  • Insert some rows:
put 'classes','001','user:name','jack'
put 'classes','001','user:age','20'
put 'classes','002','user:name','liza'
put 'classes','002','user:age','18'
  • Create the Hive table and verify:
    • create external table classes(id int, name string, age int)
    • STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    • WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,user:name,user:age")
    • TBLPROPERTIES("hbase.table.name" = "classes");
  • Add another row to HBase:
    • put 'classes','003','user:age','1820183291839132'
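  • Back in Hive, query the mapped table to verify the integration; row 003's age value is far outside int range, so expect Hive to render it as NULL (the HBase cell itself is intact):
    • select * from classes;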
