最新消息: USBMI致力于为网友们分享Windows、安卓、IOS等主流手机系统相关的资讯以及评测、同时提供相关教程、应用、软件下载等服务。

Hive练习

IT圈 admin 2浏览 0评论

Hive练习

题目1、

该题目的两个核心思路:

1、自连接

2、倒推法

现有这么一批数据,现要求出:
每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数三个字段的意思:
用户名,月份,访问次数A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11最后结果展示:用户	月份		最大访问次数	总访问次数		当月访问次数
A	2015-01		33			33		  33
A	2015-02		33			43		  10
A	2015-03		38			81		  38
B	2015-01		30			30		  30
B	2015-02		30			45                15
B	2015-03		44			89		  44
第一步

select name,mon,sum(num) from hive1 
group by name,mon order by name,mon;当月访问次数:
A       2015-01 33
A       2015-02 10
A       2015-03 38
B       2015-01 30
B       2015-02 15
B       2015-03 44
第二步:
select name,mon,sum(num) 
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon;总访问次数:
name    mon     _c2
A       2015-01 33
A       2015-02 43
A       2015-03 81
B       2015-01 30
B       2015-02 45
B       2015-03 89
第三步:
select name,mon,max(b.s) 
from 
(select name,mon,sum(num) s from hive1 group by name,mon order by name,mon)b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon;最大访问次数:
name    mon     _c2
A       2015-01 33
A       2015-02 33
A       2015-03 38
B       2015-01 30
B       2015-02 30
B       2015-03 44
第四步:
select aa.name,aa.mon,cc.ss,aa.tt,bb.mm
from 
(select name,mon,sum(num) tt 
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon)aa 
join 
(select name,mon,sum(num) mm from hive1 group by name,mon order by name,mon)bb
on aa.name=bb.name and aa.mon=bb.mon 
join 
(select name,mon,max(b.s) ss 
from 
(select name,mon,sum(num) s from hive1 group by name,mon order by name,mon)b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon)cc
on aa.name=cc.name and aa.mon=cc.mon;连接三张表:
aa.name aa.mon  cc.ss   aa.tt   bb.mm
A       2015-01 33      33      33
A       2015-02 33      43      10
A       2015-03 38      81      38
B       2015-01 30      30      30
B       2015-02 30      45      15
B       2015-03 44      89      44

题目2、

 核心思路: 行 转 列

1、// 建表语句:
CREATE TABLE `course2` (`id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,`sid` int(11) DEFAULT NULL,`course` varchar(255) DEFAULT NULL,`score` int(11) DEFAULT NULL 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;// 插入数据
// 字段解释:id, 学号, 课程, 成绩
INSERT INTO `course2` VALUES (1, 1, 'yuwen', 43);
INSERT INTO `course2` VALUES (2, 1, 'shuxue', 55);
INSERT INTO `course2` VALUES (3, 2, 'yuwen', 77);
INSERT INTO `course2` VALUES (4, 2, 'shuxue', 88);
INSERT INTO `course2` VALUES (5, 3, 'yuwen', 98);
INSERT INTO `course2` VALUES (6, 3, 'shuxue', 65);求:所有数学课程成绩 大于 语文课程成绩的学生的学号
select * from course2;
1	1	yuwen	43
2	1	shuxue	55
3	2	yuwen	77
4	2	shuxue	88
5	3	yuwen	98
6	3	shuxue	65
第一步:select sid,
case course when "yuwen" then score else 0 end as yuwen, 
case course when "shuxue" then score else 0 end as shuxue,
case course when "yingyu" then score else 0 end as yingyu 
from course2;1	43	0	0
1	0	55	0
2	77	0	0
2	0	88	0
3	98	0	0
3	0	65	0
第二步:
select sid,
sum(case course when "yuwen" then score else 0 end) as yuwen, 
sum(case course when "shuxue" then score else 0 end) as shuxue,
sum(case course when "yingyu" then score else 0 end) as yingyu 
from course2 group by sid;1	43	55	0
2	77	88	0
3	98	65	0
第三步:
select sid from (
select sid,
sum(case course when "yuwen" then score else 0 end) as yuwen, 
sum(case course when "shuxue" then score else 0 end) as shuxue,
sum(case course when "yingyu" then score else 0 end) as yingyu 
from course2 group by sid
) b where b.shuxue > b.yuwen;1
2

题目3、

思路:分组,跟字符串切分

2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
比如:2010012325表示在2010年01月23日的气温为25度。现在要求使用hive,计算每一年出现过的最大气温的日期+温度。
要计算出每一年的最大气温。我用
select substr(data,1,4),max(substr(data,9,2)) from table2 group by substr(data,1,4);
出来的是  年份  +  温度  这两列数据例如 2015     99但是如果我是想select 的是:具体每一年最大气温的那一天  + 温度 。例如  20150109     99
请问该怎么执行hive语句。。
group by 只需要substr(data,1,4),
但是select  substr(data,1,8),又不在group by 的范围内。  
是我陷入了思维死角。一直想不出所以然。。求大神指点一下。	
在select 如果所需要的。不在group by的条件里。这种情况如何去分析?
解题:
select substr(data,1,4) y,max(substr(data,9,2)) m from hive2 group by 
substr(data,1,4);
2001    29
2007    99
2008    37
2010    17
2012    32
2013    29
2014    17
2015    99
select substr(data,1,4) y,substr(data,5,4) d,max(substr(data,9,2)) m from 
hive2 group by substr(data,1,4),substr(data,5,4);
2001    0101    16
2001    0102    12
2001    0103    10
2001    0104    11
2001    0105    29
2007    0106    19
2007    0107    12
2007    0108    12
2007    0109    99
2007    0110    23
2008    0101    05
2008    0102    16
2008    0103    37
2008    0104    14
2008    0105    16
2010    0101    14
2010    0102    16
2010    0103    17
2010    0104    10
2010    0105    06
2012    0106    09
2012    0107    32
2012    0108    12
2012    0109    19
2012    0110    23
2013    0106    19
2013    0107    22
2013    0108    12
2013    0109    29
2013    0110    23
2014    0101    14
2014    0102    16
2014    0103    17
2014    0104    10
2014    0105    06
2015    0106    49
2015    0107    22
2015    0108    12
2015    0109    99
2015    0110    23
两个连接:
select concat(a.y,b.d),b.m from 
(select substr(data,1,4) y,max(substr(data,9,2)) m from hive2 group by 
substr(data,1,4)) a
join 
(select substr(data,1,4) y,substr(data,5,4) d,max(substr(data,9,2)) m from 
hive2 group by substr(data,1,4),substr(data,5,4)) b 
on a.y=b.y and a.m=b.m;
20010105        29
20070109        99
20080103        37
20100103        17
20120107        32
20130109        29
20140103        17
20150109        99

题目四:

现有一份以下格式的数据:
表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门:id course 
1,a 
1,b 
1,c 
1,e 
2,a 
2,c 
2,d 
2,f 
3,a 
3,b 
3,c 
3,e编写Hive的HQL语句来实现以下结果:
表中的1表示选修,表中的0表示未选修id	a	b	c	d	e	f
1	1	1	1	0	1	0
2	1	0	1	1	0	1
3	1	1	1	0	1	0

第一步:

select id,concat_ws(',',collect_list(course)) from hive4 group by id;1       a ,b ,c ,e 
2       a ,c ,d ,f 
3       a ,b ,c ,e

第二步:

select a.id,
if(a.c like '%a%',1,0)a,
if(a.c like '%b%',1,0)b,
if(a.c like '%c%',1,0)c,
if(a.c like '%d%',1,0)d,
if(a.c like '%e%',1,0)e,
if(a.c like '%f%',1,0)f 
from 
(select id,concat_ws(',',collect_list(course))c from hive4 group by id)a;1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0


题目五:

现有如下格式的一份数据:店铺,月份,金额
shop,mon,money
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250需求:编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额

每个店铺的当月销售额:

select shop,mon,sum(money) from hive3 group by shop,mon;a       01      350
a       02      5000
a       03      600
b       01      7800
b       02      2500
c       01      470
c       02      630

累计到当月的总销售额:

select a.shop, a.mon,sum(b.money) from hive3 b, 
(select shop,mon,sum(money) from hive3 group by shop,mon) a 
where a.shop=b.shop and a.mon >= b.mon group by a.shop,a.mon;a       01      350
a       02      5350
a       03      5950
b       01      7800
b       02      10300
c       01      470
c       02      1100

连接在一起:

select aa.*,bb.ss from 
(select shop,mon,sum(money) from hive3 group by shop,mon)aa,
(select a.shop, a.mon,sum(b.money)ss from hive3 b, 
(select shop,mon,sum(money) from hive3 group by shop,mon) a 
where a.shop=b.shop and a.mon >= b.mon group by a.shop,a.mon)bb 
where aa.shop=bb.shop and aa.mon=bb.mon;a       01      350     350
a       02      5000    5350
a       03      600     5950
b       01      7800    7800
b       02      2500    10300
c       01      470     470
c       02      630     1100


Hive练习

题目1、

该题目的两个核心思路:

1、自连接

2、倒推法

现有这么一批数据,现要求出:
每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数三个字段的意思:
用户名,月份,访问次数A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11最后结果展示:用户	月份		最大访问次数	总访问次数		当月访问次数
A	2015-01		33			33		  33
A	2015-02		33			43		  10
A	2015-03		38			81		  38
B	2015-01		30			30		  30
B	2015-02		30			45                15
B	2015-03		44			89		  44
第一步

select name,mon,sum(num) from hive1 
group by name,mon order by name,mon;当月访问次数:
A       2015-01 33
A       2015-02 10
A       2015-03 38
B       2015-01 30
B       2015-02 15
B       2015-03 44
第二步:
select name,mon,sum(num) 
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon;总访问次数:
name    mon     _c2
A       2015-01 33
A       2015-02 43
A       2015-03 81
B       2015-01 30
B       2015-02 45
B       2015-03 89
第三步:
select name,mon,max(b.s) 
from 
(select name,mon,sum(num) s from hive1 group by name,mon order by name,mon)b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon;最大访问次数:
name    mon     _c2
A       2015-01 33
A       2015-02 33
A       2015-03 38
B       2015-01 30
B       2015-02 30
B       2015-03 44
第四步:
select aa.name,aa.mon,cc.ss,aa.tt,bb.mm
from 
(select name,mon,sum(num) tt 
from hive1 b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon)aa 
join 
(select name,mon,sum(num) mm from hive1 group by name,mon order by name,mon)bb
on aa.name=bb.name and aa.mon=bb.mon 
join 
(select name,mon,max(b.s) ss 
from 
(select name,mon,sum(num) s from hive1 group by name,mon order by name,mon)b,
(select name,mon,sum(num) from hive1 group by name,mon order by name,mon)a 
where b.mon <= a.mon and a.name=b.name 
group by a.name,a.mon)cc
on aa.name=cc.name and aa.mon=cc.mon;连接三张表:
aa.name aa.mon  cc.ss   aa.tt   bb.mm
A       2015-01 33      33      33
A       2015-02 33      43      10
A       2015-03 38      81      38
B       2015-01 30      30      30
B       2015-02 30      45      15
B       2015-03 44      89      44

题目2、

 核心思路: 行 转 列

1、// 建表语句:
CREATE TABLE `course2` (`id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,`sid` int(11) DEFAULT NULL,`course` varchar(255) DEFAULT NULL,`score` int(11) DEFAULT NULL 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;// 插入数据
// 字段解释:id, 学号, 课程, 成绩
INSERT INTO `course2` VALUES (1, 1, 'yuwen', 43);
INSERT INTO `course2` VALUES (2, 1, 'shuxue', 55);
INSERT INTO `course2` VALUES (3, 2, 'yuwen', 77);
INSERT INTO `course2` VALUES (4, 2, 'shuxue', 88);
INSERT INTO `course2` VALUES (5, 3, 'yuwen', 98);
INSERT INTO `course2` VALUES (6, 3, 'shuxue', 65);求:所有数学课程成绩 大于 语文课程成绩的学生的学号
select * from course2;
1	1	yuwen	43
2	1	shuxue	55
3	2	yuwen	77
4	2	shuxue	88
5	3	yuwen	98
6	3	shuxue	65
第一步:select sid,
case course when "yuwen" then score else 0 end as yuwen, 
case course when "shuxue" then score else 0 end as shuxue,
case course when "yingyu" then score else 0 end as yingyu 
from course2;1	43	0	0
1	0	55	0
2	77	0	0
2	0	88	0
3	98	0	0
3	0	65	0
第二步:
select sid,
sum(case course when "yuwen" then score else 0 end) as yuwen, 
sum(case course when "shuxue" then score else 0 end) as shuxue,
sum(case course when "yingyu" then score else 0 end) as yingyu 
from course2 group by sid;1	43	55	0
2	77	88	0
3	98	65	0
第三步:
select sid from (
select sid,
sum(case course when "yuwen" then score else 0 end) as yuwen, 
sum(case course when "shuxue" then score else 0 end) as shuxue,
sum(case course when "yingyu" then score else 0 end) as yingyu 
from course2 group by sid
) b where b.shuxue > b.yuwen;1
2

题目3、

思路:分组,跟字符串切分

2014010114
2014010216
2014010317
2014010410
2014010506
2012010609
2012010732
2012010812
2012010919
2012011023
2001010116
2001010212
2001010310
2001010411
2001010529
2013010619
2013010722
2013010812
2013010929
2013011023
2008010105
2008010216
2008010337
2008010414
2008010516
2007010619
2007010712
2007010812
2007010999
2007011023
2010010114
2010010216
2010010317
2010010410
2010010506
2015010649
2015010722
2015010812
2015010999
2015011023
比如:2010012325表示在2010年01月23日的气温为25度。现在要求使用hive,计算每一年出现过的最大气温的日期+温度。
要计算出每一年的最大气温。我用
select substr(data,1,4),max(substr(data,9,2)) from table2 group by substr(data,1,4);
出来的是  年份  +  温度  这两列数据例如 2015     99但是如果我是想select 的是:具体每一年最大气温的那一天  + 温度 。例如  20150109     99
请问该怎么执行hive语句。。
group by 只需要substr(data,1,4),
但是select  substr(data,1,8),又不在group by 的范围内。  
是我陷入了思维死角。一直想不出所以然。。求大神指点一下。	
在select 如果所需要的。不在group by的条件里。这种情况如何去分析?
解题:
select substr(data,1,4) y,max(substr(data,9,2)) m from hive2 group by 
substr(data,1,4);
2001    29
2007    99
2008    37
2010    17
2012    32
2013    29
2014    17
2015    99
select substr(data,1,4) y,substr(data,5,4) d,max(substr(data,9,2)) m from 
hive2 group by substr(data,1,4),substr(data,5,4);
2001    0101    16
2001    0102    12
2001    0103    10
2001    0104    11
2001    0105    29
2007    0106    19
2007    0107    12
2007    0108    12
2007    0109    99
2007    0110    23
2008    0101    05
2008    0102    16
2008    0103    37
2008    0104    14
2008    0105    16
2010    0101    14
2010    0102    16
2010    0103    17
2010    0104    10
2010    0105    06
2012    0106    09
2012    0107    32
2012    0108    12
2012    0109    19
2012    0110    23
2013    0106    19
2013    0107    22
2013    0108    12
2013    0109    29
2013    0110    23
2014    0101    14
2014    0102    16
2014    0103    17
2014    0104    10
2014    0105    06
2015    0106    49
2015    0107    22
2015    0108    12
2015    0109    99
2015    0110    23
两个连接:
select concat(a.y,b.d),b.m from 
(select substr(data,1,4) y,max(substr(data,9,2)) m from hive2 group by 
substr(data,1,4)) a
join 
(select substr(data,1,4) y,substr(data,5,4) d,max(substr(data,9,2)) m from 
hive2 group by substr(data,1,4),substr(data,5,4)) b 
on a.y=b.y and a.m=b.m;
20010105        29
20070109        99
20080103        37
20100103        17
20120107        32
20130109        29
20140103        17
20150109        99

题目四:

现有一份以下格式的数据:
表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门:id course 
1,a 
1,b 
1,c 
1,e 
2,a 
2,c 
2,d 
2,f 
3,a 
3,b 
3,c 
3,e编写Hive的HQL语句来实现以下结果:
表中的1表示选修,表中的0表示未选修id	a	b	c	d	e	f
1	1	1	1	0	1	0
2	1	0	1	1	0	1
3	1	1	1	0	1	0

第一步:

select id,concat_ws(',',collect_list(course)) from hive4 group by id;1       a ,b ,c ,e 
2       a ,c ,d ,f 
3       a ,b ,c ,e

第二步:

select a.id,
if(a.c like '%a%',1,0)a,
if(a.c like '%b%',1,0)b,
if(a.c like '%c%',1,0)c,
if(a.c like '%d%',1,0)d,
if(a.c like '%e%',1,0)e,
if(a.c like '%f%',1,0)f 
from 
(select id,concat_ws(',',collect_list(course))c from hive4 group by id)a;1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0


题目五:

现有如下格式的一份数据:店铺,月份,金额
shop,mon,money
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250需求:编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额

每个店铺的当月销售额:

select shop,mon,sum(money) from hive3 group by shop,mon;a       01      350
a       02      5000
a       03      600
b       01      7800
b       02      2500
c       01      470
c       02      630

累计到当月的总销售额:

select a.shop, a.mon,sum(b.money) from hive3 b, 
(select shop,mon,sum(money) from hive3 group by shop,mon) a 
where a.shop=b.shop and a.mon >= b.mon group by a.shop,a.mon;a       01      350
a       02      5350
a       03      5950
b       01      7800
b       02      10300
c       01      470
c       02      1100

连接在一起:

select aa.*,bb.ss from 
(select shop,mon,sum(money) from hive3 group by shop,mon)aa,
(select a.shop, a.mon,sum(b.money)ss from hive3 b, 
(select shop,mon,sum(money) from hive3 group by shop,mon) a 
where a.shop=b.shop and a.mon >= b.mon group by a.shop,a.mon)bb 
where aa.shop=bb.shop and aa.mon=bb.mon;a       01      350     350
a       02      5000    5350
a       03      600     5950
b       01      7800    7800
b       02      2500    10300
c       01      470     470
c       02      630     1100


与本文相关的文章

发布评论

评论列表 (0)

  1. 暂无评论