hive-sql语句对in和notin的替换
对于hive-sql⾥的⼦查询不⽀持not in或in ,⽬前测试,应该是⼀个hive语句⾥只能⽀持⼀个not in 或in语句,多了不⽀持,对not in的替换⽤ left join id(关联字段)is null ,in的替换⽤left join id is not null替换,或者⽤left semi join(更优化)
not in 的替换⽰例
--原sql
select id from a
where id not in (select id from b);
--hive-sql替换⽰例
select id from a
left  join b  on b.id=a.id
where b.id is null;
in 替换⽰例
--原sql
select id from a
where id in (select id from b);
--hive-sql替换⽰例
select id from a
left  join b  on b.id=a.id
where b.id is not null;
--或
select id from a
left  semi join b  on a.id=b.id
hive杂记
hivesql⾥的⽇期函数不⽀持’%Y-%m-%d’这样的表达,虽然不会报错,但是不会起作⽤
⽀持’2019-06-16’这样的表达
顺便记下我⾃⼰做的⼀个⽐较复杂的案例
原sql
SELECT
count(DISTINCT t.passport_user_id) 新增⽤户数,
count( t.passport_user_id) 新增⽤户交易笔数,
ade_amount) 新增⽤户交易⾦额
FROM t_trade t
sql语句替换表中内容INNER JOIN t_user u ON u.passport_user_id = t.passport_user_id
WHERE t.id IN (
SELECT min(d.id) id
FROM t_trade d
INNER JOIN t_bank b ON d.bank_code = b.bank_code
hod_type = 1
and DATE_ate_time,'%Y-%m-%d')= '2019-05-23'
ade_type = 3
AND d.bank_trade_status = 1
AND d.passport_user_id NOT IN (
SELECT d1.passport_user_id
FROM t_trade d1
INNER JOIN t_bank b ON d1.bank_code = b.bank_code
hod_type = 1
AND d1.income_begin_date < d.income_begin_date  ---这是难点,需要⽤到原sql⾥的字段判断,因此我在替换后的sql⾥只好整个搬过去,希望以后能到更好的解决办法
ade_type = 3
AND d1.bank_trade_status = 1
)
GROUP BY d.passport_user_id
)
替换后sql
left semi join(看左表的字段在右表⾥存不存在,有的话就返回左表的字段,注:只返回左表的字段,右表的字段不会返回,且右表字段不能出现在条件⾥)
SELECT
'⾦超' channle,
count(DISTINCT t.passport_user_id) new_user,
count( t.passport_user_id) new_count,
ade_amount) new_amount
FROM odsdb_bankconsignment_prod.t_trade t
INNER JOIN odsdb_bankconsignment_prod.t_user u ON u.passport_user_id = t.passport_user_id
left SEMI join (
SELECT min(d.id) id
FROM odsdb_bankconsignment_prod.t_trade d
INNER JOIN odsdb_bankconsignment_prod.t_bank b ON d.bank_code = b.bank_code
left OUTER join
(
SELECT d1.passport_user_id
FROM odsdb_bankconsignment_prod.t_trade d1
INNER JOIN odsdb_bankconsignment_prod.t_bank b ON d1.bank_code = b.bank_code
hod_type = 1
AND DATE_ate_time,'yyyy-MM-dd') < '2019-06-18'
ade_type = 3
AND d1.bank_trade_status = 1
)f
on d.passport_user_id=f.passport_user_id
hod_type = 1
and DATE_ate_time,'yyyy-MM-dd')= '2019-06-18'
ade_type = 3
AND d.bank_trade_status = 1
---and f.income_begin_date<d.income_begin_date
AND f.passport_user_id is null
GROUP BY d.passport_user_id
)mt on mt.id=t.id
where t.passport_user_id not IN
(SELECT passport_id
from bi_tag_factory.tag_white_list_detail where white_list_id=139)

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。