关于sparkSQL⼦查询的相关问题sparkSQL⽀持的⼦查询:
spark.sql("select u1 from (select u1,u2,u3 from tableTemp)").show()
spark.sql("select u1 from tableTemp1 where u1 > (select max(pre) from tableTemp2)").show()
sparkSQL不⽀持的⼦查询:
1. exists和in⼦查询(新版本已⽀持)
⽐如:
Select * From feeData Where FeeID Not in (Select ID From FeeInfo)
FeeInfo 表最好有索引
优化语句:Select * From feeData as d Left Join FeeInfo f on d.FeeID =f.ID Where
f.Id is null
原因参考
in⼦查询不⽀持,in后直接跟值做查询条件还是⽀持的:
spark.sql("select * from tableTemp where pre_prob in (0.9,1,2)").show()
但还应注意in与括号之间的距离,因为sparkSQL内置了in(expr2,expr3,…)函数,返回的是布尔值:
> SELECT 1 in(1, 2, 3);
true
> SELECT 1 in(2, 3, 4);
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));
true
2. ⼦查询作为字段
⽐如:
val result = spark.sql("select
ssdggs,
ssdggsid,
ssdsmc,
ssds,
count(*) sl,
round(count(*) / (select count(*) from dyycjl dy sbm=sb.ssgdgsid) * 100 , 2) zb
from sbxx sb group by sb.ssdggs,sb.ssdggsid,sb.ssdsmc,sb.ssds")
改为:
val tempTable1 = spark.sql("select
ssdggs,
ssdggsid,
ssdsmc,
ssds,
count(*) sl
from sbxx sb group by sb.ssdggs,sb.ssdggsid,sb.ssdsmc,sb.ssds")
val tempTable2 = spark.sql("select xgsbm,count(*) num from dyycjl group by xgsbm")
sql中select是什么意思
spark.sql("select t1.*,round(t1.sl / num *100 , 2) zb from tempTable1 t1 left join tempTable2 t2 sbm=t2.ssgdgsid")

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。