java获取hive表所有字段,HiveSql从表中动态获取空列计数
我正在使⽤datastax spark集成和spark SQL thrift服务器,它为我提供了⼀个Hive SQL接⼝来查询Cassandra中的表.
我的数据库中的表是动态创建的,我想要做的是仅根据表名在表的每列中获取空值的计数.
我可以使⽤describe database.table获取列名,但在hive SQL中,如何在另⼀个为所有列计数null的select查询中使⽤其输出.
更新1:使⽤Dudu的解决⽅案回溯
Error running query: TExecuteStatementResp(status=TStatus(errorCode=0,
errorMessage=”org.apache.spark.sql.AnalysisException: Invalid usage of
‘*’ in explode/json_tuple/UDTF;”, sqlState=None,
infoMessages=[“org.apache.hive.service.cli.HiveSQLException:org.apache.spark.sql.AnalysisException:
Invalid usage of ‘‘ in explode/json_tuple/UDTF;:16:15″,
‘org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:org$apache$spark$sql$hive$thriftserver$SparkExec
‘org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:runInternal:SparkExecuteStatementOperation.scala:1
‘org.apache.hive.service.cli.operation.Operation:run:Operation.java:257′,
‘org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:388’,
‘org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:369’,
‘org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:262’,
‘org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:437’,
‘org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313’,
‘org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298’,
‘org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39’,
‘org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39’,
‘org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56’,
‘org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286’,
‘urrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142’,
thrift‘urrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617’,
‘java.lang.Thread:run:Thread.java:745’], statusCode=3),
operationHandle=None)
在以下解决⽅案中,不需要单独处理每个列.
结果是列索引和该列中的空值数.
您可以稍后通过列索引将其连接到从Metastore检索到的信息.
⼀个限制是包含确切⽂本null的字符串将被计为空值.
演⽰
CTE(由mytable定义的mytable)显然可以替换为实际表
with mytable as
(
select stack
(
5
,1 ,1.2 ,date '2017-06-21' ,null
,2 ,2.3 ,null ,null
,
3 ,null ,null ,'hello'
,4 ,4.5 ,null ,'world'
,5 ,null ,date '2017-07-22' ,null
) as (id,amt,dt,txt)
)
select pe.pos as col_index
,count(case when pe.val='null' then 1 end) as nulls_count
from mytable t lateral view posexplode (split(printf(concat('%s',repeat('\u0001%s',field(unhex(1),t.*,unhex(1))-2)),t.*),'\\x01')) pe
group by pe.pos
;
+-----------+-------------+
| col_index | nulls_count |
+-----------+-------------+
| 0 | 0 |
| 1 | 2 |
| 2 | 3 |
| 3 | 3 |
+-----------+-------------+

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。