Using the Impala parser to parse SQL
Impala's SQL dialect differs slightly from native Hive syntax. Hive's parser can partially handle Impala queries, but because Impala's parser is built with CUP and flex, it is better to use Impala's own parser when parsing Impala queries.
1. On a machine with Impala installed, locate the impala-frontend jar (the Impala version in this environment is 2.12.0+cdh5.15.1+0)
lintong@master:/opt/cloudera/parcels/CDH/jars$ ls | grep impala-frontend
impala-frontend-0.1-SNAPSHOT.jar
2. Install it into the local repository with mvn install, or upload it to a private repository
mvn install:install-file -Dfile=/home/lintong/下载/impala-frontend-0.1-SNAPSHOT.jar -DgroupId=org.apache.impala -DartifactId=impala-frontend -Dversion=0.1-SNAPSHOT -Dpackaging=jar
3. Add impala-frontend and java-cup to the project. The java-cup version can be confirmed by opening the impala-frontend jar with a decompiler (a quick runtime check is also sketched after the dependency block below).
<dependency>
    <groupId>org.apache.impala</groupId>
    <artifactId>impala-frontend</artifactId>
    <version>0.1-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>dev</groupId>
    <artifactId>java-cup</artifactId>
    <version>0.11-a-czt02-cdh</version>
</dependency>
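If a decompiler is not handy, another way to confirm which java-cup jar is actually on the classpath is to ask the JVM where it loaded the CUP runtime class from. This is a minimal sketch, assuming the two dependencies above are resolved; the class name CupVersionCheck is just for illustration, and java_cup.runtime.lr_parser is the CUP runtime superclass of the generated SqlParser (it also appears in the stack trace below).

// Hypothetical helper class: print the jar that provides the CUP runtime,
// to confirm which java-cup version the build resolved.
public class CupVersionCheck {
    public static void main(String[] args) {
        Class<?> cupRuntime = java_cup.runtime.lr_parser.class;
        System.out.println(cupRuntime.getProtectionDomain().getCodeSource().getLocation());
    }
}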
If parsing a SELECT statement fails with the following error:
java.lang.NoClassDefFoundError: org/apache/sentry/core/model/db/DBModelAction
at org.apache.impala.analysis.TableRef.<init>(TableRef.java:138)
at org.apache.impala.analysis.CUP$SqlParser$actions.case421(SqlParser.java:18035)
at org.apache.impala.analysis.CUP$SqlParser$actions.CUP$SqlParser$do_action(SqlParser.java:5976)
at org.apache.impala.analysis.SqlParser.do_action(SqlParser.java:1349)
at java_cup.runtime.lr_parser.parse(lr_parser.java:587)
at x.core.LineageInfo(XXXXTest.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:53)
Caused by: java.lang.ClassNotFoundException: org.apache.sentry.core.model.db.DBModelAction
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 28 more
Process finished with exit code 255
add the following dependency to the pom (a quick classpath check follows it):
<dependency>
    <groupId>org.apache.sentry</groupId>
    <artifactId>sentry-core-model-db</artifactId>
    <version>1.5.1-cdh5.15.1</version>
</dependency>
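To verify the dependency really makes the missing class resolvable before re-running the parser, it can simply be loaded by name. A minimal sketch; the class name is the one reported in the NoClassDefFoundError above, and SentryClasspathCheck is just an illustrative wrapper.

// Hypothetical check: throws ClassNotFoundException if sentry-core-model-db is still missing.
public class SentryClasspathCheck {
    public static void main(String[] args) throws ClassNotFoundException {
        Class.forName("org.apache.sentry.core.model.db.DBModelAction");
        System.out.println("sentry-core-model-db is on the classpath");
    }
}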
4. Refer to the parser demo in the Impala source code:
https://github.com/cloudera/Impala/blob/master/fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
Parsing SELECT, Kudu CREATE TABLE, and other statements:
import org.apache.impala.analysis.*;

import java.io.StringReader;

import static org.junit.Assert.fail;

String impalaSelectQuery = "SELECT `ds` FROM `db1`.`table1` WHERE (`ds`='test') OR (`ds`='2020-08-02') OR (`ds`='2020-08-01') LIMIT 100"; // SELECT statement
String hiveSelectQuery = "select city,array_contains(city, 'Paris') from default.arraydemo  limit 5";
String kuduCreateTableQuery = "CREATE TABLE `db1`.`my_first_table`\n" +
        "(\n" +
        "  id BIGINT,\n" +
        "  name STRING,\n" +
        "  PRIMARY KEY(id)\n" +
        ")\n" +
        "PARTITION BY HASH PARTITIONS 16\n" +
        "STORED AS KUDU\n" +
        "TBLPROPERTIES (\n" +
        "  'kudu.master_addresses' = 'hadoop01:7051,hadoop02:7051,hadoop03:7051', \n" +
        "  'kudu.table_name' = 'my_first_table'\n" +
        ");"; // Kudu CREATE TABLE statement
String invalidQuery = "INVALIDATE METADATA db1.tb1"; // metadata refresh statement
String refreshQuery = "REFRESH db1.tb1 partition(ds='2021-05-02')"; // metadata refresh statement
String computeQuery = "COMPUTE INCREMENTAL STATS db1.tb1"; // COMPUTE STATS statement
String describeQuery = "Describe db1.tb1;"; // DESCRIBE statement
String renameQuery = "ALTER TABLE my_db.customers RENAME TO my_db.users;"; // RENAME statement
String addColQuery = "ALTER TABLE db1.tb1 ADD COLUMNS (col1 string)"; // ADD COLUMNS statement
String alterColQuery = "ALTER TABLE db1.tb1 CHANGE col1 col2 bigint"; // ALTER COLUMN statement
String setQuery = "set mem_limit = 5gb";
String useQuery = "use default";

String query = kuduCreateTableQuery; // the statement to parse; this choice produces the output shown below
SqlScanner input = new SqlScanner(new StringReader(query));
SqlParser parser = new SqlParser(input);
ParseNode node = null;
try {
    node = (ParseNode) parser.parse().value;
    if (node instanceof SelectStmt) {
        System.out.println("query statement"); // WITH statements also parse as query statements
        SelectStmt selectStmt = (SelectStmt) node;
        String databaseName = selectStmt.getTableRefs().get(0).getPath().get(0);
        String tableName = selectStmt.getTableRefs().get(0).getPath().get(1);
        System.out.println(databaseName);
        System.out.println(tableName);
    } else if (node instanceof CreateTableStmt) {
        System.out.println("CREATE TABLE statement");
        CreateTableStmt createTableStmt = (CreateTableStmt) node;
        System.out.println(createTableStmt.getTbl());
        for (ColumnDef def : createTableStmt.getColumnDefs()) {
            System.out.println(def.getColName() + " " + def.getTypeDef());
        }
    } else if (node instanceof ResetMetadataStmt) {
        System.out.println("metadata refresh statement");
    } else if (node instanceof ComputeStatsStmt) {
        System.out.println("COMPUTE STATS statement");
    } else if (node instanceof DescribeTableStmt) {
        System.out.println("DESCRIBE statement");
    } else if (node instanceof AlterTableOrViewRenameStmt) {
        System.out.println("RENAME statement");
    } else if (node instanceof AlterTableAddReplaceColsStmt) {
        System.out.println("ADD COLUMNS statement");
    } else if (node instanceof AlterTableAlterColStmt) {
        System.out.println("ALTER COLUMN statement");
    } else if (node instanceof UseStmt) {
        System.out.println("USE statement");
    } else if (node instanceof SetStmt) {
        System.out.println("SET statement");
    } else {
        System.out.println(node.getClass());
    }
} catch (Exception e) {
    e.printStackTrace();
    fail("\nParser error:\n" + parser.getErrorMsg(query));
}
Output:
CREATE TABLE statement
my_first_table
id BIGINT
name STRING
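Building on the same getters used above, note that table references in a SELECT are not always fully qualified, so indexing getPath().get(0) and get(1) directly can fail for unqualified names. The following is a minimal helper sketch under that assumption; the method name extractTables is made up for illustration, and it only uses the accessors already shown (getTableRefs, getPath).

// Hypothetical helper: collect (database, table) pairs from a parsed SELECT.
// A path of size >= 2 is treated as db.table; a single-element path has no explicit
// database at parse time. Requires java.util.List and java.util.ArrayList in addition
// to the imports above.
static List<String[]> extractTables(SelectStmt selectStmt) {
    List<String[]> tables = new ArrayList<>();
    for (TableRef ref : selectStmt.getTableRefs()) {
        List<String> path = ref.getPath();
        if (path.size() >= 2) {
            tables.add(new String[]{path.get(0), path.get(1)});
        } else {
            tables.add(new String[]{null, path.get(0)}); // database unknown at parse time
        }
    }
    return tables;
}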
Impala CREATE TABLE statement for a TEXTFILE table
create table IF NOT EXISTS default.bbb (
column1 string,
column2 int,
column3 bigint
);
Without any additional clauses, this creates a TEXTFILE-format Hive table by default:
CREATE TABLE default.bbb (  column1 STRING,  column2 INT,  column3 BIGINT ) STORED AS TEXTFILE LOCATION 'hdfs://xx-nameservice/user/hive/warehouse/bbb'
Impala CREATE TABLE statement for a Parquet table
create table IF NOT EXISTS default.bbb (
column1 string,
column2 int,
column3 bigint
)
stored as parquet;
Resulting table structure:
CREATE TABLE default.bbb (  column1 STRING,  column2 INT,  column3 BIGINT ) STORED AS PARQUET LOCATION 'hdfs://xx-nameservice/user/hive/warehouse/bbb' 
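As a quick sanity check that these DDL statements also parse cleanly with the impala-frontend jar, the parse node's toSql() can be printed. This is a minimal sketch reusing the parser setup from above; toSql() comes from the ParseNode interface, and its output is the parser's own rendering of the statement, not the SHOW CREATE TABLE result shown above.

// Parse the TEXTFILE DDL and print the statement type and the parser's SQL rendering.
String ddl = "create table IF NOT EXISTS default.bbb (column1 string, column2 int, column3 bigint)";
try {
    SqlParser ddlParser = new SqlParser(new SqlScanner(new StringReader(ddl)));
    ParseNode ddlNode = (ParseNode) ddlParser.parse().value;
    System.out.println(ddlNode.getClass().getSimpleName()); // expected: CreateTableStmt
    System.out.println(ddlNode.toSql());
} catch (Exception e) {
    e.printStackTrace();
}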
