Spark DataFrame Common Operations: filter/groupBy/agg/pivot Methods (Scala)
First, construct a sample dataset:
import spark.implicits._ // needed for .toDF on a local collection

val dataDF = List(
  ("id1", "click", "0108", 1, 1.0),
  ("id1", "view",  "0101", 2, 1.0),
  ("id2", "buy",   "0105", 3, 7.0),
  ("id2", "click", "0104", 4, 9.0),
  ("id2", "click", "0105", 5, 1.0),
  ("id3", "buy",   "0106", 6, 1.0),
  ("id3", "view",  "0105", 7, 1.0),
  ("id3", "view",  "0105", 8, 1.0),
  ("id3", "view",  "0106", 9, 6.0),
  ("id3", "view",  "0105", 10, 10.0),
  ("id5", "view",  "0103", 11, 1.0),
  ("id5", "click", "0106", 12, 1.0)
).toDF("id", "action", "date", "grade", "rate")
Take a look at the result:
scala> dataDF.show()
+---+------+----+-----+----+
| id|action|date|grade|rate|
+---+------+----+-----+----+
|id1| click|0108|    1| 1.0|
|id1|  view|0101|    2| 1.0|
|id2|   buy|0105|    3| 7.0|
|id2| click|0104|    4| 9.0|
|id2| click|0105|    5| 1.0|
|id3|   buy|0106|    6| 1.0|
|id3|  view|0105|    7| 1.0|
|id3|  view|0105|    8| 1.0|
|id3|  view|0106|    9| 6.0|
|id3|  view|0105|   10|10.0|
|id5|  view|0103|   11| 1.0|
|id5| click|0106|   12| 1.0|
+---+------+----+-----+----+
I. The filter method
First, check each column's data type:
val typeMap = dataDF.dtypes.toMap
Result:
scala> val typeMap = dataDF.dtypes.toMap
typeMap: scala.collection.immutable.Map[String,String] = Map(rate -> DoubleType, id -> StringType, date -> StringType, grade -> IntegerType, action -> StringType)
Filtering DoubleType/IntegerType columns:
//DoubleType
dataDF.filter("rate=1.0").show()
dataDF.filter("rate>1.0").show()
dataDF.filter("rate>=1.0").show()
dataDF.filter("rate!=1.0").show()
//IntegerType
dataDF.filter("grade=1.0").show()
dataDF.filter("grade>1.0").show()
dataDF.filter("grade>=1.0").show()
dataDF.filter("grade!=1.0").show()
// Alternatively, use the $ column syntax:
import spark.implicits._ // required for the $"..." syntax
dataDF.filter($"rate" === 2).show()
dataDF.filter($"rate" > 2).show()
dataDF.filter($"rate" =!= 2).show()
When the column name is passed in as a variable:
val colname = "rate"
val ratenum = 1.0
dataDF.filter($"$colname" === ratenum).show()
dataDF.filter($"$colname" > ratenum).show()
dataDF.filter($"$colname" =!= ratenum).show()
Filtering StringType columns:
import spark.implicits._
val colname = "action"
dataDF.filter($"$colname".equalTo("view")).show()
Filter out null values:
dataDF.filter($"$colname".isNotNull).show()
Combining logical conditions:
dataDF.filter($"$colname".isNotNull && ($"grade" < 2.0 || $"grade" > 5.0 ) ).show()
II. The withColumn/groupBy/agg/pivot methods
1. Method overview
withColumn adds a new column:
import org.apache.spark.sql.functions.lit
dataDF.withColumn("total1", lit(1)).show()
dataDF.withColumn("total1", lit("hello")).show()
groupBy groups rows for aggregation:
import org.apache.spark.sql.functions.{col, count, explode, lit, max, min, sum, avg}
// count per group
// per-group max / avg / min / sum
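The calls sketched by the two comments above might look like the following, using the dataDF built earlier:
// Count rows per group
dataDF.groupBy("id").count().show()
// Per-group max / avg / min / sum on a numeric column
dataDF.groupBy("id").max("grade").show()
dataDF.groupBy("id").avg("rate").show()
dataDF.groupBy("id").min("grade").show()
dataDF.groupBy("id").sum("rate").show()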
agg performs grouped aggregation and must be used together with groupBy.
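For instance, agg can compute several aggregations in one pass over each group; a minimal sketch using the functions imported above:
// Multiple aggregations in a single groupBy
dataDF.groupBy("id")
  .agg(count("action"), max("grade"), avg("rate"), sum("rate"))
  .show()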
alias renames a newly created column.
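Below is a short sketch of alias on an aggregated column, followed by a pivot example (pivot appears in this section's title but is not demonstrated above); the output column name total_rate is an illustrative choice:
// alias renames the aggregated column
dataDF.groupBy("id")
  .agg(sum("rate").alias("total_rate"))
  .show()
// pivot spreads the distinct values of action into separate columns,
// here counting occurrences per id and action
dataDF.groupBy("id")
  .pivot("action")
  .agg(count("action"))
  .show()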
