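The agg function in Spark
Description
The agg function is often used together with groupBy, in which case it computes the aggregates separately for each group.
Used on its own, agg computes the aggregates over the entire DataFrame.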
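To make the difference between the two modes concrete without starting a Spark session, here is a minimal sketch using plain Scala collections. It only mimics the semantics and is not Spark's API; the names AggSemantics, overall and perGender are hypothetical, and the rows are the same sample students used in this article's code example.

```scala
// Plain-Scala analogue of the two modes; AggSemantics, overall and perGender are hypothetical names.
object AggSemantics {
  final case class Student(classId: Int, name: String, gender: String, age: Int)

  val students: Seq[Student] = Seq(
    Student(1001, "zhangsan", "F", 20),
    Student(1002, "lisi", "M", 16),
    Student(1003, "wangwu", "M", 21),
    Student(1004, "zhaoliu", "F", 21),
    Student(1004, "zhouqi", "M", 22),
    Student(1001, "qianba", "M", 19),
    Student(1003, "liuliu", "F", 23)
  )

  // Like df.agg(max("age"), count("classId")): a single (max age, row count) for all rows.
  def overall(rows: Seq[Student]): (Int, Int) =
    (rows.map(_.age).max, rows.size)

  // Like df.groupBy("gender").agg(...): one (max age, row count) per distinct gender.
  def perGender(rows: Seq[Student]): Map[String, (Int, Int)] =
    rows.groupBy(_.gender).map { case (g, rs) => (g, (rs.map(_.age).max, rs.size)) }

  def main(args: Array[String]): Unit = {
    println(overall(students))
    println(perGender(students))
  }
}
```

For these rows, overall gives a maximum age of 23 over 7 students, while perGender gives 23 over 3 students for F and 22 over 4 students for M.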
Code example
package com.dt.spark.Test

import org.apache.spark.sql.{DataFrame, SparkSession}

object AggTest {
  case class Student(classId: Int, name: String, gender: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("testAgg").getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext
    sc.setLogLevel("WARN")

    // Sample data: seven students spread over four classes
    val stuDF: DataFrame = Seq(
      Student(1001, "zhangsan", "F", 20),
      Student(1002, "lisi", "M", 16),
      Student(1003, "wangwu", "M", 21),
      Student(1004, "zhaoliu", "F", 21),
      Student(1004, "zhouqi", "M", 22),
      Student(1001, "qianba", "M", 19),
      Student(1003, "liuliu", "F", 23)
    ).toDF()

    import org.apache.spark.sql.functions._
    // The grouped form can also be written with column -> function pairs:
    // stuDF.groupBy("gender").agg("age" -> "max", "age" -> "min", "age" -> "avg", "classId" -> "count").show()
    // Aggregate over the whole DataFrame
    stuDF.agg(max("age"), min("age"), avg("age"), count("classId")).show()
  }
}
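The output section of this article shows grouped tables as well as the ungrouped one, although the listing calls show() only once. The sketch below shows calls that would be expected to yield those grouped aggregates. It is an assumption about how they were produced, not the author's original code; GroupedAggSketch and ageStatsBy are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count, max, min}

// Hypothetical helper object; assumes Spark SQL is on the classpath.
object GroupedAggSketch {
  case class Student(classId: Int, name: String, gender: String, age: Int)

  // Computes the age statistics once per distinct combination of the grouping columns.
  def ageStatsBy(df: DataFrame, first: String, rest: String*): DataFrame =
    df.groupBy(first, rest: _*)
      .agg(max("age"), min("age"), avg("age"), count("classId"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("groupedAggSketch").getOrCreate()
    import spark.implicits._
    spark.sparkContext.setLogLevel("WARN")

    val stuDF: DataFrame = Seq(
      Student(1001, "zhangsan", "F", 20),
      Student(1002, "lisi", "M", 16),
      Student(1003, "wangwu", "M", 21),
      Student(1004, "zhaoliu", "F", 21),
      Student(1004, "zhouqi", "M", 22),
      Student(1001, "qianba", "M", 19),
      Student(1003, "liuliu", "F", 23)
    ).toDF()

    ageStatsBy(stuDF, "gender").show()            // one row per gender
    ageStatsBy(stuDF, "classId", "gender").show() // one row per (classId, gender) pair
    spark.stop()
  }
}
```

Spark does not guarantee the row order of a grouped result, so the rows may appear in a different order from run to run unless an orderBy is added.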
Output
Grouped by gender:
+------+--------+--------+------------------+--------------+
|gender|max(age)|min(age)|          avg(age)|count(classId)|
+------+--------+--------+------------------+--------------+
|     F|      23|      20|21.333333333333332|             3|
|     M|      22|      16|              19.5|             4|
+------+--------+--------+------------------+--------------+
Without grouping (the stuDF.agg call over the whole DataFrame):
+--------+--------+------------------+--------------+
|max(age)|min(age)|          avg(age)|count(classId)|
+--------+--------+------------------+--------------+
|      23|      16|20.285714285714285|             7|
+--------+--------+------------------+--------------+
Grouped by classId and gender:
+-------+------+--------+--------+--------+--------------+
|classId|gender|max(age)|min(age)|avg(age)|count(classId)|
+-------+------+--------+--------+--------+--------------+
|   1001|     F|      20|      20|    20.0|             1|
|   1001|     M|      19|      19|    19.0|             1|
|   1002|     M|      16|      16|    16.0|             1|
|   1003|     M|      21|      21|    21.0|             1|
|   1003|     F|      23|      23|    23.0|             1|
|   1004|     F|      21|      21|    21.0|             1|
|   1004|     M|      22|      22|    22.0|             1|
+-------+------+--------+--------+--------+--------------+
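The headers in these tables, such as max(age) and count(classId), are the default names Spark gives to aggregate expressions. If friendlier names are wanted, each expression can be given an alias, as in the following sketch. The object name NamedAggSketch, the function namedAgeStats and the output column names are hypothetical, and rounding the average to two decimals is an illustrative choice.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, count, max, min, round}

// Hypothetical helper; expects a DataFrame with "gender", "age" and "classId" columns.
object NamedAggSketch {
  def namedAgeStats(df: DataFrame): DataFrame =
    df.groupBy("gender").agg(
      max("age").alias("max_age"),
      min("age").alias("min_age"),
      round(avg("age"), 2).alias("avg_age"), // average rounded to two decimal places
      count("classId").alias("students")
    )
}
```

Calling NamedAggSketch.namedAgeStats(stuDF).show() on the sample DataFrame would print the same per-gender aggregates under the aliased headers.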