SparkSQL内置函数
在Spark 1.5.x版本,增加了⼀系列内置函数到DataFrame API中,并且实现了code-generation的优化。与普通的函数不同,DataFrame的函数并不会执⾏后⽴即返回⼀个结果值,⽽是返回⼀个Column对象,⽤于在并⾏作业中进⾏求值。Column可以⽤在DataFrame的操作之中,⽐如select,filter,groupBy等。函数的输⼊值,也可以是Column。
聚合函数:
approxCountDistinct, avg, count, countDistinct, first, last, max, mean, min, sum, sumDistinct
集合函数:
array_contains, explode, size, sort_array
⽇期时间转换
unix_timestamp, from_unixtime, to_date, quarter, day, dayofyear, weekofyear, from_utc_timestamp, to_utc_timestamp
从⽇期时间中提取字段
year, month, dayofmonth, hour, minute, second
⽇期/时间计算
datediff, date_add, date_sub, add_months, last_day, next_day, months_between
获取当前时间等
current_date, current_timestamp, trunc, date_format
数学函数:
abs, acros, asin, atan, atan2, bin, cbrt, ceil, conv, cos, sosh, exp, expm1, factorial, floor, hex, hypot, log, log10, log1p, log2, pmod, pow, rint, round, shiftLeft, shiftRight, shiftRightUnsigned, signum, sin, sinh, sqrt, tan, tanh, toDegrees, toRadians, unhex
混合函数:
array, bitwiseNOT, callUDF, coalesce, crc32, greatest, if, inputFileName, isNaN, isnotnull, isnull, least, lit, md5, monotonicallyIncreasingId, nanvl, negate, not, rand, randn, sha, sha1, sparkPartitionId, struct, when
字符串函数:groupby是什么函数
ascii, base64, concat, concat_ws, decode, encode, format_number, format_string, get_json_object, initcap, instr, length, levenshtein, locate, lower, lpad, ltrim, printf, regexp_extract, regexp_replace, repeat, reverse, rpad, rtrim, soundex, space, split, substring, substring_index, translate, trim, unb
窗⼝函数:
cumeDist, denseRank, lag, lead, ntile, percentRank, rank, rowNumber
ase64, upper
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。
发表评论