Implementing GroupBy with Per-Group TopN in Java
Before lambdas and streams arrived in Java 8, implementing a SQL-style group by aggregation in Java code was fairly difficult. Java's support for functional programming was weak at the time, whereas Scala takes functional programming to the extreme: a group by aggregation is only a few lines of Scala:
val birds = List("Golden Eagle", "Gyrfalcon", "American Robin", "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupByFirstLetter = birds.groupBy(_.charAt(0))
Output:
Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon), A -> List(American Robin))
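Java also has a few third-party libraries that help, such as Guava's Function and the Functional Java library. On the whole, though, in-memory GroupBy, OrderBy, and Limit (TopN) operations on Java collections remain fairly tedious. This article implements a simple grouping feature that supports a custom key and custom aggregate functions. With a handful of small classes it covers a case that is awkward even in SQL: group first, then sort within each group, and finally take the top N of each group.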
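For comparison with the Scala snippet, the same grouping can be sketched with Java 8 streams. This is an illustration and not part of the article's own implementation: the class name BirdGroupingSketch and the method groupByFirstLetter are made up for this example, and a TreeMap is chosen only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class BirdGroupingSketch {

    // Groups names by their first character, mirroring the Scala groupBy call.
    // TreeMap keeps the keys sorted, so the result prints in a stable order.
    static Map<Character, List<String>> groupByFirstLetter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(name -> name.charAt(0), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> birds = Arrays.asList("Golden Eagle", "Gyrfalcon", "American Robin",
                "Mountain BlueBird", "Mountain-Hawk Eagle");
        // prints {A=[American Robin], G=[Golden Eagle, Gyrfalcon], M=[Mountain BlueBird, Mountain-Hawk Eagle]}
        System.out.println(groupByFirstLetter(birds));
    }
}
```

Sorting within each group and cutting it to N elements would still need an extra step on top of this, which is the part the classes in this article address.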
Implementation
Suppose we have the following Person class:
package me.lin;

class Person {
    private String name;
    private int age;
    private double salary;

    public Person(String name, int age, double salary) {
        super();
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public double getSalary() {
        return salary;
    }

    public void setSalary(double salary) {
        this.salary = salary;
    }

    // Composite value "name-age", usable as a combined grouping key.
    public String getNameAndAge() {
        return getName() + "-" + getAge();
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + ", salary=" + salary
                + "]";
    }
}
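Given a List of Person objects, suppose we want to group by age and then, for each group, count the members, take the first element, take the highest salaries, and so on. The implementation follows.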
Aggregation operations
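Define an aggregation interface that is applied to the elements of each group, analogous to count(*) and sum() in MySQL:
package me.lin;

import java.util.List;

/**
 * Aggregation operation.
 *
 * Created by Brandon on 2016/7/21.
 */
public interface Aggregator<T> {
    /**
     * Aggregates one group.
     *
     * @param key    the key that identifies the group
     * @param values the elements that belong to the group
     * @return
     */
    Object aggregate(Object key, List<T> values);
}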
We implement a few aggregations here; more complex ones can be defined in the same way.
CountAggregator:
package me.lin;

import java.util.List;

/**
 * Counting aggregation.
 *
 * Created by Brandon on 2016/7/21.
 */
public class CountAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        return values.size();
    }
}
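CountAggregator plays the role of count(*). A counterpart to sum(), which the interface description also mentions, could be sketched along the same lines. SumAggregator is a hypothetical class that does not appear in the article, and the interface is repeated inside the sketch only so that it compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

class SumAggregatorSketch {

    // Local copy of the article's interface, repeated so the sketch is self-contained.
    interface Aggregator<T> {
        Object aggregate(Object key, List<T> values);
    }

    // Hypothetical aggregator: sums one numeric property over all elements of a group.
    static class SumAggregator<T> implements Aggregator<T> {
        private final ToDoubleFunction<? super T> valueOf;

        SumAggregator(ToDoubleFunction<? super T> valueOf) {
            this.valueOf = valueOf;
        }

        @Override
        public Object aggregate(Object key, List<T> values) {
            double total = 0.0;
            if (values != null) {
                for (T value : values) {
                    total += valueOf.applyAsDouble(value);
                }
            }
            return total; // boxed to Double; a missing group sums to 0.0
        }
    }

    public static void main(String[] args) {
        Aggregator<Double> sum = new SumAggregator<Double>(Double::doubleValue);
        System.out.println(sum.aggregate("age=30", Arrays.asList(1000.0, 2500.0))); // prints 3500.0
    }
}
```

For the Person class, the extractor passed to the constructor would presumably be Person::getSalary.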
FirstAggregator:
package me.lin;

import java.util.List;

/**
 * Takes the first element of the group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class FirstAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values.size() >= 1) {
            return values.get(0);
        } else {
            return null;
        }
    }
}
TopNAggregator:
package me.lin;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Takes the top N elements of each group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class TopNAggregator<T> implements Aggregator<T> {
    private Comparator<T> comparator;
    private int limit;

    public TopNAggregator(Comparator<T> comparator, int limit) {
        this.limit = limit;
        this.comparator = comparator;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values == null || values.size() == 0) {
            return null;
        }
        // Sort a copy so the caller's list keeps its original order.
        ArrayList<T> copy = new ArrayList<>(values);
        Collections.sort(copy, comparator);
        int size = values.size();
        int toIndex = Math.min(limit, size);
        return copy.subList(0, toIndex);
    }
}
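Which elements count as "top" depends entirely on the comparator passed in, because the aggregator keeps the first N elements after sorting. The sketch below repeats the copy, sort, and cut steps as a standalone static method (topN is a helper name made up for this example) and uses a reversed natural order so that the highest salaries come first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopNUsageSketch {

    // Same steps as TopNAggregator.aggregate: copy, sort, keep at most `limit` elements.
    // Assumes limit is not negative.
    static <T> List<T> topN(List<T> values, Comparator<? super T> comparator, int limit) {
        List<T> copy = new ArrayList<>(values); // leave the caller's list untouched
        copy.sort(comparator);
        return copy.subList(0, Math.min(limit, copy.size()));
    }

    public static void main(String[] args) {
        List<Double> salaries = Arrays.asList(8000.0, 12000.0, 9500.0, 7000.0);
        // Reverse the natural order so the largest values sort to the front.
        System.out.println(topN(salaries, Comparator.<Double>reverseOrder(), 2)); // prints [12000.0, 9500.0]
    }
}
```

With the natural (ascending) order instead, the same call would return the two lowest salaries.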
Grouping implementation
Next comes the grouping itself. For simplicity it is implemented as a utility class:
package me.lin;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Utility class for grouping a Collection.
 */
public class GroupUtils {
    /**
     * Groups and aggregates.
     *
     * @param listToDeal    the data to group, equivalent to the source table in SQL
     * @param clazz         the element type of the data to group
     * @param groupBy       the name of the property to group by
     * @param aggregatorMap the aggregators, keyed by aggregator name; the name is also the key
     *                      of the aggregated value in the result map
     * @param <T>           the element type
     * @return
     * @throws NoSuchFieldException
     * @throws SecurityException
     * @throws IllegalArgumentException
     * @throws IllegalAccessException
     */
    public static <T> Map<Object, Map<String, Object>> groupByProperty(
            Collection<T> listToDeal, Class<T> clazz, String groupBy,
            Map<String, Aggregator<T>> aggregatorMap) throws NoSuchFieldException,
            SecurityException, IllegalArgumentException, IllegalAccessException {
        Map<Object, List<T>> groupResult = new HashMap<Object, List<T>>();
        for (T ele : listToDeal) {
            Field field = clazz.getDeclaredField(groupBy);
            field.setAccessible(true);
            Object key = field.get(ele);
            if (!groupResult.containsKey(key)) {
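The listing above breaks off inside the grouping loop. The following is a sketch of how a method with this signature could be completed, not the author's original code: elements are bucketed by the reflected field value, and each named aggregator is then run over each bucket. The class GroupBySketch, the trimmed-down Person, and the "count" aggregator are assumptions made so that the example runs on its own.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupBySketch {

    // Local copy of the article's interface so the sketch compiles on its own.
    interface Aggregator<T> {
        Object aggregate(Object key, List<T> values);
    }

    // Trimmed-down stand-in for the article's Person class.
    static class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static <T> Map<Object, Map<String, Object>> groupByProperty(
            Collection<T> items, Class<T> clazz, String groupBy,
            Map<String, Aggregator<T>> aggregators)
            throws NoSuchFieldException, IllegalAccessException {
        // Look the field up once rather than once per element.
        Field field = clazz.getDeclaredField(groupBy);
        field.setAccessible(true);

        // Step 1: bucket the elements by the value of the grouping field.
        Map<Object, List<T>> groups = new HashMap<>();
        for (T item : items) {
            Object key = field.get(item);
            if (!groups.containsKey(key)) {
                groups.put(key, new ArrayList<T>());
            }
            groups.get(key).add(item);
        }

        // Step 2: run every named aggregator over every group.
        Map<Object, Map<String, Object>> result = new HashMap<>();
        for (Map.Entry<Object, List<T>> group : groups.entrySet()) {
            Map<String, Object> aggregated = new LinkedHashMap<>();
            for (Map.Entry<String, Aggregator<T>> agg : aggregators.entrySet()) {
                aggregated.put(agg.getKey(),
                        agg.getValue().aggregate(group.getKey(), group.getValue()));
            }
            result.put(group.getKey(), aggregated);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Person> people = Arrays.asList(
                new Person("Ann", 30), new Person("Bob", 30), new Person("Cid", 41));
        Map<String, Aggregator<Person>> aggregators = new HashMap<>();
        aggregators.put("count", (key, values) -> values.size());
        // Each age maps to a map of aggregator name to aggregated value.
        System.out.println(groupByProperty(people, Person.class, "age", aggregators));
    }
}
```

Because the key is read through Field.get, a primitive int field such as age arrives boxed as an Integer, so groups are looked up with Integer keys.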