Spring Cloud Gateway: A Pitfall Diary
1. Background
My team has fully embraced the Spring Cloud ecosystem, but for historical reasons (an old version of Tencent Cloud TSF, plus an in-house base toolkit mixed in), every project is stuck on Spring Cloud 2.1.2.RELEASE, so the Spring Cloud Gateway (SCG for short below) we use is also the 2.1.2.RELEASE version. As we know, SCG is a dedicated gateway built on Spring WebFlux. Spring WebFlux, like Spring MVC, is built on top of Spring Web, and WebFlux exists as a separate product because "Reactor-izing" Spring MVC itself would have been too costly and too hard to maintain. Spring Web already supported reactive streams back in 2017, as its Gradle file shows:
dependencyManagement {
    imports {
        mavenBom "io.projectreactor:reactor-bom:${reactorVersion}"
        mavenBom "io.netty:netty-bom:${nettyVersion}"
        mavenBom "org.eclipse.jetty:jetty-bom:${jettyVersion}"
    }
}
Note the reactor-bom in there. To be honest, when we did the gateway selection we dropped Zuul in favor of SCG, but we did not expect to be locked to SCG 2.1.2.RELEASE. That version was released in June 2019, and since SCG is still a newcomer in the Spring Cloud family, everything around it has been iterating and upgrading quickly. So although less than two years have passed, SCG's modules have already been reorganized substantially (in the newest versions spring-cloud-gateway-core no longer exists; it has become spring-cloud-gateway-server). This post walks through a few of the pitfalls I ran into (they have since been fixed in newer versions).
2. The Mysterious Timeouts
First, a word on where our self-built, SCG-based gateway sits in the overall call chain (you have to be clear about your own position).
Pitfall 1: Gateway processing time recorded via SCG's GlobalFilter is inaccurate
As we know, recording SCG's request-processing time is not as simple as it sounds. Our original approach used a GlobalFilter: we created a LogGlobalFilter and gave it a fairly small Order (for example, below 0) so that it runs early in the chain, recorded one timestamp when the filter method executes and another when the response comes back. Here is the snippet:
public class LogGlobalFilter extends AbstractGlobalFilter {

    private ModifyResponseBodyGatewayFilterFactory factory = new ModifyResponseBodyGatewayFilterFactory();

    private GatewayFilter modifyResponseBodyGatewayFilter;

    @PostConstruct
    public void init() {
        ModifyResponseBodyGatewayFilterFactory.Config config = new ModifyResponseBodyGatewayFilterFactory.Config();
        config.setInClass(String.class);
        config.setOutClass(String.class);
        config.setRewriteFunction(new GatewayResponseRewriteFunction());
        modifyResponseBodyGatewayFilter = factory.apply(config);
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Record the local start time at the beginning of the filter method
        exchange.getAttributes().put(REQUEST_START_NANO_TIME_ATTRIBUTE, System.nanoTime());
        return modifyResponseBodyGatewayFilter.filter(exchange, chain).doOnSuccess(
            (Void v) -> {
                Long reqStartNanoTime = exchange.getAttribute(REQUEST_START_NANO_TIME_ATTRIBUTE);
                StringBuilder logStr = new StringBuilder("Call succeeded")
                        .append(", gateway response: ").append((String) exchange.getAttribute(RESPONSE_BODY_ATTRIBUTE))
                        .append(", elapsed (ms): ").append(reqStartNanoTime == null ?
                                "calculation failed" : (System.nanoTime() - reqStartNanoTime.longValue()) / 1000000);
                log.info(logStr.toString());
            }
        );
    }

    private static class GatewayResponseRewriteFunction implements RewriteFunction<String, String> {
        @Override
        public Publisher<String> apply(ServerWebExchange exchange, String body) {
            return Mono.just(body);
        }
    }
}
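The snippet above does not show how the filter's priority is actually set. A minimal sketch, assuming AbstractGlobalFilter leaves ordering to its subclasses via Spring's standard Ordered contract (the value -100 is purely illustrative):

import org.springframework.core.Ordered;

public class LogGlobalFilter extends AbstractGlobalFilter implements Ordered {

    // A negative order runs this filter early in the chain so the start
    // timestamp is taken as soon as possible (illustrative value only).
    @Override
    public int getOrder() {
        return -100;
    }
}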
On the surface this looks fine: LogGlobalFilter has an Order below 0, so it runs at a reasonably high priority; we record the local system time at the start of the filter method and the response time in doOnSuccess, and subtracting the two gives a rough request-processing latency.
The client-side servers are configured with a timeout of a few seconds, and while QA was testing on the App they kept hitting occasional timeouts. Tracing by traceId, we found that the gateway's entry time lagged the client server's request time by one to two seconds. At first we suspected an unstable public network, but that explanation did not hold up, and there was another odd phenomenon: the request arrival time shown in SkyWalking was one to two seconds earlier than the one LogGlobalFilter recorded. In other words, SkyWalking's arrival time was the one that matched expectations (only a few dozen milliseconds after the client server sent the request).
Strange. How does SkyWalking manage to capture the request entry time more accurately (that is, earlier) than LogGlobalFilter? With that question in mind, I took a look at SkyWalking's code.
It turns out SkyWalking applies bytecode instrumentation (think of it as an AOP-like effect) to DispatcherHandler, the core dispatch handler of Spring WebFlux, to capture that timestamp. So we changed our strategy for recording the request entry time: we record it by applying AOP to DispatcherHandler ourselves. Here is the snippet:
@Component
public class DispatcherHandlerMethodInterceptor implements MethodInterceptor {

    @Override
    public Object invoke(MethodInvocation methodInvocation) throws Throwable {
        if ("handle".equals(methodInvocation.getMethod().getName())
                && methodInvocation.getArguments()[0] instanceof ServerWebExchange) {
            ServerWebExchange exchange = (ServerWebExchange) methodInvocation.getArguments()[0];
            // Record the request start time
            log.info("Gateway receive request, path:{}, header:{}, params:{}",
                    exchange.getRequest().getPath(), exchange.getRequest().getHeaders(),
                    exchange.getRequest().getQueryParams());
        }
        return methodInvocation.proceed();
    }
}
@Import({DispatcherHandlerMethodInterceptor.class})
@Configuration
public class ConfigurableAdvisorConfig {

    private static final String DISPATCHER_HANDLER_POINTCUT =
            "execution(public * org.springframework.web.reactive.DispatcherHandler.handle(..))";

    @Autowired
    private DispatcherHandlerMethodInterceptor dispatcherHandlerMethodInterceptor;

    @Bean
    public AspectJExpressionPointcutAdvisor buildDispatcherHandlerPointcutAdvisor() {
        AspectJExpressionPointcutAdvisor advisor = new AspectJExpressionPointcutAdvisor();
        advisor.setExpression(DISPATCHER_HANDLER_POINTCUT);
        advisor.setAdvice(dispatcherHandlerMethodInterceptor);
        return advisor;
    }
}
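One assumption behind this configuration is that an advisor-aware auto-proxy creator is active in the context; otherwise the AspectJExpressionPointcutAdvisor bean is never applied to DispatcherHandler. With Spring Boot and spring-boot-starter-aop on the classpath this is normally auto-configured, and the hypothetical config below only makes it explicit:

import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.EnableAspectJAutoProxy;

// Enables advisor-based auto-proxying so that the pointcut advisor above can
// wrap DispatcherHandler (usually redundant under Spring Boot's AOP auto-configuration).
@Configuration
@EnableAspectJAutoProxy
public class GatewayAopConfig {
}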
Pitfall 2: reactor-netty's epoll & kqueue mode
"reactor-http-epoll-32" #204 daemon prio=5 os_prio=0 tid=0x00007f020c1e6320 nid=0xdd runnable [0x00007f0200bf7000]
springboot aopjava.lang.Thread.State: RUNNABLE
at ioty.channel.epoll.Native.epollWait0(Native Method)
at ioty.channel.epoll.Native.epollWait(Native.java:114)
at ioty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:256)
at ioty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:281)
at urrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at ioty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at urrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"reactor-http-epoll-31" #203 daemon prio=5 os_prio=0 tid=0x00007f020c068df0 nid=0xdc runnable [0x00007f0200cf8000]
java.lang.Thread.State: RUNNABLE
at ioty.channel.epoll.Native.epollWait0(Native Method)
at ioty.channel.epoll.Native.epollWait(Native.java:114)
at ioty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:256)
at ioty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:281)
at urrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at ioty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at urrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"reactor-http-epoll-30" #202 daemon prio=5 os_prio=0 tid=0x00007f020c0678e0 nid=0xdb runnable [0x00007f0200df9000]
java.lang.Thread.State: RUNNABLE
at ioty.channel.epoll.Native.epollWait0(Native Method)
at ioty.channel.epoll.Native.epollWait(Native.java:114)
at ioty.channel.epoll.EpollEventLoop.epollWait(EpollEventLoop.java:256)
at ioty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:281)
at urrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
at ioty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at urrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
That suddenly reminded me of an issue JDK 8 still had not fixed: when the Java runtime determines the number of CPU cores, it does not account for the resources the container is actually allowed to use and instead reads the host machine's core count, which here happens to be 32. Could it be that the excessive number of epoll threads drives CPU consumption up as traffic grows and makes timeouts more likely? To verify this, we needed a way to change the epoll thread count to 6, matching the number of CPUs this Docker instance is allowed to use. How? A pile of online searching turned up nothing, so once again it was back to the source, and I dug through the reactor-netty code.
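As a hedged aside, and not necessarily the exact change that fits the reactor-netty version bundled with SCG 2.1.2.RELEASE: reactor-netty sizes its HTTP event loops from the reactor.netty.ioWorkerCount system property before falling back to the available-processor count, so one minimal sketch for capping the epoll threads looks like this (GatewayApplication is a hypothetical boot class):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class GatewayApplication {

    public static void main(String[] args) {
        // Must be set before reactor-netty creates its LoopResources.
        // 6 matches the Docker instance's CPU quota instead of the host's 32 cores.
        // (Assumption: the bundled reactor-netty honors reactor.netty.ioWorkerCount.)
        System.setProperty("reactor.netty.ioWorkerCount", "6");
        SpringApplication.run(GatewayApplication.class, args);
    }
}

The same property can also be passed on the JVM command line as -Dreactor.netty.ioWorkerCount=6; either way it has to be in place before the first Netty event loop is created.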
