[Redis Connection Timeout] Troubleshooting a RedisConnectionFailureException in Production

Date: 2023-09-27 14:25:05

Project architecture:

  Some of the components:

  SpringCloudAlibaba(Nacos+Gateway+OpenFeign)+SpringBoot2.x+Redis

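Background:

  As the user base grew, the user service occasionally hit Redis connection timeouts during peak hours.

  For example, reading an SMS verification code from Redis, storing the token in Redis after a successful login, and every other code path that uses Redis could throw RedisConnectionFailureException.

  Exception log: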

2021-03-02 17:24:42.595 ERROR [d03f845825644cee8753539f24d840ad] [http-nio-7122-exec-32] c.l.c.b.e.GlobalExceptionHandler -java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
org.springframework.data.redis.RedisConnectionFailureException: java.net.SocketTimeoutException: Read timed out; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
        at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:65)
        at org.springframework.data.redis.connection.jedis.JedisExceptionConverter.convert(JedisExceptionConverter.java:42)
        at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
        at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
        at org.springframework.data.redis.connection.jedis.JedisConnection.convertJedisAccessException(JedisConnection.java:135)
        at org.springframework.data.redis.connection.jedis.JedisStringCommands.convertJedisAccessException(JedisStringCommands.java:751)
        at org.springframework.data.redis.connection.jedis.JedisStringCommands.get(JedisStringCommands.java:67)
        at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:260)
        at org.springframework.data.redis.connection.DefaultStringRedisConnection.get(DefaultStringRedisConnection.java:398)
        at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57)
        at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228)
        at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188)
        at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
        at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53)
        at com.xxxx.xxx.xxx.utils.RedisUtil.get(RedisUtil.java:242)

  Redis-related Maven dependencies:

  <!-- redis -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-redis</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>io.lettuce</groupId>
                    <artifactId>lettuce-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
        </dependency>

 

  Redis configuration (single node, no distributed deployment):

spring:
  redis:
    pool:
      maxActive: 300
      maxIdle: 100
      maxWait: 1000
    host: xxxxxxxxx
    port: 6379
    password:
    timeout: 2000
    database: 5

 

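Troubleshooting:

  Possible causes considered:

  Cause 1. Does the code contain queries like keys *? Redis executes commands on a single thread, so with a large data set one long-running command blocks everything queued behind it and client requests time out. Queries like keys * are very expensive for Redis.

  Cause 2. The timeout configured for Redis is too short: the previous request has not finished, the next one cannot be executed, and it eventually times out and fails.

  Cause 3. The Redis connection pool is configured with too few connections. Prometheus monitoring showed a peak request volume of 180 for the user service, so the pool might be too small to hand out a connection, causing the failures.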

  

  Regarding cause 1:

    A search of the project code found no keys *-style queries, so this possibility was ruled out.
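
    The code search described above can be approximated with a small scanner. This is only a sketch: the class name KeysUsageScanner and the regular expression are assumptions made for illustration, and the pattern matches any method call named keys(...), so hits (for example Hashtable.keys()) still need a manual look.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class KeysUsageScanner {

    // Assumed pattern: any method call named keys(...), e.g. redisTemplate.keys("*").
    private static final Pattern KEYS_CALL = Pattern.compile("\\.keys\\s*\\(");

    /** Returns the 1-based numbers of the lines that contain a keys(...) call. */
    static List<Integer> matchingLineNumbers(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (KEYS_CALL.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    /** Walks root and returns "file:line" for every match found in a .java file. */
    static List<String> scan(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> report = new ArrayList<>();
        for (Path file : javaFiles) {
            for (int lineNo : matchingLineNumbers(Files.readAllLines(file))) {
                report.add(file + ":" + lineNo);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Scans the directory given as the first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

    An empty report is consistent with the conclusion above, although it cannot prove that no expensive command is issued through other paths such as Lua scripts.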

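  Regarding cause 2:

    When the RedisConnectionFailureException occurred, the peak number of Redis connections on the server was confirmed to be 15, and the timeout in the configuration file was 2000 ms. Since the check for cause 1 had already shown that there were no particularly slow queries, this possibility was ruled out as well.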

  

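  With causes 1 and 2 ruled out, that left cause 3: the number of connections.

  The configuration showed a maximum of 300 connections, well above the peak of 180, so the configured values looked fine.

  The next step was to test this configuration in the development environment. The project uses the Jedis connection pool rather than Lettuce. (Note: in SpringBoot 2.x, spring-boot-starter-data-redis uses Lettuce as the default client. To use Jedis instead, exclude lettuce-core and add the jedis dependency, as in the Maven dependencies above.)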

  Tracing further into the source code, the class that holds the pool size settings is:

package org.apache.commons.pool2.impl;

public class GenericObjectPoolConfig<T> extends BaseObjectPoolConfig<T> {
    public static final int DEFAULT_MAX_TOTAL = 8;
    public static final int DEFAULT_MAX_IDLE = 8;
    public static final int DEFAULT_MIN_IDLE = 0;
    private int maxTotal = 8;
    private int maxIdle = 8;
    private int minIdle = 0;
...

}

  This configuration class is loaded when the connection pool is initialized at project startup:

    

package org.springframework.data.redis.connection.jedis;

import java.time.Duration;
import java.util.Optional;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocketFactory;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.lang.Nullable;

/**
 * Default implementation of {@literal JedisClientConfiguration}.
 *
 * @author Mark Paluch
 * @author Christoph Strobl
 * @since 2.0
 */
class DefaultJedisClientConfiguration implements JedisClientConfiguration {

    private final boolean useSsl;
    private final Optional<SSLSocketFactory> sslSocketFactory;
    private final Optional<SSLParameters> sslParameters;
    private final Optional<HostnameVerifier> hostnameVerifier;
    private final boolean usePooling;
    private final Optional<GenericObjectPoolConfig> poolConfig;
    private final Optional<String> clientName;
    private final Duration readTimeout;
    private final Duration connectTimeout;

    DefaultJedisClientConfiguration(boolean useSsl, @Nullable SSLSocketFactory sslSocketFactory,
            @Nullable SSLParameters sslParameters, @Nullable HostnameVerifier hostnameVerifier, boolean usePooling,
            @Nullable GenericObjectPoolConfig poolConfig, @Nullable String clientName, Duration readTimeout,
            Duration connectTimeout) {

        this.useSsl = useSsl;
        this.sslSocketFactory = Optional.ofNullable(sslSocketFactory);
        this.sslParameters = Optional.ofNullable(sslParameters);
        this.hostnameVerifier = Optional.ofNullable(hostnameVerifier);
        this.usePooling = usePooling; 
        this.poolConfig = Optional.ofNullable(poolConfig);
        this.clientName = Optional.ofNullable(clientName);
        this.readTimeout = readTimeout;
        this.connectTimeout = connectTimeout;
    }

    // ... (remaining methods omitted)
}

  Debugging showed that after loading, the pool still used the default connection counts:

    public static final int DEFAULT_MAX_TOTAL = 8;
    public static final int DEFAULT_MAX_IDLE = 8;
    public static final int DEFAULT_MIN_IDLE = 0;
    private int maxTotal = 8;
    private int maxIdle = 8;
    private int minIdle = 0;
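
  To get a feel for what a cap of 8 connections means at the observed peak, here is a deliberately crude queueing model. It is not Jedis code: the class PoolQueueModel is made up for illustration, the 20 ms hold time is a hypothetical figure, and it assumes that the 180 peak requests arrive at the same moment and each holds a connection for the same duration.

```java
final class PoolQueueModel {

    /**
     * Wait in milliseconds before the last of `concurrent` simultaneous requests
     * gets a connection, if every request holds its connection for `holdMs`.
     * Requests are served in waves of `poolSize`; wave k starts at k * holdMs.
     */
    static long lastRequestWaitMs(int poolSize, int concurrent, long holdMs) {
        if (poolSize <= 0 || concurrent <= 0 || holdMs < 0) {
            throw new IllegalArgumentException(
                    "poolSize and concurrent must be positive, holdMs non-negative");
        }
        // Number of full waves that must finish before the last request is served.
        long wavesAhead = (concurrent - 1) / poolSize;
        return wavesAhead * holdMs;
    }

    public static void main(String[] args) {
        int peak = 180;   // peak taken from the Prometheus observation in the text
        long holdMs = 20; // hypothetical time a request keeps a connection
        System.out.println("maxTotal=8   -> last request waits "
                + lastRequestWaitMs(8, peak, holdMs) + " ms");
        System.out.println("maxTotal=300 -> last request waits "
                + lastRequestWaitMs(300, peak, holdMs) + " ms");
    }
}
```

  Under these assumptions the last request queues behind 22 full waves with 8 connections and behind none with 300, which is why the effective pool size matters far more than the configured one.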

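This was likely the problem: the maximum connection count from the configuration file never took effect. SpringBoot 2.x reads the Jedis pool settings from spring.redis.jedis.pool.*, so the following block is no longer recognized:
 redis:
      pool:
        maxActive: 300
        maxIdle: 100
        maxWait: 1000
It needs to be changed to:
  redis:
      jedis:
        pool:
          maxActive: 300
          maxIdle: 100
          max-wait: 1000ms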

  After the change and a restart, the settings took effect and the pool values matched the configuration.
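
  Because misplaced keys are ignored without any warning, a small check can help catch this kind of mistake early. The sketch below is an illustration only: RedisPoolKeyCheck and its methods are hypothetical helpers, not part of Spring, and the nested maps stand in for what a YAML parser would produce from the two configuration variants above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class RedisPoolKeyCheck {

    /** Flattens nested maps into dotted, kebab-case property names. */
    static Map<String, Object> flatten(Map<String, ?> nested) {
        Map<String, Object> flat = new TreeMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : node.entrySet()) {
            String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flattenInto(key, (Map<?, ?>) e.getValue(), out);
            } else {
                out.put(toKebabCase(key), e.getValue());
            }
        }
    }

    /** maxActive -> max-active, so both spellings compare equal. */
    static String toKebabCase(String key) {
        return key.replaceAll("([a-z0-9])([A-Z])", "$1-$2").toLowerCase(Locale.ROOT);
    }

    /** Keys placed directly under spring.redis.pool instead of spring.redis.jedis.pool. */
    static List<String> misplacedPoolKeys(Map<String, ?> nested) {
        List<String> misplaced = new ArrayList<>();
        for (String key : flatten(nested).keySet()) {
            if (key.startsWith("spring.redis.pool.")) {
                misplaced.add(key);
            }
        }
        return misplaced;
    }

    /** Builds spring -> redis -> [jedis ->] pool -> settings, mirroring the two YAML variants. */
    static Map<String, Object> example(boolean underJedis) {
        Map<String, Object> pool = new LinkedHashMap<>();
        pool.put("maxActive", 300);
        pool.put("maxIdle", 100);
        pool.put("maxWait", 1000);
        Map<String, Object> redis = new LinkedHashMap<>();
        if (underJedis) {
            Map<String, Object> jedis = new LinkedHashMap<>();
            jedis.put("pool", pool);
            redis.put("jedis", jedis);
        } else {
            redis.put("pool", pool);
        }
        Map<String, Object> spring = new LinkedHashMap<>();
        spring.put("redis", redis);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("spring", spring);
        return root;
    }

    public static void main(String[] args) {
        System.out.println("old layout: " + misplacedPoolKeys(example(false)));
        System.out.println("new layout: " + misplacedPoolKeys(example(true)));
    }
}
```

  A check like this could run in a unit test against the parsed application configuration. Logging the effective maxTotal of the pool at startup would be another way to confirm that the values were actually applied.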