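Error log
Recently a developer colleague reported that a Java program was failing with java.net.UnknownHostException: ads.google.com. The error message makes it clear that domain name resolution failed.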
2024-03-10 14:08:52.000 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#14-1] [ERROR] c.b.s.c.biz.util.1433.ImageUtil [b07f9cbefd29ed] - :java.net.UnknownHostException: ads.google.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
at java.net.URL.openStream(URL.java:1093)
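Problem analysis
The timeline in the error log shows that the failures were concentrated in a burst of concurrent requests within a single minute.
The key question is which angles and dimensions to approach the analysis from.
For a DNS error like this, an operations engineer's first instinct is to log in to the application server, run nslookup ads.google.com, and check whether resolution fails. That is unlikely to give an answer. The timeline above shows the failures are intermittent, so a single nslookup ads.google.com will not necessarily hit one; you would need a script that checks in a loop, which is a valid direction and strategy for root-cause analysis. From experience, though, DNS jitter is common in networks, and its main causes are: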
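- Network jitter.
- Insufficient DNS server capacity.
- The domain returns too many CNAME records, so resolution times out.
- The client timeout is too short, for example a client-side cap of 1 second.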
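The looping check mentioned above can be sketched in Java as follows. This is a minimal sketch, not a production monitor: the class name DnsProbe, the helper probeOnce, the 1-second interval, and the default of 600 rounds are all hypothetical choices. It first disables the JVM's own DNS cache so that every round actually reaches the resolver, then logs a timestamp for each result so failures can be compared with the application's error timeline.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.time.LocalDateTime;

public class DnsProbe {
    // Resolves the host once and logs the outcome; returns false on UnknownHostException
    static boolean probeOnce(String host) {
        long start = System.nanoTime();
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s OK   %s -> %d address(es) in %d ms%n",
                    LocalDateTime.now(), host, addresses.length, ms);
            return true;
        } catch (UnknownHostException e) {
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s FAIL %s after %d ms: %s%n",
                    LocalDateTime.now(), host, ms, e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Disable the JVM-level DNS cache (0 = never cache) so that every round
        // really queries the resolver; this should run before the first lookup.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        String host = args.length > 0 ? args[0] : "ads.google.com";
        int rounds = args.length > 1 ? Integer.parseInt(args[1]) : 600; // about 10 minutes at 1 per second
        int failures = 0;
        for (int i = 0; i < rounds; i++) {
            if (!probeOnce(host)) {
                failures++;
            }
            Thread.sleep(1000);
        }
        System.out.printf("%d of %d lookups failed%n", failures, rounds);
    }
}
```

After compiling, it can be run on the application server as, for example, java DnsProbe ads.google.com 3600 to watch for an hour.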
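Each of these causes has a corresponding remedy:
- Improve network stability.
- Scale out the DNS cluster.
- Reduce the number of resolution records.
- Increase the request timeout.
Not all of these are within your control, however. For example:
- Jitter on the international internet.
- The DNS cluster may be run by a third party rather than self-hosted, though you can buy the third party's paid service.
- The domain may not be yours, so you cannot optimize its resolution records, short of DNS hijacking.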
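There is also the developer's perspective: for performance reasons, most programming languages and runtimes ship with a built-in DNS cache, so we can try to analyze and solve the problem from that angle.
In the official Java source, the NameService interface defines two main methods:
- lookupAllHostAddr: looks up the IP addresses for a host name.
- getHostByAddr: looks up the host name for an IP address.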
private interface NameService {
/**
* Lookup a host mapping by name. Retrieve the IP addresses
* associated with a host
*
* @param host the specified hostname
* @return array of IP addresses for the requested host
* @throws UnknownHostException
* if no IP address for the {@code host} could be found
*/
InetAddress[] lookupAllHostAddr(String host)
throws UnknownHostException;
/**
* Lookup the host corresponding to the IP address provided
*
* @param addr byte array representing an IP address
* @return {@code String} representing the host name mapping
* @throws UnknownHostException
* if no host found for the specified IP address
*/
String getHostByAddr(byte[] addr) throws UnknownHostException;
}
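NameService is private, so application code reaches it through the public InetAddress API. A minimal sketch (the class NameServiceDemo and its helpers are hypothetical): for a host name (not an IP literal), InetAddress.getAllByName goes through the forward lookup, and calling getHostName() on an address built with InetAddress.getByAddress triggers the reverse lookup, falling back to the textual IP address if no host name is found.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

public class NameServiceDemo {
    // Forward lookup: for a host name, this goes through NameService.lookupAllHostAddr;
    // IP literals are parsed directly without asking the name service.
    static String[] forward(String host) throws UnknownHostException {
        return Arrays.stream(InetAddress.getAllByName(host))
                .map(InetAddress::getHostAddress)
                .toArray(String[]::new);
    }

    // Reverse lookup: getHostName() on an address created without a host name
    // asks NameService.getHostByAddr and falls back to the textual IP on failure.
    static String reverse(byte[] addr) throws UnknownHostException {
        return InetAddress.getByAddress(addr).getHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + Arrays.toString(forward(host)));
        System.out.println("127.0.0.1 -> " + reverse(new byte[] {127, 0, 0, 1}));
    }
}
```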
DNS caching policy
The NameServiceAddresses.get() method in InetAddress handles the DNS caching policy. For details, see openjdk-jdk11/src/java.base/share/classes/java/net/InetAddress.java at master · AdoptOpenJDK/openjdk-jdk11 · GitHub
public InetAddress[] get() throws UnknownHostException {
Addresses addresses;
// only one thread is doing lookup to name service
// for particular host at any time.
synchronized (this) {
// re-check that we are still us + re-install us if slot empty
addresses = cache.putIfAbsent(host, this);
if (addresses == null) {
// this can happen when we were replaced by CachedAddresses in
// some other thread, then CachedAddresses expired and were
// removed from cache while we were waiting for lock...
addresses = this;
}
// still us ?
if (addresses == this) {
// lookup name services
InetAddress[] inetAddresses;
UnknownHostException ex;
int cachePolicy;
try {
inetAddresses = getAddressesFromNameService(host, reqAddr);
ex = null;
cachePolicy = InetAddressCachePolicy.get();
} catch (UnknownHostException uhe) {
inetAddresses = null;
ex = uhe;
cachePolicy = InetAddressCachePolicy.getNegative();
}
// remove or replace us with cached addresses according to cachePolicy
if (cachePolicy == InetAddressCachePolicy.NEVER) {
cache.remove(host, this);
} else {
CachedAddresses cachedAddresses = new CachedAddresses(
host,
inetAddresses,
cachePolicy == InetAddressCachePolicy.FOREVER
? 0L
// cachePolicy is in [s] - we need [ns]
: System.nanoTime() + 1000_000_000L * cachePolicy
);
if (cache.replace(host, this, cachedAddresses) &&
cachePolicy != InetAddressCachePolicy.FOREVER) {
// schedule expiry
expirySet.add(cachedAddresses);
}
}
if (inetAddresses == null) {
throw ex == null ? new UnknownHostException(host) : ex;
}
return inetAddresses;
}
// else addresses != this
}
// delegate to different addresses when we are already replaced
// but outside of synchronized block to avoid any chance of dead-locking
return addresses.get();
}
This method looks up and caches the addresses of a host. It uses synchronization so that only one thread at a time performs the name-service lookup for a given host. Step by step:
- Declaration: public InetAddress[] get() throws UnknownHostException { Addresses addresses; defines the get() method, which may throw UnknownHostException, and declares a local variable addresses of type Addresses.
- Synchronized block: synchronized (this) { enters a block that uses the current object as the lock, so only one thread at a time can execute it.
- Re-checking the cache slot: addresses = cache.putIfAbsent(host, this); if (addresses == null) { addresses = this; } re-checks the cache entry for host. If the slot is empty, putIfAbsent re-installs the current object and returns null, so addresses is set to this. Per the source comment, this happens when another thread replaced this object with CachedAddresses that then expired and were removed while this thread was waiting for the lock. Otherwise putIfAbsent returns the existing entry.
- Checking whether the current object is still the cached entry: if (addresses == this) { only in that case does this thread perform the lookup.
- Lookup: getAddressesFromNameService(host, reqAddr) queries the name service. On success, the result is stored in inetAddresses, ex is set to null, and cachePolicy comes from InetAddressCachePolicy.get(). On failure, the UnknownHostException is caught and stored in ex, inetAddresses is set to null, and cachePolicy comes from InetAddressCachePolicy.getNegative().
- Cache update and expiry: if cachePolicy is InetAddressCachePolicy.NEVER, the entry is removed from the cache. Otherwise a CachedAddresses object is created whose expiry time is 0L for InetAddressCachePolicy.FOREVER, and otherwise System.nanoTime() plus cachePolicy (in seconds) multiplied by 1000_000_000L (seconds to nanoseconds). cache.replace(host, this, cachedAddresses) swaps it in, and unless the policy is FOREVER, the entry is added to expirySet so that it expires on schedule. Note that a failed lookup is cached as well: inetAddresses is null in that case, so the cached entry records the failure.
- Throwing or returning: if (inetAddresses == null) { throw ex == null ? new UnknownHostException(host) : ex; } return inetAddresses; throws the exception if the lookup failed, and otherwise returns the address array.
- Handling an entry that was already replaced: if addresses != this, another thread has already replaced the current object in the cache, so return addresses.get(); delegates to the replacement. Per the source comment, this call is made outside the synchronized block to avoid any chance of deadlock.
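To make the effect of the two policies concrete, here is a simplified, hypothetical model of the same idea (not JDK code): successful results are cached for ttl seconds and failures for negativeTtl seconds, with 0 meaning "never cache" and -1 meaning "cache forever", as in InetAddressCachePolicy. Resolver, TtlDnsCache, and the injectable clock are assumptions for illustration, and for simplicity a single lock covers all hosts instead of the JDK's per-host locking.

```java
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class TtlDnsCache {
    /** Hypothetical stand-in for the JDK's private NameService. */
    public interface Resolver {
        String[] resolve(String host) throws UnknownHostException;
    }

    public static final int NEVER = 0;    // same meaning as InetAddressCachePolicy.NEVER
    public static final int FOREVER = -1; // same meaning as InetAddressCachePolicy.FOREVER

    private static final class Entry {
        final String[] addresses; // null records a failed lookup (negative entry)
        final boolean forever;
        final long expiresAtNanos;

        Entry(String[] addresses, boolean forever, long expiresAtNanos) {
            this.addresses = addresses;
            this.forever = forever;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Resolver resolver;
    private final int ttl;            // seconds, for successful lookups
    private final int negativeTtl;    // seconds, for failed lookups
    private final LongSupplier clock; // nanosecond clock, e.g. System::nanoTime
    private final Map<String, Entry> cache = new HashMap<>();

    public TtlDnsCache(Resolver resolver, int ttl, int negativeTtl, LongSupplier clock) {
        this.resolver = resolver;
        this.ttl = ttl;
        this.negativeTtl = negativeTtl;
        this.clock = clock;
    }

    public synchronized String[] lookup(String host) throws UnknownHostException {
        long now = clock.getAsLong();
        Entry cached = cache.get(host);
        if (cached != null && (cached.forever || now - cached.expiresAtNanos < 0)) {
            if (cached.addresses == null) {
                // Served from the negative cache: the resolver is not asked again
                throw new UnknownHostException(host);
            }
            return cached.addresses;
        }
        cache.remove(host); // absent or expired

        String[] addresses;
        UnknownHostException failure = null;
        int policy;
        try {
            addresses = resolver.resolve(host);
            policy = ttl;
        } catch (UnknownHostException e) {
            addresses = null;
            failure = e;
            policy = negativeTtl;
        }
        if (policy != NEVER) {
            boolean forever = policy == FOREVER;
            long expires = forever ? 0L : now + 1_000_000_000L * policy; // seconds -> nanoseconds
            cache.put(host, new Entry(addresses, forever, expires));
        }
        if (addresses == null) {
            throw failure != null ? failure : new UnknownHostException(host);
        }
        return addresses;
    }
}
```

With negativeTtl = 10, one failure is replayed to every caller of the same host for the next 10 seconds; with negativeTtl = 0, each call goes back to the resolver.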
The caching policy is documented in: openjdk-jdk11/src/java.base/share/classes/java/net/InetAddress.java at master · AdoptOpenJDK/openjdk-jdk11 · GitHub
The corresponding caching-policy source code: openjdk-jdk11/src/java.base/share/classes/sun/net/InetAddressCachePolicy.java at master · AdoptOpenJDK/openjdk-jdk11 · GitHub
* <dl style="margin-left:2em">
* <dt><b>networkaddress.cache.ttl</b></dt>
* <dd>Indicates the caching policy for successful name lookups from
* the name service. The value is specified as an integer to indicate
* the number of seconds to cache the successful lookup. The default
* setting is to cache for an implementation specific period of time.
* <p>
* A value of -1 indicates "cache forever".
* </dd>
* <dt><b>networkaddress.cache.negative.ttl</b> (default: 10)</dt>
* <dd>Indicates the caching policy for un-successful name lookups
* from the name service. The value is specified as an integer to
* indicate the number of seconds to cache the failure for
* un-successful lookups.
* <p>
* A value of 0 indicates "never cache".
* A value of -1 indicates "cache forever".
* </dd>
* </dl>
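networkaddress.cache.ttl: the caching policy for successful lookups, which are cached for the specified number of seconds. The default is implementation-specific; a value of -1 means "cache forever".
networkaddress.cache.negative.ttl: the caching policy for failed lookups, which are cached for the specified number of seconds. The default is 10 seconds; a value of 0 means "never cache" and -1 means "cache forever".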
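So, according to the source analysis above, the culprit behind the UnknownHostException is the default 10-second negative cache set by networkaddress.cache.negative.ttl. Once a single lookup fails, for example because of momentary DNS jitter, the failure itself is cached, and every lookup of the same host during the next 10 seconds throws UnknownHostException straight from the cache without querying DNS again. This is consistent with the errors clustering in a burst of concurrent requests.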
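Before changing anything, it can help to check what a running JVM is actually configured with. The following is a small sketch (the class name ShowDnsCacheSettings is hypothetical): it prints each networkaddress.cache.* security property together with its sun.net.inetaddr.* system-property fallback as raw strings. A null means the property is not set, in which case the JDK moves on to the fallback and then to its built-in default.

```java
import java.security.Security;

public class ShowDnsCacheSettings {
    // Formats one property pair as "name=value (fallback name=value)"
    static String describe(String securityProperty, String systemPropertyFallback) {
        String fromSecurity = Security.getProperty(securityProperty);   // java.security or Security.setProperty
        String fromSystem = System.getProperty(systemPropertyFallback); // -D command-line option
        return securityProperty + "=" + fromSecurity
                + " (fallback " + systemPropertyFallback + "=" + fromSystem + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("networkaddress.cache.ttl", "sun.net.inetaddr.ttl"));
        System.out.println(describe("networkaddress.cache.negative.ttl", "sun.net.inetaddr.negative.ttl"));
    }
}
```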
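Setting the cache parameters
JVM startup options
Set the following options when starting the Java application:
-Dsun.net.inetaddr.ttl=30 -Dsun.net.inetaddr.negative.ttl=0
Note: the JDK consults the sun.net.inetaddr.ttl and sun.net.inetaddr.negative.ttl system properties only as fallbacks, when the corresponding security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl are not set. Because the default java.security file usually sets networkaddress.cache.negative.ttl=10, -Dsun.net.inetaddr.negative.ttl=0 may have no effect; check java.security, or use one of the methods below.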
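Setting it at runtime in the application
// Run this early at startup: the JDK typically reads these values once, before the first lookup
java.security.Security.setProperty("networkaddress.cache.ttl", "30");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "0");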
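Setting it in the JVM configuration
Edit the java.security file and set the following values (change the existing entries if the file already defines them):
# Java 8
$JAVA_HOME/jre/lib/security/java.security
# Java 9 and later (including Java 11)
$JAVA_HOME/conf/security/java.security
# Settings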
networkaddress.cache.ttl = 30
networkaddress.cache.negative.ttl = 0
Reference settings:
Set the JVM TTL for DNS name lookups – AWS SDK for Java 2.x
DNS cache TTL in Java. You usually don’t need to take care of… | by yongjoon | Medium
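References
The AWS website has an excellent article on this topic. Although it is written for the AWS S3 SDK, it applies to Java in general: the author analyzes Java's DNS caching policy end to end with hands-on packet captures. It is well worth reading.
[1] Analyzing Java's DNS resolution and caching mechanism from an UnknownHostException error | Amazon AWS official blog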
[2] Networking Properties