Contents
1 Prerequisites
2 Possible triggers & symptoms
3 Related
3.1 github issue
Prerequisites
kube-proxy is deployed and running in iptables mode
Optional condition: calico CNI
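Possible triggers & symptoms
Overlay Pods communicating with services outside the cluster
Traffic between the underlay and overlay networks (outbound via the overlay, return via the underlay, which results in asymmetric routing)
conntrack saturation?
Symptom: occasional requests with very high latency, or occasional connection stalls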
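To rule conntrack saturation in or out, one option is to compare the current number of tracked connections with the table limit. The sketch below assumes the nf_conntrack module is loaded so that the two procfs files exist; the helper name conntrack_usage_pct is only illustrative.

```shell
#!/bin/sh
# Print conntrack table usage as an integer percentage.
# $1 = current number of entries, $2 = table maximum.
conntrack_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# On a node, read the live values from procfs (requires nf_conntrack to be loaded).
count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
  count=$(cat "$count_file")
  max=$(cat "$max_file")
  echo "conntrack usage: ${count}/${max} ($(conntrack_usage_pct "$count" "$max")%)"
fi
```

A value that stays close to 100% suggests the table is a likely contributor; a low value makes saturation an unlikely cause.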
In the filter-table KUBE-FORWARD chain maintained by kube-proxy, there is a rule: -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
[root@gzu-prd ~]# iptables -L KUBE-FORWARD --line -nv
Chain KUBE-FORWARD (1 references)
num pkts bytes target prot opt in out source destination
1 0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 ctstate INVALID
2 4 240 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
3 11412 33M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
4 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
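This rule drops any traffic that connection tracking marks as INVALID, and the behaviour currently cannot be disabled through configuration (short of changing the code and recompiling)
The conntrack state of TCP connections can be inspected with conntrack -L or cat /proc/net/nf_conntrack (look for flags such as [UNREPLIED])
kube-proxy crudely flushes and rewrites its iptables rules whenever an endpoint changes, so you cannot simply insert an ACCEPT rule into KUBE-FORWARD to avoid the problem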
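As a rough way to read the conntrack table mentioned above, the entries can be summarised by TCP state. This is a minimal sketch that assumes the conntrack CLI (conntrack-tools) is installed and that the input has the layout printed by conntrack -L -p tcp, where the fourth field is the TCP state; summarize_tcp_states is a made-up helper name.

```shell
#!/bin/sh
# Summarise TCP conntrack entries by state.
# Reads `conntrack -L -p tcp` output on stdin; field 4 is the TCP state
# (ESTABLISHED, SYN_SENT, TIME_WAIT, ...). Entries flagged [UNREPLIED]
# are counted separately.
summarize_tcp_states() {
  awk '$1 == "tcp" { states[$4]++ }
       /\[UNREPLIED\]/ { unreplied++ }
       END {
         for (s in states) printf "%s %d\n", s, states[s]
         printf "UNREPLIED %d\n", unreplied + 0
       }' | sort
}

# Live usage on a node (needs root); conntrack prints its summary line to stderr.
if command -v conntrack >/dev/null 2>&1; then
  conntrack -L -p tcp 2>/dev/null | summarize_tcp_states
fi
```

A large or growing UNREPLIED count for flows that should be established is one hint that replies are taking a different path than the requests.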
Likewise, in the filter-table chains maintained by calico, almost every cali-fw-cali**** chain also contains the rule -m conntrack --ctstate INVALID -j DROP
[root@gzu-prd ~]# iptables-save -t filter|grep INVALID
-A cali-fw-cali02fca994756 -m comment --comment "cali:Zgj-5PhkyRyRGc5v" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali091fd1acd82 -m comment --comment "cali:vySNraYuHVkcwzZC" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali0945b5ec7e6 -m comment --comment "cali:YpO6T4K2fN2biMqp" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-cali09725d6075c -m comment --comment "cali:3Q23jKsPGkXWWHjs" -m conntrack --ctstate INVALID -j DROP
However, this behaviour can be turned off with the FELIX_DISABLECONNTRACKINVALIDCHECK environment variable
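One way to set that variable is kubectl set env. The sketch below only prints the command unless APPLY=1 is set, because changing the pod template restarts calico-node on every node. It assumes calico-node runs as a DaemonSet named calico-node in kube-system (typical for manifest installs); the helper felix_env_cmd and the variables CALICO_NAMESPACE, CALICO_DAEMONSET and APPLY are illustrative, not part of calico.

```shell
#!/bin/sh
# Build the kubectl command that sets one Felix environment variable on calico-node.
# $1 = variable name, $2 = value.
# Assumption: DaemonSet "calico-node" in namespace "kube-system"; override if different.
felix_env_cmd() {
  ns="${CALICO_NAMESPACE:-kube-system}"
  ds="${CALICO_DAEMONSET:-calico-node}"
  echo "kubectl -n ${ns} set env daemonset/${ds} $1=$2"
}

cmd=$(felix_env_cmd FELIX_DISABLECONNTRACKINVALIDCHECK true)
echo "$cmd"

# Review the printed command first; it is executed only when APPLY=1.
if [ "${APPLY:-0}" = "1" ]; then
  $cmd
fi
```

Operator-managed installs may overwrite manual changes to the DaemonSet, so check how calico was installed before relying on this.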
To find out whether you are actually affected, the iptables hit counters are one way to observe it
iptables -w 3 -L --line -nv|grep DROP|sort -rn -k 2|head -n 10
[root@gzu-prd ~]# iptables -w 3 -L --line -nv|grep DROP|sort -rn -k 2|head -n 10
2 19020 773K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:kRQn4VHUEHOpigCm */ ctstate INVALID
2 15617 937K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:DTf_pGZFWLZaqlg8 */ ctstate INVALID
2 7068 283K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:HGKygSKf4SfkbRyf */ ctstate INVALID
2 3845 154K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:t5nJs-UfMTVjRtBI */ ctstate INVALID
2 2312 139K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:h3VJGUlERuK34Tcz */ ctstate INVALID
2 2115 110K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:dTQ4mHZc378Z1e33 */ ctstate INVALID
2 1828 110K DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:kp1Tzme9aWaPgdKP */ ctstate INVALID
2 1556 62240 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:VaeGtNK_681jKlg9 */ ctstate INVALID
2 1330 69160 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:meQqPUz96UN62T8l */ ctstate INVALID
2 1025 53300 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:mIIn1Wh34t2SZwbR */ ctstate INVALID
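The counters above are cumulative, so a single listing does not show whether drops are still happening. A minimal sketch for measuring the change over an interval follows; it uses iptables -L -nvx, where -x prints exact counters instead of K/M suffixes, and without --line the packet count is the first field. The function names are illustrative.

```shell
#!/bin/sh
# Sum the packet counters of every DROP rule that matches ctstate INVALID.
# Reads `iptables -L -nvx` output on stdin (field 1 = pkts, field 3 = target).
sum_invalid_drops() {
  awk '$3 == "DROP" && /ctstate INVALID/ { total += $1 }
       END { printf "%.0f\n", total }'
}

# Print how many INVALID packets were dropped during $1 seconds.
# Must run as root on the node.
sample_invalid_drops() {
  before=$(iptables -w 3 -L -nvx | sum_invalid_drops)
  sleep "$1"
  after=$(iptables -w 3 -L -nvx | sum_invalid_drops)
  echo $(( after - before ))
}
```

For example, sample_invalid_drops 30 run while the latency spikes occur gives a rough drop count for that window; a result of 0 suggests these rules are not the cause at that moment.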
If you want to avoid this without changing kube-proxy or calico-node parameters, a blunt option is to deploy a DaemonSet in the cluster
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: iptables-conntrack-hacker
  namespace: kube-system
  labels:
    app: iptables-conntrack
spec:
  selector:
    matchLabels:
      app: iptables-conntrack-hacker
  template:
    metadata:
      name: iptables-conntrack-hacker
      labels:
        app: iptables-conntrack-hacker
    spec:
      volumes:
        - name: lib-modules
          hostPath:
            path: /lib/modules
            type: ''
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: ''
      containers:
        - name: iptables-conntrack-hacker
          image: 'your-registry-address/kube-system/kube-proxy:v1.18.20'
          command:
            - /bin/sh
            - '-ce'
            - |
              export TZ=Asia/Shanghai;
              echo "$(date) Container started...";
              echo "Current iptables rule state:"
              iptables -w 10 -L --line -nv|grep INVALID || true
              # Re-check every 60 seconds and re-insert the rule if it is missing
              while (true)
              do
                iptables -C FORWARD -w 15 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet or calico" --ctstate INVALID -j ACCEPT || \
                (iptables -I FORWARD -w 10 -m conntrack -m comment --comment "To avoid invalid tcp traffic dropped by kubelet or calico" --ctstate INVALID -j ACCEPT && echo "$(date) Adding iptables rules ...");
                sleep 60
              done
          resources:
            limits:
              cpu: 250m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 64Mi
          volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
            - name: xtables-lock
              mountPath: /run/xtables.lock
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            runAsUser: 0
      restartPolicy: Always
      terminationGracePeriodSeconds: 5
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      securityContext: {}
      schedulerName: default-scheduler
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - operator: Exists
          effect: NoExecute
        - operator: Exists
          effect: NoSchedule
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 50%
  revisionHistoryLimit: 5
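This DaemonSet manipulates the host's iptables to bluntly insert an INVALID ACCEPT rule at the top of the FORWARD chain
The manifest above already runs in an endless loop, checking every 60 seconds whether the ACCEPT rule still exists and inserting it if it does not; if that window is too long, shorten the interval to 10 - 30 seconds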
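Note that this DaemonSet comes with one more constraint: if the overlay CNI is calico, confirm that calico-node manages its iptables chains in append mode
Set the FELIX_CHAININSERTMODE environment variable to Append; otherwise the jump to the cali-FORWARD chain is inserted at the very top of the FORWARD chain, ahead of the INVALID ACCEPT rule, which then has no effect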
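Whether the ACCEPT rule really sits ahead of the calico and kube-proxy jumps can be checked from the rule order in the FORWARD chain. The sketch below reads iptables -S FORWARD output and is only a rough check under that assumption; accept_rule_is_first is a made-up helper name.

```shell
#!/bin/sh
# Reads `iptables -S FORWARD` output on stdin.
# Exit status 0 only if an "INVALID -> ACCEPT" rule exists and appears before
# the first jump to cali-FORWARD or KUBE-FORWARD; non-zero otherwise.
accept_rule_is_first() {
  awk '
    /--ctstate INVALID/ && /-j ACCEPT/ && !accept { accept = NR }
    /-j (cali-FORWARD|KUBE-FORWARD)/ && !jump     { jump = NR }
    END { exit !(accept && (!jump || accept < jump)) }
  '
}
```

On a node this could be used as iptables -w 3 -S FORWARD | accept_rule_is_first && echo "order OK" || echo "ACCEPT rule missing or shadowed".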
Related
kube-proxy(v1.18.20) code: https://github.com/kubernetes/kubernetes/blob/1f3e19b7beb1cc0110255668c4238ed63dadb7ad/pkg/proxy/iptables/proxier.go#L1503-L1511
calico v3.16 config(FELIX_DISABLECONNTRACKINVALIDCHECK): https://docs.tigera.io/archive/v3.16/reference/felix/configuration
github issue
https://github.com/kubernetes/kubernetes/issues/74839
https://github.com/kubernetes/kubernetes/issues/94861
https://technology.lastminute.com/chasing-k8s-connection-reset-issue/
calico: https://github.com/projectcalico/calico/issues/2609