[25] Why EKS add-ons can manage Kubernetes plugins

In Kubernetes, we frequently use third-party plugins to integrate with AWS services, and EKS provides corresponding versions of these plugins, such as the VPC CNI plugin, CoreDNS, kube-proxy, and the EBS CSI driver. The EKS add-on feature provides managed add-ons: security updates and vulnerability fixes are handled and rolled out by the EKS add-on feature itself.

This article therefore explores why EKS can proactively update Kubernetes plugins that are already deployed, yet the environment variables we configure ourselves are never modified.

Enabling an EKS add-on

The following uses updating the existing VPC CNI plugin via an EKS add-on as an example:

  1. Check the IAM role ARN of the IRSA (IAM roles for service accounts) [2] associated with the aws-node Service Account.
$ kubectl -n kube-system get sa aws-node -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"name":"aws-node","namespace":"kube-system"}}
  creationTimestamp: "2022-09-14T09:46:16Z"
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: aws-node
  namespace: kube-system
  resourceVersion: "1657"
  uid: 10f55f1c-cff2-47cd-96e0-3ca6d37e69ce
secrets:
- name: aws-node-token-9z9wq
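
To pull out just the role ARN, the annotation can also be read directly with jsonpath (a small sketch using standard kubectl; dots in the annotation key are escaped):

# Print only the IRSA role ARN annotation on the Service Account
$ kubectl -n kube-system get sa aws-node \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
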
  2. Confirm that the VPC CNI plugin currently in use is version v1.10.1.
$ kubectl -n kube-system get ds aws-node -o yaml | grep "image: "
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni-init:v1.10.1-eksbuild.1
  3. Use the eksctl utils describe-addon-versions command to list the add-on versions supported by the current EKS cluster. The excerpt below shows the latest vpc-cni version, v1.11.4-eksbuild.1.
$ eksctl utils describe-addon-versions --cluster=ironman-2022 | tail -n 1 | jq .
...
{
  "AddonName": "vpc-cni",
  "AddonVersions": [
    {
      "AddonVersion": "v1.11.4-eksbuild.1",
      "Architecture": [
        "amd64",
        "arm64"
      ],
      "Compatibilities": [
        {
          "ClusterVersion": "1.22",
          "DefaultVersion": false,
          "PlatformVersions": [
            "*"
          ]
        }
      ]
    },
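
The same information is also available from the AWS CLI, for example (assuming the cluster runs Kubernetes 1.22 as above):

# List vpc-cni add-on versions compatible with Kubernetes 1.22
$ aws eks describe-addon-versions --addon-name vpc-cni --kubernetes-version 1.22 \
    --query 'addons[].addonVersions[].addonVersion'
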
  4. Use the eksctl create addon command to update the VPC CNI plugin to v1.11.4-eksbuild.1 and bind the corresponding Service Account IAM role ARN.
$ eksctl create addon --name vpc-cni --version v1.11.4-eksbuild.1 --service-account-role-arn="arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL" --cluster=ironman-2022
2022-10-10 11:22:30 [ℹ] Kubernetes version "1.22" in use by cluster "ironman-2022"
2022-10-10 11:22:30 [ℹ] when creating an addon to replace an existing application, e.g. CoreDNS, kube-proxy & VPC-CNI the --force flag will ensure the currently deployed configuration is replaced
2022-10-10 11:22:30 [ℹ] using provided ServiceAccountRoleARN "arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL"
2022-10-10 11:22:30 [ℹ] creating addon
Error: addon status transitioned to "CREATE_FAILED"

From the error message above, we can see that the add-on creation failed with status CREATE_FAILED.

  5. Retry the update with the --force flag.

$ eksctl create addon --name vpc-cni --version v1.11.4-eksbuild.1 --service-account-role-arn="arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL" --cluster=ironman-2022 --force
2022-10-10 11:24:02 [ℹ] Kubernetes version "1.22" in use by cluster "ironman-2022"
2022-10-10 11:24:02 [ℹ] using provided ServiceAccountRoleARN "arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL"
2022-10-10 11:24:02 [ℹ] creating addon
2022-10-10 11:27:32 [ℹ] addon "vpc-cni" active
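
For reference, the same forced update can be issued with the AWS CLI, where eksctl's --force corresponds to --resolve-conflicts OVERWRITE (a sketch reusing the cluster and role from above):

$ aws eks create-addon --cluster-name ironman-2022 --addon-name vpc-cni \
    --addon-version v1.11.4-eksbuild.1 \
    --service-account-role-arn arn:aws:iam::111111111111:role/eksctl-ironman-2022-addon-iamserviceaccount-kube-Role1-EMRZZYBVKCRL \
    --resolve-conflicts OVERWRITE
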
  6. Confirm the VPC CNI plugin image has been updated to v1.11.4-eksbuild.1.
$ kubectl -n kube-system get ds aws-node -o yaml | grep "image: "
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.11.4-eksbuild.1
image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni-init:v1.11.4-eksbuild.1

Analysis

Following the usual routine, the first place to look after any update is the audit log. The following CloudWatch Logs Insights query inspects updates to the aws-node DaemonSet:

filter @logStream like /^kube-apiserver-audit/
| fields @timestamp, @message
| sort @timestamp asc
| filter objectRef.name == 'aws-node' AND objectRef.resource == 'daemonsets' AND verb == 'patch'
| limit 10000
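
To run this query from the command line (a minimal sketch; it assumes audit logging is enabled on the cluster and that the control-plane log group follows the default /aws/eks/<cluster>/cluster naming):

# Start the Logs Insights query over the last hour (GNU date syntax)
$ aws logs start-query \
    --log-group-name /aws/eks/ironman-2022/cluster \
    --start-time $(date -d '-1 hour' +%s) \
    --end-time $(date +%s) \
    --query-string "filter @logStream like /^kube-apiserver-audit/ | filter objectRef.name == 'aws-node' and objectRef.resource == 'daemonsets' and verb == 'patch' | fields @timestamp, @message | sort @timestamp asc"
# Fetch the results with the queryId returned above
$ aws logs get-query-results --query-id <queryId>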

The first record is the failed first update issued through eksctl; it failed with a FieldManagerConflict on the .spec.template.spec.containers and .spec.template.spec.initContainers fields:

    "@message": {
"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
"level": "RequestResponse",
"auditID": "b7e38710-2b5c-4317-a899-7a8b0f603b80",
"stage": "ResponseComplete",
"requestURI": "/apis/apps/v1/namespaces/kube-system/daemonsets/aws-node?dryRun=All&fieldManager=eks&force=false&timeout=10s",
"verb": "patch",
"user": {
"username": "eks:addon-manager",
"uid": "aws-iam-authenticator:348370419699:AROAVCHD6X7Z4UOOTWH7N",
"groups": [
"system:authenticated"
],
...
"userAgent": "Go-http-client/1.1",
"objectRef": {
"resource": "daemonsets",
"namespace": "kube-system",
"name": "aws-node",
"apiGroup": "apps",
"apiVersion": "v1"
},
"responseStatus": {
"metadata": {},
"status": "Failure",
"reason": "Conflict",
"code": 409
},
...
...
...
"responseObject": {
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "Apply failed with 2 conflicts: conflicts with \"kubectl-client-side-apply\" using apps/v1:\n- .spec.template.spec.containers[name=\"aws-node\"].image\n- .spec.template.spec.initContainers[name=\"aws-vpc-cni-init\"].image",
"reason": "Conflict",
"details": {
"causes": [
{
"reason": "FieldManagerConflict",
"message": "conflict with \"kubectl-client-side-apply\" using apps/v1",
"field": ".spec.template.spec.containers[name=\"aws-node\"].image"
},
{
"reason": "FieldManagerConflict",
"message": "conflict with \"kubectl-client-side-apply\" using apps/v1",
"field": ".spec.template.spec.initContainers[name=\"aws-vpc-cni-init\"].image"
}
]
},
"code": 409
},
"requestReceivedTimestamp": "2022-10-10T11:22:31.607660Z",
"stageTimestamp": "2022-10-10T11:22:31.613812Z",

The second record is the successful update using --force, in which we can observe:

  • requestURI: force=true was used.
  • responseObject: a managedFields field now appears, with manager set to eks.
    "@message": {
"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
"level": "RequestResponse",
"auditID": "40e4c444-65f0-49a8-8efa-f019a0e38508",
"stage": "ResponseComplete",
"requestURI": "/apis/apps/v1/namespaces/kube-system/daemonsets/aws-node?fieldManager=eks&force=true&timeout=10s",
"verb": "patch",
"user": {
"username": "eks:addon-manager",
"uid": "aws-iam-authenticator:348370419699:AROAVCHD6X7Z4UOOTWH7N",
"groups": [
"system:authenticated"
],
...
...
...
"userAgent": "Go-http-client/1.1",
"objectRef": {
"resource": "daemonsets",
"namespace": "kube-system",
"name": "aws-node",
"apiGroup": "apps",
"apiVersion": "v1"
},
"responseStatus": {
"metadata": {},
"code": 200
},
...
...
"responseObject": {
"kind": "DaemonSet",
"apiVersion": "apps/v1",
"metadata": {
"name": "aws-node",
"namespace": "kube-system",
"uid": "19ae2e66-9bf4-4d21-9161-c0f564a62b34",
"resourceVersion": "7590998",
"generation": 8,
"creationTimestamp": "2022-09-14T09:46:16Z",
"labels": {
"k8s-app": "aws-node"
},
...
...
...
"managedFields": [
{
"manager": "eks",
"operation": "Apply",
"apiVersion": "apps/v1",
"time": "2022-10-10T11:24:03Z",
"fieldsType": "FieldsV1",
"fieldsV1": {
"f:metadata": {
"f:labels": {
"f:k8s-app": {}
}
},
"f:spec": {
"f:selector": {},
"f:template": {
"f:metadata": {
"f:labels": {
"f:app.kubernetes.io/name": {},
"f:k8s-app": {}
}
},
"f:spec": {
"f:affinity": {
"f:nodeAffinity": {
"f:requiredDuringSchedulingIgnoredDuringExecution": {}
}
},
"f:containers": {
"k:{\"name\":\"aws-node\"}": {
".": {},
"f:env": {
"k:{\"name\":\"AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER\"}": {
".": {},
"f:name": {},
"f:value": {}
},
...
...

We can also observe field management directly with kubectl's --show-managed-fields flag.

$ kubectl -n kube-system get ds aws-node --show-managed-fields -o yaml | grep -A5 "managedFields"
  managedFields:
  - apiVersion: apps/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
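
To list at a glance which managers currently own fields on the DaemonSet, a small jsonpath one-liner works too (standard kubectl, nothing EKS-specific):

# Print each field manager and the operation it used
$ kubectl -n kube-system get ds aws-node \
    -o jsonpath='{range .metadata.managedFields[*]}{.manager}{"\t"}{.operation}{"\n"}{end}'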

Server-Side Apply

According to both the EKS add-on documentation [1] and the launch blog [3], the EKS add-on feature uses Server-Side Apply [4], a feature introduced in Kubernetes 1.18. Server-Side Apply moves the kubectl apply logic to the Kubernetes API server and provides a merge algorithm for resolving conflicts when configuration fields collide.
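
In kubectl terms, a server-side apply looks like the following (a generic sketch, independent of EKS):

# Apply server-side under an explicit field manager; fields owned by another
# manager are rejected with an HTTP 409 Conflict
$ kubectl apply --server-side --field-manager=my-manager -f manifest.yaml
# --force-conflicts takes ownership of the conflicting fields, which is what
# eksctl's --force triggered above (force=true in the audited requestURI)
$ kubectl apply --server-side --field-manager=my-manager --force-conflicts -f manifest.yaml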

For tracking purposes, this field management state is stored in the managedFields [5] field of the object's metadata.

Coming back to how EKS uses field management [6], there are two types:

Fully managed: if every key of a field is listed with f: and none with k:, the field is fully managed, and any modification to it causes a conflict.

Taking aws-node as an example: the container named aws-node is managed by EKS, and its args, image, and imagePullPolicy sub-fields are all EKS-managed. Modifying any value in these fields causes a conflict. This is why the first attempt to enable the EKS add-on above failed, with conflicts on .spec.template.spec.containers and .spec.template.spec.initContainers (a conflict demo follows the listing below).

  f:containers:
    k:{"name":"aws-node"}:
      .: {}
      f:args: {}
      f:image: {}
      f:imagePullPolicy: {}
  ...
  manager: eks
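
To see such a conflict first-hand, one can attempt a server-side apply that touches the fully managed image field under a different field manager (a sketch; the manager name my-edit is arbitrary, and the request is expected to be rejected with a FieldManagerConflict rather than applied):

$ cat <<'EOF' | kubectl apply --server-side --field-manager=my-edit -f -
# Partial manifest: server-side apply merges it into the live object,
# but the image field is owned by the "eks" manager, so this conflicts
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: aws-node
        image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.10.1-eksbuild.1
EOF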

Partially managed: if a field management key has a value specified, modifying that key causes a conflict.

Taking the aws-node environment variables as an example: eks manages the environment variables keyed ADDITIONAL_ENI_TAGS, AWS_VPC_CNI_NODE_PORT_SUPPORT, and so on. Renaming one of these keys therefore causes a conflict, while environment variables we define ourselves, whose keys fall outside this managed set, are preserved across EKS add-on updates (see the sketch after the listing below).

f:containers:
  k:{"name":"aws-node"}:
    .: {}
    f:env:
      .: {}
      k:{"name":"ADDITIONAL_ENI_TAGS"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_CNI_NODE_PORT_SUPPORT"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_ENI_MTU"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG"}:
        .: {}
        f:name: {}
        f:value: {}
      k:{"name":"AWS_VPC_K8S_CNI_EXTERNALSNAT"}:
        .: {}
        f:name: {}
        f:value: {}
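
Because the env list is merged by its name key under Server-Side Apply, an environment variable whose key is not in the eks-managed set can be applied by another manager without conflict (a sketch; MY_CUSTOM_FLAG is a hypothetical variable name):

$ cat <<'EOF' | kubectl apply --server-side --field-manager=my-edit -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: aws-node
        env:
        # MY_CUSTOM_FLAG is hypothetical: its key is not owned by the "eks"
        # manager, so it merges cleanly and survives later add-on updates
        - name: MY_CUSTOM_FLAG
          value: "true"
EOF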

Summary

The EKS add-on feature is, at its core, an application of Kubernetes Server-Side Apply. Having AWS maintain security fixes and version upgrades is convenient, but whenever customization is required you still need to check the corresponding field management entries. CoreDNS, for instance, is deployed as a DaemonSet in some scenarios, and such a setup may not be manageable through an add-on.

References

  1. Amazon EKS add-on configuration - https://docs.aws.amazon.com/eks/latest/userguide/add-ons-configuration.html
  2. IAM roles for service accounts - https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  3. Introducing Amazon EKS add-ons: lifecycle management for Kubernetes operational software - https://aws.amazon.com/blogs/containers/introducing-amazon-eks-add-ons/
  4. Server-Side Apply - https://kubernetes.io/docs/reference/using-api/server-side-apply
  5. ObjectMeta v1 meta - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#objectmeta-v1-meta
  6. Amazon EKS add-on configuration - Understanding field management syntax in the Kubernetes API - https://docs.aws.amazon.com/eks/latest/userguide/add-ons-configuration.html#add-on-config-management-understanding-field-management