
[11] Why Managed Node Groups Keep Applications Available (Part 2)

Continuing from the previous article, this post keeps exploring what the managed node group helps with during the node update process.

Test setup steps

  1. Use a kubectl label selector to filter for the nodes in the managed node group demo-ng.
$ kubectl get node -l eks.amazonaws.com/nodegroup=demo-ng
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-18-230.eu-west-1.compute.internal   Ready    <none>   9m30s   v1.22.12-eks-ba74326
  2. Check the Pods running on this node.
$ kubectl describe no ip-192-168-18-230.eu-west-1.compute.internal | grep -A 6 "Non-terminated Pods"
Non-terminated Pods:          (4 in total)
  Namespace                   Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                     ------------  ----------  ---------------  -------------  ---
  demo                        nginx-demo-deployment-7587998bf4-dkd5t   0 (0%)        0 (0%)      0 (0%)           0 (0%)         8m27s
  kube-system                 aws-node-j2c7p                           25m (1%)      0 (0%)      0 (0%)           0 (0%)         11m
  kube-system                 coredns-5947f47f5f-t4f89                 100m (5%)     0 (0%)      70Mi (1%)        170Mi (2%)     8m27s
  kube-system                 kube-proxy-4pjhk                         100m (5%)     0 (0%)      0 (0%)           0 (0%)         11m
  3. To make sure that node ip-192-168-18-230.eu-west-1.compute.internal gets terminated, use the following AWS CLI command to terminate that EKS worker node and scale the group in to 0 at the same time.
aws autoscaling terminate-instance-in-auto-scaling-group --instance-id i-07e9ea9eb19a67337 --should-decrement-desired-capacity
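The instance ID passed above must be the one backing the node we want to terminate. A minimal sketch for looking it up from the node object, assuming the AWS cloud provider populates spec.providerID (which typically embeds the instance ID, e.g. aws:///eu-west-1a/i-07e9ea9eb19a67337):

# Hypothetical lookup of the EC2 instance ID backing the node via its providerID
kubectl get node ip-192-168-18-230.eu-west-1.compute.internal \
  -o jsonpath='{.spec.providerID}'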

Analyzing the logs

Nodes

Use the following CloudWatch Logs Insights query to analyze the node objects updated by the Lambda function:

filter @logStream like /^kube-apiserver-audit/
| fields objectRef.name, objectRef.resource, @message
| sort @timestamp asc
| filter userAgent like 'vpcLambda' AND verb != 'get' AND verb != 'list' AND objectRef.resource == 'nodes'
| limit 10000
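If you prefer the CLI over the console, a rough sketch of running the same query with the AWS CLI; the log group name /aws/eks/ironman-2022/cluster is an assumption based on the usual EKS control-plane logging convention, and GNU date is assumed for the timestamps:

# Start the Logs Insights query against the EKS control-plane log group (last hour)
aws logs start-query \
  --log-group-name /aws/eks/ironman-2022/cluster \
  --start-time $(date -d '-1 hour' +%s) \
  --end-time $(date +%s) \
  --query-string "filter @logStream like /^kube-apiserver-audit/ | filter userAgent like 'vpcLambda' AND verb != 'get' AND verb != 'list' AND objectRef.resource == 'nodes' | fields objectRef.name, objectRef.resource, @message | sort @timestamp asc | limit 10000"
# Fetch the results once the query has completed, using the queryId returned above
aws logs get-query-results --query-id <query-id>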

The more relevant parts of the logs are excerpted below:

      "userAgent": "vpcLambda/v0.0.0 (linux/amd64) kubernetes/$Format",
"objectRef": {
"resource": "nodes",
"name": "ip-192-168-83-56.eu-west-1.compute.internal",
"apiVersion": "v1"
},
"responseStatus": {
"metadata": {},
"code": 200
},
"requestObject": {
"metadata": {
"labels": {
"node.kubernetes.io/exclude-from-external-load-balancers": "true"
}
}
},
...
...
"requestReceivedTimestamp": "2022-09-25T13:58:43.356273Z",
"stageTimestamp": "2022-09-25T13:58:43.375313Z",
      "userAgent": "vpcLambda/v0.0.0 (linux/amd64) kubernetes/$Format",
"objectRef": {
"resource": "nodes",
"name": "ip-192-168-83-56.eu-west-1.compute.internal",
"apiVersion": "v1"
},
"responseStatus": {
"metadata": {},
"code": 200
},
"requestObject": {
"spec": {
"unschedulable": true
}
},
...
...
"requestReceivedTimestamp": "2022-09-25T13:58:43.939359Z",
"stageTimestamp": "2022-09-25T13:58:43.947171Z",

From the logs above, we can confirm that when the node is terminated, EKS updates the Node in the following two ways:

  1. It updates the label node.kubernetes.io/exclude-from-external-load-balancers, which excludes the node from load balancers created by the cloud provider. The corresponding feature gate, ServiceNodeExclusion¹, has been GA and enabled by default since Kubernetes 1.21.
  2. It sets the node to unschedulable: true, i.e. cordons it (a manual equivalent of both updates is sketched below).
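To make the two updates concrete, here is a minimal kubectl sketch of the manual equivalent, using the test node from this article as the example:

# Exclude the node from cloud-provider load balancers (same label the Lambda applied)
kubectl label node ip-192-168-18-230.eu-west-1.compute.internal \
  node.kubernetes.io/exclude-from-external-load-balancers=true
# Mark the node unschedulable, which is what spec.unschedulable: true amounts to
kubectl cordon ip-192-168-18-230.eu-west-1.compute.internal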

Pods

Use the following CloudWatch Logs Insights query to analyze the Pod objects updated by the Lambda function:

filter @logStream like /^kube-apiserver-audit/
| fields objectRef.name, objectRef.resource, @message
| sort @timestamp asc
| filter userAgent like 'vpcLambda' AND verb != 'get' AND verb != 'list' AND objectRef.resource == 'pods'
| limit 10000

The more relevant parts of the logs are excerpted below:

      "userAgent": "vpcLambda/v0.0.0 (linux/amd64) kubernetes/$Format",
"objectRef": {
"resource": "pods",
"namespace": "demo",
"name": "nginx-demo-deployment-7587998bf4-dkd5t",
"apiVersion": "v1",
"subresource": "eviction"
},
"responseStatus": {
"metadata": {},
"status": "Success",
"code": 201
},
"requestObject": {
"kind": "Eviction",
"apiVersion": "policy/v1beta1",
"metadata": {
"name": "nginx-demo-deployment-7587998bf4-dkd5t",
"namespace": "demo",
"creationTimestamp": null
}
},
...
...
"requestReceivedTimestamp": "2022-09-25T21:28:10.861815Z",
"stageTimestamp": "2022-09-25T21:28:10.883851Z",
      "userAgent": "vpcLambda/v0.0.0 (linux/amd64) kubernetes/$Format",
"objectRef": {
"resource": "pods",
"namespace": "kube-system",
"name": "coredns-5947f47f5f-t4f89",
"apiVersion": "v1",
"subresource": "eviction"
},
"responseStatus": {
"metadata": {},
"status": "Success",
"code": 201
},
"requestObject": {
"kind": "Eviction",
"apiVersion": "policy/v1beta1",
"metadata": {
"name": "coredns-5947f47f5f-t4f89",
"namespace": "kube-system",
"creationTimestamp": null
}
},
...
...
"requestReceivedTimestamp": "2022-09-25T21:28:10.995432Z",
"stageTimestamp": "2022-09-25T21:28:11.015341Z",

From the logs above, we can confirm that when the node is terminated, EKS evicts the Pods running on it. Of the four Pods originally running on node ip-192-168-18-230.eu-west-1.compute.internal, the ones that were evicted were the Deployment-managed CoreDNS and Nginx Pods, while the DaemonSet Pods (aws-node and kube-proxy) were left in place.
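This behaviour looks very much like a standard node drain. As a hedged sketch, the manual equivalent would be the command below, which evicts the Deployment Pods while skipping DaemonSet Pods such as aws-node and kube-proxy:

# Cordon and drain the node; --ignore-daemonsets skips aws-node and kube-proxy
kubectl drain ip-192-168-18-230.eu-west-1.compute.internal \
  --ignore-daemonsets --delete-emptydir-data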

Why does scaling in with an Auto Scaling Group command, instead of an eksctl command, still work with the managed node group?

After the Auto Scaling Group scale-in above, the eksctl get ng command shows that the node group's desired capacity has indeed been set to 0.

$ eksctl get ng --cluster=ironman-2022
CLUSTER       NODEGROUP        STATUS           CREATED                MIN SIZE  MAX SIZE  DESIRED CAPACITY  INSTANCE TYPE  IMAGE ID               ASG NAME                                                             TYPE
ironman-2022  demo-ng          ACTIVE           2022-09-25T12:59:13Z   0         10        0                 m5.large       AL2_x86_64             eks-demo-ng-02c1ba5d-75b6-fbdf-4569-aa31827ad412                     managed
ironman-2022  demo-self-ng     CREATE_COMPLETE  2022-09-25T14:03:51Z   1         1         1                 m5.large       ami-08702177b0dcfc054  eksctl-ironman-2022-nodegroup-demo-self-ng-NodeGroup-1Q19K6MP0HTFJ   unmanaged
ironman-2022  ng1-public-ssh   ACTIVE           2022-09-14T09:54:36Z   0         10        0                 m5.large       AL2_x86_64             eks-ng1-public-ssh-62c19db5-f965-bdb7-373a-147e04d9f124              managed
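For comparison, the eksctl-native way to do the same scale-in would be something like the sketch below; it drives the same underlying Auto Scaling Group, which is why the two approaches stay in sync:

# Scale the managed node group to zero through eksctl instead of the ASG API
eksctl scale nodegroup --cluster=ironman-2022 --name=demo-ng --nodes=0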

To confirm the relationship between the node group and its Auto Scaling Group, and to find out what extra work the managed node group does, we inspect the Auto Scaling Group resources through EKS and through CloudFormation respectively.

managed node group

We can use the aws eks describe-nodegroup² command to look up the name of the Auto Scaling Group behind the managed node group.

$ aws eks describe-nodegroup --cluster-name ironman-2022 --nodegroup-name demo-ng --query "nodegroup.resources.autoScalingGroups[0].name"
"eks-demo-ng-02c1ba5d-75b6-fbdf-4569-aa31827ad412"
$ aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name eks-demo-ng-02c1ba5d-75b6-fbdf-4569-aa31827ad412
{
    "LifecycleHooks": [
        {
            "LifecycleHookName": "Launch-LC-Hook",
            "AutoScalingGroupName": "eks-demo-ng-02c1ba5d-75b6-fbdf-4569-aa31827ad412",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "NotificationTargetARN": "arn:aws:sns:eu-west-1:346952985317:eks-asg-lifecycle-hook-topic",
            "HeartbeatTimeout": 1800,
            "GlobalTimeout": 172800,
            "DefaultResult": "CONTINUE"
        },
        {
            "LifecycleHookName": "Terminate-LC-Hook",
            "AutoScalingGroupName": "eks-demo-ng-02c1ba5d-75b6-fbdf-4569-aa31827ad412",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
            "NotificationTargetARN": "arn:aws:sns:eu-west-1:346952985317:eks-asg-lifecycle-hook-topic",
            "HeartbeatTimeout": 1800,
            "GlobalTimeout": 172800,
            "DefaultResult": "CONTINUE"
        }
    ]
}

self-managed

$ aws cloudformation list-stack-resources --stack-name eksctl-ironman-2022-nodegroup-demo-self-ng
...
...
    {
        "LogicalResourceId": "NodeGroup",
        "PhysicalResourceId": "eksctl-ironman-2022-nodegroup-demo-self-ng-NodeGroup-1Q19K6MP0HTFJ",
        "ResourceType": "AWS::AutoScaling::AutoScalingGroup",
        "LastUpdatedTimestamp": "2022-09-25T14:07:16.122000+00:00",
        "ResourceStatus": "CREATE_COMPLETE",
        "DriftInformation": {
            "StackResourceDriftStatus": "NOT_CHECKED"
        }
    },
...
$ aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name eksctl-ironman-2022-nodegroup-demo-self-ng-NodeGroup-1Q19K6MP0HTFJ
{
    "LifecycleHooks": []
}

Comparing the two Auto Scaling Groups above, we can confirm that the EKS managed node group additionally configures Auto Scaling lifecycle hooks³. A lifecycle hook lets the Auto Scaling group trigger custom actions when the corresponding lifecycle events occur. From the output above, we can confirm that EKS notifies the SNS topic arn:aws:sns:eu-west-1:346952985317:eks-asg-lifecycle-hook-topic on both scale-in and scale-out events:

  • autoscaling:EC2_INSTANCE_LAUNCHING: fired on scale-out events
  • autoscaling:EC2_INSTANCE_TERMINATING: fired on scale-in events
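If you wanted to replicate this pattern on a self-managed node group, a rough sketch would look like the following; the SNS topic and IAM role ARNs here are hypothetical placeholders, not values used by EKS:

# Attach a termination lifecycle hook to the self-managed ASG so that scale-in
# events are published to an SNS topic before the instance is actually terminated
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name Terminate-LC-Hook \
  --auto-scaling-group-name eksctl-ironman-2022-nodegroup-demo-self-ng-NodeGroup-1Q19K6MP0HTFJ \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --notification-target-arn arn:aws:sns:eu-west-1:111122223333:my-drain-topic \
  --role-arn arn:aws:iam::111122223333:role/my-asg-notification-role \
  --heartbeat-timeout 1800 \
  --default-result CONTINUE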

Given that most of the user agents in the earlier audit logs were vpcLambda, it is not hard to imagine that this SNS topic in turn invokes a Lambda function which performs the behaviour we observed: updating the node label, cordoning the node, and evicting the Pods. Unfortunately, the SNS topic does not belong to the customer account, so this article cannot dig any further into the corresponding Lambda function.
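Even though we cannot inspect the actual Lambda, a purely illustrative sketch of what such a handler presumably has to do with the lifecycle hook is shown below; NODE, ASG_NAME, HOOK_NAME and TOKEN are hypothetical variables taken from the lifecycle-hook notification:

# Drain the node (same effect as the cordon/label/eviction seen in the audit logs)
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
# Release the lifecycle hook so Auto Scaling can finish terminating the instance
aws autoscaling complete-lifecycle-action \
  --lifecycle-hook-name "$HOOK_NAME" \
  --auto-scaling-group-name "$ASG_NAME" \
  --lifecycle-action-token "$TOKEN" \
  --lifecycle-action-result CONTINUE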

Conclusion

The EKS managed node group uses Auto Scaling lifecycle hooks³ to update the node and evict its Pods. Compared with a self-managed node group, where every scale-in or scale-out requires manually running kubectl cordon and kubectl drain, this is clearly more convenient.

References

Footnotes

  1. Kubernetes Feature Gates - https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

  2. aws eks describe-nodegroup - https://docs.aws.amazon.com/cli/latest/reference/eks/describe-nodegroup.html

  3. Amazon EC2 Auto Scaling lifecycle hooks - https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html