Linux Pod Management and Scheduling in Practice

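A Pod that stays Pending with "0/3 nodes are available" has usually failed scheduling because its resource requests exceed what any node can offer, or because of taints or affinity rules. Check the Pod's Events and resources.requests, and each node's taints and allocatable resources.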

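Pod is Pending but `kubectl describe pod` shows "0/3 nodes are available"

This usually does not mean every node is down; the scheduler is being blocked by resource requests, taints, or affinity rules. Focus on the last message in the Events section, for example `node(s) had taint {node-role.kubernetes.io/control-plane:NoSchedule}` or `Insufficient cpu`.

Practical suggestions: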

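Check whether the Pod's `resources.requests` far exceed what a node actually has available (note: requests decide whether the Pod can be scheduled; limits do not, except that a container with limits but no requests gets requests equal to its limits)
Run `kubectl get nodes -o wide` to see node status and roles, then `kubectl describe node <name>` to inspect Taints and Allocatable
If the Pod must run on a control-plane node, add a toleration: `tolerations: [{key: "node-role.kubernetes.io/control-plane", operator: "Exists", effect: "NoSchedule"}]`
Avoid hard-coding large values such as `requests: {cpu: "8"}`: the node may only have 4 cores, with 1 of them already reserved for the system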
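
The request-versus-allocatable comparison can be sketched as a small shell helper. This is a minimal sketch, not a stand-in for the scheduler: `to_millicores` and `cpu_fits` are hypothetical helper names, the node name `node-1` is a placeholder, and the check looks only at a node's allocatable CPU, ignoring what other Pods on that node have already requested.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Convert a Kubernetes CPU quantity ("8", "0.5", "500m") to millicores.
to_millicores() {
  case "$1" in
    *m) printf '%s\n' "${1%m}" ;;                                # already in millicores
    *)  awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;; # whole or fractional cores
  esac
}

# Succeed when a CPU request is no larger than a node's allocatable CPU.
cpu_fits() {  # usage: cpu_fits <request> <allocatable>
  [ "$(to_millicores "$1")" -le "$(to_millicores "$2")" ]
}

# Example against a live cluster (node name is a placeholder):
#   alloc=$(kubectl get node node-1 -o jsonpath='{.status.allocatable.cpu}')
#   cpu_fits 8 "$alloc" || echo "request exceeds allocatable CPU ($alloc)"
```

A request that passes this check can still be rejected with `Insufficient cpu` if other Pods have already claimed most of the node, so treat a failure here as conclusive and a success as only a first filter.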

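`kubectl logs -f` reports "container is not found" or shows no logs

The container may have crashed and restarted, the Pod may have failed to start so that no container was ever created, or the name passed to `-c` may be wrong.

Practical suggestions: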

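First confirm the container names: `kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'`
Check the container state: in `kubectl get pod <pod-name> -o wide`, look at the STATUS column; `CrashLoopBackOff` or `Error` means the container is failing to start or keeps exiting
Check why the container is waiting: `kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[?(@.name=="<container-name>")].state.waiting.reason}'`; for the reason the previous run terminated, read `.lastState.terminated.reason` instead
If the container has already exited and been restarted, only `kubectl logs --previous` returns the logs from the previous run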
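
The name check and the `--previous` decision can be combined in a short script. The sketch below assumes bash; `has_container` and `previous_flag` are hypothetical helper names, and `$pod` and `$c` stand for a Pod and container name you supply. It always prefers the previous run's logs once a container has restarted, which suits crash debugging but not tailing a healthy container.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Succeed when the wanted container name is among the remaining arguments.
has_container() {  # usage: has_container <name> <names...>
  local want="$1"; shift
  local n
  for n in "$@"; do
    [ "$n" = "$want" ] && return 0
  done
  return 1
}

# Print "--previous" when the restart count is greater than zero.
# An empty count (no container status yet) is treated as zero.
previous_flag() {  # usage: previous_flag <restart-count>
  if [ "${1:-0}" -gt 0 ]; then
    printf '%s\n' "--previous"
  fi
}

# Example against a live cluster:
#   names=$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}')
#   has_container "$c" $names || echo "no container named $c; found: $names"
#   restarts=$(kubectl get pod "$pod" \
#     -o jsonpath="{.status.containerStatuses[?(@.name==\"$c\")].restartCount}")
#   kubectl logs "$pod" -c "$c" $(previous_flag "$restarts")
```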

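Node is NotReady but `systemctl status kubelet` shows active

A running kubelet is not necessarily able to report status correctly. Common causes are a broken network plugin, a cgroup driver mismatch, or disk or memory pressure that has set a node condition.

Practical suggestions: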

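Look for key clues in the kubelet log: `journalctl -u kubelet -n 100 --no-pager | grep -E "(failed|error|cgroup|network)"`
Confirm that the kubelet's cgroup driver matches the container runtime's: compare `grep cgroupDriver /var/lib/kubelet/config.yaml` with `crictl info | jq .cgroupDriver` (where this field appears depends on the runtime and version; with containerd the setting typically shows up as `SystemdCgroup` under the runc runtime options)
Check for resource pressure: in `kubectl describe node <node-name>`, see whether MemoryPressure or DiskPressure under Conditions is True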
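Do not expect deleting `/var/lib/kubelet/pki/kubelet-client-current.pem` and restarting kubelet to work around a kube-proxy or CNI fault: it only forces kubelet to request a new client certificate, which helps when the certificate has expired and needs a working bootstrap configuration. Back the file up first and treat this as a debugging-only step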
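
Reading the Conditions table by eye is error-prone, so a filter that prints only the unhealthy entries can help. The sketch below is one way to do it; `unhealthy_conditions` is a hypothetical helper name and `$node` is a node name you supply. It assumes `Ready` should be `True` and every other condition should be `False`, and it flags `Unknown` as unhealthy too.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# Read "Type=Status" lines on stdin and print the ones that look unhealthy.
unhealthy_conditions() {
  awk -F= '
    NF < 2 { next }                                      # skip blank or malformed lines
    $1 == "Ready" && $2 != "True" { print $1 "=" $2; next }
    $1 != "Ready" && $2 != "False" { print $1 "=" $2 }   # pressure-style conditions
  '
}

# Example against a live cluster:
#   kubectl get node "$node" \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

Empty output means no condition is reporting a problem, in which case the kubelet log and the cgroup driver comparison are the next places to look.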

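DNS resolution between Pods fails; `nslookup nginx.default.svc.cluster.local` times out

Either the DNS service itself is down, or the Pod's network path is broken at some layer: the CoreDNS Pods are unreachable, the Service ClusterIP does not work, or the Pod's /etc/resolv.conf is misconfigured.

Practical suggestions: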

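Confirm the CoreDNS Pods are Running and Ready: `kubectl get pods -n kube-system -l k8s-app=kube-dns`
From a healthy Pod, query the CoreDNS ClusterIP directly: `nslookup kubernetes.default.svc.cluster.local <coredns-clusterip>`; a timeout points to a Service or network plugin problem. `ping <coredns-clusterip>` is not a reliable test, because a ClusterIP is a virtual IP that often does not answer ICMP
Check /etc/resolv.conf in the failing Pod: its `nameserver` entry should be `<cluster-ip-of-coredns>` (or the node-local DNS cache address if one is deployed), not 127.0.0.1 or the host's DNS servers
With `hostNetwork: true` and the default dnsPolicy, DNS falls back to the host's configuration and svc.cluster.local names fail to resolve; set `dnsPolicy: ClusterFirstWithHostNet` to keep using cluster DNS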
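
The resolv.conf check can be automated with a one-function filter. This is a sketch under assumptions: `has_nameserver` is a hypothetical helper name, `$pod` is a Pod name you supply, and the DNS Service is assumed to be named `kube-dns` in `kube-system`, which is common but not universal.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# Succeed when the resolv.conf text on stdin has a nameserver line for the given IP.
has_nameserver() {  # usage: has_nameserver <ip> < resolv.conf
  awk -v ip="$1" '
    $1 == "nameserver" && $2 == ip { found = 1 }
    END { exit !found }
  '
}

# Example against a live cluster (Service name "kube-dns" is an assumption):
#   dns_ip=$(kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}')
#   kubectl exec "$pod" -- cat /etc/resolv.conf | has_nameserver "$dns_ip" \
#     || echo "Pod is not pointed at cluster DNS ($dns_ip)"
```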

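That covers it. Pod scheduling and network problems most often get stuck on something that looks healthy while a link further down has already broken. Don't stare only at the line with the error; work down through the Events, Conditions, and Containers status in `kubectl describe`, layer by layer.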

Posted in: web3.0

Within the last day
