Fixing Kubernetes NotReady State: Node Taints, SR-IOV & nodeSelector Conflicts [2025 Guide]
Overview
In complex Kubernetes environments, particularly shared clusters, applications can sometimes enter a NotReady state due to misaligned node-level configuration. This article walks through a real-world scenario in which a Kubernetes workload failed because of node taints, nodeSelector constraints, and SR-IOV interface requirements.
Our goal is to help DevOps engineers and administrators prevent similar issues by understanding how taints and scheduling constraints interact in a production environment.
Root Cause: Node Taint Preventing Pod Scheduling
Here are our observations for this issue:
The application is deployed with a nodeSelector and, in addition, requires SR-IOV interfaces to handle SR-IOV workloads:
“SR-IOV (Single Root I/O Virtualization) is a hardware-based network virtualization technology that enables high-performance, low-latency networking.”
The application manifest defines the nodeSelector as follows:
nodeSelector:
  node-type: app1
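For context, a minimal Deployment sketch combining this nodeSelector with an SR-IOV network request might look like the snippet below. The Deployment name, the Multus network annotation value (sriov-net1), and the container image are illustrative placeholders and are not taken from the actual application manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app1-deployment          # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app1
  template:
    metadata:
      labels:
        app: app1
      annotations:
        # Assumes SR-IOV interfaces are attached via a Multus NetworkAttachmentDefinition
        k8s.v1.cni.cncf.io/networks: sriov-net1
    spec:
      nodeSelector:
        node-type: app1          # must match a label present on the target node
      containers:
      - name: app1
        image: <app1-image>      # placeholder image

The annotation assumes the cluster exposes SR-IOV interfaces through a NetworkAttachmentDefinition; clusters using a different CNI integration would request the interface differently.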
Commands to check and add labels on a node:
kubectl get nodes --show-labels | grep app1
kubectl get nodes --show-labels | grep node1
kubectl label node node1 node-type=app1
Command to remove the label from a node:
kubectl label node node1 node-type-
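A quick way to confirm that the selector will actually match is to query nodes by label selector directly:

kubectl get nodes -l node-type=app1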
Per our findings, node1 had SR-IOV (Single Root I/O Virtualization) interfaces enabled to support high-performance networking needs.
The issue was triggered when the following NoSchedule taint was added to node1:
Taints: node-role=voice-services:NoSchedule
Once this taint was applied, the Kubernetes scheduler stopped assigning the application’s pods to the node, leaving them unschedulable and the application in a NotReady state.
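To confirm this from the cluster side, the node’s taints and the scheduler’s view of the pods can be inspected with standard kubectl commands (the namespace and pod names below are placeholders):

# Show the taints currently set on the node
kubectl describe node node1 | grep -A3 Taints

# Check pod status and read the scheduling events for a stuck pod
kubectl get pods -n <app-namespace> -o wide
kubectl describe pod <app-pod> -n <app-namespace>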
Solution:
To restore pod scheduling, we started by updating the node configuration as follows.
For validation purposes, the taint was removed from the affected node. This change allowed the pods to reschedule correctly, and the application returned to a Ready state immediately.
Commands to taint and untaint a node:
kubectl taint node node1 node-role=voice-services:NoSchedule
kubectl taint node node1 node-role=voice-services:NoSchedule-
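Once the taint is removed, the fix can be verified by confirming the node no longer lists it and watching the pods come back (namespace placeholder assumed):

kubectl describe node node1 | grep Taints
kubectl get pods -n <app-namespace> -o wide -w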
Deployment Configuration Notes
The application’s networking settings are defined in a YAML file at deployment time. A sample of this configuration might look like the following:
ippools: ["ipv6-pool", "ipv4-pool"]
roles:
  - name: high-perf-network
    ippools:
      - ippool: ipv4-pool
        static_ips: ["<static-ip-v4>"]
      - ippool: ipv6-pool
        static_ips: ["<static-ip-v6>"]
primary: ipv4-pool
These settings are usually applied only once, at deployment time. After that, any changes made to the Kubernetes nodes, such as adding taints, updating labels, or removing required features like SR-IOV network support, can cause the application to fail or go into a NotReady state.
To avoid such issues, always ensure that the application’s configuration matches the current state of the node it is scheduled to run on.
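As a quick sanity check before (re)deploying, the node’s labels and taints can be compared against what the manifest expects; the jsonpath query below is just one way to list the taints:

kubectl get node node1 --show-labels
kubectl get node node1 -o jsonpath='{.spec.taints}'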
“In other scenarios, where removing a node taint is not desirable, such as in production or shared environments, a better solution is to add a matching toleration to the application manifest. This approach allows Kubernetes to schedule the application on tainted nodes intentionally and safely.”
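For example, a toleration matching the taint shown earlier (node-role=voice-services:NoSchedule) could be added to the pod template; the exact placement depends on how the application manifest or Helm chart is structured:

tolerations:
  - key: "node-role"
    operator: "Equal"
    value: "voice-services"
    effect: "NoSchedule"

Note that a toleration only permits scheduling onto the tainted node; the nodeSelector is still what steers the pods to it.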
Best Practices & Recommendations:
- Align node taints and labels with your application’s nodeSelector and tolerations.
- Ensure that required network interfaces (like SR-IOV) are consistently configured on designated nodes.
- Define critical network and IP pool parameters in your Kubernetes manifests or Helm charts.
- Always communicate infrastructure changes, especially taints and node roles, across all relevant teams.
- Test taint and toleration interactions in a staging environment before rolling out to production.