fix(remove-node): Ensure safety and validation for node removal process (#12085)

This commit enhances the node removal playbook's reliability and safety by implementing the following changes:

1. **Node Validation**: Added a validation step using assert to ensure the `node` variable is defined and contains nodes. If the list is empty or undefined, the playbook fails early, preventing accidental operations on the entire cluster.

2. **Removed Defaulting for Hosts**: Updated tasks to enforce explicit `node` variable input without defaulting to critical groups (e.g., `etcd:k8s_cluster:calico_rr`). By validating `node` beforehand, tasks now solely rely on user-provided input and safely avoid unintended targeting.

3. **Explicit User Confirmation**: Enhanced the confirmation prompt to clarify the scope of the operation. The admin is now required to explicitly confirm node state deletion, ensuring a deliberate decision before proceeding.

These improvements strengthen the reliability and safety of the `remove-node.yml` playbook by eliminating ambiguous behavior, preventing misconfigurations, and ensuring clear interaction during node removal tasks.
This commit is contained in:
Farshad Asadpour
2025-03-27 16:40:34 +03:30
committed by GitHub
parent 4a5b524b98
commit 1513254622
2 changed files with 15 additions and 3 deletions

View File

@@ -59,6 +59,8 @@ ansible-playbook -i inventory/mycluster/hosts.yml remove-node.yml -b -v \
--extra-vars "node=nodename,nodename2"
```
> Note: The playbook does not currently support the removal of the first control plane or etcd node. These nodes are essential for maintaining cluster operations and must remain intact.
If a node is completely unreachable by ssh, add `--extra-vars reset_nodes=false`
to skip the node reset step. If one node is unavailable, but others you wish
to remove are able to connect via SSH, you could set `reset_nodes=false` as a host