Compare commits

...

3 Commits

Author SHA1 Message Date
k8s-infra-cherrypick-robot
683ee4233f wait for control plane node to become ready after joining (#12924)
When joining a control plane node and "upgrading" the cluster setup (for
example, to update etcd addresses after adding a new etcd) in the same
playbook run, the node can take a bit of time to become ready after
joining.
This triggers a kubeadm preflight check (ControlPlaneNodesReady) in
kubeadm upgrade, which is run directly after the join tasks.

Add a configurable wait for the control plane node to become Ready to
fix this race condition.

Co-authored-by: Max Gautier <mg@max.gautier.name>
2026-01-29 14:47:50 +05:30
k8s-infra-cherrypick-robot
4ff716dddd etcd-certs: only change necessary permissions (#12914)
We currently **recursively** set the permissions of /etc/ssl/etcd/ssl
(default path) to 700. But this removes group permission from the files
under it, and certain composents (like calio with etcd datastore) rely
on it ; thus, the upgrade of a cluster can fail because the
calico-kube-controller can't access the certs, and thus the etcd.

This works in other case because as far as I can tell, the apiserver
which do access the etcd run as root (the owner of the files, not just
the "group owner")

We also for some reasons do this twice.

Only create the etcd cert directory with the correct permissions once,
not recursively.

Co-authored-by: Max Gautier <mg@max.gautier.name>
2026-01-27 20:29:51 +05:30
k8s-infra-cherrypick-robot
73fcc6075d Docs: cilium_kube_proxy_replacement change boolean (#12911)
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Co-authored-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
2026-01-27 17:03:49 +05:30
5 changed files with 21 additions and 14 deletions

View File

@@ -56,8 +56,8 @@ cilium_l2announcements: false
#
# Only effective when monitor aggregation is set to "medium" or higher.
# cilium_monitor_aggregation_flags: "all"
# Kube Proxy Replacement mode (strict/partial)
# cilium_kube_proxy_replacement: partial
# Kube Proxy Replacement mode (true/false)
# cilium_kube_proxy_replacement: false
# If upgrading from Cilium < 1.5, you may want to override some of these options
# to prevent service disruptions. See also:

View File

@@ -5,8 +5,7 @@
group: "{{ etcd_cert_group }}"
state: directory
owner: "{{ etcd_owner }}"
mode: "{{ etcd_cert_dir_mode }}"
recurse: true
mode: "0700"
- name: "Gen_certs | create etcd script dir (on {{ groups['etcd'][0] }})"
file:
@@ -145,15 +144,6 @@
- ('k8s_cluster' in group_names) and
sync_certs | default(false) and inventory_hostname not in groups['etcd']
- name: Gen_certs | check certificate permissions
file:
path: "{{ etcd_cert_dir }}"
group: "{{ etcd_cert_group }}"
state: directory
owner: "{{ etcd_owner }}"
mode: "{{ etcd_cert_dir_mode }}"
recurse: true
# This is a hack around the fact kubeadm expect the same certs path on all kube_control_plane
# TODO: fix certs generation to have the same file everywhere
# OR work with kubeadm on node-specific config

View File

@@ -18,7 +18,6 @@ etcd_backup_retention_count: -1
force_etcd_cert_refresh: true
etcd_config_dir: /etc/ssl/etcd
etcd_cert_dir: "{{ etcd_config_dir }}/ssl"
etcd_cert_dir_mode: "0700"
etcd_cert_group: root
# Note: This does not set up DNS entries. It simply adds the following DNS
# entries to the certificate

View File

@@ -2,6 +2,9 @@
# disable upgrade cluster
upgrade_cluster_setup: false
# Number of retries (with 5 seconds interval) to check that new control plane nodes
# are in Ready condition after joining
control_plane_node_become_ready_tries: 24
# By default the external API listens on all interfaces, this can be changed to
# listen on a specific address/interface.
# NOTE: If you specific address/interface and use loadbalancer_apiserver_localhost

View File

@@ -99,3 +99,18 @@
when:
- inventory_hostname != first_kube_control_plane
- kubeadm_already_run is not defined or not kubeadm_already_run.stat.exists
- name: Wait for new control plane nodes to be Ready
when: kubeadm_already_run.stat.exists
run_once: true
command: >
{{ kubectl }} get nodes --selector node-role.kubernetes.io/control-plane
-o jsonpath-as-json="{.items[*].status.conditions[?(@.type == 'Ready')]}"
register: control_plane_node_ready_conditions
retries: "{{ control_plane_node_become_ready_tries }}"
delay: 5
delegate_to: "{{ groups['kube_control_plane'][0] }}"
until: >
control_plane_node_ready_conditions.stdout
| from_json | selectattr('status', '==', 'True')
| length == (groups['kube_control_plane'] | length)