Before "5ca23e3bf (Changed to use first_kube_control_plane to parse
kubeadm_certificate_key (#11875), 2025-01-14)", kubespray would have
problems adding new control planes when the order of the nodes in kubectl
output and the ansible inventory was not the same.
But the underlying problem is that the operation is fundamentally
something that should be done only once, and recorded for all hosts in
the play.
Since `register` and `set_fact`, when used with `run_once`, set the
variable for all the hosts, use that. This also allows using the variable
directly instead of relying on hostvars, making the task more readable.
Most variables should have a default instead of relying on the default
filter.
(Note that the variable is misnamed, it should be certs and not keys,
but it's not worth breaking compat).
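For illustration, a minimal sketch of the `run_once` pattern (task layout and variable names are illustrative, not the exact kubespray tasks):
```yaml
# Sketch only: with run_once, both the registered result and the fact set
# via set_fact are recorded for every host in the play.
- name: Get the certificate key once for the whole play
  command: kubeadm init phase upload-certs --upload-certs
  register: kubeadm_upload_cert
  run_once: true
  delegate_to: "{{ groups['kube_control_plane'] | first }}"

- name: Record it for every host in the play
  set_fact:
    kubeadm_certificate_key: "{{ kubeadm_upload_cert.stdout_lines[-1] | trim }}"
  run_once: true
```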
When loadbalancer_apiserver_localhost is enabled, Calico falls back to the
Kubernetes service IP because the kubernetes-services-endpoint ConfigMap is
empty. CNI then fails to reach the API server even though an nginx proxy is
listening on localhost.
Update kube_apiserver_global_endpoint to always reference the localhost load
balancer (respecting the configured port) and populate the ConfigMap for both
eBPF and localhost LB modes.
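For reference, the ConfigMap Calico reads looks roughly like this (namespace, host and port are illustrative; they depend on the install method and the configured LB port):
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  # Sketch: points CNI components at the localhost nginx proxy
  KUBERNETES_SERVICE_HOST: "127.0.0.1"
  KUBERNETES_SERVICE_PORT: "6443"
```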
* control-plane: fix first_kube_control_plane delegation with kube_override_hostname
When kube_override_hostname is configured, the node names reported by
`kubectl get nodes` differ from the inventory_hostname known to Ansible.
This causes delegation failures in subsequent tasks since Ansible cannot
resolve the hostname from kubectl output to an inventory host.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
* control-plane: remove fragile first_control_plane selection logic
Current implementation breaks with kube_override_hostname and has
multiple edge cases. Drop until proper kubectl-based node lookup
can be implemented.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
---------
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
This should make 'no space left on device' problems easier to handle.
Use /tmp/releases as local_release_dir on the CI-created machines, while
keeping the same folder on the runner (needed for gitlab-ci runner pods).
* CI: Try a full ssh connection on hosts instead of only checking the port
If we only try the port, we can try to connect in the playbook which is
executed next even though the managed node has not yet completed its
boot-up sequence ("System is booting up. Unprivileged users are not
permitted to log in yet. Please come back later. For technical details,
see pam_nologin(8).")
This does not account for python-less hosts, but we don't use those in
CI anyway (for now, at least).
* CI: Remove connection method override when creating VMs
This prevented wait_for_connection from working correctly by hijacking the
connection to localhost, thus bypassing the connection check.
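A minimal sketch of the check this relies on (note that `wait_for_connection` needs Python on the target, hence the python-less-hosts caveat above; timeouts are illustrative):
```yaml
- name: Wait until Ansible can actually operate on the host, not just open the SSH port
  ansible.builtin.wait_for_connection:
    delay: 5
    timeout: 300
```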
Add missing RBAC permissions for Calico apiserver to function correctly
with Kubernetes 1.33+
Changes:
1. Add K8s 1.33 ValidatingAdmissionPolicy resources to calico-webhook-reader
- validatingadmissionpolicies
- validatingadmissionpolicybindings
Kubernetes 1.33 introduced ValidatingAdmissionPolicy resources (KEP-3488)
that require explicit RBAC permissions. Without these changes, the Calico
apiserver will not work on k8s 1.33+ and needless errors are logged.
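A hedged sketch of the resulting rule; the surrounding resources and verbs of calico-webhook-reader are assumptions here, only the two added resources come from this change:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: calico-webhook-reader
rules:
  - apiGroups: ["admissionregistration.k8s.io"]
    resources:
      - validatingwebhookconfigurations
      # added for Kubernetes 1.33+ (KEP-3488)
      - validatingadmissionpolicies
      - validatingadmissionpolicybindings
    verbs: ["get", "list", "watch"]
```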
* Remove etcd member by peerURLs
The way to obtain the IP of a particular member is convoluted and depends
on multiple variables. The match is also textual, and it's not clear
what we're matching against.
It's also broken for etcd members which are not also Kubernetes nodes,
because the "Lookup node IP in kubernetes" task will fail and abort the
play.
Instead, match against 'peerURLs', which does not need a new variable, and
use json output.
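A minimal sketch of the peerURLs match (assuming etcdctl is on PATH with its connection flags already set; `etcd_peer_url` is an illustrative variable holding this node's peer URL, e.g. https://10.0.0.1:2380):
```yaml
- name: List etcd members as JSON
  command: etcdctl member list --write-out=json
  register: etcd_member_list
  run_once: true

- name: Find the ID of the member whose peerURLs matches
  set_fact:
    etcd_member_id: >-
      {{ (etcd_member_list.stdout | from_json).members
         | selectattr('peerURLs', 'contains', etcd_peer_url)
         | map(attribute='ID') | first }}

# The JSON output reports the ID as a decimal integer, but
# 'etcdctl member remove' expects it in hex.
- name: Remove the matched member
  command: "etcdctl member remove {{ '%x' | format(etcd_member_id | int) }}"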
* Add testcase for etcd removal on external etcd
* do not merge
* fixup! Remove etcd member by peerURLs
* fixup! Remove etcd member by peerURLs
The 'old' playbook and the collection use '-' and '_' as separators,
which breaks the logic in scripts/testcases_run.sh.
Add aliases using the old schemes to make the test work and avoid
breaking anything.
Both '-' and '_' variants will be deleted once we switch to supporting
collection only.
fixed kubelet condition
CRI-O: fix for handling of container runtime switching
refactored kubelet start condition
stop/start kubelet and crio only when default runtime is changed
fixed condition for runtime_matches fact variable
fixed set facts for existing container runtime
added crio runtime switch variable
changed condition to use runtime switch variable
added comment for not-found for readers
Allow setting deployment replicas through `coredns_replicas` when
`enable_dns_autoscaler` is set to false.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
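Example usage (the replica count is illustrative):
```yaml
enable_dns_autoscaler: false
coredns_replicas: 3
```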
The option ara_default was still present in ansible.cfg under callbacks_enabled.
This is a leftover from commit b9e9364 ("Remove ara support in CI") and should
have been removed together with the rest of the ara integration.
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Enable reserved variable name checks and fix violations
Updated .ansible-lint configuration to skip only var-naming[pattern]
and var-naming[no-role-prefix] instead of skipping the entire var-naming rule.
This enables the check for reserved variable names.
Renamed variables that used reserved names to avoid conflicts.
Updated all references in tasks, variables, and templates.
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
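The resulting `.ansible-lint` configuration looks roughly like this:
```yaml
skip_list:
  - var-naming[pattern]
  - var-naming[no-role-prefix]
```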
* Rename namespace variable inside tasks instead of deleting it
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Change hosts variable to vm_hosts
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Use k8s_namespace instead of dashboard_namespace in dashboard.yml.j2 template
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
---------
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
Fixes a bug where `kube-apiserver` fails to start if the PodSecurity
configuration file doesn't have the `apiVersion` and `kind` keys.
Signed-off-by: Alejandro Macedo <alex.macedopereira@gmail.com>
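For reference, a PodSecurity configuration file with the required keys (the defaults shown are illustrative):
```yaml
apiVersion: pod-security.admission.config.k8s.io/v1
kind: PodSecurityConfiguration
defaults:
  enforce: "baseline"
  enforce-version: "latest"
exemptions:
  usernames: []
  runtimeClasses: []
  namespaces: [kube-system]
```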
Retrying to load conntrack modules was bound to fail due to the way the current `when` conditions were written.
They were based on the assumption that, in case of success, the registered variable would have an `rc` attribute with the value `0`.
Unfortunately, the `rc` attribute is only present in case of a failure, where its value is non-zero.
The result of `community.general.modprobe` in case of success looks like this:
```
{
    "changed": false,
    "msg": "All items completed",
    "results": [
        {
            "ansible_loop_var": "item",
            "changed": false,
            "failed": false,
            "invocation": {
                "module_args": {
                    "name": "nf_conntrack",
                    "params": "",
                    "persistent": "present",
                    "state": "present"
                }
            },
            "item": "nf_conntrack",
            "name": "nf_conntrack",
            "params": "",
            "state": "present"
        }
    ],
    "skipped": false
}
```
While it looks like this in case of a failure:
```
{
    "changed": false,
    "failed": true,
    "msg": "One or more items failed",
    "results": [
        {
            "ansible_loop_var": "item",
            "attempts": 3,
            "changed": false,
            "failed": true,
            "invocation": {
                "module_args": {
                    "name": "nf_conntrack_doesnotexist",
                    "params": "",
                    "persistent": "present",
                    "state": "present"
                }
            },
            "item": "nf_conntrack_doesnotexist",
            "msg": "modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64\n",
            "name": "nf_conntrack_doesnotexist",
            "params": "",
            "rc": 1,
            "state": "present",
            "stderr": "modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64\n",
            "stderr_lines": [
                "modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64"
            ],
            "stdout": "",
            "stdout_lines": []
        }
    ],
    "skipped": false
}
```
By evaluating `failed` instead, this issue can be prevented.
See also:
- https://github.com/kubernetes-sigs/kubespray/issues/11340
Co-authored-by: Max Gautier <mg@max.gautier.name>
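A minimal sketch of the corrected retry, evaluating `failed` via the `failed` test instead of `rc` (retry counts are illustrative):
```yaml
- name: Load conntrack modules
  community.general.modprobe:
    name: "{{ item }}"
    state: present
    persistent: present
  loop:
    - nf_conntrack
  register: modprobe_conntrack
  retries: 3
  delay: 5
  # 'is not failed' works for both outcomes; 'rc' only exists on failure
  until: modprobe_conntrack is not failed
```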
The Prometheus Operator CRDs are commonly used for monitoring and are
used by some CNIs (such as Cilium). Kubespray can be installed first,
and the subsequent installation of the operator can be handled by the
user (or later extensions).
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Add variable to set kubelet staticPodPath location.
It can be set to empty so that we can choose to disable it for some nodes.
The STIG recommendation is to disable it.
Signed-off-by: Shaleen Bathla <shaleenbathla@gmail.com>
Co-authored-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
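Example usage:
```yaml
# disable kubelet's staticPodPath on nodes that don't use static pods
kubelet_static_pod_path: ""
```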
Debian Trixie recently removed the package `software-properties-common`,
so add a condition to skip installing it on Debian Trixie.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Remove --anonymous-auth if kube_api_anonymous_auth is undefined, to avoid
compatibility errors with other arguments of the kube-apiserver, such as
--authentication-config when the anonymous field is configured.
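For context, a hedged sketch of a structured authentication config that sets the anonymous field (the apiVersion depends on the Kubernetes release):
```yaml
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
# when anonymous is configured here, kube-apiserver rejects --anonymous-auth
anonymous:
  enabled: false
```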
* Add header configuration in containerd hosts.toml
Signed-off-by: Alexander Gil <pando855@gmail.com>
* Disable log output on containerd mirrors settings if required
Signed-off-by: Alexander Gil <pando855@gmail.com>
---------
Signed-off-by: Alexander Gil <pando855@gmail.com>
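A hedged sketch of what a hosts.toml with a header section looks like (registry URLs and the header value are illustrative):
```toml
server = "https://registry.example.com"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
  # per-host headers sent by containerd when contacting this mirror
  [host."https://mirror.example.com".header]
    Authorization = "Basic <credentials>"
```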
* docs: remove obsolete reference to `gen_tags.sh`
`scripts/gen_tags.sh` was removed in 373b952a0c
* docs: fix 404 links
Merge the `Requirements` section with the `Usage` section and just
reference the inventory documentation, which then points to all further
information related to group vars etc.
* feat(subnet): Ensure Vagrant subnet not in use by localhost
This commit ensures that Vagrantfile supplied $subnet is not in use by
the localhost. Previously, if the subnet is in use by localhost (i.e.
bridge network), Vagrant VM boxes can not communicate.
* refactor(socket): Use ruby Socket library to find addrs
This commit reverts the usage of Ruby .scan(), which may fail if the
external program is not provided. Instead, this commit refactors to
use the Socket library to determine the interfaces in use, then proceeds to
compare them with the Vagrantfile-supplied subnets. Additionally, the commit
supports IPv6 comparisons.
This allows using kubespray_defaults (once) instead of redefining
defaults in the tests.
The test files become imported tasks rather than standalone
playbooks.
* Test: molecule replace ubuntu2004 with ubuntu2204 ubuntu2404
cri-dockerd, adduser and bastion-ssh-config can't run on ubuntu2404; maybe the login needs checking:
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Test: replace ubuntu-2004 with ubuntu-2404
All ubuntu-2004 tests are removed.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Docs: update ci.md
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Docs: update README.md
Remove Ubuntu 20.04 support
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
---------
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Not really a reason not to, and this actually breaks daily-ci, because
some jobs depend on this one, so the whole pipeline is invalid if it's
not created.
This uses the same logic as the other versions, with simplifications for
crictl and cri-o, whose versioning scheme is tied to upstream Kubernetes.
Also move some version variables into vars/ rather than defaults/, because
they are not used elsewhere and don't really make sense as modifiable by
the user.
Currently, there is no reliable way to obtain individual CRD files, so
the only solution is to update first.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Clusters installed or upgraded in the past have no validation
config. Check if the file exists first to prevent subsequent validation
errors.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
The validation step is moved to the end to avoid the loss of files that
may lead to verification failure.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
`cilium install` is equivalent to `helm install`; it will fail if a
Cilium release already exists. `cilium version` can detect an existing
release without the helm binary.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Give users two options: besides skipping Cilium, add
`cilium_remove_old_resources` (default `false`); when set to `true`,
it removes the old version's resources, but this causes downtime,
so it needs to be used with care.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
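Example usage:
```yaml
# removes the old release's resources during upgrade; causes downtime
cilium_remove_old_resources: true
```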
This patch fixes the indentation in the `encryption` section.
Previously configuration like this:
```yml
cilium_encryption_enabled: true
cilium_encryption_type: wireguard
```
Would template to a `values.yaml` file with indentation that looks like this:
```yml
encryption:
enabled: True
type: wireguard
nodeEncryption: False
```
instead of this:
```yml
encryption:
  enabled: true
  type: wireguard
  nodeEncryption: false
```
This syntax issue causes an error during Cilium installation.
This patch also makes all boolean values in this template file go through the `to_json` filter.
Since values like `True` and `False` are not compliant with the YAML v1.2 spec,
avoiding them is preferable.
`to_json` may be used for all other values in this template to ensure we end up with
a valid YAML document in all cases (even when various strings include special characters),
but this was left for another (future) patch.
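A hedged sketch of the template pattern (a fragment, not the full values.yaml template):
```yml
encryption:
  # to_json renders Python booleans as YAML-1.2-compliant true/false
  enabled: {{ cilium_encryption_enabled | to_json }}
  type: {{ cilium_encryption_type }}
```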
* Fix: check expiry before renew
Since certificate renewal and container restarts involve higher risks,
they should be executed with extra caution.
* squash to Fix: check expiry before renew
* squash to Fix: address more comments from VannTen
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
---------
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
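A minimal sketch of such a check (certificate path and renewal command are illustrative assumptions, not the exact kubespray tasks):
```yaml
# openssl -checkend returns non-zero if the cert expires within the
# given window (here 30 days)
- name: Check whether the certificate expires within 30 days
  command: openssl x509 -checkend 2592000 -noout -in /etc/kubernetes/ssl/apiserver.crt
  register: cert_check
  changed_when: false
  failed_when: false

- name: Renew certificates only when needed
  command: kubeadm certs renew all
  when: cert_check.rc != 0
```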
* Remove tag master
Following its deprecation in 4b324cb0f (Rename master to control plane
- non-breaking changes only (#11394), 2024-09-06)
* Add fail fast path when using removed tags
- Used for the master tag, but this could be used for other things in
the future
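A minimal sketch of such a fail-fast guard (playbook layout is illustrative):
```yaml
- name: Fail fast when a removed tag is requested
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Refuse removed tags
      fail:
        msg: "The 'master' tag was removed; use 'control-plane' instead."
      when: "'master' in ansible_run_tags"
      tags: always  # runs even when --tags is restricted
```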
@@ -183,9 +182,6 @@ You can choose among ten network plugins. (default: `calico`, except Vagrant use
- [cilium](http://docs.cilium.io/en/latest/): layer 3/4 networking (as well as layer 7 to protect and secure application protocols), supports dynamic insertion of BPF bytecode into the Linux kernel to implement security services, networking and visibility logic.
- [weave](docs/CNI/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
(Please refer to `weave` [troubleshooting documentation](https://www.weave.works/docs/net/latest/troubleshooting/)).
- [kube-ovn](docs/CNI/kube-ovn.md): Kube-OVN integrates the OVN-based Network Virtualization with Kubernetes. It offers an advanced Container Network Fabric for Enterprises.
- [kube-router](docs/CNI/kube-router.md): Kube-router is an L3 CNI for Kubernetes networking aiming to provide operational
To ensure compatibility with older distributions (such as Debian 11), you can use a static containerd binary. By default, the static binary is used if the system's glibc version is less than 2.34; otherwise, the regular dynamically linked binary is used.
- [Create TLS Root CA Certificate and Key](#create-tls-root-ca-certificate-and-key)
Cert-Manager is a native Kubernetes certificate management controller. It can help with issuing certificates from a variety of sources, such as Let’s Encrypt, HashiCorp Vault, Venafi, a simple signing key pair, or self-signed. It will ensure certificates are valid and up to date, and attempt to renew certificates at a configured time before expiry.
*Warning:* Mitogen support is now deprecated in kubespray due to upstream not releasing an updated version to support ansible 4.x (ansible-base 2.11.x) and above. The CI support has been stripped for mitogen and we are no longer validating any support or regressions for it. The supporting mitogen install playbook and integration documentation will be removed in a later version.
[Mitogen for Ansible](https://mitogen.networkgenomics.com/ansible_detailed.html) allows a 1.25x - 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing, it can only ensure the module executes as quickly as possible.
## Install
```ShellSession
ansible-playbook contrib/mitogen/mitogen.yml
```
The above playbook sets the ansible `strategy` and `strategy_plugins` in `ansible.cfg`, but you can also enable them if you use your own `ansible.cfg` by setting the environment variables:
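For example (the plugin path is a placeholder for wherever mitogen is installed):
```ShellSession
export ANSIBLE_STRATEGY=mitogen_linear
export ANSIBLE_STRATEGY_PLUGINS=<path_to_mitogen>/ansible_mitogen/plugins/strategy
```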
Extra vars are best used to override kubespray internal variables, for instance, roles/vars/. Those vars are usually **not expected** (by Kubespray developers) to be modified by end users, and are not part of the Kubespray interface. Thus they can change, disappear, or break stuff unexpectedly.
## Ansible tags
@@ -118,12 +115,11 @@ The following tags are defined in playbooks:
Note: use `--tags` and `--skip-tags` wisely and only if you're 100% sure what you're doing.
## Mitogen
Mitogen support is deprecated, please see [mitogen related docs](/docs/advanced/mitogen.md) for usage and reasons for deprecation.
## Troubleshooting Ansible issues
Having the wrong version of ansible, ansible collections or python dependencies can cause issues.
In particular, Kubespray ships custom modules which Ansible needs to find, for which you should specify [ANSIBLE_LIBRARY](https://docs.ansible.com/ansible/latest/dev_guide/developing_locally.html#adding-a-module-or-plugin-outside-of-a-collection)
```ShellSession
export ANSIBLE_LIBRARY=<kubespray_dir>/library
```
A simple way to ensure you get all the correct version of Ansible is to use
@@ -206,11 +193,11 @@ You will then need to use [bind mounts](https://docs.docker.com/storage/bind-mou
to access the inventory and SSH key in the container, like this:
```ShellSession
git checkout v2.29.0
docker pull quay.io/kubespray/kubespray:v2.29.0
docker run --rm -it --mount type=bind,source="$(pwd)"/inventory/sample,dst=/inventory \
Kubespray can be installed as an [Ansible collection](https://docs.ansible.com/ansible/latest/user_guide/collections_using.html).
## Usage
1. Set up an inventory with the appropriate host groups and required group vars.
See also the documentation on [kubespray inventories](./inventory.md) and the
general ["Getting started" documentation](../getting_started/getting-started.md#building-your-own-inventory).
2. Add Kubespray to your requirements.yml file
```yaml
collections:
@@ -18,20 +17,20 @@ Kubespray can be installed as an [Ansible collection](https://docs.ansible.com/a
version: master # use the appropriate tag or branch for the version you need
```
3. Install your collection
```ShellSession
ansible-galaxy install -r requirements.yml
```
4. Create a playbook to install your Kubernetes cluster
@@ -103,13 +103,13 @@ following default cluster parameters:
* *kube_service_addresses_ipv6* - Subnet for cluster IPv6 IPs (default is ``fd85:ee78:d8a6:8607::1000/116``). Must not overlap with ``kube_pods_subnet_ipv6``.
* *kube_service_subnets* - All service subnets separated by commas (default is a mix of ``kube_service_addresses`` and ``kube_service_addresses_ipv6`` depending on ``ipv4_stack`` and ``ipv6_stack`` options),
for example ``10.233.0.0/18,fd85:ee78:d8a6:8607::1000/116`` for dual stack (ipv4_stack/ipv6_stack set to `true`).
It is not recommended to change this variable directly.
* *kube_pods_subnet_ipv6* - Subnet for Pod IPv6 IPs (default is ``fd85:ee78:d8a6:8607::1:0000/112``). Must not overlap with ``kube_service_addresses_ipv6``.
* *kube_pods_subnets* - All pods subnets separated by commas (default is a mix of ``kube_pods_subnet`` and ``kube_pod_subnet_ipv6`` depending on ``ipv4_stack`` and ``ipv6_stack`` options),
for example ``10.233.64.0/18,fd85:ee78:d8a6:8607::1:0000/112`` for dual stack (ipv4_stack/ipv6_stack set to `true`).
It is not recommended to change this variable directly.
1. build: build a docker image to be used in the pipeline
2. unit-tests: fast jobs for fast feedback (linting, etc...)
3. deploy-part1: small number of jobs to test if the PR works with default settings
4. deploy-extended: slow jobs testing different platforms, OS, settings, CNI, etc...
5. deploy-extended: very slow jobs (upgrades, etc...)
See [.gitlab-ci.yml](/.gitlab-ci.yml) and the included files for an overview.
## Runners
Kubespray has 2 types of GitLab runners, both deployed on the Kubespray CI cluster (hosted on Oracle Cloud Infrastructure):
- pods: use the [gitlab-ci kubernetes executor](https://docs.gitlab.com/runner/executors/kubernetes/)
- vagrant: custom executor running in pods with access to the libvirt socket on the nodes
## Vagrant
@@ -22,18 +17,17 @@ Vagrant jobs are using the [quay.io/kubespray/vagrant](/test-infra/vagrant-docke
## CI Variables
In CI we have a [set of extra vars](/test/common_vars.yml) we use to ensure greater success of our CI jobs and avoid throttling by various APIs we depend on.
DISCLAIMER: The following information is not fully up to date; in particular, the CI cluster is now on Oracle Cloud Infrastructure, not Equinix.
The cluster is deployed with kubespray itself and maintained by the kubespray maintainers.
The following files are used for that inventory:
### cluster.tfvars (OBSOLETE: this section is no longer accurate)
```ini
# your Kubernetes cluster name here
@@ -162,22 +156,10 @@ kube_feature_gates:
- "NodeSwap=True"
```
## Additional files
This section documents additional files used to complete a deployment of the kubespray CI. These files sit on the control-plane node and assume a working kubernetes cluster.
@@ -37,4 +37,4 @@ If you have containers that are using iptables in the host network namespace (`h
you need to ensure they are using iptables-nft.
An example of how k8s does the autodetection can be found [in this PR](https://github.com/kubernetes/kubernetes/pull/82966)
The kernel version is lower than the kubernetes 1.32 system validation, please refer to the [kernel requirements](../operations/kernel-requirements.md).
# To disable kubelet's staticPodPath (for nodes that don't use static pods like worker nodes)
kubelet_static_pod_path: ""
# In case you have multiple interfaces in your
# control plane nodes and you want to specify the right
# IP addresses, kubelet_secure_addresses allows you
@@ -126,9 +126,8 @@ Let's take a deep look to the resultant **kubernetes** configuration:
* The `encryption-provider-config` provides encryption at rest. This means that the `kube-apiserver` encrypts data before it is stored, so the data is completely unreadable from `etcd` (in case an attacker manages to access it).
* The `rotateCertificates` in `KubeletConfiguration` is set to `true` along with `serverTLSBootstrap`. This can be used as an alternative to the `tlsCertFile` and `tlsPrivateKeyFile` parameters, and additionally it generates certificates automatically. By default the CSRs are approved automatically via [kubelet-csr-approver](https://github.com/postfinance/kubelet-csr-approver). You can customize approval configuration by modifying Helm values via `kubelet_csr_approver_values`.
See <https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/> for more information on the subject.
* If you are installing **kubernetes** on an AppArmor-based OS (e.g. Debian/Ubuntu) you can enable the `AppArmor` feature gate by uncommenting the lines with the comment `# AppArmor-based OS` on top.
* The `kubelet_systemd_hardening` option, together with `kubelet_secure_addresses`, sets up a minimal firewall on the system. To better understand how these variables work, here's an explanatory image:
Modified from [comments in #3471](https://github.com/kubernetes-sigs/kubespray/issues/3471#issuecomment-530036084)
## Limitation: Removal of first kube_control_plane and etcd-master
Currently you can't remove the first node in your kube_control_plane and etcd-master list. If you still want to remove this node you have to:
### 1) Change order of current control planes
Modify the order of your control plane list by pushing your first entry to any other position. E.g. if you want to remove `node-1` of the following example:
```yaml
children:
  kube_control_plane:
    hosts:
      node-1:
      node-2:
      node-3:
  kube_node:
    hosts:
      node-1:
      node-2:
      node-3:
  etcd:
    hosts:
      node-1:
      node-2:
      node-3:
```
change your inventory to:
```yaml
children:
  kube_control_plane:
    hosts:
      node-2:
      node-3:
      node-1:
  kube_node:
    hosts:
      node-2:
      node-3:
      node-1:
  etcd:
    hosts:
      node-2:
      node-3:
      node-1:
```
### 2) Upgrade the cluster
run `upgrade-cluster.yml` or `cluster.yml`. Now you are good to go on with the removal.
## Adding/replacing a worker node
This should be the easiest.
@@ -83,6 +31,8 @@ That's it.
Append the new host to the inventory and run `cluster.yml`. You can NOT use `scale.yml` for that.
**Note:** When adding new control plane nodes, always append them to the end of the `kube_control_plane` group in your inventory. Adding control plane nodes in the first position is not supported and will cause the playbook to fail.
### 2) Restart kube-system/nginx-proxy
In all hosts, restart nginx-proxy pod. This pod is a local proxy for the apiserver. Kubespray will update its static config, but it needs to be restarted in order to reload.
With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=NODE_NAME` to the playbook to limit the execution to the node being removed.
If the node you want to remove is not online, you should add `reset_nodes=false` and `allow_ungraceful_removal=true` to your extra-vars.
## Adding/Removal of first `kube_control_plane` and etcd-master
Currently you can't remove the first node in your `kube_control_plane` and etcd-master list. If you still want to remove this node you have to:
### 1) Change order of current control planes
Modify the order of your control plane list by pushing your first entry to any other position. E.g. if you want to remove `node-1` of the following example:
```yaml
all:
  hosts:
  children:
    kube_control_plane:
      hosts:
        node-1:
        node-2:
        node-3:
    kube_node:
      hosts:
        node-1:
        node-2:
        node-3:
    etcd:
      hosts:
        node-1:
        node-2:
        node-3:
```
change your inventory to:
```yaml
all:
  hosts:
  children:
    kube_control_plane:
      hosts:
        node-2:
        node-3:
        node-1:
    kube_node:
      hosts:
        node-2:
        node-3:
        node-1:
    etcd:
      hosts:
        node-2:
        node-3:
        node-1:
```
### 2) Upgrade the cluster
run `upgrade-cluster.yml` or `cluster.yml`. Now you are good to go on with the removal.
### 3) Remove old first control plane node from cluster
With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=node-1` to the playbook to limit the execution to the node being removed.
If the node you want to remove is not online, you should add `reset_nodes=false` and `allow_ungraceful_removal=true` to your extra-vars.
### 4) Edit cluster-info configmap in kube-public namespace
`kubectl edit cm -n kube-public cluster-info`
Replace the ip of the old kube_control_plane node with the ip of a live kube_control_plane node (`server` field). Also, update the `certificate-authority-data` field if you changed certs.
# The crio_runtimes variable defines a list of OCI compatible runtimes.
crio_runtimes:
  - name: crun
    path: "{{ crio_runtime_bin_dir }}/crun" # Use crun in cri-o distributions, don't use 'crun' role
    type: oci
    root: /run/crun
@@ -111,3 +114,4 @@ crio_default_capabilities:
- SETPCAP
- NET_BIND_SERVICE
- KILL
crio_additional_mounts: []