353 Commits

Author SHA1 Message Date
Max Gautier
c03c68e8c7 Do not suppress output during cert generation (#12479)
Makes debugging easier.
2025-08-28 19:43:09 -07:00
刘旭
62f49822dd fix ETCD_INITIAL_CLUSTER config in etcd.env and etcd-events.env (#12342) 2025-06-27 07:44:29 -07:00
Kubernetes Prow Robot
c7c3d2ba95 Merge pull request #12163 from VannTen/cleanup/etcd_inv_sample
Move etcd inventory sample doc to role defaults
2025-05-26 03:16:16 -07:00
Max Gautier
92e8ac9de2 Remove tag 'master' (#12228)
* Remove tag master

Following it's deprecation in 4b324cb0f (Rename master to control plane
- non-breaking changes only (#11394), 2024-09-06)

* Add fail fast path when using removed tags

- Used for the master tag, but this could be used for other things in
  the future
2025-05-22 01:20:36 -07:00
Max Gautier
9c2bdeec63 Decouple etcd defaults in a separate role
This allows us to reuse the defaults in other places without putting
everything in kubespray-defaults.

In that, for kubernetes/control-plane.
2025-05-16 14:51:29 +02:00
Max Gautier
22d3cf9c2b Move 'pretend certificates' **after** cert distribution
The link target will only exist after we distribute the certs on each node.
2025-05-15 18:35:34 +02:00
Max Gautier
d6d87e9a83 Move cilium_deploy_additionnaly to kubespray-default (#12191)
Instead of using default(false) all over the place, use
kubespray-defaults
2025-05-07 05:05:17 -07:00
Max Gautier
fcc294600c Workaround missing etcd certds on control plane node (#12181) 2025-05-05 01:05:57 -07:00
Max Gautier
9631b5fd44 Move etcd inventory sample doc to role defaults 2025-05-04 21:24:26 +02:00
ERIK
8f41a2886d Update version comparison syntax and optimize whitespace (#12146)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2025-04-24 00:56:31 -07:00
Max Gautier
f9a263090a Propagate v-less version everywhere 2025-03-05 16:18:39 +01:00
ERIK
768fbeff0b update etcd snapshot count (#11997)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2025-02-27 01:30:32 -08:00
Boris
a51e7dd07d refact ip stack (#11953) 2025-02-11 03:37:58 -08:00
Antoine Legrand
4373c1be1d Revert "Add support for ipv6 only cluster via "enable_ipv6only_stack_networks…" (#11941)
This reverts commit 76c0a3aa75.
2025-02-03 07:06:58 -08:00
Boris
76c0a3aa75 Add support for ipv6 only cluster via "enable_ipv6only_stack_networks" (#11831) 2025-01-27 04:15:22 -08:00
Max Gautier
12a2c5eaa8 verify_settings: consolidate choices validation 2025-01-23 14:32:42 +01:00
Max Gautier
0f0e24be0f etcd: throttle restart for availability (#11677)
* etcd: throttle restart for availability

During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.

Limit the concurrent restart so that the etcd cluster can keep quorum.

* Simplify etcd handlers
2024-11-05 06:11:29 +00:00
Kubernetes Prow Robot
3f027abae6 Merge pull request #11598 from VannTen/cleanup/fact_gathering
Do not serialize fact gathering for no_proxy
2024-10-31 10:59:26 +00:00
Max Gautier
b4768cfa91 Always copy cert generation scripts to first etcd (#11612)
If we don't, existing installation would not pick up fix to that script,
such as dc33a1971d.
2024-10-09 02:44:22 +01:00
Max Gautier
2826b357d4 Remove serialized collect of ansible_default_ipv4
The fallback_ips tasks are essentially serializing the gathering of one
fact on all the hosts, which can have dramatic performance implications
on large clusters (several minutes).

This is essentially a reversal of 35f248dff0
Being able to run without refreshing the cache facts is not worth it.

We keep fallback_ip for now, simply changing the access to a normal
hostvars variable instead of a custom dictionnary.
2024-10-04 14:19:20 +02:00
Max Gautier
2ec1c93897 Test group membership with group_names
Testing for group membership with group names makes Kubespray more
tolerant towards the structure of the inventory.
Where 'inventory_hostname in groups["some_group"] would fail if
"some_group" is not defined, '"some_group" in group_names' would not.
2024-09-21 14:09:09 +02:00
Bogdan Sass
4b324cb0f0 Rename master to control plane - non-breaking changes only (#11394)
K8s is moving away from the "master" terminology, so kubespray should follow the same naming conventions. See 65d886bb30/sig-architecture/naming/recommendations/001-master-control-plane.md
2024-09-06 07:56:19 +01:00
刘旭
3da6c4fc18 Allow for configuring etcd progress notify interval and default set to 5s (#11499) 2024-09-05 06:29:05 +01:00
Vlad Korolev
9a7b021eb8 Do not use ‘yes/no’ for boolean values (#11472)
Consistent boolean values in ansible playbooks
2024-08-28 06:30:56 +01:00
Lihai Tu
8208a3f04f Rename systemd module to systemd_service (#11396)
Signed-off-by: tu1h <lihai.tu@daocloud.io>
2024-07-26 01:11:39 -07:00
Tom M.
242edd14ff Fix etcd certificate to acces address as SAN (#11388) 2024-07-25 18:49:23 -07:00
Bas
8f5f75211f Improving yamllint configuration (#11389)
Signed-off-by: Bas Meijer <bas.meijer@enexis.nl>
2024-07-25 18:42:20 -07:00
Max Gautier
d50f61eae5 pre-commit: apply autofixes hooks and fix the rest manually
- markdownlint (manual fix)
- end-of-file-fixer
- requirements-txt-fixer
- trailing-whitespace
2024-05-28 13:26:44 +02:00
Ugur Can Ozturk
a512b861e0 [etcd/tracing]: fix etcd sampling rate flag (#11175)
Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2024-05-13 03:14:39 -07:00
yun
13e1f33898 Correct the POLY1305 cipher suites by adding the suffix _SHA256 (#10641) 2024-01-22 18:00:52 +01:00
Ugur Can Ozturk
ae780e6a9b [etcd]: add etcd distributed tracing flags (#10666)
* [etcd]: add etcd distributed tracing flags

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

* [etcd]: add etcd distributed tracing flags - fix

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>

---------

Signed-off-by: Ugur Ozturk <ugurozturk918@gmail.com>
2023-12-19 04:00:10 +01:00
Max Gautier
0fb404c775 etcd: use dynamic group for certs generation check (#10610)
We take advantage of group_by to create the list of nodes needing new
certs, instead of manually looping inside a Jinja template.

This should make the role more readable and less susceptible to
white space problems.
2023-12-12 11:22:29 +01:00
Max Gautier
0d4f57aa22 Validate systemd unit files (#10597)
* Validate systemd unit files

This ensure that we fail early if we have a bad systemd unit file
(syntax error, using a version not available in the local version, etc)

* Hack to check systemd version for service files validation

factory-reset.target was introduced in system 250, same version as the
aliasing feature we need for verifying systemd services with ansible.
So we only actually executes the validation if that target is present.

This is an horrible hack which should be reverted as soon as we drop
support for distributions with systemd<250.
2023-11-17 20:01:23 +01:00
Max Gautier
8ebeb88e57 Refactor "multi" handlers to use listen (#10542)
* containerd: refactor handlers to use 'listen'

* cri-dockerd: refactor handlers to use 'listen'

* cri-o: refactor handlers to use 'listen'

* docker: refactor handlers to use 'listen'

* etcd: refactor handlers to use 'listen'

* control-plane: refactor handlers to use 'listen'

* kubeadm: refactor handlers to use 'listen'

* node: refactor handlers to use 'listen'

* preinstall: refactor handlers to use 'listen'

* calico: refactor handlers to use 'listen'

* kube-router: refactor handlers to use 'listen'

* macvlan: refactor handlers to use 'listen'
2023-11-08 12:28:30 +01:00
Max Gautier
8f0e553e11 etcd/backup: native ansible modules instead of shell (#10540)
This make native ansible features (dry-run, changed state) easier to
have, and should have a minimal performance impact, since it only runs
on the etcd members.
2023-10-30 20:05:28 +01:00
Max Gautier
0b2e5b2f82 Retries ssh connection for Gather node certs (#10515)
This allows this task to work with a forks count > 10 and the default
configuration of sshd, which is to limit sessions to 10. (see
MaxSessions in sshd_config).

Since this is a delegate_to task, it connects to the same host (first
etcd) for each node in the cluster, thus easily going above 10.

Raising the ssh connection attempts allow for more robustness, without
decreasing the forks count or serialising the tasks, which could slow
the task (or the playbook as a whole, if decreasing forks).
2023-10-19 05:04:29 +02:00
Samuel Liu
e1881fae02 Install etcdutl file by default (#10385) 2023-08-23 07:04:22 -07:00
Francisco Orselli
7295d13d60 [EOS-11830] Use ETCD port 2381 for metrics (#10332) 2023-08-08 11:06:16 -07:00
Arthur Outhenin-Chalandre
36e5d742dc Resolve ansible-lint name errors (#10253)
* project: fix ansible-lint name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: ignore jinja template error in names

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: capitalize ansible name

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: update notify after name capitalization

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-26 07:36:22 -07:00
yangsenzk
13aa32278a bugfix: fix grep command without -w option causing prefix matched while adding one etcd member (#10291) 2023-07-13 21:43:29 -07:00
Arthur Outhenin-Chalandre
5d00b851ce project: fix var-spacing ansible rule (#10266)
* project: fix var-spacing ansible rule

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing on the beginning/end of jinja template

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing of default filter

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix spacing between filter arguments

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix double space at beginning/end of jinja

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix remaining jinja[spacing] ansible-lint warning

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-07-04 20:36:54 -07:00
Arthur Outhenin-Chalandre
f8f197e26b Fix outdated tag and experimental ansible-lint rules (#10254)
* project: fix outdated tag and experimental

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove no longer useful noqa 301

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: replace unnamed-task by name[missing]

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: fix daemon-reload -> daemon_reload

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-30 02:51:57 -07:00
Arthur Outhenin-Chalandre
25cb90bc2d Upgrade ansible (#10190)
* project: update all dependencies including ansible

Upgrade to ansible 7.x and ansible-core 2.14.x. There seems to be issue
with ansible 8/ansible-core 2.15 so we remain on those versions for now.
It's quite a big bump already anyway.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* tests: install aws galaxy collection

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* ansible-lint: disable various rules after ansible upgrade

Temporarily disable a bunch of linting action following ansible upgrade.
Those should be taken care of separately.

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve deprecated-module ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve no-free-form ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[meta] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[playbook] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve schema[tasks] ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-file-permissions ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve risky-shell-pipe ansible-lint error

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: remove deprecated warn args

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use fqcn for non builtin tasks

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: resolve syntax-check[missing-file] for contrib playbook

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

* project: use arithmetic inside jinja to fix ansible 6 upgrade

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>

---------

Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@proton.ch>
2023-06-26 03:15:45 -07:00
Kenichi Omichi
7afbdb3e1e Drop canal network_plugin (#10100)
According to the canal github[1] the repo is not maintained over 5 years.
In addition, the README says
```
  Originally, we thought we might more deeply integrate the two projects
  (possibly even going as far as a rebranding!). However, over time it
  became clear that that wasn't really necessary to fulfil our goal of
  making them work well together. Ultimately, we decided to focus on
  adding features to both projects rather than doing work just to
  combine them.
```
So it is difficult to support canal by Kubespray at this situation.

[1]: https://github.com/projectcalico/canal
2023-05-18 03:40:33 -07:00
Kei Kori
dc33a1971d [etcd] fix make-ssl-etcd.sh.j2; move pem files only if any new certs exist (#9974) 2023-04-12 21:52:35 -07:00
Karl Fischer
6278b12af6 fixed clinet to client 2023-02-20 10:09:03 +01:00
Bas
2c93c997cf pre-commit autocorrected files (#9750) 2023-02-06 01:35:16 -08:00
ERIK
20d99886ca Update etcd log-level parameter name (#9540)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>

Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2022-12-05 01:05:03 -08:00
Samuel Liu
dd4bc5fbfe [etcd] Sometimes, we do not need to run etcd role on all nodes. (#9173)
* WIP: sometimes,we not run etcd

* fix ansible lint

* like calico(kdd) cni, no need run etcd
2022-09-09 01:29:22 -07:00
ERIK
9ad2d24ad8 Add unsafe_show_logs switch (#9164)
Signed-off-by: bo.jiang <bo.jiang@daocloud.io>

Signed-off-by: bo.jiang <bo.jiang@daocloud.io>
2022-08-16 18:52:48 -07:00