Commit Graph

30 Commits

Author SHA1 Message Date
Max Gautier
6138c6a1a2 CI: use a dedicated disk for releases (#12692)
This should make 'no space left on device' problems easier to handle

Use /tmp/releases as local_release_dir CI created machine, while keeping
the same folder on the runner (needed for gitlab-ci runner pods)
2025-11-17 02:57:39 -08:00
Max Gautier
6115eba3c3 CI: label VirtualMachineInstance with PR id and pipeline ids (#12716)
Helps with CI debuggability
2025-11-17 02:21:39 -08:00
Max Gautier
2fbbf2e1e4 CI/kubevirt: Configure ignition provisioning
Flatcar does not support cloud-init
2025-05-27 23:29:56 +08:00
ant31
3597b8d7fe Kubevirt: use Ignition cloud config 2025-05-27 23:29:55 +08:00
Max Gautier
315313dd10 CI: convert molecule jobs to parallel:matrix
With the new provisionning using kubevirt this should be faster.
2025-03-13 10:14:48 +01:00
Max Gautier
ac4c41e4e6 CI: use OS name in VMs
Allows an easier log reading on multi-OS test runs (such as molecule
tests)
2025-03-13 10:14:47 +01:00
Max Gautier
611f645907 CI: Generate ssh key pair on the fly
There is litte reason to share an ssh key common to all CI jobs, so
generate one for each on the fly.

Also use plain-text cloud-init config instead of base64 for readability
2025-03-13 10:14:46 +01:00
Max Gautier
e62bbe0c76 CI: adapt packet-ci role to act as a molecule provisioner
To work with molecule, we need to use the name provided by molecule_yml
in inventory.

Inject the name in the VirtualMachineInstance (with a default to handle
non-molecule scenario) and get it back as part of inventory).

Account for no ansible groups
2025-03-13 10:14:45 +01:00
Max Gautier
a8d494fb95 CI/kubevirt: allow every vars in kubevirt template to be overriden
The current templating of kubevirt VirtualMachine relies on global
ansible variables, except for the group the nodes are meant to be in.

In order to have more flexibility (in particular, mixed OS cluster for
instances), expect now an abitrary  dict to be passed to the template ;
this allows to embed directly in the nodes definition any variable used
by the template.
2025-03-13 10:14:44 +01:00
Max Gautier
ff4de880ae CI: Replace kubevirt dynamic inventory with generated yaml
VirtualMachineInstance resources sometimes temporarily loose their
IP (at least as far as the kubevirt controllers can see).
See https://github.com/kubevirt/kubevirt/issues/12698 for the upstream
bug.

This does not seems to affect actual connection (if it did, our current
CI would not work).
However, our CI execute multiple playbooks, and in particular:
1. The provisioning playbook (which checks that the IPs have been
   provisioned by querying the K8S API)
2. Kubespray itself

If any of the VirtualMachineInstance looses its IP between after 1
checked for it, and before 2 starts, the dynamic inventory (which is
invoked when the playbook is launched by ansible-playbook) will not have
an ip for that host, and will try to use the name for ssh, which of
course will not work.

Instead, when we have a valid state during provisioning (all IPs
presents), use it to construct a static inventory which will be used for
the rest of the CI run.
2024-11-14 09:40:59 +01:00
Max Gautier
329ffd45f0 CI: use kubevirt.core dynamic inventory
This allows a single source of truth for the virtual machines in a
kubevirt ci-run.

`etcd_member_name` should be correctly handled in kubespray-defaults for
testing the recover cases.
2024-11-14 09:40:58 +01:00
Max Gautier
c46e5dc33a CI: use VirtualMachineInstance for VMs
VMI in Kubevirt are the abstraction below VirtualMachine.

- We don't really need the extra abstraction of VirtualMachine objects
- Convert the waiting for VMs ip address to use kubernetes.core.k8s_info
  and no shell pipeline
2024-11-13 17:32:50 +01:00
Max Gautier
65c67c5c51 CI: use Kubernetes GC to delete kubevirt vms
This leverage the Kubernetes GC to delete kubevirt VMs, by using
ownerReferences, with the CI pod running the playbook as the owner.
This concretely means that the control plane in our CI cluster will
delete the kubevirt VMs associated with a particular ci job as soon as
that pod job is deleted, which usually happens when the job terminates,
(barring errors, which will be addressed in the cluster directly)

Upgrade to kubevirt.io/v1 for the VirtualMachine manifests, since the
alpha version is deprecated.
2024-10-18 12:14:52 +02:00
Max Gautier
7580e59bbf Define k8s_cluster dynamically
This allows inventories to not define the k8s_cluster group manually.
2024-09-21 14:35:35 +02:00
Max Gautier
76c42b4d3f CI: cleanup '-scale' tests infra (#11535)
There is actually no test using this since ad6fecefa8,
so there is no reason to keep that infra in our tests scripts.
2024-09-18 13:04:50 +01:00
Antoine Legrand
a0587e0b8e CI: rework pipeline: short/extended based on labels (#11324)
* CI: reduce VM resources requests to improve scheduling

* CI: Reduce default jobs; add labels(ci-full/extended) to run more test

* CI: use jobs dependencies instead of stages

* precommit one-job

* CI: Use Kubevirt VM to run Molecule and Vagrant jobs
2024-07-01 03:25:36 -07:00
Max Gautier
a9e29a9eb2 Fix etcd client generation (#10769)
* ci: redefine multinode to node-etcd-client

This should allow to catch several class of problem rather than just
one -> from network plugin such as calico or cilium talking directly to
the etcd.

* Dynamically define etcd host range

This has two benefits:
- We don't play the etcd role twice for no reason
- We have access to the whole cluster (if needed) to use things like
  group_by.
2024-01-16 15:50:41 +01:00
Max Gautier
243ca5d08f Add test case for calico using etcd datastore (#10722)
* Add multinode ci layout

* Add test case for calico using etcd datastore
2023-12-20 09:59:02 +01:00
Max Gautier
7395c27932 CI: Document the 'all-in-one' layout + small refactoring (#10725)
* Rename aio to all-in-one and document it

ADTM.
Acronyms don't tell much.

* Refactor vm_count in tests provisioning
2023-12-18 11:33:13 +01:00
Kenichi Omichi
cd7381d8de Drop Ansible support for v2.9 and v2.10 (#8925)
Ansible v2.9 and v2.10 are EOL as [1].
This drops those version supports by following the upstream Ansible.

This sets use_ssh_args true always because that is required to use
ssh_args on ansible.cfg on Ansible v2.11 or later[2].

ansible_ssh_host is replaced with ansible_host because ansible_ssh_host
has been deprecated already and cenots7 jobs were failed due to the
deprecated ansible_ssh_host.

[1]: https://docs.ansible.com/ansible/devel/reference_appendices/release_and_maintenance.html#ansible-core-changelogs
[2]: https://docs.ansible.com/ansible/latest/collections/ansible/posix/synchronize_module.html#parameter-use_ssh_args
2022-06-09 07:07:42 -07:00
Florian Ruynat
77a74adedd Bump centos8 CI job memory to 3go and remove mitogen for fedora CI (#7921) 2021-08-30 08:25:13 -07:00
Kenichi Omichi
8d7327c188 Remove old groups from test inventory (#7656)
We have released v2.16 of Kubespray already, so we can remove those
old groups from the test inventory as the TODO says.
2021-06-09 02:45:48 -07:00
Cristian Calin
360aff4a57 Rename ansible groups to use _ instead of - (#7552)
* rename ansible groups to use _ instead of -

k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating

Note: kube-node,k8s-cluster groups in upgrade CI
      need clean-up after v2.16 is tagged

* ensure old groups are mapped to the new ones
2021-04-29 05:20:50 -07:00
Florian Ruynat
1c7053c9d8 Fix CI template for etcd recover jobs (kube-master rename) (#7441) 2021-04-05 13:41:19 -07:00
Etienne Champetier
f0cdf71ccb Remove vault (#7400)
* Remove contrib/vault

This is marked as broken since 2018 / 3dcb914607
This still reference apiserver.pem, not used since ddffdb63bf

Signed-off-by: Etienne Champetier <e.champetier@ateme.com>

* Finish nuking vault from the codebase

Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
2021-03-24 09:26:08 -07:00
Kenichi Omichi
486b223e01 Replace kube-master with kube_control_plane (#7256)
This replaces kube-master with kube_control_plane because of [1]:

  The Kubernetes project is moving away from wording that is
  considered offensive. A new working group WG Naming was created
  to track this work, and the word "master" was declared as offensive.
  A proposal was formalized for replacing the word "master" with
  "control plane". This means it should be removed from source code,
  documentation, and user-facing configuration from Kubernetes and
  its sub-projects.

NOTE: The reason why this changes it to kube_control_plane not
      kube-control-plane is for valid group names on ansible.

[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
2021-03-23 17:26:05 -07:00
qvicksilver
ac2135e450 Fix recover-control-plane to work with etcd 3.3.x and add CI (#5500)
* Fix recover-control-plane to work with etcd 3.3.x and add CI

* Set default values for testcase

* Add actual test jobs

* Attempt to satisty gitlab ci linter

* Fix ansible targets

* Set etcd_member_name as stated in the docs...

* Recovering from 0 masters is not supported yet

* Add other master to broken_kube-master group as well

* Increase number of retries to see if etcd needs more time to heal

* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
2020-02-11 01:38:01 -08:00
Matthew Mosesohn
023108a733 Refactor calico route reflector to run in k8s cluster (#4975)
* Refactor calico-rr to run in k8s cluster with taint

Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa

* add preinstall checks

* rework calico/rr role

Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8

* add empty calico-rr group

Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
2019-08-08 07:37:22 -07:00
Maxime Guyot
88fe3403ce Add overcommitment for CPU in Packet CI playbook (#4597) 2019-04-21 02:27:44 -07:00
Andreas Krüger
b834a28891 PHASE 1 - Add Packet-CI playbook and configuration (#4537) 2019-04-16 14:49:07 -07:00