Most variables should have a default instead of relying on the default
filter.
(Note that the variable is misnomed, this should be certs and not keys,
but it's not worth breaking compat).
When loadbalancer_apiserver_localhost is enabled, Calico falls back to the
Kubernetes service IP because the kubernetes-services-endpoint ConfigMap is
empty. CNI then fails to reach the API server even though an nginx proxy is
listening on localhost.
Update kube_apiserver_global_endpoint to always reference the localhost load
balancer (respecting the configured port) and populate the ConfigMap for both
eBPF and localhost LB modes.
* control-plane: fix first_kube_control_plane delegation with kube_override_hostname
When kube_override_hostname is configured, the node names reported by
`kubectl get nodes` differ from the inventory_hostname known to Ansible.
This causes delegation failures in subsequent tasks since Ansible cannot
resolve the hostname from kubectl output to an inventory host.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
* control-plane: remove fragile first_control_plane selection logic
Current implementation breaks with kube_override_hostname and has
multiple edge cases. Drop until proper kubectl-based node lookup
can be implemented.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
---------
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
This should make 'no space left on device' problems easier to handle
Use /tmp/releases as local_release_dir CI created machine, while keeping
the same folder on the runner (needed for gitlab-ci runner pods)
* CI: Try a full ssh connection on hosts instead of only checking the port
If we only try the port, we can try to connect in the playbook which is
executed next even though the managed node has not yet completed it's
boot-up sequence ("System is booting up. Unprivileged users are not
permitted to log in yet. Please come back later. For technical details,
see pam_nologin(8).")
This does not account for python-less hosts, but we don't use those in
CI anyway (for now, at least).
* CI: Remove connection method override when creating VMs
This prevented wait_for_connection to work correctly by hijacking the
connection to localhost, thus bypassing the connection check.
Add missing RBAC permissions for Calico apiserver to function correctly
with Kubernetes 1.33+
Changes:
1. Add K8s 1.33 ValidatingAdmissionPolicy resources to calico-webhook-reader
- validatingadmissionpolicies
- validatingadmissionpolicybindings
Kubernetes 1.33 introduced ValidatingAdmissionPolicy resources (KEP-3488)
that require explicit RBAC permissions. Without these changes, Calico
apiserver on k8s 1.33+ will not work and needless errors are logged
* Remove etcd member by peerURLs
The way to obtain the IP of a particular member is convoluted and depend
on multiple variables. The match is also textual and it's not clear
against what we're matching
It's also broken for etcd member which are not also Kubernetes nodes,
because the "Lookup node IP in kubernetes" task will fail and abort the
play.
Instead, match against 'peerURLs', which does not need new variable, and
use json output.
* Add testcase for etcd removal on external etcd
* do not merge
* fixup! Remove etcd member by peerURLs
* fixup! Remove etcd member by peerURLs
The 'old' playbook and the collection use '-' and '_' as separator,
which breaks the logic in scripts/testcases_run.sh.
Add aliases using the old schemes to make the test work and avoid
breaking anything.
Both '-' and '_' variants will be deleted once we switch to supporting
collection only.
fixed kubelet condition
CRI-O: fix for handling of container runtime switching
refactored kubelet start condition
stop/start kubelet and crio only when default runtime is changed
fixed condition for runtime_matches fact variable
fixed set facts for existing container runtime
added crio runtime switch variable
changed condition to use runtime switch variable
added comment for not-found for readers
Allow setting deployment replicas through `coredns_replicas` when
`enable_dns_autoscaler` is set to false.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
The option ara_default was still present in ansible.cfg under callbacks_enabled.
This is a leftover from commit b9e9364 ("Remove ara support in CI") and should
have been removed together with the rest of the ara integration.
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Enable reserved variable name checks and fix violations
Updated .ansible-lint configuration to skip only var-naming[pattern]
and var-naming[no-role-prefix] instead of skipping the entire var-naming rule.
This enables the check for reserved variable names.
Renamed variables that used reserved names to avoid conflicts.
Updated all references in tasks, variables, and templates.
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Rename namespace variable inside tasks instead of deleting it
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Change hosts variable to vm_hosts
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
* Use k8s_namespace instead of dashboard_namespace in dashboard.yml.j2 template
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
---------
Signed-off-by: Ali Afsharzadeh <afsharzadeh8@gmail.com>
Fixes a bug where `kube-apiserver` fails to start if the PodSecurity
configuration file doesn't have the `apiVersion` and `kind` keys.
Signed-off-by: Alejandro Macedo <alex.macedopereira@gmail.com>
Retrying to load conntrack modules was bound to fail due to the way, the current `when` conditions were utilized.
It was based on the assumption, that in case of success, the registered variable would have an `rc` attribute with the value `0`.
Unfortunately, the `rc` attribute is only present in case of a failure, where it's value is >1.
The result of `community.general.modprobe` in case of success looks like this:
```
{
"changed": false,
"msg": "All items completed",
"results": [
{
"ansible_loop_var": "item",
"changed": false,
"failed": false,
"invocation": {
"module_args": {
"name": "nf_conntrack",
"params": "",
"persistent": "present",
"state": "present"
}
},
"item": "nf_conntrack",
"name": "nf_conntrack",
"params": "",
"state": "present"
}
],
"skipped": false
}
```
While it looks like this in case of a failure:
```
{
"changed": false,
"failed": true,
"msg": "One or more items failed",
"results": [
{
"ansible_loop_var": "item",
"attempts": 3,
"changed": false,
"failed": true,
"invocation": {
"module_args": {
"name": "nf_conntrack_doesnotexist",
"params": "",
"persistent": "present",
"state": "present"
}
},
"item": "nf_conntrack_doesnotexist",
"msg": "modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64\n",
"name": "nf_conntrack_doesnotexist",
"params": "",
"rc": 1,
"state": "present",
"stderr": "modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64\n",
"stderr_lines": [
"modprobe: FATAL: Module nf_conntrack_doesnotexist not found in directory /lib/modules/5.14.0-570.32.1.el9_6.x86_64"
],
"stdout": "",
"stdout_lines": []
}
],
"skipped": false
}
```
By evaluating `failed` instead, this issue can be prevented.
See also:
- https://github.com/kubernetes-sigs/kubespray/issues/11340
Co-authored-by: Max Gautier <mg@max.gautier.name>
The Prometheus Operator CRDs are commonly used for monitoring and are
used by some CNIs (such as Cilium). Kubespray can be installed first,
and the subsequent installation of the operator can be handled by the
user (or later extensions).
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Add variable to set kubelet staticPodPath location.
It can be set to empty so that we can choose to disable it for some nodes.
STIG recommendation is to disable it.
Signed-off-by: Shaleen Bathla <shaleenbathla@gmail.com>
Co-authored-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Debian Trixie recently removed the package `software-properties-common`,
add the condition not on Debian Trixie.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Remove --auth-anonymous if kube_api_anonymous_auth in undefined, to avoid
compatibility errors with other arguments of the kube-apiserver, such as
--authentication-config when anonymous field is configured.
* Add header configuration in containerd hosts.toml
Signed-off-by: Alexander Gil <pando855@gmail.com>
* Disable log output on containerd mirrors settings if required
Signed-off-by: Alexander Gil <pando855@gmail.com>
---------
Signed-off-by: Alexander Gil <pando855@gmail.com>
* docs: remove obsolete reference to `gen_tags.sh`
`scripts/gen_tags.sh` was removed in 373b952a0c
* docs: fix 404 links
Merge the `Requirements` section with the `Usage` section and just
reference the inventory documentation, which then points to all further
information related to group vars etc.
* feat(subnet): Ensure Vagrant subnet not in use by localhost
This commit ensures that Vagrantfile supplied $subnet is not in use by
the localhost. Previously, if the subnet is in use by localhost (i.e.
bridge network), Vagrant VM boxes can not communicate.
* refactor(socket): Use ruby Socket library to find addrs
This commit reverts the usage of Ruby .scan() which may result in
failure if program is not provided. Instead, this commit refactors to
use Socket library to determine interfaces in use, then proceeds to
compare with Vagrantfile supplied subnets. Additionally, the commit
supports IPv6 comparisons.
This allows to use kubespray_defaults (once) instead of redefining
defaults in the tests.
Test test files becomes imported tasks rather thand standalone
playbooks.
* Test: molecule replace ubuntu2004 with ubuntu2204 ubuntu2404
cri-dockerd, adduser and bastion-ssh-config can't run ubuntu2404, maybe needs to check login.
"System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
Signed-off-by: ChengHao Yang
<17496418+tico88612@users.noreply.github.com>
* Test: replace ubuntu-2004 with ubuntu-2404
All ubuntu-2004 tests are removed.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Docs: update ci.md
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* Docs: update README.md
Remove Ubuntu 20.04 support
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
---------
Signed-off-by: ChengHao Yang
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Not really a reason not to, and this actually breaks daily-ci because
some jobs depends on this one so the whole pipeline is invalid if it's
not created.
This uses the same logic than the other versions, with simplications for
crictl and crio whose versionning scheme is tied to upstream kubernetes.
Also move some version variables in vars/ rather than defaults/, because
they are not used elsewhere and don't really make sense as modifiable by
the user.
Currently, there is no reliable way to obtain individual CRD files, so
the only solution is to update first.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
When installing or upgrading in the past, there was no validation
config. Check if the file exists first to prevent subsequent validation
errors.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
The validation step is moved to the end to avoid the loss of files that
may lead to verification failure.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
`cilium install` is equivalent to `helm install`, it will failed if
cilium relase exist. `cilium version` can know the release exist without
helm binary
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
Give users two options: besides skip Cilium, add
`cilium_remove_old_resources`, default is `false`, when set to `true`,
it will remove the content of the old version, but it will cause the
downtime, need to be careful to use.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
This patch fixes the indentation in the `encryption` section.
Previously configuration like this:
```yml
cilium_encryption_enabled: true
cilium_encryption_type: wireguard
```
Would template to a `values.yaml` file with indentation that looks like this:
```yml
encryption:
enabled: True
type: wireguard
nodeEncryption: False
```
instead of this:
```yml
encryption:
enabled: true
type: wireguard
nodeEncryption: false
```
This syntax issue causes an error during Cilium installation.
This patch also makes all boolean values in this template file go through the `to_json` filter.
Since values like `True` and `False` are not compliant with the YAML v1.2 spec,
avoiding them is preferable.
`to_json` may be used for all other values in this template to ensure we end up with
a valid YAML document in all cases (even when various strings include special characters),
but this was left for another (future) patch.
* Fix: check expiraty before renew
Since certificate renewal and container restarts involve higher risks,
they should be executed with extra caution.
* squash to Fix: check expiraty before renew
* squash to Fix: address more comments from VannTen
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
---------
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
* Remove tag master
Following it's deprecation in 4b324cb0f (Rename master to control plane
- non-breaking changes only (#11394), 2024-09-06)
* Add fail fast path when using removed tags
- Used for the master tag, but this could be used for other things in
the future
The checksums are not a defaults and are not meant to be changed from
the inventories.
Furthermore, role defaults have a lower priority that hosts facts, which
technically means a rogue hosts could hijack the hashes for its
variables.
* feat(cilium): add configurable Hubble export log rotation parameters
- Adds support for `cilium_hubble_export_file_max_backups` and `cilium_hubble_export_file_max_size_mb`
- Applies values only if `cilium_hubble_export_file_path` is defined
- Default values are set in role defaults
- Cleans up template logic by removing unnecessary conditionals
* Fix indentation for hubble export settings
* Fix undefined variable issue with ipwrap in kubeconfig override that caused pre-commit errors
* Update main.yml
rollback
This is now handled directly at the failfast-ci level (== integration
Github <-> Gitlab).
The whole pipeline will not be triggered unless:
- The author is a maintainer
- The PR has the /ok-to-test label
dnsautoscaler should only be enabled when enable_dns_autoscaler is
set to true. without this, it could be enabled without any manifest
actually using it, which makes it a false signal.
Signed-off-by: Seena Fallah <seenafallah@gmail.com>
The switch to not use system packages for containerd packages happened
multiples releases ago ; there should not be any up-to-date installation
of kubespray needing that cleanup.
Remove those steps and variables only used by them.
* Delete unused scripts
- gen_tags.sh: not the right file, produce garbage even if path is fixed
- premoderator.sh: not used since ef6d24a49 (CI require a 'lgtm' or
'ok-to-test' labels to pass (#11251), 2024-05-31)
- gitlab-branch-cleanup: unused AFAICT
* CI: inline molecule logs
Single use site -> less indirection makes it easier to read.
- This enables ithe override of the tolerations for the cilium-operator deployment
- default behaviour is to leave the toleration as is unless the var is set
With the current github-workflow setup, workflows are triggered on every
forked repository (which is quite wasteful).
Add a condition to only run on the main repository.
kubespray-defaults currently does two things:
- records a number of default variable values (in particular values used
in several places)
- gather and compose some complex network facts (in particular,
`fallback_ip` and `no_proxy`
There is no actual reason to couple those two things, and it makes using
defaults more difficult (because computing the network facts is somewhat
expensive, we don't want to do it willy-nilly)
Split the two and adjust import paths as needed.
The Gateway API needs to be installed first if you want to use Cilium's
Gateway API functionality. The Gateway API is just CRD without any Pod,
Deployment, etc., so I think it can be brought forward to before the CNI
installation.
Signed-off-by: ChengHao Yang
The recommended usage of kubespray is to use the default versions.
So putting them in inventory/sample is not really very helpful, and
causes:
- churn (keeping the inventory/sample up to date)
- support issues (mismatch between defaults and sample inventory)
Remove all concrete versions from the inventory sample.
bootstrap-os does not do anything in sudoers since e2ad6aad5 (bootstrap:
rework role (#4045), 2019-02-11).
So SSH pipelining working is effectively a pre-requisite anyway.
The preinstall assert cover a number of things, many of which depends
only on the inventory, and can be run without any ansible_facts
collected.
Split them off to simplify re-ordering.
* docs: Fix offline-environment.md to add 'v' prefix of some versions
Now some version variables (kube_version, etcd_version, etc) don't have 'v' prefix,
so you need to add 'v' prefix to download URLs.
* fix: Fix offline.yml to add 'v' prefix of some versions
* [Issue-12117]-Certificates for the new hosts are not generated during scale.yml
* [Issue-12117]-Certificates for the new hosts are not generated during scale.yml
* [Issue-12117]-Certificates for the new hosts are not generated during scale.yml
This commit fixed the process to ensure that CCM is installed first to
avoid the chicken-and-egg problem.
Signed-off-by: ChengHao Yang <17496418+tico88612@users.noreply.github.com>
* fix(containerd): always render NRI plugin block with conditional disable flag
* feat: enable Node Resource Interface plugin when using containerd
* fix: remove the
* fix: fix for linter
subscription-manager status can in some circumstances just never
terminates, with nothing indicating the problem from the Ansible
playbook log.
This makes it difficult to find the hosts misbehaving.
Add a timeout to the subscription checks (defaulting to 3 minutes). This
should be more than enough for normal circumstances while allowing
easier troubleshooting, as the hosts will be FAILED instead of the
playbook just waiting indefinitely.
* terraform upcloud: Added possibility to set up nodes with only private IPs
* terraform upcloud: add support for gateway in private zone
* terraform upcloud: split LB proxy protocol config per backend
* terraform upcloud: fix flexible plans
* terraform upcloud: Removed overview of cluster setup
---------
Co-authored-by: davidumea <david.andersson@elastisys.com>
This is more in-line with dependabot and similar auto-updaters.
Reduce ci coverage on github action updating (it does not change
kubespray code, no need for testing).
* Remove heketi
Heketi is no longer developed or supported and should not be used
anymore.
Remove the contrib playbook.
* Remove contrib glusterfs
Glusterfs integration with glusterfs is now either deprecated or
unsupported.
Other storage solutions should be preferred.
This commit enhances the node removal playbook's reliability and safety by implementing the following changes:
1. **Node Validation**: Added a validation step using assert to ensure the `node` variable is defined and contains nodes. If the list is empty or undefined, the playbook fails early, preventing accidental operations on the entire cluster.
2. **Removed Defaulting for Hosts**: Updated tasks to enforce explicit `node` variable input without defaulting to critical groups (e.g., `etcd:k8s_cluster:calico_rr`). By validating `node` beforehand, tasks now solely rely on user-provided input and safely avoid unintended targeting.
3. **Explicit User Confirmation**: Enhanced the confirmation prompt to clarify the scope of the operation. The admin is now required to explicitly confirm node state deletion, ensuring a deliberate decision before proceeding.
These improvements strengthen the reliability and safety of the `remove-node.yml` playbook by eliminating ambiguous behavior, preventing misconfigurations, and ensuring clear interaction during node removal tasks.
Vagrant jobs needs a big cache which makes them slow / sometimes stuck
completely. Using the kubevirt provisionning playbook is now
significantly faster, so do just that.
Having only one provisionner in CI will also allows us to remove some of
the custom runners executors we use for vagrant, and more generally
reduce the CI maintenance.
Our kubevirt CI platform does not support ivp6 yet, so we keep the
relevant jobs in vagrant, but we'll migrate them as well as soon as
possible.
- Take advantage of `parallel:matrix` to make the jobs definition shorter
and more readable.
- Remove helper scripts which are no longer needed
- Remove redundant indirection in the gitlab-ci pipelines definitions
(only one user)
This commit upgrades ingress-nginx to version v1.12.1, addressing multiple critical vulnerabilities including CVE-2025-1974, CVE-2025-1097, CVE-2025-1098, CVE-2025-24513, and CVE-2025-24514 as detailed in the ingress-nginx release notes: https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.12.1
Important Notes:
- Fixing CVE-2025-1974 required disabling validation of the generated NGINX configuration during validation of Ingress resources. Invalid Ingress resources may stop the NGINX configuration from being updated.
- Recommended mitigations include enabling annotation validation and disabling snippet annotations.
Alongside this upgrade, the `ingress_nginx_kube_webhook_certgen_image_tag` has been updated to v1.5.2 for compatibility, based on: https://github.com/kubernetes/ingress-nginx/pull/13066
Changelog:
- Updated ingress-nginx version to v1.12.1 in Kubespray.
- Updated `ingress_nginx_kube_webhook_certgen_image_tag` in `roles/kubespray-defaults/defaults/main/download.yml` to v1.5.2.
Fixes: https://github.com/kubernetes-sigs/kubespray/issues/12073
@@ -185,9 +182,6 @@ You can choose among ten network plugins. (default: `calico`, except Vagrant use
- [cilium](http://docs.cilium.io/en/latest/): layer 3/4 networking (as well as layer 7 to protect and secure application protocols), supports dynamic insertion of BPF bytecode into the Linux kernel to implement security services, networking and visibility logic.
- [weave](docs/CNI/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
(Please refer to `weave` [troubleshooting documentation](https://www.weave.works/docs/net/latest/troubleshooting/)).
- [kube-ovn](docs/CNI/kube-ovn.md): Kube-OVN integrates the OVN-based Network Virtualization with Kubernetes. It offers an advanced Container Network Fabric for Enterprises.
- [kube-router](docs/CNI/kube-router.md): Kube-router is a L3 CNI for Kubernetes networking aiming to provide operational
# Deploying a Kubespray Kubernetes Cluster with GlusterFS
You can either deploy using Ansible on its own by supplying your own inventory file or by using Terraform to create the VMs and then providing a dynamic inventory to Ansible. The following two sections are self-contained, you don't need to go through one to use the other. So, if you want to provision with Terraform, you can skip the **Using an Ansible inventory** section, and if you want to provision with a pre-built ansible inventory, you can neglect the **Using Terraform and Ansible** section.
## Using an Ansible inventory
In the same directory of this ReadMe file you should find a file named `inventory.example` which contains an example setup. Please note that, additionally to the Kubernetes nodes/masters, we define a set of machines for GlusterFS and we add them to the group `[gfs-cluster]`, which in turn is added to the larger `[network-storage]` group as a child group.
Change that file to reflect your local setup (adding more machines or removing them and setting the adequate ip numbers), and save it to `inventory/sample/k8s_gfs_inventory`. Make sure that the settings on `inventory/sample/group_vars/all.yml` make sense with your deployment. Then execute change to the kubespray root folder, and execute (supposing that the machines are all using ubuntu):
If your machines are not using Ubuntu, you need to change the `--user=ubuntu` to the correct user. Alternatively, if your Kubernetes machines are using one OS and your GlusterFS a different one, you can instead specify the `ansible_ssh_user=<correct-user>` variable in the inventory file that you just created, for each machine/VM:
First step is to fill in a `my-kubespray-gluster-cluster.tfvars` file with the specification desired for your cluster. An example with all required variables would look like:
As explained in the general terraform/openstack guide, you need to source your OpenStack credentials file, add your ssh-key to the ssh-agent and setup environment variables for terraform:
```shell
$ source ~/.stackrc
$ eval$(ssh-agent -s)
$ ssh-add ~/.ssh/my-desired-key
$ echo Setting up Terraform creds &&\
exportTF_VAR_username=${OS_USERNAME}&&\
exportTF_VAR_password=${OS_PASSWORD}&&\
exportTF_VAR_tenant=${OS_TENANT_NAME}&&\
exportTF_VAR_auth_url=${OS_AUTH_URL}
```
Then, standing on the kubespray directory (root base of the Git checkout), issue the following terraform command to create the VMs for the cluster:
This will create both your Kubernetes and Gluster VMs. Make sure that the ansible file `contrib/terraform/openstack/group_vars/all.yml` includes any ansible variable that you want to setup (like, for instance, the type of machine for bootstrapping).
Then, provision your Kubernetes (kubespray) cluster with the following ansible call:
For GlusterFS to connect between servers, TCP ports `24007`, `24008`, and `24009`/`49152`+ (that port, plus an additional incremented port for each additional server in the cluster; the latter if GlusterFS is version 3.4+), and TCP/UDP port `111` must be open. You can open these using whatever firewall you wish (this can easily be configured using the `geerlingguy.firewall` role).
This role performs basic installation and setup of Gluster, but it does not configure or mount bricks (volumes), since that step is easier to do in a series of plays in your own playbook. Ansible 1.9+ includes the [`gluster_volume`](https://docs.ansible.com/ansible/latest/collections/gluster/gluster/gluster_volume_module.html) module to ease the management of Gluster volumes.
## Role Variables
Available variables are listed below, along with default values (see `defaults/main.yml`):
```yaml
glusterfs_default_release:""
```
You can specify a `default_release` for apt on Debian/Ubuntu by overriding this variable. This is helpful if you need a different package or version for the main GlusterFS packages (e.g. GlusterFS 3.5.x instead of 3.2.x with the `wheezy-backports` default release on Debian Wheezy).
```yaml
glusterfs_ppa_use:true
glusterfs_ppa_version:"3.5"
```
For Ubuntu, specify whether to use the official Gluster PPA, and which version of the PPA to use. See Gluster's [Getting Started Guide](https://docs.gluster.org/en/latest/Quick-Start-Guide/Quickstart/) for more info.
## Dependencies
None.
## Example Playbook
```yaml
- hosts:server
roles:
- geerlingguy.glusterfs
```
For a real-world use example, read through [Simple GlusterFS Setup with Ansible](http://www.jeffgeerling.com/blog/simple-glusterfs-setup-ansible), a blog post by this role's author, which is included in Chapter 8 of [Ansible for DevOps](https://www.ansiblefordevops.com/).
## License
MIT / BSD
## Author Information
This role was created in 2015 by [Jeff Geerling](http://www.jeffgeerling.com/), author of [Ansible for DevOps](https://www.ansiblefordevops.com/).
- name:Ensure GlusterFS is started and enabled at boot.
service:
name:"{{ glusterfs_daemon }}"
state:started
enabled:true
- name:Ensure Gluster brick and mount directories exist.
file:
path:"{{ item }}"
state:directory
mode:"0775"
with_items:
- "{{ gluster_brick_dir }}"
- "{{ gluster_mount_dir }}"
- name:Configure Gluster volume with replicas
gluster.gluster.gluster_volume:
state:present
name:"{{ gluster_brick_name }}"
brick:"{{ gluster_brick_dir }}"
replicas:"{{ groups['gfs-cluster'] | length }}"
cluster:"{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip'] | default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
host:"{{ inventory_hostname }}"
force:true
run_once:true
when:groups['gfs-cluster'] | length > 1
- name:Configure Gluster volume without replicas
gluster.gluster.gluster_volume:
state:present
name:"{{ gluster_brick_name }}"
brick:"{{ gluster_brick_dir }}"
cluster:"{% for item in groups['gfs-cluster'] -%}{{ hostvars[item]['ip'] | default(hostvars[item].ansible_default_ipv4['address']) }}{% if not loop.last %},{% endif %}{%- endfor %}"
host:"{{ inventory_hostname }}"
force:true
run_once:true
when:groups['gfs-cluster'] | length <= 1
- name:Mount glusterfs to retrieve disk size
ansible.posix.mount:
name:"{{ gluster_mount_dir }}"
src:"{{ ip | default(ansible_default_ipv4['address']) }}:/gluster"
fstype:glusterfs
opts:"defaults,_netdev"
state:mounted
when:groups['gfs-cluster'] is defined and inventory_hostname == groups['gfs-cluster'][0]
- name:Get Gluster disk size
setup:
filter:ansible_mounts
register:mounts_data
when:groups['gfs-cluster'] is defined and inventory_hostname == groups['gfs-cluster'][0]
- name:Set Gluster disk size to variable
set_fact:
gluster_disk_size_gb:"{{ (mounts_data.ansible_facts.ansible_mounts | selectattr('mount', 'equalto', gluster_mount_dir) | map(attribute='size_total') | first | int / (1024 * 1024 * 1024)) | int }}"
when:groups['gfs-cluster'] is defined and inventory_hostname == groups['gfs-cluster'][0]
- name:Create file on GlusterFS
template:
dest:"{{ gluster_mount_dir }}/.test-file.txt"
src:test-file.txt
mode:"0644"
when:groups['gfs-cluster'] is defined and inventory_hostname == groups['gfs-cluster'][0]
- name:Unmount glusterfs
ansible.posix.mount:
name:"{{ gluster_mount_dir }}"
fstype:glusterfs
src:"{{ ip | default(ansible_default_ipv4['address']) }}:/gluster"
state:unmounted
when:groups['gfs-cluster'] is defined and inventory_hostname == groups['gfs-cluster'][0]
when:inventory_hostname == groups['kube_control_plane'][0] and groups['gfs-cluster'] is defined and hostvars[groups['gfs-cluster'][0]].gluster_disk_size_gb is defined
- name:Kubernetes Apps | Set GlusterFS endpoint and PV
# Deploy Heketi/Glusterfs into Kubespray/Kubernetes
This playbook aims to automate [this](https://github.com/heketi/heketi/blob/master/docs/admin/install-kubernetes.md) tutorial. It deploys heketi/glusterfs into kubernetes and sets up a storageclass.
## Important notice
> Due to resource limits on the current project maintainers and general lack of contributions we are considering placing Heketi into a [near-maintenance mode](https://github.com/heketi/heketi#important-notice)
## Client Setup
Heketi provides a CLI that provides users with a means to administer the deployment and configuration of GlusterFS in Kubernetes. [Download and install the heketi-cli](https://github.com/heketi/heketi/releases) on your client machine.
## Install
Copy the inventory.yml.sample over to inventory/sample/k8s_heketi_inventory.yml and change it according to your setup.
An SSH keypair is required so Ansible can access the newly provisioned nodes (Equinix Metal hosts). By default, the public SSH key defined in cluster.tfvars will be installed in authorized_key on the newly provisioned nodes (~/.ssh/id_rsa.pub). Terraform will upload this public key and then it will be distributed out to all the nodes. If you have already set this public key in Equinix Metal (i.e. via the portal), then set the public keyfile name in cluster.tfvars to blank to prevent the duplicate key from being uploaded which will cause an error.
If you don't already have a keypair generated (~/.ssh/id_rsa and ~/.ssh/id_rsa.pub), then a new keypair can be generated with the command:
```ShellSession
ssh-keygen -f ~/.ssh/id_rsa
```
## Terraform
Terraform will be used to provision all of the Equinix Metal resources with base software as appropriate.
### Configuration
#### Inventory files
Create an inventory directory for your cluster by copying the existing sample and linking the `hosts` script (used to build the inventory based on Terraform state):
The Equinix Metal Project ID associated with the key will be set later in `cluster.tfvars`.
For more information about the API, please see [Equinix Metal API](https://metal.equinix.com/developers/api/).
For more information about terraform provider authentication, please see [the equinix provider documentation](https://registry.terraform.io/providers/equinix/equinix/latest/docs).
Example:
```ShellSession
export METAL_AUTH_TOKEN="Example-API-Token"
```
Note that to deploy several clusters within the same project you need to use [terraform workspace](https://www.terraform.io/docs/state/workspaces.html#using-workspaces).
#### Cluster variables
The construction of the cluster is driven by values found in
[variables.tf](variables.tf).
For your cluster, edit `inventory/$CLUSTER/cluster.tfvars`.
The `cluster_name` is used to set a tag on each server deployed as part of this cluster.
This helps when identifying which hosts are associated with each cluster.
While the defaults in variables.tf will successfully deploy a cluster, it is recommended to set the following values:
- cluster_name = the name of the inventory directory created above as $CLUSTER
- equinix_metal_project_id = the Equinix Metal Project ID associated with the Equinix Metal API token above
#### Enable localhost access
Kubespray will pull down a Kubernetes configuration file to access this cluster by enabling the
`kubeconfig_localhost: true` in the Kubespray configuration.
Edit `inventory/$CLUSTER/group_vars/k8s_cluster/k8s_cluster.yml` and comment back in the following line and change from `false` to `true`:
`\# kubeconfig_localhost: false`
becomes:
`kubeconfig_localhost: true`
Once the Kubespray playbooks are run, a Kubernetes configuration file will be written to the local host at `inventory/$CLUSTER/artifacts/admin.conf`
#### Terraform state files
In the cluster's inventory folder, the following files might be created (either by Terraform
or manually), to prevent you from pushing them accidentally they are in a
`.gitignore` file in the `contrib/terraform/equinix` directory :
-`.terraform`
-`.tfvars`
-`.tfstate`
-`.tfstate.backup`
-`.lock.hcl`
You can still add them manually if you want to.
### Initialization
Before Terraform can operate on your cluster you need to install the required
If you've started the Ansible run, it may also be a good idea to do some manual cleanup:
- Remove SSH keys from the destroyed cluster from your `~/.ssh/known_hosts` file
- Clean up any temporary cache files: `rm /tmp/$CLUSTER-*`
### Debugging
You can enable debugging output from Terraform by setting `TF_LOG` to `DEBUG` before running the Terraform command.
## Ansible
### Node access
#### SSH
Ensure your local ssh-agent is running and your ssh key has been added. This
step is required by the terraform provisioner:
```ShellSession
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
```
If you have deployed and destroyed a previous iteration of your cluster, you will need to clear out any stale keys from your SSH "known hosts" file ( `~/.ssh/known_hosts`).
#### Test access
Make sure you can connect to the hosts. Note that Flatcar Container Linux by Kinvolk will have a state `FAILED` due to Python not being present. This is okay, because Python will be installed during bootstrapping, so long as the hosts are not `UNREACHABLE`.
```ShellSession
$ ansible -i inventory/$CLUSTER/hosts -m ping all
example-k8s_node-1 | SUCCESS => {
"changed": false,
"ping": "pong"
}
example-etcd-1 | SUCCESS => {
"changed": false,
"ping": "pong"
}
example-k8s-master-1 | SUCCESS => {
"changed": false,
"ping": "pong"
}
```
If it fails try to connect manually via SSH. It could be something as simple as a stale host key.
This will take some time as there are many tasks to run.
## Kubernetes
### Set up kubectl
- [Install kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) on the localhost.
- Verify that Kubectl runs correctly
```ShellSession
kubectl version
```
- Verify that the Kubernetes configuration file has been copied over
```ShellSession
cat inventory/alpha/$CLUSTER/admin.conf
```
- Verify that all the nodes are running correctly.
```ShellSession
kubectl version
kubectl --kubeconfig=inventory/$CLUSTER/artifacts/admin.conf get nodes
```
## What's next
Try out your new Kubernetes cluster with the [Hello Kubernetes service](https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/).
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.