modify doc structure and update existing doc-links as preparation for new doc generation script

This commit is contained in:
Payback159
2024-05-15 19:32:51 +02:00
parent 0b464b5239
commit 4dbfd42f1d
82 changed files with 70 additions and 70 deletions

17
docs/advanced/arch.md Normal file

@@ -0,0 +1,17 @@
# Architecture compatibility
The following table shows the impact of the CPU architecture on compatible features:
- amd64: Cluster using only x86/amd64 CPUs
- arm64: Cluster using only arm64 CPUs
- amd64 + arm64: Cluster with a mix of x86/amd64 and arm64 CPUs
| kube_network_plugin | amd64 | arm64 | amd64 + arm64 |
|---------------------|-------|-------|---------------|
| Calico | Y | Y | Y |
| Weave | Y | Y | Y |
| Flannel | Y | N | N |
| Canal | Y | N | N |
| Cilium | Y | Y | N |
| Contiv | Y | N | N |
| kube-router | Y | N | N |


@@ -0,0 +1,196 @@
# Installation Guide
- [Installation Guide](#installation-guide)
- [Kubernetes TLS Root CA Certificate/Key Secret](#kubernetes-tls-root-ca-certificatekey-secret)
- [Securing Ingress Resources](#securing-ingress-resources)
- [Create New TLS Root CA Certificate and Key](#create-new-tls-root-ca-certificate-and-key)
- [Install Cloudflare PKI/TLS `cfssl` Toolkit.](#install-cloudflare-pkitls-cfssl-toolkit)
- [Create Root Certificate Authority (CA) Configuration File](#create-root-certificate-authority-ca-configuration-file)
- [Create Certificate Signing Request (CSR) Configuration File](#create-certificate-signing-request-csr-configuration-file)
- [Create TLS Root CA Certificate and Key](#create-tls-root-ca-certificate-and-key)
Cert-Manager is a native Kubernetes certificate management controller. It can help with issuing certificates from a variety of sources, such as Let's Encrypt, HashiCorp Vault, Venafi, a simple signing key pair, or self-signed. It will ensure certificates are valid and up to date, and attempt to renew certificates at a configured time before expiry.
## Kubernetes TLS Root CA Certificate/Key Secret
If you're planning to secure your ingress resources using TLS client certificates, you'll need to create and deploy the Kubernetes `ca-key-pair` secret consisting of the Root CA certificate and key to your K8s cluster.
For further information, read the official [Cert-Manager CA Configuration](https://cert-manager.io/docs/configuration/ca/) doc.
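A minimal sketch of that secret, assuming cert-manager runs in the `cert-manager` namespace and that the base64-encoded placeholders are replaced with your own Root CA PEM data (namespace and placeholder values are assumptions, not verified defaults):
```yaml
apiVersion: v1
kind: Secret
metadata:
  # cert-manager looks this secret up when a CA issuer references it
  name: ca-key-pair
  namespace: cert-manager
data:
  # base64-encoded PEM contents of the Root CA certificate and key (e.g. the ca.pem / ca-key.pem created below)
  tls.crt: <base64-encoded Root CA certificate>
  tls.key: <base64-encoded Root CA private key>
```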
`cert-manager` can now be enabled by editing your K8s cluster addons inventory, e.g. `inventory/sample/group_vars/k8s_cluster/addons.yml`, and setting `cert_manager_enabled` to true.
```yaml
# Cert manager deployment
cert_manager_enabled: true
```
If you don't have a TLS Root CA certificate and key available, you can create these by following the steps outlined in section [Create New TLS Root CA Certificate and Key](#create-new-tls-root-ca-certificate-and-key) using the Cloudflare PKI/TLS `cfssl` toolkit. TLS Root CA certificates and keys can also be created using `ssh-keygen` and OpenSSL, if `cfssl` is not available.
## Securing Ingress Resources
A common use-case for cert-manager is requesting TLS signed certificates to secure your ingress resources. This can be done by simply adding annotations to your Ingress resources and cert-manager will facilitate creating the Certificate resource for you. A small sub-component of cert-manager, ingress-shim, is responsible for this.
To enable the Nginx Ingress controller as part of your Kubespray deployment, simply edit your K8s cluster addons inventory, e.g. `inventory/sample/group_vars/k8s_cluster/addons.yml`, and set `ingress_nginx_enabled` to true.
```yaml
# Nginx ingress controller deployment
ingress_nginx_enabled: true
```
For example, if you're using the Nginx ingress controller, you can secure the Prometheus ingress by adding the annotation `cert-manager.io/cluster-issuer: ca-issuer` and the `spec.tls` section to the `Ingress` resource definition.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheus-k8s
namespace: monitoring
labels:
prometheus: k8s
annotations:
kubernetes.io/ingress.class: "nginx"
cert-manager.io/cluster-issuer: ca-issuer
spec:
tls:
- hosts:
- prometheus.example.com
secretName: prometheus-dashboard-certs
rules:
- host: prometheus.example.com
http:
paths:
- path: /
pathType: ImplementationSpecific
backend:
service:
name: prometheus-k8s
port:
name: web
```
Once deployed to your K8s cluster, every 3 months cert-manager will automatically rotate the Prometheus `prometheus.example.com` TLS client certificate and key, and store these as the Kubernetes `prometheus-dashboard-certs` secret.
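If a `ca-issuer` like the one referenced by the annotation above does not already exist in your cluster, a minimal sketch of such a cert-manager `ClusterIssuer`, assuming it is backed by the `ca-key-pair` secret described earlier, could look like this:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ca-issuer
spec:
  ca:
    # Secret holding the Root CA certificate and key (see the section above)
    secretName: ca-key-pair
```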
Please consult the official upstream documentation:
- [cert-manager Ingress Usage](https://cert-manager.io/v1.5-docs/usage/ingress/)
- [cert-manager Ingress Tutorial](https://cert-manager.io/v1.5-docs/tutorials/acme/ingress/#step-3-assign-a-dns-name)
### ACME
The ACME Issuer type represents a single account registered with the Automated Certificate Management Environment (ACME) Certificate Authority server. When you create a new ACME Issuer, cert-manager will generate a private key which is used to identify you with the ACME server.
Certificates issued by public ACME servers are typically trusted by clients' computers by default. This means that, for example, visiting a website that is backed by an ACME certificate issued for that URL will be trusted by default by most clients' web browsers. ACME certificates are typically free.
- [ACME Configuration](https://cert-manager.io/v1.5-docs/configuration/acme/)
- [ACME HTTP Validation](https://cert-manager.io/v1.5-docs/tutorials/acme/http-validation/)
- [HTTP01 Challenges](https://cert-manager.io/v1.5-docs/configuration/acme/http01/)
- [ACME DNS Validation](https://cert-manager.io/v1.5-docs/tutorials/acme/dns-validation/)
- [DNS01 Challenges](https://cert-manager.io/v1.5-docs/configuration/acme/dns01/)
- [ACME FAQ](https://cert-manager.io/v1.5-docs/faq/acme/)
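As a sketch only (issuer name, e-mail address and solver details are illustrative assumptions), an ACME `ClusterIssuer` using the public Let's Encrypt endpoint with an HTTP01 solver through the Nginx ingress controller could look like:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # public Let's Encrypt ACME v2 directory endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      # secret that will store the generated ACME account private key
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```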
#### ACME With An Internal Certificate Authority
The ACME Issuer with an internal certificate authority requires cert-manager to trust the certificate authority. This trust must be done at the cert-manager deployment level.
To add a trusted certificate authority to cert-manager, add its certificate to `group_vars/k8s-cluster/addons.yml`:
```yaml
cert_manager_trusted_internal_ca: |
-----BEGIN CERTIFICATE-----
[REPLACE with your CA certificate]
-----END CERTIFICATE-----
```
Once the CA is trusted, you can define your issuer normally.
### Create New TLS Root CA Certificate and Key
#### Install Cloudflare PKI/TLS `cfssl` Toolkit
For example, on Ubuntu/Debian distributions the toolkit is part of the `golang-cfssl` package.
```shell
sudo apt-get install -y golang-cfssl
```
#### Create Root Certificate Authority (CA) Configuration File
The default TLS certificate expiry time period is `8760h`, which is one year from the date the certificate is created.
```shell
$ cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": ["signing", "key encipherment", "server auth", "client auth"],
"expiry": "8760h"
}
}
}
}
EOF
```
#### Create Certificate Signing Request (CSR) Configuration File
The TLS certificate `names` details can be updated to your own specific requirements.
```shell
$ cat > ca-csr.json <<EOF
{
"CN": "Kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "US",
"L": "Portland",
"O": "Kubernetes",
"OU": "CA",
"ST": "Oregon"
}
]
}
EOF
```
#### Create TLS Root CA Certificate and Key
```shell
$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
ca.pem
ca-key.pem
```
Check that the TLS Root CA certificate has the correct `Not Before` and `Not After` dates, and ensure it is indeed a valid Certificate Authority with the X509v3 extension `CA:TRUE`.
```shell
$ openssl x509 -text -noout -in ca.pem
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
6a:d4:d8:48:7f:98:4f:54:68:9a:e1:73:02:fa:d0:41:79:25:08:49
Signature Algorithm: sha256WithRSAEncryption
Issuer: C = US, ST = Oregon, L = Portland, O = Kubernetes, OU = CA, CN = Kubernetes
Validity
Not Before: Jul 10 15:21:00 2020 GMT
Not After : Jul 9 15:21:00 2025 GMT
Subject: C = US, ST = Oregon, L = Portland, O = Kubernetes, OU = CA, CN = Kubernetes
Subject Public Key Info:
...
X509v3 extensions:
X509v3 Key Usage: critical
Certificate Sign, CRL Sign
X509v3 Basic Constraints: critical
CA:TRUE
X509v3 Subject Key Identifier:
D4:38:B5:E2:26:49:5E:0D:E3:DC:D9:70:73:3B:C4:19:6A:43:4A:F2
...
```

329
docs/advanced/dns-stack.md Normal file

@@ -0,0 +1,329 @@
# K8s DNS stack by Kubespray
For K8s cluster nodes, Kubespray configures a [Kubernetes DNS](https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/)
[cluster add-on](https://releases.k8s.io/master/cluster/addons/README.md)
to serve as an authoritative DNS server for a given ``dns_domain`` and its
``svc, default.svc`` default subdomains (a total of ``ndots: 5`` max levels).
Other nodes in the inventory, like external storage nodes or a separate etcd cluster
node group, are considered non-cluster, and configuring DNS resolution for them is left up to the user.
## DNS variables
There are several global variables which can be used to modify DNS settings:
### ndots
ndots value to be used in ``/etc/resolv.conf``
It is important to note that multiple search domains combined with high ``ndots``
values lead to poor performance of the DNS stack, so please choose it wisely.
### dns_timeout
timeout value to be used in ``/etc/resolv.conf``
### dns_attempts
attempts value to be used in ``/etc/resolv.conf``
### searchdomains
Custom search domains to be added in addition to the cluster search domains (``default.svc.{{ dns_domain }}, svc.{{ dns_domain }}``).
Most Linux systems limit the total number of search domains to 6 and the total length of all search domains
to 256 characters. Depending on the length of ``dns_domain``, you're limited to less than the total limit.
`remove_default_searchdomains: true` will remove the default cluster search domains.
Please note that ``resolvconf_mode: docker_dns`` will automatically add your system's search domains as
additional search domains. Please take this into account for the limits.
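An illustrative example of these variables in the inventory group vars (the domain names are placeholders):
```yaml
searchdomains:
  # extra search domains, appended after the cluster search domains
  - corp.example.com
  - lab.example.com
# set to true if you need to reclaim the slots used by the default cluster search domains
remove_default_searchdomains: false
```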
### nameservers
This variable is only used by ``resolvconf_mode: host_resolvconf``. These nameservers are added to the hosts
``/etc/resolv.conf`` *after* ``upstream_dns_servers`` and thus serve as backup nameservers. If this variable
is not set, a default resolver is chosen (depending on cloud provider or 8.8.8.8 when no cloud provider is specified).
### upstream_dns_servers
DNS servers to be added *after* the cluster DNS. Used by all ``resolvconf_mode`` modes. These serve as backup
DNS servers in early cluster deployment when no cluster DNS is available yet.
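An illustrative combination of the two variables (the addresses are placeholders, apart from the well-known public resolvers):
```yaml
upstream_dns_servers:
  - 8.8.8.8
  - 1.1.1.1
nameservers:
  # only used with resolvconf_mode: host_resolvconf, added after upstream_dns_servers
  - 192.168.0.53
```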
### dns_upstream_forward_extra_opts
Whether the upstream DNS servers come from the `upstream_dns_servers` variable or from /etc/resolv.conf, the related forward block in the coredns (and nodelocaldns) configuration can take options (see <https://coredns.io/plugins/forward/> for details).
These are configurable in the inventory as a dictionary in the `dns_upstream_forward_extra_opts` variable.
By default, no options are set other than the hardcoded ones (see `roles/kubernetes-apps/ansible/templates/coredns-config.yml.j2` and `roles/kubernetes-apps/ansible/templates/nodelocaldns-config.yml.j2`).
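A small sketch, assuming you want to raise the forward plugin's concurrency limit and change its upstream selection policy (both are standard CoreDNS forward options; the values are illustrative):
```yaml
dns_upstream_forward_extra_opts:
  max_concurrent: 1000
  policy: sequential
```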
### coredns_kubernetes_extra_opts
Custom options to be added to the kubernetes coredns plugin.
### coredns_kubernetes_extra_domains
Extra domains to be forwarded to the kubernetes coredns plugin.
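For example (the option and domain below are illustrative assumptions; check the role defaults for the exact expected types):
```yaml
coredns_kubernetes_extra_opts:
  # passed verbatim into the kubernetes plugin block
  - 'ttl 30'
coredns_kubernetes_extra_domains: 'internal.example.com'
```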
### coredns_rewrite_block
[Rewrite](https://coredns.io/plugins/rewrite/) plugin block to perform internal message rewriting.
### coredns_external_zones
Optional array of external zones for CoreDNS to forward queries to. It is injected into
the `coredns` config file before the default kubernetes zone. Use it as an optimization for well-known zones and/or internal-only
domains, e.g. a VPN for internal networks (default is unset).
Example:
```yaml
coredns_external_zones:
- zones:
- example.com
- example.io:1053
nameservers:
- 1.1.1.1
- 2.2.2.2
cache: 5
- zones:
- https://mycompany.local:4453
nameservers:
- 192.168.0.53
cache: 0
- zones:
- mydomain.tld
nameservers:
- 10.233.0.3
cache: 5
rewrite:
- name stop website.tld website.namespace.svc.cluster.local
```
or as INI
```ini
coredns_external_zones='[{"cache": 30,"zones":["example.com","example.io:453"],"nameservers":["1.1.1.1","2.2.2.2"]}]'
```
### dns_etchosts (coredns)
Optional hosts file content for CoreDNS to use as an /etc/hosts file. This will also be used by nodelocaldns, if enabled.
Example:
```yaml
dns_etchosts: |
192.168.0.100 api.example.com
192.168.0.200 ingress.example.com
```
### enable_coredns_reverse_dns_lookups
Whether reverse DNS lookups are enabled in the coredns config. Defaults to `true`.
### CoreDNS default zone cache plugin
If you wish to configure the caching behaviour of CoreDNS on the default zone, you can do so using the `coredns_default_zone_cache_block` string block.
An example value (more information in the [plugin's documentation](https://coredns.io/plugins/cache/)) to:
* raise the max cache TTL to 3600 seconds
* raise the max amount of success responses to cache to 3000
* disable caching of denial responses altogether
* enable pre-fetching of lookups with at least 10 lookups per minute before they expire
Would be as follows:
```yaml
coredns_default_zone_cache_block: |
cache 3600 {
success 3000
denial 0
prefetch 10 1m
}
```
### Handle old/extra dns_domains
If you need to change the dns_domain of your cluster for whatever reason (switching to or from `cluster.local` for example),
and you have workloads that embed it in their configuration you can use the variable `old_dns_domains`.
This will add some configuration to coredns and nodelocaldns to ensure the DNS requests using the old domain are handled correctly.
Example:
```yaml
old_dns_domains:
- example1.com
- example2.com
dns_domain: cluster.local
```
will make `my-svc.my-ns.svc.example1.com`, `my-svc.my-ns.svc.example2.com` and `my-svc.my-ns.svc.cluster.local` have the same DNS answer.
### systemd_resolved_disable_stub_listener
Whether or not to set `DNSStubListener=no` when using systemd-resolved. Defaults to `true` on Flatcar.
You might need to set it to `true` if CoreDNS fails to start with `address already in use` errors.
## DNS modes supported by Kubespray
You can modify how Kubespray sets up DNS for your cluster with the variables ``dns_mode`` and ``resolvconf_mode``.
### dns_mode
``dns_mode`` configures how Kubespray will setup cluster DNS. There are four modes available:
#### dns_mode: coredns (default)
This installs CoreDNS as the default cluster DNS for all queries.
#### dns_mode: coredns_dual
This installs CoreDNS as the default cluster DNS for all queries, plus a secondary CoreDNS stack.
#### dns_mode: manual
This does not install coredns, but allows you to specify
`manual_dns_server`, which will be configured on nodes for handling Pod DNS.
Use this method if you plan to install your own DNS server in the cluster after
initial deployment.
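A minimal sketch (the server address is an illustrative placeholder):
```yaml
dns_mode: manual
# address of the DNS server you plan to run yourself inside the cluster
manual_dns_server: 10.233.0.3
```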
#### dns_mode: none
This does not install any DNS solution at all. It basically disables cluster DNS completely and
leaves you with a non-functional cluster.
### resolvconf_mode
``resolvconf_mode`` configures how Kubespray will setup DNS for ``hostNetwork: true`` PODs and non-k8s containers.
There are three modes available:
#### resolvconf_mode: host_resolvconf (default)
This activates the classic Kubespray behavior that modifies the hosts ``/etc/resolv.conf`` file and dhclient
configuration to point to the cluster dns server (either coredns or coredns_dual, depending on dns_mode).
As cluster DNS is not available on early deployment stage, this mode is split into 2 stages. In the first
stage (``dns_early: true``), ``/etc/resolv.conf`` is configured to use the DNS servers found in ``upstream_dns_servers``
and ``nameservers``. Later, ``/etc/resolv.conf`` is reconfigured to use the cluster DNS server first, leaving
the other nameservers as backups.
Also note that existing records will be purged from `/etc/resolv.conf`,
including resolvconf's base/head/cloud-init config files and those that come from dhclient.
#### resolvconf_mode: docker_dns
This sets up the docker daemon with additional --dns/--dns-search/--dns-opt flags.
The following nameservers are added to the docker daemon (in the same order as listed here):
* cluster nameserver (depends on dns_mode)
* content of optional upstream_dns_servers variable
* host system nameservers (read from hosts /etc/resolv.conf)
The following search domains are added to the docker daemon (in the same order as listed here):
* cluster domains (``default.svc.{{ dns_domain }}``, ``svc.{{ dns_domain }}``)
* content of optional searchdomains variable
* host system search domains (read from hosts /etc/resolv.conf)
The following dns options are added to the docker daemon
* ndots:{{ ndots }}
* timeout:2
* attempts:2
These dns options can be overridden by setting a different list:
```yaml
docker_dns_options:
- ndots:{{ ndots }}
- timeout:2
- attempts:2
- rotate
```
For normal PODs, k8s will ignore these options and set up its own DNS settings for the PODs, taking
the --cluster_dns (either coredns or coredns_dual, depending on dns_mode) kubelet option into account.
For ``hostNetwork: true`` PODs however, k8s will let docker set up DNS settings. Docker containers which
are not started/managed by k8s will also use these docker options.
The host system name servers are added to ensure name resolution is also working while cluster DNS is not
running yet. This is especially important in early stages of cluster deployment. In this early stage,
DNS queries to the cluster DNS will timeout after a few seconds, resulting in the system nameserver being
used as a backup nameserver. After cluster DNS is running, all queries will be answered by the cluster DNS
servers, which in turn will forward queries to the system nameserver if required.
#### resolvconf_mode: none
Does nothing regarding ``/etc/resolv.conf``. This leaves you with a cluster that works as expected in most cases.
The only exception is that ``hostNetwork: true`` PODs and non-k8s managed containers will not be able to resolve
cluster service names.
## Nodelocal DNS cache
Setting ``enable_nodelocaldns`` to ``true`` will make pods reach out to the DNS (CoreDNS) caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent will query CoreDNS (depending on what main DNS plugin is configured in your cluster) for cache misses of cluster hostnames (cluster.local suffix by default).
More information on the rationale behind this implementation can be found [here](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/1024-nodelocal-cache-dns/README.md).
**As per the 2.10 release, Nodelocal DNS cache is enabled by default.**
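The relevant variables, sketched here with what is believed to be the usual link-local listen address (treat the `nodelocaldns_ip` name and value as assumptions to be checked against the role defaults):
```yaml
enable_nodelocaldns: true
# link-local address the node-local caching agent listens on
nodelocaldns_ip: 169.254.25.10
```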
### External zones
It's possible to extend the `nodelocaldns` configuration by adding an array of external zones. For example:
```yaml
nodelocaldns_external_zones:
- zones:
- example.com
- example.io:1053
nameservers:
- 1.1.1.1
- 2.2.2.2
cache: 5
- zones:
- https://mycompany.local:4453
nameservers:
- 192.168.0.53
```
### dns_etchosts (nodelocaldns)
See [dns_etchosts](#dns_etchosts-coredns) above.
### Nodelocal DNS HA
Under some circumstances, the single POD nodelocaldns implementation may not be replaced quickly enough, and a cluster upgrade or a nodelocaldns upgrade can cause DNS requests to time out for short intervals. If for any reason your applications cannot tolerate this behavior, you can enable a redundant nodelocal DNS pod on each node:
```yaml
enable_nodelocaldns_secondary: true
```
**Note:** when the nodelocaldns secondary is enabled, the primary is instructed to no longer tear down the iptables rules it sets up to direct traffic to itself. In case both daemonsets have failing pods on the same node, this can cause a DNS blackout with traffic no longer being forwarded to the coredns central service as a fallback. Please ensure you account for this also if you decide to disable the nodelocaldns cache.
There is a time delta (in seconds) allowed for the secondary nodelocaldns to survive in case both primary and secondary daemonsets are updated at the same time. It is advised to tune this variable after you have performed some tests in your own environment.
```yaml
nodelocaldns_secondary_skew_seconds: 5
```
## Limitations
* Kubespray has no way yet to configure the Kubedns addon to forward requests that SkyDns
cannot answer with authority to arbitrary recursive resolvers. This task is left
for the future. See [official SkyDns docs](https://github.com/skynetservices/skydns)
for details.
* There is
[no way to specify a custom value](https://github.com/kubernetes/kubernetes/issues/33554)
for the SkyDNS ``ndots`` param.
* the ``searchdomains`` have a limitation of 6 names and 256 chars in
length. Due to the default ``svc, default.svc`` subdomains, the actual
limits are 4 names and 239 chars respectively. If `remove_default_searchdomains: true`
is added, you are back to 6 names.
* the ``nameservers`` have a limitation of 3 servers, although there
is a way to mitigate that with ``upstream_dns_servers``,
see above. In any case, the ``nameservers`` can take no more than two
custom DNS servers, because one slot is reserved for the Kubernetes
cluster's needs.


@@ -0,0 +1,41 @@
# Downloading binaries and containers
Kubespray supports several download/upload modes. The default is:
* Each node downloads binaries and container images on its own, which is ``download_run_once: False``.
* For K8s apps, pull policy is ``k8s_image_pull_policy: IfNotPresent``.
* For system managed containers, like kubelet or etcd, pull policy is ``download_always_pull: False``, which pulls only if the wanted repo and tag/sha256 digest differ from what the host has.
There is also a "pull once, push many" mode:
* Setting ``download_run_once: True`` will make kubespray download container images and binaries only once and then push them to the cluster nodes. The default download delegate node is the first `kube_control_plane`.
* Set ``download_localhost: True`` to make localhost the download delegate. This can be useful if cluster nodes cannot access external addresses. Using this requires that the container runtime is installed and running on the Ansible master, and that the current user is either in the docker group or can do passwordless sudo, so that it can drive the container runtime. Note: even if `download_localhost` is false, files will still be copied to the Ansible server (local host) from the delegated download node, and then distributed from the Ansible server to all cluster nodes.
NOTE: When `download_run_once` is true and `download_localhost` is false, all downloads will be done on the delegate node, including downloads for container images that are not required on that node. As a consequence, the storage required on that node will probably be more than if download_run_once was false, because all images will be loaded into the storage of the container runtime on that node, instead of just the images required for that node.
On caching:
* When `download_run_once` is `True`, all downloaded files will be cached locally in `download_cache_dir`, which defaults to `/tmp/kubespray_cache`. On subsequent provisioning runs, this local cache will be used to provision the nodes, minimizing bandwidth usage and improving provisioning time. Expect about 800MB of disk space to be used on the ansible node for the cache. Disk space required for the image cache on the kubernetes nodes is as much as is needed for the largest image, which is currently slightly less than 150MB.
* By default, if `download_run_once` is false, kubespray will not retrieve the downloaded images and files from the download delegate node to the local cache, or use that cache to pre-provision those nodes. If you have a full cache with container images and files and you don't need to download anything, but want to use a cache - set `download_force_cache` to `True` (see the combined example after this list).
* By default, cached images that are used to pre-provision the remote nodes will be deleted from the remote nodes after use, to save disk space. Setting `download_keep_remote_cache` will prevent the files from being deleted. This can be useful while developing kubespray, as it can decrease provisioning times. As a consequence, the required storage for images on the remote nodes will increase from 150MB to about 550MB, which is currently the combined size of all required container images.
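A combined sketch of the caching-related variables described above, with illustrative values for a pull-once setup that reuses the local cache:
```yaml
download_run_once: true
download_localhost: false
# reuse the local cache instead of downloading again
download_force_cache: true
# keep pre-provisioned images on the remote nodes (useful while developing kubespray)
download_keep_remote_cache: false
download_cache_dir: /tmp/kubespray_cache
```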
Container images and binary files are described by the vars like ``foo_version``,
``foo_download_url``, ``foo_checksum`` for binaries and ``foo_image_repo``,
``foo_image_tag`` or optional ``foo_digest_checksum`` for containers.
Container images may be defined by their repo and tag, for example:
`andyshinn/dnsmasq:2.72`. Or by repo, tag and sha256 digest:
`andyshinn/dnsmasq@sha256:7c883354f6ea9876d176fe1d30132515478b2859d6fc0cbf9223ffdc09168193`.
Note that the SHA256 digest and the image tag must both be specified and correspond
to each other. The example above is represented by the following vars:
```yaml
dnsmasq_digest_checksum: 7c883354f6ea9876d176fe1d30132515478b2859d6fc0cbf9223ffdc09168193
dnsmasq_image_repo: andyshinn/dnsmasq
dnsmasq_image_tag: '2.72'
```
The full list of available vars may be found in the download's ansible role defaults. Those also allow specifying custom URLs and local repositories for binaries and container
images. See also the DNS stack docs for the related intranet configuration,
so the hosts can resolve those URLs and repos.
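For instance, continuing the `dnsmasq` example above, pointing the image at an internal registry is a matter of overriding the corresponding vars (the registry host below is a placeholder):
```yaml
dnsmasq_image_repo: registry.internal.example.com:5000/andyshinn/dnsmasq
dnsmasq_image_tag: '2.72'
```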

20
docs/advanced/gcp-lb.md Normal file

@@ -0,0 +1,20 @@
# GCP Load Balancers for type=LoadBalancer Kubernetes Services
Google Cloud Platform can be used to create Kubernetes Service load balancers.
This feature is delivered by adding parameters to `kube-controller-manager` and `kubelet`. You need to specify:
```ShellSession
--cloud-provider=gce
--cloud-config=/etc/kubernetes/cloud-config
```
To get it working in Kubespray, you need to add a tag to your GCE instances, specify it in the Kubespray group vars, and also set `cloud_provider` to `gce`. So for example, in the file `group_vars/all/gcp.yml`:
```yaml
cloud_provider: gce
gce_node_tags: k8s-lb
```
Once this is set up and you create a Service in Kubernetes with `type=LoadBalancer`, the cloud provider will create a public IP and configure the firewall.
Note: the cloud provider runs under the VM service account, so this account needs the correct permissions to be able to create all GCP resources.


@@ -0,0 +1,108 @@
# Overview
Distributed systems such as Kubernetes are designed to be resilient to
failures. More details about Kubernetes High-Availability (HA) may be found at
[Building High-Availability Clusters](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/)
To keep the view simple, most of the parts of HA will be skipped in order to describe
the Kubelet<->Controller Manager communication only.
By default the normal behavior looks like:
1. Kubelet updates its status to the apiserver periodically, as specified by
`--node-status-update-frequency`. The default value is **10s**.
2. Kubernetes controller manager checks the statuses of Kubelet every
`--node-monitor-period`. The default value is **5s**.
3. In case the status was updated within `--node-monitor-grace-period` of time,
the Kubernetes controller manager considers the Kubelet healthy. The
default value is **40s**.
> Kubernetes controller manager and Kubelet work asynchronously. It means that
> the delay may include any network latency, API Server latency, etcd latency,
> latency caused by load on one's control plane nodes and so on. So if
> `--node-status-update-frequency` is set to 5s in reality it may appear in
> etcd in 6-7 seconds or even longer when etcd cannot commit data to quorum
> nodes.
## Failure
Kubelet will try to make `nodeStatusUpdateRetry` post attempts. Currently
`nodeStatusUpdateRetry` is constantly set to 5 in
[kubelet.go](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/kubelet.go#L102).
Kubelet will try to update the status in
[tryUpdateNodeStatus](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/kubelet_node_status.go#L312)
function. Kubelet uses the `http.Client()` Golang method, but has no specified
timeout. Thus there may be some glitches when the API Server is overloaded while
the TCP connection is being established.
So, there will be `nodeStatusUpdateRetry` * `--node-status-update-frequency`
attempts to set a status of node.
At the same time Kubernetes controller manager will try to check
`nodeStatusUpdateRetry` times every `--node-monitor-period` of time. After
`--node-monitor-grace-period` it will consider node unhealthy. Pods will then be rescheduled based on the
[Taint Based Eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions)
timers that you set on them individually, or the API Server's global timers:`--default-not-ready-toleration-seconds` &
``--default-unreachable-toleration-seconds``.
Kube proxy has a watcher over API. Once pods are evicted, Kube proxy will
notice and will update iptables of the node. It will remove endpoints from
services so pods from failed node won't be accessible anymore.
## Recommendations for different cases
## Fast Update and Fast Reaction
If `--node-status-update-frequency` is set to **4s** (10s is the default),
`--node-monitor-period` to **2s** (5s is the default),
`--node-monitor-grace-period` to **20s** (40s is the default), and
`--default-not-ready-toleration-seconds` and ``--default-unreachable-toleration-seconds`` to **30**
(300 seconds is the default). Note these two values should be integers representing the number of seconds ("s" or "m" for
seconds/minutes are not specified).
In such a scenario, pods will be evicted in **50s** because the node will be
considered as down after **20s**, and `--default-not-ready-toleration-seconds` or
``--default-unreachable-toleration-seconds`` occur after **30s** more. However, this scenario creates an overhead on
etcd as every node will try to update its status every 2 seconds.
If the environment has 1000 nodes, there will be 15000 node updates per
minute which may require large etcd containers or even dedicated nodes for etcd.
> If we calculate the number of tries, the division will give 5, but in reality
> it will be from 3 to 5 with `nodeStatusUpdateRetry` attempts of each try. The
> total number of attempts will vary from 15 to 25 due to latency of all
> components.
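If you manage these flags through Kubespray rather than editing the component manifests directly, the corresponding inventory variables might look roughly like the sketch below; the variable names are assumptions and should be verified against the kubespray-defaults role before use:
```yaml
# assumed Kubespray variable names for the kubelet/controller-manager flags above
kubelet_status_update_frequency: 4s
kube_controller_node_monitor_period: 2s
kube_controller_node_monitor_grace_period: 20s
```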
## Medium Update and Average Reaction
Let's set `--node-status-update-frequency` to **20s**
`--node-monitor-grace-period` to **2m** and `--default-not-ready-toleration-seconds` and
``--default-unreachable-toleration-seconds`` to **60**.
In that case, Kubelet will try to update the status every 20s. So, there will be 6 * 5
= 30 attempts before the Kubernetes controller manager considers the node
unhealthy. After a further 1m it will evict all pods. The total time will be 3m
before the eviction process starts.
Such scenario is good for medium environments as 1000 nodes will require 3000
etcd updates per minute.
> In reality, there will be from 4 to 6 node update tries. The total number
> of attempts will vary from 20 to 30.
## Low Update and Slow Reaction
Let's set `--node-status-update-frequency` to **1m**,
`--node-monitor-grace-period` to **5m**, and `--default-not-ready-toleration-seconds` and
``--default-unreachable-toleration-seconds`` to **60**. In this scenario, every kubelet will try to update the status
every minute. There will be 5 * 5 = 25 attempts before unhealthy status. After 5m,
the Kubernetes controller manager will mark the node unhealthy. This means that pods
will be evicted 1m after the node is marked unhealthy (6m in total).
> In reality, there will be from 3 to 5 tries. The total number of attempts will
> vary from 15 to 25.
There can be different combinations such as Fast Update with Slow reaction to
satisfy specific cases.

30
docs/advanced/mitogen.md Normal file

@@ -0,0 +1,30 @@
# Mitogen
*Warning:* Mitogen support is now deprecated in kubespray due to upstream not releasing an updated version to support ansible 4.x (ansible-base 2.11.x) and above. The CI support has been stripped for mitogen and we are no longer validating any support or regressions for it. The supporting mitogen install playbook and integration documentation will be removed in a later version.
[Mitogen for Ansible](https://mitogen.networkgenomics.com/ansible_detailed.html) allows a 1.25x - 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing; it can only ensure the module executes as quickly as possible.
## Install
```ShellSession
ansible-playbook contrib/mitogen/mitogen.yml
```
The above playbook sets the ansible `strategy` and `strategy_plugins` in `ansible.cfg`, but you can also enable them if you use your own `ansible.cfg` by setting the environment variables:
```ShellSession
export ANSIBLE_STRATEGY=mitogen_linear
export ANSIBLE_STRATEGY_PLUGINS=plugins/mitogen/ansible_mitogen/plugins/strategy
```
... or `ansible.cfg` setup:
```ini
[defaults]
strategy_plugins = plugins/mitogen/ansible_mitogen/plugins/strategy
strategy=mitogen_linear
```
## Limitation
If you are experiencing problems, please see the [documentation](https://mitogen.networkgenomics.com/ansible_detailed.html#noteworthy-differences).

41
docs/advanced/netcheck.md Normal file

@@ -0,0 +1,41 @@
# Network Checker Application
With the ``deploy_netchecker`` var enabled (defaults to false), Kubespray deploys a
Network Checker Application from the third-party `mirantis/k8s-netchecker` docker
images. It consists of a server and agents that try to reach the server through the usual
means of network connectivity for Kubernetes applications. Therefore, this
automatically verifies pod to pod connectivity via the cluster IP and checks
that DNS resolution is functioning as well.
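Enabling it is a matter of setting the variable in your inventory group vars:
```yml
deploy_netchecker: true
```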
The checks are run by agents on a periodic basis and cover standard and host network
pods as well. The history of performed checks may be found in the agents' application
logs.
To get the most recent and cluster-wide network connectivity report, run from
any of the cluster nodes:
```ShellSession
curl http://localhost:31081/api/v1/connectivity_check
```
Note that Kubespray does not invoke the check but only deploys the application, if
requested.
There are related application specific variables:
```yml
netchecker_port: 31081
agent_report_interval: 15
netcheck_namespace: default
```
Note that the application verifies DNS resolution for FQDNs comprising only the
combination of the ``netcheck_namespace.dns_domain`` vars, for example
``netchecker-service.default.svc.cluster.local``. If you want to deploy the application
to a non-default namespace, make sure as well to adjust the ``searchdomains`` var
so the resulting search domain records contain that namespace, like:
```yml
search: foospace.cluster.local default.cluster.local ...
nameserver: ...
```

50
docs/advanced/ntp.md Normal file

@@ -0,0 +1,50 @@
# NTP synchronization
The Network Time Protocol (NTP) is a networking protocol for clock synchronization between computer systems. Time synchronization is important to Kubernetes and Etcd.
## Enable NTP
To start the ntpd (or chrony) service and enable it at system boot, there is a related variable:
```yaml
ntp_enabled: true
```
The NTP service will be enabled and will sync time automatically.
## Customize the NTP configuration file
In an air-gapped environment, the nodes cannot reach an NTP server on the internet, so they can use customized NTP servers by configuring the NTP file.
```yaml
ntp_enabled: true
ntp_manage_config: true
ntp_servers:
- "0.your-ntp-server.org iburst"
- "1.your-ntp-server.org iburst"
- "2.your-ntp-server.org iburst"
- "3.your-ntp-server.org iburst"
```
## Setting the TimeZone
The timezone can also be set by `ntp_timezone`, e.g. "Etc/UTC" or "Asia/Shanghai". If not set, the timezone will not change.
```yaml
ntp_enabled: true
ntp_timezone: Etc/UTC
```
## Advanced Configuration
Enabling `tinker panic` is useful when running NTP in a VM environment to avoid clock drift on VMs. It only takes effect when `ntp_manage_config` is true.
```yaml
ntp_tinker_panic: true
```
Force an immediate time sync by NTP after it is installed, which is useful on a newly installed system.
```yaml
ntp_force_sync_immediately: true
```

29
docs/advanced/proxy.md Normal file

@@ -0,0 +1,29 @@
# Setting up Environment Proxy
If you set an http and https proxy, all nodes and the loadbalancer will be excluded from the proxy by generating the no_proxy variable in `roles/kubespray-defaults/tasks/no_proxy.yml`; if you have additional resources to exclude, add them to the `additional_no_proxy` variable. If you want to fully override your `no_proxy` setting, fill in just `no_proxy` and no node or loadbalancer addresses will be added to no_proxy.
## Set proxy for http and https
`http_proxy: "http://example.proxy.tld:port"`
`https_proxy: "http://example.proxy.tld:port"`
## Set custom CA
The CA must already be present on each target node.
`https_proxy_cert_file: /path/to/host/custom/ca.crt`
## Set default no_proxy (this will override default no_proxy generation)
`no_proxy: "node1,node1_ip,node2,node2_ip...additional_host"`
## Set additional addresses to default no_proxy (all cluster nodes and loadbalancer)
`additional_no_proxy: "additional_host1,additional_host2"`
## Exclude workers from no_proxy
Since workers are included in the no_proxy variable, by default, docker engine will be restarted on all nodes (all
pods will restart) when adding or removing workers. To override this behaviour by only including control plane nodes in the
no_proxy variable, set:
`no_proxy_exclude_workers: true`
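Putting the pieces above together, a typical proxy configuration in group vars might look like the following sketch (hosts and port are the placeholders from the examples above):
```yaml
http_proxy: "http://example.proxy.tld:port"
https_proxy: "http://example.proxy.tld:port"
# extra hosts to exclude in addition to the generated no_proxy list
additional_no_proxy: "additional_host1,additional_host2"
# keep workers out of no_proxy to avoid restarting the container engine on them
no_proxy_exclude_workers: true
```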

244
docs/advanced/registry.md Normal file

@@ -0,0 +1,244 @@
# Private Docker Registry in Kubernetes
Kubernetes offers an optional private Docker registry addon, which you can turn
on when you bring up a cluster or install later. This gives you a place to
store truly private Docker images for your cluster.
## How it works
The private registry runs as a `Pod` in your cluster. It does not currently
support SSL or authentication, which triggers Docker's "insecure registry"
logic. To work around this, we run a proxy on each node in the cluster,
exposing a port onto the node (via a hostPort), which Docker accepts as
"secure", since it is accessed by `localhost`.
## Turning it on
Some cluster installs (e.g. GCE) support this as a cluster-birth flag. The
`ENABLE_CLUSTER_REGISTRY` variable in `cluster/gce/config-default.sh` governs
whether the registry is run or not. To set this flag, you can specify
`KUBE_ENABLE_CLUSTER_REGISTRY=true` when running `kube-up.sh`. If your cluster
does not include this flag, the following steps should work. Note that some of
this is cloud-provider specific, so you may have to customize it a bit.
### Make some storage
The primary job of the registry is to store data. To do that we have to decide
where to store it. For cloud environments that have networked storage, we can
use Kubernetes's `PersistentVolume` abstraction. The following template is
expanded by `salt` in the GCE cluster turnup, but can easily be adapted to
other situations:
```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
name: kube-system-kube-registry-pv
spec:
{% if pillar.get('cluster_registry_disk_type', '') == 'gce' %}
capacity:
storage: {{ pillar['cluster_registry_disk_size'] }}
accessModes:
- ReadWriteOnce
gcePersistentDisk:
pdName: "{{ pillar['cluster_registry_disk_name'] }}"
fsType: "ext4"
{% endif %}
```
If, for example, you wanted to use NFS you would just need to change the
`gcePersistentDisk` block to `nfs`. See
[here](https://kubernetes.io/docs/concepts/storage/volumes/) for more details on volumes.
Note that in any case, the storage (in this case the GCE PersistentDisk) must be
created independently - this is not something Kubernetes manages for you (yet).
### I don't want or don't have persistent storage
If you are running in a place that doesn't have networked storage, or if you
just want to kick the tires on this without committing to it, you can easily
adapt the `ReplicationController` specification below to use a simple
`emptyDir` volume instead of a `persistentVolumeClaim`.
## Claim the storage
Now that the Kubernetes cluster knows that some storage exists, you can put a
claim on that storage. As with the `PersistentVolume` above, you can start
with the `salt` template:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: kube-registry-pvc
namespace: kube-system
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: {{ pillar['cluster_registry_disk_size'] }}
```
This tells Kubernetes that you want to use storage, and the `PersistentVolume`
you created before will be bound to this claim (unless you have other
`PersistentVolumes` in which case those might get bound instead). This claim
gives you the right to use this storage until you release the claim.
## Run the registry
Now we can run a Docker registry:
```yaml
apiVersion: v1
kind: ReplicationController
metadata:
name: kube-registry-v0
namespace: kube-system
labels:
k8s-app: registry
version: v0
spec:
replicas: 1
selector:
k8s-app: registry
version: v0
template:
metadata:
labels:
k8s-app: registry
version: v0
spec:
containers:
- name: registry
image: registry:2
resources:
limits:
cpu: 100m
memory: 100Mi
env:
- name: REGISTRY_HTTP_ADDR
value: :5000
- name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
value: /var/lib/registry
volumeMounts:
- name: image-store
mountPath: /var/lib/registry
ports:
- containerPort: 5000
name: registry
protocol: TCP
volumes:
- name: image-store
persistentVolumeClaim:
claimName: kube-registry-pvc
```
*Note:* if you have set multiple replicas, make sure your CSI driver has support for the `ReadWriteMany` accessMode.
## Expose the registry in the cluster
Now that we have a registry `Pod` running, we can expose it as a Service:
```yaml
apiVersion: v1
kind: Service
metadata:
name: kube-registry
namespace: kube-system
labels:
k8s-app: registry
kubernetes.io/name: "KubeRegistry"
spec:
selector:
k8s-app: registry
ports:
- name: registry
port: 5000
protocol: TCP
```
## Expose the registry on each node
Now that we have a running `Service`, we need to expose it onto each Kubernetes
`Node` so that Docker will see it as `localhost`. We can load a `Pod` on every
node by creating the following DaemonSet.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-registry-proxy
namespace: kube-system
labels:
k8s-app: kube-registry-proxy
version: v0.4
spec:
  selector:
    matchLabels:
      k8s-app: kube-registry-proxy
  template:
metadata:
labels:
k8s-app: kube-registry-proxy
kubernetes.io/name: "kube-registry-proxy"
version: v0.4
spec:
containers:
- name: kube-registry-proxy
image: gcr.io/google_containers/kube-registry-proxy:0.4
resources:
limits:
cpu: 100m
memory: 50Mi
env:
- name: REGISTRY_HOST
value: kube-registry.kube-system.svc.cluster.local
- name: REGISTRY_PORT
value: "5000"
ports:
- name: registry
containerPort: 80
hostPort: 5000
```
When modifying replication-controller, service and daemon-set definitions, take
care to ensure *unique* identifiers for the rc-svc couple and the daemon-set.
Failing to do so will register the localhost proxy daemon-sets with the
upstream service. As a result they will then try to proxy themselves, which
will, for obvious reasons, not work.
This ensures that port 5000 on each node is directed to the registry `Service`.
You should be able to verify that it is running by hitting port 5000 with a web
browser and getting a 404 error:
```ShellSession
$ curl localhost:5000
404 page not found
```
## Using the registry
To use an image hosted by this registry, simply say this in your `Pod`'s
`spec.containers[].image` field:
```yaml
image: localhost:5000/user/container
```
Before you can use the registry, you have to be able to get images into it,
though. If you are building an image on your Kubernetes `Node`, you can spell
out `localhost:5000` when you build and push. More likely, though, you are
building locally and want to push to your cluster.
You can use `kubectl` to set up a port-forward from your local node to a
running Pod:
```ShellSession
$ POD=$(kubectl get pods --namespace kube-system -l k8s-app=registry \
-o template --template '{{range .items}}{{.metadata.name}} {{.status.phase}}{{"\n"}}{{end}}' \
| grep Running | head -1 | cut -f1 -d' ')
$ kubectl port-forward --namespace kube-system $POD 5000:5000 &
```
Now you can build and push images on your local computer as
`localhost:5000/yourname/container` and those images will be available inside
your kubernetes cluster with the same name.