* Fix broken CI jobs
Adjust image and image_family scenarios for debian.
Checkout CI file for upgrades
* add debugging to file download
* Fix download for alternate playbooks
* Update ansible ssh args to force ssh user
* Update sync_container.yml
* Refactor downloads to use download role directly
Also disable fact delegation so download delegate works acros OSes.
* clean up bools and ansible_os_family conditionals
* Update main.yml
Needs to set up resolv.conf before updating Yum cache otherwise no name resolution available (resolv.conf empty).
* Update main.yml
Removing trailing spaces
* Add possibility to insert more ip adresses in certificates
* Add newline at end of files
* Move supp ip parameters to k8s-cluster group file
* Add supplementary addresses in kubeadm master role
* Improve openssl indexes
* don't try to install this rpm on fedora atomic
* add docker 1.13.1 for fedora
* built-in docker unit file is sufficient, as tested on both fedora and centos atomic
* Change file used to check kubeadm upgrade method
Test for ca.crt instead of admin.conf because admin.conf
is created during normal deployment.
* more fixes for upgrade
In 1.8, the Node authorization mode should be listed first to
allow kubelet to access secrets. This seems to only impact
environments with cloudprovider enabled.
* Changre raw execution to use yum module
Changed raw exection to use yum module provided by Ansible.
* Replace ansible_ssh_* by ansible_*
Ansible 2.0 has deprecated the “ssh” from ansible_ssh_user, ansible_ssh_host, and ansible_ssh_port to become ansible_user, ansible_host, and ansible_port. If you are using a version of Ansible prior to 2.0, you should continue using the older style variables (ansible_ssh_*). These shorter variables are ignored, without warning, in older versions of Ansible.
I am not sure about the broader impact of this change. But I have seen on the requirements the version required is ansible>=2.4.0.
http://docs.ansible.com/ansible/latest/intro_inventory.html
This role only support Red Hat type distros and is not maintained
or used by many users. It should be removed because it creates
feature disparity between supported OSes and is not maintained.
* Rename dns_server to dnsmasq_dns_server so that it includes role prefix
as the var name is generic and conflicts when integrating with existing ansible automation.
* Enable selinux state to be configurable with new var preinstall_selinux_state
PID namespace sharing is disabled only in Kubernetes 1.7.
Explicitily enabling it by default could help reduce unexpected
results when upgrading to or downgrading from 1.7.
The value cannot be determined properly via local facts, so
checking k8s api is the most reliable way to look up what hostname
is used when using a cloudprovider.
This follows pull request #1677, adding the cgroup-driver
autodetection also for kubeadm way of deploying.
Info about this and the possibility to override is added to the docs.
Red Hat family platforms run docker daemon with `--exec-opt
native.cgroupdriver=systemd`. When kubespray tried to start kubelet
service, it failed with:
Error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
Setting kubelet's cgroup driver to the correct value for the platform
fixes this issue. The code utilizes autodetection of docker's cgroup
driver, as different RPMs for the same distro may vary in that regard.
New files: /etc/kubernetes/admin.conf
/root/.kube/config
$GITDIR/artifacts/{kubectl,admin.conf}
Optional method to download kubectl and admin.conf if
kubeconfig_lcoalhost is set to true (default false)
* kubeadm support
* move k8s master to a subtask
* disable k8s secrets when using kubeadm
* fix etcd cert serial var
* move simple auth users to master role
* make a kubeadm-specific env file for kubelet
* add non-ha CI job
* change ci boolean vars to json format
* fixup
* Update create-gce.yml
* Update create-gce.yml
* Update create-gce.yml
* Fix netchecker update side effect
kubectl apply should only be used on resources created
with kubectl apply. To workaround this, we should apply
the old manifest before upgrading it.
* Update 030_check-network.yml
* Add option for fact cache expiry
By adding the `fact_caching_timeout` we avoid having really stale/invalid data ending up in there.
Leaving commented out by default, for backwards compatibility, but nice to have there.
* Enabled cache-expiry by default
Set to 2 hours and modified comment to reflect change
* Add comment line and documentation for bastion host usage
* Take out unneeded sudo parm
* Remove blank lines
* revert changes
* take out disabling of strict host checking
This sets br_netfilter and net.bridge.bridge-nf-call-iptables sysctl from a single play before kube-proxy is first ran instead of from the flannel and weave network_plugin roles after kube-proxy is started
the uploads.yml playbook was broken with checksum mismatch errors in
various kubespray commits, for example, 3bfad5ca73
which updated the version from 3.0.6 to 3.0.17 without updating the
corresponding checksums.
This trigger ensures the inventory file is kept up-to-date. Otherwise, if the file exists and you've made changes to your terraform-managed infra without having deleted the file, it would never get updated.
For example, consider the case where you've destroyed and re-applied the terraform resources, none of the IPs would get updated, so ansible would be trying to connect to the old ones.
* using separated vault roles for generate certs with different `O` (Organization) subject field;
* configure vault roles for issuing certificates with different `CN` (Common name) subject field;
* set `CN` and `O` to `kubernetes` and `etcd` certificates;
* vault/defaults vars definition was simplified;
* vault dirs variables defined in kubernetes-defaults foles for using
shared tasks in etcd and kubernetes/secrets roles;
* upgrade vault to 0.8.1;
* generate random vault user password for each role by default;
* fix `serial` file name for vault certs;
* move vault auth request to issue_cert tasks;
* enable `RBAC` in vault CI;
* Use kubectl apply instead of create/replace
Disable checks for existing resources to speed up execution.
* Fix non-rbac deployment of resources as a list
* Fix autoscaler tolerations field
* set all kube resources to state=latest
* Update netchecker and weave
* Added update CA trust step for etcd and kube/secrets roles
* Added load_balancer_domain_name to certificate alt names if defined. Reset CA's in RedHat os.
* Rename kube-cluster-ca.crt to vault-ca.crt, we need separated CA`s for vault, etcd and kube.
* Vault role refactoring, remove optional cert vault auth because not not used and worked. Create separate CA`s fro vault and etcd.
* Fixed different certificates set for vault cert_managment
* Update doc/vault.md
* Fixed condition create vault CA, wrong group
* Fixed missing etcd_cert_path mount for rkt deployment type. Distribute vault roles for all vault hosts
* Removed wrong when condition in create etcd role vault tasks.
* Updates Controller Manager/Kubelet with Flannel's required configuration for CNI
* Removes old Flannel installation
* Install CNI enabled Flannel DaemonSet/ConfigMap/CNI bins and config (with portmap plugin) on host
* Uses RBAC if enabled
* Fixed an issue that could occur if br_netfilter is not a module and net.bridge.bridge-nf-call-iptables sysctl was not set
* Adding yaml linter to ci check
* Minor linting fixes from yamllint
* Changing CI to install python pkgs from requirements.txt
- adding in a secondary requirements.txt for tests
- moving yamllint to tests requirements
If Kubernetes > 1.6 register standalone master nodes w/ a
node-role.kubernetes.io/master=:NoSchedule taint to allow
for more flexible scheduling rather than just marking unschedulable.
Change kubelet deploy mode to host
Enable cri and qos per cgroup for kubelet
Update CoreOS images
Add upgrade hook for switching from kubelet deployment from docker to host.
Bump machine type for ubuntu-rkt-sep
* Added custom ips to etcd vault distributed certificates
* Added custom ips to kube-master vault distributed certificates
* Added comment about issue_cert_copy_ca var in vault/issue_cert role file
* Generate kube-proxy, controller-manager and scheduler certificates by vault
* Revert "Disable vault from CI (#1546)"
This reverts commit 781f31d2b8.
* Fixed upgrade cluster with vault cert manager
* Remove vault dir in reset playbook
* Bump tag for upgrade CI, fix netchecker upgrade
netchecker-server was changed from pod to deployment, so
we need an upgrade hook for it.
CI now uses v2.1.1 as a basis for upgrade.
* Fix upgrades for certs from non-rbac to rbac
This does not address per-node certs and scheduler/proxy/controller-manager
component certs which are now required. This should be handled in a
follow-up patch.
Install roles under /usr/local/share/kubespray/roles,
playbooks - /usr/local/share/kubespray/playbooks/,
ansible.cfg and inventory group vars - into /etc/kubespray.
Ship README and an example inventory as the package docs.
Update the ansible.cfg to consume the roles from the given path,
including virtualenvs prefix, if defined.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Making fluentd.conf as configmap to change configuration.
Change elasticsearch rc to deployment.
Having installed previous elastaicsearch as rc, first should delete that.
* Make yum repos used for installing docker rpms configurable
* TasksMax is only supported in systemd version >= 226
* Change to systemd file should restart docker
Before restarting docker, instruct it to kill running
containers when it restarts.
Needs a second docker restart after we restore the original
behavior, otherwise the next time docker is restarted by
an operator, it will unexpectedly bring down all running
containers.
In atomic, containers are left running when docker is restarted.
When docker is restarted after the flannel config is put in place,
the docker0 interface isn't re-IPed because docker sees the running
containers and won't update the previous config.
This patch kills all the running containers after docker is stopped.
We can't simply `docker stop` the running containers, as they respawn
before we've got a chance to stop the docker daemon, so we need to
use runc to do this after dockerd is stopped.
Replace 'netcheck_tag' with 'netcheck_version' and add additional
'netcheck_server_tag' and 'netcheck_agent_tag' config options to
provide ability to use different tags for server and agent
containers.
When VPC is used, external DNS might not be available. This patch change
behavior to use metadata service instead of external DNS when
upstream_dns_servers is not specified.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
According to code apiserver, scheduler, controller-manager, proxy don't
use resolution of objects they created. It's not harmful to change
policy to have external resolver.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
$IPS only expands to the first ip address in the array:
justin@box:~$ declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
justin@box:~$ echo $IPS
10.10.1.3
justin@box:~$ echo ${IPS[@]}
10.10.1.3 10.10.1.4 10.10.1.5
Pod opbject is not reschedulable by kubernetes. It means that if node
with netchecker-server goes down, netchecker-server won't be scheduled
somewhere. This commit changes the type of netchecker-server to
Deployment, so netchecker-server will be scheduled on other nodes in
case of failures.
In kubernetes 1.6 ClusterFirstWithHostNet was added as an option. In
accordance to it kubelet will generate resolv.conf based on own
resolv.conf. However, this doesn't create 'options', thus the proper
solution requires some investigation.
This patch sets the same resolv.conf for kubelet as host
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Clarify that the `kube_version` environment variable is needed for the CLI "graceful upgrade". Also add and example to check that the upgrade was successful.
ansible 2.2.2.0 has an [issue]() that causes problems for kargo:
```
(env) kargo ᐅ env/bin/ansible-playbook upgrade-cluster.yml
ERROR! Unexpected Exception: 'Host' object has no attribute 'remove_group'
```
Pinning ansible to 2.2.1.0 resolved this for me.
- Run docker run from script rather than directly from systemd target
- Refactoring styling/templates
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Non-brekable space is 0xc2 0xa0 byte sequence in UTF-8.
To find one:
$ git grep -I -P '\xc2\xa0'
To replace with regular space:
$ git grep -l -I -P '\xc2\xa0' | xargs sed -i 's/\xc2\xa0/ /g'
This commit doesn't include changes that will overlap with commit f1c59a91a1.
The docker-network environment file masks the new values
put into /etc/systemd/system/docker.service.d/flannel-options.conf
to renumber the docker0 to work correctly with flannel.
Optional Ansible playbook for preparing a host for running Kargo.
This includes creation of a user account, some basic packages,
and sysctl values required to allow CNI networking on a libvirt network.
etcd is crucial part of kubernetes cluster. Ansible restarts etcd on
reconfiguration. Backup helps operator to restore cluster manually in
case of any issues.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
1298.6.0 fixes some sporadic network issues. It also includes docker
1.12.6 which includes several stability fixes for kubernetes.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Ansible 2.2.1 requires jinja2<2.9, see <https://github.com/ansible/ansible/blob/v2.2.1.0-1/setup.py#L25>,
but without explicit limiting upper jinja2 version here pip ignores
Ansible requirements and installs latest available jinja2
(pip is not very smart here), which is incompatible with with
Ansible 2.2.1.
With incompatible jinja2 version "ansible-vault create" (and probably other parts)
fails with:
ERROR! Unexpected Exception: The 'jinja2<2.9' distribution was not found
and is required by ansible
This upper limit should be removed in 2.2.2 release, see:
<978311bf3f>
By default Calico CNI does not create any network access policies
or profiles if 'policy' is enabled in CNI config. And without any
policies/profiles network access to/from PODs is blocked.
K8s related policies are created by calico-policy-controller in
such case. So we need to start it as soon as possible, before any
real workloads.
This patch also fixes kube-api port in calico-policy-controller
yaml template.
Closes#1132
It is now possible to deactivate selected authentication methods
(basic auth, token auth) inside the cluster by adding
removing the required arguments to the Kube API Server and generating
the secrets accordingly.
The x509 authentification is currently not optional because disabling it
would affect the kubectl clients deployed on the master nodes.
Default backend is now etcd3 (was etcd2).
The migration process consists of the following steps:
* check if migration is necessary
* stop etcd on first etcd server
* run migration script
* start etcd on first etcd server
* stop kube-apiserver until configuration is updated
* update kube-apiserver
* purge old etcdv2 data
Issue #1125. Make RBAC authorization plugin work out of the box.
"When bootstrapping, superuser credentials should include the system:masters group, for example by creating a client cert with /O=system:masters. This gives those credentials full access to the API and allows an admin to then set up bindings for other users."
Rewrote AWS Terraform deployment for AWS Kargo. It supports now
multiple Availability Zones, AWS Loadbalancer for Kubernetes API,
Bastion Host, ...
For more information see README
To use OpenID Connect Authentication beside deploying an OpenID Connect
Identity Provider it is necesarry to pass additional arguments to the Kube API Server.
These required arguments were added to the kube apiserver manifest.
The AWS IAM profiles and policies required to run Kargo on AWS
are no longer hosted in the kubernetes main repo since kube-up got
deprecated. Hence we have to move the files into the kargo repository.
- Only have ubuntu to test on
- fedora and redhat are placeholders/guesses
- the "old" package repositories seem to have the "new" CE version which is `1.13.1` based
- `docker-ce` looks like it is named as a backported `docker-engine` package in some
places
- Did not change the `defaults` version anywhere, so should work as before
- Did not point to new package repositories, as existing ones have the new packages.
By default kubedns and dnsmasq scale when installed.
Dnsmasq is no longer a daemonset. It is now a deployment.
Kubedns is no longer a replicationcluster. It is now a deployment.
Minimum replicas is two (to enable rolling updates).
Reduced memory erquirements for dnsmasq and kubedns
Until now it was not possible to add an API Loadbalancer
without an static IP Address. But certain Loadbalancers
like AWS Elastic Loadbalanacer dontt have an fixed IP address.
With this commit it is possible to add these kind of Loadbalancers
to the Kargo deployment.
The default version of Docker was switched to 1.13 in #1059. This
change also bumped ubuntu from installing docker-engine 1.13.0 to
1.13.1. This PR updates os families which had 1.13 defined, but
were using 1.13.0.
The impetus for this change is an issue running tiller 1.2.3 on
docker 1.13.0. See discussion [1][2].
[1] https://github.com/kubernetes/helm/issues/1838
[2] https://github.com/kubernetes-incubator/kargo/pull/1100
Since inventory ships with kargo, the ability to change functionality
without having a dirty git index is nice. An example, we wish to change
is the version of docker deployed to our CentOS systems. Due to an issue
with tiller and docker 1.13, we wish to deploy docker 1.12. Since this
change does not belong in Kargo, we wish to locally override the docker
version, until the issue is sorted.
Updates based on feedback
Simplify checks for file exists
remove invalid char
Review feedback. Use regular systemd file.
Add template for docker systemd atomic
By default Calico blocks traffic from endpoints
to the host itself by using an iptables DROP
action. It could lead to a situation when service
has one alive endpoint, but pods which run on
the same node can not access it. Changed the action
to RETURN.
The Vagrantfile is setup to use flannel. The default network
was changed to Calico (#1031). However, the Vagrantfile was
not updated to reflect this. Ensuring the Vagrantfile remains
functional on master, until someone decides to make it work
with Calico.
Kubernetes project is about to set etcdv3 as default storage engine in
1.6. This patch allows to specify particular backend for
kube-apiserver. User may force the option to etcdv3 for new environment.
At the same time if the environment uses v2 it will continue uses it
until user decides to upgrade to v3.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Operator can specify any port for kube-api (6443 default) This helps in
case where some pods such as Ingress require 443 exclusively.
Closes: 820
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
When a apiserver_loadbalancer_domain_name is added to the Openssl.conf
the counter gets not increased correctly. This didnt seem to have an
effect at the current kargo version.
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Sometimes, a sysadmin might outright delete the SELinux rpms and
delete the configuration. This causes the selinux module to fail
with
```
IOError: [Errno 2] No such file or directory: '/etc/selinux/config'\n",
"module_stdout": "", "msg": "MODULE FAILURE"}
```
This simply checks that /etc/selinux/config exists before we try
to set it Permissive.
Update from feedback
New deploy modes: scale, ha-scale, separate-scale
Creates 200 fake hosts for deployment with fake hostvars.
Useful for testing certificate generation and propagation to other
master nodes.
Updated test cases descriptions.
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
Ansible playbook fails when tags are limited to "facts,etcd" or to
"facts". This patch allows to run ansible-playbook to gather facts only
that don't require calico/flannel/weave components to be verified. This
allows to run ansible with 'facts,bootstrap-os' or just 'facts' to
gether facts that don't require specific components.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Kubelet is responsible for creating symlinks from /var/lib/docker to /var/log
to make fluentd logging collector work.
However without using host's /var/log those links are invisible to fluentd.
This is done on rkt configuration too.
Based on #718 introduced by rsmitty.
Includes all roles and all options to support deployment of
new hosts in case they were added to inventory.
Main difference here is that master role is evaluated first
so that master components get upgraded first.
Fixes#694
- Starting from version 2.0 ansible has 'callback_whitelist =
profile_tasks'. It allows to analyze CI to find some time regressions.
- Add skippy to CI's ansible.cfg
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Refactor 'Check if bootstrap is needed' as ansible loop. This allows
to add new elements easily without refactoring. Add pip to the list.
- Refactor 'Install python 2.x' task to run once if any of rc
codes != 0. Actually, need_bootstrap is array of hashes, so map will
allow to get single array of rc statuses. So if status is not zero it
will be sorted and the last element will be get, converted to bool.
Closes: #961
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
"shell" step doesn't support check mode, which currently leads to failures,
when Ansible is being run in check mode (because Ansible doesn't run command,
assuming that command might have effect, and no "rc" or "output" is registered).
Setting "check_mode: no" allows to run those "shell" commands in check mode
(which is safe, because those shell commands doesn't have side effects).
always_run was deprecated in Ansible 2.2 and will be removed in 2.4
ansible logs contain "[DEPRECATION WARNING]: always_run is deprecated.
Use check_mode = no instead". This patch fix deprecation.
Since systemd kubelet.service has {{ ssl_ca_dirs }}, fact should be
gathered before writing kubelet.service.
Closes: #1007
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Exclude kubelet CPU/RAM (kube-reserved) from cgroup. It decreases a
chance of overcommitment
- Add a possibility to modify Kubelet node-status-update-frequency
- Add a posibility to configure node-monitor-grace-period,
node-monitor-period, pod-eviction-timeout for Kubernetes controller
manager
- Add Kubernetes Relaibility Documentation with recomendations for
various scenarios.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
kubelet lost the ability to load kernel modules. This
puts that back by adding the lib/modules mount to kubelet.
The new variable kubelet_load_modules can be set to true
to enable this item. It is OFF by default.
Daemonsets cannot be simply upgraded through a single API call,
regardless of any kubectl documentation. The resource must be
purged and then recreated in order to make any changes.
Also make no-resolv unconditional again. Otherwise, we may end up in
a resolver loop. The resolver loop was the cause for the piling up
parallel queries.
Reduce election timeout to 5000ms (was 10000ms)
Raise heartbeat interval to 250ms (was 100ms)
Remove etcd cpu share (was 300)
Make etcd_cpu_limit and etcd_memory_limit optional.
Netchecker is rewritten in Go lang with some new args instead of
env variables. Also netchecker-server no longer requires kubectl
container. Updating playbooks accordingly.
- Remove weave CPU limits from .gitlab-ci.yml. Closes: #975
- Fix weave version in documentation
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Docker 1.12 and further don't need nsenter hack. This patch removes
it. Also, it bumps the minimal version to 1.12.
Closes#776
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
- Set recommended CPU settings
- Cleans up upgrade to weave 1.82. The original WeaveWorks
daemonset definition uses weave-net name.
- Limit DS creation to master
- Combined 2 tasks into one with better condition
When DNSMasq is configured to read its settings
from a folder ('-7' or '--conf-dir' option) it only
checks that the directory exists and doesn't fail if
it's empty. It could lead to a situation when DNSMasq
is running and handles requests, but not properly
configured, so some of queries can't be resolved.
For consistancy with kubernetes services we should use the same
hostname for nodes, which is 'ansible_hostname'.
Also fixing missed 'kube-node' in templates, Calico is installed
on 'k8s-cluster' roles, not only 'kube-node'.
It removes the teal lines when a host is skipped for a task. This makes the output less spammy and much easier to read. Empty TASK blocks are still included in the output, but that's ok.
Calico-rr is broken for deployments with separate k8s-master and
k8s-node roles. In order to fix it we should peer k8s-cluster
nodes with calico-rr, not just k8s-node. The same for peering
with routers.
Closes#925
* Drop linux capabilities for unprivileged containerized
worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Also adds calico-rr group if there are standalone etcd nodes.
Now if there are 50 or more nodes, 3 etcd nodes will be standalone.
If there are 200 or more nodes, 2 kube-masters will be standalone.
If thresholds are exceeded, kube-node group cannot add nodes that
belong to etcd or kube-master groups (according to above statements).
ndots creates overhead as every pod creates 5 concurrent connections
that are forwarded to sky dns. Under some circumstances dnsmasq may
prevent forwarding traffic with "Maximum number of concurrent DNS
queries reached" in the logs.
This patch allows to configure the number of concurrent forwarded DNS
queries "dns-forward-max" as well as "cache-size" leaving the default
values as they were before.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
This change modifies 020_check-create-pod and 030_check-network test cases to
target `kube-master[0]` instead of `node1` as these tests can be useful in
deployments that do not use the same naming convention as the basic tests.
This change also modifies 020_check-create-pod to namespace into a `test`
namespace allowing the `get pods` command to get its expected number of
running containers.
Closes#866 and #867.
systemctl daemon-reload should be run before when task modifies/creates
union for etcd. Otherwise etcd won't be able to start
Closes#892
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
be run by limit on each node without regard for order.
The changes make sure that all of the directories needed to do
certificate management are on the master[0] or etcd[0] node regardless
of when the playbook gets run on each node. This allows for separate
ansible playbook runs in parallel that don't have to be synchronized.
the openssl tools will fail to create signing requests because
the CN is too long. This is mainly a problem when FQDNs are used
in the inventory file.
THis will truncate the hostname for the CN field only at the
first dot. This should handle the issue for most cases.
Also remove the check for != "RedHat" when removing the dhclient hook,
as this had also to be done on other distros. Instead, check if the
dhclienthookfile is defined.
the tasks fail because selinux prevents ip-forwarding setting.
Moving the tasks around addresses two issues. Makes sure that
the correct python tools are in place before adjusting of selinux
and makes sure that ipforwarding is toggled after selinux adjustments.
[docker](https://www.docker.com/) v1.13 (see note)<br>
[rkt](https://coreos.com/rkt/docs/latest/) v1.21.0 (see Note 2)<br>
Note: kubernetes doesn't support newer docker versions. Among other things kubelet currently breaks on docker's non-standard version numbering (it no longer uses semantic versioning). To ensure auto-updates don't break your cluster look into e.g. yum versionlock plugin or apt pin).
Note 2: rkt support as docker alternative is limited to control plane (etcd and
kubelet). Docker is still used for Kubernetes cluster workloads and network
plugins' related OS services. Also note, only one of the supported network
plugins can be deployed for a given single cluster.
@@ -67,16 +73,19 @@ plugins can be deployed for a given single cluster.
Requirements
--------------
* **Ansible v2.4 (or newer) and python-netaddr is installed on the machine
that will run Ansible commands**
***Jinja 2.9 (or newer) is required to run the Ansible Playbooks**
* The target servers must have **access to the Internet** in order to pull docker images.
* The target servers are configured to allow **IPv4 forwarding**.
***Your ssh key must be copied** to all the servers part of your inventory.
* The **firewalls are not managed**, you'll need to implement your own rules the way you used to.
in order to avoid any issue during deployment you should disable your firewall.
* The target servers are configured to allow **IPv4 forwarding**.
***Copy your ssh keys** to all the servers part of your inventory.
***Ansible v2.2 (or newer) and python-netaddr**
## Network plugins
You can choose between 4 network plugins. (default: `flannel` with vxlan backend)
You can choose between 4 network plugins. (default: `calico`, except Vagrant uses `flannel`)
@@ -32,8 +32,7 @@ Conduct may be permanently removed from the project team.
This code of conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting a Kubernetes maintainer, Sarah Novotny <sarahnovotny@google.com>, and/or Dan Kohn <dan@linuxfoundation.org>.
This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.2.0, available at
@@ -53,7 +52,7 @@ The Kubernetes team does not condone any statements by speakers contrary to thes
team reserves the right to deny entrance and/or eject from an event (without refund) any individual found to
be engaging in discriminatory or offensive speech or actions.
Please bring any concerns to to the immediate attention of Kubernetes event staff
Please bring any concerns to the immediate attention of Kubernetes event staff.
# Deploying a Kargo Kubernetes Cluster with GlusterFS
# Deploying a Kubespray Kubernetes Cluster with GlusterFS
You can either deploy using Ansible on its own by supplying your own inventory file or by using Terraform to create the VMs and then providing a dynamic inventory to Ansible. The following two sections are self-contained, you don't need to go through one to use the other. So, if you want to provision with Terraform, you can skip the **Using an Ansible inventory** section, and if you want to provision with a pre-built ansible inventory, you can neglect the **Using Terraform and Ansible** section.
@@ -6,7 +6,7 @@ You can either deploy using Ansible on its own by supplying your own inventory f
In the same directory of this ReadMe file you should find a file named `inventory.example` which contains an example setup. Please note that, additionally to the Kubernetes nodes/masters, we define a set of machines for GlusterFS and we add them to the group `[gfs-cluster]`, which in turn is added to the larger `[network-storage]` group as a child group.
Change that file to reflect your local setup (adding more machines or removing them and setting the adequate ip numbers), and save it to `inventory/k8s_gfs_inventory`. Make sure that the settings on `inventory/group_vars/all.yml` make sense with your deployment. Then execute change to the kargo root folder, and execute (supposing that the machines are all using ubuntu):
Change that file to reflect your local setup (adding more machines or removing them and setting the adequate ip numbers), and save it to `inventory/k8s_gfs_inventory`. Make sure that the settings on `inventory/group_vars/all.yml` make sense with your deployment. Then execute change to the kubespray root folder, and execute (supposing that the machines are all using ubuntu):
First step is to fill in a `my-kargo-gluster-cluster.tfvars` file with the specification desired for your cluster. An example with all required variables would look like:
First step is to fill in a `my-kubespray-gluster-cluster.tfvars` file with the specification desired for your cluster. An example with all required variables would look like:
This will create both your Kubernetes and Gluster VMs. Make sure that the ansible file `contrib/terraform/openstack/group_vars/all.yml` includes any ansible variable that you want to setup (like, for instance, the type of machine for bootstrapping).
Then, provision your Kubernetes (Kargo) cluster with the following ansible call:
Then, provision your Kubernetes (kubespray) cluster with the following ansible call:
when:inventory_hostname == groups['kube-master'][0] and groups['gfs-cluster'] is defined and hostvars[groups['gfs-cluster'][0]].gluster_disk_size_gb is defined
* VPC with Public and Private Subnets in # Availability Zones
* Bastion Hosts and NAT Gateways in the Public Subnet
* A dynamic number of masters, etcd, and worker nodes in the Private Subnet
* even distributed over the # of Availability Zones
* AWS ELB in the Public Subnet for accessing the Kubernetes API from the internet
- A dynamic number of masters, etcd, and nodes can be created
- These scripts currently expect Private IP connectivity with the nodes that are created. This means that you may need a tunnel to your VPC or to run these scripts from a VM inside the VPC. Will be looking into how to work around this later.
**Requirements**
- Terraform 0.8.7 or newer
**How to Use:**
- Export the variables for your Amazon credentials:
- Export the variables for your AWS credentials or edit `credentials.tfvars`:
```
export AWS_ACCESS_KEY_ID="xxx"
export AWS_SECRET_ACCESS_KEY="yyy"
export AWS_ACCESS_KEY_ID="www"
export AWS_SECRET_ACCESS_KEY="xxx"
export AWS_SSH_KEY_NAME="yyy"
export AWS_DEFAULT_REGION="zzz"
```
- Rename `contrib/terraform/aws/terraform.tfvars.example` to `terraform.tfvars`
- Update `contrib/terraform/aws/terraform.tfvars` with your data
- Allocate a new AWS Elastic IP. Use this for your `loadbalancer_apiserver_address` value (below)
- Create an AWS EC2 SSH Key
- Run with `terraform apply --var-file="credentials.tfvars"` or `terraform apply` depending if you exported your AWS credentials
-Update contrib/terraform/aws/terraform.tfvars with your data
-Terraform automatically creates an Ansible Inventory file called `hosts` with the created infrastructure in the directory `inventory`
-Run with `terraform apply`
-Ansible will automatically generate an ssh config file for your bastion hosts. To make use of it, make sure you have a line in your `ansible.cfg` file that looks like the following:
- Update the inventory creation file to be something a little more reasonable. It's just a local-exec from Terraform now, using terraform.py or something may make sense in the future.
**Troubleshooting**
***Remaining AWS IAM Instance Profile***:
If the cluster was destroyed without using Terraform it is possible that
the AWS IAM Instance Profiles still remain. To delete them you can use
the `AWS CLI` with the following command:
```
aws iam delete-instance-profile --region <region_name> --instance-profile-name <profile_name>
```
***Ansible Inventory doesnt get created:***
It could happen that Terraform doesnt create an Ansible Inventory file automatically. If this is the case copy the output after `inventory=` and create a file named `hosts`in the directory `inventory` and paste the inventory into the file.
**Architecture**
Pictured is an AWS Infrastructure created with this Terraform project distributed over two Availability Zones.

There are some assumptions made to try and ensure it will work on your openstack cluster.
* floating-ips are used for access, but you can have masters and nodes that don't use floating-ips if needed. You need currently at least 1 floating ip, which we would suggest is used on a master.
* floating-ips are used for access, but you can have masters and nodes that don't use floating-ips if needed. You need currently at least 1 floating ip, which needs to be used on a master. If using more than one, at least one should be on a master for bastions to work fine.
* you already have a suitable OS image in glance
* you already have both an internal network and a floating-ip pool created
* you have security-groups enabled
@@ -30,12 +30,14 @@ requirements.
#### OpenStack
Ensure your OpenStack credentials are loaded in environment variables. This can be done by downloading a credentials .rc file from your OpenStack dashboard and sourcing it:
Ensure your OpenStack**Identity v2** credentials are loaded in environment variables. This can be done by downloading a credentials .rc file from your OpenStack dashboard and sourcing it:
```
$ source ~/.stackrc
```
> You must set **OS_REGION_NAME** and **OS_TENANT_ID** environment variables not required by openstack CLI
You will need two networks before installing, an internal network and
an external (floating IP Pool) network. The internet network can be shared as
we use security groups to provide network segregation. Due to the many
If you want to provision master or node VMs that don't use floating ips, write on a `my-terraform-vars.tfvars` file, for example:
##### Alternative: etcd inside masters
If you want to provision master or node VMs that don't use floating ips and where etcd is inside masters, write on a `my-terraform-vars.tfvars` file, for example:
```
number_of_k8s_masters = "1"
@@ -83,10 +87,32 @@ number_of_k8s_nodes = "0"
```
This will provision one VM as master using a floating ip, two additional masters using no floating ips (these will only have private ips inside your tenancy) and one VM as node, again without a floating ip.
##### Alternative: etcd on separate machines
If you want to provision master or node VMs that don't use floating ips and where **etcd is on separate nodes from Kubernetes masters**, write on a `my-terraform-vars.tfvars` file, for example:
This will provision one VM as master using a floating ip, two additional masters using no floating ips (these will only have private ips inside your tenancy), two VMs as nodes with floating ips, one VM as node without floating ip and three VMs for etcd.
##### Alternative: add GlusterFS
Additionally, now the terraform based installation supports provisioning of a GlusterFS shared file system based on a separate set of VMs, running either a Debian or RedHat based set of VMs. To enable this, you need to add to your `my-terraform-vars.tfvars` the following variables:
```
# Flavour depends on your openstack installation, you can get available flavours through `nova list-flavors`
# Flavour depends on your openstack installation, you can get available flavours through `nova flavor-list`
# This is the name of an image already available in your openstack installation.
image_gfs = "Ubuntu 15.10"
@@ -99,6 +125,48 @@ ssh_user_gfs = "ubuntu"
If these variables are provided, this will give rise to a new ansible group called `gfs-cluster`, for which we have added ansible roles to execute in the ansible provisioning step. If you are using Container Linux by CoreOS, these GlusterFS VM necessarily need to be either Debian or RedHat based VMs, Container Linux by CoreOS cannot serve GlusterFS, but can connect to it through binaries available on hyperkube v1.4.3_coreos.0 or higher.
GlusterFS is not deployed by the standard `cluster.yml` playbook, see the [glusterfs playbook documentation](../../network-storage/glusterfs/README.md) for instructions.
# Configure Cluster variables
Edit `inventory/group_vars/all.yml`:
- Set variable **bootstrap_os** according selected image
To deploy kubespray on [AWS](https://aws.amazon.com/) uncomment the `cloud_provider` option in `group_vars/all.yml` and set it to `'aws'`.
Prior to creating your instances, you **must** ensure that you have created IAM roles and policies for both "kubernetes-master" and "kubernetes-node". You can find the IAM policies [here](https://github.com/kubernetes/kubernetes/tree/master/cluster/aws/templates/iam). See the [IAM Documentation](https://aws.amazon.com/documentation/iam/) if guidance is needed on how to set these up. When you bring your instances online, associate them with the respective IAM role. Nodes that are only to be used for Etcd do not need a role.
Prior to creating your instances, you **must** ensure that you have created IAM roles and policies for both "kubernetes-master" and "kubernetes-node". You can find the IAM policies [here](https://github.com/kubernetes-incubator/kubespray/tree/master/contrib/aws_iam/). See the [IAM Documentation](https://aws.amazon.com/documentation/iam/) if guidance is needed on how to set these up. When you bring your instances online, associate them with the respective IAM role. Nodes that are only to be used for Etcd do not need a role.
You would also need to tag the resources in your VPC accordingly for the aws provider to utilize them. Tag the subnets and all instances that kubernetes will be run on with key `kuberentes.io/cluster/$cluster_name` (`$cluster_name` must be a unique identifier for the cluster). Tag the subnets that must be targetted by external ELBs with the key `kubernetes.io/role/elb` and internal ELBs with the key `kubernetes.io/role/internal-elb`.
Make sure your VPC has both DNS Hostnames support and Private DNS enabled.
The next step is to make sure the hostnames in your `inventory` file are identical to your internal hostnames in AWS. This may look something like `ip-111-222-333-444.us-west-2.compute.internal`. You can then specify how Ansible connects to these instances with `ansible_ssh_host` and `ansible_ssh_user`.
You can now create your cluster!
### Dynamic Inventory ###
There is also a dynamic inventory script for AWS that can be used if desired. However, be aware that it makes some certain assumptions about how you'll create your inventory. It also does not handle all use cases and groups that we may use as part of more advanced deployments. Additions welcome.
This will produce an inventory that is passed into Ansible that looks like the following:
```
{
"_meta": {
"hostvars": {
"ip-172-31-3-xxx.us-east-2.compute.internal": {
"ansible_ssh_host": "172.31.3.xxx"
},
"ip-172-31-8-xxx.us-east-2.compute.internal": {
"ansible_ssh_host": "172.31.8.xxx"
}
}
},
"etcd": [
"ip-172-31-3-xxx.us-east-2.compute.internal"
],
"k8s-cluster": {
"children": [
"kube-master",
"kube-node"
]
},
"kube-master": [
"ip-172-31-3-xxx.us-east-2.compute.internal"
],
"kube-node": [
"ip-172-31-8-xxx.us-east-2.compute.internal"
]
}
```
Guide:
- Create instances in AWS as needed.
- Either during or after creation, add tags to the instances with a key of `kubespray-role` and a value of `kube-master`, `etcd`, or `kube-node`. You can also share roles like `kube-master, etcd`
- Copy the `kubespray-aws-inventory.py` script from `kubespray/contrib/aws_inventory` to the `kubespray/inventory` directory.
- Set the following AWS credentials and info as environment variables in your terminal:
```
export AWS_ACCESS_KEY_ID="xxxxx"
export AWS_SECRET_ACCESS_KEY="yyyyy"
export REGION="us-east-2"
```
- We will now create our cluster. There will be either one or two small changes. The first is that we will specify `-i inventory/kubespray-aws-inventory.py` as our inventory script. The other is conditional. If your AWS instances are public facing, you can set the `VPC_VISIBILITY` variable to `public` and that will result in public IP and DNS names being passed into the inventory. This causes your cluster.yml command to look like `VPC_VISIBILITY="public" ansible-playbook ... cluster.yml`
To deploy kubespray on [Azure](https://azure.microsoft.com) uncomment the `cloud_provider` option in `group_vars/all.yml` and set it to `'azure'`.
To deploy Kubernetes on [Azure](https://azure.microsoft.com) uncomment the `cloud_provider` option in `group_vars/all.yml` and set it to `'azure'`.
All your instances are required to run in a resource group and a routing table has to be attached to the subnet your instances are in.
@@ -49,8 +49,8 @@ This is the AppId from the last command
- Create the role assignment with:
`azure role assignment create --spn http://kubernetes -o "Owner" -c /subscriptions/SUBSCRIPTION_ID`
azure\_aad\_client\_id musst be set to the AppId, azure\_aad\_client\_secret is your choosen secret.
azure\_aad\_client\_id must be set to the AppId, azure\_aad\_client\_secret is your choosen secret.
## Provisioning Azure with Resource Group Templates
You'll find Resource Group Templates and scripts to provision the required infrastructore to Azure in [*contrib/azurerm*](../contrib/azurerm/README.md)
You'll find Resource Group Templates and scripts to provision the required infrastructure to Azure in [*contrib/azurerm*](../contrib/azurerm/README.md)
##### Optional : Define default endpoint to host action
By default Calico blocks traffic from endpoints to the host itself by using an iptables DROP action. When using it in kubernetes the action has to be changed to RETURN (default in kubespray) or ACCEPT (see https://github.com/projectcalico/felix/issues/660 and https://github.com/projectcalico/calicoctl/issues/1389). Otherwise all network packets from pods (with hostNetwork=False) to services endpoints (with hostNetwork=True) withing the same node are dropped.
To re-define default action please set the following variable in your inventory:
```
calico_endpoint_to_host_action: "ACCEPT"
```
Cloud providers configuration
=============================
Please refer to the official documentation, for example [GCE configuration](http://docs.projectcalico.org/v1.5/getting-started/docker/installation/gce) requires a security rule for calico ip-ip tunnels. Note, calico is always configured with ``ipip: true`` if the cloud provider was defined.
- You should set the bootstrap_os variable to `coreos`
-Ensure that the bin_dir is set to `/opt/bin`
- ansible_python_interpreter should be `/opt/bin/python`. This will be laid down by the bootstrap task.
- The default resolvconf_mode setting of `docker_dns`**does not** work for CoreOS. This is because we do not edit the systemd service file for docker on CoreOS nodes. Instead, just use the `host_resolvconf` mode. It should work out of the box.
Then you can proceed to [cluster deployment](#run-deployment)
See more details in the [ansible guide](ansible.md).
Adding nodes
------------
You may want to add **worker** nodes to your existing cluster. This can be done by re-running the `cluster.yml` playbook, or you can target the bare minimum needed to get kubelet installed on the worker and talking to your masters. This is especially helpful when doing something like autoscaling your clusters.
- Add the new worker node to your inventory under kube-node (or utilize a [dynamic inventory](https://docs.ansible.com/ansible/intro_dynamic_inventory.html)).
- Run the ansible-playbook command, substituting `scale.yml` for `cluster.yml`:
6. Create a new branch which you will use in your working environment:
```git checkout -b work```
***Never*** use master branch of your repository for your commits.
7. Modify path to library and roles in your ansible.cfg file (role naming should be uniq, you may have to rename your existent roles if they have same names as kubespray project):
```
...
library = 3d/kubespray/library/
roles_path = 3d/kubespray/roles/
...
```
8. Copy and modify configs from kubespray `group_vars` folder to corresponging `group_vars` folder in your existent project.
You could rename *all.yml* config to something else, i.e. *kubespray.yml* and create corresponding group in your inventory file, which will include all hosts groups related to kubernetes setup.
9. Modify your ansible inventory file by adding mapping of your existent groups (if any) to kubespray naming.
For example:
```
...
#Kargo groups:
[kube-node:children]
kubenode
[k8s-cluster:children]
kubernetes
[etcd:children]
kubemaster
kubemaster-ha
[kube-master:children]
kubemaster
kubemaster-ha
[vault:children]
kube-master
[kubespray:children]
kubernetes
```
* Last entry here needed to apply kubespray.yml config file, renamed from all.yml of kubespray project.
10. Now you can include kargo tasks in you existent playbooks by including cluster.yml file:
```
- name: Include kargo tasks
include: 3d/kubespray/cluster.yml
```
Or your could copy separate tasks from cluster.yml into your ansible repository.
11. Commit changes to your ansible repo. Keep in mind, that submodule folder is just a link to the git commit hash of your forked repo.
When you update your "work" branch you need to commit changes to ansible repo as well.
Other members of your team should use ```git submodule sync```, ```git submodule update --init``` to get actual code from submodule.
# Contributing
If you made useful changes or fixed a bug in existent kubespray repo, use this flow for PRs to original kubespray repo.
0. Sign the [CNCF CLA](https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ).
1. Change working directory to git submodule directory (3d/kubespray).
2. Setup desired user.name and user.email for submodule.
If kubespray is only one submodule in your repo you could use something like:
4. Create new branch for the specific fixes that you want to contribute:
```git checkout -b fixes-name-date-index```
Branch name should be self explaining to you, adding date and/or index will help you to track/delete your old PRs.
5. Find git hash of your commit in "work" repo and apply it to newly created "fix" repo:
```
git cherry-pick <COMMIT_HASH>
```
6. If your have several temporary-stage commits - squash them using [```git rebase -i```](http://eli.thegreenplace.net/2014/02/19/squashing-github-pull-requests-into-a-single-commit)
Also you could use interactive rebase (```git rebase -i HEAD~10```) to delete commits which you don't want to contribute into original repo.
7. When your changes is in place, you need to check upstream repo one more time because it could be changed during your work.
Check that you're on correct branch:
```git status```
And pull changes from upstream (if any):
```git pull --rebase upstream master```
8. Now push your changes to your **fork** repo with ```git push```. If your branch doesn't exists on github, git will propose you to use something like ```git push --set-upstream origin fixes-name-date-index```.
9. Open you forked repo in browser, on the main page you will see proposition to create pull request for your newly created branch. Check proposed diff of your PR. If something is wrong you could safely delete "fix" branch on github using ```git push origin --delete fixes-name-date-index```, ```git branch -D fixes-name-date-index``` and start whole process from the beginning.
If everything is fine - add description about your changes (what they do and why they're needed) and confirm pull request creation.
-[ ] Terraform to provision instances on **GCE, AWS, Openstack, Digital Ocean, Azure**
-[ ] On AWS autoscaling, multi AZ
-[ ] On Azure autoscaling, create loadbalancer [#297](https://github.com/kubespray/kubespray/issues/297)
-[ ] On GCE be able to create a loadbalancer automatically (IAM ?) [#280](https://github.com/kubespray/kubespray/issues/280)
-[x]**TLS boostrap** support for kubelet (covered by kubeadm, but not in standard deployment) [#234](https://github.com/kubespray/kubespray/issues/234)
As mentioned above, components are upgraded in the order in which they were
installed in the Ansible playbook. The order of component installation is as
follows:
# Docker
# etcd
# kubelet and kube-proxy
# network_plugin (such as Calico or Weave)
# kube-apiserver, kube-scheduler, and kube-controller-manager
# Add-ons (such as KubeDNS)
* Docker
* etcd
* kubelet and kube-proxy
* network_plugin (such as Calico or Weave)
* kube-apiserver, kube-scheduler, and kube-controller-manager
* Add-ons (such as KubeDNS)
#### Upgrade considerations
Kubespray supports rotating certificates used for etcd and Kubernetes
components, but some manual steps may be required. If you have a pod that
requires use of a service token and is deployed in a namespace other than
`kube-system`, you will need to manually delete the affected pods after
rotating certificates. This is because all service account tokens are dependent
on the apiserver token that is used to generate them. When the certificate
rotates, all service account tokens must be rotated as well. During the
kubernetes-apps/rotate_tokens role, only pods in kube-system are destroyed and
recreated. All other invalidated service account tokens are cleaned up
automatically, but other pods are not deleted out of an abundance of caution
for impact to user deployed pods.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.