Upcloud: Add possibility to setup cluster using nodes with no public IPs (#11696)

* terraform upcloud: Added possibility to set up nodes with only private IPs

* terraform upcloud: add support for gateway in private zone

* terraform upcloud: split LB proxy protocol config per backend

* terraform upcloud: fix flexible plans

* terraform upcloud: Removed overview of cluster setup

---------

Co-authored-by: davidumea <david.andersson@elastisys.com>
This commit is contained in:
Fredrik Liv
2025-04-01 16:58:42 +02:00
committed by GitHub
parent fe2ab898b8
commit 6f74ef17f7
9 changed files with 296 additions and 119 deletions

View File

@@ -2,35 +2,6 @@
Provision a Kubernetes cluster on [UpCloud](https://upcloud.com/) using Terraform and Kubespray
## Overview
The setup looks like following
```text
Kubernetes cluster
+--------------------------+
| +--------------+ |
| | +--------------+ |
| --> | | | |
| | | Master/etcd | |
| | | node(s) | |
| +-+ | |
| +--------------+ |
| ^ |
| | |
| v |
| +--------------+ |
| | +--------------+ |
| --> | | | |
| | | Worker | |
| | | node(s) | |
| +-+ | |
| +--------------+ |
+--------------------------+
```
The nodes uses a private network for node to node communication and a public interface for all external communication.
## Requirements
* Terraform 0.13.0 or newer
@@ -100,6 +71,8 @@ terraform destroy --var-file cluster-settings.tfvars \
* `template_name`: The name or UUID of a base image
* `username`: a user to access the nodes, defaults to "ubuntu"
* `private_network_cidr`: CIDR to use for the private network, defaults to "172.16.0.0/24"
* `dns_servers`: DNS servers that will be used by the nodes. Until [this is solved](https://github.com/UpCloudLtd/terraform-provider-upcloud/issues/562) this is done using user_data to reconfigure resolved. Defaults to `[]`
* `use_public_ips`: If a NIC connencted to the Public network should be attached to all nodes by default. Can be overridden by `force_public_ip` if this is set to `false`. Defaults to `true`
* `ssh_public_keys`: List of public SSH keys to install on all machines
* `zone`: The zone where to run the cluster
* `machines`: Machines to provision. Key of this object will be used as the name of the machine
@@ -108,6 +81,8 @@ terraform destroy --var-file cluster-settings.tfvars \
* `cpu`: number of cpu cores
* `mem`: memory size in MB
* `disk_size`: The size of the storage in GB
* `force_public_ip`: If `use_public_ips` is set to `false`, this forces a public NIC onto the machine anyway when set to `true`. Useful if you're migrating from public nodes to only private. Defaults to `false`
* `dns_servers`: This works the same way as the global `dns_severs` but only applies to a single node. If set to `[]` while the global `dns_servers` is set to something else, then it will not add the user_data and thus will not be recreated. Useful if you're migrating from public nodes to only private. Defaults to `null`
* `additional_disks`: Additional disks to attach to the node.
* `size`: The size of the additional disk in GB
* `tier`: The tier of disk to use (`maxiops` is the only one you can choose atm)
@@ -139,6 +114,7 @@ terraform destroy --var-file cluster-settings.tfvars \
* `port`: Port to load balance.
* `target_port`: Port to the backend servers.
* `backend_servers`: List of servers that traffic to the port should be forwarded to.
* `proxy_protocol`: If the loadbalancer should set up the backend using proxy protocol.
* `router_enable`: If a router should be connected to the private network or not
* `gateways`: Gateways that should be connected to the router, requires router_enable is set to true
* `features`: List of features for the gateway
@@ -171,3 +147,27 @@ terraform destroy --var-file cluster-settings.tfvars \
* `server_groups`: Group servers together
* `servers`: The servers that should be included in the group.
* `anti_affinity_policy`: Defines if a server group is an anti-affinity group. Setting this to "strict" or yes" will result in all servers in the group being placed on separate compute hosts. The value can be "strict", "yes" or "no". "strict" refers to strict policy doesn't allow servers in the same server group to be on the same host. "yes" refers to best-effort policy and tries to put servers on different hosts, but this is not guaranteed.
## Migration
When `null_resource.inventories` and `data.template_file.inventory` was changed to `local_file.inventory` the old state file needs to be cleaned of the old state.
The error messages you'll see if you encounter this is:
```text
Error: failed to read schema for null_resource.inventories in registry.terraform.io/hashicorp/null: failed to instantiate provider "registry.terraform.io/hashicorp/null" to obtain schema: unavailable provider "registry.terraform.io/hashicorp/null"
Error: failed to read schema for data.template_file.inventory in registry.terraform.io/hashicorp/template: failed to instantiate provider "registry.terraform.io/hashicorp/template" to obtain schema: unavailable provider "registry.terraform.io/hashicorp/template"
```
This can be fixed with the following lines
```bash
terraform state rm -state=terraform.tfstate null_resource.inventories
terraform state rm -state=terraform.tfstate data.template_file.inventory
```
### Public to Private only migration
Since there's no way to remove the public NIC on a machine without recreating its private NIC it's not possible to inplace change a cluster to only use private IPs.
The way to migrate is to first set `use_public_ips` to `false`, `dns_servers` to some DNS servers and then update all existing servers to have `force_public_ip` set to `true` and `dns_severs` set to `[]`.
After that you can add new nodes without `force_public_ip` and `dns_servers` set and create them.
Add the new nodes into the cluster and when all of them are added, remove the old nodes.

View File

@@ -122,11 +122,11 @@ k8s_allowed_remote_ips = [
master_allowed_ports = []
worker_allowed_ports = []
loadbalancer_enabled = false
loadbalancer_plan = "development"
loadbalancer_proxy_protocol = false
loadbalancer_enabled = false
loadbalancer_plan = "development"
loadbalancers = {
# "http" : {
# "proxy_protocol" : false
# "port" : 80,
# "target_port" : 80,
# "backend_servers" : [

View File

@@ -20,24 +20,26 @@ module "kubernetes" {
username = var.username
private_network_cidr = var.private_network_cidr
dns_servers = var.dns_servers
use_public_ips = var.use_public_ips
machines = var.machines
ssh_public_keys = var.ssh_public_keys
firewall_enabled = var.firewall_enabled
firewall_default_deny_in = var.firewall_default_deny_in
firewall_default_deny_out = var.firewall_default_deny_out
master_allowed_remote_ips = var.master_allowed_remote_ips
k8s_allowed_remote_ips = var.k8s_allowed_remote_ips
master_allowed_ports = var.master_allowed_ports
worker_allowed_ports = var.worker_allowed_ports
firewall_enabled = var.firewall_enabled
firewall_default_deny_in = var.firewall_default_deny_in
firewall_default_deny_out = var.firewall_default_deny_out
master_allowed_remote_ips = var.master_allowed_remote_ips
k8s_allowed_remote_ips = var.k8s_allowed_remote_ips
bastion_allowed_remote_ips = var.bastion_allowed_remote_ips
master_allowed_ports = var.master_allowed_ports
worker_allowed_ports = var.worker_allowed_ports
loadbalancer_enabled = var.loadbalancer_enabled
loadbalancer_plan = var.loadbalancer_plan
loadbalancer_outbound_proxy_protocol = var.loadbalancer_proxy_protocol ? "v2" : ""
loadbalancer_legacy_network = var.loadbalancer_legacy_network
loadbalancers = var.loadbalancers
loadbalancer_enabled = var.loadbalancer_enabled
loadbalancer_plan = var.loadbalancer_plan
loadbalancer_legacy_network = var.loadbalancer_legacy_network
loadbalancers = var.loadbalancers
router_enable = var.router_enable
gateways = var.gateways
@@ -52,32 +54,12 @@ module "kubernetes" {
# Generate ansible inventory
#
data "template_file" "inventory" {
template = file("${path.module}/templates/inventory.tpl")
vars = {
connection_strings_master = join("\n", formatlist("%s ansible_user=ubuntu ansible_host=%s ip=%s etcd_member_name=etcd%d",
keys(module.kubernetes.master_ip),
values(module.kubernetes.master_ip).*.public_ip,
values(module.kubernetes.master_ip).*.private_ip,
range(1, length(module.kubernetes.master_ip) + 1)))
connection_strings_worker = join("\n", formatlist("%s ansible_user=ubuntu ansible_host=%s ip=%s",
keys(module.kubernetes.worker_ip),
values(module.kubernetes.worker_ip).*.public_ip,
values(module.kubernetes.worker_ip).*.private_ip))
list_master = join("\n", formatlist("%s",
keys(module.kubernetes.master_ip)))
list_worker = join("\n", formatlist("%s",
keys(module.kubernetes.worker_ip)))
}
}
resource "null_resource" "inventories" {
provisioner "local-exec" {
command = "echo '${data.template_file.inventory.rendered}' > ${var.inventory_file}"
}
triggers = {
template = data.template_file.inventory.rendered
}
resource "local_file" "inventory" {
content = templatefile("${path.module}/templates/inventory.tpl", {
master_ip = module.kubernetes.master_ip
worker_ip = module.kubernetes.worker_ip
bastion_ip = module.kubernetes.bastion_ip
username = var.username
})
filename = var.inventory_file
}

View File

@@ -53,6 +53,44 @@ locals {
# If prefix is set, all resources will be prefixed with "${var.prefix}-"
# Else don't prefix with anything
resource-prefix = "%{if var.prefix != ""}${var.prefix}-%{endif}"
master_ip = {
for instance in upcloud_server.master :
instance.hostname => {
for nic in instance.network_interface :
nic.type => nic.ip_address
if nic.ip_address != null
}
}
worker_ip = {
for instance in upcloud_server.worker :
instance.hostname => {
for nic in instance.network_interface :
nic.type => nic.ip_address
if nic.ip_address != null
}
}
bastion_ip = {
for instance in upcloud_server.bastion :
instance.hostname => {
for nic in instance.network_interface :
nic.type => nic.ip_address
if nic.ip_address != null
}
}
node_user_data = {
for name, machine in var.machines :
name => <<EOF
%{ if ( length(machine.dns_servers != null ? machine.dns_servers : [] ) > 0 ) || ( length(var.dns_servers) > 0 && machine.dns_servers == null ) ~}
#!/bin/bash
echo -e "[Resolve]\nDNS=${ join(" ", length(machine.dns_servers != null ? machine.dns_servers : []) > 0 ? machine.dns_servers : var.dns_servers) }" > /etc/systemd/resolved.conf
systemctl restart systemd-resolved
%{ endif ~}
EOF
}
}
resource "upcloud_network" "private" {
@@ -62,6 +100,9 @@ resource "upcloud_network" "private" {
ip_network {
address = var.private_network_cidr
dhcp_default_route = var.router_enable
# TODO: When support for dhcp_dns for private networks are in, remove the user_data and enable it here.
# See more here https://github.com/UpCloudLtd/terraform-provider-upcloud/issues/562
# dhcp_dns = length(var.private_network_dns) > 0 ? var.private_network_dns : null
dhcp = true
family = "IPv4"
}
@@ -89,8 +130,8 @@ resource "upcloud_server" "master" {
hostname = "${local.resource-prefix}${each.key}"
plan = each.value.plan
cpu = each.value.plan == null ? null : each.value.cpu
mem = each.value.plan == null ? null : each.value.mem
cpu = each.value.cpu
mem = each.value.mem
zone = var.zone
server_group = each.value.server_group == null ? null : upcloud_server_group.server_groups[each.value.server_group].id
@@ -99,9 +140,12 @@ resource "upcloud_server" "master" {
size = each.value.disk_size
}
# Public network interface
network_interface {
type = "public"
dynamic "network_interface" {
for_each = each.value.force_public_ip || var.use_public_ips ? [1] : []
content {
type = "public"
}
}
# Private network interface
@@ -136,6 +180,9 @@ resource "upcloud_server" "master" {
keys = var.ssh_public_keys
create_password = false
}
metadata = local.node_user_data[each.key] != "" ? true : null
user_data = local.node_user_data[each.key] != "" ? local.node_user_data[each.key] : null
}
resource "upcloud_server" "worker" {
@@ -147,8 +194,8 @@ resource "upcloud_server" "worker" {
hostname = "${local.resource-prefix}${each.key}"
plan = each.value.plan
cpu = each.value.plan == null ? null : each.value.cpu
mem = each.value.plan == null ? null : each.value.mem
cpu = each.value.cpu
mem = each.value.mem
zone = var.zone
server_group = each.value.server_group == null ? null : upcloud_server_group.server_groups[each.value.server_group].id
@@ -158,9 +205,12 @@ resource "upcloud_server" "worker" {
size = each.value.disk_size
}
# Public network interface
network_interface {
type = "public"
dynamic "network_interface" {
for_each = each.value.force_public_ip || var.use_public_ips ? [1] : []
content {
type = "public"
}
}
# Private network interface
@@ -195,6 +245,63 @@ resource "upcloud_server" "worker" {
keys = var.ssh_public_keys
create_password = false
}
metadata = local.node_user_data[each.key] != "" ? true : null
user_data = local.node_user_data[each.key] != "" ? local.node_user_data[each.key] : null
}
resource "upcloud_server" "bastion" {
for_each = {
for name, machine in var.machines :
name => machine
if machine.node_type == "bastion"
}
hostname = "${local.resource-prefix}${each.key}"
plan = each.value.plan
cpu = each.value.cpu
mem = each.value.mem
zone = var.zone
server_group = each.value.server_group == null ? null : upcloud_server_group.server_groups[each.value.server_group].id
template {
storage = var.template_name
size = each.value.disk_size
}
# Private network interface
network_interface {
type = "private"
network = upcloud_network.private.id
}
# Private network interface
network_interface {
type = "public"
}
firewall = var.firewall_enabled
dynamic "storage_devices" {
for_each = {
for disk_key_name, disk in upcloud_storage.additional_disks :
disk_key_name => disk
# Only add the disk if it matches the node name in the start of its name
if length(regexall("^${each.key}_.+", disk_key_name)) > 0
}
content {
storage = storage_devices.value.id
}
}
# Include at least one public SSH key
login {
user = var.username
keys = var.ssh_public_keys
create_password = false
}
}
resource "upcloud_firewall_rules" "master" {
@@ -543,6 +650,53 @@ resource "upcloud_firewall_rules" "k8s" {
}
}
resource "upcloud_firewall_rules" "bastion" {
for_each = upcloud_server.bastion
server_id = each.value.id
dynamic "firewall_rule" {
for_each = var.bastion_allowed_remote_ips
content {
action = "accept"
comment = "Allow bastion SSH access from this network"
destination_port_end = "22"
destination_port_start = "22"
direction = "in"
family = "IPv4"
protocol = "tcp"
source_address_end = firewall_rule.value.end_address
source_address_start = firewall_rule.value.start_address
}
}
dynamic "firewall_rule" {
for_each = length(var.bastion_allowed_remote_ips) > 0 ? [1] : []
content {
action = "drop"
comment = "Drop bastion SSH access from other networks"
destination_port_end = "22"
destination_port_start = "22"
direction = "in"
family = "IPv4"
protocol = "tcp"
source_address_end = "255.255.255.255"
source_address_start = "0.0.0.0"
}
}
firewall_rule {
action = var.firewall_default_deny_in ? "drop" : "accept"
direction = "in"
}
firewall_rule {
action = var.firewall_default_deny_out ? "drop" : "accept"
direction = "out"
}
}
resource "upcloud_loadbalancer" "lb" {
count = var.loadbalancer_enabled ? 1 : 0
configured_status = "started"
@@ -583,7 +737,7 @@ resource "upcloud_loadbalancer_backend" "lb_backend" {
loadbalancer = upcloud_loadbalancer.lb[0].id
name = "lb-backend-${each.key}"
properties {
outbound_proxy_protocol = var.loadbalancer_outbound_proxy_protocol
outbound_proxy_protocol = each.value.proxy_protocol ? "v2" : ""
}
}
@@ -622,7 +776,7 @@ resource "upcloud_loadbalancer_static_backend_member" "lb_backend_member" {
backend = upcloud_loadbalancer_backend.lb_backend[each.value.lb_name].id
name = "${local.resource-prefix}${each.key}"
ip = merge(upcloud_server.master, upcloud_server.worker)[each.value.server_name].network_interface[1].ip_address
ip = merge(local.master_ip, local.worker_ip)["${local.resource-prefix}${each.value.server_name}"].private
port = each.value.port
weight = 100
max_sessions = var.loadbalancer_plan == "production-small" ? 50000 : 1000
@@ -662,7 +816,7 @@ resource "upcloud_router" "router" {
resource "upcloud_gateway" "gateway" {
for_each = var.router_enable ? var.gateways : {}
name = "${local.resource-prefix}${each.key}-gateway"
zone = var.zone
zone = var.private_cloud ? var.public_zone : var.zone
features = each.value.features
plan = each.value.plan

View File

@@ -1,22 +1,13 @@
output "master_ip" {
value = {
for instance in upcloud_server.master :
instance.hostname => {
"public_ip" : instance.network_interface[0].ip_address
"private_ip" : instance.network_interface[1].ip_address
}
}
value = local.master_ip
}
output "worker_ip" {
value = {
for instance in upcloud_server.worker :
instance.hostname => {
"public_ip" : instance.network_interface[0].ip_address
"private_ip" : instance.network_interface[1].ip_address
}
}
value = local.worker_ip
}
output "bastion_ip" {
value = local.bastion_ip
}
output "loadbalancer_domain" {

View File

@@ -20,15 +20,21 @@ variable "username" {}
variable "private_network_cidr" {}
variable "dns_servers" {}
variable "use_public_ips" {}
variable "machines" {
description = "Cluster machines"
type = map(object({
node_type = string
plan = string
cpu = string
mem = string
cpu = optional(number)
mem = optional(number)
disk_size = number
server_group : string
force_public_ip : optional(bool, false)
dns_servers : optional(set(string))
additional_disks = map(object({
size = number
tier = string
@@ -58,6 +64,13 @@ variable "k8s_allowed_remote_ips" {
}))
}
variable "bastion_allowed_remote_ips" {
type = list(object({
start_address = string
end_address = string
}))
}
variable "master_allowed_ports" {
type = list(object({
protocol = string
@@ -94,10 +107,6 @@ variable "loadbalancer_plan" {
type = string
}
variable "loadbalancer_outbound_proxy_protocol" {
type = string
}
variable "loadbalancer_legacy_network" {
type = bool
default = false
@@ -107,6 +116,7 @@ variable "loadbalancers" {
description = "Load balancers"
type = map(object({
proxy_protocol = bool
port = number
target_port = number
allow_internal_frontend = optional(bool)

View File

@@ -7,6 +7,10 @@ output "worker_ip" {
value = module.kubernetes.worker_ip
}
output "bastion_ip" {
value = module.kubernetes.bastion_ip
}
output "loadbalancer_domain" {
value = module.kubernetes.loadbalancer_domain
}

View File

@@ -1,17 +1,33 @@
[all]
${connection_strings_master}
${connection_strings_worker}
%{ for name, ips in master_ip ~}
${name} ansible_user=${username} ansible_host=${lookup(ips, "public", ips.private)} ip=${ips.private}
%{ endfor ~}
%{ for name, ips in worker_ip ~}
${name} ansible_user=${username} ansible_host=${lookup(ips, "public", ips.private)} ip=${ips.private}
%{ endfor ~}
[kube_control_plane]
${list_master}
%{ for name, ips in master_ip ~}
${name}
%{ endfor ~}
[etcd]
${list_master}
%{ for name, ips in master_ip ~}
${name}
%{ endfor ~}
[kube_node]
${list_worker}
%{ for name, ips in worker_ip ~}
${name}
%{ endfor ~}
[k8s_cluster:children]
kube_control_plane
kube_node
%{ if length(bastion_ip) > 0 ~}
[bastion]
%{ for name, ips in bastion_ip ~}
bastion ansible_user=${username} ansible_host=${ips.public}
%{ endfor ~}
%{ endif ~}

View File

@@ -32,16 +32,31 @@ variable "private_network_cidr" {
default = "172.16.0.0/24"
}
variable "dns_servers" {
description = "DNS servers that will be used by the nodes. Until [this is solved](https://github.com/UpCloudLtd/terraform-provider-upcloud/issues/562) this is done using user_data to reconfigure resolved"
type = set(string)
default = []
}
variable "use_public_ips" {
description = "If all nodes should get a public IP"
type = bool
default = true
}
variable "machines" {
description = "Cluster machines"
type = map(object({
node_type = string
plan = string
cpu = string
mem = string
cpu = optional(number)
mem = optional(number)
disk_size = number
server_group : string
force_public_ip : optional(bool, false)
dns_servers : optional(set(string))
additional_disks = map(object({
size = number
tier = string
@@ -89,6 +104,15 @@ variable "k8s_allowed_remote_ips" {
default = []
}
variable "bastion_allowed_remote_ips" {
description = "List of IP start/end addresses allowed to SSH to bastion"
type = list(object({
start_address = string
end_address = string
}))
default = []
}
variable "master_allowed_ports" {
description = "List of ports to allow on masters"
type = list(object({
@@ -131,11 +155,6 @@ variable "loadbalancer_plan" {
default = "development"
}
variable "loadbalancer_proxy_protocol" {
type = bool
default = false
}
variable "loadbalancer_legacy_network" {
description = "If the loadbalancer should use the deprecated network field instead of networks blocks. You probably want to have this set to false"
@@ -147,6 +166,7 @@ variable "loadbalancers" {
description = "Load balancers"
type = map(object({
proxy_protocol = bool
port = number
target_port = number
allow_internal_frontend = optional(bool, false)