From EKS to Hetzner: Building a Kubernetes Module with Talos Linux and OpenTofu
How I built a single-apply OpenTofu module for Kubernetes on Hetzner Cloud using Talos Linux, replacing EKS at a fraction of the cost.
Status: Pre-alpha. This project is not production-tested. It was abstracted from several community repos as a learning exercise. Do not run production workloads on this without thorough testing.
I needed something like EKS on Hetzner Cloud. A single OpenTofu module I could reference from another repo, run tofu apply, and have a working Kubernetes cluster. No wrapper scripts, no two-step deploys, no manual snapshot creation. Hetzner doesn’t offer managed Kubernetes, and the existing community modules all had tradeoffs I wanted to avoid. So I built one: hetzner-talos.
Why Hetzner?
A 3-node cluster on Hetzner (CX33, 4 vCPU / 8 GB) runs roughly €40/month. Equivalent compute on AWS, three t3.large instances plus the EKS control plane fee, a NAT Gateway, and EBS, comes to over €300/month. For teams that also need EU data sovereignty (no CLOUD Act exposure, GDPR by design), Hetzner checks both boxes. (More on the legal side in our GDPR guide.)
The European managed Kubernetes landscape is maturing but not there yet. EU Cloud Cost’s Infomaniak review found 500-1000 IOPS storage and unreliable node provisioning. IONOS and OVHcloud exist, but none deliver the EKS experience of “here’s a module, give it a VPC, get a cluster.”
EKS vs Hetzner: Networking
The concepts map well:
| AWS EKS | Hetzner + Talos |
|---|---|
| VPC | Private Network (10.0.0.0/8) |
| Subnets across AZs | Subnets in eu-central zone (fsn1, nbg1, hel1) |
| NLB for K8s API | Hetzner Load Balancer (private, L4) |
| Security Groups | Hetzner Firewalls |
| Managed Node Groups | Worker Node Pools |
| EKS add-ons (VPC CNI, EBS CSI) | Cilium, Hetzner CSI, Hetzner CCM |
Hetzner is simpler: one network, subnets per role, no public/private subnet distinction. The eu-central network zone spans three datacenters (Falkenstein, Nuremberg, Helsinki), roughly analogous to availability zones.
The K8s API load balancer is private-only. During bootstrap, the API is reached through control plane public IPs, firewalled to the deployer’s auto-detected IP plus any extra CIDRs like a VPN range.
Like deploying EKS into an existing VPC, the module accepts an optional hcloud_network_id to deploy into a shared network.
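A minimal sketch of how that bootstrap firewall rule can be assembled, using the hashicorp/http data source to auto-detect the deployer's IP. The data source, locals, and variable names here are illustrative, not copied from the module:

```hcl
# Auto-detect the deployer's public IPv4 (illustrative names, not the module's).
data "http" "deployer_ip" {
  url = "https://ipv4.icanhazip.com"
}

locals {
  # Additive: deployer IP plus any extra CIDRs (e.g. a VPN range).
  api_allowed_cidrs = distinct(concat(
    ["${chomp(data.http.deployer_ip.response_body)}/32"],
    var.firewall_allow_cidrs,
  ))
}

resource "hcloud_firewall" "control_plane" {
  name = "${var.cluster_name}-control-plane"

  # Kubernetes API, reachable only from the allowed CIDRs during bootstrap.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "6443"
    source_ips = local.api_allowed_cidrs
  }

  # Talos API (talosctl, machine config apply).
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "50000"
    source_ips = local.api_allowed_cidrs
  }
}
```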
The Snapshot Problem
This was the first real hurdle. Hetzner doesn’t support uploading custom OS images. They offer a fixed set (Debian, Ubuntu, etc.), so getting Talos onto a server requires a workaround: boot a temporary server into rescue mode, write the Talos image to its disk with dd, snapshot it, then use that snapshot for all cluster servers.
Every community solution uses this same fundamental trick. The difference is how it’s automated.
How Other Repos Do It
Packer (most common). Projects like hcloud-talos/terraform-hcloud-talos and kgierke/terraform-hcloud-talos-cluster use a Packer template that creates a temporary server in rescue mode, downloads the image, writes it with dd, and snapshots. Clean separation of concerns, but it means a two-step workflow: packer build, then tofu apply. You can’t do it in a single command.
(Side note: kube-hetzner popularized this Packer-on-Hetzner pattern, but uses k3s on MicroOS rather than Talos.)
hcloud-upload-image by apricote. A standalone Go CLI that wraps the entire rescue-mode-dd-snapshot sequence into a single command. Purpose-built for exactly this problem. It supports xz/bz2 compression, multiple architectures, and can take a URL directly. Simpler than Packer, but still a separate step before tofu apply and another binary to install.
Hetzner public ISO (since April 2025). Sidero Labs got Talos added as a public ISO on Hetzner (IDs 119702/119703). However, it ships a fixed schematic and version. If you need custom extensions (iscsi-tools, qemu-guest-agent) or a specific Talos release, you still need snapshots.
What I Ended Up With
I wanted the snapshot to be part of the Terraform lifecycle: created on first tofu apply, skipped if it already exists, rebuilt when the Talos version or extensions change. No separate Packer step, no extra CLIs beyond hcloud.
The solution is a 141-line shell script (build-talos-snapshot.sh) called via local-exec:
resource "terraform_data" "talos_snapshot" {
triggers_replace = [var.talos_version, local.talos_schematic_id]
provisioner "local-exec" {
command = "${path.module}/files/build-talos-snapshot.sh"
environment = {
HCLOUD_TOKEN = var.hcloud_token
TALOS_VERSION = var.talos_version
IMAGE_URL = local.talos_image_url
LOCATION = var.location
}
}
}
The script:
- Checks idempotency by querying `hcloud image list` for a snapshot labeled with the target Talos version. If found, it exits immediately.
- Creates a temporary builder (CX33 with Debian 12, ephemeral SSH key).
- Boots into rescue mode via `hcloud server enable-rescue` plus a reboot, then polls SSH until ready.
- Downloads and writes the image: `curl` fetches it from Talos Image Factory, then `xz -dc | dd of=/dev/sda bs=4M conv=fsync` streams it to disk.
- Snapshots and cleans up: powers off, creates a labeled snapshot, and deletes the builder server. Everything is wrapped in a `trap cleanup EXIT` (condensed sketch below).
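Condensed, the flow looks roughly like this. It is a sketch for orientation, not the actual 141-line script: server/snapshot names, the SSH key name, the label scheme, and the `--label` usage are illustrative, and error handling plus SSH polling are omitted.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Sketch of the rescue-mode dd-to-snapshot flow (illustrative names and labels).

BUILDER="talos-builder"

# 1. Idempotency: skip if a snapshot for this Talos version already exists.
if hcloud image list --type snapshot --selector "talos-version=${TALOS_VERSION}" --output noheader | grep -q .; then
  echo "Snapshot for ${TALOS_VERSION} already exists, nothing to do."
  exit 0
fi

cleanup() { hcloud server delete "$BUILDER" || true; }
trap cleanup EXIT

# 2. Temporary builder server.
hcloud server create --name "$BUILDER" --type cx33 --image debian-12 \
  --location "$LOCATION" --ssh-key talos-builder-key

# 3. Rescue mode plus reboot, then wait for SSH (polling omitted here).
hcloud server enable-rescue "$BUILDER" --ssh-key talos-builder-key
hcloud server reboot "$BUILDER"

# 4. Stream the Image Factory image straight onto the disk.
ssh "root@$(hcloud server ip "$BUILDER")" \
  "curl -fsSL '${IMAGE_URL}' | xz -dc | dd of=/dev/sda bs=4M conv=fsync"

# 5. Power off, snapshot with a version label; the builder is deleted by the trap.
hcloud server poweroff "$BUILDER"
hcloud server create-image --type snapshot "$BUILDER" \
  --description "talos-${TALOS_VERSION}" --label "talos-version=${TALOS_VERSION}"
```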
The Image Factory URL is generated from a talos_image_factory_schematic resource, which takes the list of extensions (qemu-guest-agent, iscsi-tools, etc.) and produces a schematic ID. Change the extensions or version, and a new snapshot gets built on the next apply.
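Roughly, that looks like the following. This is a sketch using the siderolabs/talos provider; the local names, the extension variable, and the exact Image Factory URL layout are assumptions rather than code copied from the module:

```hcl
# Schematic with the requested system extensions; changing it forces a new snapshot.
resource "talos_image_factory_schematic" "this" {
  schematic = yamlencode({
    customization = {
      systemExtensions = {
        # e.g. ["siderolabs/qemu-guest-agent", "siderolabs/iscsi-tools"]
        officialExtensions = var.talos_extensions
      }
    }
  })
}

locals {
  talos_schematic_id = talos_image_factory_schematic.this.id
  # Assumed Image Factory path for the Hetzner Cloud disk image.
  talos_image_url = "https://factory.talos.dev/image/${local.talos_schematic_id}/v${var.talos_version}/hcloud-amd64.raw.xz"
}
```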
One lesson that cost real debugging time: don't run a standalone `sync` after `dd` on a rescue server. Overwriting `/dev/sda` breaks binary resolution in the rescue environment, since that environment was loaded from the same disk. `dd conv=fsync` flushes each block as it writes, which is sufficient.
| Approach | Extra tools? | Single tofu apply? | Idempotent? |
|---|---|---|---|
| Packer | Yes (packer) | No, separate step | Manual |
| hcloud-upload-image | Yes (Go binary) | No, separate step | No |
| Public ISO | None | N/A | N/A (fixed version) |
| This module | hcloud CLI only | Yes | Yes (label check) |
The Chicken-and-Egg Problem
The second hurdle. On EKS, the control plane exists before Terraform needs to talk to it. Self-hosted Kubernetes has a circular dependency:
- Installing Cilium (CNI) requires a running cluster
- Nodes won't be `Ready` without a CNI
- The `helm` and `kubernetes` Terraform providers need a valid kubeconfig at plan time
- The kubeconfig doesn't exist until after bootstrap
Most modules solve this with two applies or wrapper scripts. I wanted a single tofu apply.
The approach: no helm or kubernetes providers. Instead, terraform_data with local-exec provisioners that call helm and kubectl CLI tools. The kubeconfig is passed as a base64-encoded environment variable (stays out of tofu plan output) and decoded to a temp file at runtime:
```bash
# files/kubeconfig-env.sh, sourced by all provisioners
KUBECONFIG_FILE=$(mktemp)
printf '%s' "$KUBECONFIG_B64" | base64 --decode > "$KUBECONFIG_FILE"
trap 'rm -f "$KUBECONFIG_FILE"' EXIT
export KUBECONFIG="$KUBECONFIG_FILE"
```
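Inside the module, each add-on install then becomes a `terraform_data` resource whose provisioner sources that helper and shells out to `helm`. A rough sketch of the pattern, with illustrative names (the actual resource names, triggers, chart values, and the `local.kubeconfig` reference will differ in the module):

```hcl
resource "terraform_data" "metrics_server" {
  # Illustrative trigger: re-run the install when the chart version changes.
  triggers_replace = [var.metrics_server_version]

  provisioner "local-exec" {
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      set -euo pipefail
      source ${path.module}/files/kubeconfig-env.sh
      helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/ --force-update
      helm upgrade --install metrics-server metrics-server/metrics-server \
        --namespace kube-system \
        --version "${var.metrics_server_version}" \
        --wait
    EOT
    environment = {
      # Base64 keeps the kubeconfig out of plan output; kubeconfig-env.sh decodes it.
      KUBECONFIG_B64 = base64encode(local.kubeconfig)
    }
  }
}
```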
The Hetzner CCM needs its API token Secret before any local-exec runs because it’s deployed via a manifest URL during bootstrap. Talos’s inlineManifests solves this by baking the Secret directly into the machine config:
```hcl
inlineManifests = [{
  name     = "hcloud-secret"
  contents = yamlencode({
    apiVersion = "v1"
    kind       = "Secret"
    metadata   = { name = "hcloud", namespace = "kube-system" }
    type       = "Opaque"
    stringData = { token = var.hcloud_token, network = local.network_name }
  })
}]
```
When the first control plane node bootstraps, the Secret exists immediately. No race condition.
Not elegant, but it works in one pass. When OpenTofu’s deferred actions stabilize, this can migrate to proper provider resources.
Using the Module
module "cluster" {
source = "github.com/mixxor/hetzner-talos"
hcloud_token = var.hcloud_token
cluster_name = "my-cluster"
kubernetes_version = "1.35.0"
worker_nodepools = [{
name = "default"
count = 3
server_type = "cx33"
locations = ["fsn1", "nbg1", "hel1"]
}]
}
One tofu apply creates the snapshot (if needed), provisions servers, bootstraps Talos, installs Cilium, CCM, CSI, and metrics server. The module exports kubeconfig, talosconfig, network IDs, and a cluster_ready output for downstream resources:
resource "terraform_data" "argocd" {
depends_on = [module.cluster] # waits for all components
provisioner "local-exec" {
command = "helm upgrade --install argocd ..."
environment = {
KUBECONFIG_B64 = base64encode(module.cluster.kubeconfig_public)
}
}
}
Multiple worker pools with labels and taints work like EKS managed node groups:
```hcl
worker_nodepools = [
  {
    name        = "default"
    count       = 3
    server_type = "cx33"
    locations   = ["fsn1", "nbg1", "hel1"]
  },
  {
    name        = "db"
    count       = 2
    server_type = "cx43"
    locations   = ["fsn1"]
    labels      = { "workload" = "database" }
    taints      = ["dedicated=database:NoSchedule"]
  },
]
```
OpenTofu 1.11 Features
The module requires OpenTofu >= 1.11.0 and uses:
- `enabled` meta-argument: `lifecycle { enabled = var.enable_csi }` replaces the `count = var.x ? 1 : 0` pattern. No more `[0]` indexing everywhere.
- Cross-variable validation: for example, "cannot have more locations than control plane nodes" (see the sketch after this list).
- Check blocks: post-apply health assertions (warnings, not blockers) that verify all nodes got private IPs.
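A compressed illustration of the first two, with made-up variable names and conditions (the module's actual variables differ); the `enabled` meta-argument is used as described above for OpenTofu 1.11:

```hcl
variable "control_plane_count" {
  type    = number
  default = 3
}

variable "control_plane_locations" {
  type    = list(string)
  default = ["fsn1", "nbg1", "hel1"]

  validation {
    # Cross-variable validation: the condition may reference other variables.
    condition     = length(var.control_plane_locations) <= var.control_plane_count
    error_message = "Cannot have more locations than control plane nodes."
  }
}

variable "enable_csi" {
  type    = bool
  default = true
}

# 'enabled' replaces count = var.enable_csi ? 1 : 0, so no [0] indexing downstream.
resource "terraform_data" "csi" {
  lifecycle {
    enabled = var.enable_csi
  }
  # ... CSI install steps ...
}
```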
Other Lessons
- Cilium on Talos 1.8+ needs `bpf.hostLegacyRouting=true` or DNS breaks (siderolabs/talos#10002); see the sketch after this list
- Never use `/latest/` in Talos manifest URLs because they're baked into immutable machine config
- Hetzner's private network interface can be `enp7s0` or `ens10` depending on server type, so auto-detect it
- Firewall CIDRs must be additive: use `distinct(concat(deployer_cidrs, var.firewall_allow_cidrs))`, not a replacement. Otherwise the deployer gets locked out mid-bootstrap and nodes appear stuck in "Maintenance"
- Hetzner locations can be temporarily unavailable, so don't hardcode a default without checking
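For the first lesson, the workaround is a single extra Helm value. A minimal illustration; the other Talos-specific Cilium values the module sets are omitted here:

```bash
# Workaround for DNS breakage on Talos 1.8+ (siderolabs/talos#10002)
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=kubernetes \
  --set bpf.hostLegacyRouting=true
```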
What This Is (and Isn’t)
This is a pre-alpha module built by studying hcloud-talos, kube-hetzner, blog posts, and the Talos docs, then solving the pain points differently: single-apply snapshots, no Packer dependency, no helm/kubernetes providers.
If you need production Kubernetes on European infrastructure today, look at managed offerings or the more established community modules. But if you’re exploring self-hosted Kubernetes on Hetzner and want a starting point that works in one command:
```bash
git clone https://github.com/mixxor/hetzner-talos && cd hetzner-talos
cp terraform.tfvars.example terraform.tfvars
export TF_VAR_hcloud_token="your-token"
tofu init && tofu apply
```
For a full example with WireGuard VPN, ArgoCD, and etcd backups, see hetzner-talos-example.
References:
- EU Cloud Cost: GDPR and Cloud Hosting
- EU Cloud Cost: Infomaniak Kubernetes Review
- EU Cloud Cost: Cloud Certifications Explained
- apricote/hcloud-upload-image, CLI tool for uploading custom images to Hetzner
- Talos Linux
- Talos Support Matrix
- siderolabs/talos#10002, Cilium DNS workaround