
From EKS to Hetzner: Building a Kubernetes Module with Talos Linux and OpenTofu

How I built a single-apply OpenTofu module for Kubernetes on Hetzner Cloud using Talos Linux, replacing EKS at a fraction of the cost.

Michael Raeck


Status: Pre-alpha. This project is not production-tested. It was abstracted from several community repos as a learning exercise. Do not run production workloads on this without thorough testing.

I needed something like EKS on Hetzner Cloud. A single OpenTofu module I could reference from another repo, run tofu apply, and have a working Kubernetes cluster. No wrapper scripts, no two-step deploys, no manual snapshot creation. Hetzner doesn’t offer managed Kubernetes, and the existing community modules all had tradeoffs I wanted to avoid. So I built one: hetzner-talos.

Why Hetzner?

A 3-node cluster on Hetzner (CX33, 4 vCPU / 8 GB) runs roughly €40/month. The equivalent on AWS, with three t3.large instances plus the EKS control plane fee, a NAT Gateway, and EBS volumes, costs over €300/month. For teams that also need EU data sovereignty (no CLOUD Act exposure, GDPR by design), Hetzner checks both boxes. (More on the legal side in our GDPR guide.)

The European managed Kubernetes landscape is maturing but not there yet. EU Cloud Cost’s Infomaniak review found 500-1000 IOPS storage and unreliable node provisioning. IONOS and OVHcloud exist, but none deliver the EKS experience of “here’s a module, give it a VPC, get a cluster.”

EKS vs Hetzner: Networking

The concepts map well:

| AWS EKS | Hetzner + Talos |
|---|---|
| VPC | Private Network (10.0.0.0/8) |
| Subnets across AZs | Subnets in eu-central zone (fsn1, nbg1, hel1) |
| NLB for K8s API | Hetzner Load Balancer (private, L4) |
| Security Groups | Hetzner Firewalls |
| Managed Node Groups | Worker Node Pools |
| EKS add-ons (VPC CNI, EBS CSI) | Cilium, Hetzner CSI, Hetzner CCM |

Hetzner is simpler: one network, subnets per role, no public/private subnet distinction. The eu-central network zone spans three datacenters (Falkenstein, Nuremberg, Helsinki), roughly analogous to availability zones.

The K8s API load balancer is private-only. During bootstrap, the API is reached through control plane public IPs, firewalled to the deployer’s auto-detected IP plus any extra CIDRs like a VPN range.

Like deploying EKS into an existing VPC, the module accepts an optional hcloud_network_id to deploy into a shared network.
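
A minimal sketch of that pattern, assuming an hcloud_network resource named "shared" already exists in the root configuration:

module "cluster" {
  source       = "github.com/mixxor/hetzner-talos"
  hcloud_token = var.hcloud_token

  # Reuse an existing network instead of letting the module create one,
  # analogous to deploying EKS into an existing VPC
  hcloud_network_id = hcloud_network.shared.id
}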

The Snapshot Problem

This was the first real hurdle. Hetzner doesn’t support uploading custom OS images. They offer a fixed set (Debian, Ubuntu, etc.), so getting Talos onto a server requires a workaround: boot a temporary server into rescue mode, write the Talos image to its disk with dd, snapshot it, then use that snapshot for all cluster servers.

Every community solution uses this same fundamental trick. The difference is how it’s automated.

How Other Repos Do It

Packer (most common). Projects like hcloud-talos/terraform-hcloud-talos and kgierke/terraform-hcloud-talos-cluster use a Packer template that creates a temporary server in rescue mode, downloads the image, writes it with dd, and snapshots. Clean separation of concerns, but it means a two-step workflow: packer build, then tofu apply. You can’t do it in a single command.

(Side note: kube-hetzner popularized this Packer-on-Hetzner pattern, but uses k3s on MicroOS rather than Talos.)

hcloud-upload-image by apricote. A standalone Go CLI that wraps the entire rescue-mode-dd-snapshot sequence into a single command. Purpose-built for exactly this problem. It supports xz/bz2 compression, multiple architectures, and can take a URL directly. Simpler than Packer, but still a separate step before tofu apply and another binary to install.

Hetzner public ISO (since April 2025). Sidero Labs got Talos added as a public ISO on Hetzner (IDs 119702/119703). However, it ships a fixed schematic and version. If you need custom extensions (iscsi-tools, qemu-guest-agent) or a specific Talos release, you still need snapshots.

What I Ended Up With

I wanted the snapshot to be part of the Terraform lifecycle: created on first tofu apply, skipped if it already exists, rebuilt when the Talos version or extensions change. No separate Packer step, no extra CLIs beyond hcloud.

The solution is a 141-line shell script (build-talos-snapshot.sh) called via local-exec:

resource "terraform_data" "talos_snapshot" {
  triggers_replace = [var.talos_version, local.talos_schematic_id]

  provisioner "local-exec" {
    command     = "${path.module}/files/build-talos-snapshot.sh"
    environment = {
      HCLOUD_TOKEN  = var.hcloud_token
      TALOS_VERSION = var.talos_version
      IMAGE_URL     = local.talos_image_url
      LOCATION      = var.location
    }
  }
}

The script, sketched in condensed form after this list, does five things:

  1. Checks idempotency by querying hcloud image list for a snapshot labeled with the target Talos version. If found, it exits immediately.
  2. Creates a temporary builder (CX33 with Debian 12, ephemeral SSH key).
  3. Boots into rescue mode via hcloud server enable-rescue + reboot, then polls SSH until ready.
  4. Downloads and writes the image. curl fetches it from Talos Image Factory, then xz -dc | dd of=/dev/sda bs=4M conv=fsync streams it to disk.
  5. Snapshots and cleans up by powering off, creating a labeled snapshot, and deleting the builder server. Everything is wrapped in a trap cleanup EXIT.
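
Condensed, the interesting parts look roughly like this (a sketch, not the real 141 lines; the snapshot label name and hcloud CLI flags are assumptions):

# Condensed sketch of build-talos-snapshot.sh; the real script also creates
# the builder server, polls SSH, and cleans up via trap.
existing=$(hcloud image list --type snapshot \
  --selector "talos-version=${TALOS_VERSION}" -o noheader -o columns=id)
if [ -n "$existing" ]; then
  echo "Snapshot for Talos ${TALOS_VERSION} already exists (${existing}), skipping"
  exit 0
fi

# ... create builder, hcloud server enable-rescue + reboot, wait for SSH ...

# On the rescue system: stream the Image Factory image straight to disk.
ssh "root@${BUILDER_IP}" \
  "curl -fsSL '${IMAGE_URL}' | xz -dc | dd of=/dev/sda bs=4M conv=fsync"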

The Image Factory URL is generated from a talos_image_factory_schematic resource, which takes the list of extensions (qemu-guest-agent, iscsi-tools, etc.) and produces a schematic ID. Change the extensions or version, and a new snapshot gets built on the next apply.
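
The wiring looks roughly like this (a sketch; the extension list, local names, and the Image Factory URL format are assumptions based on its documented layout):

# Illustrative: turn the extension list into a schematic ID and image URL
resource "talos_image_factory_schematic" "this" {
  schematic = yamlencode({
    customization = {
      systemExtensions = {
        officialExtensions = var.talos_extensions # e.g. ["siderolabs/qemu-guest-agent"]
      }
    }
  })
}

locals {
  talos_schematic_id = talos_image_factory_schematic.this.id
  # Assumed format: /image/<schematic>/<version>/hcloud-amd64.raw.xz
  talos_image_url = "https://factory.talos.dev/image/${local.talos_schematic_id}/${var.talos_version}/hcloud-amd64.raw.xz"
}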

One lesson that cost real debugging time: don’t run standalone sync after dd on a rescue server. Overwriting /dev/sda breaks rescue-mode binary resolution because the rescue environment is loaded from that disk. Using dd conv=fsync flushes each block as it writes, which is sufficient.

| Approach | Extra tools? | Single tofu apply? | Idempotent? |
|---|---|---|---|
| Packer | Yes (packer) | No, separate step | Manual |
| hcloud-upload-image | Yes (Go binary) | No, separate step | No |
| Public ISO | None | N/A | N/A (fixed version) |
| This module | hcloud CLI only | Yes | Yes (label check) |

The Chicken-and-Egg Problem

The second hurdle. On EKS, the control plane exists before Terraform needs to talk to it. Self-hosted Kubernetes has a circular dependency:

  1. Installing Cilium (CNI) requires a running cluster
  2. Nodes won’t be Ready without a CNI
  3. The helm and kubernetes Terraform providers need a valid kubeconfig at plan time
  4. The kubeconfig doesn’t exist until after bootstrap

Most modules solve this with two applies or wrapper scripts. I wanted a single tofu apply.

The approach: no helm or kubernetes providers. Instead, terraform_data with local-exec provisioners that call helm and kubectl CLI tools. The kubeconfig is passed as a base64-encoded environment variable (stays out of tofu plan output) and decoded to a temp file at runtime:

# files/kubeconfig-env.sh, sourced by all provisioners
KUBECONFIG_FILE=$(mktemp)
printf '%s' "$KUBECONFIG_B64" | base64 --decode > "$KUBECONFIG_FILE"
trap 'rm -f "$KUBECONFIG_FILE"' EXIT
export KUBECONFIG="$KUBECONFIG_FILE"
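
Inside the module, a component install then looks roughly like this (a sketch, not the module's actual code; local.kubeconfig and the Helm arguments are illustrative):

# Illustrative: install a chart via the helm CLI instead of the helm provider
resource "terraform_data" "cilium" {
  provisioner "local-exec" {
    # kubeconfig-env.sh decodes KUBECONFIG_B64 to a temp file and exports KUBECONFIG
    command = ". ${path.module}/files/kubeconfig-env.sh && helm upgrade --install cilium cilium/cilium -n kube-system"
    environment = {
      KUBECONFIG_B64 = base64encode(local.kubeconfig)
    }
  }
}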

The Hetzner CCM is deployed from a manifest URL during bootstrap, before any local-exec provisioner runs, so its API token Secret has to exist by then. Talos’s inlineManifests solves this by baking the Secret directly into the machine config:

inlineManifests = [{
  name = "hcloud-secret"
  contents = yamlencode({
    apiVersion = "v1"
    kind       = "Secret"
    metadata   = { name = "hcloud", namespace = "kube-system" }
    type       = "Opaque"
    stringData = { token = var.hcloud_token, network = local.network_name }
  })
}]

When the first control plane node bootstraps, the Secret exists immediately. No race condition.

Not elegant, but it works in one pass. When OpenTofu’s deferred actions stabilize, this can migrate to proper provider resources.

Using the Module

module "cluster" {
  source       = "github.com/mixxor/hetzner-talos"
  hcloud_token = var.hcloud_token

  cluster_name       = "my-cluster"
  kubernetes_version = "1.35.0"

  worker_nodepools = [{
    name        = "default"
    count       = 3
    server_type = "cx33"
    locations   = ["fsn1", "nbg1", "hel1"]
  }]
}

One tofu apply creates the snapshot (if needed), provisions servers, bootstraps Talos, installs Cilium, CCM, CSI, and metrics server. The module exports kubeconfig, talosconfig, network IDs, and a cluster_ready output for downstream resources:

resource "terraform_data" "argocd" {
  depends_on = [module.cluster]  # waits for all components

  provisioner "local-exec" {
    command = "helm upgrade --install argocd ..."
    environment = {
      KUBECONFIG_B64 = base64encode(module.cluster.kubeconfig_public)
    }
  }
}

Multiple worker pools with labels and taints work like EKS managed node groups:

worker_nodepools = [
  { name = "default", count = 3, server_type = "cx33",
    locations = ["fsn1", "nbg1", "hel1"] },
  { name = "db", count = 2, server_type = "cx43",
    locations = ["fsn1"],
    labels = { "workload" = "database" },
    taints = ["dedicated=database:NoSchedule"] },
]

OpenTofu 1.11 Features

The module requires OpenTofu >= 1.11.0 and uses three of its newer language features (sketched after this list):

  • enabled meta-argument: lifecycle { enabled = var.enable_csi } replaces the count = var.x ? 1 : 0 pattern. No more [0] indexing everywhere.
  • Cross-variable validation: for example, “cannot have more locations than control plane nodes.”
  • Check blocks: post-apply health assertions (warnings, not blockers) that verify all nodes got private IPs.
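
Sketches of those three, with illustrative variable and resource names rather than the module's actual code:

# enabled meta-argument instead of count = var.x ? 1 : 0
resource "terraform_data" "csi" {
  lifecycle {
    enabled = var.enable_csi
  }
}

# Cross-variable validation: one variable's rule can reference another
variable "control_plane_locations" {
  type = list(string)
  validation {
    condition     = length(var.control_plane_locations) <= var.control_plane_count
    error_message = "Cannot have more locations than control plane nodes."
  }
}

# Check block: a post-apply assertion that warns instead of failing the run
check "nodes_have_private_ips" {
  assert {
    condition     = alltrue([for s in hcloud_server.worker : length(s.network) > 0])
    error_message = "Some nodes did not receive a private IP."
  }
}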

Other Lessons

  • Cilium on Talos 1.8+ needs bpf.hostLegacyRouting=true or DNS breaks (siderolabs/talos#10002)
  • Never use /latest/ in Talos manifest URLs because they’re baked into immutable machine config
  • Hetzner’s private network interface can be enp7s0 or ens10 depending on server type, so auto-detect it
  • Firewall CIDRs must be additive: use distinct(concat(deployer_cidrs, var.firewall_allow_cidrs)), not a replacement (see the sketch after this list). Otherwise the deployer gets locked out mid-bootstrap and nodes appear stuck in “Maintenance”
  • Hetzner locations can be temporarily unavailable, so don’t hardcode a default without checking
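
A sketch of that additive firewall list, assuming the deployer IP is auto-detected with the http data source (the lookup URL and names are illustrative):

# Keep the deployer's auto-detected public IP alongside any user-supplied
# CIDRs; replacing the list instead of extending it locks the deployer out
data "http" "deployer_ip" {
  url = "https://api.ipify.org"
}

locals {
  api_allowed_cidrs = distinct(concat(
    ["${chomp(data.http.deployer_ip.response_body)}/32"],
    var.firewall_allow_cidrs,
  ))
}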

What This Is (and Isn’t)

This is a pre-alpha module built by studying hcloud-talos, kube-hetzner, blog posts, and the Talos docs, then solving the pain points differently: single-apply snapshots, no Packer dependency, no helm/kubernetes providers.

If you need production Kubernetes on European infrastructure today, look at managed offerings or the more established community modules. But if you’re exploring self-hosted Kubernetes on Hetzner and want a starting point that works in one command:

git clone https://github.com/mixxor/hetzner-talos && cd hetzner-talos
cp terraform.tfvars.example terraform.tfvars
export TF_VAR_hcloud_token="your-token"
tofu init && tofu apply

For a full example with WireGuard VPN, ArgoCD, and etcd backups, see hetzner-talos-example.



Michael Raeck

Cloud infrastructure nerd. Building tools to make Kubernetes less painful and more affordable in Europe. Running Talos clusters on Hetzner for fun.
