Daniele Polencic

@danielepolencic.com

Teaching Kubernetes at @LearnKube.com

Joined 07.02.2024

Latest posts by Daniele Polencic @danielepolencic.com

https://assets.learnk8s.io/linkedin-174.png

Just landed: Learn Kubernetes weekly 174! My top picks:

🎮 Making and Scaling a Game Server with Agones
🗄️ Zero-Downtime PostgreSQL Migration
🚀 From Chaos to 99.9% Uptime
📊 k8s-d2: Kubernetes visualization

Read it here: https://kube.today/issues/174

11.03.2026 12:11 👍 8 🔁 0 💬 0 📌 0

What worked was turning the specification into code that checks the current state, compares it to what should exist, and blocks progress until earlier steps are done.

The specification isn't a document. It's a build system.

https://danielepolencic.com/specification-is-the-product
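A rough sketch of that idea in Python (all names here are mine, not from the article): each step of the spec carries a check against observable state, and the runner refuses to advance past the first failing step.

```python
# Hypothetical sketch of "specification as build system": each step verifies
# the current state, and later steps are blocked until earlier ones pass.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    check: Callable[[dict], bool]  # does the current state satisfy this step?

def run(spec: list[Step], state: dict) -> list[str]:
    """Return the names of steps that pass, stopping at the first failure."""
    done = []
    for step in spec:
        if not step.check(state):
            break  # block progress: nothing after this step may run
        done.append(step.name)
    return done

# Invented example spec for a database migration:
spec = [
    Step("schema exists", lambda s: "schema" in s),
    Step("data migrated", lambda s: s.get("rows_migrated", 0) > 0),
    Step("old table dropped", lambda s: "old_table" not in s),
]

state = {"schema": True, "rows_migrated": 0}
print(run(spec, state))  # ['schema exists'] -- later steps stay blocked
```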

10.03.2026 13:36 👍 0 🔁 0 💬 0 📌 0

I tried several specification languages against a workflow I actually maintain.

The AI understood every step. It just skipped the ones it didn't feel like doing.

Improving the description changed nothing.

10.03.2026 13:36 👍 0 🔁 0 💬 1 📌 0

AI coding tools are getting faster. Some people run them directly on production, skipping reviews and checks entirely.

Others build chains of requirements docs and architecture decisions.

One camp says code is the artifact. The other says specifications.

10.03.2026 13:36 👍 1 🔁 0 💬 1 📌 0

Each one either got bypassed โ€” the agent runs as me, same UID, same permissions โ€” or locked the agent out so hard it couldn't do its job.

I came to the conclusion that the credentials shouldn't be on the machine at all.

https://danielepolencic.com/hiding-secrets-from-ai-agents

06.03.2026 13:31 👍 3 🔁 1 💬 0 📌 0

I tried five ways to stop this:

- Encrypted the files
- Moved secrets to Keychain
- Gated with Touch ID
- Built a compiled native addon
- Ran the agent in a sandbox
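For illustration, the Keychain attempt might have looked like this sketch (service and account names are invented; `security find-generic-password` is the macOS Keychain CLI). The flaw still applies: anything running as the same user can issue the same lookup.

```python
# Hypothetical sketch of the "secrets in Keychain" attempt: the CLI asks
# macOS Keychain for the token instead of reading it from a file on disk.
# Service and account names are made up for this example.
import subprocess

def keychain_cmd(service: str, account: str) -> list[str]:
    # Build the macOS Keychain lookup; -w prints only the password.
    return ["security", "find-generic-password",
            "-s", service, "-a", account, "-w"]

def read_token(service: str, account: str) -> str:
    return subprocess.run(keychain_cmd(service, account),
                          capture_output=True, text=True,
                          check=True).stdout.strip()

# On macOS: read_token("gmail-cli", "me@example.com")
print(keychain_cmd("gmail-cli", "me@example.com"))
```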

06.03.2026 13:31 👍 0 🔁 0 💬 1 📌 0

I built a few CLI tools for myself: one searches Gmail, another reads GDocs.

My AI agent needed to download an email attachment. My CLI didn't have that command. So it found the credentials on disk, called the API directly, and leaked my refresh token.

06.03.2026 13:31 👍 1 🔁 0 💬 1 📌 0
https://assets.learnk8s.io/linkedin-173.png

Just landed: Learn Kubernetes weekly 173! My top picks:

🧪 Integration Testing with Kubernetes
🔐 Vault OIDC Authentication
🛡️ Admission & Runtime Guardrails
✅ Kogaro Config Hygiene Agent

Read it here: https://kube.today/issues/173

04.03.2026 12:11 👍 8 🔁 0 💬 0 📌 0

I'm presenting a live session this Thursday with vCluster:

GPU Multi-Tenancy: When to Share, When to Separate

Register here: ku.bz/multitenant26

03.03.2026 14:06 👍 0 🔁 0 💬 0 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544389/gpu-sharing-problems-2026/slide-9.png

The worst case: two process contexts each believe there's enough memory. Hidden reservations and runtime overhead keep shrinking real headroom.

When another workload arrives, both crash with out-of-memory errors.

03.03.2026 14:06 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544388/gpu-sharing-problems-2026/slide-8.png

Even after memory is freed, allocation patterns can leave fragmented gaps. You may have free VRAM in total and still fail the next allocation.

That's a failure mode that doesn't show up in your dashboard until it happens.

03.03.2026 14:06 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544387/gpu-sharing-problems-2026/slide-7.png

nvidia-smi is useful, but the driver uses a pooling allocator. Reserved memory, active model memory, and temporary workspace don't cleanly add up to one number.

You get useful signals. Not a precise per-workload memory bill.

03.03.2026 14:06 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544386/gpu-sharing-problems-2026/slide-6.png

So if a batch job launches long kernels, latency-sensitive requests queue behind it.

Average utilization can still look fine. P95 and P99 latencies tell a different story.

03.03.2026 14:06 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544385/gpu-sharing-problems-2026/slide-5.png

CPUs preempt tasks constantly. They pause one task and rotate work across cores. That creates fair turn-taking.

GPU kernels typically run to completion. The next workload waits at kernel boundaries instead of getting a fair slice of time.
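A toy model of the difference (my own simplification, arbitrary milliseconds): under run-to-completion a short request waits for the whole kernel ahead of it, while a preemptive scheduler bounds the wait by one time slice.

```python
# Toy latency model: a short request arrives while a long kernel is
# already running. All times are invented milliseconds.

def wait_run_to_completion(long_kernel_ms: int, arrived_at_ms: int) -> int:
    """Run-to-completion: the request waits until the whole kernel finishes."""
    return max(long_kernel_ms - arrived_at_ms, 0)

def wait_preemptive(slice_ms: int, arrived_at_ms: int) -> int:
    """Preemptive time-slicing: it waits at most until the current slice ends."""
    return slice_ms - (arrived_at_ms % slice_ms)

print(wait_run_to_completion(500, 10))  # 490: blocked behind the entire kernel
print(wait_preemptive(5, 10))           # 5: at worst one full time slice
```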

03.03.2026 14:06 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544384/gpu-sharing-problems-2026/slide-4.png

GPUs work differently. The driver is in charge of what the kernel would normally handle: memory allocation, execution sequencing, and runtime coordination.

Your real sharing boundaries are defined by driver behavior, not kernel primitives.

03.03.2026 14:06 👍 1 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544383/gpu-sharing-problems-2026/slide-3.png

For CPU and memory, Kubernetes uses cgroups. A container asks for a fraction of CPU or a fixed memory limit, and the Linux kernel enforces it.

That gives predictable limits and fair sharing between workloads.
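For concreteness, a CPU limit maps to a cgroup CFS quota roughly like this (a sketch of the arithmetic; constant and function names are mine):

```python
# How a Kubernetes CPU limit becomes a cgroup CFS quota (sketch).
# A limit of 500m (half a core) becomes quota=50000us per period=100000us:
# the kernel throttles the container once it spends its quota in a period.

CFS_PERIOD_US = 100_000  # the default CFS period, 100ms

def millicores_to_quota(millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    # 1000 millicores == one full core == quota equal to the whole period.
    return millicores * period_us // 1000

print(millicores_to_quota(500))   # 50000 -> half of every 100ms period
print(millicores_to_quota(2000))  # 200000 -> two full cores
```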

03.03.2026 14:06 👍 1 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772544380/gpu-sharing-problems-2026/slide-1.png

You want to share GPUs: one team runs inference, another trains models, and both need the same expensive cards.

The problem is that GPUs don't behave like CPU and RAM under contention.

(I will cover this on Thursday: ku.bz/multitenant26 )

🧵

03.03.2026 14:06 👍 9 🔁 8 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004215/kubex-book-2026/slide-10.png

The book covers measurement, architecture decisions, and full-stack right-sizing across 4 chapters.

Free download: ku.bz/KL4jRvsL4

02.03.2026 12:41 👍 0 🔁 0 💬 0 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004212/kubex-book-2026/slide-9.png

When you rent a GPU node, you also pay for the CPU and memory that come with it. It's a bundle.

If the GPU is fully reserved but your workloads barely touch the CPU and memory, most of what you're paying for sits idle.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004210/kubex-book-2026/slide-8.png

You can also split a GPU into separate sections with hard boundaries.

Each gets its own compute and memory. No interference between workloads.

A training job on one section reaches 89% efficiency โ€” almost the same as having the whole GPU.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004207/kubex-book-2026/slide-7.png

You can share a GPU by giving pods turns on the same hardware. Sounds efficient.

But GPUs don't multitask like CPUs. Each job runs to completion before the next starts.

A training job at 92% efficiency alone drops to 47% when sharing.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004204/kubex-book-2026/slide-6.png

A pod using almost no CPU and RAM can still lock an entire GPU node.

GPUs are the scheduling bottleneck. The remaining CPU and memory sit idle โ€” and you're paying for all of it.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004202/kubex-book-2026/slide-5.png

The metrics that actually matter:

SM Active โ€” are compute cores busy or waiting?
DRAM Active โ€” is memory bandwidth the bottleneck?
Tensor pipeline โ€” is mixed-precision hitting the fast path?
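If you want to watch those counters, DCGM exposes them as profiling fields; to the best of my knowledge the field IDs are 1002 (SM_ACTIVE), 1004 (PIPE_TENSOR_ACTIVE), and 1005 (DRAM_ACTIVE), but verify them against your DCGM version. A sketch that builds the `dcgmi dmon` invocation:

```python
# Sketch: watch the three metrics with DCGM's dmon command. The field IDs
# below are DCGM profiling fields (assumed; check your DCGM version):
#   1002 = SM_ACTIVE, 1004 = PIPE_TENSOR_ACTIVE, 1005 = DRAM_ACTIVE
import subprocess

FIELDS = {"sm_active": 1002, "tensor_active": 1004, "dram_active": 1005}

def dmon_cmd(interval_ms: int = 1000) -> list[str]:
    ids = ",".join(str(i) for i in FIELDS.values())
    return ["dcgmi", "dmon", "-e", ids, "-d", str(interval_ms)]

# On a node with DCGM installed: subprocess.run(dmon_cmd())
print(dmon_cmd())
```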

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004199/kubex-book-2026/slide-4.png

nvidia-smi's GPU-Util is a time-based "busy" signal โ€” the percent of time any kernel was running.

A pod doing nothing useful can show 54%.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004196/kubex-book-2026/slide-3.png

Three layers, three different answers.

Kubernetes: "we're full."
nvidia-smi: "2% utilization."
The app: "6.67 requests per second."

All correct. None tells you if the GPU is efficient.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004194/kubex-book-2026/slide-2.png

Your dashboards show 4/4 GPUs allocated. Everyone assumes they're being used.

But allocation just means "reserved." It says nothing about whether the GPU is actually doing work.

02.03.2026 12:41 👍 0 🔁 0 💬 1 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1772004192/kubex-book-2026/slide-1.png

Gulcan and I wrote a free book on right-sizing GPUs in Kubernetes.

Here's the short version (thread)

02.03.2026 12:41 👍 10 🔁 8 💬 1 📌 0
https://assets.learnk8s.io/linkedin-172.png

Just landed: Learn Kubernetes weekly 172! My top picks:

🎒 Data Streaming: Kafka + Flink Baggage Tracker
🥧 Raspberry Pi Home Kubernetes Cluster
🤖 AI Document Processing with Ray
💰 Wozz: Kubernetes Cost Tool

Read it here: https://kube.today/issues/172

25.02.2026 12:11 👍 7 🔁 0 💬 0 📌 0

learnkube.com/etcd-breaks-at-scale

24.02.2026 13:21 👍 1 🔁 0 💬 0 📌 0
https://res.cloudinary.com/learnk8s/image/upload/v1771937075/etcd-breaks-at-scale-2026/slide-8.png

K3s ships Kine, a shim that speaks the etcd API but stores data in SQLite or PostgreSQL.

AWS replaced Raft with a journal service. Google swapped in Spanner. All kept the etcd API.

24.02.2026 13:21 👍 2 🔁 0 💬 1 📌 0