Vladimir Prus (@vprus) — bluesky.baby

A bored scribe doodled a ten-eyed letter O in some manuscript in the 15th century. Little did they know they would be influencing international character encoding standards some 578 years later…
en.wikipedia.org/wiki/Cyrilli...

02.05.2025 21:48 👍 45 🔁 10 💬 2 📌 2

AWS / EKS / Security Question.

If I have a pod with NET_ADMIN capability, and the default pod network namespace, and it gets compromised, what's the worst it can do?

I am interested in this question specifically, not general advice.

24.04.2025 15:44 👍 0 🔁 0 💬 0 📌 0

In gRPC/Go, setting up weighted load balancing is surprisingly simple.

🔹 On the server side, we need to compute requests per second and application load.

🔹 On the client side, enable weight-based request distribution.

It was not exactly five minutes to figure out, though.
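To illustrate the idea only (this is a conceptual Python sketch, not the gRPC-Go API): assume each backend reports requests per second and a utilization figure, derive a weight as qps / utilization, and spread picks with a smooth weighted round-robin. The scheduler here is a stand-in chosen for brevity, and the backend names and numbers are made up.

```python
def weight(qps: float, utilization: float) -> float:
    """Higher throughput at lower load earns a larger share of traffic."""
    if utilization <= 0:
        raise ValueError("utilization must be positive")
    return qps / utilization


def smooth_wrr(weights: dict, picks: int) -> list:
    """Smooth weighted round-robin: interleaves picks in proportion to weight."""
    total = sum(weights.values())
    current = {name: 0.0 for name in weights}
    order = []
    for _ in range(picks):
        # Every backend earns credit equal to its weight on each round.
        for name, w in weights.items():
            current[name] += w
        # The backend with the most credit is picked and pays the total back.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical load reports: (requests per second, utilization in (0, 1]).
reports = {"a": (100, 0.5), "b": (100, 1.0), "c": (50, 0.25)}
weights = {name: weight(qps, util) for name, (qps, util) in reports.items()}
order = smooth_wrr(weights, 5)
```

With these made-up reports, the fully loaded backend "b" gets half the weight of the other two, so it receives one pick out of every five.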

06.03.2025 20:59 👍 0 🔁 0 💬 0 📌 0

At my previous job, instant communication was simple:

- IRC (Internet Relay Chat) was used
- All messages were deleted after 2 weeks

I still believe that is the best way, and all the modern Slacks with years of history are only good for Slack's valuation.

26.02.2025 17:47 👍 0 🔁 0 💬 0 📌 0

The largest AWS r7i instance type is 48xlarge, with 192 vCPUs, or 96 cores.

It has two Xeon 8488C processors, each with 48 cores.

Each processor has 4 silicon dies.

Each die has 15 cores.

I assume that 48 cores, and not 60, is the result of binning.

This is a very heterogeneous architecture.
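The arithmetic behind these numbers, as a small sketch. The counts come from the post; two hardware threads per core and an even spread of disabled cores across dies are assumptions.

```python
sockets = 2
dies_per_socket = 4
cores_per_die = 15
enabled_cores_per_socket = 48
threads_per_core = 2  # assumption: two hardware threads per core

# Physical silicon: 4 dies x 15 cores = 60 cores per socket.
physical_cores_per_socket = dies_per_socket * cores_per_die

# Cores fused off per socket: 60 - 48 = 12.
disabled_cores_per_socket = physical_cores_per_socket - enabled_cores_per_socket

# Assumption: disabled cores are spread evenly, leaving 12 usable cores per die.
enabled_cores_per_die = enabled_cores_per_socket // dies_per_socket

total_cores = sockets * enabled_cores_per_socket  # 96
vcpus = total_cores * threads_per_core            # 192
```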

14.02.2025 13:51 👍 0 🔁 0 💬 0 📌 0

What's the easiest way to make gRPC load balancing take target load into account?

The Go client can do weighted round-robin with xDS, but xDS requires Istio, and I'd rather not.

The Go client also supports custom load balancing policies, but that's very DIY.

Does anybody have practical recommendations?

06.02.2025 17:09 👍 0 🔁 0 💬 0 📌 0

Ah, if EventBridge was there from day one, it makes more sense.

Nonetheless, we now have two services where I can send my customers events, with similar features.

Would it be safe to say that SQS is better for my app events, while EventBridge is more for AWS events?

06.02.2025 12:26 👍 0 🔁 0 💬 1 📌 0

But when EventBridge was introduced in 2019, every service had to be modified to write events to it. Surely it would have been possible to add an SQS target instead?

06.02.2025 12:08 👍 0 🔁 0 💬 1 📌 0

Can anyone explain AWS EventBridge to me?

- Many services have triggers, e.g. I can have an S3 trigger invoke Lambda.
- There is SQS, which I can use any way I like.

Surely, if any service could write to SQS, we would not need yet another service?

06.02.2025 11:29 👍 0 🔁 0 💬 1 📌 0

Data engineers, do we have a canonical big data modeling methodology now?

Kimball was it. However, it requires many joins and it's not perfect for big data on S3. The methodology itself might be overkill in most cases.

Do we have anything now beyond "use wide tables" and "SCD2 if needed"?
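For readers unfamiliar with the term, SCD2 (slowly changing dimension, type 2) keeps every historical version of a row with a validity interval. A minimal in-memory sketch follows; the `DimRow` fields and helper names are hypothetical, and a real pipeline would do this as a table merge.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(rows, customer_id, city, as_of):
    """Close the current version (if the value changed) and append a new one."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == city:
                return  # nothing changed, keep the current version
            row.valid_to = as_of
    rows.append(DimRow(customer_id, city, valid_from=as_of))


def city_as_of(rows, customer_id, day):
    """Point-in-time lookup: which version was valid on `day`?"""
    for row in rows:
        if (row.customer_id == customer_id
                and row.valid_from <= day
                and (row.valid_to is None or day < row.valid_to)):
            return row.city
    return None


dim = []
apply_change(dim, 1, "Berlin", date(2024, 1, 1))
apply_change(dim, 1, "Lisbon", date(2024, 6, 1))
```

The point-in-time lookup is what the extra rows buy: facts recorded in March still resolve to the city that was valid in March.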

31.01.2025 08:56 👍 0 🔁 0 💬 0 📌 0

I sometimes host system design interviews, and for 95% of candidates, the default database choice is PostgreSQL. The remaining 5% mention MongoDB or Cassandra, but I have never heard anybody mention MySQL or MariaDB.

Is this just my bubble, or has PostgreSQL decisively won over MySQL?

23.01.2025 09:02 👍 0 🔁 0 💬 2 📌 0

IAM Roles Anywhere: Notes on external access to AWS

I needed to access AWS from Kubernetes in another cloud and used IAM Roles Anywhere. In this post, I detail the steps and draw some conclusions.

Spoiler: it does the job in easy cases, but for a full-blown deployment, you will need to write your own automation.

vladimirprus.com/blog/2025-01...

14.01.2025 09:18 👍 1 🔁 0 💬 0 📌 0

Linear regression is a dangerous tool. It will produce a fit for any data set, but it rests on a number of assumptions. If you don't check them, the results might be invalid.

Generally, you have to either draw diagnostic charts for linear regression, or check the confidence intervals for coefficients, or both.
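A small sketch of why the checks matter, using the first two data sets of Anscombe's quartet as commonly published. Both yield almost the same fitted line, yet the second is a curve. The `curvature_signal` helper is a hypothetical numeric stand-in for looking at a residual plot, not a standard test.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def curvature_signal(xs, ys):
    """Correlate residuals with (x - mean)^2; values near +/-1 hint at a missed curve."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mx = sum(xs) / len(xs)
    return pearson(residuals, [(x - mx) ** 2 for x in xs])


x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, ys in (("linear", y_linear), ("curved", y_curved)):
    slope, intercept = fit_line(x, ys)
    print(name, round(slope, 3), round(intercept, 3), round(curvature_signal(x, ys), 3))
```

The slope and intercept alone cannot tell the two data sets apart; only the residuals reveal that a straight line is the wrong model for the second one.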

13.01.2025 16:11 👍 1 🔁 0 💬 1 📌 0

Thanks. It seems the Gemini API does have context caching, so I might give it a try.

09.01.2025 19:16 👍 0 🔁 0 💬 0 📌 0

Does anybody know whether the Gemini LLM's stated context size of 1M tokens really means one can basically load all the context data and not use any RAG?

There's one benchmark, called "RULER", which claims the effective context size is ">128K", which is still fairly impressive.

09.01.2025 17:04 👍 0 🔁 0 💬 0 📌 0

New toy: Odroid M1S. Quad-core ARM Cortex-A55, 8GB of RAM, 64GB eMMC storage, M.2 slot, gigabit Ethernet.

17.12.2024 13:40 👍 1 🔁 0 💬 0 📌 0

Authenticating users with AWS ALB: Secure your Kubernetes app in AWS using Google user authentication

Blogged: Authenticating users with AWS ALB.

It is generally best not to do your own auth. Now that AWS ALB has built-in OAuth support, you can completely off-load authentication, with your service receiving only requests from known users.

vladimirprus.com/blog/2024-11...

02.12.2024 11:42 👍 1 🔁 0 💬 0 📌 0

Adopting Spark Connect: How we use a shared Spark server to make our Spark infrastructure more efficient

Spark Connect is a new feature of Spark that enables lightweight drivers to use a shared execution "cluster".

In this post, my colleague Sergey Kotlov explains when it is useful, how to make it work in practice, and what challenges you might find.

towardsdatascience.com/adopting-spa...

27.11.2024 12:15 👍 1 🔁 0 💬 0 📌 0

Hello Bluesky! I am a data engineer working on things like Spark infrastructure, A/B tests and anomaly detection.

Previously, I worked on developer tools such as GDB, Eclipse, and KDevelop.

Hopefully, this platform will be a good one for technical content.

27.11.2024 08:57 👍 5 🔁 0 💬 0 📌 0
