Suraj Deshmukh | सुरज देशमुख

@suraj.io

@Microsoft.com | ex-@kinvolkio ex-@RedHat | bibliophile | He/Him | Opinions are my own. 🟥 🟩 🟦 🟨

207
Followers
296
Following
152
Posts
10.12.2023
Joined

Latest posts by Suraj Deshmukh | सुरज देशमुख @suraj.io

GitHub does dotfiles - dotfiles.github.io

GitHub has a recommendation for managing dotfiles:
dotfiles.github.io

26.02.2026 19:10 👍 0 🔁 0 💬 0 📌 0
Setting Up OpenClaw with Azure AI Foundry Learn how to configure OpenClaw to use Azure AI Foundry models, giving you a self-hosted AI assistant accessible from Telegram and other chat apps.

I just published a new guide on configuring #OpenClaw 🦀 to run with #Azure AI Foundry models. You control your data, which means more privacy, and you can talk to it from #Telegram or the console!

Check it out here: suraj.io/post/2026/op...

23.02.2026 17:51 👍 2 🔁 0 💬 0 📌 0
Running Linux Containers Natively on macOS with Apple's Container CLI Learn how to use Apple's container CLI tool to run Linux containers as lightweight VMs on macOS with sub-second startup times

Apple has a new native container CLI for macOS! Run Linux containers without Docker Desktop—with sub-second startup times. 🚀

My guide covers setup, resource limits, and fixing macOS firewall blocks:
🔗 suraj.io/post/2026/us...

#macOS #Containers

22.02.2026 20:00 👍 1 🔁 0 💬 0 📌 0
goodreads — ClawHub Search for books, get book details and reviews, discover personalized recommendations, and manage reading lists on Goodreads — all through browser automation.

Try it:

/goodreads tell me about project hail mary by andy weir
/goodreads add the midnight library to my want to read shelf

Install with one command:
clawhub install goodreads

🔗 clawhub.ai/surajssd/goodreads
🐙 github.com/surajssd/openclaw-goodreads-skill

17.02.2026 20:26 👍 0 🔁 0 💬 0 📌 0

2/n: Since Goodreads deprecated their API in 2020, this skill uses browser automation under the hood. No API keys needed (though you do need to log in once); it is just the browser tool doing what you'd do manually!

17.02.2026 20:25 👍 0 🔁 0 💬 1 📌 0

1/n 📚 Made something for fellow book nerds using OpenClaw:

A Goodreads skill that lets your AI agent search for books, pull up details & reviews, get personalized recommendations, and manage your reading lists — all through natural language.

17.02.2026 20:25 👍 1 🔁 0 💬 1 📌 0
Deploying Kimi K2.5 on Azure: A Complete Guide to Running MoonshotAI's Model Learn how to deploy and configure Kimi K2.5 on Azure AI Foundry with this step-by-step guide.

Deploying #Kimi K2.5 on #Azure: A Complete Guide to Running MoonshotAI's Model suraj.io/post/2026/de...

11.02.2026 00:28 👍 0 🔁 0 💬 0 📌 0
Running Pydantic’s Monty Rust sandboxed Python subset in WebAssembly There’s a jargon-filled headline for you! Everyone’s building sandboxes for running untrusted code right now, and Pydantic’s latest attempt, Monty, provides a custom Python-like language (a subset of ...

Running Pydantic’s Monty Rust sandboxed Python subset in WebAssembly

simonwillison.net/2026/Feb/6/p...

08.02.2026 03:40 👍 2 🔁 2 💬 0 📌 0
Handy Handy is a cross platform, open-source, speech-to-text application for your computer

Thanks to @scott.hanselman.com for showing me Handy (handy.computer) — a free, open-source speech-to-text tool that runs locally on your machine. Push-to-talk, privacy-focused, and just works. Check it out!

03.02.2026 06:05 👍 41 🔁 13 💬 2 📌 0
Running Docker Commands on a Remote Machine via SSH Learn how to execute Docker commands on a remote machine from your local terminal using SSH and Docker contexts

Running Docker Commands on a Remote Machine via SSH suraj.io/post/2026/re...

#docker #ssh #remote #containers #cli #development #devops

01.02.2026 23:56 👍 0 🔁 0 💬 0 📌 0
Using Claude Code with GitHub-Hosted Anthropic Models Learn how to use Claude Code CLI with GitHub Models by proxying requests through litellm-proxy

Using Claude Code with GitHub-Hosted Anthropic Models suraj.io/post/2026/us... #claude #github-models #ai #litellm #anthropic

01.02.2026 23:54 👍 0 🔁 0 💬 0 📌 0
Meta’s Kubernetes-based Portable AI Research Environment - Shaun Hopper, Meta & Navarre Pratt YouTube video by CNCF [Cloud Native Computing Foundation]

Meta’s Kubernetes-based Portable AI Research Environment youtu.be/ts7bI51gRCo?...

26.11.2025 14:26 👍 1 🔁 0 💬 0 📌 0
LLMs on Kubernetes: Squeeze 5x GPU Efficiency With Cache, Route, Repea... Yuhan Liu & Suraj Deshmukh YouTube video by CNCF [Cloud Native Computing Foundation]

Our talk (me & Yuhan Liu) on improving LLM serving efficiency is on YouTube now!
youtu.be/2YCDvZokqnk?...

#vllm #kubernetes #kubecon

26.11.2025 01:30 👍 3 🔁 0 💬 0 📌 0
Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog Today, we are unveiling the next Fairwater site of Azure AI datacenters in Atlanta, Georgia. This purpose-built datacenter is connected to our first Fairwater site in Wisconsin, prior generations of A...

Infinite scale: The architecture behind the Azure AI superfactory

blogs.microsoft.com/blog/2025/11...

20.11.2025 00:25 👍 2 🔁 0 💬 0 📌 0
Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark Plus what happens if AI labs train for pelicans riding bicycles?

Gemini 3, OpenAI KV cache, and much more
open.substack.com/pub/simonw/p...

20.11.2025 00:22 👍 1 🔁 0 💬 0 📌 0

They also let you offload the KV cache to local storage for 24 hours! Also, they cache only when the query is longer than 1024 tokens!

20.11.2025 00:16 👍 0 🔁 0 💬 0 📌 0
OpenAI Platform Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

OpenAI shared some details, from the user's point of view, about which KV cache features are available:
platform.openai.com/docs/guides/...

It is interesting to see that they keep the cache for 10 minutes, and if no matching request arrives in that time, they remove the hot cache from the GPU.

20.11.2025 00:16 👍 1 🔁 0 💬 1 📌 0
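
The caching behaviour described in this thread (a minimum query length, a 10-minute hot window on the GPU, and offload to local storage for 24 hours) can be sketched as a toy policy. This is only an illustration under stated assumptions, not OpenAI's implementation: `ToyPrefixCache`, `Entry`, the tier names, and the rule that an idle hot entry moves straight to local storage are all hypothetical, and the thresholds are simply the numbers quoted in the posts.

```python
from dataclasses import dataclass, field

# Thresholds taken from the posts; everything else below is an assumption.
MIN_CACHEABLE_TOKENS = 1024       # only queries longer than this are cached
HOT_TTL_SECONDS = 10 * 60         # idle hot entries leave the GPU after 10 min
OFFLOAD_TTL_SECONDS = 24 * 3600   # offloaded entries are kept for 24 hours


@dataclass
class Entry:
    last_used: float  # timestamp (seconds) of the most recent request
    tier: str         # "gpu" (hot) or "local" (offloaded); hypothetical names


@dataclass
class ToyPrefixCache:
    """Hypothetical model of the policy, with offload assumed to be enabled."""
    entries: dict = field(default_factory=dict)

    def lookup(self, prefix_key: str, num_tokens: int, now: float) -> str:
        """Return 'skip', 'miss', 'hit-gpu', or 'hit-local'."""
        if num_tokens <= MIN_CACHEABLE_TOKENS:
            return "skip"  # too short to be cached at all
        self._age(now)
        entry = self.entries.get(prefix_key)
        if entry is None:
            self.entries[prefix_key] = Entry(last_used=now, tier="gpu")
            return "miss"
        result = f"hit-{entry.tier}"
        # Assumption: a reused entry becomes hot again.
        entry.last_used = now
        entry.tier = "gpu"
        return result

    def _age(self, now: float) -> None:
        """Demote or drop entries according to how long they have been idle."""
        for key in list(self.entries):
            idle = now - self.entries[key].last_used
            if idle > OFFLOAD_TTL_SECONDS:
                del self.entries[key]
            elif idle > HOT_TTL_SECONDS:
                self.entries[key].tier = "local"
```

Passing `now` explicitly keeps the sketch deterministic: a request 11 minutes after the previous one reports `hit-local`, and one arriving more than 24 hours later is a `miss` again.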
Microsoft AI superfactory Microsoft unveiled its second Fairwater AI datacenter in Atlanta as part of a new AI superfactory working across states in nearly real time.

From Wisconsin to Atlanta: Microsoft connects datacenters to build its first AI superfactory

news.microsoft.com/source/featu...

19.11.2025 04:10 👍 0 🔁 0 💬 0 📌 0
Satya Nadella – How Microsoft thinks about AGI YouTube video by Dwarkesh Patel

Satya Nadella – How Microsoft thinks about AGI
youtu.be/8-boBsWcr5A?...

15.11.2025 23:22 👍 0 🔁 0 💬 0 📌 0
Keynote: How One Line of Code Freed 30,000 CPU Cores: Deep-Diving Fluent Bit at Petabyte... F. Ponce YouTube video by CNCF [Cloud Native Computing Foundation]

How One Line of Code Freed 30,000 CPU Cores: Deep-Diving Fluent Bit at Petabyte Scale www.youtube.com/watch?v=pbOv...

15.11.2025 20:53 👍 0 🔁 0 💬 0 📌 0
KubeCon + CloudNativeCon North America 2025: LLMs on Kubernetes: Squeeze 5x GPU Effic... View more about this event at KubeCon + CloudNativeCon North America 2025

Come see us (me & Yuhan Liu) tomorrow for our talk.

Specifically, Wednesday November 12, 2025 5:30pm - 6:00pm EST at Building B | Level 5 | Thomas Murphy Ballroom 1.

More info: sched.co/27FcQ #kubecon #vllm

11.11.2025 19:51 👍 0 🔁 0 💬 0 📌 0
Ray Direct Transport: RDMA Support in Ray Core (Part 1) Ray Direct Transport enables fast and direct GPU transfers in Ray via RDMA-backed transports. Using RDT, we can achieve up to 1000x faster GPU-GPU transfers than Ray’s native object store with a few l...

Announcing Ray Direct Transport: RDMA Support in Ray Core
www.anyscale.com/blog/ray-dir...

05.11.2025 01:06 👍 1 🔁 0 💬 0 📌 0

This has become a game of whack-a-mole now. Source: www.youtube.com/watch?v=AXN-...

I ran the following command in the macOS terminal to get Chrome working with uBlock Origin:

```
open -a /Applications/Google\ Chrome.app --args --disable-features=ExtensionManifestV2Unsupported,ExtensionManifestV2Disabled
```

04.11.2025 20:31 👍 0 🔁 0 💬 0 📌 0
Building a tool to copy-paste share terminal sessions using Claude Code for web Plus Living dangerously with Claude, and prompt injection risks for ChatGPT Atlas

Building a tool to copy-paste share terminal sessions using Claude Code for web
open.substack.com/pub/simonw/p...

24.10.2025 20:07 👍 2 🔁 0 💬 0 📌 0
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference Today's LLM inference systems treat individual engines and queries independently for simplicity, but this causes significant resource inefficiencies. While there are proposals to avoid redundant compu...

LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
arxiv.org/abs/2510.09665

18.10.2025 22:34 👍 1 🔁 0 💬 0 📌 0
Understanding Memory Management on Hardware-Coherent Platforms | NVIDIA Technical Blog If you’re an application developer or a cluster administrator, you’ve likely seen how non-uniform memory access (NUMA) can impact system performance. When an application is not fully NUMA-aware…

Understanding Memory Management on Hardware-Coherent Platforms | NVIDIA Technical Blog developer.nvidia.com/blog/underst...

17.10.2025 20:12 👍 1 🔁 0 💬 0 📌 0
KubeCon + CloudNativeCon North America 2025: LLMs on Kubernetes: Squeeze 5x GPU Effic... View more about this event at KubeCon + CloudNativeCon North America 2025

Join me and Yuhan Liu for our talk at the upcoming #Kubecon NA 2025 in Atlanta: sched.co/27FcQ. We will talk about increasing efficiency while serving #LLMs using #vLLM & #LMCache!

15.10.2025 22:29 👍 1 🔁 0 💬 0 📌 0

Using Claude Code with GitHub Copilot-hosted Claude models:
github.com/surajssd/dot...

TFS @nilekh.bsky.social

14.10.2025 22:06 👍 1 🔁 0 💬 0 📌 0
NVIDIA Blackwell Leads on SemiAnalysis InferenceMAX v1 Benchmarks | NVIDIA Technical Blog SemiAnalysis recently launched InferenceMAX v1, a new open source initiative that provides a comprehensive methodology to evaluate inference hardware performance. Published results demonstrate that…

NVIDIA Blackwell Leads on SemiAnalysis InferenceMAX v1 Benchmarks | NVIDIA Technical Blog developer.nvidia.com/blog/nvidia-...

14.10.2025 06:38 👍 0 🔁 0 💬 0 📌 0
Claude Code: Tips and Tricks YouTube video by Anand Tyagi

Claude Code: Tips and Tricks

youtu.be/HSkLeECsBcw?...

13.10.2025 22:54 👍 0 🔁 0 💬 0 📌 0