Rost Glukhov (@rosgluk)

llama.cpp Quickstart with CLI and Server

Install llama.cpp, run GGUF models with llama-cli, and serve OpenAI-compatible APIs using llama-server. Key flags, examples, and tuning tips with a short command cheatsheet.

#Cheatsheet #GGUF #AI #LLM #DevOps #OpenAI #API #SelfHosting #CUDA #Prometheus #llama.cpp

https://www.glukhov.org/llm-hosting/llama-cpp/

12.03.2026 09:11

Rust vs Python for AI Development: A Comprehensive Comparison

A comprehensive comparison of Rust and Python for AI development, covering architecture, performance, ecosystem maturity, and real-world use cases in machine learning and data processing.

#Rust #Python #AI development #machine learning #performance comparison

https://dasroot.net/posts/2026/02/rust-vs-python-ai-development-comparison/

12.03.2026 00:02

OpenCode Quickstart: Install, Configure, and Use the Terminal AI Coding Agent

A practical OpenCode quickstart for developers: install and verify, connect models/providers, run CLI workflows, use the server + JS SDK, and keep a short cheatsheet.

#Cheatsheet #ai-devtools #coding-agents #terminal #developer-tools #llm-tools #LLM #AI #AI Coding #Dev #DevOps

https://www.glukhov.org/ai-devtools/opencode/

11.03.2026 11:43

Rust and WebAssembly for AI Interfaces: A 2026 Perspective

Explore how Rust and WebAssembly enable secure, high-performance AI interfaces in 2026. Learn to build browser-based AI apps using Monty, wasm-pack, and real-world case studies like docfind and Bevy.

#Rust #WebAssembly #AI Interfaces #Monty #wasm-pack

https://dasroot.net/posts/2026/02/rust-webassembly-ai-interfaces-2026/

10.03.2026 23:29

Airtable for Developers & DevOps - Plans, API, Webhooks, and Go/Python Examples

Deep research guide to Airtable: what it is, core features, Free plan limits and implications, key competitors, and production-ready DevOps integration patterns with runnable Go and Python examples (CRUD, pagination, rate limits, batching, webhooks).

#Cloud #Hosting #Dev #DevOps #Go #Golang #Python #Integration #AI #API

https://www.glukhov.org/data-infrastructure/integrations/airtable-for-developers-and-devops/

10.03.2026 23:28

Comparing LLMs performance on Ollama on 16GB VRAM GPU

Benchmark of 14 LLMs on RTX 4080 16GB with Ollama 0.15.2. Compare tokens/sec, VRAM usage, and CPU offloading for GPT-OSS, Qwen3, Qwen3.5, Mistral, and more.

#LLM #Ollama #NVidia #Hardware #Self-Hosting #Open Source #DeepLearning #AI

https://www.glukhov.org/llm-performance/benchmarks/choosing-best-llm-for-ollama-on-16gb-vram-gpu/

10.03.2026 08:44

LLM Performance and PCIe Lanes: Key Considerations

#Self-Hosting #LLM #Performance #AI #Ollama #Hardware #DeepLearning

https://www.glukhov.org/llm-performance/hardware/llm-performance-and-pci-lanes/

09.03.2026 10:31

Terminal Multiplexers: tmux vs Zellij – A Comprehensive Comparison

A detailed comparison of tmux and Zellij, highlighting architecture, features, performance, and usability to help developers choose the best terminal multiplexer for their workflow.

#tmux #Zellij #terminal multiplexer #DevOps tools #command line interface

https://dasroot.net/posts/2026/02/terminal-multiplexers-tmux-vs-zellij-comparison/

08.03.2026 22:16

Search vs Deepsearch vs Deep Research

#Cloud #LLM #AI #Perplexica

https://www.glukhov.org/rag/architecture/search-vs-deepsearch-vs-deep-research/

08.03.2026 12:49

Markdown Code Blocks: Complete Guide with Syntax, Languages & Examples

Complete guide to Markdown code blocks: fenced blocks, inline code, syntax highlighting, diff formatting, language identifiers, filename display, and Hugo-specific features.

#Hugo #Cheatsheet #Markdown

https://www.glukhov.org/documentation-tools/markdown/markdown-codeblocks/

07.03.2026 23:38

Markdown Cheatsheet: Syntax, Formatting & Structure Quick Reference

Quick reference to Markdown syntax: headings, bold, italic, lists, links, images, tables, code blocks, blockquotes, task lists, math, and more, with examples for every element.

#Hugo #Cheatsheet #Markdown

https://www.glukhov.org/documentation-tools/markdown/markdown-cheatsheet/

07.03.2026 08:14

Monitor LLM Inference in Production (2026): Prometheus & Grafana for vLLM, TGI, llama.cpp

Learn how to monitor LLM inference in production using Prometheus and Grafana. Track p95 latency, tokens/sec, queue duration, and KV cache usage across vLLM, TGI, and llama.cpp. Includes PromQL examples, dashboards, alerts, Docker & Kubernetes setups.

#Monitoring #Hosting #Self-Hosting #LLM #AI #DevOps #Docker #K8S #Prometheus #Grafana #observability #kubernetes #vllm

https://www.glukhov.org/observability/monitoring-llm-inference-prometheus-grafana/

06.03.2026 11:28

Docker Model Runner vs Ollama (2026): Which Is Better for Local LLMs?

Trying to choose between Docker Model Runner and Ollama? We compare performance, GPU support, API compatibility, Docker integration and production readiness to help you decide fast.

#Docker #Ollama #LLM #AI #DevOps #Self-Hosting #Linux #API #NVidia

https://www.glukhov.org/llm-hosting/comparisons/docker-model-runner-vs-ollama-comparison/

05.03.2026 07:15

Ollama vs vLLM vs LM Studio: Best Way to Run LLMs Locally in 2026?

Choosing the best way to run LLMs locally? Compare Ollama, vLLM, LM Studio, LocalAI and 8+ tools by API support, hardware compatibility, tool calling, and production readiness.

#LLM #AI #Ollama #vllm #Privacy #Open Source #Self-Hosting #Docker #API #Machine Learning #RAG

https://www.glukhov.org/llm-hosting/comparisons/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/

04.03.2026 22:31

OpenClaw Quickstart: Install with Docker (Ollama GPU or Claude + CPU)

Install OpenClaw in minutes with Docker. Run locally with Ollama (GPU) or use Claude Sonnet 4.6 (CPU-only). Includes setup, model config, testing, and troubleshooting.

#Hosting #Self-Hosting #LLM #AI #Ollama #Docker #Open Source #RAG #OpenClaw

https://www.glukhov.org/ai-systems/openclaw/quickstart/

04.03.2026 11:16

Garage vs MinIO vs AWS S3: Object Storage Comparison and Feature Matrix

Compare MinIO, Garage, and AWS S3 for object storage. Feature matrix, cost model, operational complexity, and when to choose each: managed S3, self-hosted Garage, or MinIO with broad S3 parity.

#Minio #Garage #S3 #AWS #Hosting #Self-Hosting #DevOps #Open Source

https://www.glukhov.org/data-infrastructure/object-storage/garage-vs-minio-vs-s3/

04.03.2026 00:15

Implementing Workflow Applications with Temporal in Go: A Complete Guide

Learn how to implement workflow applications with Temporal in Go using the official Temporal Go SDK. This end-to-end guide covers configuration, examples, deployment, troubleshooting, and best practices for building scalable, resilient workflows.

#Go #Golang #devops #coding #LLM #Architecture #AI Coding #Dev #Open Source

https://www.glukhov.org/post/2026/03/workflow-applications-temporal-in-go/

03.03.2026 06:34

Garage - S3 compatible object storage Quickstart

Garage quickstart for S3-compatible object storage. Run Garage with Docker, set layout and replication, add TLS via reverse proxy, create buckets and keys, and apply production tips for self-hosted storage.

#Self-Hosting #s3 #object-storage #self-hosted #backup #observability

https://www.glukhov.org/data-infrastructure/object-storage/garage-quickstart/

02.03.2026 02:51

Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production

A deep, production-minded guide to observability for LLM systems, covering LLM metrics, distributed tracing, logs, profiling, synthetic testing, SLOs, and an LLM observability tools comparison (Prometheus, Grafana, OpenTelemetry, Jaeger/Tempo, Loki/ELK, DCGM, and major APM platforms).

#LLM #Prometheus #Grafana #Kubernetes #Monitoring #AI #DevOps #Hosting

https://www.glukhov.org/observability/observability-for-llm-systems/

01.03.2026 03:54

Using Go to Build RAG Systems: WeKnora Deep Dive

Deep dive into WeKnora, a Go-based RAG framework for building scalable, secure, and high-performance retrieval-augmented generation systems with advanced agent capabilities and hybrid retrieval strategies.

#Go #RAG #WeKnora #Agent Skills #Hybrid Retrieval

https://dasroot.net/posts/2026/02/using-go-build-rag-systems-weknora-deep-dive/

28.02.2026 12:20

Chunking Strategies in RAG Comparison: Alternatives, Trade‑offs, and Examples

A rigorous, engineering‑first guide to chunking for RAG: fixed vs semantic vs hierarchical chunking, evaluation dimensions, decision matrix, and runnable Python implementations with FAISS/Chroma/Weaviate and OpenAI embeddings.

#RAG #Vector Databases #LLM Performance #DevOps #Hardware #Python #LLM #AI #AI Coding #API #Dev #Coding

https://www.glukhov.org/rag/retrieval/chunking-strategies-in-rag/

28.02.2026 02:37

Ollama CLI Cheatsheet: ls, serve, run, ps + commands (2026 update)

Ollama CLI cheatsheet: ollama serve command, ollama run command examples, ollama ps, and model management.

#Linux #Cheatsheet #Self-Hosting #LLM #AI #Ollama #DevOps #Python

https://www.glukhov.org/llm-hosting/ollama/ollama-cheatsheet/

27.02.2026 13:02

Writing High-Throughput Network Clients in Go

Learn how to build high-throughput network clients in Go using concurrency, non-blocking I/O, and modern libraries like gRPC and HTTP/2 for optimal performance and scalability.

#Go #network clients #high-throughput #gRPC #HTTP/2

https://dasroot.net/posts/2026/02/writing-high-throughput-network-clients-go/

26.02.2026 08:44

Running LLMs Locally for Data Privacy

Learn how to run large language models locally for enhanced data privacy. This guide covers hardware requirements, software frameworks, quantization techniques, and security measures to protect sensitive data in on-premises deployments.

#LLM #NVIDIA GPU #Google TPU #PyTorch #Hugging Face Transformers #Model Quantization #Data Privacy #Secure Communication #Access Control #TLS 1.3

https://dasroot.net/posts/2026/02/running-llms-locally-data-privacy/

25.02.2026 21:55

How to Configure Desktop Launchers on Ubuntu 24 with Standard Icons

Create and edit .desktop launchers on Ubuntu 24.04: Icon, Exec, locations, and the freedesktop.org spec. Put launchers on the Desktop or in the applications menu, using standard Ubuntu icons.

#Linux #Cheatsheet #bash #Dev #Howtos

https://www.glukhov.org/post/2026/02/configure-desktop-launchers-ubuntu-24/

25.02.2026 06:45

Agentic AI and Security: A Deep Technical Analysis in 2026

A deep technical analysis of Agentic AI security in 2026, covering critical risks, frameworks like OWASP AIVSS and MAESTRO, practical implementation strategies, and future governance challenges for autonomous AI systems.

#Agentic AI #AI Security #OWASP AIVSS #MAESTRO #Observability

https://dasroot.net/posts/2026/02/agentic-ai-security-deep-technical-analysis-2026/

24.02.2026 11:13

Ansible vs Puppet vs Chef vs SaltStack: Configuration Management Tool Comparison

Comprehensive comparison of Ansible, Puppet, Chef, and SaltStack for configuration management. Explore architecture, performance, features, and ideal use cases to choose the right tool for your infrastructure automation needs.

#Ansible #Puppet #Chef #SaltStack #Configuration Management

https://dasroot.net/posts/2026/02/ansible-vs-puppet-vs-chef-vs-saltstack-configuration-management/

24.02.2026 02:06

Retrieval-Augmented Generation (RAG) Tutorial: Architecture, Implementation, and Production Guide

Step-by-step RAG tutorial: build retrieval-augmented generation systems with vector databases, hybrid search, reranking, and web search. Architecture, implementation, and production best practices.

www.glukhov.org/rag/
#AI #LLM #RAG #Embeddings #Reranking #VectorDatabase

23.02.2026 09:57

Observability: Monitoring, Metrics, Prometheus & Grafana Guide

Practical observability guide: monitoring vs observability, Prometheus metrics, Grafana dashboards, alerting, Kubernetes and AI/LLM monitoring, and production best practices.

www.glukhov.org/observability/
#Monitoring #Observability #Prometheus #Grafana #Kubernetes #DevOps

23.02.2026 04:01

API-First Development and Contract Testing: Modern Practices and Tools

Learn modern API-First Development and Contract Testing practices for microservices. Discover how OpenAPI and Pact ensure reliable, scalable systems with faster development cycles and fewer integration issues.

#API-First Development #Contract Testing #OpenAPI #Pact #Microservices

https://dasroot.net/posts/2026/02/api-first-development-contract-testing/

22.02.2026 05:41
