The Questions AI Coding Agents Are Forcing Me to Ask
zsiegel.com/the-question...
Inspiring weekend cooking with Claude Code. Sharing some thoughts after spending $37 in tokens and building the beginning of my "personal" MCP server.
zsiegel.com/why-2025-fee...
Really enjoyed this podcast with one of the founders of @tailscale.com
stratechery.com/2025/an-inte...
This is such an awesome part of uv.
I used to really dislike (and not really understand) Python build/packaging tooling, but uv has solved a lot of the problems I had with previous tools.
Very exciting and running great via Ollama! Trying it out today as a replacement for qwen2.5-coder.
Spent some time recreating GPT alongside the video from @karpathy.bsky.social using the Apple MLX machine learning framework.
I found it to be a fun exercise and learned a ton!
github.com/zsiegel/mlx-...
The chart illustrates two sets of comparisons for large language models (LLMs) on multimodal and text-to-image benchmarks.

Left panel: performance vs. model size.
• X-axis: number of LLM parameters (billions). Y-axis: average score on four multimodal understanding benchmarks.
• Janus-Pro-7B achieves the highest average performance (~64) with 7 billion parameters; LLaVA-v1.5-7B performs slightly lower (~60) at a similar size.
• TokenFlow-XL also shows notable performance at a larger parameter scale (>10B), while smaller models such as Show-o and Janus-Pro-1B score significantly lower (~46–54).

Right panel: instruction-following accuracy on GenEval and DPG-Bench.
• GenEval: top performers are Janus-Pro-7B (80%) and SDXL (~67%); PixArt-α is lowest (48%).
• DPG-Bench: best performance comes from Janus-Pro-7B (84.2%) and SDXL (~83.5%), while models like Emu3-Gen (~71.1%) perform less consistently.

Key takeaways:
1. The Janus-Pro family consistently outperforms the other models across both understanding and generation tasks, emphasizing its robustness.
2. Model size correlates positively with performance on multimodal understanding tasks, though some smaller models (e.g., LLaVA) deliver competitive results on specific benchmarks.
Alert! DeepSeek Janus-Pro-7B
It's multimodal and outperforms DALL-E and Stable Diffusion
Probably the biggest feature is its ability to generate text in an image that actually makes sense
They be cooking, I'm here for whatever is served
huggingface.co/deepseek-ai/...
Very cool just ordered one!
Anyone with experience using n8n or langchain?
Looking to run my own self hosted system for automations and agents. Seems like n8n has all the right integrations and then langchain has more AI sauce. Thoughts?
The biggest difference I notice when using deepseek-r1 for coding tasks is that it examines and uses existing code so much better than previous LLMs.
That is a massive win for most developers who are working in existing large codebases.
I love the idea of this and thinking about running my own MCP server to expose things both privately and publicly for various use cases.
The extra money spent on 64 GB of RAM in my M4 Max was very well spent.
Being able to run larger LLM models locally via Ollama for code assistance is wonderful.
Qwen coder and Deepseek R1 are both excellent for my daily use cases.
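A minimal sketch of what "code assistance via Ollama" looks like in practice: a one-shot prompt against Ollama's local HTTP API. The default port and the `/api/generate` request shape are Ollama's; the model name and prompt are just placeholders.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a one-shot prompt to a locally running Ollama server."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running and the model pulled):
# print(ask("qwen2.5-coder", "Write a Python function that reverses a string"))
```

Everything runs on-device, so no tokens leave the machine — which is the whole appeal of the local setup.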
Been debugging an issue for 48 hours where our apps built from CI pipelines break at runtime, but when we push to the App Store manually from a local machine they work.
Finally realized our CI machines are running a different version of Xcode than the one we all develop on.
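The kind of bug above suggests a guard step early in the pipeline: compare the runner's Xcode version against the one the team develops on and fail fast on a mismatch. A hedged sketch — the version numbers are placeholders, and on a real runner you'd feed in the live version (e.g. the output of `xcodebuild -version`):

```python
import subprocess

EXPECTED_XCODE = "16.2"  # placeholder: pin this to your team's version

def check_xcode(actual: str, expected: str = EXPECTED_XCODE) -> str:
    """Return 'ok' when versions match, otherwise a mismatch message."""
    if actual == expected:
        return "ok"
    return f"mismatch: CI has {actual}, expected {expected}"

def live_xcode_version() -> str:
    """Read the installed Xcode version on a macOS runner (first line, second field)."""
    out = subprocess.check_output(["xcodebuild", "-version"], text=True)
    return out.splitlines()[0].split()[1]

# In CI you'd wire these together and exit non-zero on a mismatch:
#   result = check_xcode(live_xcode_version())
#   if result != "ok": raise SystemExit(result)
print(check_xcode("16.1"))
```

Failing the build at this step is cheap compared to 48 hours of runtime debugging.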
Been polishing up a "home AI" project that lets me use Llama 3.2 using Ollama to find anything in my personal documents.
Been a fun project trying out OCR tech, LLM tool calling, and different kinds of search and RAG techniques!
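The retrieval half of a setup like this can be sketched in a few lines. The documents and the bag-of-words "embedding" below are stand-ins — in the real project the vectors would come from an embedding model served locally (Ollama exposes one via its `/api/embeddings` endpoint), and the retrieved chunks would then be handed to Llama 3.2 as context for the answer.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "2021 tax return and W-2 scans",     # hypothetical personal documents
    "home insurance policy renewal",
    "recipe for sourdough bread",
]
print(top_k("where is my tax return", docs, k=1))
```

Swapping the toy `embed` for real model embeddings (and chunking OCR'd documents before indexing) is where most of the actual project work lives.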