Integrating domain expert feedback into agents is the path to production, and MLflow is the way. Check out this end to end example on Databricks www.databricks.com/blog/self-op...
@wesleypasfield
Write at https://open.substack.com/pub/wesleypasfield | Previously Lark Health, AWS/Amazon, GoPro, Nielsen company | Currently Emerging Tech Fellow at US Census, and Adjunct Professor at University of San Diego Applied AI MS
Not to mention the domestic challenges from AI-driven inequality caused by our current incentive structure
Yeah I'd love to see the equivalent from ChatGPT. While this data is awesome, I think it's very important to emphasize this is adoption by field, not potential capabilities.
www.nytimes.com/2025/02/05/o...
This is such an amazing opinion piece! Time to pivot regulation to logical areas
Certainly not training compute, which is what current regulation specifies. The primary point is that the compute required for specific capabilities will be a moving target, as exemplified by the efficiencies DeepSeek showed recently
My NeurIPS paper on LLM regulation through data and evaluation is finally up on arXiv - I feel even stronger about this approach given the shift to reasoning and away from compute as a proxy for performance arxiv.org/abs/2502.03472
With all the DeepSeek news, thought I'd reshare my NeurIPS paper on data- and evaluation-based regulation as an alternative or complement to compute. The shift to reasoning makes this even more relevant: wesleypasfield.com/pasfield_neu...
Thank you for sharing this perspective broadly! This is very much in line with my paper at the NeurIPS RegML Workshop. Data + evaluation needs to be our focus: wesleypasfield.com/pasfield_neu... "Powering LLM Regulation through Data: Bridging the Gap from Compute Thresholds to Customer Experiences"
I keep hearing that AI will automate tasks and let people focus on higher leverage tasks… but clearly the intention is for AI to try to go up the chain to those higher leverage tasks too. I think we are hand-waving away what our intention is for humans in this agent-driven future
Like AGI, depends who you ask!
Has anyone seen an estimate of what will happen to human labor in knowledge work if agentic LLM solutions are successful? I've seen the market opportunity from the VC side (10x SaaS), but presumably that would come at the direct cost of human employment?
I don't have a great answer, but I think "test time" is an especially bad term because it gives the connotation that it is just for test sets (not live applications) - I think it's especially confusing for non-tech folks
It's easy to lead the witness - it's important to be as neutral as possible and ask for analysis/alternatives to ensure you are not just getting agreeable answers from the models
I've found the newsletter personally useful for keeping up with research at a greater breadth than before. I'm using Claude for the paper identification and summarization, and everything is serverless on AWS, so it's quite cheap. I hope folks enjoy!
I put together an automated newsletter - featured in Data Elixir and Data Science Weekly - that identifies interesting AI research and summarizes the content, sending out twice a week.
You can sign up here: wesleypasfield.com/aipapers/
And check out the code here: github.com/WesleyPasfie...
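For anyone curious how the summarization step could work, here is a minimal sketch. The function name and prompt wording are my own illustrations, not necessarily how the linked repo does it; the commented-out call shows where the Anthropic messages API would be invoked:

```python
# Hypothetical sketch of the newsletter's Claude summarization step.
# The real pipeline lives in the linked GitHub repo and may differ.

def build_summary_prompt(title: str, abstract: str) -> str:
    """Build a prompt asking Claude for a short newsletter blurb."""
    return (
        "You are writing a twice-weekly AI research newsletter.\n"
        f"Paper title: {title}\n"
        f"Abstract: {abstract}\n"
        "Summarize the key contribution in 2-3 sentences for a technical "
        "audience, and note why it is interesting."
    )

# The prompt would then be sent to Claude, e.g. with the Anthropic SDK:
#   client = anthropic.Anthropic()
#   message = client.messages.create(
#       model="claude-3-5-sonnet-20241022",
#       max_tokens=300,
#       messages=[{"role": "user", "content": prompt}],
#   )

prompt = build_summary_prompt(
    "Example Paper Title",
    "We study an example problem and report example results.",
)
print(prompt)
```

Keeping the prompt construction separate from the API call makes the pipeline easy to unit test without network access, which fits a cheap serverless setup.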
Happy New Year! To kick off the year, I've finally been able to format and upload the draft of my AI Research Highlights of 2024 article.
It covers a variety of topics, from mixture-of-experts models to new LLM scaling laws for precision:
Easy prediction for 2025 is that the gains in AI model capability will continue to grow much faster than (a) the vast majority of people's understanding of what AI can do & (b) organizations' ability to absorb the pace of change. Social change is much slower than technological change.
I've been trying to write more on Substack - I just published a post on how some of the more recent LLM trends (agents, test-time inference) could impact society moving forward, and what we should do about it:
open.substack.com/pub/wesleypa...
LLM hallucinations are a feature AND a bug
"This is not just incremental progress; it is new territory, and it demands serious scientific attention." As inference enhancements drive LLM performance optimizations, regulatory efforts that don't directly measure model output will be less and less relevant arcprize.org/blog/oai-o3-...
I'll be at the RegML workshop tomorrow at NeurIPS in East Meeting Room 13 - come say hi!
I just updated the NeurIPS starter pack with many more attendees
Let me know if you'd like to be added
go.bsky.app/BuJXg5q
#NeurIPS2024 #NeurIPS
Strongly agree with the primary takeaway from this argument: "You are getting left behind if you do not adopt chat-based programming as your primary modality." Not sure if chat will always be the primary medium but LLM assisted/driven development is here to stay sourcegraph.com/blog/the-dea...
Paper contents around using data for domain specific evaluation to enable more logical LLM regulation
I think this extends to regulatory efforts as well. Better benchmarks / means of evaluation will lead to more logical regulation. That is the core principle of the paper I will present at NeurIPS later this week
They say the ideal use case is "narrow sets of complex tasks led by experts" - thoughts on whether that means a singular outcome with a lot of complicated steps, or perhaps a wider set of outcomes but a very defined problem space (or something else)? Having a hard time interpreting that on the surface
I think it's about the ways humans are wrong vs. AI-based systems rather than accountability. We can trace back why the human made the decision, whether they are accountable or not; AI being wrong feels much more random
Number 4 is often the real challenge for an enterprise-specific problem, as it's not easy to measure the output of LLMs given their generalized nature, or the human experts who would serve as the comparison are the ones doing the evaluation
Very happy to see this release from AWS. This makes Bedrock very compelling and the type of practical offering that can help LLM based experiences exit the experimental phase into prod for a specific domain/application aws.amazon.com/about-aws/wh...
Have to remind yourself it's a token predictor trained on data that very infrequently includes "I don't know." I often say "it's ok to say I don't know if you're uncertain," which anecdotally seems to mitigate this a bit, but I'm not sure how much impact it has or whether it negatively affects the overall response
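As a concrete illustration of that prompting habit, here is a small sketch. The helper name and message structure are illustrative (a generic chat-message list, not any specific library's API); the instruction text mirrors what I say to the model:

```python
# Sketch of nudging a model toward admitting uncertainty via a
# system message. Helper name and structure are hypothetical.

UNCERTAINTY_INSTRUCTION = (
    "It's ok to say 'I don't know' if you're uncertain. "
    "Do not guess; state your uncertainty explicitly."
)

def with_uncertainty_guard(question: str) -> list:
    """Wrap a user question with a system message permitting 'I don't know'."""
    return [
        {"role": "system", "content": UNCERTAINTY_INSTRUCTION},
        {"role": "user", "content": question},
    ]

messages = with_uncertainty_guard("What did the CEO say in yesterday's meeting?")
```

Putting the instruction in a reusable system message, rather than retyping it per question, keeps the nudge consistent across an application's prompts.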