
Christopher Finlan

@cmfinlan

19 Followers, 20 Following, 32 Posts
Joined 13.11.2024

Latest posts by Christopher Finlan @cmfinlan

Fabric Spark's Native Execution Engine: What Speeds Up, What Falls Back, and What to Watch

You have been running Spark on the JVM for years. It works. Your pipelines finish before the SLA alarm fires, your data scientists get their DataFrames, and you have learned to live with the garbage collector the way one learns to coexist with a roommate who occasionally rearranges all the furniture at 3 AM. Then Microsoft shipped the Native Execution Engine for Fabric Spark, and the pitch is seductive: swap the JVM's row-at-a-time processing for a vectorized C++ execution layer built on Meta's Velox and Apache Gluten, get up to 6x faster query performance on compute-heavy workloads, change zero lines of code, pay nothing extra.

05.03.2026 15:58 👍 0 🔁 0 💬 0 📌 0
Open Mirroring + OneLake: Spark patterns that keep latency from eating your weekends

Open Mirroring is deceptively simple to set up and tricky to run well with Spark at scale. Here are the architecture choices, anti-patterns, and validation checks that keep your pipelines from falling apart in production.

05.03.2026 15:53 👍 0 🔁 0 💬 0 📌 0
What “Execute Power Query Programmatically” Means for Fabric Spark Teams

Somewhere in a Fabric workspace right now, two teams are maintaining the same transformation twice. The BI team owns it in Power Query. The Spark team rewrote it in PySpark so a notebook could run it on demand. Both versions work. Both versions drift. Both versions break at different times. That was normal. Microsoft’s new Execute Query API (preview) is the first real shot at ending that duplication. It lets you execute Power Query (M) through a public REST API from notebooks, pipelines, or any HTTP client, then stream results back in Apache Arrow format.

04.03.2026 15:50 👍 1 🔁 0 💬 0 📌 0
What the February 2026 Fabric Influencers Spotlight means for your Spark team

Microsoft published its February 2026 Fabric Influencers Spotlight last week. Twelve community posts. MVPs and Super Users. Most people skim the list. Maybe bookmark a link. Move on. Don't. Three of those posts carry signals that should change how your Spark data-engineering team operates in production. Not next quarter. Now.

Signal 1: Get your production code out of notebooks

Matthias Falland's Fabric Friday episode makes the case plainly: notebooks are great for development but risky in production. That framing resonates with a lot of production teams, and for good reason.

03.03.2026 15:51 👍 0 🔁 0 💬 0 📌 0
Fabric Spark failure playbook: OneLake and mirroring under real production pressure

A field-tested runbook for the failures that hide between Spark, OneLake, and mirroring in Microsoft Fabric: detection signals, triage sequences, and remediation tradeoffs from real production incidents.

25.02.2026 15:51 👍 0 🔁 0 💬 0 📌 0
The most boring technology announcement might be the most important one for your Fabric Spark team

Microsoft's new ODBC Driver for Fabric Data Engineering looks like a checkbox feature. It isn't. Here's what it means for production Spark teams, the migration risks nobody's talking about, and a concrete rollout checklist.

24.02.2026 15:53 👍 0 🔁 0 💬 1 📌 0
fabric-cicd Is Now Officially Supported — Here's Your Production Deployment Checklist

Three days ago, Microsoft promoted fabric-cicd from community project to officially supported tool. That Python library your team has been running in a “don’t look too closely at our deployment process” sort of way now carries Microsoft’s name and their support commitment. That shift matters. Your compliance team can stop asking “is this thing even supported?” You can open Microsoft support tickets when it breaks. The roadmap is no longer a volunteer effort, so features will land faster and bugs will get fixed on a schedule. But here’s where most teams trip.

23.02.2026 16:16 👍 0 🔁 0 💬 0 📌 0
The Spark-to-Warehouse Connector in Fabric: What It Does, How It Breaks, and When to Use It

The Spark connector for Fabric Data Warehouse lets your notebooks read from and write to Warehouse tables with one line of code. Here's what it does, how it breaks, and when to use Warehouse vs Lakehouse as your serving layer.

20.02.2026 15:15 👍 0 🔁 0 💬 0 📌 0
Fabric Spark billing just got clearer. Here's how to make the most of it.

Microsoft split AI function consumption out of Spark billing into its own meter. Your total cost stays the same, but your alerting, thresholds, and capacity plans probably need updating. Here's a concrete checklist.

19.02.2026 16:12 👍 0 🔁 0 💬 0 📌 0
From Demo to Production: ML-Enriched Power BI in Microsoft Fabric

Microsoft's new end-to-end pattern for enriching Power BI reports with ML in Fabric looks clean in the demo. Here's the production migration checklist for Spark teams crossing the gap from notebook to ops.

18.02.2026 15:58 👍 0 🔁 0 💬 0 📌 0
Microsoft Fabric Warehouse + Spark: Interoperability Patterns That Actually Work

If you’ve spent any time in a Fabric workspace with both Data Engineering (Spark) and Data Warehouse, you’ve probably had this moment: Spark is great for big transformations, complex parsing, and “just let me code it.” The Warehouse is great for a curated SQL model, concurrency, and giving the BI world a stable contract. And yet… teams still end up copying data around like they’re paid by the duplicate. The good news: Fabric’s architectural bet is that OneLake + Delta is the contract surface across engines. That means you can design a pipeline where Spark and Warehouse cooperate instead of competing.

17.02.2026 15:15 👍 0 🔁 0 💬 0 📌 0
What SQL database in Fabric actually means for your Spark pipelines

There is a particular kind of excitement that sweeps through data engineering teams when Microsoft announces a new database option. It is the same mixture of curiosity and low-grade dread you might feel upon learning that your neighborhood is getting a new highway interchange. Useful, probably. Disruptive, definitely. Someone is going to have to figure out the on-ramps. SQL database in Fabric went generally available in November 2025. Built on the same SQL Database Engine that powers Azure SQL Database, it is the first fully SaaS-native operational database living inside Microsoft Fabric.

16.02.2026 15:54 👍 0 🔁 0 💬 0 📌 0
Microsoft Fabric Table Maintenance Optimization: A Cross-Workload Survival Guide

Your Delta tables are drowning. Thousands of tiny Parquet files pile up after every streaming microbatch. Power BI dashboards stall on cold-cache queries. SQL analytics endpoints grind through fragmented row groups. And somewhere in the middle of the medallion architecture, a Spark job is rewriting perfectly good files because nobody told it they were already compacted. This is the small-file problem at scale, and in Microsoft Fabric, where a single Delta table can serve Spark, SQL analytics endpoint, Power BI Direct Lake, and Warehouse simultaneously, it becomes a cross-workload survival situation.

15.02.2026 16:34 👍 0 🔁 0 💬 0 📌 0
Optimizing Spark Performance with the Native Execution Engine (NEE) in Microsoft Fabric

Spark tuning often starts with the usual suspects (shuffle volume, skew, join strategy, caching)… but sometimes the biggest win is simply executing the same logical plan on a faster engine. Microsoft Fabric’s Native Execution Engine (NEE) does exactly that: it keeps Spark’s APIs and control plane, but runs a large portion of Spark SQL / DataFrame execution on a vectorized C++ engine.

What NEE is (and why it’s fast)

NEE is a vectorized native engine that integrates into Fabric Spark and can accelerate many SQL/DataFrame operators without you rewriting your code.

14.02.2026 16:00 👍 1 🔁 0 💬 0 📌 0
The Best Thing That Ever Happened to Your Spark Pipeline Is a SQL Database

Here's a counterintuitive claim: the most important announcement for Fabric Spark teams in early 2026 has nothing to do with Spark. It's a SQL database. Specifically, it's the rapid adoption of SQL database in Microsoft Fabric, a fully managed, SaaS-native transactional database that went GA in November 2025 and has been quietly reshaping how production data flows into lakehouse architectures ever since. If you're a data engineer running Spark workloads in Fabric, this changes more than you think.

The ETL Pipeline You Can Delete

Most Spark data engineers have a familiar pain point: getting operational data from transactional systems into the lakehouse.

13.02.2026 15:15 👍 1 🔁 0 💬 0 📌 0
Monitoring Spark Jobs in Real Time in Microsoft Fabric

If Spark performance work is surgery, monitoring is your live telemetry. Microsoft Fabric gives you multiple monitoring entry points for Spark workloads: Monitor hub for cross-item visibility, item Recent runs for focused context, and application detail pages for deep investigation. This post is a practical playbook for using those together.

Why this matters

When a notebook or Spark job definition slows down, "run it again" is the most expensive way to debug. Real-time monitoring helps you: spot bottlenecks while jobs are still running; isolate failures quickly; compare behavior across submitters and workspaces…

12.02.2026 15:37 👍 2 🔁 1 💬 0 📌 0
Running OpenClaw in Production: Reliability, Alerts, and Runbooks That Actually Work

Agents are fun when they’re clever. They’re useful when they’re boring. If you’re running OpenClaw as an always-on assistant (cron jobs, health checks, publishing pipelines, internal dashboards), the failure mode isn’t usually “it breaks once.” It’s “it flakes intermittently and you can’t tell if the problem is upstream, your network, your config, or the agent.” This post is the operational playbook that moved my setup from “cool demo” to “production-ish”: fewer false alarms, faster debugging, clearer artifacts, and tighter cost control.

The production baseline (don’t skip this)

Before you add features, lock the boring stuff:

11.02.2026 15:15 👍 0 🔁 0 💬 0 📌 0
Lakehouse Table Optimization: VACUUM, OPTIMIZE, and Z-ORDER

If your Lakehouse tables are getting slower (or more expensive) over time, it’s often not "Spark is slow." It’s usually table layout drift: too many small files, suboptimal clustering, and old files piling up. In Fabric Lakehouse, the three table-maintenance levers you’ll reach for most are:

OPTIMIZE: compacts many small files into fewer, larger files (and can apply clustering)

10.02.2026 20:33 👍 3 🔁 1 💬 0 📌 0
OneLake catalog in Microsoft Fabric: Explore, Govern, and Secure

If your Fabric tenant has grown past "a handful of workspaces," the problem isn’t just storage or compute; it’s finding the right items, understanding what they are, and making governance actionable. That’s the motivation behind the OneLake catalog: a central hub to discover and manage Fabric content, with dedicated experiences for discovery (Explore), governance posture (Govern), and security administration (Secure). This post is a practical walk-through of what’s available today, with extra focus on what Fabric admins get in the Govern…

10.02.2026 15:00 👍 0 🔁 0 💬 0 📌 0
Understanding Spark Execution in Microsoft Fabric

Spark performance work is mostly execution work: understanding where the DAG splits into stages, where shuffles happen, and why a handful of tasks can dominate runtime. This post is a quick, practical refresher on the Spark execution model, with Fabric-specific pointers on where to observe jobs, stages, and tasks.

1) The execution hierarchy: Application → Job → Stage → Task

In Spark, your code runs as a Spark application. When you run an action (for example, count(), collect(), or writing a table), Spark submits a job…

09.02.2026 21:22 👍 0 🔁 0 💬 0 📌 0
Fabric Spark Shuffle Tuning: AQE + partitions for Faster Joins

Shuffles are where Spark jobs go to get expensive: a wide join or aggregation forces data to move across the network, materialize shuffle files, and often spill when memory pressure spikes. In Microsoft Fabric Spark workloads, the fastest optimization is usually the boring one: avoid the shuffle when you can, and when you can’t, make it smaller and better balanced. This post lays out a practical, repeatable approach you can apply in Fabric notebooks and Spark job definitions.

1) Start with the simplest win: avoid the shuffle

If one side of your join is genuinely small (think lookup/dimension tables), use a broadcast join so Spark ships the small table to executors and avoids a full shuffle.

06.02.2026 15:03 👍 1 🔁 1 💬 0 📌 0
OneLake Shortcuts + Spark: Practical Patterns for a Single Virtual Lakehouse

If you’ve adopted Microsoft Fabric, there’s a good chance you’re trying to reduce the number of ‘copies’ of data that exist just so different teams and engines can access it. OneLake shortcuts are one of the core primitives Fabric provides to unify data across domains, clouds, and accounts by making OneLake a single virtual data lake namespace. For Spark users specifically, the big win is that shortcuts appear as folders in OneLake, so Spark can read them like any other folder, and Delta-format shortcuts in the Lakehouse Tables area can be surfaced as tables.

05.02.2026 15:02 👍 1 🔁 0 💬 0 📌 0
When ‘Native Execution Engine’ Doesn’t Stick: Debugging Fabric Environment Deployments with fabric-cicd

If you’re treating Microsoft Fabric workspaces as source-controlled assets, you’ve probably started leaning on code-first deployment tooling (either Fabric’s built-in Git integration or community tooling layered on top). One popular option is the open-source fabric-cicd Python library, which is designed to help implement CI/CD automations for Fabric workspaces without having to interact directly with the underlying Fabric APIs. For most Fabric items, a ‘deploy what’s in Git’ model works well, until you hit a configuration that looks like it’s in source control, appears in deployment logs, but still doesn’t land in the target workspace.

03.02.2026 15:00 👍 1 🔁 1 💬 0 📌 0
Sparkwise: an “automated data engineering specialist” for Fabric Spark tuning

Spark tuning has a way of chewing up time: you start with something that “should be fine,” performance is off, costs creep up, and suddenly you’re deep in configs, Spark UI, and tribal knowledge trying to figure out what actually matters. That’s why I’m excited to highlight sparkwise, an open-source Python package created by Santhosh Kumar Ravindran, one of my direct reports here at Microsoft. Santhosh built sparkwise to make Spark optimization in Microsoft Fabric less like folklore and more like a repeatable workflow: automated diagnostics, session profiling, and actionable recommendations to help teams drive better price-performance without turning every run into an investigation.

New OSS drop: Sparkwise (PyPI: sparkwise). Built by Santhosh Kumar Ravindran to help teams improve Fabric Spark price/perf with automated diagnostics + profiling. If you run Spark in Fabric, this will save you time and vCores.

05.01.2026 21:42 👍 0 🔁 0 💬 0 📌 0
Gil Gerard, Buck Rogers, and the Kind of Grief That Shows Up in December

Gil Gerard's departure reminds us that some celebrities aren't just actors; they're the comforting echoes of our past. Buck Rogers was more than a show; it was a place that shaped our childhood optimism.

18.12.2025 02:05 👍 0 🔁 0 💬 0 📌 0
Build Your Own Spark Job Doctor in Microsoft Fabric

Microsoft Fabric simplifies Spark workload management, but diagnosing performance issues remains challenging. This post introduces the "Job Doctor," an AI tool that analyzes Spark telemetry to identify problems like skew or excessive shuffles, generates human-readable diagnoses, and suggests fixes. The implementation integrates with Azure AI for optimized Spark job management.

05.12.2025 19:43 👍 0 🔁 0 💬 0 📌 0
Time to Automate: Why Sports Card Grading Needs an AI Revolution

As I head to the National for the first time, this is a topic I have been thinking about for quite some time, and a recent video inspired me to put this together with help from ChatGPT’s o3 model doing deep research. Enjoy!

Introduction: Grading Under the Microscope

Sports card grading is the backbone of the collectibles hobby – a PSA 10 vs PSA 9 on the same card can mean thousands of dollars of difference in value. Yet the process behind those grades has remained stubbornly old-fashioned, relying on human eyes and judgment.

29.07.2025 23:47 👍 2 🔁 0 💬 0 📌 0
Humans + Machines: From Co-Pilots to Convergence — A Friendly Response to Josh Caplan’s “Interview with AI”

1. Setting the Table

Josh, I loved how you framed your conversation with ChatGPT-4o around three crisp horizons: 5, 25 and 100 years. It’s a structure that forces us to check our near-term expectations against our speculative impulses. Below I’ll walk through each horizon, point out where my own analysis aligns or diverges, and defend those positions with the latest data and research.

2. Horizon #1 (≈ 2025-2030): The Co-Pilot Decade

Where we agree

You write that “AI will write drafts, summarize meetings, and surface insights … accelerating workflows without replacing human judgment.” Reality is already catching up:

15.07.2025 03:12 👍 0 🔁 0 💬 0 📌 0
🎩 Retire Your Top Hat: Why It’s Time to Say Goodbye to “Whilst”

There’s a word haunting documents, cluttering up chat messages, and lurking in email threads like an uninvited character from Downton Abbey. That word is whilst. Let’s be clear: no one in the United States says this unironically. Not in conversation. Not in writing. Not in corporate life. Not unless they’re also saying “fortnight,” “bespoke,” or “I daresay.”

It’s Not Just Archaic: It’s Distracting

In American English, whilst is the verbal equivalent of someone casually pulling out a monocle in a team meeting. It grabs attention, but not the kind you want.

Please - don't use whilst.

09.07.2025 16:12 👍 0 🔁 0 💬 0 📌 0
The Rise and Heartbreak of Antonio McDyess: A Superstar’s Path Cut Short

Note: Antonio McDyess is one of my favorite players that no one I know seems to know or remember, so I asked ChatGPT Deep Research to help tell the story of his rise to the cusp of superstardom. Do a YouTube search for McDyess highlights - it’s a blast.

Humble Beginnings and Early Promise

Antonio McDyess hailed from small-town Quitman, Mississippi, and quickly made a name for himself on the basketball court. After starring at the University of Alabama – where he led the Crimson Tide in both scoring and rebounding as a sophomore – McDyess entered the star-studded 1995 NBA Draft.

29.06.2025 20:11 👍 0 🔁 0 💬 0 📌 0