Data Code 101 (@datacode101)

Power BI connects to the warehouse SQL instance using a gateway or direct connection.

It builds relationships between dimension and fact tables, defines measures like Total Sales, Orders, and Avg Order Value, and filters by date, restaurant, or region.
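
To make the three measures concrete, here is a minimal Python sketch of the arithmetic behind them. It is not DAX and not how Power BI evaluates measures internally; the `OrderRow` fields and the sample values are hypothetical, and only the region and restaurant filters are shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class OrderRow:
    # Hypothetical shape of one fact row after joining to its dimensions.
    order_id: int
    restaurant: str
    region: str
    order_date: date
    amount: float


def measures(rows: Iterable[OrderRow], region: Optional[str] = None,
             restaurant: Optional[str] = None) -> dict:
    """Compute Total Sales, Orders, and Avg Order Value under optional filters."""
    selected = [
        r for r in rows
        if (region is None or r.region == region)
        and (restaurant is None or r.restaurant == restaurant)
    ]
    total_sales = sum(r.amount for r in selected)
    orders = len({r.order_id for r in selected})
    # Avoid dividing by zero when the filter matches nothing.
    avg_order_value = total_sales / orders if orders else 0.0
    return {"total_sales": total_sales, "orders": orders,
            "avg_order_value": avg_order_value}


sample = [
    OrderRow(1, "Pasta Place", "North", date(2026, 3, 1), 20.0),
    OrderRow(2, "Pasta Place", "North", date(2026, 3, 2), 30.0),
    OrderRow(3, "Taco Stop", "South", date(2026, 3, 2), 10.0),
]
print(measures(sample, region="North"))
```

Avg Order Value is derived from the other two measures rather than stored, so it stays correct under any filter combination.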

12.03.2026 16:59 👍 0 🔁 0 💬 0 📌 0

Load

Insert/update into warehouse tables, usually with upsert logic for slowly changing data like restaurant or menu details.

Create indexes and possibly summary/aggregate tables to speed up BI queries.
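
The upsert step can be sketched as follows. This uses Python's built-in `sqlite3` so the example runs anywhere; SQLite's `ON CONFLICT ... DO UPDATE` stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`. The table and column names (`dim_restaurant`, `restaurant_id`, `name`, `city`) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_restaurant (
        restaurant_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        restaurant_id INTEGER NOT NULL UNIQUE,            -- natural key from the source
        name TEXT NOT NULL,
        city TEXT NOT NULL
    )
""")
# Index on a column that BI filters are likely to use.
conn.execute("CREATE INDEX idx_dim_restaurant_city ON dim_restaurant (city)")


def upsert_restaurants(conn, rows):
    """Insert new restaurants; overwrite attributes of existing ones."""
    conn.executemany(
        """
        INSERT INTO dim_restaurant (restaurant_id, name, city)
        VALUES (?, ?, ?)
        ON CONFLICT (restaurant_id) DO UPDATE SET
            name = excluded.name,
            city = excluded.city
        """,
        rows,
    )
    conn.commit()


upsert_restaurants(conn, [(101, "Pasta Place", "Leeds"), (102, "Taco Stop", "York")])
# Second load: restaurant 101 was renamed, 103 is new.
upsert_restaurants(conn, [(101, "Pasta Palace", "Leeds"), (103, "Curry Corner", "York")])

result = conn.execute(
    "SELECT restaurant_id, name FROM dim_restaurant ORDER BY restaurant_id"
).fetchall()
print(result)
```

This overwrites attributes in place (a Type 1 change). Keeping the history of a renamed restaurant would need extra columns such as validity dates, which this sketch does not cover.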

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Transform

Data quality: handle nulls, fix invalid values, standardize timestamps and currencies.

Business logic: derive status (completed/cancelled), order duration, delivery time, etc.

Dimensional modeling: create dimensions and facts with surrogate keys.
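
A minimal sketch of the data-quality and business-logic steps for a single order, in plain Python. The field names (`placed_at`, `delivered_at`, `cancelled_at`, `amount`), the extra `open` status, and the rule that naive timestamps are already UTC are all assumptions made for the example, not details of the project.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string and normalise it to UTC; empty values become None."""
    if not value:
        return None
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        # Assumption: naive timestamps from the source are already UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def transform_order(raw: dict) -> dict:
    placed = parse_ts(raw.get("placed_at"))
    delivered = parse_ts(raw.get("delivered_at"))
    cancelled = parse_ts(raw.get("cancelled_at"))

    # Data quality: treat missing or negative amounts as invalid.
    amount = raw.get("amount")
    if amount is None or amount < 0:
        amount = None

    # Business logic: derive a status from which timestamps are present.
    if cancelled is not None:
        status = "cancelled"
    elif delivered is not None:
        status = "completed"
    else:
        status = "open"

    # Delivery time in minutes, only when both ends are known and ordered.
    delivery_minutes = None
    if placed is not None and delivered is not None and delivered >= placed:
        delivery_minutes = (delivered - placed).total_seconds() / 60

    return {
        "order_id": raw["order_id"],
        "status": status,
        "amount": amount,
        "placed_at": placed.isoformat() if placed else None,
        "delivery_minutes": delivery_minutes,
    }


clean = transform_order({
    "order_id": 7,
    "placed_at": "2026-03-12T16:00:00+01:00",
    "delivered_at": "2026-03-12T15:45:00",
    "amount": 24.5,
})
print(clean)
```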

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Extract

Periodic jobs (e.g., stored procedures, scripts, or an external tool) read new/changed rows from the OLTP MySQL database.

Data is loaded into staging tables without heavy logic, often as 1‑to‑1 copies of source tables plus load metadata.
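
One common way to read only new or changed rows is a watermark on a last-modified column. The sketch below shows that idea with Python's built-in `sqlite3` standing in for MySQL. The `updated_at` column, the `stg_orders` table, and the `batch_id` / `loaded_at` metadata columns are assumptions, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
    CREATE TABLE stg_orders (
        order_id INTEGER, amount REAL, updated_at TEXT,  -- 1-to-1 copy of the source
        batch_id INTEGER, loaded_at TEXT                 -- load metadata
    );
    INSERT INTO orders VALUES
        (1, 20.0, '2026-03-10 09:00:00'),
        (2, 30.0, '2026-03-11 09:00:00'),
        (3, 10.0, '2026-03-12 09:00:00');
""")


def extract_orders(conn, watermark: str, batch_id: int) -> str:
    """Copy rows changed after `watermark` into staging; return the new watermark."""
    rows = conn.execute(
        "SELECT order_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO stg_orders VALUES (?, ?, ?, ?, ?)",
        [(*row, batch_id, loaded_at) for row in rows],
    )
    conn.commit()
    # Keep the old watermark when nothing changed.
    return rows[-1][2] if rows else watermark


new_watermark = extract_orders(conn, "2026-03-10 12:00:00", batch_id=1)
staged_ids = [r[0] for r in conn.execute("SELECT order_id FROM stg_orders ORDER BY order_id")]
print(new_watermark, staged_ids)
```

The returned watermark would be stored between runs so the next job picks up where this one stopped.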

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Data warehouse / reporting schema

Dimensional or star‑like tables (e.g., dim_customer, dim_restaurant, dim_date, fact_orders) are built for analytics.
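
A small sketch of what such a star schema and a typical analytics query might look like, again using Python's built-in `sqlite3` in place of MySQL. Only the table names come from the post; the columns, keys, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_restaurant (restaurant_sk INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_date (date_sk INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        restaurant_sk INTEGER REFERENCES dim_restaurant (restaurant_sk),
        date_sk INTEGER REFERENCES dim_date (date_sk),
        amount REAL
    );
    INSERT INTO dim_restaurant VALUES (1, 'Pasta Place', 'North'), (2, 'Taco Stop', 'South');
    INSERT INTO dim_date VALUES
        (20260301, '2026-03-01', '2026-03'),
        (20260302, '2026-03-02', '2026-03');
    INSERT INTO fact_orders VALUES
        (1, 1, 20260301, 20.0), (2, 1, 20260302, 30.0), (3, 2, 20260302, 10.0);
""")

# The fact table joins to each dimension on its surrogate key.
sales_by_region = conn.execute("""
    SELECT r.region, COUNT(*) AS orders, SUM(f.amount) AS total_sales
    FROM fact_orders AS f
    JOIN dim_restaurant AS r ON r.restaurant_sk = f.restaurant_sk
    JOIN dim_date AS d ON d.date_sk = f.date_sk
    WHERE d.month = '2026-03'
    GROUP BY r.region
    ORDER BY r.region
""").fetchall()
print(sales_by_region)
```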

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Staging schemas

Raw tables may be copied or materialized into staging tables where basic cleaning, type fixes, and simple joins happen.

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Source OLTP DB

Tables like customers, restaurants, menu items, orders, order_items, payments hold raw, highly normalized data optimized for the ordering app, not reporting.

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

End-to-End Data Engineering Project: Food Order ETL Pipeline using MySQL & Power BI
#dataengineering

This project shows a full ETL/analytics flow for a food‑ordering business, from raw operational data in MySQL to interactive dashboards in Power BI.

12.03.2026 16:59 👍 0 🔁 0 💬 1 📌 0

Flow of data and where people experience the problems
Image by Matt Arderne (Forbes)

12.03.2026 16:31 👍 0 🔁 0 💬 0 📌 0

Prompting is temporary.

Structure is permanent.

When your repo is organized this way, Claude stops behaving like a chatbot…

…and starts acting like a project-native engineer.

10.03.2026 11:58 👍 0 🔁 0 💬 0 📌 0

5️⃣ Local CLAUDE.md for risky modules

Put small files near sharp edges:

src/auth/CLAUDE.md
src/persistence/CLAUDE.md
infra/CLAUDE.md

Now Claude sees the gotchas exactly when it works there.

10.03.2026 11:58 👍 0 🔁 0 💬 1 📌 0

4️⃣ docs/ = Progressive Context

Don’t bloat prompts.

Claude just needs to know where truth lives:

• architecture overview
• ADRs (engineering decisions)
• operational runbooks

10.03.2026 11:58 👍 0 🔁 0 💬 1 📌 0

3️⃣ .claude/hooks/ = Guardrails

Models forget.

Hooks don’t.

Use them for things that must be deterministic:

• run formatter after edits
• run tests on core changes
• block unsafe directories (auth, billing, migrations)
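
As an illustration of the last bullet, here is a minimal sketch of the path check a guard script could run. The directory list and function names are hypothetical. How Claude Code passes the target path to a hook, and which exit status it treats as a block, are not assumed here; check the hooks documentation before wiring this in.

```python
from pathlib import PurePosixPath

# Hypothetical list of directories the agent must not edit.
PROTECTED_DIRS = ("src/auth", "src/billing", "migrations")


def is_protected(file_path: str, protected=PROTECTED_DIRS) -> bool:
    """Return True if file_path is a protected directory or lies inside one."""
    path = PurePosixPath(file_path)
    for directory in protected:
        root = PurePosixPath(directory)
        if path == root or root in path.parents:
            return True
    return False


def guard(file_path: str) -> int:
    """Return an exit status: 0 to allow the edit, non-zero to refuse it."""
    if is_protected(file_path):
        print(f"Blocked: {file_path} is in a protected directory")
        return 1
    return 0


print(guard("src/auth/login.py"), guard("src/app.py"))
```

Comparing path components rather than string prefixes means `src/authz/` is not mistaken for `src/auth/`. The check does not resolve `..` segments, so paths should be normalised first.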

10.03.2026 11:58 👍 0 🔁 0 💬 1 📌 0

2️⃣ .claude/skills/ = Reusable Expert Modes

Stop rewriting instructions.

Turn common workflows into skills:

• code review checklist
• refactor playbook
• release procedure
• debugging flow

Result:
Consistency across sessions and teammates.

10.03.2026 11:58 👍 1 🔁 0 💬 2 📌 0

1️⃣ CLAUDE.md = Repo Memory (keep it short)

This is the north star file.

Not a knowledge dump. Just:

• Purpose (WHY)
• Repo map (WHAT)
• Rules + commands (HOW)

If it gets too long, the model starts missing important context.

10.03.2026 11:58 👍 0 🔁 0 💬 1 📌 0

Claude needs 4 things at all times:

• the why → what the system does
• the map → where things live
• the rules → what’s allowed / not allowed
• the workflows → how work gets done

The Anatomy of a Claude Code Project 👇

10.03.2026 11:58 👍 0 🔁 0 💬 1 📌 0

Most people treat CLAUDE.md like a prompt file.

That’s the mistake.

If you want Claude Code to feel like a senior engineer living inside your repo, your project needs structure.

#Agentic #AI #Claude

10.03.2026 11:58 👍 1 🔁 0 💬 2 📌 0

If you enjoy system design, infrastructure, and data flow — engineering may suit you.
If you enjoy analysis, modeling, and problem-solving with algorithms — science may be your path.

10.02.2026 22:42 👍 0 🔁 0 💬 0 📌 0

A Data Scientist analyzes data, builds models, applies statistics, and translates patterns into actionable insights. They focus on prediction, experimentation, and business impact.

10.02.2026 22:42 👍 0 🔁 0 💬 1 📌 0

A Data Engineer designs pipelines, manages large-scale systems, ensures data reliability, and works heavily with cloud and distributed frameworks. They focus on performance, scalability, and architecture.

10.02.2026 22:42 👍 0 🔁 0 💬 1 📌 0

Data Engineer vs. Data Scientist: What’s the Difference?

One builds the data foundation.
The other turns data into intelligence.

10.02.2026 22:42 👍 0 🔁 0 💬 1 📌 0

- Using coding agents to increase the speed at which they build pipelines
- Breaking down data silos with lakehouse table formats like Iceberg and Delta Lake
- Getting the entire company to agree on business definitions

Data engineering is one of the few "safe" roles in the coming decade!

10.02.2026 20:28 👍 0 🔁 0 💬 0 📌 0

Data engineers in 2030 are:
- Able to handle all types of data: structured, semi-structured, and unstructured
- Integrating private data into AI in a privacy-compliant and efficient way using multi-tenant architectures

10.02.2026 20:28 👍 0 🔁 0 💬 1 📌 0

Things like Claude Code will make "building pipelines" easier, but data engineering is so much more than building pipelines!

10.02.2026 20:28 👍 0 🔁 0 💬 1 📌 0

Data engineering is projected to grow faster than AI engineering over the next decade, according to the World Economic Forum!

AI is not going to replace data engineering; it will make it increasingly valuable!

10.02.2026 20:28 👍 0 🔁 0 💬 1 📌 0

- Typically 30–60% fewer tokens than JSON
- Explicit lengths and fields enable validation
- Removes redundant punctuation (braces, brackets, most quotes)
- Indentation-based structure: like YAML, it uses whitespace instead of braces
- Tabular arrays: declare keys once, stream data as rows

06.11.2025 06:01 👍 0 🔁 0 💬 0 📌 0

JSON:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
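
The conversion above can be sketched in a few lines of Python. This is not the reference TOON implementation: it covers only the tabular-array case (a list of flat objects with identical keys), renders values with `str()`, and refuses values containing commas or newlines instead of quoting them. The function name and the `indent` parameter (two spaces by default) are choices made for this sketch.

```python
from typing import Any, Mapping, Sequence


def encode_table(name: str, rows: Sequence[Mapping[str, Any]], indent: str = "  ") -> str:
    """Encode a list of flat, same-keyed objects as one TOON-style tabular array."""
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("all rows must share the same keys in the same order")
        for value in row.values():
            text = str(value)
            if "," in text or "\n" in text:
                raise ValueError("values needing quoting are out of scope for this sketch")

    # Header declares the array length and the keys once.
    field_list = ",".join(fields)
    lines = [f"{name}[{len(rows)}]{{{field_list}}}:"]
    # Each object becomes one comma-separated row under the header.
    for row in rows:
        lines.append(indent + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
encoded = encode_table("users", users)
print(encoded)
```

The declared length and field list in the header are what let a reader of the output validate that no row is missing or misaligned.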

06.11.2025 06:01 👍 0 🔁 0 💬 1 📌 0

Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input as a lossless, drop-in representation of JSON data.

#dataengineering #llm

06.11.2025 06:01 👍 0 🔁 0 💬 1 📌 0

RAG is not just an integration problem. It’s a design problem. Each layer of this stack requires deliberate choices that impact latency, quality, explainability, and cost.

If you're serious about GenAI, it's time to think in terms of stacks—not just models.

27.10.2025 10:36 👍 0 🔁 0 💬 0 📌 0

Evaluation

Tools like Ragas, TruLens, and Giskard bring much-needed observability: they measure hallucinations, relevance, grounding, and model behavior under pressure.

27.10.2025 10:36 👍 0 🔁 0 💬 1 📌 0
