
Kit Menke

@kitmenke.com

Data Engineering leader in Saint Louis, STL Big Data I.D.E.A. meetup organizer, lifelong learner and teacher. He / him #dataBS

302 followers, 372 following, 51 posts. Joined 26.10.2024.

Latest posts by Kit Menke @kitmenke.com

Woah what?!

24.02.2026 03:50 👍 0 🔁 0 💬 1 📌 0

I just started Google AI Pro with Antigravity for $20 a month and it includes access to Gemini and Claude models. I've hit the limits on Claude a few times but not bad.

23.02.2026 22:44 👍 2 🔁 0 💬 0 📌 0

Markdown for sure. You can read it as plain text or render it; it is widely accepted across the development community, and AI speaks it very well.

09.02.2026 21:49 👍 1 🔁 0 💬 0 📌 0

Agreed, hybrid is the worst.

15.01.2026 04:35 👍 0 🔁 0 💬 1 📌 0
What is a Medallion Architecture? A medallion architecture is a data design pattern used to logically organize data in a lakehouse, with the goal of improving the structure and quality of data.

Timely link because I'm currently struggling with this. The article is interesting and not at all what I expected. I've always heard Silver is supposed to be the integration layer using 3NF, not star schema. So many more questions! www.databricks.com/glossary/med...

30.12.2025 23:55 👍 2 🔁 0 💬 1 📌 0

I just want everyone to know that if we are connected on here, I’m rooting for you. For your career, your creative endeavor, your happiness, whatever you are chasing. I hope it happens for you!

07.12.2025 00:58 👍 78 🔁 7 💬 5 📌 0
Quad9 | A public and free DNS service for a better security and privacy

While AI companies are allowed to slurp everything they want, Quad9 warns that legal fees are drowning DNS resolvers, which are now being targeted by copyright owners to enforce blocks on piracy sites

quad9.net/news/blog/wh...

10.11.2025 22:53 👍 72 🔁 45 💬 1 📌 2

Perhaps using unnest?
select unnest(value, recursive := true) from read_json('~/Data/example.json')
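
To make the idea behind a recursive unnest concrete, here is a minimal pure-Python sketch of flattening nested structs into top-level fields. It is not DuckDB itself: the `flatten_struct` helper, the sample record, and the dotted key names are all assumptions made for illustration, and it only handles nested objects (not lists).

```python
def flatten_struct(record, prefix=""):
    """Recursively lift nested dict fields up to top-level keys.

    Key naming (dotted paths) is this sketch's own choice and is not
    claimed to match the column names DuckDB would produce.
    """
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into the nested struct and merge its leaf fields.
            flat.update(flatten_struct(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


# Hypothetical record standing in for one JSON object from the file.
example = {"id": 1, "user": {"name": "Ada", "address": {"city": "STL"}}}
flat = flatten_struct(example)
print(flat)
```

Every leaf value ends up as its own top-level field, which is roughly the shape you want before querying deeply nested JSON as a table.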

06.11.2025 20:02 👍 0 🔁 0 💬 1 📌 0

Any meetup.com organizers who have successfully moved their community to another platform? I would be interested in hearing about your experience.

26.09.2025 22:20 👍 0 🔁 0 💬 0 📌 0

My blogging motivation has declined a lot over the years... then the endless Hugo breaking changes pretty much killed it for good. Do you know how much work it would be to convert a Hugo blog over to Zola?

01.09.2025 04:56 👍 0 🔁 0 💬 1 📌 0

Conspiracy theory: Databricks is deliberately making your clusters start up super slowly so that you want to pay more to use serverless. #dataBS

28.08.2025 14:13 👍 1 🔁 0 💬 0 📌 0
Visualizing a SQL LEFT JOIN (YouTube video by Kit Menke)

I've been working on visualizing JOINs for some beginner SQL workshops. Here is LEFT JOIN. Thoughts? #databs youtu.be/ZSxtZAulogo?...
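
For anyone who wants to poke at LEFT JOIN behavior alongside the video, here is a small sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration and are not taken from the workshop material.

```python
import sqlite3

# In-memory database with two tiny, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'keyboard');
    """
)

# LEFT JOIN keeps every row from the left table (customers);
# customers with no matching order get NULL (None) for order columns.
rows = conn.execute(
    """
    SELECT c.name, o.item
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
    """
).fetchall()
conn.close()

for name, item in rows:
    print(name, item)
```

Grace has no orders, so she still appears in the result with `None` in the `item` column, which is the part beginners tend to find surprising.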

10.06.2025 11:51 👍 0 🔁 0 💬 0 📌 0

Moving from Azure Data Studio to VSCode but ugh... the VSCode SQL Server extensions are so frustrating to use.

21.05.2025 13:37 👍 0 🔁 0 💬 0 📌 0

I met my wife on Xanga! ♥️

25.04.2025 20:52 👍 5 🔁 0 💬 1 📌 0
Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines We’ve just shipped our new streaming ingestion service, Pipelines, and we’ve acquired Arroyo, enabling us to bring new SQL-based, stateful transformations to Pipelines and R2.

Couple of big announcements from @cloudflare.social today for folk in #dataBS:

* Acquisition of Arroyo, launch of Pipelines for streaming ingestion: blog.cloudflare.com/cloudflare-a...
* Launch of R2 Data Catalog, a managed Apache Iceberg catalog for R2: blog.cloudflare.com/r2-data-cata...

10.04.2025 14:50 👍 9 🔁 3 💬 0 📌 0
GitHub - MrPowers/chispa: PySpark test helper methods with beautiful error messages

Chispa has good diffs for PySpark dataframes
github.com/MrPowers/chi...

25.02.2025 13:26 👍 0 🔁 0 💬 0 📌 0
December 2024 - Azure Databricks December 2024 release notes for new Azure Databricks features and improvements.

Databricks recently changed the default notebook format from "source" (.py, .sql, .scala) to IPYNB, which seems to indicate they will be getting rid of the source format. IMO, the ipynb format brings a few issues, like difficult diffs and the potential to leak data. learn.microsoft.com/en-us/azure/...
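
One common way to soften both issues is to strip cell outputs before committing. The sketch below assumes the notebook file follows the standard Jupyter layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the `strip_outputs` helper and the sample notebook are hypothetical, and this is not a Databricks feature.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results (possible data leak)
            cell["execution_count"] = None  # drop run counters (noisy diffs)
    return cleaned


# Hypothetical notebook with one executed cell whose output contains data.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["df.head()"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "metadata": {},
                    "data": {"text/plain": ["email: someone@example.com"]},
                }
            ],
        }
    ],
}

cleaned = strip_outputs(notebook)
print(json.dumps(cleaned, indent=1))
```

With outputs and execution counts gone, a diff shows only the cell source, and query results never land in the repo.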

19.02.2025 20:07 👍 2 🔁 0 💬 0 📌 0

I had an HDMI KVM but it was still annoying to switch back and forth. Plus I wanted to use the full resolution at 144Hz on my gaming PC. Now I just have a big desk with separate keyboards/mice/monitors.

05.02.2025 20:59 👍 1 🔁 0 💬 0 📌 0

Yes, I'm working on this right now and talking about how we can potentially "upgrade" some of the dimensions without breaking everything. 🙃

05.02.2025 20:50 👍 2 🔁 0 💬 0 📌 0
1988 Data Warehouse Architecture is introduced
1994 Data Warehouse is too difficult to build -> Data Marts
2008 Data Warehouse is too small for big data -> Hadoop
2010 Data Warehouse is too structured -> Data Lake
2012 Data Warehouse is too difficult to scale -> Cloud Data Warehouse
2016 Data Warehouse is too difficult to manage -> Data Fabric
2019 Data Warehouse is too centralized -> Data Mesh
2020 Data Warehouse is too limiting -> Data Lakehouse
2023 Data Warehouse is too monolithic -> Composable Data Platform


I found this while looking through some scratch notes. I don't recall now what the context was, but it's an interesting thought on the evolution of the data warehouse. (Though there is an equivocation embedded in this history.) #databs

05.02.2025 17:45 👍 4 🔁 3 💬 0 📌 0

Thanks for the input and I agree... Right now I'm battling a mono-repo used by a big team with limited git knowledge and no tooling. Choosing a tool like dbt/flyway/liquibase could help force some standardization.

04.02.2025 20:40 👍 0 🔁 0 💬 0 📌 0

Do you ever feel like it is difficult to keep them in sync with what is deployed to the database? Or with many people working in the same repo?

04.02.2025 18:46 👍 0 🔁 0 💬 1 📌 0

Is keeping the table definition valuable for only certain databases? For example in Databricks you can easily get the definition and there aren't any indexes to store. Compared to SQL Server (or similar) where it is difficult to figure out what was deployed.

04.02.2025 16:24 👍 0 🔁 0 💬 1 📌 0

You did this only for certain breaking changes, right? For example, when the meaning of the data in a column changed or columns were removed. How did you maintain two separate versions of the schema?

04.02.2025 16:22 👍 0 🔁 0 💬 2 📌 0

Do you version your data assets? Or is there only the current version of a database table? What about the table definition? #dataBS

04.02.2025 14:57 👍 0 🔁 0 💬 3 📌 0

An agile ceremony / rite of passage nobody mentions: arguing about story points and what they mean.

24.01.2025 17:06 👍 1 🔁 0 💬 0 📌 0

1. Impact. How much revenue does my work protect or generate?

2. Quality. Does my work meet or exceed customer expectations?

3. Efficiency. Reward making the right buy versus build decision.

4. Reusability. How do others leverage my work?

5. Supportability. How much work do I create for others?

23.01.2025 16:45 👍 914 🔁 107 💬 30 📌 12

I'm out walking and had some thoughts about data and fun stuff and mental health that I wanted to share.

#dataBS

22.01.2025 13:23 👍 13 🔁 1 💬 5 📌 2
dev up 2025: Call for Speakers The 2025 dev up conference is being held in St. Louis, Missouri from August 6-8, 2025. We are excited to be back and we are putting out the call to ...

Gahhh it’s time! @data-dragoness.bsky.social devUp call for speakers! Let’s take over with the #PowerPlatform and #MicrosoftFabric topics!

For anyone who’s in the middle west, let’s do this!

sessionize.com/dev-up-2025

14.01.2025 20:43 👍 6 🔁 2 💬 4 📌 0

In my experience I'm seeing companies using Spark switch from Scala to Python for two reasons: Python has an easier learning curve and Scala devs are much harder to find.

14.01.2025 18:43 👍 2 🔁 0 💬 2 📌 0