
OpenDataAlex

@opendataalex

All-around data nerd, building awesome data platforms. Talk data with me! Avid board/video gamer and role player. He/him

1,064 Followers · 510 Following · 89 Posts · Joined 27.10.2024

Latest posts by OpenDataAlex @opendataalex

It's that time of year again! All Things Open 2025 has kicked off, so let the nerding begin!

13.10.2025 12:31 👍 3 🔁 0 💬 0 📌 0

Elderly and disabled people told Congress that cutting Medicaid could kill people and in response, they were arrested in their wheelchairs.

What the hell are we doing?

25.06.2025 21:23 👍 7489 🔁 2511 💬 350 📌 151

🚀 Apache Hop 2.13 is out!
84 tickets, 8 contributors, 2 months of work: this release brings big wins for data workflows 💪
🔗 Blog post: www.know.bi/blog/our-blo...
👉 Work with us: www.know.bi

#datasky #databs #apachehop #etl #dataengineering #opensource #gcp #mysql

24.04.2025 08:25 👍 4 🔁 2 💬 0 📌 0

Many folks I interview write scripts for things like notebooks rather than frameworks or tooling for deeper data processing. 2/2

23.04.2025 12:55 👍 2 🔁 0 💬 0 📌 0

I've seen folks with non-programming backgrounds start diving into data tooling, and it can be very daunting. It's relatively straightforward to use many libraries in Python without knowing the ins and outs of more advanced coding. 1/2

23.04.2025 12:54 👍 0 🔁 0 💬 1 📌 0
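The thread contrasts notebook scripts with frameworks or tooling. A minimal sketch of that difference, assuming a hypothetical orders CSV with `order_id` and `amount` columns: rather than cleaning data inline in notebook cells, the same step becomes a small function (`clean_orders`, a name made up for illustration) that can be imported, reused, and tested.

```python
import csv
import io
from typing import Iterable


# Tooling style: the cleaning logic lives in a reusable, testable function
# instead of ad hoc notebook cells. Column names are hypothetical.
def clean_orders(rows: Iterable[dict]) -> list[dict]:
    """Drop rows with a missing order_id and cast amount to float."""
    cleaned = []
    for row in rows:
        order_id = (row.get("order_id") or "").strip()
        if not order_id:
            continue  # skip records that cannot be keyed
        cleaned.append({"order_id": order_id, "amount": float(row["amount"])})
    return cleaned


if __name__ == "__main__":
    sample = io.StringIO("order_id,amount\nA1,10.5\n,3.0\nA2,7\n")
    print(clean_orders(csv.DictReader(sample)))
```

Because the function takes any iterable of dicts, the same code can be called from a notebook, a scheduled job, or a unit test without changes.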
7 key points to successfully upgrade from Pentaho to Apache Hop Discover 7 essential tips for a seamless upgrade from Pentaho to Apache Hop, ensuring smooth data transitions, enhanced workflow management, and optimized data engineering processes.

Thinking of moving from #Pentaho to Apache Hop? You're modernizing, not just migrating.
🚀 Learn how to switch smart: www.know.bi/blog/our-blo...
Need help? We coach teams too 👉 www.know.bi/pricing

#apachehop #pentaho #migration #coaching #datasky #databs

16.04.2025 07:44 👍 4 🔁 3 💬 0 📌 0

Not considering data quality, and not trying to fix it before using the data.

18.03.2025 11:53 👍 1 🔁 0 💬 0 📌 0

Case in point: spreadsheets and SQL. They have a lot of capabilities far beyond their original intent. Yes, they are great for testing things out or for smaller use cases, but are they the best tool for that functionality in your environment? 2/ #databs

05.03.2025 12:47 👍 3 🔁 0 💬 0 📌 0

Just because a tool/technology can do something doesn't mean it should. Sure, I can use a hammer on screws, but a screwdriver or drill would perform a lot better. It's definitely like that with a lot of data tooling, both foundational and esoteric. 1/ #databs

05.03.2025 12:46 👍 5 🔁 0 💬 1 📌 0

Building data platforms is not just about tech. It's also not purely about data governance. It has to be a blend, optimized for the use case and for getting the best performance out of the data available. Even a basic RDBMS and a data dictionary in a spreadsheet can provide value, as long as they're the right fit. #databs

27.02.2025 15:39 👍 8 🔁 1 💬 0 📌 0

It's always good to show that fancy tools aren't required to get the fundamentals taken care of. They may make monitoring easier, but starting with a doc that follows an outline like this is worth a ton.

22.02.2025 12:51 👍 1 🔁 0 💬 0 📌 0

The information you include in a data dictionary (a collection of names, definitions, and attributes about variables in a dataset) depends on your data and how you plan to use the document.

Some ideas for fields to consider including. 👇

Data dictionary template and example are here: osf.io/ynqcu

21.02.2025 13:49 👍 45 🔁 10 💬 2 📌 3
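A data dictionary is described above as a collection of names, definitions, and attributes about variables in a dataset. A minimal sketch of keeping one in a machine-readable form and checking a dataset against it; the fields (`name`, `definition`, `data_type`, `allowed_values`) and the example entries are assumptions for illustration, not the contents of the linked osf.io template.

```python
from dataclasses import dataclass, field


@dataclass
class VariableEntry:
    """One row of a data dictionary (the chosen fields are an assumption)."""
    name: str
    definition: str
    data_type: str  # e.g. "text", "integer", "date"
    allowed_values: list[str] = field(default_factory=list)  # empty means unrestricted


def undocumented_columns(columns: list[str], dictionary: list[VariableEntry]) -> list[str]:
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {entry.name for entry in dictionary}
    return [col for col in columns if col not in documented]


def violates_allowed(entry: VariableEntry, value: str) -> bool:
    """True if the entry restricts its values and this value is not among them."""
    return bool(entry.allowed_values) and value not in entry.allowed_values


# Hypothetical example entries, not taken from the linked template.
dictionary = [
    VariableEntry("customer_id", "Unique ID assigned at signup", "text"),
    VariableEntry("signup_date", "Date the account was created (UTC)", "date"),
    VariableEntry("plan", "Subscription tier", "text", ["free", "pro"]),
]

print(undocumented_columns(["customer_id", "signup_date", "region"], dictionary))
```

Here the check prints `['region']`, flagging a column nobody has documented yet. The same columns work just as well in a spreadsheet; the code only makes the gap check repeatable.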
Slashdot

It just goes to show: using data is easy, but determining its worth is quite difficult, especially when resources for data governance are not made a priority.

Data Is Very Valuable, Just Don't Ask Us To Measure It, Leaders Say - Slashdot m.slashdot.org/story/439061

22.02.2025 12:45 👍 0 🔁 0 💬 0 📌 0

#DataSky

06.02.2025 22:02 👍 9 🔁 4 💬 0 📌 0
The Data-Conscious Software Engineer: The Unicorn That Data Teams Actually Need

The read 📖 that caused this stitch: open.substack.com/pub/dataprod...

06.02.2025 18:05 👍 2 🔁 1 💬 0 📌 0

Happy to share and enjoyed the conversation :)

CI/CD with data is a fun one for sure!

05.02.2025 19:33 👍 1 🔁 0 💬 0 📌 0
A Data Migration Is Never Just a Data Migration: Lessons from Alex Meadows Alex Meadows shares insights on lift-and-shift vs. rearchitecting, data quality priorities, and the human factors that can make or break a data migration.

Not everyone is on Bluesky, so shoutout to @opendataalex.bsky.social for one of my absolute favourite interviews that's full of humour and hard-won insights:

www.datafold.com/data-migrati...

05.02.2025 17:28 👍 2 🔁 1 💬 1 📌 0

"Who cares about classic ML when you can have your own AI assistant?" says your CEO #databs

01.02.2025 16:05 👍 11 🔁 2 💬 1 📌 0
CDC Ordered to Scrub Website of Words Like 'Transgender' and 'LGBT' A CDC employee spoke with Gizmodo about the

This is ridiculous and uncalled for. Years of research are going to be stalled because of this. Researchers are going to be unable to publish findings or even perform/complete their work because of politics.

gizmodo.com/cdc-ordered-...

03.02.2025 12:56 👍 1 🔁 0 💬 0 📌 0
Singin' in the Public Domain: Public Domain Day 2025 On January 1, 2025, creative works from 1929 and sound recordings from 1924 will enter the public domain in the US. Celebrate with us!

Tomorrow's the big day! 🎉 Join us for Public Domain Day 2025 as we celebrate works from 1929 entering the #PublicDomain. It's free for everyone to enjoy! 🎢🎬 🖋️
📅 Jan 22
🕙️ 10 AM PT/1 PM ET
📍 ONLINE
🎟️ REGISTER ➡️ https://www.eventbrite.com/e/1104135491979

#PublicDomainDay @InternetArchive

21.01.2025 15:00 👍 511 🔁 94 💬 0 📌 2
January Low-Key Data Happy Hour (Lynnwood Brewing), Thu, Jan 30, 2025, 6:00 PM | Meetup **We're back for more drinks, data, and fun in 2025!** No presentations or pitches, just people who love data hanging out, having a drink and eating food. Show up when co

Raleigh Low-Key Data Happy Hour is back for more drinks, data, and fun in 2025! #databs

07.01.2025 12:06 👍 4 🔁 1 💬 0 📌 0

Many companies stick with older tech because it's the skill set they have in house and/or it is legacy enough that switching over is scary/dangerous. 2/2

09.01.2025 12:11 👍 1 🔁 0 💬 0 📌 0

Oh I assure you, there's a reason companies make those switches. It usually comes down to cost savings (which tend to vanish in the short term because of the cost of switching over, but can pay off in the long term), internal politics, or kickbacks. 1/2

09.01.2025 12:09 👍 1 🔁 0 💬 0 📌 0

It's good to come in with fresh ideas, but take time to observe not just the technology but also the people. Learning how folks interact with each other is just as important as showing how you can help improve things. General rule of thumb: observe and absorb for two weeks, then hit the ground running.

06.01.2025 20:27 👍 3 🔁 0 💬 1 📌 0
CEO Pay Has Risen 1,085% Since 1978, But for Workers? Just 24% | Common Dreams CEOs at top US companies saw their pay skyrocket by 1,085% since 1978, while typical worker pay only increased by 24%

Do you know how much CEO pay has skyrocketed since 1978?

100%? 500%?

Try 1,085%

Meanwhile, the $7.25/hr fed. minimum wage hasn't budged in 15 years and the tipped min. wage has been $2.13/hr since 1991.

This is what I mean when I say the system is rigged.

23.12.2024 23:00 👍 28769 🔁 11618 💬 998 📌 733

Don't be tempted to use stored procs for app logic. You'll be in for a bad time :p They have their place for sure, but keep it limited. Database compute is one of the more expensive things in an app stack.

22.12.2024 00:01 👍 1 🔁 0 💬 0 📌 0

DQ is something that can't be fixed overnight, but it can be iterated on. It also needs to be fixed as close to the point of entry into the data ecosystem as possible, so that quality problems affect the fewest downstream points. 3/

20.12.2024 10:48 👍 3 🔁 0 💬 0 📌 0

In reality, those data quality issues can significantly slow down new development and cause a lot of duplicated effort, as each group touching the data finds the same issues over and over again. It's also one of the classic excuses: 'well, we can fix it later' or 'not my data to fix'. 2/

20.12.2024 10:46 👍 2 🔁 0 💬 1 📌 0

That's part of it. Another is a lack of understanding of how quality impacts everything downstream. Think 'our business has been working fine without fixing this data quality stuff, so it mustn't be a high priority'. 1/

20.12.2024 10:42 👍 2 🔁 0 💬 0 📌 0
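The thread argues that data quality should be fixed as close to entry into the data ecosystem as possible. A minimal sketch of one way to do that: an ingestion gate that splits incoming records into accepted and quarantined sets, with reasons, before anything downstream sees them. The rule helpers (`require`, `non_negative`), the `ingest` function, and the `email`/`quantity` fields are all hypothetical names chosen for illustration.

```python
from typing import Callable, Optional

# A rule inspects one record and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def require(field_name: str) -> Rule:
    """Hypothetical rule: the field must be present and non-empty."""
    def check(record: dict) -> Optional[str]:
        value = record.get(field_name)
        if value is None or str(value).strip() == "":
            return f"missing {field_name}"
        return None
    return check


def non_negative(field_name: str) -> Rule:
    """Hypothetical rule: the field must parse as a number >= 0."""
    def check(record: dict) -> Optional[str]:
        try:
            if float(record.get(field_name, "")) < 0:
                return f"negative {field_name}"
        except (TypeError, ValueError):
            return f"non-numeric {field_name}"
        return None
    return check


def ingest(records: list[dict], rules: list[Rule]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Split records at the entry point: clean ones pass, the rest are quarantined with reasons."""
    accepted, quarantined = [], []
    for record in records:
        errors = [err for err in (rule(record) for rule in rules) if err is not None]
        if errors:
            quarantined.append((record, errors))
        else:
            accepted.append(record)
    return accepted, quarantined


rules = [require("email"), non_negative("quantity")]
good, bad = ingest(
    [
        {"email": "a@example.com", "quantity": "2"},
        {"email": "", "quantity": "1"},
        {"email": "b@example.com", "quantity": "-5"},
    ],
    rules,
)
print(len(good), [reasons for _, reasons in bad])
```

This prints `1 [['missing email'], ['negative quantity']]`. Quarantining with reasons, rather than silently dropping rows, gives the owning team something concrete to fix at the source, which supports the iterate-on-it approach the thread describes.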
Poop

Poop

This made me think

Credit: Dr. Ordax on LinkedIn

19.12.2024 22:48 👍 95 🔁 10 💬 4 📌 4