It's that time of year again! All Things Open 2025 has kicked off and let the nerding begin!
Elderly and disabled people told Congress that cutting Medicaid could kill people and in response, they were arrested in their wheelchairs.
What the hell are we doing?
Apache Hop 2.13 is out!
84 tickets, 8 contributors, 2 months of work: this release brings big wins for data workflows
Blog post: www.know.bi/blog/our-blo...
Work with us: www.know.bi
#datasky #databs #apachehop #etl #dataengineering #opensource #gcp #mysql
Many folks I interview write scripts for things like notebooks and not frameworks or tooling to do deeper data processing. 2/2
I've seen folks who have non programming backgrounds start diving into data tooling and it can be very daunting. It's relatively straightforward to use many libraries in Python without knowing the ins and outs of more advanced coding. 1/2
Thinking of moving from #Pentaho to Apache Hop? You're modernizing, not just migrating.
Learn how to switch smart: www.know.bi/blog/our-blo...
Need help? We coach teams too: www.know.bi/pricing
#apachehop #pentaho #migration #coaching #datasky #databs
Not considering data quality, and not trying to fix it before using the data.
Case in point, spreadsheets and SQL. They have a lot of capabilities far beyond their original intent. Yes, they are great for testing things out or smaller use-cases, but are they the best tool for your environment with that functionality? 2/ #databs
Just because a tool/technology can do something, doesn't mean it should. Sure I can use a hammer on screws, but a screwdriver or drill would perform a lot better. It's definitely like that with a lot of data tooling - both foundational and esoteric. 1/ #databs
Building data platforms is not just about tech. It's also not purely about data governance. It has to be a blend optimized for the use case and for the best performance of the data available. Even a basic RDBMS and a data dictionary in a spreadsheet can provide value, as long as it's the right fit. #databs
It's always good to show that fancy tools aren't required to get the fundamentals taken care of. They may make it easier to monitor but starting with a doc with an outline like this is worth a ton.
The information you include in a data dictionary (a collection of names, definitions, and attributes about variables in a dataset), depends on your data and how you plan to use the document.
Some ideas of fields to consider including.
Data dictionary template and example are here: osf.io/ynqcu
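For anyone who wants to see what that looks like in practice, here's a minimal sketch of a data dictionary kept as plain rows and written out as CSV so it opens in any spreadsheet. The field names and the example entries are made up for illustration (they are not taken from the linked template), so swap in whatever suits your data.

```python
import csv
import io

# Hypothetical field names for illustration; choose the ones that suit your data.
FIELDS = ["variable", "label", "type", "allowed_values", "notes"]

# Hypothetical entries describing an imaginary survey dataset.
ENTRIES = [
    {
        "variable": "participant_id",
        "label": "Unique participant identifier",
        "type": "integer",
        "allowed_values": "1-9999",
        "notes": "Assigned at enrollment",
    },
    {
        "variable": "age",
        "label": "Age in years at enrollment",
        "type": "integer",
        "allowed_values": "18-120",
        "notes": "",
    },
]


def to_csv(entries, fields=FIELDS):
    """Render the data dictionary as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns, entries):
    """Return dataset columns that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in columns if column not in documented]


if __name__ == "__main__":
    print(to_csv(ENTRIES))
    # Columns present in the dataset but missing from the dictionary.
    print(undocumented(["participant_id", "age", "zip_code"], ENTRIES))
```

The `undocumented` check is the part that tends to pay off: it flags columns nobody has described yet, with no fancy tooling required.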
It just goes to show - using data is easy, determining its worth is quite difficult. Especially when resources for data governance are not made a priority.
Data Is Very Valuable, Just Don't Ask Us To Measure It, Leaders Say - Slashdot m.slashdot.org/story/439061
#DataSky
The read that caused this stitch: open.substack.com/pub/dataprod...
Happy to share and enjoyed the conversation :)
CI/CD with data is a fun one for sure!
Not everyone is on Bluesky, so shoutout to @opendataalex.bsky.social for one of my absolute favourite interviews that's full of humour and hard-won insights:
www.datafold.com/data-migrati...
"Who cares about classic ML when you can have your own AI assistant?" says your CEO #databs
This is ridiculous and uncalled for. Years of research are going to be stalled because of this. Researchers are going to be unable to publish findings or even perform/complete their work because of politics.
gizmodo.com/cdc-ordered-...
Tomorrow's the big day! Join us for Public Domain Day 2025 as we celebrate works from 1929 entering the #PublicDomain. It's free for everyone to enjoy!
Jan 22
10 AM PT/1 PM ET
ONLINE
REGISTER: https://www.eventbrite.com/e/1104135491979
#PublicDomainDay @InternetArchive
Raleigh Low-Key Data Happy Hour is back for more drinks, data, and fun in 2025! #databs
Many companies stick with older tech because it's the skill set they have in house and/or it is legacy enough that switching over is scary/dangerous. 2/2
Oh I assure you, there's a reason companies make those switches. It usually ends up being money savings (which in the short term just ends up vanishing because of the cost of switching over but can save in the long term), internal politics, or kickbacks. 1/2
It's good to come in with fresh ideas but take time to observe not just the technology, but the people. Learning how folks interact with each other is just as important as showing how you can help improve things. General rule of thumb: observe and absorb for two weeks then hit the ground running.
Do you know how much CEO pay has skyrocketed since 1978?
100%? 500%?
Try 1,085%
Meanwhile, the $7.25/hr fed. minimum wage hasn't budged in 15 years and the tipped min. wage has been $2.13/hr since 1991.
This is what I mean when I say the system is rigged.
Don't be tempted to do stored procs for app logic. You'll be in for a bad time :p They have their place for sure but keep it limited. Database compute is one of the more expensive things in an app stack.
DQ is something that can't be fixed overnight but can be iterated on. It also needs to be fixed as close to entry into the data ecosystem as possible so that the quality problems impact the fewest points. 3/
In reality, those data quality issues can significantly slow down new development and cause a lot of duplicated effort while each group touching the data finds the same issues over and over again. It's also one of the classic excuses - 'well we can fix it later' or 'not my data to fix'. 2/
That's part of it. Another is a lack of understanding of how quality impacts downstream work. Think 'our business has been working fine without fixing this data quality stuff so it mustn't be a high priority'. 1/
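A rough sketch of what "fix it as close to entry as possible" can look like: check records when they arrive and set the bad ones aside with a reason, so each downstream group doesn't rediscover the same problems. The record shape (`order_id`, `amount`) and the rules are hypothetical, just to show the pattern.

```python
# Hypothetical rules for an imaginary "orders" feed; real rules depend on your data.
def find_problems(record):
    """Return a list of data quality problems found in one incoming record."""
    problems = []
    order_id = record.get("order_id")
    if order_id is None or not str(order_id).strip():
        problems.append("missing order_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("amount must be a non-negative number")
    return problems


def split_at_entry(records):
    """Separate clean records from ones that need fixing before they spread downstream."""
    accepted, quarantined = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            # Keep the reasons with the record so the fix can happen once, at the source.
            quarantined.append({"record": record, "problems": problems})
        else:
            accepted.append(record)
    return accepted, quarantined


if __name__ == "__main__":
    incoming = [
        {"order_id": "A1", "amount": 10.0},
        {"order_id": "", "amount": 5},
        {"order_id": "A3", "amount": -2},
    ]
    accepted, quarantined = split_at_entry(incoming)
    print(len(accepted), "accepted;", len(quarantined), "quarantined")
    for item in quarantined:
        print(item["record"], item["problems"])
```

Start with a couple of rules like these and iterate; the list of checks can grow as new issues turn up.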
Poop
This made me think
Credit: Dr. Ordax on LinkedIn