Databricks vs Fabric feels a lot like Pied Piper vs Nucleus. Fans of the Silicon Valley show will get the reference :)
#databs #databricks #fabric
(3/3) As a data engineer, I could have spent my time learning one of the shiny tools out there, or relied on Copilot to write the code for me. Instead, I chose to learn what was happening under the hood in Python, mainly in the interest of building a stronger foundation, which I believe should be the priority.
(2/3) I am no better and have been guilty of this myself, mainly out of ignorance of idiomatic Python. This is why I spent the last 6 months re-learning Python: reading multiple books, attending advanced lessons on specific topics, and writing a #data ingestion package in Python.
(1/3) Among programming languages, I consider #Python a relatively easy language to learn, which opens doors for many to start coding without formal training. However, this also results in some poorly written, unmaintainable, and non-extensible code: the infamous spaghetti code.
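Since the thread is about idiomatic Python, here is a tiny, hypothetical illustration of the kind of refactor I mean. All the names are made up for the example; it is a sketch, not real project code:

```python
# Two versions of the same task: keep names whose score is at least 50.

# The spaghetti-leaning version: manual indexing and a while loop.
def top_scores_v1(rows):
    result = []
    i = 0
    while i < len(rows):
        if rows[i][1] >= 50:
            result.append(rows[i][0])
        i = i + 1
    return result

# The idiomatic version: tuple unpacking and a list comprehension
# say the same thing in one readable line.
def top_scores_v2(rows):
    return [name for name, score in rows if score >= 50]

rows = [("ana", 72), ("ben", 41), ("cara", 90)]
print(top_scores_v1(rows))  # ['ana', 'cara']
print(top_scores_v2(rows))  # ['ana', 'cara']
```

Both behave identically; the second is simply easier to read, maintain, and extend.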
Just finished watching the dbt team's webinar introducing SDF. After seeing SDF in action, I have to admit that I am really looking forward to the future of the dbt engine. I was wondering when dbt would bring notable changes to the developer experience, and this might be it.
#databs
I was speaking with someone who went all in on promoting duckdb to their clients. I did not get a chance to ask what exactly they are doing with duckdb, but I am curious to understand how duckdb is utilised in modern data pipelines.
(4/4) It is crazy to think that this news came out just 2 weeks into the new year. I cannot wait to see what dbt and others have in store for the next 12 months. I am excited to see how the integration between sdf and dbt will be rolled out. Maybe we will have some updates at Coalesce 2025!!!
(3/4) I have always found dbt's incredibly slow compile times deeply frustrating, something that is of no concern to business users. But considering that dbt's early adoption was strongly rooted in the technical community, the acquisition of SDF to improve the developer experience is well timed.
(2/4) I have used dbt quite extensively, enjoy its utility when it comes to data transformations, and acknowledge that it is not going anywhere. However, I do feel that dbt Core has not had any significant upgrade in a while, especially when it comes to the developer experience.
(1/4) SDF acquisition by dbt
If you work in data, you probably came across a version of this headline this past week. A small disclaimer: I have not used SDF, nor do I have a solid understanding of the tech that sits behind it, so take what I say with a grain of salt.
(3/3) I am sure this will change as we iterate through the next generation of language models. However, if you are someone who is just starting out in the world of #data, I recommend reducing your reliance on code assistants and instead spending some time understanding the basics of how SQL works.
(2/3) But if you have had to work with complex #SQL such as first-touch attribution, finding streaks, or calculating conversions, with highly specific business logic and data of questionable quality, you will probably understand the sentiment that today's LLMs are not powerful enough.
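To make the first-touch attribution point concrete, here is a minimal sketch using an in-memory SQLite database from Python. The table and column names are made up for the example; the pattern is a standard window function, ranking each user's touches by time and keeping rank 1:

```python
import sqlite3

# Illustrative data: marketing touches per user, in arrival order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE touches (user_id TEXT, channel TEXT, touched_at TEXT);
    INSERT INTO touches VALUES
        ('u1', 'email',   '2024-01-01'),
        ('u1', 'paid',    '2024-01-05'),
        ('u2', 'organic', '2024-01-02');
""")

# Rank each user's touches by time; rank 1 is the first touch.
first_touch = conn.execute("""
    SELECT user_id, channel
    FROM (
        SELECT user_id, channel,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY touched_at
               ) AS rn
        FROM touches
    )
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(first_touch)  # [('u1', 'email'), ('u2', 'organic')]
```

This is the easy version. The real-world pain starts when "first" is ambiguous, timestamps are duplicated or missing, and the business logic has exceptions, which is exactly where LLM-generated SQL tends to fall over.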
(1/3) Maybe an unpopular opinion: SQL is a powerful language and, despite what anyone says, it is unlikely to be replaced by an LLM, at least not with the models we have today. LLMs are powerful and can be leveraged to generate ideas or as a tool to unblock yourself when you are stuck.
#databs
The thing that allows you to link data from the physical world in a format that a machine can understand coherently.
What did they buy before?
SDF was on my list of things to try out. I'll just wait till they integrate it into dbt now, I guess :)
(3/3) Laktory also supports managing ETL pipelines, much like you would with dbt but with Spark and/or SQL. What I really like about Laktory is its ability to modularize Databricks assets, which is a big win for the long-term maintainability of your #data platform.
(2/3) I am not affiliated with Laktory, but if you are someone who works with Databricks and wants a break from wrestling with Terraform/Pulumi, check out Laktory. It is an absolute game changer; I was able to go from zero to managing multiple workspaces with a couple of YAML files.
(1/3) Continuing from my previous thread on infrastructure as code for managing #Databricks. I have recently had the pleasure of working with an open source tool called Laktory, an abstraction that sits on top of Terraform/Pulumi to manage your Databricks workflows using YAML.
#databs
This is the default approach I take to data modelling. It works because OBT is optimized for modern vectorized data warehouses, while the underlying data is modelled using established best practices from Kimball.
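A minimal sketch of the pattern, using stdlib SQLite so it runs anywhere: facts and dimensions modelled Kimball-style, with the one big table (OBT) exposed as a view on top. All table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Kimball-style layer: conformed dimension plus fact table.
    CREATE TABLE dim_customer (customer_key INTEGER, customer_name TEXT);
    CREATE TABLE fct_orders (order_id INTEGER, customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO fct_orders VALUES (100, 1, 250.0), (101, 2, 90.0);

    -- The OBT is just the fact joined out to its dimensions.
    CREATE VIEW obt_orders AS
    SELECT o.order_id, c.customer_name, o.amount
    FROM fct_orders o
    JOIN dim_customer c USING (customer_key);
""")

rows = conn.execute("SELECT * FROM obt_orders ORDER BY order_id").fetchall()
print(rows)  # [(100, 'Acme', 250.0), (101, 'Globex', 90.0)]
```

Analysts query the wide `obt_orders` table, while the dimensional layer underneath keeps the model maintainable and the grain explicit.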
This is the approach I default to. Also, I always thought that this was the only way OBT was used. I guess not.
To an extent, I agree, though I still prefer to use documentation alongside AI assistance to verify some of the output.
I doubt there is one course that covers them all. You can always pick what part of DE you want to learn and drill down on that first, so things like ingestion, then transformation, orchestration, and so on.
Senior data engineer!! That's tricky; almost all of the content out there is catered towards early-stage DEs. But I have heard Zach Wilson's boot camp is pretty good for seniors.
Disclaimer - I have not attended the boot camp, so take my advice with a grain of salt.
(2/2) I am nowhere near mastering it, but I am happy that I was recently able to use it to build out a data platform on #Databricks, and it was not as challenging as I imagined. Key takeaway: being comfortable with IaC can dramatically improve the efficiency and reliability of your #data pipelines.
(1/2) Infrastructure as code (IaC) is ubiquitous in the data space. That being said, I have stayed away from IaC work for as long as I can remember, mainly due to its aura of difficulty, and also because I could pass the ball to the platform team.
#datasky #databs
(4/4) Some of the #saas products are amazing. In fact, I continue to use them where relevant. However, it is important that you do your due diligence when it comes to picking your tech stack so that you do not miss out on something better. Your future self will be grateful!
(3/4) One of the key benefits of dlthub is that you can run it anywhere with a #python runtime, e.g. Airflow, Dagster, serverless, etc. There is also the added flexibility of bringing functionality that is specific to your data into the ingestion framework.
(2/4) Over the last year, I came across a Python library called #dlthub, which I now use regularly for data ingestion. What I like about dlthub is how it integrates seamlessly with widely used sources and destinations, and lets you build production-ready data pipelines in just a few lines.
(1/4) What do you use for #data ingestion?
It's true that there is no shortage of tools when it comes to data ingestion. But before you open your wallet to one of the many options out there, it might be worth doing thorough due diligence based on your current and future needs.
#databs