Didn't need incremental loads for that project but would likely attack that with some dagster wizardry should the need arise
Hoping you don't stumble into a new gif-like debate
Problem is we're in the small window before all of AI's answers are guided by advertisers and the cycle repeats itself
Is using 67 AI agents better than 12 or 4?
My LinkedIn feed is full of these humble brags.
What are your thoughts about the inverse direction?
Can a developer use AI to become a CFO? You've done it the hard way, but are these new tools letting anyone with a thirst for knowledge do it a lot faster?
It's gonna be a fun ride for everyone who isn't afraid to learn new things
Not 4pm deploy to production and leave for the weekend?
I'm hiring four technical leaders for a new AI company in the multi-unit space.
The team
• Two cofounders, both exited CEOs
• Backgrounds in data and analytics and multi-unit operations
• Backed by top-tier NYC VCs
Open roles
• Head of Data Engineering: pipelines, modeling, and foundations
• Head of Applied ML Engineering: predictive models and interpretable signals
• Head of AI Agent Engineering: agent workflows
• Founding Product Engineer (Full-Stack): product surfaces and system integration
What we're working with
• LLMs
• AI agents
• Machine learning models
• Unstructured data (video, voice, text)
• Modern data engineering and cloud tooling
Location
• Gatineau / Ottawa, NYC or remote
DM me.
Open to trying it out!
Vanna is good for loading the schema plus docs into a vector DB for the RAG part; it's just the charts part that's weaker.
I want the graph part too, and that's where Vanna w/ plotly is failing. It's not rendering when I use booleans or timestamps, and it does scatter plots at inappropriate times.
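Roughly the dtype-aware fallback I'd want the chart recommender to use instead. This is a sketch with a hypothetical `pick_chart` helper, not Vanna's actual API:

```python
from datetime import datetime

def pick_chart(x_sample, y_sample):
    """Pick a chart type from sample values of the two result columns.

    Hypothetical helper -- just the kind of dtype-based heuristic that
    would avoid scatter plots on booleans and timestamps.
    """
    def kind(v):
        if isinstance(v, bool):        # check bool before int/float:
            return "bool"              # bool is a subclass of int in Python
        if isinstance(v, datetime):
            return "time"
        if isinstance(v, (int, float)):
            return "number"
        return "category"

    kx, ky = kind(x_sample), kind(y_sample)
    if kx == "time" and ky == "number":
        return "line"       # time series -> line, never scatter
    if kx in ("category", "bool") and ky == "number":
        return "bar"        # discrete x-axis -> bar chart
    if kx == "number" and ky == "number":
        return "scatter"    # two continuous axes is the only scatter case
    return "table"          # when in doubt, show a plain table
```

From there, mapping each label to the matching plotly express call (`px.line`, `px.bar`, ...) is straightforward.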
Anyone got open source alternatives to vanna.ai for text to SQL?
The SQL part is pretty good but the plotly charts it recommends are wonky.
I've also been playing with dbt - and considered sqlmesh instead.
Chose to go deeper with dbt as I feel like sqlmesh's real value shines when you want to avoid transforming the data both in dev and prod.
… and I'm just prototyping stuff locally to play with different open source BI frontends.
Last week I did some experimentation with unstructured.io to extract content out of some pdfs.
Also played with github.com/getomni-ai/z... as a more lightweight option.
Overall, vision models do a much better job than classic OCR (ex: tesseract) on tables in docs.
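Part of why classic OCR falls down on tables: it hands you word boxes and leaves the layout reconstruction to you. A minimal sketch of that reconstruction step, assuming `(text, x, y)` tuples roughly like tesseract's word-level TSV output (the exact shape is an assumption):

```python
def rows_from_boxes(words, y_tol=5):
    """Group OCR word boxes back into table rows.

    `words` is a list of (text, x, y) tuples -- hypothetical shape,
    adapt to whatever your OCR actually emits.
    """
    rows = {}
    # 1. bucket words into rows by y coordinate, within a tolerance
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        for ry in rows:
            if abs(ry - y) <= y_tol:
                rows[ry].append((x, text))
                break
        else:
            rows[y] = [(x, text)]
    # 2. within each row, sort by x to recover the column order
    return [[t for _, t in sorted(cells)] for _, cells in sorted(rows.items())]
```

Even this breaks on merged cells and multi-line values, which is exactly where the vision models pull ahead.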
Here's a decent overview of the data pipeline orchestration tools on the market.
dataengineeringcentral.substack.com/p/review-of-...
Does this API exist in both the hosted and self-hosted versions?
When reading the docs last week I sometimes got mixed up about which features needed a subscription.
Nice! Hello!
Is the api giving you just the metadata of the metric, or also translating to the SQL you'd run to build charts like you do in lightdash itself?
Lightdash may be a good UI option for me, but then I'm defining metrics at the presentation layer and they're tightly coupled to it.
I'm looking for something to be able to express KPIs centrally and cleanly, and have the BI layer autogenerated from it.
The dbt semantic layer could be that, but it doesn't seem like many open source BI layers support it.
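The shape I'm after, as a toy sketch: metric definitions living in one place as data, with the SQL a BI frontend would run generated from them. The spec format here is made up (loosely semantic-layer-flavored, not the actual dbt/MetricFlow schema):

```python
# Hypothetical metric specs -- one central definition per KPI.
METRICS = {
    "revenue": {"table": "orders", "expr": "SUM(amount)", "time_col": "ordered_at"},
    "order_count": {"table": "orders", "expr": "COUNT(*)", "time_col": "ordered_at"},
}

def metric_sql(name, grain="day"):
    """Generate the query a BI layer would run for one metric at a grain."""
    m = METRICS[name]
    return (
        f"SELECT DATE_TRUNC('{grain}', {m['time_col']}) AS period, "
        f"{m['expr']} AS {name} "
        f"FROM {m['table']} GROUP BY 1 ORDER BY 1"
    )
```

The point being: the presentation layer only ever sees generated SQL, so swapping BI frontends doesn't mean redefining the KPIs.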
I spent some time this week playing with some tools to set up a data pipeline for simple BI and data science.
Played with airbyte, dbt, duckdb and metabase.
Planning on trying lightdash next week.
Trying to avoid hosted data warehouses and use open source for the full chain.
Keep it up!
I've never felt motivated by hyping whatever I'm building to peers who'd never be clients/users.
I sold my solution by ignoring the sales rules and just flat out asking "how do you do <process>?" and replying we had an app for that when they outlined their manual process.
I felt dirty not outlining benefits at first, but I later realized this opener was better aligned with my ICP (operations).
My past experiment with a tax form was bad because the LLM used basic OCR instead of something fancier for tables.
And here Iβm trying to do something more generic without knowing the form format ahead of time.
Imagine a government form with some weird tabular layout to shove as many fields into a condensed space as possible.
The generalized use case is read/write to forms. Basically reverse engineering a domain model from a form.
Azure DocIntel lets you do it well for a known form via training.
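A toy sketch of the "reverse engineer a domain model" step: take the raw key/value strings an extraction service hands back and infer field names and types. All the labels, values, and the input shape here are made up for illustration:

```python
import re

def infer_schema(extracted):
    """Infer a crude domain model from extracted form key/value strings.

    `extracted` maps raw labels to raw string values -- roughly the shape
    a document-extraction service returns (an assumption, not a real API).
    """
    schema = {}
    for label, value in extracted.items():
        # normalise the raw label into a field identifier
        field = re.sub(r"\W+", "_", label.strip().lower()).strip("_")
        v = value.strip()
        # guess a type from the value's shape
        if re.fullmatch(r"-?\d+", v):
            ftype = "int"
        elif re.fullmatch(r"-?\d+\.\d+", v):
            ftype = "float"
        elif re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
            ftype = "date"
        else:
            ftype = "str"
        schema[field] = ftype
    return schema
```

With a known form you'd train the types in (like DocIntel does); the hard part is getting anywhere near this without knowing the layout ahead of time.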
If making zip files named _final-website(2)-final-tuesday-3.zip is too complicated, maybe something like Dropbox with revision history baked in could work - at least for one file at a time.
That's ironic, but I guess totally expected since it's their main revenue source hah
Interesting compensation model.
#chordle sounded kinda sus