Real-time alert broker powered by Apache Spark that processes astronomical transients from sky surveys, enriching cosmic events with ML classifications for breakthrough discoveries.
🔗 https://github.com/astrolabsoftware/fink-broker
#AlertBroker #TransientAstronomy #ApacheSpark
In this #InfoQ article, Hina Gandhi explores a #ReinforcementLearning (RL) approach built on #ApacheSpark, enabling distributed computing systems to autonomously learn optimal configurations.
📰 Read now: bit.ly/4thGGAf
#AI #bigdata #database #AIagents #InfoQ
Pinterest shared a deep dive into Moka, its new blueprint for the future of large-scale data processing.
They’re moving core workloads from legacy Hadoop to a #Kubernetes platform on Amazon EKS, with #ApacheSpark as the main engine – and support for more frameworks is coming soon.
👉 bit.ly/4bf1EJq
Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.
A multi-layered quality framework ensures accurate financial metrics while handling millions of daily bookings.
⇨ bit.ly/45jKGpt
#InfoQ #DataPipelines #AI
Unpopular opinion: #Dataengineering is harder than #software #engineering, and we don't talk about it enough 🔥
Why data engineering ranks among the top 6 skills shortages:
Anyone else been in the "Spark OOM at 90% #progress" situation? 😅
www.kbstraining.com/blog/data-en...
#BigData #ApacheSpark #Airflow
Apache Spark's new Declarative Pipelines framework automates orchestration and error handling while engineers define transformations. Handles batch + streaming workloads with Python/SQL interfaces. Cuts boilerplate for complex data pipelines. Productivity win. #ApacheSpark #ETL
Discover how Decathlon, one of the world’s leading sports retailers, adopted the #opensource library Polars to optimize its data workflows.
By migrating from #ApacheSpark to #Polars for small input datasets, Decathlon achieved:
• Significant speed gains
• Cost savings
👉 bit.ly/4atNCTY
#InfoQ #AI
#CaseStudy - #Lyft rearchitected its ML platform, LyftLearn, into a hybrid system!
Offline workloads now run on AWS SageMaker, while Kubernetes continues to power online model serving.
The result❓ Read #InfoQ and find out 👉 bit.ly/4s4mf9j
#SoftwareArchitecture #AI #ML #ApacheSpark #Kubernetes
I wrote the Day 8 article for the Distributed computing Advent Calendar 2025 🎄
Spark Operator special, Day 2, hands-on edition: Deploying a Spark app to K8s with kubeflow/spark-operator - やっさんメモ yassan.hatenablog.jp/entry/advent...
#アドカレ #ApacheSpark #Kubernetes #k3d
🚀 New Lab Replay: Using Delta Tables in Apache Spark (Microsoft Fabric)
🎥 Watch the full session:
👉 www.youtube.com/live/gT21FS8...
#MicrosoftFabric #DeltaTables #ApacheSpark #DeltaLake #DP600 #DP700 #Lakehouse #DataEngineering #BigData #ACID #TimeTravel #SparkSQL #PySpark #MicrosoftLearn
🔥 New Lab Replay: Analyze Data with Apache Spark in Microsoft Fabric
🎥 Watch the full lab session:
👉 www.youtube.com/live/lsv2Oi8...
#MicrosoftFabric #ApacheSpark #SparkAnalytics #DP600 #DP700 #Lakehouse #PySpark #DeltaTables #BigData #DataEngineering #Analytics #FabricCommunity
⚙️ Optimize #Python UDFs for Arrow in Spark
✳️ Using UDFs in PySpark has been a flexible but inefficient solution
✳️ Since #ApacheSpark 3.5, integration with #ApacheArrow has brought a significant performance improvement
➡️ blog.damavis.com/como-optimiz...
#Spark #Arrow
Spark Streaming vs Structured Streaming — Key Differences
#apachespark
thedataforge.medium.com/spark-stream...
In this #InfoQ #podcast, Vivek Yadav shares his journey in building a testing system based on multiple years' worth of data.
Discover why he chose #ApacheSpark and how it integrates with “traditional” engineering practices.
🎧Listen now: bit.ly/48Cs6uK
#SoftwareDevelopment #Testing #BigData #Database
The latest update for #DreamFactory includes "#PHP Configuration Essentials for DreamFactory: Critical Settings You Need to Know" and "DreamFactory #ApacheSpark and Databricks Integration: REST #APIs for Delta Lake and #Unity Catalog".
#iPaaS #DevOps #API https://opsmtrs.com/2ZoHHgr
You may have missed it, but #ApacheSpark can now log OOTB (using spark-submit) in #JSON: spark.log.structuredLogging.enabled=true
This is big for integration and ops!
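A minimal sketch of why JSON logs help on the ops side: once every log line is a JSON object, a few lines of standard-library Python are enough to filter and count them. The sample lines and the field names (`ts`, `level`, `msg`, `logger`) are assumptions for illustration only; check what your Spark version actually emits.

```python
import json
from collections import Counter

# Hypothetical sample lines; real Spark output may use different fields.
SAMPLE_LOG = """\
{"ts": "2025-01-01T10:00:00Z", "level": "INFO", "msg": "Starting job", "logger": "ExampleA"}
{"ts": "2025-01-01T10:00:05Z", "level": "ERROR", "msg": "Executor lost", "logger": "ExampleB"}
not a json line
{"ts": "2025-01-01T10:00:09Z", "level": "INFO", "msg": "Job finished", "logger": "ExampleA"}
"""


def parse_json_logs(text):
    """Return a list of dict records, skipping lines that are not JSON objects."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate plain-text lines mixed into the stream
        if isinstance(record, dict):
            records.append(record)
    return records


def count_by_level(records):
    """Count records per log level (assumed field name: 'level')."""
    return Counter(r.get("level", "UNKNOWN") for r in records)


records = parse_json_logs(SAMPLE_LOG)
levels = count_by_level(records)
errors = [r["msg"] for r in records if r.get("level") == "ERROR"]
print(dict(levels), errors)
```

The same idea carries over to log shippers and query tools: no regex parsing, just field lookups.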
Replay Available! DP-600 Lab – Ingest Data with a Pipeline in Microsoft Fabric
🎥 Watch the full session:
www.youtube.com/live/aQcbroJ...
#MicrosoftFabric #DP600 #FabricAnalyticsEngineer #DataPipelines #OneLake #ApacheSpark #ETL #ELT #DataEngineering #DeltaTables #Lakehouse #PowerBI #FabricCommunity
The Stream Processing Showdown: Kafka Streams vs. Flink vs. Spark
#apachekafka
#apacheflink
#apachespark
medium.com/@balaji.raja...
Apache Kafka (Kafka Connect) vs. Apache Flink vs. Apache Spark: Choosing the Right Ingestion Framework
#apachekafka
#apacheflink
#apachespark
www.onehouse.ai/blog/kafka-c...
Everything you need to know about Spark Structured Streaming
#apachespark
From its architecture, event-time processing, stateful processing to how it achieves fault tolerance.
vutr.substack.com/p/everything...
IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data.
#apachespark
github.com/indextables/...
Meet Spark Analyzer – a free tool to unearth Apache Spark bottlenecks
#apachespark
www.onehouse.ai/blog/meet-sp...
What makes #ApacheSpark Delta Tables a game-changer?
It's all about features like time travel, data skipping, & auto-optimization. This blog shows how they make #datamanagement simpler and more reliable.
Read the blog here 👉 antt.me/XXbnTnut
#DataEngineering #AntStack
Linux for Data Science: Tools and Distros You Need to Know
techrefreshing.com/linux-for-da...
#LinuxForDataScience #DataScience #MachineLearning #Python #JupyterNotebook #Ubuntu #Fedora #DATLinux #ApacheSpark #Grafana #OpenSource
New #AWSlatam podcast 🎤 - EP289: Amazon Athena Best Practices
#AmazonAthena #DataArchitecture #CostOptimization #ApacheSpark #BestPractices
Understanding Apache Spark's breakdown into Jobs, Stages, and Tasks is essential for performance tuning. #ApacheSpark #DataEngineering blog.stackademic.com/calculating-jobs-stages-...
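A back-of-the-envelope sketch of that breakdown: each action triggers a job, each shuffle boundary splits a job into stages, and each stage runs one task per partition. The helper below is a toy calculator, not Spark code; the function names and the example partition counts are hypothetical, and real plans can differ (skipped stages, adaptive query execution).

```python
def count_stages(num_shuffles):
    """A job with N shuffle boundaries is split into N + 1 stages."""
    return num_shuffles + 1


def count_tasks(partitions_per_stage):
    """Each stage runs one task per partition it processes."""
    return sum(partitions_per_stage)


# Hypothetical job: read 8 input partitions, then one groupBy shuffle
# into 200 partitions (the default of spark.sql.shuffle.partitions).
stages = count_stages(num_shuffles=1)
tasks = count_tasks([8, 200])
print(f"stages={stages}, tasks={tasks}")
```

Comparing an estimate like this with what the Spark UI reports is a quick way to spot unexpected shuffles.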
Graphic showing an overview of how kube-scheduler filters and scores nodes for pod binding.
ICYMI: Abe Sharp looks at Volcano, a @cncf.io project that optimizes high-performance workloads on Kubernetes to avoid deadlocks
www.admin-magazine.com/Archive/2025...
#Kubernetes #scheduler #Volcano #CNCF #Queue #PodGroup #ApacheSpark #PyTorch #MachineLearning
Boost your data lake performance with #ApacheSpark Delta Tables.
The latest blog post breaks down key features like Time Travelling, Data Skipping, and more for better efficiency.
Read the full blog to learn more!👇
antt.me/XXbnTnut
#DeltaLake #DataEngineering #AntStack
🚀 Choosing the right data analytics platform in 2024? Ataira breaks down the top contenders📊
🔗 Comparing Popular Data Analytics Products in 2024 #DataAnalytics #PowerBI #Tableau #ApacheSpark #BusinessIntelligence #CloudComputing #TechTrends #DigitalTransformation