#apacheSpark
Posts tagged #apacheSpark on Bluesky

Real-time alert broker powered by Apache Spark that processes astronomical transients from sky surveys, enriching cosmic events with ML classifications for breakthrough discoveries.

🔗 https://github.com/astrolabsoftware/fink-broker

#AlertBroker #TransientAstronomy #ApacheSpark


In this #InfoQ article, Hina Gandhi explores a #ReinforcementLearning (RL) approach built on #ApacheSpark, enabling distributed computing systems to autonomously learn optimal configurations.

📰 Read now: bit.ly/4thGGAf

#AI #bigdata #database #AIagents #InfoQ


Pinterest shared a deep dive into Moka, its new blueprint for the future of large-scale data processing.

They’re moving core workloads from legacy Hadoop to a #Kubernetes platform on Amazon EKS, with #ApacheSpark as the main engine – and support for more frameworks is coming soon.

👉 bit.ly/4bf1EJq


Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.

A multi-layered quality framework ensures accurate financial metrics while handling millions of daily bookings.

⇨ bit.ly/45jKGpt

#InfoQ #DataPipelines #AI

Data Engineering Job Support USA: Real-Time Help with ETL & Big Data | Spark, Airflow & Data Pipeline Support | KBS Training
Expert data engineering support in USA for ETL pipelines, Spark, Airflow & big data. Real-time help from experienced data engineers. Data skills in top 6 shortage areas: we bridge the gap.

Unpopular opinion: #Dataengineering is harder than #software #engineering, and we don't talk about it enough 🔥
Why data engineering ranks among the top 6 skill shortage areas:

Anyone else been in the "Spark OOM at 90% #progress" situation? 😅

www.kbstraining.com/blog/data-en...

#BigData #ApacheSpark #Airflow


Apache Spark's new Declarative Pipelines framework automates orchestration and error handling while engineers define transformations. Handles batch + streaming workloads with Python/SQL interfaces. Cuts boilerplate for complex data pipelines. Productivity win. #ApacheSpark #ETL


Discover how Decathlon, one of the world’s leading sports retailers, adopted the #opensource library Polars to optimize its data workflows.

By migrating from #ApacheSpark to #Polars for small input datasets, Decathlon achieved:
• Significant speed gains
• Cost savings

👉 bit.ly/4atNCTY

#InfoQ #AI


#CaseStudy - #Lyft rearchitected its ML platform, LyftLearn, into a hybrid system!

Offline workloads now run on AWS SageMaker, while Kubernetes continues to power online model serving.

The result❓ Read #InfoQ and find out 👉 bit.ly/4s4mf9j

#SoftwareArchitecture #AI #ML #ApacheSpark #Kubernetes

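Spark Operator Special, Day 2: Practical Spark on Kubernetes with kubeflow/spark-operator × k3d - やっさんメモ
Using kubeflow/spark-operator v2.4.0 and Apache Spark 4.0.1, this article walks through building a multi-node cluster on k3d and running a SparkApplication. It is a hands-on piece that also covers S3-compatible storage and History Server integration, PySpark + PostgreSQL, and key points for running Spark Operator in production.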

I wrote the day 8 article for the Distributed computing Advent Calendar 2025 🎄

Spark Operator Special, Day 2, hands-on edition: Deploying Spark apps to K8s with kubeflow/spark-operator - やっさんメモ yassan.hatenablog.jp/entry/advent...
#アドカレ #ApacheSpark #Kubernetes #k3d


Scaling #ApacheSpark at #OpenAI

https://www.youtube.com/watch?v=Ek9dGFnih3c


🚀 New Lab Replay: Using Delta Tables in Apache Spark (Microsoft Fabric)
🎥 Watch the full session:
👉 www.youtube.com/live/gT21FS8...

#MicrosoftFabric #DeltaTables #ApacheSpark #DeltaLake #DP600 #DP700 #Lakehouse #DataEngineering #BigData #ACID #TimeTravel #SparkSQL #PySpark #MicrosoftLearn


🔥 New Lab Replay: Analyze Data with Apache Spark in Microsoft Fabric
🎥 Watch the full lab session:
👉 www.youtube.com/live/lsv2Oi8...
#MicrosoftFabric #ApacheSpark #SparkAnalytics #DP600 #DP700 #Lakehouse #PySpark #DeltaTables #BigData #DataEngineering #Analytics #FabricCommunity

How to optimize Python UDFs for Arrow in Spark
How to improve performance by optimizing Python UDFs for Apache Arrow, following the arrival of the new Apache Spark 3.5 release

⚙️ Optimize #Python UDFs for Arrow in Spark

✳️ Using UDFs in PySpark has been a flexible but inefficient solution
✳️ Since #ApacheSpark 3.5, integration with #ApacheArrow has brought a significant performance improvement
➡️ blog.damavis.com/como-optimiz...

#Spark #Arrow

🚀 Spark Streaming vs Structured Streaming — Key Differences 💡 Streaming isn’t just about speed — it’s about structure, consistency, and reliability.

Spark Streaming vs Structured Streaming — Key Differences

#apachespark

thedataforge.medium.com/spark-stream...


In this #InfoQ #podcast, Vivek Yadav shares his journey in building a testing system based on multiple years' worth of data.

Discover why he chose #ApacheSpark and how it integrates with “traditional” engineering practices.

🎧Listen now: bit.ly/48Cs6uK

#SoftwareDevelopment #Testing #BigData #Database

DreamFactory Never Build an API Again. An enterprise-grade API as a service platform available in the cloud or on-premise. Generate database APIs instantly to build applications faster.

The latest update for #DreamFactory includes "#PHP Configuration Essentials for DreamFactory: Critical Settings You Need to Know" and "DreamFactory #ApacheSpark and Databricks Integration: REST #APIs for Delta Lake and #Unity Catalog".

#iPaaS #DevOps #API https://opsmtrs.com/2ZoHHgr


You may have missed it, but #ApacheSpark can now log in #JSON out of the box (using spark-submit): spark.log.structuredLogging.enabled=true

This is big for integration and ops!
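
As a rough illustration of why JSON logs help ops tooling: the setting above can be passed on the command line as `spark-submit --conf spark.log.structuredLogging.enabled=true ...`, after which each log event is a JSON object that a script can filter without regex parsing. The sketch below uses only the Python standard library. The sample lines are made up, and the field names (`ts`, `level`, `msg`, `logger`) are assumptions about the log layout, so verify them against the output of your own Spark version.

```python
import json
from typing import Iterable, Iterator

# Hypothetical sample lines. The field names ("ts", "level", "msg", "logger")
# are assumptions about the JSON layout; check them against real Spark output.
SAMPLE_LINES = [
    '{"ts": "2025-01-01T12:00:00.000Z", "level": "INFO", "msg": "Job 0 finished", "logger": "DAGScheduler"}',
    '{"ts": "2025-01-01T12:00:01.000Z", "level": "ERROR", "msg": "Lost executor 3", "logger": "TaskSchedulerImpl"}',
    "plain text line that is not JSON",
]


def parse_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line that holds a JSON object; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Non-JSON output (banners, third-party prints) is ignored.
            continue
        if isinstance(record, dict):
            yield record


def errors_only(lines: Iterable[str]) -> list:
    """Return only the records whose assumed "level" field is ERROR."""
    return [r for r in parse_records(lines) if r.get("level") == "ERROR"]


if __name__ == "__main__":
    for rec in errors_only(SAMPLE_LINES):
        print(rec.get("ts"), rec.get("logger"), rec.get("msg"))
```

The same filter works on a real log file by passing an open file handle instead of `SAMPLE_LINES`.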


Replay Available! DP-600 Lab – Ingest Data with a Pipeline in Microsoft Fabric
🎥 Watch the full session:
www.youtube.com/live/aQcbroJ...

#MicrosoftFabric #DP600 #FabricAnalyticsEngineer #DataPipelines #OneLake #ApacheSpark #ETL #ELT #DataEngineering #DeltaTables #Lakehouse #PowerBI #FabricCommunity


The Stream Processing Showdown: Kafka Streams vs. Flink vs. Spark

#apachekafka
#apacheflink
#apachespark

medium.com/@balaji.raja...

Apache Kafka (Kafka Connect) vs. Apache Flink vs. Apache Spark: Choosing the Right Ingestion Framework

This article compares three data ingestion frameworks (Kafka, Flink, and Spark), highlighting their unique strengths, use cases, and performance capabilities.

#apachekafka
#apacheflink
#apachespark

www.onehouse.ai/blog/kafka-c...


Everything you need to know about Spark Structured Streaming

#apachespark

From its architecture, event-time processing, stateful processing to how it achieves fault tolerance.

vutr.substack.com/p/everything...

indextables/indextables_spark (GitHub): IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data. It integrates seamlessly with Spark SQL, allowing you to ...

#apachespark

github.com/indextables/...


Meet Spark Analyzer – a free tool to unearth Apache Spark bottlenecks

#apachespark

www.onehouse.ai/blog/meet-sp...

What makes Apache Spark + Delta Tables Nifty? | Build AI-Powered Software Agents with AntStack | Scalable, Intelligent, Reliable
Enhance your data lake with Apache Spark + Delta Tables. Explore powerful features like ACID transactions, time travel, and data skipping in this insightful blog.

What makes #ApacheSpark Delta Tables a game-changer?

It's all about features like time travel, data skipping, & auto-optimization. This blog shows how they make #datamanagement simpler and more reliable.

Read the blog here 👉 antt.me/XXbnTnut

#DataEngineering #AntStack

Unleash Your Data Science Potential With Linux: Top Tools & Distros In 2025
Dive into the world of Linux for Data Science! Discover why Linux is the ultimate platform for data scientists, explore top distributions like Ubuntu, Fedora, and DAT Linux, and master essential tools...

Linux for Data Science: Tools and Distros You Need to Know
techrefreshing.com/linux-for-da...
#LinuxForDataScience #DataScience #MachineLearning #Python #JupyterNotebook #Ubuntu #Fedora #DATLinux #ApacheSpark #Grafana #OpenSource


New #AWSlatam podcast 🎤 - EP289: Amazon Athena Best Practices

#AmazonAthena #DataArchitecture #CostOptimization #ApacheSpark #BestPractices

https://blog.stackademic.com/calculating-jobs-stages-and-tasks-in-apache-spark-a-data-engineers-guide-fc26a74f0ef8?source=rss----d1baaa8417a4---4 As a data engineer, understanding how Apache Spark breaks down workloads into Jobs, Stages, and Tasks is crucial for performance tuning…

Understanding Apache Spark's breakdown into Jobs, Stages, and Tasks is essential for performance tuning. #ApacheSpark #DataEngineering blog.stackademic.com/calculating-jobs-stages-...

Graphic showing an overview of how kube-scheduler filters and scores nodes for pod binding.

ICYMI: Abe Sharp looks at Volcano, a @cncf.io project that optimizes high-performance workloads on Kubernetes to avoid deadlocks
www.admin-magazine.com/Archive/2025...
#Kubernetes #scheduler #Volcano #CNCF #Queue #PodGroup #ApacheSpark #PyTorch #MachineLearning


Boost your data lake performance with #ApacheSpark Delta Tables.

The latest blog post breaks down key features like Time Travelling, Data Skipping, and more for better efficiency.

Read the full blog to learn more!👇
antt.me/XXbnTnut

#DeltaLake #DataEngineering #AntStack

Comparing Popular Data Analytics Products in 2024 - Ataira
Choosing the right data analytics product depends on the organization's needs, including budget, technical expertise, and data scale.

🚀 Choosing the right data analytics platform in 2024? Ataira breaks down the top contenders📊

🔗 Comparing Popular Data Analytics Products in 2024 #DataAnalytics #PowerBI #Tableau #ApacheSpark #BusinessIntelligence #CloudComputing #TechTrends #DigitalTransformation
