#apacheSpark — Bluesky Posts

1 month ago

In this #InfoQ article, Hina Gandhi explores a #ReinforcementLearning (RL) approach built on #ApacheSpark, enabling distributed computing systems to autonomously learn optimal configurations.

📰 Read now: bit.ly/4thGGAf

#AI #bigdata #database #AIagents #InfoQ


1 month ago

Pinterest shared a deep dive into Moka, its new blueprint for the future of large-scale data processing.

They’re moving core workloads from legacy Hadoop to a #Kubernetes platform on Amazon EKS, with #ApacheSpark as the main engine – and support for more frameworks is coming soon.

👉 bit.ly/4bf1EJq


Data Engineering Job Support USA: Real-Time Help with ETL & Big Data | Spark, Airflow & Data Pipeline Support | KBS Training Expert data engineering support in USA for ETL pipelines, Spark, Airflow & big data. Real-time help from experienced data engineers. Data skills in top 6 shortage areas—we bridge the gap.

1 month ago

Agoda consolidated multiple independent data pipelines into a central #ApacheSpark platform, eliminating financial data inconsistencies.

A multi-layered quality framework ensures accurate financial metrics while handling millions of daily bookings.

⇨ bit.ly/45jKGpt

#InfoQ #DataPipelines #AI


KBS Training

@kbstraining.bsky.social

2 months ago

Unpopular opinion: #Dataengineering is harder than #software #engineering, and we don't talk about it enough 🔥
Why data engineering ranks among the TOP 6 skills shortages:

Anyone else been in the "Spark OOM at 90% #progress" situation? 😅

www.kbstraining.com/blog/data-en...

#BigData #ApacheSpark #Airflow


LavX News

@lavxnews.bsky.social

2 months ago

Apache Spark's new Declarative Pipelines framework automates orchestration and error handling while engineers define transformations. Handles batch + streaming workloads with Python/SQL interfaces. Cuts boilerplate for complex data pipelines. Productivity win. #ApacheSpark #ETL


2 months ago

Discover how Decathlon, one of the world’s leading sports retailers, adopted the #opensource library Polars to optimize its data workflows.

By migrating from #ApacheSpark to #Polars for small input datasets, Decathlon achieved:
• Significant speed
• Cost savings

👉 bit.ly/4atNCTY

#InfoQ #AI


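Spark Operator Special, Day 2: Practical Spark on Kubernetes with kubeflow/spark-operator × k3d - やっさんメモ. Using kubeflow/spark-operator v2.4.0 and Apache Spark 4.0.1, this article walks through building a multi-node cluster on k3d and running a SparkApplication. It is a hands-on guide that also covers S3-compatible storage and History Server integration, PySpark + PostgreSQL, and key points for running Spark Operator in production.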

2 months ago

#CaseStudy - #Lyft rearchitected its ML platform, LyftLearn, into a hybrid system!

Offline workloads now run on AWS SageMaker, while Kubernetes continues to power online model serving.

The result❓ Read #InfoQ and find out 👉 bit.ly/4s4mf9j

#SoftwareArchitecture #AI #ML #ApacheSpark #Kubernetes


yassan

@yassan168.bsky.social

3 months ago

I wrote the Day 8 article for the Distributed computing Advent Calendar 2025 🎄

Spark Operator Special, Day 2 (hands-on edition): Deploying Spark apps to K8s with kubeflow/spark-operator - やっさんメモ yassan.hatenablog.jp/entry/advent...
#アドカレ #ApacheSpark #Kubernetes #k3d


Burak Gürsoy

@burak.gursoy.social.ap.brid.gy

3 months ago

Scaling #ApacheSpark at #OpenAI

https://www.youtube.com/watch?v=Ek9dGFnih3c


Ilgar Zarbaliyev

@ilgarz.bsky.social

3 months ago

🚀 New Lab Replay: Using Delta Tables in Apache Spark (Microsoft Fabric)
🎥 Watch the full session:
👉 www.youtube.com/live/gT21FS8...

#MicrosoftFabric #DeltaTables #ApacheSpark #DeltaLake #DP600 #DP700 #Lakehouse #DataEngineering #BigData #ACID #TimeTravel #SparkSQL #PySpark #MicrosoftLearn


Ilgar Zarbaliyev

@ilgarz.bsky.social

3 months ago

🔥 New Lab Replay: Analyze Data with Apache Spark in Microsoft Fabric
🎥 Watch the full lab session:
👉 www.youtube.com/live/lsv2Oi8...
#MicrosoftFabric #ApacheSpark #SparkAnalytics #DP600 #DP700 #Lakehouse #PySpark #DeltaTables #BigData #DataEngineering #Analytics #FabricCommunity


Damavis

@damavis.bsky.social

3 months ago

How to optimize Python UDFs for Arrow in Spark. How to improve performance by optimizing Python UDFs for Apache Arrow with the arrival of the new Apache Spark 3.5 release.

⚙️ Optimize #Python UDFs for Arrow in Spark

✳️ UDFs in PySpark have been a flexible but inefficient solution
✳️ Since #ApacheSpark 3.5, the integration with #ApacheArrow has brought a significant performance improvement
➡️ blog.damavis.com/como-optimiz...

#Spark #Arrow


🚀 Spark Streaming vs Structured Streaming — Key Differences 💡 Streaming isn’t just about speed — it’s about structure, consistency, and reliability.

3 months ago

Spark Streaming vs Structured Streaming — Key Differences

#apachespark

thedataforge.medium.com/spark-stream...


DreamFactory Never Build an API Again. An enterprise-grade API as a service platform available in the cloud or on-premise. Generate database APIs instantly to build applications faster.

3 months ago

In this #InfoQ #podcast, Vivek Yadav shares his journey in building a testing system based on multiple years' worth of data.

Discover why he chose #ApacheSpark and how it integrates with “traditional” engineering practices.

🎧Listen now: bit.ly/48Cs6uK

#SoftwareDevelopment #Testing #BigData #Database


OpsMatters

@opsmatters.com

3 months ago

The latest update for #DreamFactory includes "#PHP Configuration Essentials for DreamFactory: Critical Settings You Need to Know" and "DreamFactory #ApacheSpark and Databricks Integration: REST #APIs for Delta Lake and #Unity Catalog".

#iPaaS #DevOps #API https://opsmtrs.com/2ZoHHgr


Romain Manni-Bucau

@rmannibucau.bsky.social

3 months ago

You may have missed it, but #ApacheSpark can now log in #JSON OOTB (using spark-submit): spark.log.structuredLogging.enabled=true

this is big for integration and ops!
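
As a rough illustration of why JSON logs help with integration, the sketch below filters structured log lines using only Python's standard library. The setting name comes from the post; the sample lines and the field names (`ts`, `level`, `msg`, `logger`) are assumptions made for illustration, so check them against the output of your own Spark version.

```python
import json

# Hypothetical sample lines, assumed to resemble what Spark emits when
# spark.log.structuredLogging.enabled=true; the field names are illustrative.
SAMPLE_LOG = """\
{"ts": "2025-01-01T10:00:00.000Z", "level": "INFO", "msg": "Job 0 finished", "logger": "DAGScheduler"}
not a json line
{"ts": "2025-01-01T10:00:05.000Z", "level": "ERROR", "msg": "Executor 3 lost", "logger": "TaskSchedulerImpl"}
"""


def filter_by_level(lines, level):
    """Return parsed records whose 'level' field equals the given level."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip plain-text lines, e.g. output printed by user code.
            continue
        if isinstance(record, dict) and record.get("level") == level:
            records.append(record)
    return records


errors = filter_by_level(SAMPLE_LOG.splitlines(), "ERROR")
for record in errors:
    print(record["ts"], record["logger"], record["msg"])
```

With spark-submit, the setting from the post can be passed as `--conf spark.log.structuredLogging.enabled=true`.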


Ilgar Zarbaliyev

@ilgarz.bsky.social

3 months ago

Replay Available! DP-600 Lab – Ingest Data with a Pipeline in Microsoft Fabric
🎥 Watch the full session:
www.youtube.com/live/aQcbroJ...

#MicrosoftFabric #DP600 #FabricAnalyticsEngineer #DataPipelines #OneLake #ApacheSpark #ETL #ELT #DataEngineering #DeltaTables #Lakehouse #PowerBI #FabricCommunity



4 months ago

The Stream Processing Showdown: Kafka Streams vs. Flink vs. Spark

#apachekafka
#apacheflink
#apachespark

medium.com/@balaji.raja...


Apache Kafka® (Kafka Connect) vs. Apache Flink® vs. Apache Spark™: Choosing the Right Ingestion Framework This article compares three data ingestion frameworks—Kafka, Flink, and Spark—highlighting their unique strengths, use cases, and performance capabilities.

4 months ago

Apache Kafka (Kafka Connect) vs. Apache Flink vs. Apache Spark: Choosing the Right Ingestion Framework

#apachekafka
#apacheflink
#apachespark

www.onehouse.ai/blog/kafka-c...



4 months ago

Everything you need to know about Spark Structured Streaming

#apachespark

From its architecture, event-time processing, stateful processing to how it achieves fault tolerance.

vutr.substack.com/p/everything...


GitHub - indextables/indextables_spark: IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data. It integrates seamlessly with Spark SQL, allowing you to ...

5 months ago

IndexTables is an experimental open-table format for Apache Spark that enables fast retrieval and full-text search across large-scale data.

#apachespark

github.com/indextables/...
