#PySpark
Posts tagged #PySpark on Bluesky
GitHub - CodingJhames/de-zoomcamp-james Contribute to CodingJhames/de-zoomcamp-james development by creating an account on GitHub.

Week 7 #DataEngineering Zoomcamp 🏎️
Streamed 4.4M records via #Redpanda & #PySpark on my James-T-850. Speed is nothing without logic!

Results: 📍
📏 Dist: 9506
🏙️ Zone: 74
⏳ Session: 31m
💰 Peak Tip: 10-16 18:00

Progress: github.com/CodingJhames...

#Streaming #Python #BigData #DataTalksClub

Module 6 complete: Batch Processing with Spark by
@datatalks.bsky.social

Spark correctness depends on details like partitioning and timestamp handling, not only on writing transformations.

#DataEngineering #Spark #PySpark #BatchProcessing

Data Engineering Week 5: Done! 🏁

Pivoted to #AWS from GCP. ☁️
Ran #PySpark on a t3.micro (1GB RAM) using a 4GB Swapfile. Processed NYC Taxi data smoothly without crashes. 🧠

Adaptability > Tools. 🦾

Code: github.com/CodingJhames...

#DataEngineering #Spark #AWS #OpenSource

Awakari App

How to Write PySpark Code That Prevents Broadcast Joins from Blowing Up Executors Defensive patterns to control broadcast size, memory pressure, and executor stability in PySpark jobs Continue read...

#data-science #big-data #machine-learning #python #pyspark

Last week, we nailed an epic internal meetup: "From Pandas to PySpark: Thinking Spark-native" 🐼✨

Beyond code tweaks – full mindset shift to scalable PySpark workflows. Pro tips for distributed speed boost!

These sessions level up our data game. 🚀

#DataEngineering #PySpark

LinkedIn Data Scientist PySpark (Hard — Level) Interview Problem — Solution in Detail Steps Problem Based on User Continuous Streak on Visiting a Platform

Hello data scientists,
Here is an article published, LinkedIn Data Scientist PySpark (Hard — Level) Interview Problem — Solution in Detail Steps.

#DataScientists #PySpark #DataEngineers #Data #Medium #Articles #DataEngineering

medium.com/meanlifestud...

Amazon Data Engineer PySpark (Medium — Level) Interview Question — Solution Reinforce Your PySpark Scripting Skills by Solving Different Kinds of Problems

Hello Data Engineers,
Here is an article published on Amazon Data Engineer PySpark (Medium — Level) Interview Question — Solution.

#dataengineers #pyspark #dataengineering #dataanalytics #bigdata #medium #articles

medium.com/meanlifestud...

📌 3 examples of how the performance of UDFs in PySpark improves considerably with #Arrow

⚙️ The Arrow integration in #Spark 3.5 removes the costly serialization step between the JVM and #Python, which speeds up UDFs

➡️ blog.damavis.com/como-optimiz...

#BigData #PySpark

⚙️ UDFs let you customize the operations performed on data
🐍 They are defined in #Python and applied to the columns of a DataFrame
⚠️ Using them can cost performance if they are not properly optimized

➡️ blog.damavis.com/como-optimiz...

#PySpark #Arrow
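
As a rough illustration of the UDF pattern this post describes, here is a minimal sketch. The function names `normalize_zone` and `add_normalized_zone` and the `zone` column are hypothetical and do not come from the linked article; the code assumes an existing PySpark DataFrame is passed in.

```python
from typing import Optional


def normalize_zone(name: Optional[str]) -> Optional[str]:
    """Plain Python logic: trim whitespace and title-case a zone name."""
    if name is None:
        return None  # UDFs receive None for SQL NULLs, so handle it explicitly
    return name.strip().title()


def add_normalized_zone(df, column: str = "zone"):
    """Apply normalize_zone to one DataFrame column as a row-at-a-time UDF.

    `df` is assumed to be a pyspark.sql.DataFrame with a string column `column`.
    """
    # Imported lazily so the pure-Python function above works without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    normalize_udf = F.udf(normalize_zone, StringType())
    return df.withColumn(f"{column}_normalized", normalize_udf(F.col(column)))
```

A row-at-a-time UDF like this sends every value through the Python worker, which is where the performance cost mentioned above comes from. For a case this simple, built-in column functions such as `F.trim` and `F.initcap` would probably do roughly the same job without leaving the JVM.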

🚀 New Lab Replay: Using Delta Tables in Apache Spark (Microsoft Fabric)
🎥 Watch the full session:
👉 www.youtube.com/live/gT21FS8...

#MicrosoftFabric #DeltaTables #ApacheSpark #DeltaLake #DP600 #DP700 #Lakehouse #DataEngineering #BigData #ACID #TimeTravel #SparkSQL #PySpark #MicrosoftLearn

🔥 New Lab Replay: Analyze Data with Apache Spark in Microsoft Fabric
🎥 Watch the full lab session:
👉 www.youtube.com/live/lsv2Oi8...
#MicrosoftFabric #ApacheSpark #SparkAnalytics #DP600 #DP700 #Lakehouse #PySpark #DeltaTables #BigData #DataEngineering #Analytics #FabricCommunity

Now available: La Experimental #14

🌐 #web trends
💻 Managing #Git hooks
🧑🏻‍💻 #TUI design with #GoLang
🐍 #Python without the GIL
💾 #PySpark SQL guide
🤖 Local #IA (AI) agent
🐧 #Linux security guide
🌩️ #SelfHosted monitoring
💼 #Tech jobs report from #manfred

Link: open.substack.com/pub/laexperi...

Building a Modern Data Platform to Track Kenya’s Food Prices — A Data Engineering Case Study Food price volatility has always been a sensitive issue across Kenya. From urban households in Nairo...

#spark #pyspark #grafana #dataengineering


Why isn't #Rust replacing #scala and #pyspark as the main functional language in #spark? Is there an alternative to #spark that is built on #rust?

👨‍💻📝🐍 #python #pandas 🆚 #pyspark

What is the default language used in Fabric Notebooks?
The default language is PySpark (Python), which runs on top of the Apache Spark engine.
#MicrosoftFabric #FabricNotebooks #PySpark #ApacheSpark #BigData #DataEngineering #PowerBI #DataPlatform #OneLake #FabricCommunity #DP700 #SparkEngine #DataProcessing

What languages can be used in Fabric Notebooks?
Microsoft Fabric Notebooks support:
🔹 PySpark
🔹 Spark (Scala)
🔹 SparkSQL
🔹 SparkR (R)
🔹 HTML
#MicrosoftFabric #FabricNotebooks #PySpark #SparkSQL #SparkR #Scala #BigData #DataEngineering #DataScience #OneLake #FabricCommunity #DataPlatform #DP700

Cloud Data Driven | 2025-07-17 | Intro to PySpark in Microsoft Fabric | Jared Kuehn (YouTube video by DataDrivenCommunity)

📣 Missed the community meetup on July 17th with Jared Kuehn and Ronen Ariely?

🚀 Dive into #PySpark in #Microsoft #Fabric with Jared Kuehn - a powerhouse speaker and veteran data engineer - as he demystifies how to work with PySpark in #MicrosoftFabric.

youtu.be/Y4Uxnj0CAeA?...

What are Fabric Notebooks best suited for?
They’re ideal for:
🔹 Handling large external datasets
🔹 Performing complex data transformations
🔹 Running custom code in languages like PySpark, SQL, or Scala
#MicrosoftFabric #FabricNotebooks #PySpark #BigData #DataTransformation #DataEngineering #PowerBI

📈 Monitor your metrics with #Spark and #Prometheus

1️⃣ Prerequisites
2️⃣ #Pyspark
3️⃣ JMX Exporter: what is it and how is it configured?
4️⃣ Running Spark
5️⃣ Configuring Prometheus

➡️ blog.damavis.com/integracion-...

#ApacheSpark #BigData #DataEngineering

Introduction to PySpark in Microsoft Fabric, with Jared Kuehn, Thu, Jul 17, 2025, 12:00 PM | Meetup
Presentation Title: Introduction to PySpark in Microsoft Fabric
Description: With all of the engineering features in Microsoft Fabric, which medium should you use

Unlock the power of #PySpark in #Microsoft #Fabric with Jared Kuehn!

Learn #Spark management, #Python tips, and boost #performance in this live event 🚀

🗓 July 17, 12 PM EDT
🎤 Hosted by Ronen Ariely @pitoach.bsky.social

👉 www.meetup.com/cloud-data-d...

#MicrosoftFabric #DataEngineering

🚀 Starting a new series: #PySpark + #AI
What happens when distributed computing meets intelligent automation?
I'm documenting hands-on work integrating PySpark with ML & LLMs (LangChain, Azure, etc).
Let's bridge Big Data + Smart Logic.
#DataScience #MLOps #LLM #BigData

PySpark: Read CSV like a pro

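from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse or create a session
df = spark.read.csv("data.csv", header=True, inferSchema=True)
df.show(3)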

✅ Auto schema
✅ Header as columns
✅ Ready to transform

Small win, big impact.
#PySpark #DataEngineer #BigData #xavierdatatech
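
One thing worth knowing about the snippet above: `inferSchema=True` makes Spark take an extra pass over the data to guess column types. A minimal sketch of the alternative, declaring the schema up front, might look like this. The helper names and the `TRIP_COLUMNS` list are hypothetical, not from the post, and `spark` is assumed to be an existing SparkSession.

```python
def build_ddl(columns):
    """Turn [(name, type), ...] into a DDL schema string such as 'id INT, name STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_csv_with_schema(spark, path, columns):
    """Read a CSV with a declared schema instead of inferSchema=True.

    `spark` is an existing SparkSession; DataFrameReader.csv accepts
    a DDL-formatted string for its `schema` argument.
    """
    return spark.read.csv(path, header=True, schema=build_ddl(columns))


# Hypothetical columns for illustration only, not taken from the post.
TRIP_COLUMNS = [("trip_id", "INT"), ("zone", "STRING"), ("tip_amount", "DOUBLE")]
```

Calling `read_csv_with_schema(spark, "data.csv", TRIP_COLUMNS)` would then skip inference and keep column types stable between runs, at the cost of maintaining the column list by hand.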

🚀 Working with #PySpark in the cloud — juggling multiple #DataFrames in parallel.

🔍 Combining filter(), select(), and join() efficiently is teaching me how to optimize both loading and exploration on large datasets.

#BigData #Databricks #DataEngineering #ApacheSpark

🚀 Unlocking Big Data Potential with PySpark!
Key Features:
🔹 Spark SQL
🔹 Spark MLlib
🔹 Spark Streaming
🔹 DataFrame API

#PySpark #BigData #DataScience #ApacheSpark #MachineLearning #DataEngineering #XavierDataTech

Top 8 Data Visualization Libraries

#Python
#PySpark #SQL #BigData #Databricks #BusinessIntelligence #DataEngineering #PowerBI #DataAnalytics #SparkSQL #XavierDataTech

🚀 Working with PySpark SQL? Here's a quick and powerful example!

You can query DataFrames using SQL syntax in Spark — great for teams coming from SQL backgrounds.

#PySpark #BigData #SparkSQL #DataEngineering #ETL #ApacheSpark #SQL #DataScience #XavierDataTech
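
The example from this post was an image and is not reproduced here. As a stand-in, here is a minimal sketch of the general pattern: register a DataFrame as a temporary view, then query it with SQL. The `trips` view, the `zone` column, and the query are assumptions made for illustration.

```python
# Hypothetical query: count rows per zone and keep the five busiest zones.
TOP_ZONES_QUERY = """
    SELECT zone, COUNT(*) AS trip_count
    FROM trips
    GROUP BY zone
    ORDER BY trip_count DESC
    LIMIT 5
"""


def top_zones(spark, df):
    """Expose `df` to SQL as the view `trips` and run the query above.

    `spark` is assumed to be an existing SparkSession and `df` a DataFrame
    that has a `zone` column.
    """
    df.createOrReplaceTempView("trips")  # session-scoped; no data is copied
    return spark.sql(TOP_ZONES_QUERY)
```

The result of `spark.sql` is an ordinary DataFrame, so SQL and DataFrame API calls can be mixed freely in the same pipeline.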

Supported chart types: scatter, line, bar, area, pie, histogram, box, and KDE — optimized for Spark performance with smart sampling.

#PySpark #BigData #AI #DataVisualization #Spark40 #DataScience #MLOps #XavierDataTech #Databricks

databricks.com/blog/pyspark-n…

🚀 PySpark in the Cloud:
💾 DataFrames · Delta Lake · Databricks
📊 Power BI Export · Semantic Layer via LangChain

🔁 Real-world pipelines, hands-on.
🔗 linkedin.com/in/xavier-mareca

#PySpark #BigData #DataEngineering #PowerBI #LangChain #Azure #AI
