List: Data Engineering | Curated by Scott Haines

Nov 1, 2024
21 stories
Data Engineering
Collection of interesting or novel finds relating to the wide field of Data Engineering
In Google Cloud - Community by Oscar Pulido
Stop Thinking in Data Pipelines, Think in Data Platforms: Introducing the Analytics Engineering…
Imagine a world where you could deploy your entire enterprise-ready data platform in minutes and empower your data practitioners to…
Oct 28, 2024
4
In Level Up Coding by Yousry Mohamed
Delta Lake Liquid Clustering — A visual explanation
How to optimize lakehouse data storage layout with minimal effort.
Jan 28, 2024
3
Steve Russo
Use Rust to Write Spark Apps
Until Spark 3.4, developing and deploying a Spark application was sometimes a big hassle. Getting Spark running locally for development…
Jul 3, 2024
In TDS Archive by Bernd Wessely
Challenges and Solutions in Data Mesh — Part 2
“Data as a Product” is a core principle in Data Mesh. Why the current definition needs adaptation to fully enable the mesh.
May 17, 2024
2
In DBSQL SME Engineering by Databricks SQL SME
One Big Table vs. Dimensional Modeling on Databricks SQL
Why to use each and best practices in Databricks SQL
May 6, 2024
5
In The PayPal Technology Blog by Ilay Chen
Leveraging Spark 3 and NVIDIA’s GPUs to Reduce Cloud Cost by up to 70% for Big Data Pipelines
How PayPal achieved a remarkable cloud cost reduction through strategic GPU utilization
Feb 21, 2024
7
Barr Moses
When a Data Mesh Doesn’t Make Sense for Your Organization
Data mesh requires the right mix of process, tooling, and internal resource to be effective. Find out what it takes to get data mesh-ready.
Feb 26, 2024
8
In Towards AI by Muttineni Sai Rohith
Start using Liquid Clustering instead of Partitioning for Delta tables in Databricks
Liquid clustering replaces table partitioning and ZORDER to simplify data layout decisions and optimize query performance
Nov 17, 2023
In Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science by Phani Raj
Understanding Iceberg Table Metadata
Dated: 30-Jan-2023
Jan 31, 2023
2
Shingo OKAWA
OpenTableHub: Data Sharing Platform #1
I am currently working on a project named OpenTableHub for a data-sharing social networking service platform. In this series, I would like…
Nov 29, 2023
Scott Haines
Working with Spark SQL Time Functions
A Hands-On Guide to Time with Apache Spark
Mar 7, 2023
In Dev Genius by Apache Doris
Replacing Apache Hive, Elasticsearch and PostgreSQL with Apache Doris
Simplicity is the best policy.
Jun 15, 2023
1
In TDS Archive by Mahdi Karabiben
Writing design docs for data pipelines
Exploring the what, why, and how of design docs for data components — and why they matter.
May 22, 2023
1
In Level Up Coding by Luís Oliveira
Polars vs PySpark: Testing with Middle Size Data
Checking execution time
May 5, 2023
4
In TDS Archive by Mahmoud Harmouch
Rust: The Next Big Thing in Data Science
A Contextual Guide for Data Scientists and Analysts
Apr 24, 2023
16
Yousry Mohamed
Delta lake Z-Ordering from A to Z
Understand how to optimise delta lake tables for high cardinality queries.
Sep 19, 2022
5
In Better Programming by Steve Russo
I Asked ChatGPT to Build a Data Pipeline, and Then I Ran It
Your job might be safe. For now…
Apr 5, 2023
10
Ben Rogojan
Why Is Polars All The Rage
By Daniel Beach author of Data Engineering Central
Mar 16, 2023
6
In TDS Archive by Vitor Teixeira
Delta Lake— Keeping it fast and clean
Ever wondered how to improve your Delta tables’ performance? Hands-on on how to keep Delta tables fast and clean.
Feb 15, 2023
5
Abhijit Menon
Using ChatGPT3 as a Data Engineer
The world has gone pretty crazy about ChatGPT3, while I was skeptical about it being useful for day-to-day data engineering use cases, that…
Dec 24, 2022
5