#data engineering

23 posts

20 Jul

Deepika Saini 20 Jul 2026 5 min read

Why Most Single Source of Truth Initiatives Fail (And What Successful Teams Do Differently)

“We have multiple dashboards showing different numbers. Which one is correct?” If you’ve worked in data long enough, you’ve probably heard this question more times than you’d like. Sales reports one revenue figure. Finance reports another. Product Analytics has a third. Executives spend more time debating whose dashboard is correct than discussing what action to take. The natural response is…

leadership databricks data-engineering data-architecture data-strategy

13 Jul

Netflix Technology Blog 13 Jul 2026 23 min read

Building Service Topology at Scale: Architecture, Challenges, and Lessons Learned

Netflix Technology Blog

By Parth Jain, Rakesh Sukumar, Yingwu Zhao, Renzo Sanchez-Silva & Nathan Fisher A deep dive into the engineering challenges of building a real-time service dependency map at Netflix scale: from streaming architectures and distributed aggregation pipelines to time-travel queries and the methodology that made it work. Introduction In our first post, we introduced the problem: engineers…

backend-development distributed-systems data-engineering observability software-engineering

13 Jul 2026 1 min read

Ultra-Fast Anomaly Detection using Apache Spark Real-Time Mode

Databricks

This post establishes a reusable pattern for operational workloads that genuinely move the needle: fraud detection...

data engineering

30 Jun

Sagibhuvana 30 Jun 2026 7 min read

Using LLMs to Analyze Spark SQL Plans: A Practical Approach to Debugging Long-Running Jobs

Expedia

Expedia Group Technology — Innovation Using large language models to reveal bottlenecks in Spark SQL execution plans Photo by Luis del Río If you’ve ever stared at a 300-plus-node physical plan at 2 a.m. trying to spot a missing broadcast or one cursed skewed partition, this is for you. Spark makes it deceptively easy to write complex SQL that looks…

apache-spark big-data innovation llm data-engineering

18 Jun

18 Jun 2026 1 min read

How Stagwell built privacy-safe ID matching on Databricks

Databricks

The identity matching problem brands face today Brands invest heavily in building first-party data assets...

platform solutions engineering data engineering industries

Manav Mehta 18 Jun 2026 7 min read

Building a Centralized Alerting Framework for Data Quality Monitoring and Incident Management

Helpshift

Building a Centralized Alerting Framework for Data Quality using Snowflake Before We Knew Better As our data platform grew, so did the number of pipelines, scheduled tasks, and data quality checks running every day. While Snowflake provided a reliable platform for storing and processing data, operational monitoring was fragmented across multiple systems. Data quality failures were often discovered only after…

snowflake incident-management data-quality data-engineering observability

11 Jun

11 Jun 2026 1 min read

Ingesting the Milky Way: Petabyte-Scale with Zerobus Ingest

Databricks

Telemetry data is everywhere. IoT sensors on factory floors. Satellite arrays scanning...

engineering data engineering data streaming

9 Jun

Patrick Lam 9 Jun 2026 9 min read

Scaling beyond one: How Airbnb evolved its data architecture for a multi-product world

Airbnb

How Airbnb’s data engineers and analytics engineers built a consistent and flexible data modeling framework to support the expansion into Homes, Experiences, and Services. By: Patrick Lam, Namrata Lamba, Jamie Stober With the May 2025 Summer Release, Airbnb redesigned its app, relaunched Experiences, and debuted Services, pushing us beyond our traditional Homes focus. For the data teams,…

data-engineering analytics-engineering technology data-modeling data-architecture

3 Jun

Poorva Patil 3 Jun 2026 6 min read

Migrating from a Monolithic Orchestrator to Apache Airflow

Helpshift

Photo by Corinne Kutz on Unsplash Before we knew better Our orchestration system started as a simple internal solution to manage event pipelines and trigger downstream jobs. Over time, as more workflows and dependencies were added, it gradually evolved into a tightly coupled monolithic scheduler that became increasingly difficult to understand and maintain. Understanding how a workflow executed often meant…

etl apache-airflow aws data-engineering software-architecture

25 May

Aarav Nigam 25 May 2026 7 min read

POIs for Hyperlocal Delivery: A Data-Centric Approach to the Last-Last-Mile

Swiggy Bytes

Authors: Charan, Aarav Nigam Special thanks to Meghana Negi for her contribution and guidance throughout the project. Introduction In hyperlocal delivery, finding a customer’s location is only half the problem. A latitude-longitude pin can tell us where a delivery ends on the map, but not how a delivery partner should interpret that location in the real world. In dense…

point-of-interest data-engineering hyperlocal-delivery geospatial-data address-resolution

5 May

Mahendran Vasagam 5 May 2026 13 min read

From SSH to REST: A Security-Driven Modernization of Slack’s EMR Data Pipelines

Slack

Excerpt By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security…

uncategorized airflow aws big-data data-engineering

22 Jan

Abhishek Bharti 22 Jan 2026 6 min read

High-Risk, High-Scale: Guaranteeing Ad Budget Precision at 1 Million Events/Second

Flipkart Tech

Flipkart serves thousands of sponsored ads across various pages and with search queries. Advertisers are charged based on the impression views or clicks generated by their campaign content. In the high-velocity domain of AdTech, the latency between an ad impression/click and a budget deduction represents a direct financial risk. If the system lags, advertisers overspend; if it blocks, revenue…

apache-flink adtech lambda-architecture stream-processing data-engineering

2 May 2025

Sameeksha Bhatia 2 May 2025 7 min read

Load Testing API’s on Redshift & Snowflake — A Quick POC

Helpshift

Overview At Helpshift, our data platform follows a Lakehouse architecture, combining the best of both data lakes and data warehouses. This architecture allows us to store and analyze large amounts of raw data in a structured and organized manner, while also providing the scalability and low-cost storage…

load-testing data-engineering snowflake redshift performance

2 Jul 2024

Nilanjana Mukherjee 2 Jul 2024 9 min read

Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack

Slack

Slack Data Engineering recently migrated its data workloads from AWS EMR 5 (Spark 2/Hive 2 processing engine) to EMR 6 (Spark 3 processing engine). In this blog, we will share our migration journey, challenges, and the performance gains we observed in the process. This blog aims to assist Data Engineers, Data Infrastructure Engineers, and Product…

uncategorized analytics aws big-data data-engineering

8 May 2024

Lakshmi Mohan 8 May 2024 8 min read

How Women Lead Data Engineering at Slack

Slack

The Data Engineering team is responsible for Slack’s data lake, analytics dashboards, and other data services. The team’s mission is to empower users to leverage data to make decisions quickly, accurately, and easily. Slack’s data lake grew in size from sub-petabyte to over 100 petabytes in recent years and it now spans millions of tables.…

data-engineering

8 Jan 2024

Bisman Sodhi 8 Jan 2024 4 min read

AN EVENTFUL SUMMER AT STRAVA

Strava

Hi, my name is Bisman and I studied Computer Science at the University of California, Santa Barbara. During the summer of 2022, I had the most amazing experience working as a Software Engineer Intern on Strava’s Data Platform Team. In the first few weeks, I learned the tools my team uses and then spent the rest of the time working on my…

software-engineering data-platforms data-engineering

9 Oct 2023

Johnny Cao 9 Oct 2023 7 min read

Unleashing Impact at Slack’s Data Engineering Internship

Slack

Introduction Ever wondered what it’s like to intern as a software engineer at Slack? Picture yourself on the famous Ohana floor, the 61st floor of the Salesforce Tower in San Francisco. It is one of many privileges we had as interns. Not only did our experience with Slack’s Data Engineering team let us step onto the…

uncategorized airflow data-engineering internships search

28 Apr 2023

Lou Kratz 28 Apr 2023 7 min read

What was Old is New: Finding Joy in Modernising Legacy Systems

Bazaarvoice

(cover image from ThisisEngineering RAEng) Let’s face it: software is easier to write than maintain. This is why we, as software engineers, prefer to just “rip it out and start over” instead of trying to understand what another developer (or our past self) was thinking. We seem to have collectively forgotten that “programs must be […]

uncategorized artificial intelligence aws aws sagemaker data engineering

13 Sept 2022

Lakshmi Mohan 13 Sept 2022 8 min read

Our Hybrid and Collaborative Summer @ Slack’s Data Eng

Slack

An internship at Slack is an exciting opportunity to learn new skills, meet other engineers, and build cool stuff. This was the reality for three interns on the Data Engineering team this summer. Throughout our time in this flex-work environment, we got to experience both the wide reach of the virtual environment and the benefits…

uncategorized data-engineering internships

17 Aug 2021

Samuel Bock 17 Aug 2021 8 min read

Data Lineage at Slack

Slack

Reinventing how the world does work inevitably creates a lot of data. Each year, Slack’s scale has increased and the volume of data ingested and stored has kept pace. To make it possible to understand relationships within our data, we’ve invested heavily in an automated data lineage framework. This facilitates producer/consumer coordination, improves risk mitigation,…

uncategorized big-data data-engineering

28 Jul 2021

Sarah Henkens 28 Jul 2021 10 min read

Email Classification

Slack

With the release of Slack Connect, people can now collaborate both with internal employees and external organizations in the same channel. To make this as smooth as possible, Slack does predictive email analysis to classify and recommend the best way for a user to work with people they want to collaborate with. To accomplish this,…

uncategorized algorithms data-engineering infrastructure