How Airbnb’s data engineers and analytics engineers built a consistent and flexible data modeling framework to support the expansion into Homes, Experiences, and Services. By: Patrick Lam, Namrata Lamba, Jamie Stober. With the May 2025 Summer Release, Airbnb redesigned its app, relaunched Experiences, and debuted Services, pushing us beyond our traditional Homes focus. For the data teams,…
#data-engineering
14 posts
9 Jun
3 Jun
Before we knew better. Our orchestration system started as a simple internal solution to manage event pipelines and trigger downstream jobs. Over time, as more workflows and dependencies were added, it gradually evolved into a tightly coupled monolithic scheduler that became increasingly difficult to understand and maintain. Understanding how a workflow executed often meant…
5 May
By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security…
2 May 2025
Load Testing APIs on Redshift & Snowflake: A Quick POC. Overview: At Helpshift, our data platform follows a Lakehouse architecture, combining the best of both data lakes and data warehouses. This architecture allows us to store and analyze large amounts of raw data in a structured and organized manner, while also providing the scalability and low-cost storage…
2 Jul 2024
Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack
Slack Data Engineering recently migrated its data workloads from AWS EMR 5 (Spark 2/Hive 2 processing engine) to EMR 6 (Spark 3 processing engine). In this blog, we will share our migration journey, challenges, and the performance gains we observed in the process. This blog aims to assist Data Engineers, Data Infrastructure Engineers, and Product…
8 May 2024
The Data Engineering team is responsible for Slack’s data lake, analytics dashboards, and other data services. The team’s mission is to empower users to leverage data to make decisions quickly, accurately, and easily. In recent years, Slack’s data lake has grown from under a petabyte to over 100 petabytes, and it now spans millions of tables.…
8 Jan 2024
Hi, my name is Bisman, and I studied Computer Science at the University of California, Santa Barbara. During the summer of 2022, I had the most amazing experience working as a Software Engineer Intern on Strava’s Data Platform Team. In the first few weeks, I learned the tools my team uses and then spent the rest of the time working on my…
9 Oct 2023
Introduction. Ever wondered what it’s like to intern as a software engineer at Slack? Picture yourself on the famous Ohana floor, the 61st floor of the Salesforce Tower in San Francisco; it was one of many privileges we had as interns. Not only did our experience with Slack’s Data Engineering team let us step onto the…
28 Apr 2023
Let’s face it: software is easier to write than maintain. This is why we, as software engineers, prefer to just “rip it out and start over” instead of trying to understand what another developer (or our past self) was thinking. We seem to have collectively forgotten that “programs must be […]
13 Sept 2022
An internship at Slack is an exciting opportunity to learn new skills, meet other engineers, and build cool stuff. This was the reality for three interns on the Data Engineering team this summer. Throughout our time in this flex-work environment, we got to experience both the wide reach of the virtual environment and the benefits…
19 Oct 2021
A real data lake. Traditional Data Engineering relies on products such as Airflow, Hadoop, Spark and Spark-based architectures, or similar technologies. These are still viable solutions for a number of reasons, not least the fact that Data Engineers are few and far between, and the vast majority of them will be familiar with the above technologies or similar products/frameworks. Go…
17 Aug 2021
Reinventing how the world does work inevitably creates a lot of data. Each year, Slack’s scale has increased and the volume of data ingested and stored has kept pace. To make it possible to understand relationships within our data, we’ve invested heavily in an automated data lineage framework. This facilitates producer/consumer coordination, improves risk mitigation,…
28 Jul 2021
With the release of Slack Connect, people can now collaborate both with internal employees and external organizations in the same channel. To make this as smooth as possible, Slack does predictive email analysis to classify and recommend the best way for a user to work with people they want to collaborate with. To accomplish this,…
16 Nov 2020
My thoughts and takeaways after using Kedro for six months across various projects and teams.