~/devreads

#data science

20 posts

2 Jun

Harrison Katz 11 min read

How Airbnb used sequential geographic recovery signals and prior propagation to generate reliable corridor-level forecasts when local data was scarce. The problem with unprecedented shocks: Almost every forecasting system is built on the same implicit assumption: the future will resemble the past. You train on historical data, you validate on holdout periods, and you trust that past…

technology, machine-learning, data-science, data-modeling, forecasting

18 May

29 Apr

Thumbtack Engineering 9 min read

A practical look at how Thumbtack navigates evaluation for emerging AI experiences and what we’ve learned along the way. By Shishir Dash, Director of Applied Science, and Teja Venkat Kolli, Senior Applied Scientist. Evaluating AI at Scale. Introduction: AI is reshaping how people interact with products, and Thumbtack is no exception. We’re introducing AI into more aspects of…

ai, ai-evaluation, genai, ai-engineering, data-science

24 Mar

Prakhar Sapre 8 min read

Expedia Group Technology (Data): Workload-aware routing for Trino. Trino, a fork of PrestoSQL, is a powerful tool in modern data analytics, enabling organizations to query large datasets quickly and efficiently. As a distributed SQL query engine, Trino provides fast, scalable insights without requiring data relocation. While Trino is robust on its…

trino-gateway, sql, analytics, trinos, data-science

17 Feb

Benjamin Stieger 9 min read

Expedia Group Technology (Data): Quickly identifying winning ranking models before committing to A/B tests. Authors: Adam Woznica, Benjamin Stieger, and Stefania Ebli. Expedia Group™ covers a portfolio of brands, such as Expedia.com, Hotels.com, and Vrbo, that power lodging searches for millions of travel shoppers every day. In this competitive market matching users…

experimentation, data-science, hypothesis-testing, recommender-systems, a-b-testing

27 Jan

Alyssa White, PhD 7 min read

Expedia Group Technology (Data): Two roles, one goal: understanding users better. By Sophie Rabet and Alyssa White. Quantitative User Experience (UX) Research, as a discipline, is growing rapidly. Quant UX Con 2022, the first-ever general industry conference for the discipline, was organized with the expectation of about 200 attendees. After…

career-advice, data-science, analytics, ux-research, quantitative-ux-research

6 Jan

Manisha Sudhir 6 min read

Expedia Group Technology (Data Science): Empowering developers with seamless vector embedding solutions. Introduction: Rapid advances in Machine Learning (ML), especially Generative AI, have increased the need for specialized capabilities like vector embedding similarity search. Vector embeddings are numerical representations, created by machine learning models, that allow disparate inputs to be compared against…

machine-learning, vector-database, mls, data-science

10 Oct 2025

James Chan 6 min read

As a fast-growing home services platform, we heavily utilize machine learning to elevate user experience and improve business processes such as reducing spam, improving search results, and providing recommendations. In recent years, Generative AI has taken the world by storm as a powerful addition to traditional ML. We embraced this mega trend by incorporating LLMs into various areas of our…

data-science, databricks, genai, information-security, machine-learning

19 Aug 2025

17 Mar 2025

João Palmeiro 6 min read

Data scientists use different Jupyter notebooks every day — ranging from disposable ones for quick tasks to those shareable with clients. Over time, more and more notebooks accumulate, making it increasingly difficult to reuse them in whole or in part. To mitigate this problem and make the most relevant pieces of code quickly accessible to every data scientist, we developed…

jupyterlab, python, data-science, jupyter, snippet

2 Dec 2024

4 Nov 2024

Jeffrey Mew 2 min read

AI did not write this blog post, but it will make your exploratory data analysis with Data Wrangler better! Today, we’re excited to introduce our first step of integrating the power of Copilot into Data Wrangler. With this first integration of Copilot with Data Wrangler, you’ll be able to: Use natural language to clean and […] The post Announcing GitHub…

data science, data wrangler, python, visual studio code, ai

7 May 2024

Jeffrey Mew 3 min read

Today, we are excited to announce the general availability of the Data Wrangler extension for Visual Studio Code! Data Wrangler is a free extension that offers data viewing and cleaning that is directly integrated into VS Code and the Jupyter extension. It provides a rich user interface to view and analyze your data, show insightful […] The post Announcing Data…

python, visual studio code, csv, data, data science

28 Nov 2023

Roland Meertens 8 min read

A clustering-based approach to creating deep learning datasets in a day. Introduction: Understanding what’s happening in an image is both an important task and a costly one. In the last few years, the field of computer vision has greatly accelerated due to advances in neural networks. At Bumble Inc., we see potential value in computer vision for…

data-science, machine-learning, clustering, deep-learning, dataset

17 Mar 2023

21 Sept 2022

Simone Spaccarotella 7 min read

How I learned to manipulate JSON data with Pandas in a Jupyter Notebook and deconstruct it into a DataFrame ready for queries. A bit of context first: I started a self-study path to learn the theoretical fundamentals of Data Science and Machine Learning.…

data-science, jupyter-notebook, python, pandas, kaggle

21 Apr 2022

25 Oct 2021

Steve Dower 1 min read

Our friends at Anaconda posted a joint announcement last week regarding the use of their repository from Microsoft cloud-hosted products. See the full announcement on their website. Today, Anaconda, Inc. announced a collaboration with Microsoft to enable customers to confidently access Anaconda’s curated library of open-source packages within Microsoft Cloud-hosted products and services, including […] The post Anaconda licensing…

azure, python, anaconda, data science, machine learning

8 Jul 2020

Sid Unnithan 4 min read

The VS Code team is excited to announce releases of the Azure Machine Learning extension, which aims to help you manage your core machine learning assets directly from within your favourite editor! The post Enhance your Azure Machine Learning experience with the VS Code extension appeared first on Microsoft for Python Developers Blog.

azure, azure machine learning, visual studio code, azure machine learning extension, data science

16 Apr 2019

Michael Droettboom 14 min read

Pyodide is an experimental project from Mozilla to create a full Python data science stack that runs entirely in the browser. We think it’s worthwhile to work on moving the JavaScript data science ecosystem forward, and that's why we built and released Iodide earlier this year. In the meantime, we’re meeting data scientists where they are by bringing the popular…

featured article, javascript, data science, iodide, pyodide