FinOps retrospective: opening AWS Cost Explorer on a data lake costing $6,500/month reveals three anomalies: AWS KMS at $4,000/month, phantom licenses for developers who have left, and 10 redundant CI/CD pipelines. Cloud cost optimization: $100,000 in annual savings. No tooling, no migration. Just curiosity. Checklist included.
#aws
42 posts
9 Jun
3 Jun
Before we knew better: Our orchestration system started as a simple internal solution to manage event pipelines and trigger downstream jobs. Over time, as more workflows and dependencies were added, it gradually evolved into a tightly coupled monolithic scheduler that became increasingly difficult to understand and maintain. Understanding how a workflow executed often meant…
28 May
In early 2023, Slack faced a foundational challenge: serving Large Language Models (LLMs) at enterprise scale with the security, reliability, and performance our customers expect. Over three years, we evolved from basic infrastructure to orchestrating a sophisticated multi-cloud architecture. We didn’t just want shiny new models; we needed a system resilient to regional outages and…
13 May
On a recent software development project that already planned to use AWS, we used AWS Cognito for authentication. Cognito is Amazon’s managed identity platform for web and mobile apps, offering features like MFA, password reset flows, and sign-in. On paper, it’s a strong fit for projects already using AWS. In practice, the rough edges cost […] The post 3 AWS…
5 May
By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security…
3 May
How to write reusable middleware for Rust Lambda functions using tower, the generic middleware engine that already underpins the AWS Lambda Rust runtime. Includes a complete DynamoDB-backed IP rate limiter with SAM deployment.
17 Dec 2025
Learn how to debug silent failures in AWS API Gateway HTTP when your OIDC provider doesn't implement the .well-known/openid-configuration endpoint. Enable FailOnWarnings to catch these issues before they break your production deployment.
12 Dec 2025
At the recent AWS re:Invent, Docker focused on a very real developer problem: how to run AI agents locally without giving them access to your machine, credentials, or filesystem. With AWS introducing Kiro, Docker demonstrated how Docker Sandboxes and MCP Toolkit allow developers to run agents inside isolated containers, keeping host environments and secrets out...
23 Oct 2025
Last year, I wrote a blog post titled Advancing Our Chef Infrastructure, where we explored the evolution of our Chef infrastructure over the years. We talked about the shift from a single Chef stack to a multi-stack model, and the challenges that came with it – from updating how we handle cookbook uploads to navigating…
28 May 2025
Supporting developers to debug and resolve issues with datastores in the Self-Service ecosystem. Welcome to the third blog post of our Self-Service Datastore series, where we share our journey towards creating a more efficient and reliable way to manage datastores at Zendesk. Previous blog posts: "Unlocking Efficiency: A New Era for Datastore Provisioning" and "Simplifying Datastore Provisioning with Kubernetes Operators". We…
21 Dec 2024
Discover the journey behind FullStack Bulletin, a weekly newsletter for full-stack developers with 404 curated issues over 8 years. Learn about its origins, technical implementation, and future plans.
2 Dec 2024
Introduction: Welcome to the second blog post of our Self-Service Datastore series, where we share our journey towards creating a more efficient and reliable way to manage datastores at Zendesk. In today’s dynamic application development landscape, the ability to swiftly provision datastores is crucial for maintaining agility and delivering exceptional user experiences. Provisioning encompasses all steps involved in requesting a…
17 Sept 2024
At Slack, we manage tens of thousands of EC2 instances that host a variety of services, including our Vitess databases, Kubernetes workers, and various components of the Slack application. The majority of these instances run on some version of Ubuntu, while a portion operates on Amazon Linux. With such a vast infrastructure, the critical question…
2 Jul 2024
Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack
Slack Data Engineering recently underwent a data workload migration from AWS EMR 5 (Spark 2/Hive 2 processing engine) to EMR 6 (Spark 3 processing engine). In this blog, we will share our migration journey, challenges, and the performance gains we observed in the process. This blog aims to assist Data Engineers, Data Infrastructure Engineers, and Product…
24 Apr 2024
Bazaarvoice notification system stands as a testament to cutting-edge technology, designed to seamlessly dispatch transactional email messages (post-interaction email or PIE) on behalf of our clients. The heartbeat of our system lies in the constant influx of new content, driven by active content solicitations. Equipped with an array of tools, including email message styling, default […]
18 Apr 2024
At Slack, we’ve long been conservative technologists. In other words, when we invest in leveraging a new category of infrastructure, we do it rigorously. We’ve done this since we debuted machine learning-powered features in 2016, and we’ve developed a robust process and skilled team in the space. Despite that, over the past year we’ve been…
12 Dec 2023
We are heavy users of Amazon Elastic Compute Cloud (EC2) at Slack: we run approximately 60,000 EC2 instances across 17 AWS regions while operating hundreds of AWS accounts. A multitude of teams own and manage our various instances. The Instance Metadata Service (IMDS) is an on-instance component that can be used to gain an…
5 Nov 2023
When building a custom API Gateway authorizer, mysterious 500 errors can happen. This post shows how to enable CloudWatch logging for API Gateway to inspect the logs and debug problems.
28 Apr 2023
Let’s face it: software is easier to write than maintain. This is why we, as software engineers, prefer to just “rip it out and start over” instead of trying to understand what another developer (or our past self) was thinking. We seem to have collectively forgotten that “programs must be […]
18 Apr 2023
Authored by: Rojan Rijal, Tinder Security Labs | Johnny Nipper, Sr. Director | Tanner Emek, Sr Engineering Manager Summary In 2021, GitHub released support for OpenID Connect (OIDC) for GitHub Actions (GHA), allowing developers to securely interact with their infrastructure resources in Amazon Web Services (AWS), and other major cloud service providers. The OIDC support allows GHA jobs to retrieve…
24 Jan 2023
Slack launched GovSlack in July 2022. With GovSlack, government agencies, and those they work with, can enable their teams to seamlessly collaborate in their digital headquarters, while keeping security and compliance at the forefront. Using GovSlack includes the following benefits: supports key government security standards, such as FedRAMP High, DoD IL4, and ITAR; runs in…
25 Oct 2022
At Slack, we use Terraform for managing our Infrastructure, which runs on AWS, DigitalOcean, NS1, and GCP. Even though most of our infrastructure is running on AWS, we have chosen to use Terraform as opposed to using an AWS-native service such as CloudFormation so that we can use a single tool across all of our…
26 Apr 2022
The AWS Solutions Architect Professional certification is one of the toughest IT certifications. This post shares preparation tips, exam strategies, study resources, and sample questions to help you succeed.
30 Mar 2022
BBC Online: A year with serverless. It’s been a little over a year since I published my last two blog posts, in which I outlined the process we went through to choose the technology for BBC online and the steps we took to optimise serverless for our use. Recently my colleague Graeme has published a blog post on the…
1 Nov 2021
This post explains how to conditionally create resources in AWS CDK using CfnCondition. It provides a practical example of creating an S3 bucket based on an SSM parameter value. The post covers defining a condition, attaching it to a low-level CDK construct, and importing the conditionally created resource.
29 Oct 2021
Pinion — The Load Framework Part-2 This post is the 2nd part of the “Pinion — The Load Framework” series. In case you have not read the 1st post, you can read it here . In this post, we are going to cover the following topics. How does Pinion use Delta Lake for SCD operations? Small file problem with Delta…
20 Oct 2021
About a year ago, I wrote a blog post called Building the Next Evolution of Cloud Networks at Slack. In it, we discussed how Slack’s AWS infrastructure has evolved over the years and the pain points that drove us to spin up a brand-new network architecture redesign project called Whitecastle. If you have not had…
6 Aug 2021
This post explains how to use CDK to provision Ubuntu EC2 instances on AWS. It covers finding the right AMI, adding security groups, using init scripts, installing AWS utilities, and more.
22 Jun 2021
The boto3 Python SDK allows intercepting requests before they are sent to AWS through an event handler system. This article shows how to use it to gzip the payload of PutMetricData requests sent to CloudWatch.
17 Jul 2020
You need a Data Lake. The Context: Teamwork has been around for more than 10 years, starting out as a project management and work collaboration platform and later expanding into other areas, such as help-desk, chat, document management and CRM software. As the company has grown and evolved, data has grown, changed, expanded, diversified, fragmented, then changed again. Analytics in…
6 May 2020
24 Jul 2019
At Clever, we lock down code access to customer data using AWS IAM roles with session policies. In Clever’s microservice AWS architecture, each service has a unique IAM role with access to the AWS resources it needs: S3 buckets, DynamoDB tables, and so on. Our services are multi-tenant and customer data is separated via logical […] The post Using IAM…
21 Oct 2018
The AWS Solutions Architect Associate exam covers a wide range of AWS services. This post shares helpful notes and tips for studying key concepts like EC2, S3, VPC, DynamoDB, and more. It provides advice on the exam mindset and lists official and unofficial preparation resources. The notes summarize important details around provisioned throughput, instance types, database replication and more that…
5 May 2018
The AWS CLI s3 cp command supports streaming content to and from S3 using stdin/stdout with the - argument. This enables powerful pipelines without intermediary files.
6 Feb 2018
At Clever, one of our tenets is “Always a Student”, and in that spirit of learning we wanted to share the changes we made to fix memory allocation issues in AWS Elastic Container Service related to swappiness. Swappiness is a Linux Kernel setting that specifies how likely it is for a page in memory to be […] The post Swappiness…
16 Dec 2017
This article explores the history of cloud computing from bare metal servers to serverless, explaining key innovations like IaaS, PaaS, containers and FaaS along the way.
14 Sept 2017
Recently I started a gig where I was handed a laptop and told I should take it home with me at night. Now this isn't the worst thing ever but I was already carrying one laptop with me (my personal/work laptop as opposed to the one given to me
11 Aug 2017
19 Jun 2017
Using Let’s Encrypt and Certbot to automate the creation of certificates for OpenVPN
By Luciano Mammino. This post explains how to use Let's Encrypt and Certbot to automatically generate and renew SSL certificates for OpenVPN. It provides a complete Terraform setup as a practical example.
21 Apr 2017
11 Apr 2014
CloudFormation is a powerful tool for building large, coordinated clusters of AWS resources. It has a sophisticated API, capable of supporting many different enterprise use-cases and scaling to thousands of stacks and resources. However, there is a downside: the JSON interface for specifying a stack can be cumbersome to manipulate, especially as your organization grows […]
22 Jun 2013
Greetings all! In the world of SaaS, wiser men than I have referred to Operations as the “Secret Sauce” that distinguishes you from your competition. As manager of one of our DevOps teams, I wanted to talk to you about how Bazaarvoice uses the cloud and how we engineer our systems for maximum reliability. You […]