~/devreads

#incident-management

3 posts

14 Nov 2024

Scott Nelson Windels 11 min read

Incident management takes time. Incidents need responders who are trained and experienced. At Slack, training is a foundation of our incident management program. Self-service training and live courses based mainly on prepared content are one piece of the puzzle, but there can be a missing piece in many organizations. How can staff get practical experience…

#uncategorized #incident-management #incident-response

19 Aug 2022

Frank Chen 15 min read

What happens when your distributed service has challenges with stampeding herds of internal requests? How do you prevent cascading failures between internal services? How might you re-architect your workflows when naive horizontal and vertical scaling reach their respective limits? These were the challenges facing Slack engineers during their day-to-day development workflows in 2020. Multiple internal…

#uncategorized #ci-cd #developer-productivity #incident-management #infrastructure

18 Feb 2022

Carlos Valdez 12 min read

In 2021, we changed developer testing workflows for Webapp, Slack’s main monorepo, from predominantly testing before merging to a multi-tiered testing workflow after merging. This changed our previous definition of safety and developer workflows between testing and deploys. In this project, we aimed to ensure frequent, reliable, and high-quality releases to our customers for a…

#uncategorized #automation-testing #ci-cd #deployment #incident-management