This recap includes the following episodes and guests.
Guest title reflects the guest's position at the time of the episode.
Recipes:
Map Critical Dependencies
Conduct Resilience Assessments
Improving Resilience with Dependencies
Implement Release Readiness Gates
Measure Release Impact
Recipes for Building Scalable Reliability in Complex Systems
Goal: Identify internal and external system touchpoints that influence service reliability.
Marshall Lamp kicked off his episode with a story that perfectly captured the theme of this chapter. While traveling during the pandemic, he stopped at a fast food restaurant and saw a sign that read: "Due to supply chain issues, we are temporarily unable to serve chicken sandwiches." He wasn't planning to order one, but the sign stuck with him. It was a clear signal that supply chain disruptions had become so visible, they were now part of everyday life. As he put it, "Suddenly, supply chain has become a household word... and I'm not sure that's a good thing..."
"At the end of the day, we only control what we control. And we can complain when someone else fails at their job. We can complain all we want, but we can't prevent them from failing at what they do. All we can do is be resilient to when they fail."
That story set the stage for a deeper conversation about dependencies. In Marshall's words, "A supply chain is a stitched together series of systems and human-based processes." No one owns the whole thing, and that's what makes it fragile. He reminded us that "I only own the bit that I own... I can't control what my partners do." That's why visibility and mapping dependencies, especially external ones, are essential to resilience.
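One way to make that kind of visibility concrete in a software setting is a small dependency map that can be queried for everything a service relies on but does not own. The sketch below is only an illustration: the service names, the `DEPENDS_ON` graph, and the `EXTERNAL` set are hypothetical and do not come from either episode.

```python
# Hypothetical dependency map: each service lists what it depends on directly.
DEPENDS_ON = {
    "checkout": ["payments-api", "inventory"],
    "inventory": ["warehouse-feed"],
    "payments-api": [],
    "warehouse-feed": [],
}

# Assumption: these dependencies are owned by partners, not by us.
EXTERNAL = {"payments-api", "warehouse-feed"}


def external_dependencies(service, graph=DEPENDS_ON, external=EXTERNAL):
    """Return every external dependency reachable from `service`, direct or transitive."""
    seen = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen & external)


print(external_dependencies("checkout"))
```

Walking the graph transitively matters here because the dependencies a team does not control are often one or two hops away from the service it does own.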
Ron Baker brought this same mindset to software systems. He emphasized that "you've got to win the priority war, and it is a war," especially when reliability features compete with new product ideas. His approach includes educating stakeholders on the risks of fragile dependencies and showing how they impact customer experience. "We focus on three areas: education of risk, margin, and vision," he explained. And to influence change, "you have to be ahead of the curve or you'll miss the window to influence."
Goal: Simulate failure scenarios to test business continuity assumptions.
Marshall's perspective on resilience is rooted in realism: things will break, and the question is how fast you can recover. He introduced the concept of Time to Survive (TTS) alongside Time to Recovery (TTR), explaining that "if my time to recover exceeds my time to survive, I have a big problem." He encouraged teams to model disaster scenarios and understand how long they can operate under stress before customer impact becomes unavoidable. This isn't just about technology; it's about people too. "We practice business continuity by having our people work from home," he shared, highlighting the importance of workforce readiness.
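The TTS versus TTR comparison can be turned into a simple check over a list of modeled scenarios. This is a minimal sketch under stated assumptions: the scenario names and hour values are invented for illustration, and the `Scenario` class and `at_risk` function are hypothetical names, not something described in the episode.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    time_to_survive_hours: float  # how long operations can continue during the disruption
    time_to_recover_hours: float  # how long restoring normal service is expected to take


def at_risk(scenarios):
    """Return scenarios where TTR exceeds TTS, largest gap first."""
    risky = [s for s in scenarios if s.time_to_recover_hours > s.time_to_survive_hours]
    return sorted(
        risky,
        key=lambda s: s.time_to_recover_hours - s.time_to_survive_hours,
        reverse=True,
    )


# Illustrative numbers only.
scenarios = [
    Scenario("payment provider outage", time_to_survive_hours=4, time_to_recover_hours=2),
    Scenario("primary warehouse offline", time_to_survive_hours=24, time_to_recover_hours=72),
    Scenario("carrier API unavailable", time_to_survive_hours=8, time_to_recover_hours=12),
]

for s in at_risk(scenarios):
    gap = s.time_to_recover_hours - s.time_to_survive_hours
    print(f"{s.name}: recovery exceeds survival by {gap:g} hours")
```

Sorting by the size of the gap gives a rough priority order for which scenarios to rehearse or mitigate first.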
EMPATHY - Taking the perspective of the users
Ron echoed this need for preparation, but from a product development lens. He emphasized that "success is not a single release; it's measured over a two-year period." To make the risk real for stakeholders, he uses storyboards to show what operators go through when things break: "You've got to hit their emotions, not their intellect." These storyboards help teams visualize the human and customer cost of fragile systems, making the case for proactive investment in resilience. As he put it, "Create a storyboard that shows what your operators are going through."
Goal: Strengthen your system's ability to absorb and adapt to failures in external and internal dependencies.
In complex systems, dependencies are unavoidable, but fragility doesn't have to be. Both Marshall and Ron emphasized that resilience isn't about eliminating dependencies, but about designing systems that can adapt when those dependencies fail.
Marshall explained that in supply chains, "I only own the bit that I own... I can't control what my partners do." That's why he focuses on building flexibility into the system. He shared that "having multiple suppliers and rotating between them" is a key strategy to reduce risk. But it's not just about redundancy; it's about readiness. "Digitized processes make data portable and visible," he said, which allows teams to respond quickly when something breaks down.
Ron brought this same thinking into the software world. He pointed out that "you've got to win the priority war, and it is a war," especially when trying to get reliability features prioritized alongside product features. His approach includes making the risk of dependency failure visible to decision-makers. "You've got to hit their emotions, not their intellect," he said, describing how storyboards and real-world operator pain points help shift priorities. He also emphasized the importance of proactive planning: "If you're not ahead of that curve... you'll miss the opportunity to influence."
Both guests agreed that dependency resilience is not just about technical architecture; it's about culture, visibility, and shared responsibility. Marshall summed it up well: "We can't prevent failure, but we can prepare for it."
Goal: Standardize reliability and performance checks before launch.
SRE Feature Prioritization with Stakeholders:
Education of the risk, Focus on margin, Create a vision of the goal
Ron emphasized that timing is everything when it comes to influencing product development. "If you're not ahead of that curve... you'll miss the opportunity to influence," he warned. That's why he advocates for shifting reliability left, embedding it early in the planning and development process. He also shared a practical lesson learned: "Don't show what's promised; show what's tested and verified." This builds trust and ensures that reliability isn't just a checkbox, but a real outcome. His use of an SRE scorecard helps teams prioritize and track maturity over time.
Marshall connected this to the operational side of supply chains. He explained that digitization is key to readiness: "Digitized processes make data portable and visible." This not only supports remote work and continuity, but also enables earlier detection of issues. He also emphasized the need for engineers to understand their systems deeply: "You need to understand everywhere in your code that could break." That awareness is what allows teams to build in the right checks before launch.
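A release readiness gate built on the "tested and verified" idea might look like the sketch below. It is an assumption-laden illustration, not Ron's scorecard: the `Check` class, the `gate` function, and the example check names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    tested: bool    # a test for this item has actually been run
    verified: bool  # the result was reviewed and confirmed, not just promised


def gate(checks):
    """Return (ready, blockers). A check passes only if it is both tested and verified."""
    blockers = [c.name for c in checks if not (c.tested and c.verified)]
    return (len(blockers) == 0, blockers)


# Illustrative checks only.
checks = [
    Check("load test at expected peak", tested=True, verified=True),
    Check("rollback rehearsed", tested=True, verified=False),
    Check("dependency timeouts configured", tested=False, verified=False),
]

ready, blockers = gate(checks)
print("ready" if ready else "blocked by: " + ", ".join(blockers))
```

Returning the list of blockers, rather than a bare pass or fail, keeps the gate useful as a discussion aid instead of a checkbox.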
Goal: Focus metrics on customer experience and incident trends, not just delivery speed.
Metrics are only useful if they lead to better decisions. Ron cautioned against relying too heavily on dashboards: "Create a simple metric with green, yellow, red, but use it to start a conversation." He's learned that leadership often assumes something is done just because it's on a roadmap. That's why he insists on showing tested outcomes, not just intentions: "Use KPIs as tools, not checkmarks."
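A green, yellow, red metric of this kind can be very small. The sketch below assumes the input is a count of customer-impacting incidents after a release; the thresholds, function names, and follow-up questions are placeholders chosen for illustration, not values from the episode.

```python
def traffic_light(customer_incidents, yellow_at=1, red_at=3):
    """Map a post-release count of customer-impacting incidents to a color.

    The thresholds are placeholder assumptions; each team would set its own.
    """
    if customer_incidents >= red_at:
        return "red"
    if customer_incidents >= yellow_at:
        return "yellow"
    return "green"


def conversation_starter(release, customer_incidents):
    """Pair the color with a question, so the metric opens a discussion."""
    color = traffic_light(customer_incidents)
    if color == "green":
        return f"{release}: green. What did we verify that kept it green?"
    return f"{release}: {color}. Which customers were affected, and what would have caught this earlier?"


print(conversation_starter("release-A", 0))
print(conversation_starter("release-B", 2))
```

Attaching a question to every color, including green, reflects the idea that the status is a prompt for discussion and not a final verdict.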
Marshall brought a similar lens from the supply chain world, where expectations for speed and transparency are rising. "We're moving rapidly away from batch-oriented data transfers to more nimble API-based transactions," he said, noting that customer expectations now demand near real-time updates. He also emphasized the importance of understanding customer impact: "Understand how your disruptions impact customers; that's how you design better systems." For both guests, the goal is not just to ship more, but to ship better, with fewer surprises and more confidence.
How do you identify and track your most critical dependencies, especially the ones you don't control?
What's your process for simulating failure scenarios, and how often do you revisit your assumptions?
Are your release metrics focused on volume or impact, and how do you measure customer experience post-launch?