Resilience Enablement
Jennifer Petoff
“Learning drives confidence, confidence drives behaviour, and behaviour repeated over time is what drives the culture.”
- Jennifer Petoff
Listen at:
Resilience & Learning at Scale with Jennifer Petoff
When we talk about reliability, most people think of automation, monitoring, and uptime. But what happens when the unexpected hits, when systems fail, pressure mounts, and teams need to respond fast? That’s where resilience comes in.
This episode dives into why resilience isn’t just a technical property - it’s a cultural one. And who better to explore this than Dr. Jennifer Petoff, co-editor of Site Reliability Engineering: How Google Runs Production Systems and leader of Google’s Global SRE Education?
If you’ve ever been in a war room during an outage, or tried to convince leadership that reliability is a business advantage, this episode will resonate. Jennifer’s insights connect the dots between technical excellence and human adaptability—because resilience isn’t just about surviving incidents, it’s about thriving through change.
Organizations everywhere are asking:
How do we scale reliability without burning out teams?
How do we create environments where people learn from failure instead of fearing it?
How do we embed resilience into both systems and culture?
Jennifer shares why these questions matter more than ever—and why the answers start with psychological safety and learning.
Jennifer takes us behind the scenes of Google’s approach to SRE education:
Scaling learning like a production system: How Google treats education as a reliability function, ensuring every engineer has the confidence and tools to act under pressure.
The human side of reliability: Why blameless postmortems aren’t just a process—they’re a cultural signal that learning beats blame.
Confidence as a resilience multiplier: Jennifer explains how confidence drives behavior, and repeated behaviors shape culture.
She even shares a powerful idea:
“You can SRE anything.”
From onboarding to organizational processes, Jennifer shows how SRE principles apply far beyond infrastructure.
Check out Jennifer's How to SRE Anything here: https://www.reliablepgm.com/how-to-sre-anything/
Things to listen for:
Origins of SRE and Education at Google
How Google scaled SRE education globally.
Why education is treated like a production system (repeatable, reliable, measurable).
Psychological Safety and Learning
Why psychological safety is critical for resilience.
Creating environments where teams can share mistakes without fear of blame.
How this accelerates learning and reliability.
Hands-On Experience as a Learning Model
Importance of experiential learning (e.g., game days, simulations).
Why theory alone isn’t enough for building confidence under pressure.
Scaling Knowledge Across Large Organizations
Strategies Google uses to scale SRE principles globally.
Balancing standardization with flexibility for local teams.
Resilience Beyond Reliability
How resilience differs from reliability.
Building adaptive systems and teams that thrive through adversity.
Culture as a Foundation
Why culture is the “secret ingredient” for successful SRE adoption.
Encouraging curiosity and collaboration across roles.
Future of SRE Education
Trends in learning for distributed teams.
How continuous education supports evolving reliability practices.