This recap includes the following episodes and guests.
Guest title reflects the guest's position at the time of the episode.
Recipes:
Support Autonomy with Guardrails
Use Metrics to Inform, Not Police
Align Personal Values with Engineering Practices
Build Repeatable Culture Through Rituals and Practices
🍳 Cookbook Chapter: Managing SRE Teams Effectively
Welcome to this chapter of the SRE Omelette Cookbook, where we explore the art of managing Site Reliability Engineering teams—not just through frameworks and metrics, but through stories, values, and culture. In this chapter, we’re drawing from two rich conversations: one with Bill Higgins, and another with Marion Clelland. Both leaders bring unique perspectives on what it takes to build, support, and evolve high-performing SRE teams. From psychological safety to metrics that matter, and from asking for help to celebrating small wins, this chapter is packed with practical recipes for thoughtful leadership.
🎯 Goal: Empower teams with clear constraints and trust.
Bill Higgins believes that autonomy doesn’t mean working in isolation—it means having the clarity and support to make decisions confidently. He shared that effective collaboration starts with trust, not process. “Let me know how I can earn your trust and let me know if I ever do anything that hurts your trust in me,” he tells new collaborators. This simple but powerful invitation sets the tone for open communication and mutual respect—two essential ingredients for autonomy.
Bill also emphasized the importance of team design as a foundation for autonomy. “DevOps team is a team where you've got the right aggregate set of skills. It's not like a role,” he explained. By focusing on complementary skills rather than rigid roles, he creates teams that are empowered to solve problems independently—because they have the right mix of expertise and shared understanding to do so.
He models this by giving his teams space to lead, while stepping in only when needed. For example, during IBM’s GitHub Enterprise migration, he trusted the team to run dozens of test migrations and refine their approach. The result? A seamless production cutover that finished ahead of schedule. “We kept waiting for something to go wrong, but it didn’t. That’s what happens when you give people the space to prepare deeply,” he said.
Marion Clelland brings this principle to life in her day-to-day leadership. She sees her role as a shield—protecting her team from distractions so they can focus on what matters. “I see my primary job as trying to make the world a better place for my engineering team to work in,” she said. Whether it’s quietly resolving licensing issues or navigating organizational blockers, she handles the noise so her team can stay in flow.
But Marion also sets clear expectations. She encourages her team to take ownership of their systems, while making it safe to ask for help when needed. “Were they enjoying what they were doing? Was the stress of the on-call too much?” she asks in regular one-on-ones. These check-ins aren’t just about performance—they’re about ensuring that autonomy doesn’t become isolation.
She also models healthy boundaries herself. “I used to be dreadful. I used to still be online late at night, but now it is very much laptops shut. I need to stop, I need to do something else because it just gets too much otherwise,” she shared. By demonstrating that it’s okay to disconnect, she gives her team permission to do the same—reinforcing a culture of trust and sustainability.
Together, Bill and Marion show that autonomy thrives when it’s paired with clarity, support, and trust. It’s not about letting go—it’s about stepping back just enough to let others step forward.
🎯 Goal: Design KPIs that guide, not punish.
Bill Higgins challenges the traditional use of metrics in engineering by reframing them as tools for learning rather than judgment. He shared how his team moved away from the term “postmortem” in favor of “learning review,” explaining, “Don’t call them postmortems because that has negative connotations. Call them learning reviews... and don’t just do learning reviews when something goes wrong.” This change in language signals a deeper cultural shift, one that values reflection over blame.
Bill also emphasized that metrics should be used to uncover insights, not enforce compliance. For example, when his team successfully migrated IBM’s massive GitHub Enterprise instance, they still held a learning review—not to find fault, but to understand what went right and how to replicate it. “It was supposed to take two and a half days, and I think they got it done in one and a half... and then we waited for something to go wrong, but it didn’t. So we did a learning review anyway.” That kind of proactive reflection builds confidence and maturity in how teams use data.
Marion Clelland brought a grounded, day-to-day perspective to the conversation. She highlighted the challenge of balancing alerting thresholds with team well-being: “How do you get the right balance of alerting so that your team can get sleep and rest... but also that you notice it before your customers do?” Her team iterates on alerting configurations regularly, using SLOs as a compass rather than a stick. She’s clear that metrics should help teams prioritize work, not punish them: “Unless you highlight the tech debt in your plan somewhere, it will always go unnoticed and no one will ever fix it.”
To support this mindset, both guests shared practices that reinforce healthy metric use:
✅ Normalize learning reviews for both successes and failures.
✅ Use SLOs to guide engineering focus, not to assign blame.
✅ Make tech debt visible in planning tools so it can be prioritized.
✅ Encourage teams to own their alerting thresholds and refine them over time.
✅ Celebrate improvements in reliability and process—not just feature delivery.
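To make the "SLOs as a compass" idea concrete, here is a minimal Python sketch of an error-budget check. It is not something either guest described. The SloWindow shape, the function names, and the 25% low-water mark are all assumptions chosen for illustration, and the output is meant as a planning hint for the team rather than a score for any individual.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical shape)."""
    target: float          # SLO target as a fraction, e.g. 0.999 for 99.9%
    total_requests: int
    failed_requests: int


def allowed_failures(window: SloWindow) -> float:
    """Number of failed requests the error budget permits in this window."""
    return (1.0 - window.target) * window.total_requests


def budget_remaining(window: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed = allowed_failures(window)
    if allowed == 0:
        # A 100% target or an empty window leaves no budget to spend.
        return 0.0
    return 1.0 - window.failed_requests / allowed


def suggested_focus(window: SloWindow, low_water_mark: float = 0.25) -> str:
    """Turn the number into a planning hint; the threshold is an assumption."""
    if budget_remaining(window) < low_water_mark:
        return "prioritize reliability work"
    return "room for feature work"
```

With a 99.9% target and 1,000,000 requests, the budget is roughly 1,000 failed requests, so 250 failures leaves about 75% of the budget and the sketch suggests there is room for feature work. A team that owns its thresholds would tune the low-water mark over time, in the same way it tunes its alerts.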
Together, these stories show that metrics can be a powerful ally in building resilient systems—when they’re used to inform, not to control. The key is to pair data with empathy and context, so teams feel supported, not scrutinized.
🎯 Goal: Encourage authentic engagement and inclusive culture.
Bill Higgins believes that the most effective engineers—and teams—are those who are deeply engaged in their own growth. He encourages individuals to take ownership of their development by identifying specific skills they want to improve and pursuing them with intention. “Pick one or two skills in a six-month period that you want to level up,” he advised, describing a structured approach that includes studying, finding a mentor, and practicing on the job. This method not only builds technical depth but also gives engineers a sense of agency and purpose.
But Bill doesn’t stop at skill-building. He sees emotional intelligence as a core engineering competency. “If somebody says something that makes me angry, I'm going to count to 10 before I respond,” he shared, reflecting on how deliberate behavior change can improve team dynamics. He draws a parallel between learning to manage emotions and learning to write better code—both require feedback, repetition, and reflection. By modeling this mindset, Bill creates a culture where engineers feel safe to be vulnerable, ask questions, and grow.
He also emphasized the importance of inclusive rituals that reinforce shared values. His team’s “Thankful Thursdays” Slack tradition encourages public recognition and appreciation. “Celebrate awesomeness... because human nature is that we tend to speak up when something goes bad,” he said. These small, consistent practices help build a culture where everyone feels seen and valued.
Marion Clelland’s leadership journey is a testament to aligning personal values with professional impact. After years in technical roles, she made a thoughtful shift into management—not for status, but because it felt meaningful. “I wanted to do something that I found fun and I enjoyed... and I think I just felt naturally I was good at the role,” she said. That authenticity shows up in how she leads—with empathy, consistency, and a deep respect for her team’s well-being.
Marion fosters inclusive engagement by encouraging open dialogue and psychological safety. She holds regular one-on-ones not just to track progress, but to check in on how people are feeling. “Were they enjoying what they were doing? Was the stress of the on-call too much?” she asks. These conversations help her understand what motivates each person and how to support them in ways that align with their values.
From an engineering practice perspective, Marion promotes pairing, cross-training, and shared ownership of systems. “If we can't feel like we can go on holiday, then we're doing something wrong in the team,” she said. This philosophy leads to better documentation, more resilient systems, and a culture where no one feels like a single point of failure.
She also encourages lightweight, inclusive learning practices—like internal watch parties for conference talks or sharing relevant articles in Slack. These activities not only build skills but also create space for curiosity and connection.
Together, Bill and Marion show that when leaders model authenticity, emotional intelligence, and a commitment to growth, they create environments where engineers can thrive—not just as professionals, but as people. And when engineering practices are designed to support that kind of culture, the result is a team that’s not only high-performing, but deeply engaged and inclusive.
🎯 Goal: Establish scalable habits that reinforce team values and resilience.
Bill Higgins believes that culture isn’t just what you say—it’s what you do consistently. One of the most effective ways to scale culture, he says, is through lightweight, repeatable rituals that reinforce team values. His team’s “Thankful Thursdays” is a prime example: a weekly Slack reminder prompts team members to recognize each other’s contributions. “Celebrate awesomeness... because human nature is that we tend to speak up when something goes bad,” Bill explained. This ritual not only boosts morale but also surfaces the often-invisible work that keeps teams running smoothly.
He also emphasized that rituals like these are more than feel-good moments—they’re cultural signals. “There are cultural values which are intangible, and then there are manifestations of those values,” he said. By making appreciation visible and routine, teams reinforce a culture of gratitude, trust, and shared ownership. And because it’s simple and scalable, it works just as well for a 5-person team as it does for a 50-person one.
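For teams that want to automate a weekly prompt like this, the following Python sketch shows one possible shape. The episode does not say how Bill's team sends its reminder, so everything here is an assumption: the function names, the message wording, and the use of a Slack incoming webhook whose URL you would supply yourself. Slack's own built-in reminders could do the same job without any code.

```python
import datetime as dt
import json
import urllib.request

THURSDAY = 3  # date.weekday() counts Monday as 0


def next_thursday(today: dt.date) -> dt.date:
    """Return today if it is a Thursday, otherwise the following Thursday."""
    return today + dt.timedelta(days=(THURSDAY - today.weekday()) % 7)


def build_payload(team_name: str) -> bytes:
    """Build a JSON body with a 'text' field; the wording is hypothetical."""
    message = {
        "text": f"Thankful Thursday, {team_name}! "
                "Who made your week better? Give them a shout-out in this thread."
    }
    return json.dumps(message).encode("utf-8")


def post_reminder(webhook_url: str, team_name: str) -> int:
    """POST the reminder to a webhook URL you provide; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(team_name),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A scheduler of your choice would call post_reminder once a week; next_thursday is only there to help check or display when the next prompt is due. Keeping the script this small fits the point of the ritual: it should be easy to sustain, whatever the team size.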
Marion also promotes inclusive learning rituals. She encourages her team to block time for skill-building and even proposed a creative idea: a team-curated “conference day” where everyone shares a favorite talk or article. “Everyone gets to share something they thought was really beneficial... and you get lots of different ideas from lots of different people,” she said. These kinds of grassroots rituals not only build skills but also foster a sense of community and shared curiosity.
Both leaders agree that the key to building a resilient, values-driven culture is consistency. Whether it’s a weekly Slack post, a monthly learning session, or a daily stand-up that starts with gratitude, the goal is the same: to create habits that reflect what the team values—and to make those habits easy to sustain as the team grows.
🧠 How do you personally balance autonomy with accountability in your team?
📉 What’s one metric your team uses that actually helps—not hinders—your work?
🧂 What’s your team’s “omelette recipe”? What are your essential ingredients?