This recap includes the following episodes and guests.
Guest title reflects the guest's position at the time of the episode.
Recipes:
Automate Manual Toil with Context Awareness
Incremental Innovation Loops
Shift Left with Predictive Insights
Integrate Innovation into Delivery Pipelines
Embrace Failure as a Path to Innovation
In this chapter of the SRE Omelette Cookbook, we dive into the essential ingredients behind two of the most powerful forces shaping our industry: Automation and Innovation. I sat down with two leaders who’ve helped shape how IBM approaches automation and innovation: Jerry Cuomo, IBM Fellow and CTO of Automation, and Steven Astorino, VP of Development for IBM Data & AI. Together, we explored how teams can move from manual toil to scalable systems, and from isolated ideas to integrated innovation. Here’s what we learned.
Goal: Identify and automate high-friction, repetitive tasks where human effort is stretched thin.
Jerry reminded us that automation is not just about efficiency; it’s about trust and context. He shared how IBM’s mainframe automation took over a decade to gain adoption. The breakthrough came when IBM introduced a “training mode” that showed what automation would do before it acted.
“The best way to gain trust is to fully understand how it’s going to react in that situation.”
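One way to picture a training mode is a dry-run switch: the automation reports each step it would take and changes nothing until someone turns the switch off. The sketch below only illustrates that idea and is not IBM's implementation; the `Action` and `Automation` classes and the `billing` service are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One automated step: a readable description plus the code that performs it."""
    description: str
    run: Callable[[], None]


@dataclass
class Automation:
    """Runs actions, or only reports them while training_mode is on."""
    training_mode: bool = True
    log: List[str] = field(default_factory=list)

    def execute(self, actions: List[Action]) -> List[str]:
        for action in actions:
            if self.training_mode:
                # Training mode: record what would happen, change nothing.
                self.log.append(f"WOULD RUN: {action.description}")
            else:
                action.run()
                self.log.append(f"RAN: {action.description}")
        return self.log


restarted = []  # stands in for a real side effect (hypothetical)
plan = [Action("restart service 'billing'", lambda: restarted.append("billing"))]

# The operator reviews this preview before switching to live mode.
preview = Automation(training_mode=True).execute(plan)
print(preview)
```

Because the preview and the live run share the same plan, what the operator reviews is what later executes, which is where the trust comes from.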
He also reflected on the early days of operations, where runbooks were sticky notes and tribal knowledge was passed by word of mouth.
“You don’t start with 50%. You start with close to 0% and a lot of experience. And you build up from there with code.”
This is where automation becomes sustainable: when it’s built on real-world experience and codified into repeatable, reliable systems.
And Jerry reminded us that automation needs to be grounded in repeatability. “Repeatability drives scalability, drives high reliability, and is embodied in code.” If you’re still relying on tribal knowledge, that’s a signal to start capturing it in code.
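To make "capturing it in code" concrete, here is a minimal sketch of a sticky-note runbook rewritten as data plus a small runner. Everything in it is an assumption for illustration: the step names, thresholds, and metric keys are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A runbook is an ordered list of (step name, check) pairs.
# Each check receives current metrics and returns True when the step passes.
Step = Tuple[str, Callable[[Dict[str, float]], bool]]

DISK_RUNBOOK: List[Step] = [
    ("disk usage below 90%", lambda m: m["disk_used_pct"] < 90),
    ("log directory below 5 GB", lambda m: m["log_dir_gb"] < 5),
]


def run_runbook(runbook: List[Step], metrics: Dict[str, float]) -> List[str]:
    """Return the names of the steps whose check failed, in runbook order."""
    return [name for name, check in runbook if not check(metrics)]


failed = run_runbook(DISK_RUNBOOK, {"disk_used_pct": 95.0, "log_dir_gb": 2.0})
print(failed)
```

Once the steps live in a list like this, they can be reviewed, versioned, and run the same way every time, which is the repeatability Jerry describes.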
Goal: Create structured, time-boxed environments where teams can experiment and deliver focused breakthroughs.
Steven shared how IBM’s Area 631 program created space for focused innovation. The model is simple: “Six people, three months, one breakthrough.” It’s a structured way to explore new ideas without the distractions of day-to-day work.
It wasn’t just the cool space or the snacks (though those helped). It was the freedom to break rules and focus. “We eliminate all the blockers. There are really no rules. Break it or make it in three months.” The freedom, paired with a clear timeline, helped teams move fast and stay focused. That’s how you create space for real breakthroughs.
And the results weren’t just prototypes; they were production-ready. “Once they graduate, we decide where they fit in… typically, they end up improving one of our technologies or features.” That’s a great example of how to turn innovation into impact.
Goal: Use data and AI to anticipate issues before they impact users or systems.
Jerry painted a vision of proactive operations. Imagine a system that alerts you before a code change causes an outage:
“Hey Kevin, if you push this, you’re not going to that hockey game; there’s a 94% chance it’ll cause an outage.”
This kind of insight requires data from across the lifecycle, from GitHub commits to incident logs, and the ability to connect it meaningfully.
“Great SRE practices are fueled by innovations that lower costs, reduce time to fix issues, and ultimately improve customer sentiment.”
It’s not just about automation; it’s about contextual automation that understands business impact and helps teams act before problems escalate.
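As a rough sketch of how a pre-push warning like that could be computed, the example below estimates outage risk for a component from a history of past changes. It assumes a hypothetical record format of `(component, caused_outage)` pairs, a made-up `WARN_THRESHOLD`, and simple Laplace smoothing. A real system would draw on far richer signals, such as commits and incident logs, and this is not a method Jerry described.

```python
from typing import Iterable, Tuple

WARN_THRESHOLD = 0.5  # hypothetical cut-off for warning the engineer


def outage_risk(history: Iterable[Tuple[str, bool]], component: str) -> float:
    """Estimate P(outage) for a change to `component` from past changes.

    Laplace smoothing (+1 in the numerator, +2 in the denominator) keeps
    the estimate away from 0 or 1 when there is little data.
    """
    outcomes = [caused for comp, caused in history if comp == component]
    return (sum(outcomes) + 1) / (len(outcomes) + 2)


# Hypothetical history: two of three past payments changes caused an outage.
history = [
    ("payments", True),
    ("payments", True),
    ("payments", False),
    ("search", False),
]

risk = outage_risk(history, "payments")
should_warn = risk >= WARN_THRESHOLD
print(f"payments risk: {risk:.0%}, warn: {should_warn}")
```

A component with no history gets a neutral 50% estimate here, which is a design choice: with no evidence, the sketch neither blocks nor reassures.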
Goal: Ensure that innovation doesn’t stay in a sandbox but becomes part of the product.
Steven emphasized that innovation must be actionable. Every Area 631 project is designed to “graduate” into IBM’s core product lines.
“Once they graduate, we decide where they fit in… typically, they end up improving one of our technologies or features.”
He shared how one project, a sales configurator for Cloud Paks, started as a prototype and is now used across IBM to simplify complex licensing and pricing.
“The value is pretty obvious… it’s working, it’s needed, and it’s improving how we operate.”
Goal: Normalize failure as a necessary part of learning and innovation.
One of the most powerful moments came when Steven talked about failure.
“We can’t be too busy to innovate, so we have to create that environment.”
But creating space isn’t enough; you also have to make it safe to fail.
“It’s okay to fail. We forget that sometimes. Everyone wants to try something and expect it to just work or be successful all the time. In fact, that’s far from reality.”
By giving teams a time-boxed, low-risk environment, Area 631 encourages experimentation without fear. It’s not about failing fast; it’s about failing smart, learning quickly, and applying those lessons to build something better.
Both episodes reminded me that automation and innovation are two sides of the same omelette. Automation gives us the time and space to innovate. Innovation gives us the tools to automate smarter. And both require trust, collaboration, and a willingness to fail forward and learn fast.
They are also not one-time efforts. They’re iterative, and they require cultural support. Whether it’s automating log analysis or building a new sales configurator, the key is to start small, measure impact, and build from there.
Whether it’s Jerry’s vision of autopilot for IT or Steven’s Shark Tank-style Hyper Blue program, the message is clear: we need to make room for creativity, invest in people, and build systems that learn and adapt.
These conversations reminded me that the best strategies are the ones that fit into the way teams actually work. Focus automation where toil creates bottlenecks. Create loops where innovation can be tested and refined. And always keep the feedback flowing.
What’s one piece of tribal knowledge in your team that should be codified today?
If you had three months and no blockers, what breakthrough would you chase?
How do you build trust in automation within your team, and what’s your explainability strategy?
What’s your team’s process for measuring the impact of automation over time?