Lori Schafer in Forbes Tech Council: Expert Strategies To Boost Cloud Reliability And Disaster Recovery
- Tori Hamilton

- Dec 10, 2025

The full article appears in Forbes Technology Council.
As companies continue shifting mission-critical systems to the cloud, they’re discovering that 24/7/365 reliability isn’t a given. Even brief outages can interrupt sales, slow internal workflows and erode customer trust, putting real dollars on the line. And while cloud platforms offer impressive flexibility and scale, they also introduce new layers of complexity that require intentional planning.
Below, members of Forbes Technology Council share their perspectives on how organizations can strengthen cloud resilience and prepare for the unexpected. Their insights highlight the mindset shifts and strategic foundations that help companies rebound quickly when cloud services stumble.
Identify Core Business Journeys
Focus on workflow resilience rather than system uptime alone. Identify the core business journeys that must continue, such as login, order placement and billing, and ensure there are fallback paths if any cloud dependency fails. This may involve using cached access, queuing transactions or shifting to temporary read-only modes. The goal is not zero outages, but zero business paralysis. - Arun Goyal, Octal IT Solution LLP
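As a rough illustration of the fallback paths described above, the sketch below serves reads from a cache and queues writes while a dependency is down. Everything in it (`OrderService`, `DependencyDown`, the `backend_up` flag and the in-memory stand-ins for a real backend) is hypothetical and exists only to show the pattern.

```python
from collections import deque
from dataclasses import dataclass, field


class DependencyDown(Exception):
    """Raised by the (hypothetical) cloud client when a dependency is unreachable."""


@dataclass
class OrderService:
    backend_up: bool = True                                # simulated dependency health
    catalog_cache: dict = field(default_factory=dict)      # last known good reads
    pending_orders: deque = field(default_factory=deque)   # writes waiting for replay
    committed: list = field(default_factory=list)          # stand-in for the real store

    def _backend_read(self, sku):
        if not self.backend_up:
            raise DependencyDown("catalog backend unreachable")
        return {"sku": sku, "price": 10}

    def _backend_write(self, order):
        if not self.backend_up:
            raise DependencyDown("order backend unreachable")
        self.committed.append(order)

    def get_item(self, sku):
        """Read live data when possible, otherwise fall back to the cached copy."""
        try:
            item = self._backend_read(sku)
            self.catalog_cache[sku] = item
            return item, "live"
        except DependencyDown:
            if sku in self.catalog_cache:
                return self.catalog_cache[sku], "cached"
            raise

    def place_order(self, order):
        """Commit when possible, otherwise queue the order for later replay."""
        try:
            self._backend_write(order)
            return "committed"
        except DependencyDown:
            self.pending_orders.append(order)
            return "queued"

    def replay_pending(self):
        """Drain the queue once the backend is back; returns the number replayed."""
        replayed = 0
        while self.pending_orders:
            self._backend_write(self.pending_orders[0])  # raises if still down
            self.pending_orders.popleft()
            replayed += 1
        return replayed
```

In this sketch the customer-facing journey keeps working in a degraded mode, and queued orders are only removed from the queue after the write succeeds, so a second outage during replay does not lose them.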
Make DR A Continuous Pipeline
Make disaster recovery a continuous pipeline—every change builds a recoverable environment from code, replays the last hour of anonymized traffic, restores from immutable backups, and verifies data integrity with synthetic balances or checksums. If any step fails, block the release. Over time, reliability becomes a byproduct of delivery, not an afterthought rehearsed once a quarter, and auditors love the traceability. - Jagadish Gokavarapu, Wissen Infotech
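One way to picture the release gate described above is a small runner that executes recovery steps in order and blocks the release at the first failure. This is a minimal sketch, not the contributor's implementation: the step names, `run_recovery_gate` and `checksum` are assumptions, and the lambdas stand in for real build, replay and restore jobs.

```python
import hashlib


def checksum(rows):
    """Order-independent SHA-256 digest of a collection of rows."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()


def run_recovery_gate(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (release_allowed, failed_step_name_or_None). A step fails if it
    returns a falsy value or raises an exception.
    """
    for name, step in steps:
        try:
            ok = step()
        except Exception:
            ok = False
        if not ok:
            return False, name
    return True, None


# Hypothetical data: rows in production versus rows restored from backup.
source_rows = [(1, "alice", 120), (2, "bob", 75)]
restored_rows = [(2, "bob", 75), (1, "alice", 120)]

steps = [
    ("build_environment_from_code", lambda: True),   # placeholder for an IaC apply
    ("replay_anonymized_traffic", lambda: True),     # placeholder for a traffic replay
    ("restore_from_backup", lambda: True),           # placeholder for a restore job
    ("verify_integrity", lambda: checksum(source_rows) == checksum(restored_rows)),
]

release_allowed, failed_step = run_recovery_gate(steps)
```

A CI job could call `run_recovery_gate` and fail the build when `release_allowed` is false, which is what makes the named failing step useful for traceability.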
Test Boundaries With Strong Instrumentation
The key is rethinking the relationship between applications and infrastructure. In data centers, teams were rewarded for over-provisioning and avoiding risk. Cloud reliability requires celebrating curiosity and experimentation. Give teams permission to test boundaries with instrumentation. True resilience comes from empowering teams to architect for dynamic scaling and fast recovery. - Nik Sathe, Blackhawk Network
Adopt A Multicloud Or Hybrid Approach
Adopting a multicloud or hybrid (cloud and on-premises) approach to database replication and clustering enhances resilience by eliminating single-provider dependency. When properly designed, this architecture aligns with key business objectives, reducing the total cost of ownership, ensuring continuous operations and providing seamless failover for both high availability and disaster recovery. - Eero Teerikorpi, Continuent
Ensure Architectural Resilience
Cloud reliability starts with architectural resilience. Use multiregion redundancy, automated failover and real-time observability to detect and isolate issues fast. Combine this with AI-driven simulation of outage scenarios to strengthen disaster recovery before disruptions ever occur. - Lori Schafer, Digital Wave Technology
Keep Live Customer Data Mirrored Across Regions
Cloud service disruptions mean interruptions to business-critical services, which are highly disruptive to customers and often require hours to fix. It’s not enough just to duplicate services across two or more cloud regions. Today’s businesses also need to keep live customer data instantly mirrored across regions using fast, in-memory data storage that ensures seamless failover when outages occur. - William Bain, ScaleOut Software, Inc.
Design For Failure From Day One
A key strategy for improving cloud reliability and disaster recovery is to design for failure from day one. Build systems with redundancy, isolation and automated recovery at their core. The mindset shift is critical: Assume things will break and architect for graceful degradation rather than perfection. It’s not just about having a disaster recovery plan on paper. - Nick Damoulakis, Orases
Run A Continuously Bootable Recovery Twin
Run a recovery twin—a continuously bootable clone in another region or provider that replays production write-ahead logs. Every merge must pass a twin cutover (cold to hot) in CI. Do weekly red-switch drills and measure time-to-cash-restored. If the twin drifts, block deploys. Cells and queues help cap the blast radius. - Margarita Simonova, ILoveMyQA
Tap Into The Strengths Of Both On-Premises And Cloud Systems
Hybrid redundant architectures that combine the strengths of on-premises and cloud systems are essential for ensuring uptime as critical workloads move to the cloud. Balancing flexibility with reliability enables innovation while protecting business continuity. Agile, resilient systems turn disruption into growth. - Luiz Domingos, Mitel
Combine IaC, RPO and RTO Goals, And Defined Ownership
Design for failure, not around it. Build redundancy, automate recovery and regularly test failover scenarios to ensure plans work in real conditions. Combine infrastructure as code, clear recovery point objective and recovery time objective goals, and defined ownership to turn disaster recovery from a backup plan into a core resilience strategy. - Diwakar Dwivedi, Circular Edge
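To make the recovery point objective (RPO) and recovery time objective (RTO) concrete, the sketch below scores a failover drill against stated targets and a named owner. The class, function, team name and timestamps are all illustrative assumptions; a real drill would take these values from backup and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service
    owner: str      # team accountable for meeting both targets


def evaluate_drill(objectives, last_backup_at, outage_at, restored_at):
    """Compare one drill's measured data loss and downtime with the targets."""
    data_loss = outage_at - last_backup_at
    downtime = restored_at - outage_at
    return {
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
        "data_loss": data_loss,
        "downtime": downtime,
        "owner": objectives.owner,
    }


# Hypothetical targets and drill timestamps.
targets = RecoveryObjectives(
    rpo=timedelta(minutes=15), rto=timedelta(hours=1), owner="payments-platform"
)
result = evaluate_drill(
    targets,
    last_backup_at=datetime(2025, 1, 1, 9, 50),
    outage_at=datetime(2025, 1, 1, 10, 0),
    restored_at=datetime(2025, 1, 1, 11, 30),
)
```

Here the last backup was 10 minutes old at the time of the outage, so the 15-minute RPO is met, while the 90-minute restoration misses the one-hour RTO.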
Maximize Cloud Isolation And Security
Seek solutions that maximize a cloud environment’s isolation and security. Reliability isn’t about preventing every outage; it’s about quick recovery without losing trust. Many teams assume their Active Directory backups are solid until they fail. Regularly test recovery methods, keep clean standbys and ensure the team knows the process. Tools matter, but people are key to effective recovery. - Robert Bobel, Cayosoft
Understand Your Responsibilities As A Cloud Customer
Cloud reliability is impressive—until it isn’t. The end customer remains responsible for ensuring their own business continuity when the cloud fails. More frequent, though, are change-related problems, either on the part of a cloud-based software supplier or the end user. The answer always begins with engaging in sufficient systems testing before any updates go live—especially mission-critical services. - Martin Taylor, Content Guru
Pair Deep Visibility With Automated Alerts And On-Call Workflows
You can’t recover from what you can’t see. Observability is essential, but it’s only part of the picture. Cloud reliability comes from pairing deep visibility with dependable, automated alerting and on-call workflows so critical signals trigger the right actions immediately—preventing downtime from causing increased impact. - Judit Sharon, OnPage Corporation
Avoid Relying Solely On Vendor SLAs
Design for failure from day one by building multiregion, automated failover architectures rather than relying solely on vendor service-level agreements. Treat resilience as code, using infrastructure-as-code templates, chaos testing and real-time observability to continuously validate recovery readiness. - Gowtham Chilakapati, humana.com
Rediscover The Value Of On-Prem Infrastructure
For use cases that cannot afford downtime, a more modern approach to the cloud is needed. Decades of new college grads have been taught a “cloud first” or “cloud only” mentality. But technology has advanced significantly; on-premises infrastructure is powerful, simple and reliable. A more balanced architecture that blends on-prem and cloud systems can make a huge difference in avoiding downtime. - Bruce Kornfeld, StorMagic
Invest In End-To-End Recovery Automation
Invest in recovery automation, not just redundancy. The fastest detection means nothing if recovery requires 20 manual steps across three teams. Script everything, including rollbacks, data verification and customer communication. Your disaster recovery runbook should be one command. Human judgment during a crisis is valuable—human hands performing routine tasks in the midst of panic is dangerous. - Marc Fischer, Dogtown Media LLC
Design Workloads To Run Across Providers
Reliability isn’t about preventing all failures; it’s about failing gracefully. Don’t treat resilience and disaster recovery as an afterthought. Design workloads to run simultaneously across providers. When one fails, traffic automatically routes to another. True cloud reliability comes from continuous testing, automation and cross-team readiness, not vendor guarantees. - Pabitra Saikia, Truist Bank
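The routing behavior described above can be sketched as a function that tries providers in priority order, skipping any that fail a health check or a send attempt. This is a simplified, priority-ordered failover rather than a full active-active setup, and `route_request`, the provider names and the callables passed in are hypothetical.

```python
def route_request(providers, is_healthy, send):
    """Send a request to the first provider that is healthy and accepts it.

    providers:  provider names in priority order
    is_healthy: callable(name) -> truthy when the provider passes its health check
    send:       callable(name) -> response; may raise ConnectionError
    Returns (provider_name, response), or raises RuntimeError if all fail.
    """
    errors = {}
    for name in providers:
        if not is_healthy(name):
            errors[name] = "unhealthy"
            continue
        try:
            return name, send(name)
        except ConnectionError as exc:
            # A passing health check can be stale, so fall through to the next provider.
            errors[name] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")


# Hypothetical health state: the primary provider is down.
health = {"provider_a": False, "provider_b": True}


def send(name):
    return f"handled by {name}"


chosen, response = route_request(["provider_a", "provider_b"], health.get, send)
```

In practice this decision usually lives in DNS, a global load balancer or a service mesh rather than in application code; the sketch only shows the decision logic.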
Fuse AI-Driven Observability With Self-Healing Architectures
The next frontier in cloud reliability is autonomous resilience. Businesses should fuse AI-driven observability with self-healing architectures that detect anomalies, reroute traffic and auto-recover workloads in real time. Don’t just back up data—teach your systems to adapt. The goal isn’t zero downtime; it’s zero disruption through intelligent, continuously rehearsed recovery. - Pawan Anand, Persistent Systems
Hold Quarterly Cross-Functional Failover Drills
Stop treating reliability like an IT fire drill—treat it like a dress rehearsal. Build “chaos confidence” with quarterly failover drills that include tech, operations and communications teams. When systems go down (and they will), the fastest recoveries aren’t improvised—they’re practiced. Resilience isn’t just built in code; it’s built in muscle memory. - Andrew Siemer, Inventive
Assume The Cloud Can Fail At Any Time
Reliability and disaster recovery strategies should be designed and architected assuming the cloud can fail at any time. A cloud-centric strategy should leverage fully managed multiregion, globally distributed services. Adopt a hybrid, multicloud and multivendor strategy to avoid overreliance on a single cloud provider. Hold frequent, intentional disaster recovery and resiliency drills and “well-architected reviews.” - Mrutyunjay Mohapatra, Alysian

