Gremlin launches Disaster Recovery Testing for zone, region, and datacenter failovers
Gremlin, the proactive reliability platform, launched Disaster Recovery Testing: a new product built to safely and efficiently test zone, region, and datacenter evacuations and failovers. These large-scale tests ensure businesses maintain digital resilience and business continuity when faced with cloud migrations, compliance concerns, and catastrophic events.

Several high-profile cloud outages in 2025 show why business leaders who rely on a single cloud or region must rethink their business continuity strategy. One example is the October 2025 outage in the AWS us-east-1 region, which impacted 70,000 companies and caused losses estimated at $581 million.
“Many Business Continuity Plans and Disaster Recovery Plans include security-centric catastrophic scenarios, such as ransomware or malware taking over a data center. By using Disaster Recovery Testing, companies can shut off traffic or create other failure conditions in order to verify that the backup mechanisms engage correctly. A good way to think of it is as a new level of fuzzing. The environmental conditions become an input like any other. As an example, does your authentication system behave as expected in the presence of dependency failures, NTP failures, or certificate expirations?” Fred Bull, Security Officer at Gremlin, told Help Net Security.
As scaling companies prepare to go public, Gremlin’s reporting capabilities help them demonstrate digital resilience in the S-1 filings they submit to the SEC. Gremlin-generated reports also support public companies in preparing their 10-K annual filings, which detail business operations and risks, by providing a comprehensive overview of proactive reliability efforts for investors, regulators, and the public.
Key features of Gremlin Disaster Recovery Testing include:
- Company-wide testing: Organizations can simulate the impact of major failures such as zone and region outages across the entire organization from a central command center.
- Enhanced safety measures: Health Checks automatically halt tests and return services to a healthy state to guarantee system integrity during testing.
- Reliability reports: The Gremlin platform produces detailed reports on service performance that identify weaknesses and prioritize remediation efforts.

Gremlin has collaborated with dozens of Fortune 1000 companies, including four out of the top five U.S. banks, to facilitate effective zone and region-level failover tests. The expertise gained through these partnerships informs Gremlin’s approach to disaster recovery planning, providing clients with tailored guidance and support to optimize their testing strategy.
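The Health Check safeguard listed among the key features can be pictured with a small sketch. This is not Gremlin's API or implementation: the `FailoverTest` class, its `run` method, the step names, and the scripted health readings are all hypothetical, chosen only to illustrate the general idea of halting a test and rolling back once a health signal turns unhealthy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverTest:
    """Hypothetical stand-in for a zone or region failover test (not a Gremlin type)."""
    name: str
    steps: List[str]
    log: List[str] = field(default_factory=list)
    halted: bool = False

    def run(self, is_healthy: Callable[[], bool]) -> bool:
        """Apply each step, then consult the health check.

        If the check reports unhealthy, halt the test and undo the applied
        steps in reverse order. Returns True only if every step completed.
        """
        applied: List[str] = []
        for step in self.steps:
            self.log.append(f"apply: {step}")
            applied.append(step)
            if not is_healthy():
                self.halted = True
                for done in reversed(applied):
                    self.log.append(f"rollback: {done}")
                return False
        return True


# Scripted health readings (an assumption for the demo): healthy after the
# first step, unhealthy after the second.
readings = iter([True, False])
test = FailoverTest(
    name="evacuate-zone-a",
    steps=["drain traffic", "block network", "stop instances"],
)
completed = test.run(lambda: next(readings))

for line in test.log:
    print(line)
```

In this sketch the third step never runs: the unhealthy reading after the second step halts the test and the two applied steps are rolled back in reverse order. A real platform would read health from monitoring signals rather than a scripted list.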
“Businesses and consumers worldwide expect Visa’s applications to be continuously available and deliver strong performance, even during major outages or provider failures,” said Sreekanth Rajagopal, Head of Non-Functional Testing at Visa Cross-Border Solutions. “Disaster Recovery Testing gives us a fast, centralized way to continuously validate and demonstrate our resilience to catastrophic events so we can stay prepared and keep services online.”