Design A High Availability Application With Azure App Service

This article describes how to design a high-availability application using Microsoft Azure PaaS services, favoring cloud-native services wherever feasible. The application relies on caching, authentication, and a document database, with a strong emphasis on scalability and performance.

The Approach

The following high-level approach is recommended for designing a high-availability application using Azure App Service. Azure App Service lets you develop and host web applications, mobile backends, and RESTful APIs in your preferred programming language without managing the underlying infrastructure. It provides features such as autoscaling and high availability, and it supports automated deployments from GitHub, Azure DevOps, or any other Git repository.

Define Availability Requirements

To effectively design and manage high-availability cloud workloads, it is essential to identify their usage patterns and establish relevant availability metrics. Here are the key metrics to consider.

  1. Percentage of Uptime: This metric measures the amount of time a system or service is operational and available, usually expressed as a percentage of total time. For example, an uptime of 99.9% means the system is down for no more than 8.76 hours per year.
  2. Mean Time to Recovery (MTTR): This is the average time required to restore a system to normal operation after a failure. It includes the time taken to detect the issue, diagnose the problem, and implement the fix.
  3. Mean Time Between Failures (MTBF): This metric measures the average time between failures of a system or component. A higher MTBF indicates better reliability and fewer failures over time.
  4. Recovery Time Objective (RTO): RTO is the maximum acceptable amount of time that a system, application, or function can be down after a failure or disaster occurs. It defines the target time to restore normal operations.
  5. Recovery Point Objective (RPO): RPO is the maximum acceptable amount of data loss measured in time. It indicates the point in time to which data must be recovered to resume normal operations after a disruption.
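
The arithmetic behind these metrics can be sketched in a few lines of Python. This is a minimal illustration rather than a monitoring implementation: the function names are hypothetical, a year is assumed to have 365 days, and availability is estimated with the common steady-state formula MTBF / (MTBF + MTTR).

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored (assumption)


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Maximum downtime per year, in hours, permitted by an uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability estimated from MTBF and MTTR."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


def meets_rto(measured_recovery_minutes: float, rto_minutes: float) -> bool:
    """True when an observed recovery time is within the recovery time objective."""
    return measured_recovery_minutes <= rto_minutes


# A 99.9% uptime target allows about 8.76 hours of downtime per year.
print(round(allowed_downtime_hours(99.9), 2))

# A system that fails on average every 999 hours and takes 1 hour to
# recover is available about 99.9% of the time.
print(round(availability_percent(999, 1), 2))
```

Note how MTTR feeds directly into the uptime figure: halving the recovery time roughly halves the downtime, even if failures occur just as often.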

Plan Your High Availability Architecture

  • Identify potential types of failures, their implications, and recovery strategies.
  • Ensure systems can fail gracefully and resume operations without service disruption. Isolate critical resources for enhanced reliability.
  • Replicate application data across two regions. During normal operations, route network traffic to the primary region. If the primary region becomes unavailable, reroute traffic to the secondary region.
  • Ensure data replication supports the redundancy strategy and aligns with the application’s RTO and RPO. The following table details the RPO for all consistency levels of an Azure Cosmos DB account deployed in at least two regions.

    Consistency level                    | RPO in case of region outage
    -------------------------------------|-----------------------------
    Session, Consistent Prefix, Eventual | < 15 minutes
    Bounded Staleness                    | K & T
    Strong                               | 0

    K = The number of versions (i.e., updates) of an item.
    T = The time interval since the last update.

For multi-region accounts, the minimum value of K and T is 100,000 write operations or 300 seconds.
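
The two planning rules above, routing to the primary region unless it is unavailable and checking the replication RPO against the application's target, can be sketched as follows. This is an illustrative model only, not a Front Door or Cosmos DB API: `Region`, `select_region`, and `worst_case_rpo_seconds` are hypothetical names, the "< 15 minutes" entry is treated as an upper bound of 900 seconds, and Bounded Staleness is simplified to its time bound T alone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    priority: int  # lower value = preferred (1 = primary)
    healthy: bool


def select_region(regions: List[Region]) -> Optional[Region]:
    """Return the healthy region with the best priority, or None if all are down."""
    healthy = [r for r in regions if r.healthy]
    return min(healthy, key=lambda r: r.priority) if healthy else None


# Upper bounds taken from the RPO table above (multi-region account).
FIXED_RPO_SECONDS = {
    "strong": 0,
    "session": 15 * 60,
    "consistent_prefix": 15 * 60,
    "eventual": 15 * 60,
}
MIN_STALENESS_SECONDS = 300  # minimum T for multi-region accounts


def worst_case_rpo_seconds(level: str, staleness_seconds: Optional[int] = None) -> int:
    """Worst-case RPO for a consistency level; Bounded Staleness uses T only (assumption)."""
    if level == "bounded_staleness":
        if staleness_seconds is None or staleness_seconds < MIN_STALENESS_SECONDS:
            raise ValueError("bounded staleness needs T >= 300 seconds")
        return staleness_seconds
    return FIXED_RPO_SECONDS[level]


regions = [Region("primary", 1, healthy=False), Region("secondary", 2, healthy=True)]
target = select_region(regions)
print(target.name if target else "no healthy region")

# Does Session consistency satisfy a 5-minute RPO target?
print(worst_case_rpo_seconds("session") <= 5 * 60)
```

In this sketch an application with a 5-minute RPO target would not be satisfied by Session consistency and would need Strong consistency or a Bounded Staleness window of at most 300 seconds.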

Perform End-to-End Testing

  • Perform fault injection testing to simulate different failure scenarios, including combinations of failures, and measure recovery time.
  • Run disaster recovery exercises, both planned and unplanned, to evaluate response capabilities.
  • Periodically review data from monitoring systems to ensure timely detection of failures.
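
A fault injection test that measures recovery time can be sketched as below. Everything here is simulated so the example runs deterministically: `FakeClock` and `FlakyService` are hypothetical stand-ins for real time and a real dependency, and the fault simply clears after a fixed duration. A real exercise would inject faults into deployed resources and probe live endpoints.

```python
class FakeClock:
    """Manually advanced clock so the simulation needs no real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


class FlakyService:
    """Stand-in dependency that fails while a fault is injected."""

    def __init__(self) -> None:
        self.fault_active = False

    def call(self) -> str:
        if self.fault_active:
            raise ConnectionError("injected fault")
        return "ok"


def measure_recovery_seconds(service: FlakyService, clock: FakeClock,
                             fault_duration: float, probe_interval: float) -> float:
    """Inject a fault, probe at fixed intervals, and return the observed recovery time."""
    service.fault_active = True
    fault_start = clock.now
    while True:
        clock.advance(probe_interval)
        if clock.now - fault_start >= fault_duration:
            service.fault_active = False  # the simulated outage ends
        try:
            service.call()
            return clock.now - fault_start
        except ConnectionError:
            continue  # still failing; probe again after the next interval


observed = measure_recovery_seconds(FlakyService(), FakeClock(),
                                    fault_duration=45, probe_interval=10)
rto_seconds = 60  # hypothetical target
print(observed, observed <= rto_seconds)
```

The observed recovery time is rounded up to the next probe, so the probe interval itself limits how precisely recovery can be measured and how quickly a failure can be detected.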

Deploy Applications Consistently

  • Conduct fault injection testing to simulate various failure scenarios, including combinations of failures, and measure recovery time.
  • Execute disaster recovery exercises, both planned and unplanned, to assess response capabilities.
  • Regularly review monitoring system data to ensure timely detection of failures.

Architecture Diagram

The architecture shows proven practices for improving scalability and performance in an Azure App Service web application. Image reference taken from learn.microsoft.com.

[Architecture diagram: Azure App]

Technical Components

This reference architecture includes the following components.

Azure Component | Description
----------------|------------
Azure Web App | Azure Web Apps is a fully managed service for hosting web applications and REST APIs.
Azure Front Door | Front Door is a layer 7 load balancer that routes HTTP requests to the web front ends. It also provides a web application firewall (WAF) that protects the application from common exploits and vulnerabilities.
Azure Function App | Function Apps run background tasks. Functions are invoked by a trigger, such as a timer event or a message being placed on a queue.
Service Bus Queue | Service Bus queues primarily help with load leveling of system-level messages. They provide pull-based messaging that temporarily stores messages in the queue until the consuming system can process them.
Azure Cache for Redis | Azure Cache for Redis is a managed service that caches frequently used data to improve performance.
Azure Cosmos DB | Azure Cosmos DB is a multi-model, high-performance, highly available NoSQL database service.
Azure Pipelines | Azure Pipelines handles continuous deployment (CD) and release tasks, consuming the package versions produced by GitHub Actions builds.
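
One common way a web app uses a cache in front of a document database is the cache-aside pattern, sketched below. This is a hedged illustration, not the reference architecture's actual code: `TTLCache` is a hypothetical in-memory stand-in for Azure Cache for Redis, and `load_from_db` is a placeholder for a database read.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a managed cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the stale entry
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def get_document(doc_id: str, cache: TTLCache,
                 load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside: read the cache first, fall back to the database, then fill the cache."""
    cached = cache.get(doc_id)
    if cached is not None:
        return cached
    document = load_from_db(doc_id)
    cache.set(doc_id, document)
    return document


db_reads = []  # records every simulated database read


def fake_db(doc_id: str) -> dict:
    db_reads.append(doc_id)
    return {"id": doc_id}


cache = TTLCache(ttl_seconds=60)
get_document("order-1", cache, fake_db)
get_document("order-1", cache, fake_db)  # served from the cache
print(len(db_reads))
```

The time-to-live bounds how stale a cached document can be; a shorter value trades fewer stale reads for more database load.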


Conclusion

Use Platform as a Service (PaaS) offerings as much as possible to reduce operational overhead and streamline management. Keep the application's design resilient and stateless wherever feasible to fully benefit from scalability options. Deploy application components in multiple geographic regions to achieve higher availability and fault tolerance. Manage and provision infrastructure through code, using Infrastructure as Code (IaC) practices instead of relying on manual processes.

Happy Learning!