Geo-replication in Azure SignalR
Companies seeking local presence or requiring a robust failover system often choose to deploy services across multiple Azure regions. With the integration of geo-replication in Azure SignalR, managing multi-region scenarios has become significantly easier.
Benefits of using geo-replication
- More resilient to regional outages: If a regional outage happens, the Azure SignalR DNS name resolves to healthy replicas in other regions.
- Cross-region communication: Different replicas can communicate with each other as if they were the same instance.
- Enhanced network speed: Geographically dispersed clients connect to the nearest replica. These replicas communicate through the Azure global network backbone, ensuring fast and stable networking.
- Shared configurations: All replicas retain the primary Azure SignalR Service resource's configuration.
Prerequisites
- An Azure SignalR Service in Premium tier.
Create a SignalR replica
To create a replica, navigate to the SignalR Replicas blade in the Azure portal and click Add. The replica is automatically enabled upon creation.
After creation, you can view or edit the replica in the portal by clicking its name.
Note
- The replica count is currently limited to a maximum of 8 per primary resource.
Pricing and resource unit
Each replica has its own unit and autoscale settings.
Replicas are a feature of the Premium tier of Azure SignalR Service. Each replica is billed separately according to its own unit and outbound traffic. The free message quota is also calculated separately for each replica.
Cross-region outbound traffic incurs egress fees. If a message is transferred across replicas and successfully sent to a client or server after the transfer, it is billed as an outbound message.
Delete a replica
After you've created a replica for your Azure SignalR Service, you can delete it at any time if it's no longer needed.
To delete a replica in the Azure portal:
- Navigate to your Azure SignalR Service and select the Replicas blade. Click the replica you want to delete.
- Click the Delete button on the replica overview blade.
Understand how the SignalR replica works
The diagram below provides a brief illustration of the SignalR Replicas' functionality:
- The client negotiates with the app server and receives a redirection to the Azure SignalR service. It then resolves the SignalR service's Fully Qualified Domain Name (FQDN), contoso.signalr.azure.cn. This FQDN points to a Traffic Manager, which returns the Canonical Name (CNAME) of the nearest regional SignalR instance.
- With this CNAME, the client establishes a connection to the regional instance (replica).
- The two replicas synchronize data with each other. Messages sent to one replica are transferred to the other replicas if necessary.
- If a replica fails the health check conducted by the Traffic Manager (TM), the TM excludes the failed instance's endpoint from its domain resolution process. For details, refer to the Resiliency and disaster recovery section below.
Note
- In the data plane, a primary Azure SignalR resource functions identically to its replicas.
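The resolution behavior described above can be modeled with a short sketch. This is a toy model and not the Traffic Manager implementation: the `Replica` class, the latency values, and the `resolve_nearest` function are hypothetical names introduced only for illustration, and "nearest" is approximated here by the lowest measured latency.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    region: str            # for example "chinaeast2"
    latency_ms: float      # hypothetical latency measured from one client
    healthy: bool = True   # outcome of the health check


def resolve_nearest(replicas: list[Replica]) -> Optional[Replica]:
    """Return the lowest-latency healthy replica, or None if all are down."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.latency_ms)


# Illustrative values only: a client located close to chinaeast2.
replicas = [
    Replica("chinaeast2", latency_ms=12.0),
    Replica("chinanorth2", latency_ms=35.0),
]

normal = resolve_nearest(replicas)    # both replicas healthy
replicas[0].healthy = False           # chinaeast2 fails its health check
failover = resolve_nearest(replicas)  # failed endpoint is excluded

print(normal.region, failover.region)
```

In this model, a replica that fails its health check simply drops out of the candidate list, which mirrors how the failed endpoint is excluded from domain resolution.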
Resiliency and disaster recovery
Azure SignalR Service utilizes a traffic manager for health checks and DNS resolution towards its replicas. Under normal circumstances, when all replicas are functioning properly, clients will be directed to the closest replica. For instance:
- Clients close to chinaeast2 will be directed to the replica located in chinaeast2.
- Similarly, clients close to chinanorth2 will be directed to the replica in chinanorth2.
In the event of a regional outage in chinaeast2 (illustrated below), the traffic manager detects the health check failure for that region and excludes the faulty replica's endpoint from its DNS resolution results. After the DNS Time-to-Live (TTL) period, which is set to 90 seconds, clients in chinaeast2 are redirected to the replica in chinanorth2.
Once the issue in chinaeast2 is resolved and the region is back online, the health check will succeed. Clients in chinaeast2 will then, once again, be directed to the replica in their region. This transition is smooth because connected clients are not affected until their existing connections are closed.
This failover and recovery process is automatic and requires no manual intervention.
For server connections, failover and recovery work the same way as they do for client connections.
Note
- This failover mechanism applies to the Azure SignalR service only. Regional outages of the app server are beyond the scope of this document.
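To illustrate why redirection takes up to one TTL period, the following minimal sketch models a client-side DNS cache. It assumes the 90-second TTL stated above and ignores the time the health check needs to detect the outage. The `CachingResolver` class and the `traffic_manager_answer` function are hypothetical stand-ins, not part of any Azure SDK.

```python
from __future__ import annotations

DNS_TTL_SECONDS = 90  # TTL value stated in this article


class CachingResolver:
    """Toy client-side DNS cache: reuses an answer until its TTL expires."""

    def __init__(self, lookup, ttl: float = DNS_TTL_SECONDS):
        self._lookup = lookup      # callable returning the current DNS answer
        self._ttl = ttl
        self._cached = None
        self._expires_at = 0.0

    def resolve(self, now: float) -> str:
        # Ask the (simulated) traffic manager only when the cache is stale.
        if self._cached is None or now >= self._expires_at:
            self._cached = self._lookup()
            self._expires_at = now + self._ttl
        return self._cached


healthy = {"chinaeast2": True, "chinanorth2": True}


def traffic_manager_answer() -> str:
    # Hypothetical preference order for a client close to chinaeast2.
    for region in ("chinaeast2", "chinanorth2"):
        if healthy[region]:
            return region
    raise RuntimeError("no healthy replica")


resolver = CachingResolver(traffic_manager_answer)

before = resolver.resolve(now=0)       # answer cached until t=90
healthy["chinaeast2"] = False          # regional outage begins
during = resolver.resolve(now=30)      # stale cached answer is still used
after = resolver.resolve(now=90)       # cache expired, failover answer
healthy["chinaeast2"] = True           # region recovers
recovered = resolver.resolve(now=180)  # next refresh returns home region

print(before, during, after, recovered)
```

The `during` lookup shows the window in which a client can still receive the failed region from its cache. In practice, a client in that window fails to connect and succeeds on a retry once the cached answer expires.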
Disable or enable the replica endpoint
When setting up a replica, you have the option to enable or disable its endpoint. If it's disabled, the primary FQDN's DNS resolution won't include the replica, and therefore, traffic won't be directed to it.
You can also enable or disable the endpoint after the replica has been created. On the primary resource's Replicas blade, click the ellipsis button on the right side of the replica and choose Enable Endpoint or Disable Endpoint.
Before deleting a replica, consider disabling its endpoint first. Over time, existing connections will disconnect. Because no new connections arrive, the replica eventually becomes idle. This ensures a seamless deletion process.
This feature is also useful for troubleshooting regional issues.
Note
- Due to the DNS cache, it may take several minutes for the DNS update to take effect.
- Existing connections remain unaffected until they disconnect.
Impact on performance after adding replicas
After replicas are enabled, clients naturally distribute across them based on their geographical locations. SignalR synchronizes data across these replicas, and the associated overhead on Server Load is minimal for most common use cases.
Specifically, if your application typically broadcasts to larger groups (size > 10) or a single connection, the performance impact of synchronization is barely noticeable. If you're messaging small groups (size < 10) or individual users, you might notice a bit more synchronization overhead.
To ensure effective failover management, it is recommended to set each replica's unit size to handle all traffic. Alternatively, you could enable autoscaling to manage this.
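As a rough illustration of this sizing guidance, the sketch below estimates the unit count a replica would need to absorb every region's peak connections. It assumes a capacity of 1,000 concurrent connections per unit; treat that figure as an assumption and confirm it against the current service limits. The function name and the peak connection numbers are hypothetical, and the result is not rounded to the unit sizes the service actually offers.

```python
from __future__ import annotations

import math


def units_for_full_failover(
    peak_connections_by_region: dict[str, int],
    connections_per_unit: int = 1000,  # assumed per-unit capacity, verify it
) -> int:
    """Estimate units one replica needs to carry all regions' peak traffic."""
    total = sum(peak_connections_by_region.values())
    return max(1, math.ceil(total / connections_per_unit))


# Hypothetical peak concurrent connections per region.
peak = {"chinaeast2": 3200, "chinanorth2": 1500}

units = units_for_full_failover(peak)
print(units)
```

With these example numbers, the total of 4,700 connections rounds up to 5 units per replica. Autoscaling, as mentioned above, is the alternative to provisioning this capacity up front.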
For more details on performance evaluation, refer to Performance.
Non-inherited and inherited configurations
Replicas inherit most configurations from the primary resource; however, some settings must be configured directly on the replicas. Below is the list of those configurations:
- SKU: Each replica has its own SKU name and unit size. The autoscaling rules for replicas must be configured separately based on their individual metrics.
- Shared private endpoints: While shared private endpoints are automatically replicated to replicas, separate approvals are required on target private link resources. To add or remove shared private endpoints, manage them on the primary resource. Do not enable the replica until its shared private endpoint has been approved.
- Log destination settings: If not configured on the replicas, only logs from the primary resource will be transferred.
- Alerts.
All other configurations are inherited from the primary resource, including access keys, identity, application firewall, custom domains, private endpoints, and access control.