Disclaimer: Please contact Sophos Professional Services if you require assistance with your specific environment.
Also check out Sophos Firewall AA (Active-Active) deployment with Amazon Transit Gateway (TGW) in AWS!
In this document, we'll be talking about how to deploy Sophos Firewall in HA (High Availability) mode, a.k.a. Active-Passive mode, on the AWS platform. We will be using the Amazon Transit Gateway (TGW) feature to support the Hub and Spoke model for this deployment.
The transit gateway is used to provide node redundancy for the Sophos Firewalls, and BGP is used to exchange routing information with the rest of the AWS infrastructure in the customer account.
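To make the role of BGP more concrete, here is a minimal Python model of how the TGW could choose between the two firewall nodes. It assumes that the TGW prefers the advertisement with the shorter AS path and that the secondary node advertises a longer, prepended path. This document does not say which mechanism makes the primary preferred, so treat that as an assumption. The `Advert` type, the `preferred_next_hop` function and the ASNs are hypothetical, and real TGW route evaluation applies further tie-breakers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Advert:
    """One BGP advertisement of the LAN prefix, received from a firewall node (hypothetical model)."""
    node: str                  # "primary" or "secondary"
    as_path: Tuple[int, ...]   # ASNs in the advertised AS path
    session_up: bool           # False once the BGP session to this node is down


def preferred_next_hop(adverts: List[Advert]) -> Optional[str]:
    """Pick the node the TGW route table would use, assuming the shortest AS path wins.

    Advertisements whose BGP session is down count as withdrawn and are ignored.
    Ties are broken on the node name only to keep the model deterministic.
    """
    live = [a for a in adverts if a.session_up]
    if not live:
        return None
    return min(live, key=lambda a: (len(a.as_path), a.node)).node


if __name__ == "__main__":
    primary = Advert("primary", (65001,), session_up=True)
    # Assumption: the secondary prepends its own ASN so its path is longer.
    secondary = Advert("secondary", (65002, 65002, 65002), session_up=True)
    print(preferred_next_hop([primary, secondary]))  # primary

    primary_down = Advert("primary", (65001,), session_up=False)
    print(preferred_next_hop([primary_down, secondary]))  # secondary
```

In this model, once the primary's BGP session drops, its route is withdrawn and the secondary becomes the next hop without any direct coordination between the two firewalls.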
If you want to know more about this technology, check out Amazon's documentation on Transit Gateway: https://aws.amazon.com/transit-gateway/
Sophos Firewall is available from the AWS Marketplace for both the High Availability and Fault Tolerance deployment methods; however, this document focuses on the High Availability deployment method.
We recommend deploying the Sophos Firewall nodes in a separate VPC for traffic management and routing purposes.
While it is certainly possible to deploy the firewalls into the same VPC as other backend workloads, doing so requires different instructions for the TGW attachment and route table creation. If your setup requires a single-VPC deployment, contact your Sophos account representative.
Here is the network diagram that we are considering for this deployment. Both Sophos Firewall instances are deployed in a separate VPC, with connectivity to the LAN network VPC via the transit gateway.
Note: The IP addresses used in this setup and document are for demonstration purposes only. You can use other IP addresses in your deployment.
"system gre tunnel add name TGW01 local-gw PortB remote-gw <Transit Gateway GRE address> local-ip <Peer BGP address> remote-ip <Transit Gateway BGP 1 address>"
bgp# configure terminal
bgp(config)# router bgp <This Firewall's ASN>
bgp(config-router)# neighbor <Transit Gateway BGP 1 IP> ebgp-multihop 2
bgp(config-router)# neighbor <Transit Gateway BGP 1 IP> activate
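Since each firewall node needs these commands with its own values, a small helper can render them consistently. This is a hypothetical Python sketch: the parameter keys, the `render_node_config` name and the example addresses and ASNs are illustrative only, and the output is plain text to paste into the console and the BGP shell, not an API call. It also emits a `remote-as` statement, on the assumption that this Quagga-style BGP shell expects a neighbor to be defined with `remote-as` before other neighbor options are set.

```python
from typing import Dict, List

# Hypothetical parameter names; the values come from your TGW Connect peer
# and firewall settings.
REQUIRED_KEYS = (
    "tgw_gre_address",
    "peer_bgp_address",
    "tgw_bgp1_address",
    "firewall_asn",
    "tgw_asn",
)


def render_node_config(params: Dict[str, str], gre_port: str = "PortB") -> List[str]:
    """Return the GRE tunnel command and BGP shell commands for one firewall node."""
    missing = [key for key in REQUIRED_KEYS if not params.get(key)]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))

    tgw_bgp = params["tgw_bgp1_address"]
    return [
        # GRE tunnel from the firewall to the transit gateway's GRE address.
        f"system gre tunnel add name TGW01 local-gw {gre_port} "
        f"remote-gw {params['tgw_gre_address']} "
        f"local-ip {params['peer_bgp_address']} remote-ip {tgw_bgp}",
        # BGP shell commands (entered at the bgp# prompt).
        "configure terminal",
        f"router bgp {params['firewall_asn']}",
        f"neighbor {tgw_bgp} remote-as {params['tgw_asn']}",
        f"neighbor {tgw_bgp} ebgp-multihop 2",
        f"neighbor {tgw_bgp} activate",
    ]


if __name__ == "__main__":
    example = {
        "tgw_gre_address": "10.0.0.10",      # example value only
        "peer_bgp_address": "169.254.6.1",   # example value only
        "tgw_bgp1_address": "169.254.6.2",   # example value only
        "firewall_asn": "65001",             # example value only
        "tgw_asn": "64512",                  # example value only
    }
    for line in render_node_config(example):
        print(line)
```

Failing fast on a missing value avoids pasting a command that still contains an unfilled placeholder.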
After deployment completes, the network load balancer used by the HA deployment is configured to perform a health check on the firewall nodes using TCP port 4444. Because this port is part of the management port range restricted by the Trusted Network security group, and the load balancer is not part of that trusted network range, health checks are expected to fail. This is intentional, as it avoids exposing the management ports or the load balancers to unintended traffic.
To make the AWS Network Load Balancer functional, we recommend changing the existing health check to match the service port used by the content published on the firewall. For example, if the WAF (Web Application Firewall) feature accepts traffic on TCP port 443, set the load balancer's health checks to use the same port. This keeps service delivery and health check status aligned, so that failed firewall nodes are removed from service automatically.
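If you manage the load balancer with the AWS SDK rather than the console, the health check change can be scripted. This is a minimal sketch assuming boto3's Elastic Load Balancing v2 client and its `modify_target_group` call; the function name `align_health_check` is hypothetical, the target group ARN is a placeholder, and the client is passed in so the logic can be exercised without AWS credentials. It keeps the TCP protocol; adjust `HealthCheckProtocol` if your service needs an HTTP(S) check.

```python
def align_health_check(elbv2_client, target_group_arn: str, service_port: int) -> dict:
    """Point the target group's health check at the published service port.

    elbv2_client is expected to behave like boto3.client("elbv2").
    """
    if not 1 <= service_port <= 65535:
        raise ValueError("service_port must be between 1 and 65535")
    # HealthCheckPort is passed as a string, as the ELBv2 API expects.
    return elbv2_client.modify_target_group(
        TargetGroupArn=target_group_arn,
        HealthCheckProtocol="TCP",
        HealthCheckPort=str(service_port),
    )


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   align_health_check(boto3.client("elbv2"), "<target group ARN>", 443)
```

Passing the client in, instead of creating it inside the function, keeps credentials and region selection outside the helper.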
One important thing to note regarding the HA deployment is that, because the default health check uses the WebAdmin port (TCP 4444), both nodes are (technically) available from an uptime perspective.
If not accounted for, this can become an issue when publishing resources publicly through the load balancer: by default, the load balancer only checks whether the nodes in the target group are available, not whether the resources published through those nodes are reachable.
Both firewall nodes appear up when checked on TCP 4444, but traffic to the internet is routed out through only one of the firewalls. This can result in a black-hole scenario (when the original requester's source address is unchanged) for any traffic that flows in through the load balancer and is then directed at the secondary node.
Note: This issue only applies in scenarios where the original request's source address is not changed to one of the firewall nodes' local IP addresses. This means that network setups that use the WAF, or SNAT combined with DNAT, are not affected.
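The black-hole condition can be reduced to a small decision function. This hypothetical Python model (the function name and node labels are illustrative) treats the firewalls as stateful and assumes, as described above, that return traffic with a preserved client source address follows the TGW's preferred route, while SNAT or WAF traffic returns to the node that handled the request.

```python
def flow_succeeds(lb_target: str, tgw_preferred: str, source_preserved: bool) -> bool:
    """Hypothetical model of one inbound flow from the load balancer.

    lb_target:        node the load balancer forwarded the request to
    tgw_preferred:    node the TGW currently routes return traffic through
    source_preserved: True if the client's source IP reaches the backend unchanged
    """
    if not source_preserved:
        # With SNAT (or the WAF) the backend replies to the firewall node's own
        # address, so the reply goes back to the node that handled the request.
        return True
    # Otherwise the reply follows the TGW route. A stateful firewall that never
    # saw the original request drops it, so the flow only works when both match.
    return lb_target == tgw_preferred


if __name__ == "__main__":
    # Both nodes pass the default TCP 4444 check, so the NLB uses both of them.
    for target in ("primary", "secondary"):
        ok = flow_succeeds(target, tgw_preferred="primary", source_preserved=True)
        print(target, "ok" if ok else "black-holed")
```

Run as a script, this prints `primary ok` and `secondary black-holed`, which is the failure mode the note above describes.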
To prevent this, Sophos suggests configuring the load balancer's target group health checks to use the specific service port of the backend service instead of the default port (4444), and using DNAT on the firewall nodes to translate the health check port to the relevant backend system(s).
To illustrate this concept, let's examine an example of a DNAT health check:
This scenario assumes a backend server listening on TCP port 25, with the load balancer publicly using the same port.
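For this setup to work, and for health checks to correctly fail for the secondary node, the load balancer's target group health check must use TCP port 25 instead of the default port 4444, and each firewall node needs a DNAT rule that translates incoming traffic on TCP port 25 to the backend server.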
With all this in place, the health check on the secondary node fails, because the return traffic for any request routed through the secondary node is sent to the primary firewall (as a result of routing preference on the TGW), where it is dropped as being out of sync.
This prevents black holes for traffic flowing in through the load balancer, as the secondary node is not a viable target for the traffic until the primary node fails. At that point, the secondary node becomes the TGW's preferred route for the advertised network(s) and the health check succeeds.
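The failover behavior of the DNAT health check can be modeled in the same spirit. In this hypothetical Python sketch, the TGW is assumed to prefer the primary node whenever it is up, and a health check on the service port (TCP 25 in the example) passes only if the backend's reply returns through the node that received the check. The names `NODE_PRIORITY`, `tgw_preferred` and `health_check_status` are illustrative.

```python
from typing import Dict, Optional

NODE_PRIORITY = ("primary", "secondary")  # assumed TGW preference order


def tgw_preferred(nodes_up: Dict[str, bool]) -> Optional[str]:
    """Return the first node in NODE_PRIORITY that is up, or None."""
    for node in NODE_PRIORITY:
        if nodes_up.get(node):
            return node
    return None


def health_check_status(nodes_up: Dict[str, bool]) -> Dict[str, bool]:
    """Health check result per node for the DNAT'd service port.

    The check is translated to the backend, whose reply is routed back through
    the TGW-preferred node; any other node drops it as out of sync.
    """
    preferred = tgw_preferred(nodes_up)
    return {node: bool(up) and node == preferred for node, up in nodes_up.items()}


if __name__ == "__main__":
    print(health_check_status({"primary": True, "secondary": True}))
    # {'primary': True, 'secondary': False}
    print(health_check_status({"primary": False, "secondary": True}))
    # {'primary': False, 'secondary': True}
```

In this model the load balancer only ever sees one healthy target, so it never forwards traffic to the node whose return path would be black-holed.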
For more details on health checks for target groups, see: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
This concludes the Sophos Firewall HA deployment instructions in this document.
To use the security and scanning features of Sophos Firewall, refer to the online documentation repository at the following link: https://www.sophos.com/en-us/support/documentation/sophos-xg-firewall.aspx
This is a great set-up article. Thanks a bunch for the hard work and documentation.
One thing that is notably missing from this guide, however, is the actual set-up of HA under System Services > High Availability Configuration.
I understand this may be intentional because this is just a deployment guide; however, this is the main focus of this set-up. Can you include set-up instructions, or at least a link to configuration instructions?
For anyone else wondering, I found this documentation here but I have not tested it yet: docs.sophos.com/.../index.html
Sophos Firewall nodes deployed in the cloud have a different mechanism for HA compared to hardware Sophos Firewall devices.
Redundancy is achieved on the infrastructure level using AWS TGW, instead of on the device level itself (by configuring System services > High availability).
The reason is that the Sophos Firewall HA solution designed for cloud deployments has a decoupled architecture: both nodes act as independent firewall nodes, and there is no direct communication or synchronization between them.
To have the same security and policy configuration on both firewall nodes, we recommend using the firewall grouping functionality of Sophos Central: place both nodes in the same firewall group and do all configuration at the group level, where it is automatically inherited by both nodes.
Here's a how-to video showing group-level management using Sophos Central: Multi-node cloud firewall management with Sophos Central (YouTube). Hope this helps!
Ahh. I see. Thank you for the info. This helps a lot.