
High Availability and PPPoE?

I have HA set up and working. Both units run on an ESXi host, with the active and standby on the same vSwitch, which is port-channeled to a Cisco switch. There is one internet connection via FTTC, using PPPoE to authenticate.
If I reboot the master to test failover, we lose the internet connection until it re-authenticates. Is this normal behavior?



  • As zaphod mentioned, you'll need something else in front of your ESXi host to do the authentication. My guess is that you likely already have one, since Sophos UTM doesn't have the necessary capability to do the PPPoE authentication, although it does handle the protocol. My current setup is similar: FTTH -> ISP Sagemcom modem/router for PPPoE auth -> Sophos UTM HA hardware. If you can put that ISP modem/router into bridge mode, you'll avoid the double NAT problem. In my case I couldn't put the router into bridge mode because of IPTV, so I've placed my UTM in the router's DMZ. This allows my UTMs to obtain an external IP rather than be NATed through the ISP router.

  • The UTM does do the authentication. My setup is ISP > Huawei modem > Cisco 2960S switch > ESXi host

    The Cisco switch can be taken out of the equation, as it's only there for VLAN purposes, so in effect it's Huawei modem > ESXi host.

    The UTM is set for DSL (PPPoE) and the username/password is set there.

  • How do you have your vSwitch set up? Each UTM VM requires its own "internet" connection, so in this case each would need a separate virtual network interface connecting it to the physical NIC that is connected to your Cisco switch. The UTMs also need a heartbeat virtual interface between each other. In short, each UTM VM should have 3 network connections: 1 x WAN, at least 1 x LAN, and 1 x heartbeat. Can you confirm this setup on your vSwitch?

  • Yep.

    Both UTMs have:

    1x WAN (em0) - PPPoE on VLAN 30 to the Cisco switch, which is connected to the VDSL modem in bridge mode
    1x LAN (em1) - multiple VLANs (e.g. LAN, DMZ1, DMZ2) connected back to the Cisco switch via a port-channeled trunk
    1x HA link (em2)

    All are connected to the same vSwitch on ESXi, which allows all VLAN tags through, as the VLANs are handled by the UTM.
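    A minimal Python sketch of the per-VM layout described above, usable as a checklist: it records each UTM's interfaces by role and reports any missing WAN, LAN or HA link. The `Nic` class, `REQUIRED_ROLES` and `missing_roles` are hypothetical names for illustration only, not anything Sophos or VMware exposes. The interface names and VLAN 30 come from the post; the LAN segment labels are placeholders.

```python
from dataclasses import dataclass, field

# Minimum interfaces per UTM VM, per the advice in this thread:
# 1 x WAN, at least 1 x LAN, 1 x HA heartbeat.
REQUIRED_ROLES = {"wan": 1, "lan": 1, "ha": 1}


@dataclass
class Nic:
    name: str                                        # guest interface name, e.g. "em0"
    role: str                                        # "wan", "lan" or "ha"
    vlans: list[str] = field(default_factory=list)   # tagged segments carried on this NIC


def missing_roles(nics: list[Nic]) -> list[str]:
    """Return the roles that fall short of REQUIRED_ROLES (empty list = complete)."""
    counts: dict[str, int] = {}
    for nic in nics:
        counts[nic.role] = counts.get(nic.role, 0) + 1
    return [role for role, need in REQUIRED_ROLES.items() if counts.get(role, 0) < need]


# Layout from the post; both UTM VMs use the same one.
utm_layout = [
    Nic("em0", "wan", ["vlan30"]),               # PPPoE towards the VDSL modem
    Nic("em1", "lan", ["lan", "dmz1", "dmz2"]),  # trunk back to the Cisco switch
    Nic("em2", "ha"),                            # heartbeat between the two UTMs
]

for vm in ("UTM 1", "UTM 2"):
    print(vm, "missing:", missing_roles(utm_layout) or "nothing")
```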

    Basically, when I fail over, I lose one ping between the LAN and the DMZ (as expected), but the WAN drops for about 30 seconds because it is PPPoE with authentication.

    Also, because I've set a preferred master, PPPoE drops a second time about 5 minutes after I reboot the master to test and the slave takes over, as the cluster reverts to the preferred master.
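    The effect of the preferred-master setting can be modelled as a small replay of HA events. This is a sketch under one stated assumption: every change of active node means the new master has to bring up its own PPPoE session, so each change costs one WAN outage. The function and event names are hypothetical, and the roughly 5-minute delay before the preferred master takes back over is not modelled.

```python
def wan_outages(events: list[str], preempt: bool) -> int:
    """Count WAN outages while replaying HA events for a two-node pair.

    Assumption: each change of active node forces a fresh PPPoE session,
    so each change counts as one outage.
    """
    active = "master"
    outages = 0
    for event in events:
        if event == "master_down" and active == "master":
            active = "slave"            # standby takes over
            outages += 1
        elif event == "master_up" and preempt and active != "master":
            active = "master"           # preferred master reclaims the role
            outages += 1
        # "master_up" without preemption: the slave simply stays active
    return outages


test_cycle = ["master_down", "master_up"]  # one test reboot of the master
print("preferred master set:", wan_outages(test_cycle, preempt=True))
print("no preferred master: ", wan_outages(test_cycle, preempt=False))
```

    Under that assumption, each test reboot with a preferred master costs two PPPoE re-authentications instead of one.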

  • Ah, I see. It sounds like it's more a function of how long the PPPoE authentication takes than of the actual UTM failover. Have you tried shutting down your primary UTM VM? I'm curious whether your internet connection would stay down until the primary UTM VM is back up, or whether, after the ~30-second wait, your failover UTM would assume the primary role. If, after shutting down your primary, you don't regain internet connectivity at all until you restart the primary UTM, then something is not working properly with the setup; otherwise, I think you're at the mercy of your ISP's authentication speed.
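    The reasoning above can be written down as simple arithmetic: LAN/DMZ traffic only waits for the standby to notice the failure and take over, whereas the WAN also waits for a brand-new PPPoE session (discovery, LCP, authentication, IP assignment). A minimal Python sketch follows; the per-phase durations and the detection time are made-up placeholders, not measurements from this thread, so substitute your own numbers.

```python
# Placeholder durations in seconds for bringing up a new PPPoE session.
# These are assumptions for illustration, not measured values.
PPPOE_SETUP = {
    "discovery": 2.0,   # PADI / PADO / PADR / PADS exchange
    "lcp": 1.0,         # PPP link negotiation
    "auth": 3.0,        # PAP or CHAP against the ISP
    "ipcp": 1.0,        # IP address assignment
}


def lan_outage(detect_s: float) -> float:
    """LAN/DMZ traffic waits only for failure detection and takeover."""
    return detect_s


def wan_outage(detect_s: float, setup: dict[str, float] = PPPOE_SETUP) -> float:
    """The WAN also waits for the new master to build its own PPPoE session."""
    return detect_s + sum(setup.values())


detect = 2.0  # hypothetical time for the standby to declare the master dead
print(f"LAN gap ~{lan_outage(detect):.0f} s, WAN gap ~{wan_outage(detect):.0f} s")
```

    If the measured WAN gap (about 30 seconds in this thread, around 5 seconds for another poster) is much larger than the LAN gap, the difference is the session setup rather than the HA takeover itself.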

    Another option is if this is a paid UTM license, you could try enabling active/active HA instead of active/passive. In active/active both UTMs would be providing connectivity and in the event of a failover, the second UTM should simply take over immediately since it has already authenticated with your ISP.

  • Rather than rebooting from within the UTM to test failover (as I had been doing), I shut down the UTM 1 VM, and the UTM 2 VM took over in exactly the same fashion: 1 ping dropped between the LAN and the DMZs, but a 30-second delay on the WAN (PPPoE).


    I suspect this would work fine with a plain Ethernet connection (authentication done elsewhere), but because it's a PPPoE connection authenticated on the UTM, I'm indeed at the mercy of the ISP's authentication unless Sophos can come up with a way of not dropping the connection immediately.

  • So it sounds like the failover is working as expected and the issue lies with how long the authentication process takes. Not sure there is anything you can do about that unless someone else has any ideas.

  • Active/Active doubles the cost of the license.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Not even sure active/active would work either. I can understand it working if there were 2 separate PPPoE connections, but this is just one PPPoE connection.

    It's almost as if we want a delay in the hangup signal on the UTM, or something like that, to give the incoming slave enough time to grab the connection while it's still up.

    In this case, the physical connection isn't interrupted, because the modem is plugged into a switch that stays up at all times. The switch port connected to the ESXi server is also up at all times, as is the vSwitch. The only thing actually turned off is the VM (UTM 1), and the standby VM (UTM 2) is on the whole time.

    So I suspect the UTM with PPPoE is sending out a hangup signal instantly, or the modem is detecting a hangup instantly.
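    One way to test this suspicion is to capture traffic on the WAN VLAN (for example from a mirrored port on the Cisco switch) during a failover and look for a PPPoE PADT (terminate) frame. A PADT in the capture, and the MAC address that sent it, shows whether the hangup is being signalled explicitly and by which side. The sketch below only classifies raw Ethernet frames using the EtherTypes and discovery codes defined in RFC 2516; how you obtain the frames is left open, and `classify` is a hypothetical helper name.

```python
import struct

ETH_VLAN = 0x8100          # 802.1Q tag (the WAN here is tagged VLAN 30 on the vSwitch)
PPPOE_DISCOVERY = 0x8863   # RFC 2516 discovery stage
PPPOE_SESSION = 0x8864     # RFC 2516 session stage (PPP payload)

# RFC 2516 discovery codes
DISCOVERY_CODES = {0x09: "PADI", 0x07: "PADO", 0x19: "PADR", 0x65: "PADS", 0xA7: "PADT"}


def classify(frame: bytes) -> str:
    """Label a raw Ethernet II frame as PPPoE discovery/session traffic or other."""
    offset = 12                                   # skip destination + source MAC
    if len(frame) < offset + 2:
        return "truncated"
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == ETH_VLAN:                     # skip a single 802.1Q tag
        offset += 4
        if len(frame) < offset + 2:
            return "truncated"
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == PPPOE_SESSION:
        return "session"
    if ethertype != PPPOE_DISCOVERY:
        return "other"
    header = offset + 2                           # ver/type, code, session id, length
    if len(frame) < header + 6:
        return "truncated"
    _ver_type, code, session_id, _length = struct.unpack_from("!BBHH", frame, header)
    name = DISCOVERY_CODES.get(code, f"code 0x{code:02x}")
    return f"{name} session=0x{session_id:04x}"


# Hand-built example: a PADT for session 0x1234 on VLAN 30.
padt = (bytes(6) + bytes(6)                         # placeholder MAC addresses
        + b"\x81\x00" + b"\x00\x1e"                 # 802.1Q tag, VLAN ID 30
        + b"\x88\x63"                               # PPPoE discovery
        + b"\x11\xa7" + b"\x12\x34" + b"\x00\x00")  # ver/type, PADT, session, length
print(classify(padt))
```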

  • Doesn't any momentary disconnection cause the ISP's equipment to request a re-auth?

    Cheers - Bob
  • It's true that your second VM is constantly on but it's not authenticated until the heartbeat senses a disconnection, at which time it requests auth.

    My disconnect time is a bit shorter, at around 5 seconds. I've even tried connecting the second UTM to another modem. My ISP allows 2 authentications from the same IP address, so the thought was that with 2 full connection streams the delay could be eliminated, but that didn't work out. The backup UTM didn't authenticate until the disconnection, and even with a second modem (each UTM connected to its own modem), I still experienced that delay.

  • Last time I worked with DSL service I simply had the modem handle the PPPoE authentication so you could do whatever you wanted to the network without hanging up the connection to the ISP. Could the same be done with your modem?

  • Unfortunately not. It's purely a VDSL modem, which operates as a bridge and needs an Ethernet device to authenticate. Obviously this could be a router, but I'm not fussed really, as this is a test rig.

    Our soon to be live rigs won't suffer from this as our upstream devices are Cisco routers or switches which handle the authentication there.