This discussion has been locked.

DHCP WAN not working after HA failover?

I have the most recent UTM software appliance running in HA mode. The ISP is Verizon FIOS (DHCP address on the WAN). I've noticed that when an HA failover occurs, the host that is now master no longer has WAN connectivity: the dashboard shows an error indication for the WAN interface. If I go into the Interfaces screen and click the 'renew' button, everything comes up just fine, but this is obviously less than optimal :)

I am using a 24-port EdgeSwitch, with ports 17, 18 and 19 in VLAN 2. The two UTM appliances connect to ports 18 and 19, and the Verizon ONT (think cable modem) connects to port 17. At first I thought this was some kind of spanning tree issue, but all 3 ports are configured as edge ports, so they should start forwarding very quickly.

I did the most recent firmware update this morning, and it of course had to update and reboot both nodes. Both times, the WAN interface in UTM showed as down, with an error indication. When I ssh to the master node, I see this:

2019:07:17-07:21:13 gateway-1 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-07:29:08 gateway-2 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-08:41:01 gateway-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7
2019:07:17-08:41:01 gateway-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
2019:07:17-08:41:01 gateway-1 dhclient: DHCPOFFER of XXX from QQQ
2019:07:17-08:41:01 gateway-1 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-08:41:01 gateway-1 dhclient: DHCPACK of XXX from QQQ

07:21:13 was when I told the UTM to perform the update. You can see a DHCPREQUEST was sent twice (once on each node) with no answer, and then the client gave up. 08:41:01 is when I noticed I was off the air and clicked the 'renew' button, at which point it did the full sequence of operations (DISCOVER, OFFER, REQUEST, ACK). I freely admit I'm not that savvy with switching protocols, so I'm not sure what is going on here. Any help would be appreciated. As things are now, HA isn't really giving me any benefit, as a failure will cause a failover, but I will still be off the air :(
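
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that flags every DHCPREQUEST that never received a DHCPACK. It assumes the log lines have the same shape as the excerpt above (timestamp, hostname, `dhclient:`, message type); the function name `unanswered_requests` and the `SAMPLE` variable are made up for illustration and are not part of UTM.

```python
import re

# Timestamp, host, and DHCP message type, as they appear in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+dhclient:\s+(?P<msg>DHCP[A-Z]+)\b"
)

def unanswered_requests(lines):
    """Return sorted (timestamp, host) pairs for DHCPREQUESTs with no DHCPACK.

    A request counts as unanswered if the same host sends another
    DHCPREQUEST or DHCPDISCOVER, or the log ends, before a DHCPACK is seen.
    """
    pending = {}      # host -> timestamp of the outstanding DHCPREQUEST
    unanswered = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a dhclient line in the expected shape
        ts, host, msg = match.group("ts", "host", "msg")
        if msg == "DHCPACK":
            pending.pop(host, None)
        elif msg in ("DHCPREQUEST", "DHCPDISCOVER"):
            if host in pending:
                # The client started over without ever getting an ACK.
                unanswered.append((pending.pop(host), host))
            if msg == "DHCPREQUEST":
                pending[host] = ts
    # Anything still pending at the end of the log was never answered.
    unanswered.extend((ts, host) for host, ts in pending.items())
    return sorted(unanswered)

# The log excerpt from this post, used as sample input.
SAMPLE = """\
2019:07:17-07:21:13 gateway-1 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-07:29:08 gateway-2 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-08:41:01 gateway-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7
2019:07:17-08:41:01 gateway-1 dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
2019:07:17-08:41:01 gateway-1 dhclient: DHCPOFFER of XXX from QQQ
2019:07:17-08:41:01 gateway-1 dhclient: DHCPREQUEST for XXX on eth1 to 255.255.255.255 port 67
2019:07:17-08:41:01 gateway-1 dhclient: DHCPACK of XXX from QQQ
"""

if __name__ == "__main__":
    for ts, host in unanswered_requests(SAMPLE.splitlines()):
        print(f"{ts} {host}: DHCPREQUEST never acknowledged")
```

Run against the excerpt, this reports the two requests at 07:21:13 and 07:29:08 and treats the manual renew at 08:41:01 as answered.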

This thread was automatically locked due to age.

0 solae over 5 years ago

I'm seeing the exact same issue on some of my customers' networks.

After a failover, the new master uses the same IP and virtual MAC, but there is no connection (error in WebAdmin). After I renew the IP on the interface (same IP), the connection is immediately back online.

In Switzerland there are many FTTH providers that deliver a fixed IP via DHCP (always the same IP). So we have to use DHCP on the WAN even though the IP is effectively "fixed".

I do not always see this problem, but I think this phenomenon has existed for many months, maybe years.

I will try the suggestion to fix the interface to 1000/Full, even though on the other side (HP ProCurve) I can only select 1000/Auto (I think that's because of some requirement in the Gigabit Ethernet standard).

I think this problem is not ISP related, because how would the ISP block or even notice the failover if the (virtual) MAC stays the same? I also do not think there is a DHCP release at failover, is there? So the ISP should not notice anything.

I'm curious about other replies and experiences.

- Michael

0 solae over 5 years ago in reply to solae

Unfortunately it did not work; I had the same issue again even though I had configured the WAN interface for 1000/Full. After briefly disconnecting the cable of the WAN interface (just 1-2 seconds), the connection was back.

There is no DHCPREQUEST or anything similar in the log file, so the UTM seems to think the Internet connection is fine...

0 dswartz over 5 years ago in reply to solae

Does it only happen on failback like in my case?

0 solae over 5 years ago in reply to dswartz

No. In my case, I upgraded the firmware (the firewalls had been online for some days in this configuration), and after the updated slave came online and took over (master still on the old firmware), the connection to the Internet was lost.

0 dswartz over 5 years ago in reply to solae

Well, damn. I *thought* forcing the speed to 100/full worked, but no. Just applied most recent firmware. Master failed over to slave, and... WAN offline until I hit 'renew'. This is really a showstopper for me. I don't *need* HA really - I wanted it in case of failure or so I could apply updates with no downtime, but if I have to sit here and click 'renew', that's not at all helpful :(

0 BAlfson over 5 years ago in reply to dswartz

What does Sophos Support say about this?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

0 dswartz over 5 years ago in reply to BAlfson

I haven't contacted them. I'm running the free license version, so I assume I'm SOL?

0 BAlfson over 5 years ago in reply to dswartz

Scanning back over this thread and noticing that you have a preferred master reminded me that I don't trust that functionality. How about if you just leave that off and perform a manual failback with ha_daemon -c takeover at the command line - does that give the desired result?

Cheers - Bob


0 dswartz over 5 years ago in reply to BAlfson

Yeah, I'll give that a try, thanks!

0 dswartz over 4 years ago in reply to dswartz

Sorry for the late reply. Somehow, I never got an email ping for your latest reply. I've given up on HA for now, as the failure doesn't seem to be related to failback (e.g. I've seen more than one instance where the DHCP client doesn't get a reply quickly enough and gives up [apparently forever]). If I could somehow hook into the HA code, so that it ran a shell script on node X when node X takes over service, I could sleep for a few seconds and then run whatever Sophos script kicks the DHCP client into action. Bummer, because this is so potentially useful to me...
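
To make the idea concrete, here is a rough Python sketch of the "wait a few seconds after takeover, then renew" logic. Everything UTM-specific is an assumption: `get_role` and `renew` are hypothetical callables you would have to supply yourself, since this thread does not identify a supported HA hook or the command that the 'renew' button runs. The sketch only shows the polling and delay logic.

```python
import time

def renew_on_takeover(get_role, renew, delay=5.0, poll=1.0,
                      sleep=time.sleep, max_polls=None):
    """Call renew() `delay` seconds after this node becomes master.

    get_role  -- hypothetical callable returning this node's HA role as a string
    renew     -- hypothetical callable that triggers a DHCP renew on the WAN
    max_polls -- stop after this many polls (None means run forever)
    Returns the number of renewals triggered.
    """
    previous = get_role()
    renewals = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(poll)
        current = get_role()
        polls += 1
        if current == "MASTER" and previous != "MASTER":
            # This node just took over: give the switch and the ISP side a
            # moment to settle, then kick the DHCP client.
            sleep(delay)
            renew()
            renewals += 1
        previous = current
    return renewals

if __name__ == "__main__":
    # Dry run with fake callables: the node is SLAVE, then becomes MASTER.
    roles = iter(["SLAVE", "SLAVE", "MASTER", "MASTER"])
    count = renew_on_takeover(
        get_role=lambda: next(roles),
        renew=lambda: print("renew would run here"),
        delay=0.1, poll=0.1, max_polls=3,
    )
    print("renewals triggered:", count)
```

The role check and the renew action are passed in as functions so the timing logic can be tested without touching a real appliance.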