Firewall DHCP Relay stops working until you delete and recreate a random DHCP Relay object

This issue has been annoying us for years, and it happened again today after a year of working normally.

XG 430 with LAG and SFOS 19.5.3

The XG has several VLANs. On one VLAN, a Windows DHCP server hands out DHCP addresses.

On several other VLANs that are also configured on the XG, there are DHCP forwarders (relays) pointing to the Windows DHCP server.

At some point the clients no longer receive DHCP offers and can no longer obtain IP addresses.

This situation only stops with a firewall reboot or when you delete any DHCP relay object on the XG and recreate it.

Then the clients will get IP addresses immediately.

Today it happened again. I had deleted a RED15 on the XG and powered on another RED15W. Both have DHCP servers.

I have had several cases open with GES since 2021, and it has cost a lot of time and frustration. They never found anything helpful. They want us to reproduce the issue, but this is impossible: we have no idea how to reproduce it. We can only start logging and switch the logs to debug after it has occurred.

Cases handling the issue were:

05521277 / direct to 2nd Level: XG DHCP server or DHCP relay failing after some time - clients not receiving DHCP offer

05158330 / 05128430 / XG DHCP server or DHCP relay failing after some time - clients not receiving DHCP offer

04704295 / XG DHCP server or DHCP relay failing after some time - clients not receiving DHCP offer

03953883 / DHCP Relay not working until deletion and recreation of a random DHCP Relay object 

You can see on the XG that it is not sending DHCP replies (there are no BOOTREPLY lines in the log). This only starts again once you have recreated the DHCP relay.

172.16.xxx.xxx is the Windows DHCP Server Relay IP address.

XG430_WP02_SFOS 19.5.3 MR-3-Build652 HA-Primary# tail -f networkd.log
udhcpc: sending discover
Forwarded BOOTREQUEST for 54:e1:ad:76:c0:f2 to 172.16.xxx.xxx
Forwarded BOOTREQUEST for ec:79:49:4e:99:57 to 172.16.xxx.xxx
Forwarded BOOTREQUEST for e8:80:88:54:61:5e to 172.16.xxx.xxx
udhcpc: sending discover
Forwarded BOOTREQUEST for e8:80:88:54:61:5e to 172.16.xxx.xxx
Forwarded BOOTREQUEST for 28:16:ad:3a:4c:83 to 172.16.xxx.xxx
Forwarded BOOTREQUEST for ec:79:49:4e:99:57 to 172.16.xxx.xxx
....
dhcp relay recreation
....
udhcpc: sending discover
Forwarded BOOTREQUEST for 60:5b:30:00:29:1f to 172.16.xxx.xxx
Forwarded BOOTREPLY for 60:5b:30:00:29:1f to 172.16.aaa.aaa
Forwarded BOOTREQUEST for 60:5b:30:00:29:1f to 172.16.xxx.xxx
Forwarded BOOTREPLY for 60:5b:30:00:29:1f to 172.16.aaa.aaa
Forwarded BOOTREQUEST for 60:5b:30:00:29:1f to 172.16.xxx.xxx
udhcpc: sending discover
Forwarded BOOTREQUEST for e4:46:b0:3a:04:0a to 172.16.xxx.xxx
Forwarded BOOTREPLY for e4:46:b0:3a:04:0a to 192.168.bbb.bbb
Forwarded BOOTREQUEST for e4:46:b0:3a:04:0a to 172.16.xxx.xxx
Forwarded BOOTREPLY for e4:46:b0:3a:04:0a to 192.168.bbb.bbb
udhcpc: sending discover
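
To make the pattern in the excerpt above easier to spot, here is a minimal sketch that counts forwarded requests and replies per client MAC address. It assumes the log lines look exactly like the ones pasted above; the function name `summarize_relay_log` and the two sample variables are hypothetical and not part of SFOS or any Sophos tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Forwarded BOOTREQUEST for 54:e1:ad:76:c0:f2 to 172.16.xxx.xxx
# The line format is an assumption taken from the excerpt in this post.
LINE_RE = re.compile(
    r"Forwarded (BOOTREQUEST|BOOTREPLY) for "
    r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2}) to (\S+)",
    re.IGNORECASE,
)


def summarize_relay_log(lines):
    """Count relayed requests and replies per client MAC address.

    Returns (requests, replies, unanswered), where unanswered is a sorted
    list of MACs with at least one BOOTREQUEST and no BOOTREPLY.
    """
    requests = Counter()
    replies = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # e.g. "udhcpc: sending discover"
        kind, mac, _destination = match.groups()
        if kind.upper() == "BOOTREQUEST":
            requests[mac.lower()] += 1
        else:
            replies[mac.lower()] += 1
    unanswered = sorted(mac for mac in requests if replies[mac] == 0)
    return requests, replies, unanswered


# Sample taken from the excerpt, before the relay was recreated.
BEFORE_RECREATION = """\
udhcpc: sending discover
Forwarded BOOTREQUEST for 54:e1:ad:76:c0:f2 to 172.16.xxx.xxx
Forwarded BOOTREQUEST for ec:79:49:4e:99:57 to 172.16.xxx.xxx
Forwarded BOOTREQUEST for e8:80:88:54:61:5e to 172.16.xxx.xxx
Forwarded BOOTREQUEST for 28:16:ad:3a:4c:83 to 172.16.xxx.xxx
""".splitlines()

# Sample taken from the excerpt, after the relay was recreated.
AFTER_RECREATION = """\
Forwarded BOOTREQUEST for 60:5b:30:00:29:1f to 172.16.xxx.xxx
Forwarded BOOTREPLY for 60:5b:30:00:29:1f to 172.16.aaa.aaa
Forwarded BOOTREQUEST for e4:46:b0:3a:04:0a to 172.16.xxx.xxx
Forwarded BOOTREPLY for e4:46:b0:3a:04:0a to 192.168.bbb.bbb
""".splitlines()

if __name__ == "__main__":
    for label, sample in (("before", BEFORE_RECREATION), ("after", AFTER_RECREATION)):
        _requests, _replies, unanswered = summarize_relay_log(sample)
        print(label, "unanswered clients:", unanswered)
```

Run against a saved copy of networkd.log, a long list of unanswered MAC addresses would suggest the relay is in the failed state described here. A short list can be normal, because a reply may simply not have arrived yet when the log was captured.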



Added V19.5 MR3 TAG
[edited by: Erick Jan at 1:58 AM (GMT -8) on 10 Jan 2024]
[locked by: LuCar Toni at 9:03 AM (GMT -7) on 23 Jul 2024]
  • From the graph above and the Admin Audit log, I would say it started with the RED deletion.

    I'm in the process of creating a new support case.

    Case number: 07174270
  • Hey,

    Thanks for the service request number. We will get the SR expedited.

    Thanks & Regards,
    Vivek Jagad | Team Lead, Technical Support, Global Customer Experience

  • So, could you actually reproduce this by deleting one RED? The key to success here is being able to reproduce this problem.

  • I have some spare RED15 I could delete - wonder if I will junk them all? see my other recent post...

  • Some still work. So I deleted and recreated them, but the issue did not come back.

    The cluster was last rebooted 4 days ago. I think this was the first network change after that. Maybe it only happens in that combination: FW reboot, delete RED, DHCP issue? That is not so easy to test; this is no playground.

    But I found a comment of mine in earlier case notes:

    9/20/2022 4:59 PM 05662019 / direct to 2nd Level: XG DHCP server or DHCP relay failing after some time - clients not receiving DHCP offer

    We cannot reproduce it. It just comes. Possibly it has to do with RED changes. Last time I deleted and recreated a RED. About 1 hour later we noticed the issue. You should have it in the logs.

    At that time I also tried deleting and recreating a RED again. The issue was not reproduced, but it seems somehow related.

  • So, my understanding from a past issue with this situation was: if there is a "huge" DHCP relay config and one interface goes offline, it can cause problems. After removing the faulty options (there were instances where customers added unsupported XFRM interfaces to the DHCP relay and broke it), it worked again for ages.

    I am not sure what causes your problem. The team will look at the time stamp on your appliance to check whether we can find an indicator of it. But still, in IT the hardest thing to debug is a problem that is not reproducible (as you probably know from the SATC issue).

  • I'd say it's not a huge DHCP config. 20 DHCP servers on XG and a bunch of relays.

    I'd expect hundreds to be huge.

    Last weekend I rebooted both FW nodes, and tomorrow I will repeat a RED deletion to see whether that reproduces it. The RED I originally deleted is unfortunately broken, so it will be a different one. But both have (had) their own DHCP servers.

    In the meantime the support case is going to my satisfaction - I guess I have to thank   for that. They are looking at the backup from before the first deletion of the RED, and communication is currently really good.

  • will recreate a RED deletion tomorrow if that may reproduce it.

    unfortunately it did not

  • Hi LHerzog,

    Upon checking your case, the case handler is seeking assistance with the access-id.

    "I have started the investigation and I am still going through the available logs, but I have to re-check the old case as well.
    Therefore I need a bit more time to get a better overview of what we have so far.

    In parallel could you please provide me a new access id because the known one seems to be expired"

    Erick Jan
    Community Support Engineer | Sophos Technical Support

  • update sent via case mail. thank you!