DHCP relay issue after upgrade to SFOS 17.0.1 MR-1

We have been running our XG 210 on version 16 for a few months. Our network configuration has 3 VLANs. DHCP for a couple of the VLANs is set up as a relay to a Windows domain controller, and for another VLAN the XG Firewall itself is the DHCP server. Everything was working like a charm until the firmware upgrade.

Yesterday I upgraded the firmware to v17 and then to MR-1. Since the upgrade, devices on the DHCP relay VLANs are not allocated IP addresses. I can see that a DHCP lease is created on the Windows domain controller, but the IP is not allocated to the device.

I see posts where others had similar issues with previous versions, but I am not sure whether any of them apply to my scenario. Any help in resolving this is much appreciated.

- Kamal

 

Update 01

As mentioned in other posts with similar issues, I ran the following command and can see an entry for a MAC address matching one of the devices.

console> drop-packet-capture

Date=2017-11-28 Time=10:16:08 log_id=0103021 log_type=Firewall log_component=Local_ACLs log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Port1.300 out_dev= inzone_id=10 outzone_id=4 source_mac=38:a4:ed:67:41:25 dest_mac=ff:ff:ff:ff:ff:ff l3_protocol=IP source_ip=0.0.0.0 dest_ip=255.255.255.255 l4_protocol=UDP source_port=68 dest_port=67 fw_rule_id=0 policytype=0 live_userid=0 userid=0 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=0 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 proxy_flags=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 source_nat_id=0 cluster_node=0 inmark=0x0 nfqueue=0 scanflags=0 gateway_offset=0 max_session_bytes=0 drop_fix=0 ctflags=0 connid=863434432 masterid=0 status=256 state=0 sent_pkts=N/A recv_pkts=N/A sent_bytes=N/A recv_bytes=N/A tran_src_ip=N/A tran_src_port=N/A tran_dst_ip=N/A tran_dst_port=N/A
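
For readers who want to sift a longer capture, here is a minimal Python sketch that splits a line like the one above into its `key=value` fields and flags denied DHCP client broadcasts (UDP 68 to 67). The function names are illustrative, and the sample is a shortened copy of the log entry above; this is not a Sophos tool.

```python
def parse_drop_log(line: str) -> dict:
    """Split a drop-packet-capture line into a dict of its key=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip any token that has no '='
            fields[key] = value
    return fields


def is_dropped_dhcp_request(fields: dict) -> bool:
    """True for a denied client-to-server DHCP packet (UDP 68 -> 67)."""
    return (
        fields.get("log_subtype") == "Denied"
        and fields.get("l4_protocol") == "UDP"
        and fields.get("source_port") == "68"
        and fields.get("dest_port") == "67"
    )


# Shortened copy of the log entry posted above.
sample = (
    "Date=2017-11-28 Time=10:16:08 log_component=Local_ACLs "
    "log_subtype=Denied in_dev=Port1.300 out_dev= "
    "source_mac=38:a4:ed:67:41:25 source_ip=0.0.0.0 "
    "dest_ip=255.255.255.255 l4_protocol=UDP source_port=68 dest_port=67"
)

fields = parse_drop_log(sample)
if is_dropped_dhcp_request(fields):
    # Show which interface and client the dropped request came from.
    print(fields["in_dev"], fields["source_mac"], fields["log_component"])
```

In the entry above, the request arrives on `Port1.300` and is denied by `Local_ACLs`, which suggests the firewall itself is dropping the client's broadcast rather than relaying it.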



  • I was running DHCP Relay in version 16 with no issues.  I upgraded to version 17 and DHCP Relay stopped working.  After reviewing my config, I noticed the DHCP Relay was enabled on the interface my DHCP server was running on.  After I removed the DHCP Relay from the network interface my DHCP server was in, all the other network DHCP Relays to the DHCP server started working again.  Since the DHCP server is in the network it really does not need a DHCP Relay agent on the interface.  I tested several times to confirm this is the case.

     

    Example:

    If the DHCP server is 192.168.10.10 on network 192.168.10.1/24, adding a DHCP relay on this network for the server will cause all DHCP relays to the server to stop working.

     

    I hope this helps anyone in the same situation.

     

    Alan 
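
The condition Alan describes can be checked offline. Below is a minimal Python sketch using the standard `ipaddress` module: it lists every relay-enabled interface whose subnet already contains the DHCP server. The interface names and the two extra subnets are hypothetical; only 192.168.10.10 and 192.168.10.1/24 come from the example above.

```python
import ipaddress


def relay_conflicts(server_ip: str, relay_interfaces: dict) -> list:
    """Return relay-enabled interfaces whose subnet contains the DHCP server."""
    server = ipaddress.ip_address(server_ip)
    return [
        name
        for name, cidr in relay_interfaces.items()
        if server in ipaddress.ip_interface(cidr).network
    ]


# Hypothetical relay configuration: interface name -> interface address/prefix.
relay_interfaces = {
    "Port1.10": "192.168.10.1/24",  # the DHCP server's own network
    "Port1.20": "192.168.20.1/24",
    "Port1.30": "192.168.30.1/24",
}

conflicts = relay_conflicts("192.168.10.10", relay_interfaces)
print(conflicts)  # ['Port1.10']
```

Any interface reported here is one where, per the post above, the relay should be removed because the server is directly reachable on that network.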

  • Thank you Alan. My setup was what you have described. I upgraded to SFOS 17.0.6 MR-6 and removed DHCP relay on subnet where DHCP server is located and everything is working perfectly so far.

     

    Regards

    Kamal

  • Exactly the same thing happened to me. This week I decided to update the firewall from SFOS 16.05.8 MR-8 to SFOS 17.0.8 MR-8. My network is segmented into VLANs and everything worked fine until the update; afterwards, half of the computers got onto the network and the other half did not (the Windows DHCP server did not serve IP addresses). I tried what Alan suggests (removing the DHCP relay from the subnet where the DHCP server is) but it did not work. I have an open ticket with support, but in the end I decided to boot the firewall with the SFOS 16.05.8 MR-8 image and everything has returned to normal.

  • Hi Jose,

    Could you try running the command under shell

    Go to Option 5>3

    #dhcrelay -i Port1.10 <DHCP Relay> -4 -d     for Vlan 10 on Port1

    Regards,

    Aditya Patel
    Global Escalation Support Engineer | Sophos Technical Support


  •  

    Hi Aditya Patel, 

    Note that my network has several VLANs (VLAN 37-45). The DHCP server is located on VLAN 37.

    I ran the command knowing that my DHCP server is on Lag0.37, and I get the following:

    XG310_WP01_SFOS 16.05.8 MR-8# dhcrelay -i Lag0.37 192.168.37.254 -4 -d
    Internet Systems Consortium DHCP Relay Agent 4.2.4-P2
    Copyright 2004-2012 Internet Systems Consortium.
    All rights reserved.
    For info, please visit https://www.isc.org/software/dhcp/
    INFO: recv_netlink_req: 3680 bytes received

    INFO: recv_netlink_req: 3764 bytes received

    INFO: recv_netlink_req: 2536 bytes received

    INFO: recv_netlink_req: 2616 bytes received

    INFO: recv_netlink_req: 3768 bytes received

    INFO: recv_netlink_req: 3572 bytes received

    INFO: recv_netlink_req: 2572 bytes received

    INFO: recv_netlink_req: 2740 bytes received

    INFO: recv_netlink_req: 2576 bytes received

    INFO: recv_netlink_req: 2576 bytes received

    INFO: recv_netlink_req: 2576 bytes received

    INFO: recv_netlink_req: 2576 bytes received

    INFO: recv_netlink_req: 20 bytes received

    NLMSG_DONE received

    Number of interfaces: <1>
    check_relay_port: index '20' found for Lag0.37

    INFO: recv_netlink_req: 96 bytes received

    find_route: index '20' found for 192.168.37.254

    is_relay_ok: Lag0.37 is out interface for 192.168.37.254, aborting registration

    Listening on LPF/Lag0.37/00:1a:8c:5c:c2:f4
    Sending on   LPF/Lag0.37/00:1a:8c:5c:c2:f4
    Sending on   Socket/fallback
    5 bad udp checksums in 5 packets
    Packet to bogus giaddr 192.168.38.10.

    Thanks,
  • Hi Jose,

    Could you run the command in Shell option 5>3 

    tcpdump -nei any port 67 or 68

    Output:

    tail -f /log/networkd.log

    Output

    ip route get <IPaddress of relay >

    Output:

    Regards,

    Aditya Patel
    Global Escalation Support Engineer | Sophos Technical Support


  • Good morning. Right now I have the firewall on XG310_WP01_SFOS 16.05.8 MR-8. To perform the test I would need to run version 17, and with the firewall in production that is almost impossible. I will wait for version 17.1 to be released, and if it still fails I will be able to run the capture. Thank you very much for your help. Regards

  • Hi there,

     

    I am also having such issues.

    DHCP relay via a physical interface without VLAN works perfectly.

    If I use DHCP relay via a VLAN, it either takes 5 - 10 seconds until a device gets its IP lease back or I do not get any lease at all.

     

    I am using the newest version of XG -> SFOS 17.1.1 MR-1

     

    Looking forward to a fix.

  •  
    Hello Mathias Mühlbacher, sorry for the delay, I have been on vacation. To update you:
    1. I am waiting to update the firewall. Currently the graphical interface does not offer the option of downloading the latest update (I am on SFOS 16.05.8 MR-8).
    2. Have you found any solution for the DHCP issue? I have an open ticket, and the last option offered was to update and look through the logs to see what happens with DHCP. Obviously I have refused to do that in a production environment; I do not consider it a serious option. Thank you.
     
    Greetings,
  • Hello Jose Manuel Gonzalez,

     

    No. As it was not so vital for me, I have deactivated the DHCP relay function on the Sophos.

    But I could do some tests on the newest Sophos XG version and get back to you.

     

    Best regards,

    Mathias

  • Hi there,
    I have tested the DHCP relay function with my newest XG version (SFOS 17.0.8 MR-8).
    The result is that DHCP relay over LAN works, while DHCP relay over VLAN does not.
    The log on my Linux server shows DHCP offers and so on, but the tested device still does not get an IP address.
    Best regards,
    Mathias
  • Hi Everyone

    We are currently deploying 70 XG125 firewalls and have run into the same issue.

    Our setup is:

    Port1 (native): Client VL1

    Port1.10 Client VL2

    Port1.20 Client VL3

     

    Port9 ("unused with dummy IP")

    Port9.110 Uplink / Default Gateway

     

    DHCP relay is set up on Port1, Port1.10 and Port1.20, and relays to a server that is reachable over Port9.110 behind the default gateway.

    We have noticed that after a reload of the XG, DHCP relay no longer works properly.

    tcpdump showed us:

    Port1: Receives Discover

    Port9.110: relays Discover

    Port9.110: receives offer

    Port1: does not forward offer

     

    We removed VL10 and VL20 so that only Port1 and Port9.110 are involved, and it still does not work.

    During the tests we noticed that if we re-apply the DHCP relay config in the GUI, everything works fine.

    From the log files we can see that just after the reload, the "is_relay_ok" function fails.

    If I had to guess, I would say that after the reload the XG checks the relay, but because Port9.110 may not be up at that time, the relay servers are marked dead, and there is no periodic check to mark them alive again.

     

  • This issue is now finally fixed in version 18 with the introduction of this feature:

    "DHCP Relay Enhancements for Dynamic Routing Synchronizes dynamic routing updates (learned routes from OSPF) to DHCP relay, eliminating the need for manual reconfiguration."

     

    The problem in my case was that the DHCP server is behind the WAN interface (MPLS) and is only reachable once the default route is installed in the routing table. Unfortunately, during boot the DHCP relay function checks whether the DHCP server is reachable, and since the default route is not installed at that moment, the check fails. This check runs only once and marks the DHCP server dead.

     

     

    A workaround for versions before V18, to force the DHCP relay function to keep checking for the DHCP server:

     


    1. Remount the filesystem as read/write:

      mount -no remount,rw /

    2. Open the startup script in vi:

      vi /scripts/system/clientpref/customization_application_startup.sh

    3. Add the following line and save the file:

      nservice networkd:dhcprelay_up -dsnosync

    4. Write-protect the filesystem again:

      mount -no remount,ro /

     

    To confirm that the change has been applied, run cat on the startup script; the output should look like the following:


    # cat /scripts/system/clientpref/customization_application_startup.sh
    #!/bin/sh
    nservice networkd:dhcprelay_up -dsnosync

    exit 0;

     

     

    You need to reapply the workaround after an upgrade within the 17.x train.

    The bug is fixed with V18.
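
To illustrate why a single boot-time check causes the failure described in this post, here is a toy Python model. It is only a sketch of the reasoning and not Sophos's implementation: the function name, the time steps and the `recheck` flag are all assumptions made for illustration.

```python
def relay_states(route_available, recheck):
    """Return whether the relay is active at each time step.

    route_available: list of bools, True once a route to the DHCP server exists.
    recheck: False means reachability is tested only at step 0 (the one-shot
             check described in the post); True means it is tested every step.
    """
    states = []
    active = False
    for step, has_route in enumerate(route_available):
        if step == 0 or recheck:
            active = has_route  # result of the reachability check
        states.append(active)
    return states


# Hypothetical boot sequence: the default route appears at the third step.
boot = [False, False, True, True]

one_shot = relay_states(boot, recheck=False)
periodic = relay_states(boot, recheck=True)

print("one-shot:", one_shot)  # relay stays down after the failed first check
print("periodic:", periodic)  # relay recovers once the route is installed
```

In this model the one-shot variant never recovers, which matches the observation that re-applying the relay configuration (forcing a new check) makes relaying work again.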