WAN port LAG does not acquire an IP address via DHCP

On our SG430, one WAN line uses a 2-port LAG. The LAG no longer receives an IP address, so that WAN line is unusable.

The cluster runs in HA mode. Firmware version: 9.711-5

When I disable and re-enable the LAG interface, it tries to get an IP address via DHCP but never receives an offer.

I confirmed the ISP router is providing IP addresses.

The LAG terminates at a WAN switch where the router is connected.

On the SG shell and in the logs I see:

<M> fw:/home/login # ifconfig -v eth2
eth2      Link encap:Ethernet  HWaddr 00:1A:8C:F0:22:C2
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:31068470 errors:0 dropped:2 overruns:0 frame:0
          TX packets:34109934 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2870126907 (2737.1 Mb)  TX bytes:6440355307 (6142.0 Mb)

<M> fw:/home/login # ifconfig -v eth6
eth6      Link encap:Ethernet  HWaddr 00:1A:8C:F0:22:C2
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:34802665 errors:0 dropped:1 overruns:0 frame:0
          TX packets:763212 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3043897001 (2902.8 Mb)  TX bytes:120283402 (114.7 Mb)

<M> fw:/home/login # ifconfig -v lag3
lag3      Link encap:Ethernet  HWaddr 00:1A:8C:F0:22:C3
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4000300544 errors:104 dropped:331275 overruns:0 frame:57
          TX packets:1905373056 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5325821115540 (5079098.8 Mb)  TX bytes:286893189767 (273602.6 Mb)
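
In the output above, eth2 and eth6 report HWaddr 00:1A:8C:F0:22:C2 while lag3 reports 00:1A:8C:F0:22:C3. With the stock Linux bonding driver, slaves in most modes carry the bond's MAC address, so comparing the values against the healthy HA node may be worthwhile. Whether UTM's HA virtual-MAC handling follows that convention is an assumption, and the helper names below are hypothetical. A minimal sketch:

```shell
#!/bin/sh
# Print the hardware (MAC) address found in `ifconfig` output read from stdin.
hwaddr() {
    sed -n 's/.*HWaddr \([0-9A-Fa-f:]\{17\}\).*/\1/p' | head -n 1
}

# Report whether a slave's MAC matches the bond's MAC.
# Arguments: bond MAC, slave name, slave MAC.
compare_mac() {
    if [ "$1" = "$3" ]; then
        echo "$2: MAC matches bond"
    else
        echo "$2: MAC differs from bond ($3 vs $1)"
    fi
}

# Usage on the firewall (interface names taken from the post above):
#   bond=$(ifconfig lag3 | hwaddr)
#   for s in eth2 eth6; do
#       compare_mac "$bond" "$s" "$(ifconfig "$s" | hwaddr)"
#   done
```

A difference does not prove a fault; it is only a data point to compare between the two cluster nodes.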

<M> fw:/home/login # /var/mdw/scripts/dhcpc restart
[dhcpc] :: restart  - from pid=30031, parent_pid=8658(bash)
:: Stop - interface info missing!!
[dhcpc] :: flock released (parent=8658(bash))
[dhcpc] :: flock aquired (parent=8658(bash))
[dhcpc] :: Start - interface info missing!
[dhcpc] :: flock released (parent=8658(bash))
[ failed ]

dhclient is running:

14811         00:00:00 dhclient

2022:10:18-00:01:52 fw-320-2 [user:notice] ' 
2022:10:18-00:01:57 fw-320-2 [daemon:info] dhcp_updown[32355]:  lag3 - reason:FAIL
2022:10:18-00:02:27 fw-320-2 [user:notice] cluster_sync[31896]:   

2022:10:18-13:20:49 fw-320-2 dhclient: DHCPDISCOVER on lag3 to 255.255.255.255 port 67 interval 5
2022:10:18-13:20:54 fw-320-2 dhclient: DHCPDISCOVER on lag3 to 255.255.255.255 port 67 interval 9
2022:10:18-13:21:03 fw-320-2 dhclient: DHCPDISCOVER on lag3 to 255.255.255.255 port 67 interval 7
2022:10:18-13:21:10 fw-320-2 dhclient: No DHCPOFFERS received.
2022:10:18-13:21:10 fw-320-2 dhclient: No working leases in persistent database - sleeping.





This thread was automatically locked due to age.
  • Hello,

    Thank you for reaching out to the community. Do the error and drop counts on the LAG keep increasing?

    Thanks & Regards,
    _______________________________________________________________

    Vivek Jagad | Team Lead, Global Support & Services 

    Log a Support Case | Sophos Service Guide
    Best Practices – Support Case


    Sophos Community | Product Documentation | Sophos Techvids | SMS
    If a post solves your question please use the 'Verify Answer' button.

  • Yes, the drops increase; the errors do not.

    I have not yet captured a tcpdump while the DHCP discover runs; I will do that later.
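
To put a number on "the drops increase", the RX dropped counter can be sampled twice and subtracted. This is a minimal sketch that parses the `ifconfig` format shown in the first post; the function names are hypothetical.

```shell
#!/bin/sh
# Extract the RX "dropped" counter from `ifconfig` output read from stdin.
rx_dropped() {
    sed -n 's/.*RX packets:[0-9]* errors:[0-9]* dropped:\([0-9]*\) .*/\1/p' | head -n 1
}

# Print how much a counter grew between two samples.
# Arguments: earlier value, later value.
counter_delta() {
    echo $(( $2 - $1 ))
}

# Usage on the firewall:
#   before=$(ifconfig lag3 | rx_dropped); sleep 60
#   after=$(ifconfig lag3 | rx_dropped)
#   echo "lag3 RX drops in 60 s: $(counter_delta "$before" "$after")"
```

Running the same sampling against eth2 and eth6 would show whether the drops are attributed to one physical port or only to the bond.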

  • Okay, please share the tcpdump once you have collected it.

  • There is no offer. We'll just reboot the machine tonight unless there are better ideas.

    The issue started on Saturday, when no admin had made any change to the infrastructure, so it appeared on its own.

    <M> fw:/home/login # tcpdump -i lag3 port 67 or port 68
    tcpdump: WARNING: lag3: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on lag3, link-type EN10MB (Ethernet), capture size 65535 bytes
    13:59:25.808005 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    13:59:30.211947 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    13:59:41.035973 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:01:15.620977 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:01:22.234075 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:02:28.023973 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:02:31.203484 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:02:36.047992 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:03:31.784010 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:03:39.175970 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:04:57.212009 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:05:01.119966 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
    14:05:11.847972 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:1a:8c:f0:22:c3 (oui Unknown), length 300
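
The capture above shows only requests from 00:1a:8c:f0:22:c3 and no replies. Repeating the capture on each physical port (eth2, eth6) with `-e` would show which link actually carries the DISCOVER and whether a reply reaches either port. A minimal sketch for summarizing such a capture follows; the function name and the `/tmp` file path are hypothetical.

```shell
#!/bin/sh
# Count DHCP requests and replies in tcpdump text output read from stdin.
dhcp_summary() {
    awk '
        /BOOTP\/DHCP, Request/ { req++ }
        /BOOTP\/DHCP, Reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Usage on the firewall (stop with Ctrl-C after a few DISCOVER cycles):
#   tcpdump -n -e -i eth2 port 67 or port 68 > /tmp/eth2.txt
#   dhcp_summary < /tmp/eth2.txt
```

If requests leave on one port and replies are zero on both, the next place to look is the WAN switch side of the LAG.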

  • Try restarting the DHCP service from the GUI: system services > Services > DHCP server > STOP > START
    Note that this will also affect any other DHCP servers that are configured.
    Alternatively, you can toggle the status of the DHCP server off and on under Network > DHCP

  • But you are referring to the DHCP server; the SG is the DHCP client here. Are you sure I should do that?

  • Sorry, my mistake. For UTM9 it is the following: Network services > DHCP > servers > toggle off/on
    OR
    from the shell: /var/mdw/scripts/dhcpd restart

  • I guess it should be dhcpc? But that failed, as can be seen in my first post.

    I will restart dhcpd.

    <M> fw:/home/login # ps aux | grep dhcp
    root     18219  0.0  0.0   5668   748 pts/0    S+   14:44   0:00 grep dhcp
    <M> fw:/home/login # cat /var/log/selfmon.log
    <M> fw:/home/login # cat /var/mdw-debug.log
    cat: /var/mdw-debug.log: No such file or directory
    <M> fw:/home/login # version

    Current software version...: 9.711005
    Hardware type..............: 430r1
    Serial number..............: S4000xxxxDF1
    Installation image.........: 9.403-4.1
    Installation type..........: ssi
    Installed pattern version..: 214529
    Downloaded pattern version.: 214529
    Up2Dates applied...........: 40 (see below)
                                 sys-9.403-9.404-4.5.3.tgz (Jul  1  2016)
                                 sys-9.404-9.405-5.5.1.tgz (Aug 18  2016)
                                 sys-9.405-9.406-5.3.1.tgz (Sep 26  2016)
                                 sys-9.406-9.407-3.3.1.tgz (Oct  7  2016)
                                 sys-9.407-9.408-3.4.1.tgz (Nov 10  2016)
                                 sys-9.408-9.409-4.9.1.tgz (Jan  4  2017)
                                 sys-9.409-9.410-9.6.1.tgz (Feb  6  2017)
                                 sys-9.410-9.411-6.3.3.tgz (Feb  9  2017)
                                 sys-9.411-9.412-3.2.2.tgz (May 30  2017)
                                 sys-9.412-9.413-2.4.3.tgz (May 30  2017)
                                 sys-9.413-9.414-4.2.3.tgz (Jul 12  2017)
                                 sys-9.414-9.501-2.5.1.tgz (Oct  7  2017)
                                 sys-9.501-9.502-5.4.1.tgz (Oct  7  2017)
                                 sys-9.502-9.503-4.4.2.tgz (Oct  7  2017)
                                 sys-9.503-9.504-3.1.4.tgz (Nov  3  2017)
                                 sys-9.504-9.505-1.4.1.tgz (Nov  3  2017)
                                 sys-9.505-9.506-4.2.2.tgz (Dec 13  2017)
                                 sys-9.506-9.507-2.1.4.tgz (Mar 26  2018)
                                 sys-9.507-9.508-1.10.1.tgz (Mar 26  2018)
                                 sys-9.508-9.509-10.3.2.tgz (Jun  8  2018)
                                 sys-9.509-9.510-3.5.2.tgz (Aug 20  2018)
                                 sys-9.510-9.600-5.5.1.tgz (Apr 11  2019)
                                 sys-9.600-9.601-5.5.2.tgz (Apr 11  2019)
                                 sys-9.601-9.602-5.3.1.tgz (Jul 20  2019)
                                 sys-9.602-9.603-3.1.1.tgz (Jul 20  2019)
                                 sys-9.603-9.604-1.2.1.tgz (Jul 20  2019)
                                 sys-9.604-9.605-2.1.4.tgz (Oct 10  2019)
                                 sys-9.605-9.700-1.5.2.tgz (Jan 11  2020)
                                 sys-9.700-9.701-5.6.1.tgz (Mar 28  2020)
                                 sys-9.701-9.702-6.1.1.tgz (Mar 28  2020)
                                 sys-9.702-9.703-1.3.3.tgz (Sep  1  2020)
                                 sys-9.703-9.704-3.2.3.tgz (Oct 14  2020)
                                 sys-9.704-9.705-2.3.1.tgz (Oct 14  2020)
                                 sys-9.705-9.706-3.8.1.tgz (May 20  2021)
                                 sys-9.706-9.706-8.9.1.tgz (Jul  3  2021)
                                 sys-9.706-9.707-9.5.1.tgz (Sep  8  2021)
                                 sys-9.707-9.708-5.6.1.tgz (Mar 12  2022)
                                 sys-9.708-9.709-6.3.1.tgz (Mar 12  2022)
                                 sys-9.709-9.710-3.1.1.tgz (May 19 17:31)
                                 sys-9.710-9.711-1.5.1.tgz (May 19 17:32)
    Up2Dates available.........: 1
    Factory resets.............: 0
    Timewarps detected.........: 1

    <M> fw:/home/login # rpm -qa | grep dhcp
    dhcp-chroot-client-4.4.1-3.g629f991.rb5
    dhcp-chroot-server-4.4.1-3.g629f991.rb5
    ep-chroot-dhcpc-9.70-14.gde59063.rb5
    ep-chroot-dhcps-9.70-15.ge43a374.rb6
    <M> fw:/home/login # cat /var/mdw/scripts/dhcpd
    #!/bin/bash
    #
    # Copyright (C) 2005-2010 Astaro AG
    # For copyright information look at /doc/astaro-license.txt
    # or www.astaro.com/.../astaro-license.txt
    #
    # Author: Stephan Scholz <sscholz@astaro.com>
    # Maintainer: Ulrich Weber <uweber@astaro.com>
    #
    ##############################################################################

    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    PNAME="DHCP Daemon"
    PROG="dhcpd"
    NOSELFM="/etc/no-selfmonitor/dhcpd"
    CHROOT="/var/chroot-dhcps"

    function usage() {
            echo "Usage: $0 [start|stop|restart|trigger]"
            exit 1
    }

    ret_code=0
    case "$1" in
            start)
                    echo ":: Starting $PNAME"
                    PID=`pidof $PROG`
                    if  [ ! -z "$PID" ] ; then
                            echo "   $PNAME already running"
                            if [ -e $NOSELFM ] ; then
                                    echo "no-selfmonitor file exists so deleting it."
                                    rm -f $NOSELFM
                            fi
                            ret_code=1
                    else
                            read -a INTERFACES < $CHROOT/etc/dhcpd.ifaces
                            chroot $CHROOT /usr/sbin/__dhcpd -cf /etc/dhcpd.conf ${INTERFACES[@]} >/dev/null 2>&1|| ret_code=1
                            if [ $ret_code = 0 ] ; then #only remove $NOSELFM if start succeeded otherwise selfmon will keep spamming
                                    rm -f $NOSELFM
                            fi
                    fi
                    ;;

            stop)
                    echo ":: Stopping $PNAME"
                    touch $NOSELFM
                    killproc -p /var/chroot-dhcps/var/run/dhcpd.pid /var/chroot-dhcps/usr/sbin/dhcpd || ret_code=1
                    num_try=0
                    PID=`pidof $PROG`
                    while  [[ ! -z "$PID" ]] && [[ $num_try -lt 40 ]]
                    do
                            echo "   $PNAME still running"
                            sleep 0.25
                            ((num_try++))
                    done
                    ;;

            restart|trigger)
                    [ "$1" = "trigger" ] && [ -e $NOSELFM ] && exit 0
                    $0 stop ||  ret_code=1
                    $0 start $@ ||  ret_code=1
                    echo -e "\033[33m\033[1m:: Restarting $PNAME\033[m"
                    ;;

            *)
                    usage
                    ;;
    esac

    /var/mdw/scripts/retcode $ret_code
    exit $ret_code;
    <M> fw:/home/login #
    <M> fw:/home/login # cat /var/mdw-debug.log
    cat: /var/mdw-debug.log: No such file or directory
    <M> fw:/home/login # cat /var/mdw/debug.log
    cat: /var/mdw/debug.log: No such file or directory
    <M> fw:/home/login #

    <M> fw-320:/home/login # /var/mdw/scripts/dhcpd restart
    :: Stopping DHCP Daemon
    [ ok ]
    :: Starting DHCP Daemon
    [ failed ]
    :: Restarting DHCP Daemon
    [ failed ]
    <M> fw-320:/home/login #
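
The `start` branch of the script shown above sends all output of `__dhcpd` to /dev/null, so `[ failed ]` hides the actual reason. Note that this concerns the DHCP server, not the DHCP client on lag3. The sketch below only prints the equivalent command without the redirection so it can be reviewed and run by hand. That an empty `dhcpd.ifaces` (no DHCP server configured) could explain the failure is an assumption, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the command the start branch would run, minus the output redirection.
# Argument: chroot directory (defaults to the CHROOT value from the script).
dhcpd_start_cmd() {
    chroot_dir="${1:-/var/chroot-dhcps}"
    ifaces_file="$chroot_dir/etc/dhcpd.ifaces"
    if [ ! -s "$ifaces_file" ]; then
        echo "no interfaces listed in $ifaces_file" >&2
        return 1
    fi
    # Only the first line is used, as with `read -a` in the original script.
    ifaces=$(head -n 1 "$ifaces_file")
    echo "chroot $chroot_dir /usr/sbin/__dhcpd -cf /etc/dhcpd.conf $ifaces"
}

# Usage on the firewall:
#   dhcpd_start_cmd
```

Printing rather than executing keeps the decision to start the daemon on a production firewall with the administrator.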

  • <M> fw-320:/home/login # /var/mdw/scripts/dhcpd stop
    :: Stopping DHCP Daemon
    [ ok ]
    <M> fw-320:/home/login # /var/mdw/scripts/dhcpd start
    :: Starting DHCP Daemon
    [ failed ]

    <M> fw-320:/home/login # /var/mdw/scripts/dhcpd stop
    :: Stopping DHCP Daemon
    [ ok ]
    <M> fw-320:/home/login # ps aux | grep dhcp
    root     28424  0.0  0.0   5668   744 pts/0    S+   14:49   0:00 grep dhcp


    <M> fw-320:/home/login # ps aux | grep dhc
    root      7395  0.0  0.0   7352   192 ?        Ss   May19   0:03 /usr/sbin/dhcrelay -q -i lag1.500 -i lag1.2500 172.16.xxx.xxx
    root     12078  0.0  0.0   7720  2128 ?        Ss   13:55   0:00 /usr/sbin/dhclient -nw -cf /etc/lag3.conf -lf /var/db/lag3.leases -pf /var/run/dhclient_lag3.pid lag3
    root     28550  0.0  0.0   5672   744 pts/0    S+   14:49   0:00 grep dhc

  • I did an HA failover, and the lag3 WAN interface is working again. I will fail back to the previous node later and check whether the issue is also resolved on that machine.

    <M> fw:/home/login # ifconfig lag3
    lag3      Link encap:Ethernet  HWaddr 00:1A:8C:F0:22:C3
              inet addr:10.1.254.23  Bcast:10.1.254.31  Mask:255.255.255.240
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:1189715296 errors:0 dropped:615 overruns:0 frame:0
              TX packets:570845987 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:1567178041026 (1494577.4 Mb)  TX bytes:75816317166 (72304.0 Mb)

  • WAN was also fine after failing back to the original firewall (I rebooted it first).

    So it was some weirdness inside the UTM DHCP client.