This discussion has been locked.

Unable to obtain DHCP lease through VDSL 2Wire router with UTM at or above 9.354-4 unless using gigabit switch

I have a problem with my UTM and my ISP's 2Wire 3801HGV: whenever the UTM is updated to 9.354-4 or higher, it no longer seems to recognize the 2Wire's DHCP ACK response.

I am asking the community what steps I should take next. Should I get my 2Wire 3801HGV replaced with the current model 5168 so I can go back to DMZPlus mode, and just assume the 3801HGV hardware is defective? Has anyone heard of using a managed gigabit switch to allow DHCP ACKs to be seen by the UTM? Why does the DHCP ACK problem seem tied to UTM 9.354-4?

UTM 9.354-4 was released Feb 2, 2016 according to https://blogs.sophos.com/2016/02/17/utm-up2date-9-354-released. It includes a fix for 36136, the ISC DHCP security update (CVE-2015-8605), which kb.isc.org/.../AA-01334 describes as: "A badly formed packet with an invalid IPv4 UDP length field can cause a DHCP server, client, or relay program to terminate abnormally."
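
For anyone who wants to test whether this fix is involved, one approach is to check captured frames for a mismatch between the UDP length field and the IPv4 total length. The Python sketch below is only an assumption about what to look for, not the check that ISC DHCP actually performs. The function name `udp_length_report` is hypothetical, and it assumes an untagged Ethernet II frame carrying IPv4/UDP.

```python
import struct

def udp_length_report(frame: bytes) -> dict:
    """Compare the UDP length field with the IPv4 total length in one frame.

    Assumes an untagged Ethernet II frame carrying IPv4/UDP (hypothetical helper).
    """
    if len(frame) < 14 + 20:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ihl = (frame[14] & 0x0F) * 4                      # IPv4 header length in bytes
    ip_total = struct.unpack("!H", frame[16:18])[0]   # IPv4 total length field
    if frame[23] != 17:                               # IPv4 protocol field, 17 = UDP
        raise ValueError("not a UDP datagram")
    udp_start = 14 + ihl
    if len(frame) < udp_start + 8:
        raise ValueError("frame too short for UDP header")
    udp_len = struct.unpack("!H", frame[udp_start + 4:udp_start + 6])[0]
    expected = ip_total - ihl                         # UDP header + payload per IP header
    captured = len(frame) - udp_start                 # bytes present after the IP header
    return {
        "udp_length": udp_len,
        "expected_from_ip": expected,
        "captured_bytes": captured,
        # Ethernet padding may make captured_bytes larger than udp_length.
        "consistent": udp_len == expected and udp_len <= captured,
    }
```

Running this over the Offer and ACK frames from a capture would show whether any of them carry an inconsistent length, which is the kind of packet the advisory text describes.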

Background:

In the early morning of July 7, my ISP's 2Wire 3801HGV (p/n: 4201-030001-000, first used Dec 2, 2012, software version 6.3.7.55-enh.tm) stopped serving a DHCP-assigned IP to the WAN port of my UTM 9.403-4. My 2Wire was set to DMZPlus mode so that all traffic is forwarded to the WAN port of the UTM, which is plugged into LAN port 1 of the 2Wire. According to the UTM system log, prior to 2am the lease renewal time was 4.5 minutes. At 2:30am, the 2Wire suddenly stopped using DMZPlus mode (so the UTM WAN IP became an internal, non-routable NAT'ed IP from the 2Wire) and the lease renewal time was 12-15 seconds until 2:34am, when it returned to 3.9 minutes and the UTM WAN got an external routable IP. At 2:43am, the 2Wire sent a DHCPNAK, and the UTM then requested and got an internal non-routable IP, so DMZPlus mode was again no longer active. Eventually, during the night, the 2Wire got back to DMZPlus mode but....

There was a pattern where the 2Wire would give the UTM an external IP (via DHCP) and all was well for an hour. Then the 2Wire would refuse to give the UTM an IP for 35-40 minutes. Then the UTM would get the IP address and all was well again for an hour. I installed pfSense in place of the UTM, and also swapped the UTM out for off-the-shelf routers: an Asus RT-N12 running Tomato 1.28 and an Apple AirPort Base Station. All exhibited the same problem:

1. DHCP Discover is broadcast by the DLink card on the wan port of the UTM.
2. DHCP Offer is broadcast by the 2Wire box.
3. DHCP Request gets repeatedly broadcast by the DLink card on the wan port of the UTM.
4. Steps 1-3 are repeated for 40 minutes as there is no response (DHCP ACK) from the 2Wire.
5. Finally, the 2Wire processes the DHCP Request and responds with a DHCP ACK, and life goes on as the UTM binds to the offered IP.
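
The stall described in the steps above can be expressed as a small check on the order of DHCP messages seen in a capture. This is a minimal Python sketch; the names `STAGES`, `furthest_stage` and `diagnose` are made up for illustration, and the input is assumed to be a list of message names you have already read off the capture.

```python
# Normal DHCP handshake order: Discover, Offer, Request, Ack.
STAGES = ("DISCOVER", "OFFER", "REQUEST", "ACK")

def furthest_stage(messages):
    """Return the last handshake stage reached in order, or None."""
    reached = None
    next_index = 0
    for name in messages:
        if next_index < len(STAGES) and name.upper() == STAGES[next_index]:
            reached = STAGES[next_index]
            next_index += 1
    return reached

def diagnose(messages):
    """Describe where the handshake stopped (hypothetical helper)."""
    stage = furthest_stage(messages)
    if stage == "ACK":
        return "lease acknowledged"
    if stage == "REQUEST":
        return "stalled: Request sent but no Ack seen"
    if stage == "OFFER":
        return "stalled: Offer seen but no Request sent"
    if stage == "DISCOVER":
        return "stalled: Discover sent but no Offer seen"
    return "no DHCP exchange seen"

# The looping pattern from steps 1-4: Requests repeat, no Ack arrives.
print(diagnose(["DISCOVER", "OFFER", "REQUEST", "REQUEST", "DISCOVER", "OFFER", "REQUEST"]))
```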

I was able to make these observations because pfSense has a packet capture mode for the WAN port that can be enabled in the web GUI; I viewed the captures in Wireshark.

2Wire Dumb Mode (no longer a router):

On July 25, I made my ISP realize that their 2Wire box was not working properly in DMZPlus mode and they made the 2Wire dumb (so no more DMZPlus mode and the 2Wire would no longer be a router but allow the 2Wire LAN ports to connect directly to the DSLAM). I restored the UTM back to 9.351-3 from my burned DVD and restored the config from May. Everything seemed great until...

Upgrade to UTM 9.354-4 or higher:

I discovered that when I upgraded to UTM 9.354-4 or higher that the Sophos UTM would no longer get an IP address from the dumbed down 2Wire.

UTM system log:
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7
dhclient: No DHCPOFFERS received.

UTM hardware swap:

I removed the UTM (dual core 2 with 8GB RAM and 4 NICs, with the WAN NIC being a Dlink DGE-530T v11 card that was working great until July), thinking the WAN NIC was faulty, and put in its place my older UTM box (Pentium 4 with 4GB RAM and 5 NICs, with the WAN NIC being an Intel Pro 100, which was used for years and was still at 9.351-3). Everything worked great until I upgraded the old UTM box to 9.354-4. Then I could no longer get a WAN IP from the 2Wire.

Managed gigabit switch between the 2wire and UTM WAN port:

I connected a managed Netgear GS105E 5-port gigabit switch (with the idea that I would use port mirroring and Wireshark on the 2Wire port to observe packet traffic), so the 2Wire was wired to the switch and the switch was wired to the WAN port of the UTM. The UTM instantly got an IP address when connected this way. If I used a dumb gigabit switch or a 10/100 switch, I got the same result as with no switch at all: the UTM WAN port could not get an IP.

UTM system log after using the gigabit switch between the 2Wire and the UTM WAN port:
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPOFFER from 207.xxx.xxx.1
dhclient: DHCPACK from 207.xxx.xxx.1
dhclient: bound to 207.xxx.xxx.170 -- renewal in 36323 seconds.

I eventually realized that the 5-port gigabit switch was somehow fixing the problem, as the DHCP ACK problem never happened while I was using the switch, regardless of UTM version. It made no sense to port mirror the 2Wire because the problem could never be recreated while using the switch.

Using the UTM tcpdump command (sticking with the older P4 4GB RAM UTM box now at the latest 9.408-4), I was able to get wireshark capture files of when I renew the IP (in the UTM, under Interfaces & routing: Interfaces), and when I remove the gigabit switch from the equation.

Renew IP with gigabit switch in play:

tcpdump -ni eth1 port 67 or port 68 -s0 -w - > /home/login/capture_gigabitsw_renew.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C2 packets captured
2 packets received by filter
0 packets dropped by kernel

UTM System log:
2016:11:11-01:36:44 router dhclient: Killed old client process
2016:11:11-01:36:48 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-01:36:48 router dhclient: DHCPACK from 207.xxx.xxx.1
2016:11:11-01:36:50 router dhclient: bound to 207.xxx.xxx.170 -- renewal in 42778 seconds.

Wireshark showed the DHCP ACK packet and the UTM showed that it was received and applied (bound).
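
The same check can be made without Wireshark by reading the DHCP message type (option 53) out of the UDP payload of each captured packet. The sketch below is a minimal Python example; `dhcp_message_type` is a hypothetical helper, and it assumes you have already extracted the UDP payload bytes from the capture file by some other means.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"   # marks the start of DHCP options
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}

def dhcp_message_type(payload: bytes):
    """Return the DHCP message type name from a UDP payload, or None."""
    # The fixed BOOTP header is 236 bytes, followed by the 4-byte magic cookie.
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        return None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # End option
            break
        if code == 0:            # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return MESSAGE_TYPES.get(value[0])
        i += 2 + length
    return None
```

Applied to the two packets in the capture above, this should report one REQUEST and one ACK if the capture matches the system log.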

No gigabit switch between WAN and 2Wire - cable swapping - no device reboots:


router:/home/login # tcpdump -ni eth1 port 67 or port 68 -s0 -w - > /home/login/capture_withoutgigabitsw_cableswap.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C12 packets captured
12 packets received by filter
0 packets dropped by kernel

UTM System log:
2016:11:11-02:04:28 router dhclient: Killed old client process
2016:11:11-02:04:31 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:04:33 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:04:38 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:04:40 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:04:52 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
2016:11:11-02:04:57 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 10
2016:11:11-02:05:05 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:05:07 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
2016:11:11-02:05:13 router dhclient: No DHCPOFFERS received.
2016:11:11-02:05:13 router dhclient: Trying recorded lease 207.xxx.xxx.170
2016:11:11-02:05:14 router dhclient: bound: renewal in 41074 seconds.
2016:11:11-02:05:15 router dhclient: Killed old client process
2016:11:11-02:05:18 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:05:24 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:05:39 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
2016:11:11-02:05:47 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
2016:11:11-02:06:00 router dhclient: No DHCPOFFERS received.
2016:11:11-02:06:00 router dhclient: Trying recorded lease 207.xxx.xxx.170
2016:11:11-02:06:00 router dhclient: bound: renewal in 41028 seconds.
2016:11:11-02:06:06 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:07:07 router dns-resolver[4207]: DNS server failed to contact!

Wireshark showed that two DHCP Offers were sent by the 2Wire (or possibly the DSLAM, as I don't recognize the source MAC address), but the UTM was not seeing them. Then the UTM gave up and sent out DHCP Discover. I could see two DHCP ACKs being sent by the 2Wire, but the UTM just fell back to binding to the recorded lease, which did not help. The UTM then immediately went back to sending out DHCP Discover broadcasts and ignoring the DHCP Offers, all documented in the packet capture.
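
To compare the client's view with the capture, the dhclient lines in the system log can be tallied. This is a minimal Python sketch that assumes the log format shown in this thread; `summarize` and `fell_back_to_recorded_lease` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches the dhclient events that appear in the UTM system log excerpts.
EVENT = re.compile(
    r"dhclient: (DHCP[A-Z]+|No DHCPOFFERS received|Trying recorded lease|bound)"
)

def summarize(log_text):
    """Count dhclient events by type; other log lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def fell_back_to_recorded_lease(counts):
    """True when the client reused an old lease without ever logging an ACK."""
    return counts["Trying recorded lease"] > 0 and counts["DHCPACK"] == 0
```

If the log shows no DHCPOFFER or DHCPACK entries while the capture contains Offers and ACKs, the packets are reaching the interface but are not being accepted by the client.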

It did not help to power off the 2Wire for 2 minutes with the network cable unplugged from the UTM, then powering the 2Wire up and rebooting the UTM, and plugging back in the network cable.

Since the 2Wire is now a dumb device, I have no way of changing the speed/duplex of the port going from the 2Wire to the WAN port of the UTM. Changing the UTM WAN port speed/duplex did not make a difference.

I have tried changing the DHCP timeout in /var/chroot-dhcpc/etc/eth1.conf (WAN port) from the default of 20 seconds to 40 seconds, to no effect, as per: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/76599/dhcp-issues-on-external-interface-with-isp/305423#pi2132219853=2

router/root # edit /var/chroot-dhcpc/etc/eth1.conf

interface "eth1" {
    timeout 40;
    retry 60;
    script "/usr/sbin/dhcp_updown.plx";
    request subnet-mask, broadcast-address, time-offset,
        routers, domain-name, domain-name-servers, host-name,
        domain-search, nis-domain, nis-servers,
        ntp-servers;
}

As a last shot, (even though I do not suspect MTU is a problem in my case) I tried changing the UTM so it ignores the ISP's DHCP assigned MTU setting (MTU auto discovery set to 0) as per Bob Alfson's simplifications of Giovani's (actually Twister5800) fix: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/80641/sophos-utm-9-407-3-released#pi2132219853=2

router/root # cc get_object REF_IntCabExternaWan
{
'autoname' => 0,
'class' => 'interface',
'data' => {
'additional_addresses' => [],
'bandwidth' => 0,
'comment' => 'Added by installation wizard',
'inbandwidth' => 0,
'itfhw' => 'REF_ItfEthEth1',
'link' => 1,
'mtu' => 1500,
'mtu_auto_discovery' => 0,
'name' => 'External (WAN)',
'outbandwidth' => 0,
'primary_address' => 'REF_ItfPri000024',
'proxyarp' => 0,
'proxyndp' => 0,
'status' => 1
},
'hidden' => 0,
'lock' => '',
'nodel' => '',
'ref' => 'REF_IntCabExternaWan',
'type' => 'ethernet'
}

router/root # ethtool eth1
Settings for eth1:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
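
When comparing link settings across several boxes or cable arrangements, the relevant fields can be pulled out of the ethtool output with a few lines of Python. This is a minimal sketch; `parse_ethtool` is a hypothetical helper and it only looks at the four fields named in the code.

```python
def parse_ethtool(text):
    """Extract negotiated link settings from `ethtool <iface>` output."""
    wanted = {"Speed", "Duplex", "Auto-negotiation", "Link detected"}
    result = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as "100Mb/s" have none.
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:
            result[key] = value.strip()
    return result
```

For the output above, this would return the speed, duplex, auto-negotiation state and link status of eth1, which makes it easy to diff the values with and without the switch in place.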

If I use a Netgear WNDR3700 running "DD-WRT v24-sp2 (04/18/14) std pm" without the gigabit switch in between, the ISP provides the Netgear a WAN IP and all is well.

If I use the dual core 2 (8GB RAM) installed with pfSense 2.3.2 amd64, I cannot get a WAN IP without having the gigabit switch in between the 2Wire and the WAN port. Packet capture seems to suggest the same behaviour: DHCP Offers being ignored.

Summary:

Should I give up and get my ISP to give me the 2Wire 5168 (the newer model)? Should I downgrade to UTM 9.353-4, where I would no longer need a gigabit switch between the 2Wire and the UTM WAN port? Or should I continue to use a gigabit switch? Or should I go for a consumer-grade router like the Netgear, which seems to work fine without the gigabit switch? Is the fix for CVE-2015-8605 causing a problem with my ISP's DHCP server? That would explain why pfSense is behaving like the UTM (>9.353-4) whereas the old 2014 DD-WRT Netgear has no problems. But it does not explain why the gigabit switch mitigates the problem. What should be my next step in troubleshooting this?

Thanks for reading this ultra long post.



  • I am having the same problem on my own Sophos UTM. With a managed switch it works fine. Without the switch, the interface stays down and can't get a DHCP address from the PPPoE server.

    Sophos customer care can't solve the problem. Do you already have a solution?

    Love to hear from you.

    Stephan

  • Sounds like auto negotiate might be disabled or you are using a crossover cable?

    XG115W - v20.0.2 MR-2 - Home

    XG on VM 8 - v21 GA

    If a post solves your question please use the 'Verify Answer' button.

  • Nope, auto-negotiation is on. No crossover cable used.

    I've got a tip to set my hardware NIC to 1000 Mbit instead of auto-negotiation.

    Going to try this in a couple of hours!

    Regards,

    Stephan

  • I have checked the cable; the cable is fine.

    Setting the hardware to 1000 Mbit does not fix it.

    Any other thoughts?

    Regards,

    Stephan


  • Hi, Stephan, and welcome to the UTM Community!

    Guys, try #7.7 in Rulz.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • As I still have no management access to the now dumbed down ISP provided vDSL modem, I am unable to follow through on Rulz 7.7 (Make the same settings on the router/switch/modem to which the interface connects.). I wonder why it seems the problem goes away if I downgrade the UTM. Time to get the vDSL modem replaced (really it should have been 6 mo ago).

  • In the months of having this trouble, I reverted the UTM hardware back to the dual core 2 with 8GB RAM and 4 nics, but with the wan nic being an Intel 82574L and not the Dlink DGE-530T v11 NIC card. It made no difference and I still required the gigabit switch intermediary to get a WAN IP address from the dumbed down 2Wire box.

    Yesterday, my ISP replaced the 2Wire with a Pace 5168N-101 (p/n: 4201-030007-000, first used Aug 24, 2017, software version 10.5.4.527158). After setting the Pace to DMZ mode (in the Pace management site at 192.168.100.254: Settings: Firewall: Allow all applications (DMZ mode)), pointing to my Sophos UTM, running 9.411-3, everything was back to normal.

    No more need for a gigabit switch between the UTM and the Pace router for the router to give my UTM a WAN IP address. No need to manipulate NIC speed/duplex on the Pace or the UTM. Looking at the Pace's Settings: LAN: WAN IP Address Allocation, I can see the Sophos UTM being treated like a DMZ device, with an address Assignment of Public (select WAN IP mapping).

    Looking at the UTM system.log, it shows the normal, ordinary task of the UTM getting its WAN IP address from the Pace. No more endless DHCP requests. Just have to change over the WAN NIC on the UTM back to the Dlink to get everything back to normal.

    Hope this helps Stephan.

    System log:

    2017:08:24-23:50:30 sajacka dhclient: DHCPDISCOVER on eth3 to 255.255.255.255 port 67 interval 8
    2017:08:24-23:50:38 sajacka dhclient: DHCPDISCOVER on eth3 to 255.255.255.255 port 67 interval 13
    2017:08:24-23:50:38 sajacka dhclient: DHCPREQUEST on eth3 to 255.255.255.255 port 67
    2017:08:24-23:50:38 sajacka dhclient: DHCPOFFER from 192.168.100.254
    2017:08:24-23:50:38 sajacka dhclient: DHCPACK from 192.168.100.254
    2017:08:24-23:50:44 sajacka dhclient: bound to 142.xxx.xx.xxx -- renewal in 255 seconds.
    2017:08:24-23:51:43 sajacka dhclient: DHCPREQUEST on eth3 to 255.255.255.255 port 67
    2017:08:24-23:51:43 sajacka dhclient: DHCPACK from 192.168.100.254
    2017:08:24-23:51:43 sajacka dhclient: bound to 142.xxx.xx.xxx -- renewal in 295 seconds.

  • Had the same issue on a Danish Internet connection.

    Solved it by setting a host name on the WAN interface, under Advanced. Seems the DHCP server on the other end doesn't like leases with no names attached?