This discussion has been locked.

Unable to obtain DHCP lease through VDSL 2Wire router with UTM at or above 9.354-4 unless using gigabit switch

I have a problem with my UTM in conjunction with my ISP's 2Wire 3801HGV: whenever the UTM is updated to 9.354-4 or higher, the 2Wire's DHCP ACK response does not seem to be recognized by the UTM.

I am asking the community what steps I should take next. Should I get my 2Wire 3801HGV replaced with the current model 5168 so I can go back to DMZPlus mode, and just assume the 3801HGV hardware is defective? Has anyone heard of using a managed gigabit switch to allow DHCP ACKs to be seen by the UTM? Why does the DHCP ACK problem seem tied to UTM 9.354-4?

According to https://blogs.sophos.com/2016/02/17/utm-up2date-9-354-released, UTM 9.354-4 was released Feb 2, 2016 and includes a fix for 36136, the ISC DHCP security update (CVE-2015-8605). According to kb.isc.org/.../AA-01334, that update is a fix for "A badly formed packet with an invalid IPv4 UDP length field can cause a DHCP server, client, or relay program to terminate abnormally."

Background:

Early in the morning of July 7, my ISP's 2Wire 3801HGV (p/n: 4201-030001-000, first used Dec 2, 2012, software version 6.3.7.55-enh.tm) stopped serving a DHCP-assigned IP to the WAN port of my UTM 9.403-4. My 2Wire was set to DMZPlus mode so that all traffic gets forwarded to the WAN port of the UTM, which is plugged into LAN port 1 of the 2Wire. According to the UTM system log, prior to 2am the lease renewal time was 4.5 minutes. At 2:30am the 2Wire suddenly stopped using DMZPlus mode (so the IP assigned to the UTM WAN became the 2Wire's internal non-routable NAT'ed IP), and the lease renewal time was 12-15 seconds until 2:34am, when it returned to 3.9 minutes and the UTM WAN got an external routable IP. At 2:43am the 2Wire sent a DHCPNAK, and the UTM then requested and got an internal non-routable IP, so DMZPlus mode was again no longer active. Eventually, during the night, the 2Wire got back to DMZPlus mode but....

There was a pattern where the 2Wire would give the UTM an external IP (via DHCP) and all was well for an hour. Then the 2Wire would refuse to give the UTM an IP for 35-40 minutes. Then the UTM would get the IP address and all was well again for an hour. I installed pfSense instead of UTM, and even swapped out the UTM for off-the-shelf routers such as an Asus RT-N12 running Tomato 1.28 and an Apple Airport Base Station. All exhibited the problem where:

1. DHCP Discover is broadcast by the DLink card on the wan port of the UTM.
2. DHCP Offer is broadcast by the 2Wire box.
3. DHCP Request gets repeatedly broadcast by the DLink card on the wan port of the UTM.
4. Steps 1-3 are repeated for 40 minutes as there is no response (DHCP ACK) from the 2Wire.
5. Finally, the 2Wire processes the DHCP Request and responds with a DHCP ACK, and life goes on as the UTM binds to the offered IP.
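
To make the stall in the steps above easier to spot in a long capture, here is a minimal Python sketch that walks an ordered list of DHCP message names and reports where the exchange stopped. The function name and the outcome strings are made up for illustration, and it only models the simple client view described above (including a renewal that starts directly with a Request); it is not how dhclient tracks state internally.

```python
EXPECTED = ("DISCOVER", "OFFER", "REQUEST", "ACK")

def handshake_outcome(messages):
    """Classify an ordered list of DHCP message names seen on the wire."""
    stage = 0  # index into EXPECTED of the message we are waiting for
    for msg in messages:
        if msg == "NAK":
            return "rejected"
        if msg == EXPECTED[stage]:
            stage += 1
            if stage == len(EXPECTED):
                return "bound"
        elif msg == "DISCOVER":
            stage = 1  # client gave up and started over; now waiting for an OFFER
        elif msg == "REQUEST" and stage == 0:
            stage = 3  # renewal of a remembered lease skips DISCOVER/OFFER
        # anything else (e.g. a retransmitted REQUEST) leaves the stage unchanged
    return "stalled waiting for " + EXPECTED[stage]
```

Feeding it the sequence from steps 1-4 (Discover, Offer, then repeated Requests) reports that the exchange stalled waiting for the ACK.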

I was able to make these observations because pfSense has a packet capture mode for the WAN port that can be enabled in the web GUI, and I used Wireshark to view the capture.

2Wire Dumb Mode (no longer a router):

On July 25, I made my ISP realize that their 2Wire box was not working properly in DMZPlus mode and they made the 2Wire dumb (so no more DMZPlus mode and the 2Wire would no longer be a router but allow the 2Wire LAN ports to connect directly to the DSLAM). I restored the UTM back to 9.351-3 from my burned DVD and restored the config from May. Everything seemed great until...

Upgrade to UTM 9.354-4 or higher:

I discovered that once I upgraded to UTM 9.354-4 or higher, the Sophos UTM would no longer get an IP address from the dumbed-down 2Wire.

UTM system log:
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7
dhclient: No DHCPOFFERS received.

UTM hardware swap:

Thinking the WAN NIC was faulty, I removed the UTM (dual core 2 with 8GB RAM and 4 NICs, with the WAN NIC being a D-Link DGE-530T v11 card that was working great until July) and put in its place my older UTM box (Pentium 4 with 4GB RAM and 5 NICs, with the WAN NIC being an Intel Pro 100, which was used for years and was still at 9.351-3). Everything worked great until I upgraded the old UTM box to 9.354-4. Then I could no longer get a WAN IP from the 2Wire.

Managed gigabit switch between the 2wire and UTM WAN port:

I connected a managed Netgear GS105E 5-port gigabit switch (with the idea that I would use port mirroring and Wireshark on the 2Wire port to observe packet traffic), so the 2Wire was wired to the switch and the switch was wired to the WAN port of the UTM. The UTM instantly got an IP address when connected this way. If I used a dumb gigabit switch or a 10/100 switch, I got the same result as with no switch at all: the UTM WAN port could not get an IP.

UTM system log after using the gigabit switch between the 2Wire and the UTM WAN port:
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
dhclient: DHCPOFFER from 207.xxx.xxx.1
dhclient: DHCPACK from 207.xxx.xxx.1
dhclient: bound to 207.xxx.xxx.170 -- renewal in 36323 seconds.

I eventually realized that the 5-port gigabit switch was somehow fixing the problem, as the DHCP ACK problem never happened while I was using the switch, regardless of UTM version. It made no sense to port mirror the 2Wire because the problem could never be recreated while using the switch.

Using the UTM tcpdump command (sticking with the older P4 4GB RAM UTM box now at the latest 9.408-4), I was able to get wireshark capture files of when I renew the IP (in the UTM, under Interfaces & routing: Interfaces), and when I remove the gigabit switch from the equation.

Renew IP with gigabit switch in play:

tcpdump -ni eth1 port 67 or port 68 -s0 -w - > /home/login/capture_gigabitsw_renew.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C2 packets captured
2 packets received by filter
0 packets dropped by kernel

UTM System log:
2016:11:11-01:36:44 router dhclient: Killed old client process
2016:11:11-01:36:48 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-01:36:48 router dhclient: DHCPACK from 207.xxx.xxx.1
2016:11:11-01:36:50 router dhclient: bound to 207.xxx.xxx.170 -- renewal in 42778 seconds.

Wireshark showed the DHCP ACK packet and the UTM showed that it was received and applied (bound).

No gigabit switch between WAN and 2Wire - cable swapping - no device reboots:


router:/home/login # tcpdump -ni eth1 port 67 or port 68 -s0 -w - > /home/login/capture_withoutgigabitsw_cableswap.pcap
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C12 packets captured
12 packets received by filter
0 packets dropped by kernel

UTM System log:
2016:11:11-02:04:28 router dhclient: Killed old client process
2016:11:11-02:04:31 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:04:33 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:04:38 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:04:40 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:04:52 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
2016:11:11-02:04:57 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 10
2016:11:11-02:05:05 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:05:07 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6
2016:11:11-02:05:13 router dhclient: No DHCPOFFERS received.
2016:11:11-02:05:13 router dhclient: Trying recorded lease 207.xxx.xxx.170
2016:11:11-02:05:14 router dhclient: bound: renewal in 41074 seconds.
2016:11:11-02:05:15 router dhclient: Killed old client process
2016:11:11-02:05:18 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:05:24 router dhclient: DHCPREQUEST on eth1 to 255.255.255.255 port 67
2016:11:11-02:05:39 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
2016:11:11-02:05:47 router dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
2016:11:11-02:06:00 router dhclient: No DHCPOFFERS received.
2016:11:11-02:06:00 router dhclient: Trying recorded lease 207.xxx.xxx.170
2016:11:11-02:06:00 router dhclient: bound: renewal in 41028 seconds.
2016:11:11-02:06:06 router dns-resolver[4207]: DNS server failed to contact!
2016:11:11-02:07:07 router dns-resolver[4207]: DNS server failed to contact!
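
For comparing long logs like the one above with the working case, a small Python sketch along these lines can tally the dhclient events. The regular expression only covers the message strings that appear in the logs in this post, and the function names are my own, so treat it as a starting point rather than a complete dhclient log parser.

```python
import re
from collections import Counter

# Matches only the dhclient messages that show up in the logs in this post.
EVENT_RE = re.compile(
    r"dhclient: (DHCP[A-Z]+|No DHCPOFFERS received|Trying recorded lease|bound)"
)

def summarize_dhclient(lines):
    """Count dhclient events in an iterable of system log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def server_was_heard(counts):
    """True if dhclient logged at least one reply (offer, ACK or NAK) from a server."""
    return any(counts[name] for name in ("DHCPOFFER", "DHCPACK", "DHCPNAK"))
```

Run against the log above, `server_was_heard` comes back false because dhclient never logs a DHCPOFFER or DHCPACK; run against the log taken with the gigabit switch in place, it comes back true.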

Wireshark showed that two DHCP Offers were being sent out by the 2Wire (or possibly the DSLAM, as I don't recognize the source MAC address), but the UTM was not seeing them. Then the UTM gave up and sent out a DHCP Discover. I could see two DHCP ACKs being sent by the 2Wire, but the UTM just wanted to bind to the recorded lease, which did not help. And then the UTM immediately went back to sending out DHCP Discover broadcasts and ignoring the DHCP Offers, all documented in the packet capture.
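
As a possible next troubleshooting step, the capture files could also be checked outside of Wireshark. The Python sketch below reads a classic pcap file (the format tcpdump writes), and for each DHCP frame reports the source MAC, the DHCP message type (option 53), and whether the UDP length field fits inside the IPv4 packet. That last check is only my guess at what might matter, given that the CVE-2015-8605 fix concerns the UDP length field; it is not a reproduction of the actual ISC check. It assumes untagged Ethernet/IPv4 frames, and the function names are hypothetical.

```python
import struct

DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
              5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}

def iter_pcap_frames(data):
    """Yield raw frames from the bytes of a classic (non-pcapng) capture file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack_from(endian + "I", data, offset + 8)[0]
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len

def describe_dhcp_frame(frame):
    """Return (src_mac, dhcp_type, udp_len_ok), or None if not IPv4/UDP on port 67/68."""
    if len(frame) < 14 + 20 + 8 or frame[12:14] != b"\x08\x00":
        return None
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[14 + 9] != 17 or len(frame) < 14 + ihl + 8:
        return None
    ip_total = struct.unpack_from("!H", frame, 14 + 2)[0]
    udp_off = 14 + ihl
    sport, dport, udp_len = struct.unpack_from("!HHH", frame, udp_off)
    if not {sport, dport} & {67, 68}:
        return None
    # Assumed sanity rule: UDP length covers its own header and fits in the IP packet.
    udp_len_ok = 8 <= udp_len <= ip_total - ihl
    # Skip the UDP header, the 236-byte BOOTP fixed part and the 4-byte magic cookie.
    options = frame[udp_off + 8 + 240:]
    dhcp_type = None
    i = 0
    while i + 1 < len(options) and options[i] != 255:  # 255 = end option
        if options[i] == 0:  # pad option has no length byte
            i += 1
            continue
        length = options[i + 1]
        if options[i] == 53 and length == 1 and i + 2 < len(options):
            dhcp_type = DHCP_TYPES.get(options[i + 2], "UNKNOWN")
            break
        i += 2 + length
    src_mac = ":".join(f"{b:02x}" for b in frame[6:12])
    return src_mac, dhcp_type, udp_len_ok
```

To use it, read a capture such as /home/login/capture_withoutgigabitsw_cableswap.pcap in binary mode, pass the bytes to `iter_pcap_frames`, and print `describe_dhcp_frame` for each frame. If the Offers and ACKs that the UTM ignores show `udp_len_ok` as false, or differ from the ones seen with the switch in place, that would be worth reporting.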

It did not help to power off the 2Wire for 2 minutes with the network cable unplugged from the UTM, then powering the 2Wire up and rebooting the UTM, and plugging back in the network cable.

Since the 2Wire is now a dumb device, I have no way of changing the speed/duplex of the port going from the 2Wire to the WAN port of the UTM. Changing the UTM WAN port speed/duplex did not make a difference.

I have tried changing the DHCP timeout in /var/chroot-dhcpc/etc/eth1.conf (WAN port) from the default of 20 seconds to 40 seconds, to no effect, as per: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/76599/dhcp-issues-on-external-interface-with-isp/305423#pi2132219853=2

router/root # edit /var/chroot-dhcpc/etc/eth1.conf

interface "eth1" {
timeout 40;
retry 60;
script "/usr/sbin/dhcp_updown.plx";
request subnet-mask, broadcast-address, time-offset,
routers, domain-name, domain-name-servers, host-name,
domain-search, nis-domain, nis-servers,
ntp-servers;

}

As a last shot, (even though I do not suspect MTU is a problem in my case) I tried changing the UTM so it ignores the ISP's DHCP assigned MTU setting (MTU auto discovery set to 0) as per Bob Alfson's simplifications of Giovani's (actually Twister5800) fix: https://community.sophos.com/products/unified-threat-management/f/hardware-installation-up2date-licensing/80641/sophos-utm-9-407-3-released#pi2132219853=2

router/root # cc get_object REF_IntCabExternaWan
{
'autoname' => 0,
'class' => 'interface',
'data' => {
'additional_addresses' => [],
'bandwidth' => 0,
'comment' => 'Added by installation wizard',
'inbandwidth' => 0,
'itfhw' => 'REF_ItfEthEth1',
'link' => 1,
'mtu' => 1500,
'mtu_auto_discovery' => 0,
'name' => 'External (WAN)',
'outbandwidth' => 0,
'primary_address' => 'REF_ItfPri000024',
'proxyarp' => 0,
'proxyndp' => 0,
'status' => 1
},
'hidden' => 0,
'lock' => '',
'nodel' => '',
'ref' => 'REF_IntCabExternaWan',
'type' => 'ethernet'
}

router/root # ethtool eth1
Settings for eth1:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
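
When comparing link settings across several boxes or cables, it can help to pull just the negotiated values out of the ethtool output. This is a rough Python sketch that assumes the plain "Key: value" layout shown above; the function name is made up, and continuation lines without a colon (such as the second line of each link mode list) are simply skipped.

```python
def parse_ethtool(text):
    """Collect 'Key: value' pairs from ethtool output into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines with no colon and headers like 'Settings for eth1:'.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings
```

For the output above, the resulting dict has "Speed" set to "100Mb/s", "Duplex" set to "Full" and "Link detected" set to "yes", which makes it easy to diff the direct connection against the connection through the gigabit switch.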

If I use a Netgear WNDR3700 running "DD-WRT v24-sp2 (04/18/14) std pm" without the gigabit switch in between, the ISP provides the Netgear a WAN IP and all is well.

If I use the dual core 2 (8GB RAM) installed with pfSense 2.3.2 amd64, I cannot get a WAN IP without having the gigabit switch in between the 2Wire and the WAN port. Packet capture seems to suggest the same behaviour: DHCP Offers being ignored.

Summary:

Should I give up and get my ISP to give me the 2Wire 5168 (the newer model)? Should I downgrade to UTM 9.353-4, where I would no longer need to use a gigabit switch in between the 2Wire and the UTM WAN port? Or should I continue to use a gigabit switch? Or should I go for a consumer router like the Netgear, which seems to work fine without the gigabit switch? Is the fix for CVE-2015-8605 causing a problem with my ISP's DHCP server? That would explain why pfSense is behaving like the UTM (>9.353-4) whereas the old 2014 DD-WRT Netgear has no problems. But it does not explain why the gigabit switch is mitigating the problem. What should be my next step in troubleshooting this?

Thanks for reading this ultra long post.


