[I've edited to provide the final answer, so some replies, below, may no longer make total sense.]
SFOS has an edge-case defect with IPv6 PD that engineering acknowledged. I am not sure that it is being prioritized, but I will describe it here in case you run into it.
My ISP (Verizon) delegates prefixes with a 7200-second lifetime and a renewal time of 3600 seconds. At 3600 seconds, SFOS requests a renewal, but about 50% of the time the ISP declines it. After a bit of negotiation, SFOS gets a new PD, which it starts advertising in its internal RAs.
HOWEVER, SFOS does not send an RA with a 0 lifetime for the OLD, non-renewed PD to inform internal clients that the old PD is no longer valid. So the clients continue to try to use the old (established, but not deprecated by SFOS) PD and fail, and switch to IPv4.
This SFOS behavior is incorrect according to RFC 4861 (Neighbor Discovery Protocol) and RFC 4862 (Stateless Address Autoconfiguration). This should be fixed and SFOS should send an RA deprecating the non-renewed PD -- i.e. advertise lifetime 0 so the clients will flush it.
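To make the client side concrete, here is a minimal Python sketch of how, as I read RFC 4862 section 5.5.3, a host updates an existing autoconfigured address when an RA carries a prefix option for it. The class name, function name, and the 2001:db8 documentation prefix are made up for illustration; this is not SFOS or client OS code. One nuance it shows: an unauthenticated RA cannot shorten the valid lifetime once two hours or less remain, but a preferred lifetime of 0 takes effect immediately and marks the address deprecated, which should be enough to steer new connections to the new prefix.

```python
from dataclasses import dataclass

TWO_HOURS = 7200  # seconds; threshold from RFC 4862 section 5.5.3 (e)


@dataclass
class SlaacAddress:
    """Hypothetical model of one autoconfigured address on a client."""
    prefix: str
    preferred_left: int  # seconds of preferred lifetime remaining
    valid_left: int      # seconds of valid lifetime remaining

    @property
    def deprecated(self) -> bool:
        return self.preferred_left == 0


def apply_prefix_option(addr: SlaacAddress, adv_preferred: int, adv_valid: int) -> SlaacAddress:
    """Update an existing address from an unauthenticated RA prefix option."""
    if adv_valid > TWO_HOURS or adv_valid > addr.valid_left:
        addr.valid_left = adv_valid
    elif addr.valid_left <= TWO_HOURS:
        pass  # the advertised valid lifetime is ignored
    else:
        addr.valid_left = TWO_HOURS
    # The preferred lifetime is always reset to the advertised value.
    addr.preferred_left = adv_preferred
    return addr


# Old prefix one hour into a 7200-second lifetime, then an RA advertising 0/0 for it.
old = SlaacAddress("2001:db8:aaaa::/64", preferred_left=3600, valid_left=3600)
apply_prefix_option(old, adv_preferred=0, adv_valid=0)
print(old.deprecated, old.valid_left)  # True 3600
```

Without any such RA the address simply keeps counting down, so the client goes on treating the old prefix as preferred for the rest of the hour.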
There is actually a more fundamental bug: SFOS is given T1/T2 of 7200 when it gets a prefix delegation, but when SFOS requests renewal it supplies a T2 of 7500, which Verizon can legitimately reject by refusing to renew. So the failure to renew is probably Verizon's Juniper router rejecting the erroneous renewal request rather than withdrawing the prefix itself. There may be other errors in SFOS' renewal request as well.
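When reading the PCAPs, the comparison described here boils down to something like the Python sketch below. It is only a helper for eyeballing captured values, not a statement of what a DHCPv6 server must accept. The function name and dictionary keys are hypothetical, and the only numbers used are the ones quoted above (T1/T2 of 7200 granted, T2 of 7500 in the renewal).

```python
def timer_mismatches(granted: dict, renew: dict) -> dict:
    """Map each timer present in both messages to (granted, renew) when they differ."""
    return {
        name: (granted[name], renew[name])
        for name in granted
        if name in renew and granted[name] != renew[name]
    }


granted = {"t1": 7200, "t2": 7200}  # values supplied with the delegation
renew = {"t2": 7500}                # value in the renewal; its T1 is not stated above
print(timer_mismatches(granted, renew))  # {'t2': (7200, 7500)}
```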
Sometimes, Verizon ignores the error or it ignores SFOS' incorrect attempt to get a new prefix (because it already has one), and this can happen for 10-12 hours at a stretch, but eventually, SFOS insists on a new prefix and gets it -- then fails to inform the clients that the old prefix is no longer valid.
Until Sophos fixes these two errors, those of us whose ISPs strictly check delegation renewal requests (which may be the default on Juniper routers, and Juniper is a major player at large ISPs) will be unable to use IPv6. Verizon is PD-only and does not supply a GUA for the firewall itself, so if PD doesn't work, there's no IPv6.
Hi Wayne Folta, the investigation with the Development team is currently in progress under ID NC-154795, using the submitted data and logs. Our Global Escalation Specialist team will provide updates on the ongoing support case based on the feedback received from the Development team.
Regards,
Vishal Ranpariya
Technical Account Manager | Global Customer Experience
Sophos Support Videos | Knowledge Base  |  @SophosSupport | Sign up for SMS Alerts |
If a post solves your question, use the 'Verify Answer' link.
Thanks! I've gathered more PCAPs at their request and submitted.
It's interesting that sometimes we get on a roll and have 8-12 hours of successful IPv6 PD renewals, so everything works. Then three out of four renewals over the next few hours fail. The bug is not that they fail (that could be another bug, or the ISP could be doing something semi-standard that causes the failure) but that the downstream clients believe the prefix has 100+ minutes of valid lifetime when it doesn't.
Wayne Folta That's a strange one; thanks for the additional update. Let us await the team's feedback after they have examined the PCAP and logs to determine the current status.
Regards,
Vishal Ranpariya
My ISP uses Juniper firewalls and there's something in the interaction between SFOS and Juniper that's causing the issue.
So the IPv6 PD lease time from the ISP is two hours. At one hour, SFOS tries to renew and gets an error back from the Juniper that basically says, "You don't have a lease, can't renew". So SFOS sends two PD lease requests. Sometimes the Juniper hears them and gives a new prefix. Sometimes it seems to not hear and it eventually broadcasts the original prefix, valid for another two hours.
So when you "get lucky" you get the same prefix back -- or maybe continue to keep that prefix. This can go on each hour for 12+ hours, so IPv6 from the clients seems to be working perfectly. Eventually, you get unlucky and get a new prefix. Unfortunately, while SFOS sees a lifetime of 0 for the lease in the initial handshake (with the error), it does not inform clients, so the clients believe they have that prefix for another hour.
The clients also see the new prefix RA, so they think they have two, but they prefer the first one -- remember, it's supposed to be good for another hour -- and IPv6 is not usable by the clients, which makes IPv6 unreliable. I can't tell if Juniper really does think the prefix must be insta-expired and will no longer forward traffic from that prefix, or if SFOS takes the 0 lifetime seriously and stops any traffic on the prefix. (This is an important distinction that we'll never know.)
All that to say, Sophos has investigated and decided that this is a feature request: when you get a 0 lifetime on a delegated prefix, throw an RA for that prefix with 0 lifetime to the clients so they don't use it anymore.
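The requested behaviour can be sketched as follows. This is a hypothetical illustration in Python, not how SFOS builds its RAs: the function name, dictionary keys, example prefixes (from the 2001:db8::/32 documentation range) and the lifetimes given to the new prefix are all assumptions.

```python
def prefix_options_after_change(new_prefix, old_prefix, preferred, valid):
    """Prefix options for the next RA after the delegated prefix changed."""
    options = [
        {"prefix": new_prefix, "preferred_lifetime": preferred, "valid_lifetime": valid},
    ]
    if old_prefix is not None and old_prefix != new_prefix:
        # Advertise the stale prefix with zero lifetimes so clients stop using it.
        options.append(
            {"prefix": old_prefix, "preferred_lifetime": 0, "valid_lifetime": 0}
        )
    return options


options = prefix_options_after_change(
    new_prefix="2001:db8:2:100::/64",   # made-up replacement prefix
    old_prefix="2001:db8:1:100::/64",   # made-up prefix that was not renewed
    preferred=3600,                     # illustrative lifetimes only
    valid=7200,
)
for option in options:
    print(option)
```

If the prefix did not change (the "lucky" renewals), the function returns only the one normal option, so nothing is deprecated by mistake.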
I think there's more to it than this, and Juniper documents seem to indicate very specific/peculiar behavior on their part when you're getting both a delegated prefix AND your own IPv6 /64 (for the firewall) and the order in which they occur. And I'm thinking that this is a timing issue that's exacerbated by the short (two-hour) lease time that my ISP is using.
I can't believe that IPv6 PD works at any large ISP -- who would probably be using Juniper nowadays -- but evidently I'm the only person seeing it. Perhaps because my ISP recently brought IPv6 PD to my area and has the short lifetimes while they're shaking out the bugs. (Or maybe it was a while ago and they never bumped the lifetimes back up to a day or two.)
My case was escalated to engineering and they say nothing is wrong. Which I want to believe because surely you're doing IPv6 PD all over the world. And engineering was quite professional and helpful, as always.
At the same time, my ISP (Verizon) is huge and is using industrial-strength routers (Juniper) and isn't baking its own IPv6 PD solution. So how could IPv6 PD work for probably hundreds of thousands of other folks, but not Sophos-using me?
Sophos engineering had me capture a PCAP from the WAN and it shows something going on with the DHCPv6 and RAs. There are status messages saying that there is no IA_NA or prefix delegated, which in my rookie reading indicates that the Juniper is telling the Sophos -- in response to a renewal request -- "Nope, never heard of you, there is no prefix delegated." Which would be a mistake.
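For anyone else reading a similar capture, these are the DHCPv6 status codes defined in RFC 8415 that can appear in a Status Code option. The lookup below is just a reading aid in Python; the helper name is made up, and it is an assumption (not confirmed from the capture) that the message described above corresponds to NoBinding.

```python
# Status codes 0-6 as defined in RFC 8415; later RFCs add more that are not listed here.
DHCPV6_STATUS = {
    0: ("Success", "request succeeded"),
    1: ("UnspecFail", "failure, reason unspecified"),
    2: ("NoAddrsAvail", "server has no addresses available for the IA"),
    3: ("NoBinding", "server has no record of the client's binding"),
    4: ("NotOnLink", "prefix is not appropriate for the link"),
    5: ("UseMulticast", "client must resend using multicast"),
    6: ("NoPrefixAvail", "server has no prefixes available for the IA_PD"),
}


def describe_status(code: int) -> str:
    """Return a one-line description of a DHCPv6 status code."""
    name, meaning = DHCPV6_STATUS.get(code, ("Unknown", "not one of the codes listed here"))
    return f"{code} {name}: {meaning}"


print(describe_status(3))
print(describe_status(6))
```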
Verizon's tech support is horrible, but I finally found an expert on a forum who looked at the PCAP and says nothing unusual is going on on the part of Juniper. (Sophos says that two-hour lifetimes for leases is unusual, but...)
So I'm stuck. Sophos says it's not their DHCPv6 PD client, that Juniper is doing something weird. Others say Juniper isn't doing anything weird, Sophos is misunderstanding what's going on.
Two last facts: I did find Juniper documentation that seems to indicate that there are two transactions: one to get your /64 and one to get your /56, and if they happen in the wrong order (or something), you'll get an error. I think that's what it's saying.
Second, things can work for 12 hours at a time before breaking down and failing every hour (half-life renewal). So it could well be a race condition or something like that. Fortunately I can use IPv4, but I was hoping to switch to IPv6. Oh well.
Hi,
have you tried without PD, just plain DHCP on the WAN link?
Ian
XGS118 - v22.0 EAP
XG115 converted to software licence v21.5.0
If a post solves your question please use the 'Verify Answer' button.
Hadn't tried it. I'd have to turn on MASQ NAT for IPv6 and basically run like IPv4?
The next question is, what is the DHCP time to live rather than the PD time to live? Do they assign fixed IPv6 address ranges that you can use rather than PD assignments?
But, at least you get DHCP working and better internet access control.
Ian
rfcat_vk I think the issue is that Verizon does PD only. It does not allocate an IPv6 GUA to the Sophos WAN port, and even though what Verizon does is a standard way of doing things, SFOS is confused by the fact that its WAN port has a Link-local IPv6 address (not a GUA) and basically ignores the port.
I haven't figured out a way to cause all GUAs (IPv6 2000::/3) to be routed through the Verizon gateway via Link-local address. Somehow, SFOS is cool with (sometimes) getting PD via the link-local address but it: a) doesn't display it in the GUI, and b) doesn't allow me to route anything through it.
I tried setting up a static IPv6 route, but it gets an error because SFOS misunderstands link-local addresses and treats its link-local address as /64 rather than the /10 that it is. So it gives an error when I try to enter the ISP's link-local address. (Which is in the same /10, by definition, but not in the same meaningless /64.)
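The subnet mismatch is easy to reproduce with Python's standard ipaddress module. The two addresses below are made up (they are not the real ones); the point is only that two link-local addresses can share fe80::/10 while falling in different /64s, which is the kind of check that appears to trip the static route.

```python
import ipaddress

# Hypothetical addresses, for illustration only.
firewall_wan = ipaddress.ip_interface("fe80::1/64")   # firewall's link-local, treated as a /64
isp_gateway = ipaddress.ip_address("fe80:0:0:1::1")   # ISP gateway's link-local

same_64 = isp_gateway in firewall_wan.network               # on-link if link-local is a /64?
same_10 = isp_gateway in ipaddress.ip_network("fe80::/10")  # on-link if link-local is the /10?

print(same_64, same_10, isp_gateway.is_link_local)  # False True True
```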
I tried setting up an SD-WAN route and it does see some traffic go across it -- though I suspect this is only the Gateway check to make sure it's up.
Maybe I'm just doing this wrong, but I think Sophos only handles three out of four IPv6 setups and doesn't work properly for a situation where the ISP provides PD-only.
Interesting answer, thank you.
The three ISP/RSPs I have used in Australia all provide a /64 for the WAN link and a link-local gateway address; that was why I asked about DHCP.
Ian
I've rewritten my original post so it's more accessible.
SFOS has two bugs that are causing my issue, one of which was acknowledged by engineering.
1. If SFOS can't renew the PD and negotiates a new one, it is supposed to send an RA with the old PD and a lifetime of 0 to deprecate that PD for clients so they no longer use it. SFOS doesn't do this, so the clients think the PD is valid and IPv6 fails.
2. When renewing the PD, which Verizon had insisted had T1 and T2 lifetimes of 7200 (seconds), SFOS requests a T2 lifetime of 7500 seconds, which is improper. So Verizon may be rejecting the renewal request, but not invalidating the PD itself. This jibes with my observation that sometimes things work for 12 hours.
Engineering found issue #1 during their debugging, though I don't know if they've prioritized fixing it.
I also think SFOS doesn't fully support PD-only IPv6, where the only gateway address you have is link-local. The basics work, but you get weird inconsistencies in the GUI and your ability to set certain parameters, etc.
When you say a /64, do you mean a routable (GUA) address, or a link-local address (which I understand are actually /10) that the Sophos assumes is a /64? Is that why you've used NAT in the past for IPv6: you have internal IPv6 assigned however you want, and you NAT them to the /64 range of your Sophos address?
I ask because if I do an ifconfig on the Sophos, it lists its own link-local IPv6 as /64. (This caused one workaround that I wanted to use to not work, since if you look at link-locals as /64, the Sophos and the ISP gateway appear to be on different subnets.)
LuCar Toni I've updated my original posting with two confirmed bugs in SFOS (v21.5, and all prior versions as far as I know) handling of IPv6 PD. I've also posted them as Feedback from the firewall itself. I'm not sure that opening a ticket will help since I've previously had a ticket, sent PCAPs, etc., and the first bug was acknowledged by engineering -- though I am not sure whether it was entered as a bug or prioritized. The second bug would be new.
Actually, there's also a third potential issue: if SFOS jumps the gun slightly and asks for a renewal at 3590 seconds (rather than 3600), as I understand it, the ISP could reject the renewal request in a similar manner to what we're seeing. Since the PD is sometimes stable for up to 12 hours, this could be a race condition and the ISP's Juniper router is being a bit pedantic -- even though it's allowed to reject an early renewal.
There are two potential reasons for the renewal rejection: 1) literally the ISP is withdrawing the lease, or 2) the ISP is rejecting the renewal request because it is improper (7500 time to live, early, etc) and the delegated prefix is still valid, and could be renewed if the renewal request were proper.
There are also some interface (CLI, GUI) issues, I think, in SFOS when the ISP is PD-only and uses link-local addresses for communication between the ISP router and SFOS. No GUA IPv6 is assigned to either, which I think SFOS doesn't like in various places. (And I think that SFOS may also treat a link-local address as a /64 -- at least ifconfig does -- when in fact it's a /10.)