Random disconnection from the Internet: possibly from UTM?

Greetings everyone,

I've been working hard on a company network problem and need some advice.

Our clients are experiencing random drops off the Internet. It's like DNS stops resolving websites. When a client experiences this I can ping inside the network, and also the LAN interface of the UTM, but not out to the Internet.  Something is randomly blocking them from getting online.

This sometimes affects all users... but mainly randomly one or two at a time.

Any advice would be greatly appreciated.



  • How long has this been an issue? 

    Did you check out BAlfson's Rulz sticky post for DNS related issues? 

    Did it start when you implemented the UTM? 

    Did it start after some kind of update to the UTM or hardware update by the ISP? 

    What type of connection do you have through the ISP?

    Have you checked any of the UTM logs to locate any errors in the logs? 

    Did you check the Kernel message log for anything 'e1000' related (this could be a NIC issue)?

    What changes have you made to the UTM configuration if any?

    Do you have the correct UTM appliance to handle the # of clients and internet traffic?

    I have some questions, as you can tell. :)

    OPNSense 64-bit | Intel Xeon 4-core v3 1225 3.20Ghz
    16GB Memory | 500GB SSD HDD | ATT Fiber 1GB
    (Former Sophos UTM Veteran, Former XG Rookie)

  • Thank you for your reply Amodin,

    How long has this been an issue? 

         It started yesterday morning.

    Did you check out BAlfson's Rulz sticky post for DNS related issues? 

         I have in the past, I will review them again.

    Did it start when you implemented the UTM? 

         No, we've had Sophos UTMs for years.

    Did it start after some kind of update to the UTM or hardware update by the ISP? 

         No update to UTM, but we did upgrade our Internet speed.

    What type of connection do you have through the ISP?

         1Gbps Fiber

    Have you checked any of the UTM logs to locate any errors in the logs? 

          Yes, I haven't found any errors    

    Did you check the Kernel message log for anything 'e1000' related (this could be a NIC issue)?

         Yes, none found, but quite a few "IPv4: martian source" entries

    What changes have you made to the UTM configuration if any?

        None

    Do you have the correct UTM appliance to handle the # of clients and internet traffic?

       Yes

  • Guys,

    Great job of posting a problem, asking the range of questions needing an answer and responding.

    I'll make a WAG and blame the ISP's equipment.  Does doing #7.7 in Rulz (last updated 2021-02-16) resolve this?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Thank you Bob for your reply, I will review. 

    I called support last night for assistance. The conclusion was that since I was able to connect a laptop directly to the LAN interface and access the Internet that the UTM wasn't to blame. 

    I replaced our main Cisco smart switch, which helped, but after logging onto our Remote Desktop server I found it was not resolving websites. I switched from Google DNS forwarders to my ISP's DNS and it immediately resolved. There are lots of weird little network issues, like our server NIC team displaying "Limited" access even though I have gigabit speeds.

    I seem to be finding symptoms of a problem rather than the cause. I'm at work today searching for something plugged into the network that may be causing these issues. I'll also look at Rulz 7.7.

    Thanks very much, I'll report back later!

  • Right now none of my standalone PCs and laptops can access the Internet. Our Remote Desktop server clients are online and things seem fine.

  • Have you tried using 1.1.1.1 as your DNS?  Flush the DNS cache on the UTM any time you make a DNS change as well; it will help with resolution issues.
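
    To compare forwarders directly from an affected client, a minimal sketch along these lines sends the same A-record query to each candidate resolver over UDP/53 and reports which ones answer. The function names, the fixed query ID, and the test hostname are illustrative assumptions, not anything taken from the UTM.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary fixed ID for this sketch


def build_query(hostname: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def resolver_answers(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """Return True if `server` replies to the query with RCODE 0 (no error)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == QUERY_ID and (flags & 0x000F) == 0
```

    For example, calling `resolver_answers("1.1.1.1", "example.com")` and `resolver_answers("8.8.8.8", "example.com")` during an outage would show whether one forwarder is reachable while the other is not.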

    Martian sources are usually packets with unroutable or otherwise implausible source addresses.  I am wondering if you have something rogue on your LAN, a weird config in a switch, one of those specific PCs or laptops causing the issue, or a bad port causing a problem - maybe even a duplicate MAC address somewhere. As weird and rare as it is, I've seen identical MACs on NICs a total of five times.  One of the issues we found in the LAN caused a broadcast storm in a Cisco environment, saturating a 2GB backbone, and it was a Cisco design flaw for supervisor modules. You may have a LAN configuration that just doesn't work.

    One way to test the PC/laptop theory would be to have all those machines off except your RDP servers (because you said those were working, but the clients were not), then start booting those PCs/laptops one at a time, testing your connectivity after each one.  If you get to a point where a PC or laptop has booted and you lose internet connectivity, shut it down, then test again. 
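
    To see which hosts the martian entries point at, a rough sketch like the following tallies them by sender address and interface. It assumes the usual Linux kernel wording, "martian source <destination> from <sender>, on dev <interface>"; the function name is made up for illustration, and the lines would come from an export of the kernel message log.

```python
import re
from collections import Counter

# In the kernel message the address after "source" is the destination
# and the address after "from" is the sender of the offending packet.
MARTIAN = re.compile(
    r"martian source (?P<dst>\d+\.\d+\.\d+\.\d+) "
    r"from (?P<src>\d+\.\d+\.\d+\.\d+), on dev (?P<dev>\S+)"
)


def martian_senders(lines):
    """Count martian log entries per (sender IP, interface) pair."""
    counts = Counter()
    for line in lines:
        match = MARTIAN.search(line)
        if match:
            counts[(match.group("src"), match.group("dev"))] += 1
    return counts
```

    Calling `martian_senders(log_lines).most_common(5)` would list the noisiest senders first; a single address dominating the list is a good candidate for the rogue device.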

    When you experience the outage on the LAN side, are you able to test anything on the WAN side of things?  For instance, can you access a webpage that you host, or be able to access VPN from the outside?  How long does the internet stay inaccessible to you?  Is there any monitoring you can toss in there on the LAN, even something like PRTG for a bit and see if that helps narrow things down?
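
    If a full monitoring tool is more than you want right now, a small probe run on an affected client can at least separate "DNS is failing" from "nothing is routing". The sketch below is only an illustration: the helper names, the test hostname, and the choice of a TCP connection to port 53 on a public resolver as the reachability check are all assumptions.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """True if the system resolver returns an address for hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def can_reach(ip: str, port: int = 53, timeout: float = 2.0) -> bool:
    """True if a TCP connection to ip:port succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok: bool, ip_ok: bool) -> str:
    """Turn the two probe results into a rough diagnosis."""
    if dns_ok and ip_ok:
        return "ok"
    if ip_ok:
        return "DNS failure only"
    if dns_ok:
        return "routing failure (DNS answer may be cached)"
    return "no connectivity"
```

    Logging `classify(can_resolve("example.com"), can_reach("1.1.1.1"))` with a timestamp every 30 seconds or so would show how long each outage lasts and which kind it is.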

    Thanks for answering my questions BTW.  ;)


  • Good morning, thank you for the suggestions. I'm on day 5 with the same problem. I've got to start over and find what I've missed.

    After watching Wireshark and UTM logs all day yesterday I'm considering this being a DNS issue.

    We use the UTM as our DNS Proxy with Google DNS as Forwarders. Is this normal behavior for data in the DNS Proxy log? It doesn't look like a lot of resolution is going on...

    2021:03:30-09:54:27 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:31 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:37 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:39 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 2 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:54:39 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:41 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:43 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 1 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:54:43 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:44 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 1 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:54:49 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:52 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:54 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:54:58 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:55:00 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 1 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:55:00 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:55:03 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:55:04 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 1 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:55:05 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:55:07 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 2 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:55:07 chevy-fw named[5421]: resolver priming query complete
    2021:03:30-09:55:09 chevy-fw/chevy-fw named: Last message 'resolver priming que' repeated 1 times, suppressed by syslog-ng on chevy-fw.dwaynelane.com
    2021:03:30-09:55:09 chevy-fw named[5421]: resolver priming query complete
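
    To put a number on how often the priming message repeats, a rough sketch like this counts occurrences per minute from an exported copy of the log, including the entries that syslog-ng folded into "repeated N times" lines. The function name is hypothetical, and the patterns are written only against the line format shown above.

```python
import re
from collections import Counter

# Timestamp (kept to the minute), host field, then the named message.
LINE = re.compile(
    r"^(?P<minute>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}):\d{2} \S+ "
    r"named(?:\[\d+\])?: (?P<msg>.*)$"
)
REPEAT = re.compile(r"Last message 'resolver priming que' repeated (\d+) times")


def priming_per_minute(lines):
    """Count 'resolver priming query complete' events per minute."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        repeated = REPEAT.search(msg)
        if repeated:
            # syslog-ng suppressed this many identical messages.
            counts[match.group("minute")] += int(repeated.group(1))
        elif "resolver priming query complete" in msg:
            counts[match.group("minute")] += 1
    return counts
```

    Comparing the per-minute counts before and after a forwarder change would show whether the change made any difference to the priming behavior.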
  • Yes, those are DNS issues.  I would change from Google DNS to Cloudflare (1.1.1.1/1.0.0.1) or OpenDNS (208.67.222.222/208.67.220.220) and see what happens, for testing purposes at least.  I believe those are forwarding query errors.


  • I changed DNS Forwarders to OpenDNS, I'll keep monitoring logs.  What is a "resolver priming query?"

  • Our Sophos seems to be blocking some DNS traffic. After I switched to OpenDNS I saw these packets were blocked.

  • I put these two rules in the UTM and webpages started loading. We didn't need to do this before, and I'm not sure what changed. Or maybe I was supposed to have these all along. They allow DNS traffic to and from the UTM and DC. The big test will be in the morning when the troops arrive.

  • Well, OpenDNS is Secure DNS.  I don't know if Google DNS uses that same tech or not, and not all sites use SecDNS yet, but it helps against a lot of the DNS maliciousness that is happening.  The TCP/853 traffic you see on your list is the secure DNS port (DNS over TLS).  The ICMP of course is ping.  Blocking traffic isn't all bad, it's the purpose of the UTM. :)  You can at least see what is being blocked; some of it you use, of course, and some you won't.


  • It's a BIND 9 bug, I believe, or it's an issue with a recursive server having incorrect records in its cache.  I located this article online when looking around:  Why is BIND re-priming the roots from hints more often than it should? - BIND 9 (isc.org)


  • Not sure I'm comfortable with those new firewall rules, Sean.

    One of the first rules that the installation process creates is one like 'Internal (Network) -> DNS -> Any : Allow', so 12 is likely redundant.

    13 likely has no effect and could leave you open to DNS poisoning by someone spoofing one of those IPs.  The UTM's connection tracker takes care of allowing inbound responses to outbound requests that it allowed.
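
    As a toy illustration of why a separate inbound rule is unnecessary, the sketch below models a connection tracker that only admits a reply when it mirrors a request already allowed outbound. This is a deliberately simplified model with made-up names and addresses, not how the UTM's connection tracker is actually implemented.

```python
class ToyConnTracker:
    """Remember allowed outbound flows and admit only their mirror replies."""

    def __init__(self):
        self._flows = set()

    def outbound(self, src, sport, dst, dport):
        """Record an outbound request that a firewall rule has allowed."""
        self._flows.add((src, sport, dst, dport))

    def allows_inbound(self, src, sport, dst, dport):
        """A reply is admitted only if it reverses a recorded flow."""
        return (dst, dport, src, sport) in self._flows


tracker = ToyConnTracker()
# Hypothetical client asks a forwarder a DNS question.
tracker.outbound("192.168.1.10", 50000, "8.8.8.8", 53)

# The matching reply comes back to the same client port and is admitted.
reply_ok = tracker.allows_inbound("8.8.8.8", 53, "192.168.1.10", 50000)
# An unsolicited packet from the same address matches no flow and is not.
spoof_ok = tracker.allows_inbound("8.8.8.8", 53, "192.168.1.10", 50001)
```

    A blanket "DNS servers -> Any : Allow" rule would admit the second packet as well, which is where the spoofing exposure comes from.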

    In addition to Rulz, you might also be interested in DNS best practice.

    Cheers - Bob

     