This discussion has been locked.

Certain domains trigger high DNSTime and AXTime resulting in proxy timing out for users.

Hey guys,

We have been having intermittent issues with our web proxy timing out for our users. After digging through the logs, we determined that this happens when a client computer connects to certain domains. We have identified four so far. When one of these domains is requested, the DNSTime hits 10 seconds for that log entry and the AXTime for the surrounding log entries increases significantly. The result is an internet outage lasting 5 to 10 minutes. We have blocked the four identified domains in the proxy, and this has helped: we still see the DNSTime spike, but the AXTime stays at 0. We have an open ticket with support, but we wanted to see what you all thought about this while we wait.

I’ll attach some screenshots as well.

Cheers

 



  • Hi Dale,

    Hmm, it won't really be possible to troubleshoot latency via the forums. I would open a support case.

    There are a couple of things you can do, though.

    Depending on your deployment mode, set your browser to "use proxy" settings and point the IP to the appliance. Close and reopen the browser, open a new private tab, then open the debugger (usually F12) and go to the network resources view.

    From there you can pull down the page and see whether any elements from those domains are failing to load or taking a long time to load.

     

    Another thing you may wish to try is querying the response time from the SWA logs themselves. See http://swa.sophos.com/webhelp/swa/concepts/InterpretingLogFiles.html, under the data fields, for the relevant fields to look at.

     

    Ensure you are not using a public DNS server like 8.8.8.8, or things like authentication requests may time out.

     

    In the case of an HTTPS site, you may want to run it through Qualys and make sure it passes; anything under a C- will fail on the appliance. https://www.ssllabs.com/ssltest/

     

    Cheers

     

    A support agent will be able to do some additional testing, such as pulling down the site from the back end.
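
For the log-querying suggestion above, here is a minimal sketch of how slow entries could be pulled out of exported log lines. It assumes each line is a series of space-separated key=value pairs and that the timing fields are literally named `dnstime` and `axtime`; the sample lines, the `h` and `n` keys, and the threshold value are all hypothetical, so check the real field names and units against the log documentation linked above before relying on it.

```python
import re

# Assumption: each log line carries space-separated key=value pairs,
# with values optionally double-quoted. The field names used here
# ("dnstime", "axtime", "h", "n") are illustrative, not confirmed.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')


def parse_line(line):
    """Return the key=value pairs of one log line as a dict of strings."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def slow_entries(lines, field="dnstime", threshold=5_000_000):
    """Yield parsed entries whose timing field is at or above threshold.

    The threshold is in whatever unit the log uses; the default is a
    placeholder, not a recommended value.
    """
    for line in lines:
        entry = parse_line(line)
        try:
            value = int(entry.get(field, ""))
        except ValueError:
            continue  # field missing or not numeric, skip this line
        if value >= threshold:
            yield entry


if __name__ == "__main__":
    # Hypothetical sample lines, not real appliance output.
    sample = [
        'h=10.0.0.5 n=example.com dnstime=1200 axtime=0',
        'h=10.0.0.9 n=slow.example dnstime=10000000 axtime=0',
    ]
    for entry in slow_entries(sample):
        print(entry["n"], entry["dnstime"])
```

Running the same filter with `field="axtime"` would give the entries around a DNS spike where the access time also climbed, which is the pattern described in the original post.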

  • Hey Red_Warrior,

    Thanks for the response. I appreciate all of the suggestions. We actually have an open support case on this (#8679989) and have been working with them for the last few weeks as well.

    What we have figured out so far is that when a computer accesses afy11.net or questionmarket.com, the Access Request times (AXTime) in the SWA logs skyrocket and the SWA basically freezes up and kicks a large number of users, if not all of them, off the SWA web proxy, causing what is essentially a company-wide internet outage. Only a few computers in our environment seem to be accessing these two sites, apparently at random. We think it's probably some sort of malware on them, but if you browse to these addresses from any computer using the SWA proxy, the same thing happens. We blocked access to these web domains from within the SWA proxy. Now the Access Request time no longer skyrockets and it no longer seems to affect other proxy users, but we do still see the DNS time spike in the logs for these connections.

    We have looked at the response times from the SWA logs; the screenshots above are the actual logs from the SWA, just displayed through our log collector front end. This is what allowed us to figure out what we know so far.

    Our DNS configuration is as follows: The SWA points to our internal AD DNS servers, and our AD DNS servers forward on to public DNS servers.

    I guess my ultimate question is: how could accessing a web domain from a client computer cause a 10-minute company-wide internet outage? And how can we prevent this from happening in the future? It seems like the SWA should be able to handle this better.

    Thanks for the assistance!

     

  • A couple of things.

    questionmarket.com and afy11.net

    I can't dig them, even with Google's public 8.8.8.8 DNS server, nor can I pull them down from a Sophos network or my own DigitalOcean droplet. There is no resolvable DNS for either domain.

    $ wget https://questionmarket.com
    --2019-03-22 08:55:53-- https://questionmarket.com/
    Resolving questionmarket.com (questionmarket.com)... failed: Name or service not known.
    wget: unable to resolve host address ‘questionmarket.com’

    $ dig questionmarket.com

    ; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> questionmarket.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30167
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4000
    ;; QUESTION SECTION:
    ;questionmarket.com. IN A

    ;; Query time: 5823 msec
    ;; SERVER: 10.108.112.2#53(10.108.112.2)
    ;; WHEN: Fri Mar 22 09:03:25 PDT 2019
    ;; MSG SIZE rcvd: 47

     

    Verified with Google:

    $ dig @8.8.8.8 questionmarket.com

    ; <<>> DiG 9.9.4-RedHat-9.9.4-73.el7_6 <<>> @8.8.8.8 questionmarket.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36812
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;questionmarket.com. IN A

    ;; Query time: 6 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Fri Mar 22 09:04:08 PDT 2019
    ;; MSG SIZE rcvd: 47
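
A rough way to repeat this kind of check from any host is to time a lookup through the system resolver. The sketch below is only an illustration: it uses Python's standard `socket.getaddrinfo`, so it measures whatever resolvers the host is configured with rather than a specific server as `dig @8.8.8.8` does, and the `time_lookup` helper name is made up for this example.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo):
    """Resolve host once and return (resolved, seconds_elapsed).

    The resolver argument exists so the function can be exercised
    without network access; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        resolved = True
    except socket.gaierror:
        resolved = False  # the name could not be resolved
    return resolved, time.monotonic() - start


if __name__ == "__main__":
    # The two domains named in this thread.
    for name in ("questionmarket.com", "afy11.net"):
        resolved, seconds = time_lookup(name)
        print(f"{name}: resolved={resolved} elapsed={seconds:.3f}s")
```

Run from a machine that points at the same internal DNS servers as the appliance, a failed lookup that takes several seconds would be consistent with the 5823 msec SERVFAIL in the first dig output, while a fast failure would look more like the 6 msec answer from 8.8.8.8.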

     

     

    Another note, in case this is internal (or similar) traffic:

    Internal-to-internal traffic should never be directed to the SWA, as it sends all requests to the gateway. You would essentially be sending internal requests to the external gateway, and they would then come back into the network.

     

    As for the SWA locking up or "kicking people" off the network: the SWA is just a proxy. I would need to look into the configuration and deployment mode. As a general rule, there is no real way the SWA can cause an outage. At worst, if you had it configured as an explicit proxy with "authenticate every request" enabled under authentication, the appliance might not return any pages if, say, there was an AD issue and the request could not be validated.

     

    I have reviewed your case. You will need to enable remote assistance on the appliance. It looks like you also have a management appliance; at minimum, please enable remote assistance on both the appliance local to the issue at your site and the management appliance.

  • Hey Red_Warrior,

    I agree the domains are not resolvable, which is why I don't understand why this causes an issue... but it does. I tested it again this morning by unblocking afy11.net in the proxy and browsing to it. Once again it caused an internet outage for all proxy users. It's not a network outage; it only affects internet access for users who use the SWA as an internet proxy. The SWA took around 25 minutes to stabilize. Remote assistance has been enabled on our local production SWA. I have attached some screenshots from this morning, and I have logs I can provide. We don't use the management appliance, only the local appliance.

    Thanks, I appreciate the assistance!  

      

  • Hi Dale,

    I'll send you a PM with some additional thoughts, as they are not appropriate for the forums.

     

    I have also placed a copy in your support case.