
Categorization sometimes fails -> long loading time of website

Hi,

I activated web filtering in transparent mode on my UTM (now 9.352-6) half a year ago. Since then I have noticed that various websites sometimes take a long time to load. When this happens, the web browser shows "Waiting for answer...." and a white page. After 5-8 seconds the website comes up.


Today I did some research to find the cause of this problem.


It seems that the web proxy sometimes can't categorize single objects such as scripts or pictures correctly while a page is loading. The rest of the site is categorized normally. When this happens, the single object is categorized as "Uncategorized" and the categorization takes a long time. Here is an example from today: a picture from a website that is mostly categorized correctly as "Forum/Bulletin Boards", and sometimes as "Uncategorized":

Category "Forum/Bulletin Boards" - Normal loading
2016:01:15-12:53:34 jasnet httpproxy[5217]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="192.168.10.10" dstip="144.76.168.179" user="" ad_domain="" statuscode="200" cached="0" profile="REF_HttProContaLanNetwo2 (PC)" filteraction="REF_HttCffPc (PC)" size="562" request="0xab1f000" url="www.mazda-forum.info/.../sticky.gif" referer="" error="" authtime="0" dnstime="1" cattime="33652" avscantime="2683" fullreqtime="63720" device="0" auth="0" ua="Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0" exceptions="" category="159" reputation="unverified" categoryname="Forum/Bulletin Boards" content-type="image/gif"


Category "Uncategorized" - Slow loading
2016:01:15-12:22:47 jasnet httpproxy[5217]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="192.168.10.10" dstip="144.76.168.179" user="" ad_domain="" statuscode="200" cached="0" profile="REF_HttProContaLanNetwo2 (PC)" filteraction="REF_HttCffPc (PC)" size="562" request="0xa772000" url="www.mazda-forum.info/.../sticky.gif" referer="" error="" authtime="0" dnstime="1" cattime="6003436" avscantime="2422" fullreqtime="6028436" device="0" auth="0" ua="Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0" exceptions="" category="9998" reputation="unverified" categoryname="Uncategorized" content-type="image/gif"

As a test I added the domain mazda-forum.info to the allowed websites of the web filtering policy, but the problem persists:

Allowed website - Slow loading
2016:01:15-12:44:12 jasnet httpproxy[5217]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="192.168.10.10" dstip="144.76.168.179" user="" ad_domain="" statuscode="200" cached="0" profile="REF_HttProContaLanNetwo2 (PC)" filteraction="REF_HttCffPc (PC)" size="562" request="0xe0c3a000" url="www.mazda-forum.info/.../sticky.gif" referer="" error="" authtime="0" dnstime="0" cattime="6003680" avscantime="2431" fullreqtime="6028730" device="0" auth="0" ua="Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0" exceptions="" content-type="image/gif"
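
To spot such entries without reading every line, the log fields can be pulled apart with a small script. The following is only a sketch: it assumes that `cattime` is reported in microseconds (which would match the 5-8 second delay seen in the browser, but is not confirmed in this thread), the 1-second threshold is arbitrary, and the sample lines are shortened copies of the entries above.

```python
import re

# Matches key="value" pairs as they appear in the httpproxy lines quoted above.
FIELD_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_fields(line):
    """Return the key="value" pairs of one log line as a dict."""
    return dict(FIELD_RE.findall(line))


def slow_categorizations(lines, threshold_s=1.0):
    """Yield (url, categoryname, cattime in seconds) for slow categorizations.

    Assumption: cattime is in microseconds. The threshold is arbitrary.
    """
    for line in lines:
        fields = parse_fields(line)
        if "cattime" not in fields:
            continue
        cattime_s = int(fields["cattime"]) / 1_000_000
        if cattime_s >= threshold_s:
            yield (fields.get("url", ""),
                   fields.get("categoryname", "(none)"),
                   cattime_s)


# Shortened copies of the three log entries from this post.
sample = [
    'httpproxy[5217]: url="www.mazda-forum.info/.../sticky.gif" cattime="33652" '
    'fullreqtime="63720" category="159" categoryname="Forum/Bulletin Boards"',
    'httpproxy[5217]: url="www.mazda-forum.info/.../sticky.gif" cattime="6003436" '
    'fullreqtime="6028436" category="9998" categoryname="Uncategorized"',
    'httpproxy[5217]: url="www.mazda-forum.info/.../sticky.gif" cattime="6003680" '
    'fullreqtime="6028730" exceptions="" content-type="image/gif"',
]

for url, name, secs in slow_categorizations(sample):
    print(f"{secs:.3f}s  {name}  {url}")
```

On a real system the `sample` list would be replaced by the lines of the web filtering log file.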

Any idea?

Thank you

Jas



  • Jas, what result do you get at the command line from:

    cc get http use_sxl_urid

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Bob,
    the command shows "1".
    What does this command query?

    Thank you for your help!
    Jas
  • That means you're using the SXL categorization engine.  Let's see what happens if you change back to CFF.  From the command line as root:

    cc set http use_sxl_urid 0

    /var/mdw/scripts/httpproxy restart

    Does that solve the problem?

    Cheers - Bob

     
  • Bob,
    yep, this seems to have solved the problem.
    I switched to CFF an hour ago. Since then I have browsed the affected site more than I usually do, and I can't find any log entries with "Category: Uncategorized" for this site.

    Overall, there seem to be fewer log entries with "Category: Uncategorized" than usual, but one hour is not enough to be conclusive.

    What's next? Is this a bug that will be fixed in one of the next versions, or should I live with it? Does nobody else have this problem?

    Thank you for your help.
    Jas
  • We discussed this here for about six months a year ago: community.sophos.com/.../22391

    Cheers - Bob
     
  • Thank you Bob.
    Now I know a little more about how the categorization works.

    But I still don't understand why the site is sometimes categorized as "Uncategorized". Is it because the local SXL cache has no entry for the URL, so the UTM must ask the cloud, which does not respond in a reasonable time? But then disabling SXL and enabling CFFS should make it even slower, because CFFS forwards every request to the cloud. Or am I wrong?
  • I assume that it's because categorization happens in parallel with getting links so that it's virtually imperceptible.

    Cheers - Bob
     
  • But for me it is perceptible. As you can see above, the same object (www.mazda-forum.info/.../sticky.gif) is sometimes categorized as "Forum/Bulletin Boards" and sometimes as "Uncategorized". When this happens, it takes about 5 seconds until the site starts to load.

    Does the categorization evaluate the complete domain (e.g. mazda-forum.info), so that all objects of the domain get the same category via the local cache? Or is every single object of the domain/site categorized separately via a cloud request, with the local cache used only when the same object is reloaded? Can I see in the logs whether the cloud or the local cache was used?

    To me it looks like the categorization of the domain, or rather of the first object of the site, runs into a timeout and is therefore identified as "Uncategorized". Then the second object gets categorized without a timeout, and the rest is categorized via the local cache.
  • The "Uncategorized" item occurred with SXL, not with CFFS, which was the subject of my prior post.

    To get specifics on how links are categorized, go to TrustedSource where you can experiment with simple FQDNs and with complete URLs.

    Your access to the SXL servers apparently isn't as fast as your access to the CFFS servers.  You can see the categorization time in the logs, so that should tell you whether SXL used the cache or a new lookup.
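
As a rough illustration of reading the categorization time that way, the sketch below sorts `cattime` values into buckets. Both thresholds are guesses made for this example and the microsecond unit is an assumption; none of these numbers are documented UTM values, so the labels are only hints.

```python
def classify_cattime(cattime_us, cache_max_us=1_000, timeout_min_us=5_000_000):
    """Guess how a categorization was resolved from its cattime value.

    Both thresholds are illustrative assumptions, not documented UTM values.
    """
    if cattime_us >= timeout_min_us:
        return "probable timeout"
    if cattime_us <= cache_max_us:
        return "probable cache hit"
    return "probable remote lookup"


# cattime values taken from the three log lines quoted earlier in the thread
for value in (33652, 6003436, 6003680):
    print(value, classify_cattime(value))
```

The two slow entries from the thread both sit just above 6,000,000, which is at least consistent with a fixed timeout rather than a randomly slow answer.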

    Cheers - Bob

     
  • Again thank you, Bob.

    But I still don't understand why the same object is categorized sometimes correctly and sometimes as unknown category with SXL. Maybe it's a language problem on my side, so sorry if I keep asking the same question.

    The TrustedSource site categorized the site correctly, but it took about 7-9 seconds. I don't know whether that is normal or whether this long delay is the same problem I have with the UTM.

    So in my opinion it's a timeout problem.
    Here is how I understand the workflow:

    1. The user types a URL into the browser and presses Enter.

    2. The UTM receives the request and tries to find the domain of the URL in its local SXL cache.

    3. If it is not found, the UTM sends a request for the domain to the SXL cloud server.

    4. As long as the UTM has no answer from the SXL cloud server, the website will not load. The user sees "Waiting for answer..." or something similar in the browser.

    5. The UTM receives the answer from the SXL cloud server.

    6. The website starts to load, and from this point the UTM can use the local cache to categorize the rest of the requested website.

    7. If step 4 takes too much time, the request is aborted and the category is set to "Uncategorized".

    8. If the SXL request was aborted, the next request for the same domain cannot be answered from the cache (because there is no entry for it); instead it is sent to the SXL cloud server again, and the workflow starts again from step 4.
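
The steps above can be sketched as a toy model. This is only an illustration of the workflow as it is understood in this post, not the UTM's actual implementation: the `Categorizer` class, the `fake_cloud` function, the latencies and the 6-second timeout are all hypothetical.

```python
UNCATEGORIZED = "Uncategorized"


class Categorizer:
    """Toy model of the lookup flow described in the steps above."""

    def __init__(self, cloud_lookup, timeout_s):
        # Hypothetical callback: domain -> (category, latency in seconds)
        self.cloud_lookup = cloud_lookup
        self.timeout_s = timeout_s
        self.cache = {}

    def categorize(self, domain):
        """Return (category, seconds the request was held back)."""
        if domain in self.cache:                         # step 2: cache hit
            return self.cache[domain], 0.0
        category, latency_s = self.cloud_lookup(domain)  # steps 3-5
        if latency_s > self.timeout_s:                   # step 7: give up
            return UNCATEGORIZED, self.timeout_s         # step 8: nothing cached
        self.cache[domain] = category                    # step 6: cache result
        return category, latency_s


# Simulated cloud: the first answer is too slow, the second one is fast.
latencies = iter([9.0, 0.03])


def fake_cloud(domain):
    return "Forum/Bulletin Boards", next(latencies)


utm = Categorizer(fake_cloud, timeout_s=6.0)
for _ in range(3):
    print(utm.categorize("mazda-forum.info"))
```

In this model the first request waits for the full timeout and ends up as "Uncategorized", the second one asks the cloud again and succeeds, and only the third one is served from the cache.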

    If my understanding is correct, the cause must be one of the following:

    - A slow or faulty connection to the SXL cloud server (mentioned by Bob)
    - Problems with my Internet connection (DNS, packet loss or something like this)
    - A general problem of the UTM (but then more users would have to see this behaviour)
  • I think SXL works like that in general.

    "But I still don't understand why the same object is categorized sometimes correctly and sometimes as unknown category with SXL" - No change in the underlying database, just a timeout using the SXL method.

    This has been an issue with SXL in the past. It seems to affect installations that have slower or unreliable access. CFFS apparently doesn't have this issue. Both work fine for all of my US clients.

    Cheers - Bob
     