
Regarding website filtering

Could you please explain how to block uncategorized websites for particular users?



This thread was automatically locked due to age.
  • Hi, Kiran, and welcome to the UTM Community!

    There is a trick with "Uncategorized" as there is a Category named "Uncategorized" and there are sites that are "Uncategorized" because no Category has been determined for the site.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • I have to quibble, Bob.  I just rechecked and I have only one type of Uncategorized in the administrative interface for the web proxy.

    Perhaps you were remembering that there are options for both "Uncategorized" and "Categorization Failure".  The latter would mean that the Categorization database is not reachable.

  • Well Bob, that is probably a bug and it probably has existed since...  at least 9.0 and maybe even before that.  :)

    Although I cannot be sure without testing, I believe that if you create a Category Group containing "uncategorized" it will do nothing.

  • I don't think it's a bug, Michael, unless Sophos doesn't have an "Uncategorized" category in its SXL database like CFF does.  I guess that's what you're telling me - my understanding no longer applies to the UTM as CFF is no longer used?

    Hmmmm, in my lab, I still get 0 when I do:

    cc get http use_sxl_urid

    Cheers - Bob

     
  • AFAIK both CFF and SXL have the same thing.  A single "response from the server" that is a hexadecimal number (0xFF) that maps to the word "Uncategorized".  AFAIK, the underlying database does not have an "explicitly defined" Uncategorized that is separate from the "I have no information" Uncategorized.  Or if it does, I don't think that gets passed back to the client in either CFF or SXL.  Regardless, the httpproxy only has a single Uncategorized concept.  The fact that there are two ways to configure it in the UI is a problem and I don't know what the system does if there is a conflict.  I suspect the one in the Category Group is ignored.

    Of course, I could be wrong.

    cc get http use_sxl_urid
    cc get http sc_local_db

    If use_sxl_urid is 1 then it will use SXL
    If use_sxl_urid is 0 AND sc_local_db is none then it will use CFFS
    If use_sxl_urid is 0 AND sc_local_db is mem|disk then it will use CFF with a local database
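
The three rules above can be written down as a small decision function. This is only a sketch of the rules as stated in this post, not code from the UTM; the function name and the returned labels are made up for illustration.

```python
def categorization_backend(use_sxl_urid: int, sc_local_db: str) -> str:
    """Map the two settings to a lookup backend, following the rules above."""
    if use_sxl_urid == 1:
        # SXL wins regardless of the local database setting.
        return "SXL"
    if sc_local_db == "none":
        return "CFFS"
    if sc_local_db in ("mem", "disk"):
        return "CFF (local database)"
    # Any other value is not covered by the rules in this thread.
    raise ValueError(f"unexpected sc_local_db value: {sc_local_db!r}")


print(categorization_backend(1, "none"))
print(categorization_backend(0, "none"))
print(categorization_backend(0, "disk"))
```

Feed it the values printed by the two `cc get` commands to see which lookup path a given box should be on.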


    Fun Fact:
    We are currently updating the TrustedSource SDK that the SXL servers use to the latest version.
    In an upcoming UTM release we will be updating the TrustedSource SDK that the local CFF database uses.
    We are not going to update the SDK that the CFFS servers use.

    Therefore SXL users will get very slightly better categorization automatically as soon as we finish the SXL server.  Local db will get very slightly better categorization when they upgrade to the new version.  Anyone still using CFFS (which is anyone on 9.0 or earlier, or anyone who has manually changed their settings) will have the same categorization.
    By "slightly better" I mean less than 0.01% improvement.  :)

  • Just revisited this to understand your point.   I was burned on the sub-category / super-category distinction.   My "Uncategorized" sub-category is in the "web filtering problems" super-category, and only super-categories appear in the Filter Action menus.

    In the logs, I have all of the following CategoryCode/Category pairs from the web logs:

    9998 - Uncategorized

    9998,9998 - Uncategorized,Uncategorized

    9999 - Categorization failed

    The "9998,9998" entry is the most common.   For three days of logs:

    Category#    Log Entries
    9998                  94
    9998,9998         52,239
    9999                  43

    (out of 1,668,457 total log entries)
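
For anyone who wants to produce the same kind of tally, here is a minimal sketch. It assumes each web log line carries a category="..." field, as in the value quoted elsewhere in this thread; the other field names and the sample lines are made up and do not come from a real log.

```python
import re
from collections import Counter

# Assumption: each web log line carries a category="..." key/value field.
CATEGORY_RE = re.compile(r'category="([^"]*)"')


def tally_categories(lines):
    """Count log entries per raw category value (e.g. '9998,9998')."""
    counts = Counter()
    for line in lines:
        match = CATEGORY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share(count, total):
    """Percentage of all log entries represented by count."""
    return 100.0 * count / total


# Hypothetical sample lines, not taken from a real log.
sample = [
    'method="GET" category="9998" url="http://example.test/a"',
    'method="CONNECT" category="9998,9998" url="https://example.test/"',
    'method="CONNECT" category="9998,9998" url="https://example.test/"',
    'method="GET" category="9999" url="http://example.test/b"',
    'method="GET" category="105" url="http://example.test/c"',
]
counts = tally_categories(sample)
for code in ("9998", "9998,9998", "9999"):
    print(code, counts[code])

# Using the figures reported above: 94 + 52,239 + 43 out of 1,668,457.
print(f"{share(94 + 52239 + 43, 1668457):.2f}%")
```

Keeping the raw category string as the key is deliberate, so that "9998" and "9998,9998" stay separate buckets.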

    Many of these entries appear to be categorized sites that become uncategorized when the path gets long, and McAfee shows most of them with valid categories.   It would help to know which Uncategorized code is which.

    Support had led me to believe that the new search engine was going to be a fix for Uncategorized sites.   Michael Dunn's comment is discouraging, as it suggests that there may be no fix.

  • That seems odd to me.  I just double-checked, and I have no categorization problems with URLs that are 5K in length (for most of the internet, 4K is the practical limit).

    I would not expect to see sites becoming uncategorized, or having failures, due to the length of the URL.

    Can you please give some examples from the log?

    Can you please tell me the output of

    cc get http sc_local_db
    cc get http use_sxl_urid

    Does the UTM need to use an upstream/parent proxy?

  • # cc get http sc_local_db
    none
    # cc get http use_sxl_urid
    1

    The plot thickens.   As I analyzed my data, I found that more than 50% of the entries were for https.  This is weird because nearly all of my users use Standard Mode with ADSSO on and HTTPS inspection disabled.    So I excluded the small amount of traffic on other profiles, then verified that all of the entries were identical on other key parameters:  itmid = 0060 (unauthorized category), category="9998,9998", http status code = 403, method=CONNECT and error="".   As you know, CONNECT entries have no URL path, so my theory about path complexity is demolished.

    I picked the top 50 by record counts.   One was an invalid url related to autodiscover, and only one was not classified by McAfee.  The remaining 48 fqdns represented just barely less than 30,000 log entries.  One was classified as  spyware/keylogger, which validates our decision to block anything UTM says is unclassified.  I tested McAfee with https specified to be sure that the protocol did not distort the results.   

    Here are the top 10

    logx.optimizely.com
    k.streamrail.com
    geo3.ggpht.com
    ioms.bfmio.com
    go.trouter.io
    sp.analytics.yahoo.com
    cdn.odc.officeapps.live.com
    tracker.departapp.com
    uscollector.tealeaf.ibmcloud.com
    ecn.t2.tiles.virtualearth.net
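
A ranking like the one above could be produced roughly as follows. This is a sketch under assumptions: it treats log lines as key="value" pairs, the field names "statuscode" and "url" are guesses, the URL is assumed to include a scheme, and the sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: log lines are made of key="value" pairs; the field names
# "statuscode" and "url" are guesses for illustration.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_fields(line):
    """Turn one log line into a dict of its key="value" pairs."""
    return dict(FIELD_RE.findall(line))


def top_uncategorized_connect_hosts(lines, n=10):
    """Rank hosts by blocked CONNECT entries with category 9998,9998."""
    hosts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if (
            fields.get("category") == "9998,9998"
            and fields.get("method") == "CONNECT"
            and fields.get("statuscode") == "403"
        ):
            host = urlsplit(fields.get("url", "")).hostname
            if host:
                hosts[host] += 1
    return hosts.most_common(n)


# Hypothetical sample lines, not taken from a real log.
sample = [
    'method="CONNECT" statuscode="403" category="9998,9998" url="https://a.example.test/"',
    'method="CONNECT" statuscode="403" category="9998,9998" url="https://a.example.test/"',
    'method="CONNECT" statuscode="403" category="9998,9998" url="https://b.example.test/"',
    'method="GET" statuscode="200" category="105" url="http://c.example.test/x"',
]
for host, count in top_uncategorized_connect_hosts(sample):
    print(host, count)
```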

  • For a second pass, I focused on the http uncategorized.   Could not find any where the fqdn was sometimes allowed and sometimes blocked, based on the path, further obliterating my path-dependent theory.  Here are the top 10 from my sample data

     

     

    http://ssl.cdn.turner.com
    http://static.criteo.net
    http://fan.api.espn.com
    http://www4.assets-gap.com
    themes.googleusercontent.com
    http://cnn.bounceexchange.com
    http://bea4.cnn.com
    http://w88.espn.com
    http://colrep.sitelabweb.com
    http://ml314.com
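
The check for hosts that are sometimes allowed and sometimes blocked can be sketched like this. Again the log layout is an assumption (key="value" pairs with "statuscode" and "url" fields), a 403 is treated as "blocked", and the sample lines are invented.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: log lines are made of key="value" pairs; the field names
# "statuscode" and "url" are guesses for illustration.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_fields(line):
    """Turn one log line into a dict of its key="value" pairs."""
    return dict(FIELD_RE.findall(line))


def hosts_with_mixed_outcomes(lines):
    """Return hosts that appear both blocked (403) and not blocked."""
    outcomes = defaultdict(set)
    for line in lines:
        fields = parse_fields(line)
        host = urlsplit(fields.get("url", "")).hostname
        if host is None:
            continue
        outcomes[host].add(fields.get("statuscode") == "403")
    # A host with both True and False recorded had mixed results.
    return sorted(host for host, seen in outcomes.items() if len(seen) == 2)


# Hypothetical sample lines, not taken from a real log.
sample = [
    'statuscode="200" url="http://a.example.test/ok"',
    'statuscode="403" url="http://a.example.test/some/long/path"',
    'statuscode="403" url="http://b.example.test/"',
    'statuscode="403" url="http://b.example.test/x"',
]
print(hosts_with_mixed_outcomes(sample))
```

An empty result on real data is what would rule out a path-dependent explanation.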

  • I said earlier that they are changing the SDK on the cloud servers.  I think they made a mistake causing uncategorized.

    Issue:  subdomains are uncat even if the main domain is categorized.  Therefore bounceexchange.com is categorized but cnn.bounceexchange.com is not.
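
To make the symptom concrete, here is a toy comparison of an exact-host lookup against one that falls back to parent domains. This is purely illustrative and says nothing about how SXL or the TrustedSource SDK actually work; the dictionary "database" and the category label are made up.

```python
def lookup_exact(host, db):
    """Return the category only if this exact host is in the database."""
    return db.get(host, "Uncategorized")


def lookup_with_parent_fallback(host, db):
    """Try the full host, then each parent domain, stopping before the TLD."""
    labels = host.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in db:
            return db[candidate]
    return "Uncategorized"


# Hypothetical database: the category label is invented for illustration.
db = {"bounceexchange.com": "Marketing"}

print(lookup_exact("cnn.bounceexchange.com", db))
print(lookup_with_parent_fallback("cnn.bounceexchange.com", db))
```

The first call reproduces the behavior described above (the subdomain comes back Uncategorized), while the second shows the result one would expect if the parent domain's category were inherited.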

    Thank you very much, no more need for you to look at uncat.

     

    Can you give me any examples for categorization failed or category=9999?

     

  • I am thrilled that the problem is identified and a fix is likely to be straightforward.   Please let me know when the change is complete.  I want to process the URL that McAfee said was unclassified to verify that it flows through from them to you to my UTM. 

    The other lists, which were short, have been sent by PM.

  • Correction.

    There are no recent changes to the server.  The problems that Douglas is experiencing are longstanding issues.  They will be resolved when the servers get updated to the new SDK.  I don't have a timeline on that, but I will post here when it occurs.
