Could you please explain how to block uncategorized websites for particular users?
Hi, Kiran, and welcome to the UTM Community!
There is a trick with "Uncategorized" as there is a Category named "Uncategorized" and there are sites that are "Uncategorized" because no Category has been determined for the site.
Cheers - Bob
I have to quibble, Bob. I just rechecked and I have only one type of Uncategorized in the administrative interface for web proxy.
Perhaps you were remembering that there are options for both "Uncategorized" and "Categorization Failure". The latter would mean that the Categorization database is not reachable.
Again, guys, there is a sub-category "Uncategorized" that you can select when you define a new Category on the 'Categories' tab in 'Web Filtering Options'. This is an assigned category, not the lack of any category assignment.
At the bottom of the Categories section of a Filter Action, you can select to Allow 'Uncategorized websites'. These are sites that have had no category assigned. They haven't even been assigned 'Uncategorized'. Perhaps this is what you mean as "categorization failure," Doug?
At least, that's how I've understood and used these concepts - I'd be happy to have my knife sharpened though!
Cheers - Bob
Well Bob, that is probably a bug and it probably has existed since... at least 9.0 and maybe even before that. :)
Although I cannot be sure without testing, I believe that if you create a Category Group containing "uncategorized" it will do nothing.
I don't think it's a bug, Michael, unless Sophos doesn't have an "Uncategorized" category in its SXL database like CFF does. I guess that's what you're telling me - my understanding no longer applies to the UTM as CFF is no longer used?
Hmmmm, in my lab, I still get 0 when I do:
cc get http use_sxl_urid
Cheers - Bob
AFAIK both CFF and SXL have the same thing: a single "response from the server" that is a hexadecimal number (0xFF) that maps to the word "Uncategorized". AFAIK, the underlying database does not have an "explicitly defined" Uncategorized that is separate from the "I have no information" Uncategorized. Or if it does, I don't think that gets passed back to the client in either CFF or SXL. Regardless, the httpproxy only has a single Uncategorized concept. The fact that there are two ways to configure it in the UI is a problem and I don't know what the system does if there is a conflict. I suspect the one in the Category Group is ignored.
Of course, I could be wrong.
cc get http use_sxl_urid
cc get http sc_local_db
If use_sxl_urid is 1 then it will use SXL
If use_sxl_urid is 0 AND sc_local_db is none then it will use CFFS
If use_sxl_urid is 0 AND sc_local_db is mem|disk then it will use CFF with a local database
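The three rules above can be restated as a small decision function. This is only a sketch of the logic as described in this post; `select_backend` is a hypothetical helper name, not something that exists on the UTM, and raising an error for any other combination of values is an assumption.

```python
def select_backend(use_sxl_urid, sc_local_db):
    """Map the two `cc get http ...` values to the backend named in the post.

    Hypothetical helper for illustration; not part of the UTM.
    """
    if use_sxl_urid == 1:
        return "SXL"
    if use_sxl_urid == 0 and sc_local_db == "none":
        return "CFFS"
    if use_sxl_urid == 0 and sc_local_db in ("mem", "disk"):
        return "CFF with a local database"
    # Assumption: anything else is outside the three documented cases.
    raise ValueError(
        f"unexpected settings: use_sxl_urid={use_sxl_urid!r}, "
        f"sc_local_db={sc_local_db!r}"
    )
```

For example, `select_backend(0, "disk")` returns `"CFF with a local database"`.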
Fun Fact:
We are currently updating the TrustedSource SDK that the SXL servers use to the latest version.
In an upcoming UTM release we will be updating the TrustedSource SDK that the local CFF database uses.
We are not going to update the SDK that the CFFS servers use.
Therefore SXL users will get very slightly better categorization automatically as soon as we finish the SXL server. Local db will get very slightly better categorization when they upgrade to the new version. Anyone still using CFFS (which is anyone on 9.0 or earlier, or anyone who has manually changed their settings) will have the same categorization.
By "slightly better" I mean less than 0.01% improvement. :)
Just revisited this to understand your point. I was burned on the sub-category / super-category distinction. My "Uncategorized" sub-category is in the "web filtering problems" super-category, and only super-categories appear in the Filter Action menus.
In the web logs, I have all of the following CategoryCode/Category pairs:
9998 - Uncategorized
9998,9998 - Uncategorized,Uncategorized
9999 - Categorization failed
The "9998,9998" entry is the most common. For three days of logs:
Category#    Log Entries
9998         94
9998,9998    52,239
9999         43
(out of 1,668,457 total log entries)
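A tally like the one above can be produced with a few lines of script. This is a minimal sketch, assuming each web filter log line carries a quoted field of the form `category="9998,9998"`; the function name and the sample lines are made up for illustration and are not taken from the actual logs.

```python
import re
from collections import Counter

# Assumption: the category codes appear as a quoted key="value" field.
CATEGORY_RE = re.compile(r'category="([^"]*)"')

def count_categories(lines):
    """Count log lines per category-code string, e.g. '9998,9998'."""
    counts = Counter()
    for line in lines:
        match = CATEGORY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented sample lines, only to show the shape of the input.
sample = [
    'method="CONNECT" statuscode="403" category="9998,9998"',
    'method="GET" statuscode="403" category="9998"',
    'method="CONNECT" statuscode="403" category="9998,9998"',
    'method="GET" statuscode="200" category="9999"',
    'a line without the field',
]

for code, n in count_categories(sample).most_common():
    print(f"{code:<12}{n}")
```

On a real log, `lines` would be the open log file rather than the `sample` list.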
Many of these entries appear to be categorized sites that become uncategorized when the path gets long, and McAfee shows most of them with valid categories. It would help to know which Uncategorized code is which.
Support had led me to believe that the new search engine was going to be a fix for Uncategorized sites. Michael Dunn's comment is discouraging, as it suggests that there may be no fix.
That seems odd to me. I just double-checked, and I have no categorization problems with URLs that are 5K in length (for most of the internet 4K is the practical limit).
I would not expect to see sites becoming uncategorized, or having failures, due to the length of the URL.
Can you please give some examples from the log?
Can you please tell me the output of
cc get http sc_local_db
cc get http use_sxl_urid
Does the UTM need to use an upstream/parent proxy?
# cc get http sc_local_db
none
# cc get http use_sxl_urid
1
The plot thickens. As I analyzed my data, I found that more than 50% of the entries were for https. This is weird because nearly all of my users use Standard Mode with ADSSO on and HTTPS inspection disabled. So I excluded the small amount of traffic on other profiles, then verified that all of the entries were identical on other key parameters: itmid = 0060 (unauthorized category), category="9998,9998", http status code = 403, method=CONNECT and error="". As you know, CONNECT entries have no URL path, so my theory about path complexity is demolished.
I picked the top 50 by record counts. One was an invalid url related to autodiscover, and only one was not classified by McAfee. The remaining 48 fqdns represented just barely less than 30,000 log entries. One was classified as spyware/keylogger, which validates our decision to block anything UTM says is unclassified. I tested McAfee with https specified to be sure that the protocol did not distort the results.
Here are the top 10
logx.optimizely.com
k.streamrail.com
geo3.ggpht.com
ioms.bfmio.com
go.trouter.io
sp.analytics.yahoo.com
cdn.odc.officeapps.live.com
tracker.departapp.com
uscollector.tealeaf.ibmcloud.com
ecn.t2.tiles.virtualearth.net
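The filter-and-rank step described above could be scripted roughly as follows. This is a sketch under assumptions: the field names `id`, `statuscode`, `method`, `category`, and `url` are guesses at how the values quoted in this post appear in a log line, and `top_blocked_hosts` and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: log lines consist of key="value" pairs.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')

# The values the post filters on; the key names are assumed.
WANTED = {
    "id": "0060",
    "category": "9998,9998",
    "statuscode": "403",
    "method": "CONNECT",
}

def top_blocked_hosts(lines, n=10):
    """Return the n most frequent hostnames among matching log lines."""
    hosts = Counter()
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if all(fields.get(key) == value for key, value in WANTED.items()):
            host = urlsplit(fields.get("url", "")).hostname
            if host:
                hosts[host] += 1
    return hosts.most_common(n)

# Invented sample lines, only to show the shape of the input.
sample = [
    'id="0060" method="CONNECT" statuscode="403" category="9998,9998" url="https://a.example.com/"',
    'id="0060" method="CONNECT" statuscode="403" category="9998,9998" url="https://a.example.com/"',
    'id="0060" method="CONNECT" statuscode="403" category="9998,9998" url="https://b.example.com/"',
    'id="0001" method="GET" statuscode="200" category="105" url="http://c.example.com/x"',
]
print(top_blocked_hosts(sample))
```

Lines that do not match every value in `WANTED`, or that have no parseable hostname, are skipped.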
For a second pass, I focused on the http uncategorized. Could not find any where the fqdn was sometimes allowed and sometimes blocked, based on the path, further obliterating my path-dependent theory. Here are the top 10 from my sample data
http://ssl.cdn.turner.com
http://static.criteo.net
http://fan.api.espn.com
http://www4.assets-gap.com
themes.googleusercontent.com
http://cnn.bounceexchange.com
http://bea4.cnn.com
http://w88.espn.com
http://colrep.sitelabweb.com
http://ml314.com
I said earlier that they are changing the SDK on the cloud servers. I think they made a mistake that is causing sites to come back as uncategorized.
Issue: subdomains are uncat even if the main domain is categorized. Therefore bounceexchange.com is categorized but cnn.bounceexchange.com is not.
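To make the expected behavior concrete, here is a sketch of a lookup that falls back to the parent domain when a subdomain has no entry of its own. It does not describe how the SXL servers or the TrustedSource SDK work internally; the `categorize` function, the `known` table, and the category label in it are all invented for illustration.

```python
def categorize(host, known):
    """Return the category of host, trying parent domains before giving up.

    `known` maps domain names to category labels (hypothetical data).
    The bare top-level label (e.g. "com") is never tried on its own.
    """
    labels = host.lower().rstrip(".").split(".")
    # Try the full name first, then strip one leading label at a time.
    for start in range(max(len(labels) - 1, 1)):
        candidate = ".".join(labels[start:])
        if candidate in known:
            return known[candidate]
    return "Uncategorized"

# Placeholder table; the label is not the real category of this domain.
known = {"bounceexchange.com": "example-category"}

print(categorize("cnn.bounceexchange.com", known))  # example-category
print(categorize("unknown.example.net", known))     # Uncategorized
```

With the issue described above, the first lookup would instead come back as uncategorized because only the exact name is matched.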
Thank you very much; there is no more need for you to look at uncat.
Can you give me any examples for categorization failed or category=9999?
I am thrilled that the problem is identified and a fix is likely to be straightforward. Please let me know when the change is complete. I want to process the URL that McAfee said was unclassified to verify that it flows through from them to you to my UTM.
The other lists, which were short, have been sent by PM.
Correction.
There are no recent changes to the server. The problems that Douglas is experiencing are longstanding issues. They will be resolved when the servers get updated to the new SDK. I don't have a timeline on that, but I will post here when it occurs.