
Evaluating Acceptable Use browsing

What is the best way to evaluate web filtering logs to determine whether employees are outside our "Acceptable Use" policy?  For example, is the employee spending time on Facebook, going to sites that have icons for "Follow us on Facebook" (without clicking the link), or going to sites that support OAuth logins using Facebook?  (In parallel, I am also trying to permit OAuth logins via Facebook without allowing general use of Facebook.)

For employee counseling about "Acceptable Use", I need to be able to distinguish between things the user "chose" (by typed entry or clicked link) and things that happened without his knowledge, either as browser overhead or as embedded content elements.

I already parse my logs into a SQL database so that I can select all records for a single user over a specific date range, and I group adjacent items with the same request#, on the theory that the first entry reflects the user's choice and any items after the first do not.  I am also trying to understand how to use the Referer header to evaluate the data correctly.  But I think I am still seeing a lot of clutter for which the user is not accountable.
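A sketch of that first-entry selection, assuming a hypothetical table weblog with placeholder columns (username, log_time, request_id for the request#, and url); none of these are actual log field names, and 'jsmith' and the dates are placeholders:

    -- Keep only the first entry for each request group, on the theory
    -- that it is the user-chosen URL and later rows are embedded content.
    SELECT username, log_time, url
    FROM (
        SELECT username, log_time, url,
               ROW_NUMBER() OVER (PARTITION BY request_id
                                  ORDER BY log_time) AS rn
        FROM weblog
        WHERE username = 'jsmith'
          AND log_time BETWEEN '2018-03-01' AND '2018-03-31'
    ) AS firsts
    WHERE rn = 1
    ORDER BY log_time;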

Since Chrome 58, I have also been running without HTTPS inspection, and it is clear that the logs become much less useful with it disabled: only the CONNECT action is logged, and only the FQDN of the server is identified.

Today, I started looking for a way to track time by application, but the Application Control feature only provides allow/block, not quota.  The quota feature only shows total quota time used; I would need quota time by application or FQDN for this purpose.
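As a stopgap, the logs themselves can give a rough time-by-site measure by counting the distinct one-minute buckets in which a site was requested.  This is only an approximation (background traffic counts as activity), uses the same placeholder weblog table as above, and date_trunc is Postgres syntax:

    -- Approximate "active minutes" per FQDN for one user and date range.
    SELECT fqdn,
           COUNT(DISTINCT date_trunc('minute', log_time)) AS active_minutes
    FROM weblog
    WHERE username = 'jsmith'
      AND log_time BETWEEN '2018-03-01' AND '2018-03-31'
    GROUP BY fqdn
    ORDER BY active_minutes DESC
    LIMIT 20;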

Has anyone solved this?



  • After another year of experimenting, I have a strategy that works for me.

    1) As I have posted elsewhere, I use both web filter proxy modes in parallel.  Standard Mode with Authentication picks up browser activity; Transparent Mode without authentication picks up fat-client and operating-system activity.  This partitions the workload.

    2) I only consider passed traffic (itmid='0001'), because we block web ads that are not user requests, and I assume users will not repeatedly request content that is always blocked.  You could include 'warned and proceeded', but we warn so little that it is not important to me.

    3) I exclude known overhead: crl.* (assumed to be certificate revocation list checks), api.* (assumed to be browser features like search suggestions), and iecvlist.microsoft.com.

    4) HTTPS-without-inspection logs a single entry at the end of the session, but the size field represents all activity during the session.  This means that summing on size is a consistent measure of user activity.  Counting records is not reasonable, because it would mix session records (for HTTPS-without-inspection) with page records (for HTTP and HTTPS-with-inspection).

    Then I sort the top N entries for the specified user and time range (a sketch of the combined query follows).
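    Pulling steps 2 through 4 together with the top-N sort, a sketch against the same placeholder weblog table (itmid and size are the fields named above; 'jsmith' and the dates are placeholders):

        -- Top-N sites by total bytes for one user and date range.
        SELECT fqdn, SUM(size) AS total_bytes
        FROM weblog
        WHERE username = 'jsmith'
          AND log_time BETWEEN '2018-03-01' AND '2018-03-31'
          AND itmid = '0001'                    -- passed traffic only (step 2)
          AND fqdn NOT LIKE 'crl.%'             -- known overhead (step 3)
          AND fqdn NOT LIKE 'api.%'
          AND fqdn <> 'iecvlist.microsoft.com'
        GROUP BY fqdn
        ORDER BY total_bytes DESC               -- summing size, per step 4
        LIMIT 20;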

    For another level of aggregation, I sometimes drop the host name and aggregate on the domain name only.
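    For that roll-up, a crude approach is to keep just the last two labels of the FQDN (Postgres regex syntax; this breaks for multi-part TLDs such as .co.uk, where a public-suffix lookup would be needed):

        -- Aggregate on the registrable domain instead of the full host name.
        SELECT substring(fqdn FROM '[^.]+\.[^.]+$') AS domain,
               SUM(size) AS total_bytes
        FROM weblog
        GROUP BY domain
        ORDER BY total_bytes DESC;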

    All of this is simplified because, when I load my log data, I break the URL into four pieces: protocol, FQDN, path, and querystring.
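    A minimal sketch of what that loaded table might look like; the exact columns are the poster's own design, so treat every name here as illustrative:

        -- Hypothetical layout for the parsed log table used in the
        -- queries above.
        CREATE TABLE weblog (
            log_time    timestamp NOT NULL,
            username    text,
            request_id  text,     -- groups entries belonging to one request
            itmid       char(4),  -- '0001' = passed traffic
            size        bigint,   -- bytes (whole session for
                                  -- HTTPS-without-inspection records)
            protocol    text,     -- the four URL pieces, split at load time
            fqdn        text,
            path        text,
            querystring text
        );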
