
Website categories

I have a list of sites visited taken from Sophos UTM and TMG. I want to parse this list to identify a category for each site visited. Is there a method I can employ to produce this report?

Thanks,

Paul



This thread was automatically locked due to age.
  • Hi, Paul, and welcome to the UTM Community!

    You can check one URL at a time at Check Single URL, or get a free account there and check up to 100 at a time.

    In the UTM, you can see "Domains with Categories" in Reporting.

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi Bob,

    Thanks for the information and the warm welcome.

    It looks like I'm going to struggle to get the information I need. I'll see if I can cut my list down enough to feed it into the site 100 lines at a time. It'll still take a long time though!

    Thanks again,

    Paul
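
One way to cut the list down before feeding it to a checker that takes 100 entries at a time is to reduce it to unique hostnames and split the result into 100-line files. This is only a sketch: it assumes hostname-level categories are enough for the report, and the function names and the `chunk_` file prefix are made up for illustration.

```shell
# Reduce a file of URLs to unique, lower-cased hostnames.
# Strips the scheme (http://, https://, ...) and everything from the
# first '/', ':', '?' or '#' after the hostname.
unique_hosts() {
    sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|[/:?#].*$||' "$1" \
        | tr 'A-Z' 'a-z' \
        | sort -u
}

# Split the unique hostnames into files of at most 100 lines each.
# $2 is an optional output prefix (hypothetical default: chunk_).
chunk_hosts() {
    unique_hosts "$1" | split -l 100 - "${2:-chunk_}"
}
```

For example, `chunk_hosts visited.txt` would write `chunk_aa`, `chunk_ab`, and so on into the current directory, where `visited.txt` stands in for the exported list.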

  • Do these work on your UTM?

    $ /var/storage/chroot-http/opt/ws/bin/uridquery -t /var/storage/chroot-http/opt/ws/conf/sophos.tldlist -f http://www.sophos.com/
    # /var/storage/chroot-http/opt/ws/bin/sxl2-dumper /var/storage/chroot-http/persist/sxl/sxl3_cache.dat www.sophos.com

    69 appears in the output of both. Converted from hexadecimal to decimal, that is 105, which you can look up in the category list:

    $ grep $((16#69)) /etc/surf_pro_cat.txt

    wsa.sophos.com/.../AppInterpretingASophosLog.html may give some insights into some of the uridquery results. Other files in the directories may also be of interest.
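
The hexadecimal-to-decimal lookup described in this post can be wrapped in two small functions. This is a minimal sketch, not a supported tool: the function names are made up, `printf` is used for the conversion instead of bash's `$((16#69))` so it also works in plain `sh`, and because the layout of `/etc/surf_pro_cat.txt` is not described in this thread, the lookup only does a whole-word grep rather than parsing fields.

```shell
# Convert a hexadecimal category ID (e.g. 69) to decimal (105).
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# Print the lines of the category file that contain the decimal ID
# as a whole word. $2 optionally overrides the file path; the default
# is the path used in the post above.
lookup_category() {
    grep -w -- "$(hex_to_dec "$1")" "${2:-/etc/surf_pro_cat.txt}"
}
```

With these defined, `lookup_category 69` does the same job as the `grep` line above, with the added whole-word restriction so that 105 does not also match longer numbers such as 1050.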

  • uridquery may work, but is not really intended for customers to use.

    The link to the log file interpretation is for the SWA product, not the UTM. However, some of the fields are similar.

    The easiest thing, in my opinion, is to put all the URLs into a file and then, on a box whose traffic goes through the proxy, run:

    wget -i filename --spider

    You may want to add some timeouts and limit retries as well (e.g. --dns-timeout=2 --connect-timeout=2 -t 1).

    This sends a HEAD request (like GET, but only the headers are returned, not the file) for each line in the file.

    Then look at your log file to see all the results including category.

    Note that some failures, such as a DNS failure on the client, mean the request never reaches the proxy. So I would match the results against your original list of requests. Anything that did not go through the proxy will have to be checked manually.
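
The probe-and-match workflow in this post could look roughly like the sketch below. Several things here are assumptions rather than facts from the thread: the function names are made up, and the log is assumed to carry `url="..."` and `categoryname="..."` fields on each line, so check a real log line and adjust the patterns before relying on it.

```shell
# Probe every URL in the file once, with short timeouts and no retries,
# so that dead hosts do not stall the run.
probe_urls() {
    wget --spider --dns-timeout=2 --connect-timeout=2 -t 1 -i "$1"
}

# Print unique "url<TAB>categoryname" pairs found in a log file.
# ASSUMPTION: each log line has url="..." and categoryname="..." fields.
extract_categories() {
    awk '{
        url = ""; cat = ""
        if (match($0, /url="[^"]*"/))
            url = substr($0, RSTART + 5, RLENGTH - 6)
        if (match($0, /categoryname="[^"]*"/))
            cat = substr($0, RSTART + 14, RLENGTH - 15)
        if (url != "") print url "\t" cat
    }' "$1" | sort -u
}

# List requested URLs ($1) that never appear in the log ($2),
# i.e. the ones that still need a manual check.
missing_urls() {
    extract_categories "$2" | cut -f1 | grep -vxF -f - "$1"
}
```

The match in `missing_urls` is an exact string comparison, so a URL that the proxy logs in a normalized form (for example with a trailing slash added) will be reported as missing unless the input list is written the same way.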

  • Paul, maybe you could tell us why you decided you need this - what benefit do you hope to achieve?

    Cheers - Bob

     