The purpose of several Sophos Components? (Some new follow up questions re. Sophos and privacy)

Question

Please explain the purpose of the following and under what circumstances they become active. In addition, does either of these phone home? If so, with what data?
 -SophosScanD 
 -SophosSXLD

bobcook · Answer

SophosScanD runs the scanning engine. It uses Live Protection when the feature is enabled in preferences, which means it may send encrypted information to SophosLabs for real-time AV checks. The transport is DNS. It does not do any other "phone home" type communications. Today it is only used for scanning downloads from the web, but it will start doing all scanning in the future as we rework the on-access and custom scanning code.
 
SophosSXLD performs SXL lookups for web reputation to learn the risk level of a given URL or IP address. It communicates with the SophosLabs servers via HTTP and/or DNS in order to provide this information (the database is far too big to download to each endpoint). It does not do any other "phone home" type communications. 
 
Hope this helps, let me know if not.

bobcook · Answer

As of today, SophosScanD does not record any information about the scanning process. Scan logs are saved to your local computer only and are never sent to Sophos unless you (as the user) send them, which we sometimes ask for support reasons. Today these logs come from InterCheck and SophosAVAgent; in the future SophosScanD will do the logging itself when it does all scanning.
 
Live Protection does send information about what is being scanned. This information goes to SophosLabs for enhanced detection capabilities, and it includes the filename and metadata generated during the scanning process. The reply from the SophosLabs server returns further "clean" or "threat" information to the product.
 
SophosSXLD does not record any information about completed queries. It does have, in memory, the clear text URL and IP address of incomplete (pending) requests. The reply from the SophosLabs server indicates the risk and category information of the URL or IP address. Once that information is received and applied, it's deleted from memory.
 
We never intentionally collect any personal information from your computer, although because we do collect the filename (Live Protection) or URL (Web Protection), it's possible we accidentally send something personal. All other data is intentionally anonymized, although due to the nature of the internet the SophosLabs servers do "see" the public-facing IP address of the computer or network sending requests. This is unavoidable; it's how computer networks work.
 
By "transport is DNS" I meant to say that the data being sent to SophosLabs is wrapped up in a DNS request. Although DNS is typically used to convert hostnames to IP addresses, it really can be used to do any sort of lookup request - just make up "fake" names that are attempting to resolve to a domain you own, and you can make the reply meaningful. In the context of a DNS response, an NXDOMAIN response can indicate "clean" while a valid IP address (something like 127.0.0.1) might indicate "threat". This is not a new technique; it's been used in the world of spam filtering for years. DNS lookups are very fast.
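To make the idea concrete, here is a minimal Python sketch of a DNS-wrapped lookup. It is only an illustration of the general technique, not how SophosScanD actually builds or reads its queries: the zone `lookup.example.com`, the use of a SHA-1 digest as the label, and the function names are all assumptions made for the example.

```python
import hashlib
import socket
from typing import Callable

# Hypothetical zone; the real lookup domain and name format are not described here.
LOOKUP_ZONE = "lookup.example.com"


def build_query_name(data: bytes, zone: str = LOOKUP_ZONE) -> str:
    """Encode a digest of the data as a "fake" hostname under the zone."""
    # A SHA-1 hex digest is 40 characters, which fits in one DNS label (max 63).
    digest = hashlib.sha1(data).hexdigest()
    return f"{digest}.{zone}"


def classify(name: str, resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Map the DNS answer for `name` to a verdict.

    Assumption for this sketch: a failed lookup means "clean" and an
    answer in 127.0.0.0/8 means "threat". Note that socket.gaierror is
    also raised for failures other than NXDOMAIN (e.g. no network), so a
    real client would need to tell those cases apart.
    """
    try:
        address = resolve(name)
    except socket.gaierror:
        return "clean"
    return "threat" if address.startswith("127.") else "unknown"
```

Calling `classify(build_query_name(file_bytes))` would perform one ordinary DNS lookup; the `resolve` parameter exists only so the logic can be exercised without touching the network.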
 
In the context of SophosConfigD, until version 9.4 we used carefully crafted DNS requests that encoded information about how the endpoint was configured (e.g. using the on-access scanner? using Live Protection?). This type of information made up a long string of digits that we formed into a hostname that was part of the domain sophosxl.net, and at the SophosLabs server side we could record these fake lookups in order to count how many users used which features. Starting in 9.4 we use HTTPS rather than DNS, and the data is stored using JSON syntax rather than crafting a string of 0s and 1s as a hostname. By using HTTPS we ensure that this information is not leaked unintentionally to third parties.
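As a rough illustration of the difference between the two encodings, the following Python sketch turns a set of feature flags into a 0/1 hostname and into a JSON document. The feature names, their order, and the exact layout of the hostname are hypothetical; only the domain sophosxl.net and the general idea come from the description above.

```python
import json

# Hypothetical feature names and ordering, chosen only for this example.
FEATURES = ("on_access_scanner", "live_protection", "web_protection")


def config_to_hostname(config: dict, zone: str = "sophosxl.net") -> str:
    """Pre-9.4 style: one digit per feature, used as a label under the zone."""
    bits = "".join("1" if config.get(name) else "0" for name in FEATURES)
    return f"{bits}.{zone}"


def config_to_json(config: dict) -> str:
    """9.4 style: the same flags as a JSON body, to be sent over HTTPS."""
    return json.dumps({name: bool(config.get(name)) for name in FEATURES}, sort_keys=True)
```

With on-access scanning and Web Protection enabled but Live Protection disabled, `config_to_hostname` yields `101.sophosxl.net`. The DNS form is visible to any resolver along the path, whereas the JSON body travels inside the encrypted HTTPS connection.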
 
We intend to move all of the communications between our software and our servers to HTTPS. No more DNS and no more clear-text HTTP. It's the responsible thing to do.
 
We do store the information sent to SophosLabs for later analysis, in aggregate. We don't intentionally create "profiles" about our customers and users. This data is useful to spot trends or understand the product's performance in the real world. For example, in the weeks after we released version 9.4, we regularly ran reports about the number of endpoints using Live Protection to report the detection of Adware / Potentially Unwanted Applications. This was a measure of product success (e.g. if we released the feature and it never reported anything we might wonder if it's actually useful). It's also useful to understand the types of threats or PUAs that customers *actually* encounter (or *never* encounter).
 
We have no way to convert the information sent to SophosLabs to an actual computer or person (aside from the underlying issue of IP address, which we really can't avoid). To be honest, it's not even that interesting to look at what individuals might or might not be doing. It's definitely interesting to see trends and aggregations: for example, nearly all Home Edition users enable the on-access scanner, and more than 95% of all Home Edition users enable both Web Protection features (reputation lookups and download scanning). That tells me the features are useful, and not getting in the way.
 
Hope that helps.