(NEW) Video on Schemas for EDR and Data Lake (15 Min)
With the addition of the data lake a significant amount of new information is available. In this document we will discuss each of the core database schemas.
For those that simply want the schema data some links are provided below
Endpoint
Data Lake
Endpoint-OSQuery
We are currently using OSQuery version 4.5.1, in total between Windows Mac and Linux this gives you access to over 200 tables of device information. For a full discussion on the OSQuery extension I recommend starting with the OSQUERY.IO web site.
'osquery is an operating system instrumentation framework for Windows, OS X (macOS), Linux, and FreeBSD. The tools make low-level operating system analytics and monitoring both performant and intuitive.
osquery exposes an operating system as a high-performance relational database. This allows you to write SQL queries to explore operating system data. With osquery, SQL tables represent abstract concepts such as running processes, loaded kernel modules, open network connections, browser plugins, hardware events or file hashes.'
OSQuery acts as the foundation technology for issuing queries to endpoints and collecting the information back to Sophos Central (A Fleet Manager in OSQuery language). Sophos actively participates in the open source project and contributes back to the community both new capabilities (Support for Windows 7) and bug fixes.
There are a number of online locations where folks exchange queries that work with OSQuery directly and almost all of those should work fine from Sophos Central.
Each of the supported operating system has common tables that are shared with other operating systems and a number of tables that are unique to the operating system. For example all operating systems have an OSQuery table for processes, but only windows has a table for registry keys as these are unique to windows.
From central you can get a fair bit of information on exactly what version of OSQuery running on the device and information about the extensions we added as well as information on how we fill the sophos data lake.
-- Get the OSQuery information from the device
SELECT * FROM osquery_info
For those already familiar with OSquery you will notice a column you might not have expected. The epName Column is automatically added by sophos to identify the device that returned the result. We do this to allow administrator to set a highlevel filter on what devices they want to send the query to. This allows administrators to target the query to a single device or group of devices and is a common filering method with fleet managers where the admin may not want to issue a complex queries to all devices or may be issuing a query that only works on Linux and does not want to send it to devices that sill simply return a table not found error.
With this query you can also see the version information and an indication that extensions are 'Active'. This leads to some other work we have done to extend osquery.
Endpoint-Sophos Extension (Windows)
With OSQuery we have excellent information on the current running state of a device and access to some of the event and system logs. For forensics you really need to be able to go back in time and see what was happening on the device days or weeks in the past. To enable this sophos records all process activity and puts the information into a series of journals that are made available with the extension. Among other things we are recording all threads for the process and information on them like was it a remote loaded threat (Injection) as well as all changes to the registry, access to the file system and network activity. This high fidelity data allows a forensics team to see exactly how an adversary or malware took advantage of the system. You can observer the file reads made by a process over 90 days ago, observing it read a document with one thread and copy it to a zip file with another before sending the entire payload to some external C2 server.
You can also see some information on the Extension direct from a query.
SELECT DISTINCT osquery_registry.name "table", osquery_extensions.name "extension"FROM osquery_extensions JOIN osquery_registry ON osquery_extensions.uuid = osquery_registry.owner_uuid JOIN pragma_table_info(osquery_registry.name) ptiWHERE osquery_registry.registry = 'table' AND osquery_extensions.name != 'core'
This list the tables in OSQuery that are not 'core
Data Lake - Endpoint
The sophos data lake for endpoint information leverages the OSQuery ability to automatically run queries on a schedule. To fill the data lake sophos has defined 97 different queries to fill the data lake. Each of these run on their own schedule from as frequently as every 20 seconds (WIndows running processes) to as infrequently as once every 86400 seconds. (once a day). Understanding the frequency of these 'hydration' queries and how they work allows the admin to better understand what is in the data lake and why. The data lake is a subset of information from the endpoint and the best source of truth will always be the device, but if that device is not available or if the administrator is looking for commonality between multiple devices or between devices and other sensor information like the XG FW network data it is best to use a query against the data lake.
Some Notes on teh Data Lake Schema
You can run a query to get information and the SQL code used for each of the scheduled queries filling the data lake. In the example below I ran the query on windows and exported the results to get the information in a nice to read table format.
LINUX