The Data lake contains both endpoint and if deployed and enabled XG Firewall information.
This document covers the schema for endpoint information in the data lake.
The information in the data lake from endpoints is controlled by scheduled queries that are run against the endpoints while they are online. Each query will return information from one or more tables available to OSQuery and the Sophos Extension. When a scheduled query runs it returns the result of the query and the information is in a table format. Each column in the query will provide information for a column in the data lake.
As the information is added to the data lake we store a variety of generic information common for each of the scheduled 'hydration' queries. That information includes the name of the hydration query, time it was executed, device information that returned the result and some other booking keeping data. When searching the data lake it is valuable to understand which hydration query provided the information. The attached excel file describes the information returned from each scheduled hydration query. We can use the generic information that identifies the hydration query used as a filter when searching the data lake so for example to retrieve all entries from all devices for the current processes running on a windows device we want to filter the information on that query_name.
running_processes_windows_sophos | cmdline | string | Process command line |
running_processes_windows_sophos | file_size | long | File size now |
running_processes_windows_sophos | gid | long | Group ID (unsigned) of the user running the process |
running_processes_windows_sophos | global_rep | int | The machine learning global reputation now |
running_processes_windows_sophos | global_rep_data | string | All global reputation data |
running_processes_windows_sophos | local_rep | int | The machine learning local reputation now |
running_processes_windows_sophos | local_rep_data | string | All local reputation now |
running_processes_windows_sophos | ml_score | int | The machine learning malware score now |
running_processes_windows_sophos | ml_score_data | string | All ML score data |
running_processes_windows_sophos | name | string | Name of the registry value entry |
running_processes_windows_sophos | parent | long | Process parent's PID |
running_processes_windows_sophos | parent_name | string | Parent process name |
running_processes_windows_sophos | parent_path | string | The parent process path |
running_processes_windows_sophos | parent_sophos_pid | string | The ID of the parent process and its start time creating a unique identifier |
running_processes_windows_sophos | path | string | Full path to the value |
running_processes_windows_sophos | pid | long | Process (or thread) ID |
running_processes_windows_sophos | pua_score | int | The machine learning PUA score now |
running_processes_windows_sophos | sha1 | string | SHA1 of the file now |
running_processes_windows_sophos | sha256 | string | SHA256 of the file now |
running_processes_windows_sophos | sophos_pid | string | The process ID that produced the registry event and its start time creating a unique identifier |
running_processes_windows_sophos | time | long | Timestamp of the windows powershell event (unix epoch) |
running_processes_windows_sophos | uid | long | The local user that owns the plugin |
running_processes_windows_sophos | username | string |
Username |
SELECT
-- Device ID DETAILS
meta_hostname, meta_ip_address,
-- Query Details
time, sophos_pid, pid, name, path, cmdline, parent, parent_name, parent_path, parent_sophos_pid, username,
uid, gid, sha1, sha256, file_size, ml_score, ml_score_data, pua_score, global_rep, global_rep_data, local_rep, local_rep_data,
-- META information Common for all queries
ingestion_timestamp, schema_version, query_source, query_name, message_identifier, upload_size, host_identifier, calendar_time, unix_time,
epoch, counter, numerics, osquery_action, meta_boot_time, meta_eid, meta_ip_mask, meta_mac_address, meta_os_name, meta_os_platform, meta_username,
meta_public_ip, meta_query_pack_version, customer_id
FROM xdr_data
WHERE query_name = 'running_processes_windows_sophos'
Results would look something like what is shown below.
Note that we used a WHERE query_name 'running_processes_windows_sophos' to filter the entire XDR data lake where all columns from all scheduled 'hydration' queries are avaiable down to just information put into the data lake from the ..running_processes_windows.. query.
We have included the SCHEMA as an Excel document so that you can set filters, or use the pivot table to view just information related to a specific scheduled hydrating query.