DATA LAKE Schema - Endpoint Data

The Data lake contains both endpoint and if deployed and enabled XG Firewall information.

This document covers the schema for endpoint information in the data lake.

The information in the data lake from endpoints is controlled by scheduled queries that are run against the endpoints while they are online.  Each query will return information from one or more tables available to OSQuery and the Sophos Extension.  When a scheduled query runs it returns the result of the query and the information is in a table format. Each column in the query will provide information for a column in the data lake. 

As the information is added to the data lake we store a variety of generic information common for each of the scheduled 'hydration' queries. That information includes the name of the hydration query, time it was executed, device information that returned the result and some other booking keeping data.  When searching the data lake it is valuable to understand which hydration query provided the information.  The attached excel file describes the information returned from each scheduled hydration query. We can use the generic information that identifies the hydration query used as a filter when searching the data lake so for example to retrieve all entries from all devices for the current processes running on a windows device we want to filter the information on that query_name.

running_processes_windows_sophos cmdline string Process command line
running_processes_windows_sophos file_size long File size now
running_processes_windows_sophos gid long Group ID (unsigned) of the user running the process
running_processes_windows_sophos global_rep int The machine learning global reputation now
running_processes_windows_sophos global_rep_data string All global reputation data
running_processes_windows_sophos local_rep int The machine learning local reputation now
running_processes_windows_sophos local_rep_data string All local reputation now
running_processes_windows_sophos ml_score int The machine learning malware score now
running_processes_windows_sophos ml_score_data string All ML score data
running_processes_windows_sophos name string Name of the registry value entry
running_processes_windows_sophos parent long Process parent's PID
running_processes_windows_sophos parent_name string Parent process name
running_processes_windows_sophos parent_path string The parent process path
running_processes_windows_sophos parent_sophos_pid string The ID of the parent process and its start time creating a unique identifier
running_processes_windows_sophos path string Full path to the value
running_processes_windows_sophos pid long Process (or thread) ID
running_processes_windows_sophos pua_score int The machine learning PUA score now
running_processes_windows_sophos sha1 string SHA1 of the file now
running_processes_windows_sophos sha256 string SHA256 of the file now
running_processes_windows_sophos sophos_pid string The process ID that produced the registry event and its start time creating a unique identifier
running_processes_windows_sophos time long Timestamp of the windows powershell event (unix epoch)
running_processes_windows_sophos uid long The local user that owns the plugin
running_processes_windows_sophos username string

Username

SELECT
-- Device ID DETAILS
meta_hostname, meta_ip_address,

-- Query Details
time, sophos_pid, pid, name, path, cmdline, parent, parent_name, parent_path, parent_sophos_pid, username,
uid, gid, sha1, sha256, file_size, ml_score, ml_score_data, pua_score, global_rep, global_rep_data, local_rep, local_rep_data,


-- META information Common for all queries
ingestion_timestamp, schema_version, query_source, query_name, message_identifier, upload_size, host_identifier, calendar_time, unix_time,
epoch, counter, numerics, osquery_action, meta_boot_time, meta_eid, meta_ip_mask, meta_mac_address, meta_os_name, meta_os_platform, meta_username,
meta_public_ip, meta_query_pack_version, customer_id
FROM xdr_data
WHERE query_name = 'running_processes_windows_sophos'

Results would look something like what is shown below.

Note that we used a WHERE query_name 'running_processes_windows_sophos' to filter the entire XDR data lake where all columns from all scheduled 'hydration' queries are avaiable down to just information put into the data lake from the ..running_processes_windows.. query.

We have included the SCHEMA as an Excel document so that you can set filters, or use the pivot table to view just information related to a specific scheduled hydrating query.