For query assistance, please see the following Best Practices guide


One of the most frequently used queries by our threat hunting team is a flexible generic search query against the data lake.

Often you know exactly what you are looking for but sometimes you want to start from a high level view and work your way deeper.  To assist with basic investigations of the environment we provide a broad generic process search query. 

With this data lake query you can look for a variety of things either independently or in combination.   The supports 6 variables and all the conditions for each variable are used as filters for information in the datalake related to the history of running windows processes.  The data lake has a record of all processes that ran on the device for the last 7 days and with this query we are going to search that table where each of the conditions specified in the variable list is met.  So say you just want to see everything. Set the variables as shown below.  It will return a list of all processes that have run, and to simply understanding the results it will group them by command line, parent process, process, username, sha256, path, UID, file_size and Parent process Path. That way if something is happening hundreds of times you simply get a counter for the number of times it ran.  We include a variable to drop results if we have more than N identical results.  

One interesting hunting query is to look for what processes were run directly from cmd.exe or powershell.exe.  Or if a powershell.exe parent started a poweshell.exe as a child.  Of then this is a good indicator of an adversary behavior.  This query can run across the data lake for hundreds of devices and millions of process records and return a result in about 10-20 seconds.  Have fun.  Oh I include the generic network search query as well at the bottom.

Descriptive Name Variable Type SQL Variable Name Value
Command Line String $$Command Line$$ %
Device String $$Device%% %
Ignore when > N Duplicates String $$Ignore when > N Duplicates$$ 1000
Parent Process Name String $$Parent Process Name$$ %
Process Name String $$Process Name$$ %
User Name String $$User Name$$ %

-- Generic Search
-- VARIABLE:   $$Device$$                                STRING
-- VARIABLE:   $$Command Line$$                          STRING
-- VARIABLE:   $$Process Name$$                          STRING
-- VARIABLE:   $$Parent Process Name$$                   STRING
-- VARIABLE:   $$User Name$$                             STRING
-- VARIABLE:   $$Ignore when > N Duplicates$$            STRING

WITH Count_limt AS (
With DeDuplicate AS (
SELECT
   xdr_data.meta_hostname epName,
   xdr_data.meta_os_type Device_Type,
   replace(replace(CAST(from_unixtime(xdr_data.time) AS VARCHAR),'.000','Z'),' ','T') Date_Time, -- add the T to help excel understand this is a date and time
   CASE CAST(xdr_data.username > '' AS INT)
      WHEN 1 THEN xdr_data.username --|| ' : ' || xdr_data.meta_username
      ELSE xdr_data.meta_username
   END User_Name,
   xdr_data.parent_name Parent_Process_Name,
   xdr_data.name Process_Name,
   xdr_data.cmdline Cmd_Line,
   xdr_data.sophos_pid SophosPID,
   xdr_data.parent_sophos_pid Parent_SophosPID,
   xdr_data.sha256 SHA256,
   xdr_data.path Path_Name,
-- Additional Data
   xdr_data.meta_ip_address Device_IP_ADDR,
   xdr_data.meta_public_ip Public_IP,
   xdr_data.meta_mac_address Device_MAC_Address,
   xdr_data.meta_os_name Device_OS,
   xdr_data.meta_os_platform Device_OS_Platform,
   xdr_data.uid UID,
   xdr_data.file_size File_Size,
   xdr_data.ml_score ML_Score,
   xdr_data.pua_score PUA_Score,
   xdr_data.global_rep Global_Rep,
   xdr_data.local_rep Local_Rep,
   xdr_data.parent_path Parent_Path
FROM
   xdr_data
WHERE
   query_name = 'running_processes_windows_sophos' AND
   -- SEARCH AND FILTER CRITERIA
   UPPER(xdr_data.meta_hostname) LIKE UPPER('%$$Device$$%') AND
   ( UPPER(xdr_data.meta_username) LIKE UPPER('%$$User Name$$%') OR UPPER(xdr_data.username) LIKE UPPER('%$$User Name$$%') )AND
   UPPER(xdr_data.parent_name) LIKE UPPER('%$$Parent Process Name$$%') AND
   UPPER(xdr_data.name) LIKE UPPER('%$$Process Name$$%') AND
   UPPER(xdr_data.cmdline) LIKE UPPER('%$$Command Line$$%') 
)
SELECT
   array_join(array_agg(DISTINCT epName||' : '||Device_OS), CHR(10)) Device_List,
   COUNT(DISTINCT epName) Devices_With_Observable,
   MIN(Date_Time) FirstSeen,
   count(epName) Counter,
   User_Name,
   LOWER(Parent_Process_Name) Parent_Process_Name,
   LOWER(Process_Name) Process_Name,
   Cmd_Line,
   array_join(array_agg(DISTINCT SophosPID), CHR(10)) SophosPID_LIST,
   array_join(array_agg(DISTINCT Parent_SophosPID), CHR(10)) Parent_SophosPID_LIST,
   SHA256,
   Path_Name,
   UID,
   File_Size,
   MIN(ML_Score) ML_Score,
   MIN(PUA_Score) PUA_Score,
   MIN(Global_Rep) Gloal_Rep,
   MIN(Local_Rep) Local_Rep,
   Parent_Path,
   MAX(Date_Time) LastSeen
FROM DeDuplicate 
 GROUP BY
   User_Name,
   Parent_Process_Name,
   Process_Name,
   Cmd_Line,
   SHA256,
   Path_Name,
   UID,
   File_Size,
   Parent_Path
)
SELECT * 
FROM Count_limt 
WHERE Counter <= $$Ignore when > N Duplicates$$ 
ORDER BY Devices_With_Observable DESC,Counter DESC

The way this query works is it first builds a list of all installed applications on windows as seen in the data lake and then applies the filters selected by the admin. The variables are all converted to upper case so we can perform a case insensitive search and we add a wildcard at the front and back of each. Note % is the character you enter to specify a wildcard. 

  -- SEARCH AND FILTER CRITERIA
UPPER(xdr_data.meta_hostname) LIKE UPPER('%$$Device$$%') AND
( UPPER(xdr_data.meta_username) LIKE UPPER('%$$User Name$$%') OR UPPER(xdr_data.username) LIKE UPPER('%$$User Name$$%') )AND
UPPER(xdr_data.parent_name) LIKE UPPER('%$$Parent Process Name$$%') AND
UPPER(xdr_data.name) LIKE UPPER('%$$Process Name$$%') AND
UPPER(xdr_data.cmdline) LIKE UPPER('%$$Command Line$$%')  

The variable data entered by the user is inserted into the SQL statement. We are requiring that all of the search criteria are met by using the AND statement.  Note we are performing a search on username for both the 'meta_username and username. This allows us to search for processes that were run when that user was logged on to the device and processes directly executed by the user.   When this section of the query completes we have a virtual table with lots of potentially duplicate results as a process may have had the same exact parent, command line and other information and simply be run multiple times   The next portion of the query  performs aggregation of the duplicates and keep each value in an array. 

First we want a count of the number of duplicates. we wo that with a COUNT function.

   COUNT(DISTINCT epName) Devices_With_Observable,

We do not want to lose useful information on things that were of potential value like the device, sophos process ID ect so we put these into an array

   array_join(array_agg(DISTINCT epName||' : '||Device_OS), CHR(10)) Device_List,

Having specified the aggratations and performed the counting we now want to check the last filter and drop any results where the number of instances exceeds the user specified limit.

WHERE Counter <= $$Ignore when > N Duplicates$$ 

In the end we get a nice little set of results that we can quickly review.  It is simple to now go refine the filters and begin to narrow down to what is interesting. Because the queries execute so fast the admin can use this iteratively to perform some rapid orientation and when pivots arrive in March we will show how to go from one query to another without having to remember or copy the information we are focused on.

Sample results

Generic Search on Network Activity
-- Generic PIVOT Search Network Open_Sockets, Listening_Ports, sophos_ips_windows, sophos_urls_windows
-- NOTE: ALL VARIABLES - Support wild card and performs a case insensative search for a match
-- VARIABLE:   $$Command Line$$                 STRING
-- VARIABLE:   $$Parent Process Name$$          STRING
-- VARIABLE:   $$Process Name$$                 STRING  
-- VARIABLE:   $$Remote IP_Address$$            STRING
-- VARIABLE:   $$Remote Port$$                  STRING
-- VARIABLE:   $$URI$$                          STRING

With Full_List AS (
SELECT 
   X1.meta_hostname epName,
   X1.meta_os_name Device_Type,
   X1.query_name Table_Name,
   replace(replace(CAST(from_unixtime(X2.time) AS VARCHAR),'.000','Z'),' ','T') Date_Time, -- add the T to help excel understand this is a date and time
   X2.meta_username User_Name,
   X2.parent_Name Parent_Process_Name,
   X2.name Process_Name,
   X2.cmdline Cmd_Line,
   X2.sophos_pid SophosPID,
   X2.parent_sophos_pid Parent_SophosPID,
   X2.sha256 SHA256,
   X1.source_ip Local_IP,
   X1.port Local_Port,
   X1.destination_ip Remote_IP,
   X1.destination_port Remote_Port, 
   X1.protocol,
   X1.domain, 
   X1.clean_urls, 
   X2.path Path_Name,
   X2.ml_score ML_Score,
   X2.pua_score PUA_Score,
   X2.global_rep Global_Rep,
   X2.local_rep Local_Rep,
   X2.parent_path Parent_Path 
FROM
   xdr_data X1 LEFT JOIN xdr_data X2 ON X2.query_name = 'running_processes_windows_sophos' AND STRPOS(CAST(X1.sophos_pids AS VARCHAR),X2.sophos_pid) > 0 
WHERE
   X1.query_name = 'sophos_ips_windows' AND
   UPPER(X2.cmdline) LIKE UPPER('%$$Command Line$$%') AND
   UPPER(X2.parent_Name) LIKE UPPER('%$$Parent Process Name$$%') AND
   UPPER(X2.name) LIKE UPPER('%$$Process Name$$%') AND
   UPPER(x1.destination_ip) LIKE UPPER('%CAST($$Remote IP_Address$$ AS VARCHAR)%') AND
   UPPER(CAST(x1.destination_port AS VARCHAR)) LIKE UPPER('%$$Remote Port$$%') AND
   '$$URI$$' = '%'
   
UNION ALL

SELECT 
   X1.meta_hostname epName,
   X1.meta_os_name Device_Type,
   X1.query_name Table_Name,
   replace(replace(CAST(from_unixtime(X2.time) AS VARCHAR),'.000','Z'),' ','T') Date_Time, -- add the T to help excel understand this is a date and time
   X2.meta_username User_Name,
   X2.parent_Name Parent_Process_Name,
   X2.name Process_Name,
   X2.cmdline Cmd_Line,
   X2.sophos_pid SophosPID,
   X2.parent_sophos_pid Parent_SophosPID,
   X2.sha256 SHA256,
   X1.source_ips Local_IP,
   X1.port Local_Port,
   X1.destination_ip Remote_IP,
   X1.destination_port Remote_Port, 
   X1.protocol,
   X1.domain, 
   X1.clean_urls, 
   X2.path Path_Name,
   X2.ml_score ML_Score,
   X2.pua_score PUA_Score,
   X2.global_rep Global_Rep,
   X2.local_rep Local_Rep,
   X2.parent_path Parent_Path 
FROM
   xdr_data X1 LEFT JOIN xdr_data X2 ON X2.query_name = 'running_processes_windows_sophos' AND STRPOS(CAST(X1.sophos_pids AS VARCHAR),X2.sophos_pid) > 0 
WHERE
   X1.query_name = 'sophos_urls_windows' AND
   UPPER(X2.cmdline) LIKE UPPER('%$$Command Line$$%') AND
   UPPER(X2.parent_Name) LIKE UPPER('%$$Parent Process Name$$%') AND
   UPPER(X2.name) LIKE UPPER('%$$Process Name$$%') AND
   UPPER(X1.destination_ips) LIKE UPPER('%CAST($$Remote IP_Address$$ AS VARCHAR)%') AND
   UPPER(CAST(X1.destination_port AS VARCHAR)) LIKE UPPER('%$$Remote Port$$%') AND
   UPPER(X1.clean_urls) LIKE UPPER('%$$URI$$%')
   
UNION ALL

SELECT 
   X1.meta_hostname epName,
   X1.meta_os_name Device_Type,
   X1.query_name Table_Name,
   replace(replace(CAST(from_unixtime(X2.time) AS VARCHAR),'.000','Z'),' ','T') Date_Time, -- add the T to help excel understand this is a date and time
   X2.meta_username User_Name,
   X2.parent_Name Parent_Process_Name,
   X2.name Process_Name,
   X2.cmdline Cmd_Line,
   X2.sophos_pid SophosPID,
   X2.parent_sophos_pid Parent_SophosPID,
   X2.sha256 SHA256,
   X1.local_address Local_IP,
   X1.port Local_Port,
   X1.remote_address Remote_IP,
   X1.remote_port Remote_Port, 
   X1.protocol,
   X1.domain, 
   X1.clean_urls, 
   X2.path Path_Name,
   X2.ml_score ML_Score,
   X2.pua_score PUA_Score,
   X2.global_rep Global_Rep,
   X2.local_rep Local_Rep,
   X2.parent_path Parent_Path 
FROM
   xdr_data X1 LEFT JOIN xdr_data X2 ON X2.query_name = 'running_processes_windows_sophos' AND X1.meta_hostname = X2.meta_hostname AND X1.name = X2.name  AND X1.cmdline = X2.cmdline AND X1.pid = X2.pid AND X1.parent = X2.parent
WHERE
   X1.query_name = 'open_sockets' AND
   UPPER(X2.cmdline) LIKE UPPER('%$$Command Line$$%') AND
   UPPER(X2.parent_Name) LIKE UPPER('%$$Parent Process Name$$%') AND
   UPPER(X2.name) LIKE UPPER('%$$Process Name$$%') AND
   UPPER(X1.remote_address) LIKE UPPER('%$$Remote IP_Address$$%') AND
   UPPER(CAST(X1.remote_port AS VARCHAR)) LIKE UPPER('%$$Remote Port$$%') AND
   '$$URI$$' = '%'

UNION ALL

SELECT 
   X1.meta_hostname epName,
   X1.meta_os_name Device_Type,
   X1.query_name Table_Name,
   replace(replace(CAST(from_unixtime(X2.time) AS VARCHAR),'.000','Z'),' ','T') Date_Time, -- add the T to help excel understand this is a date and time
   X2.meta_username User_Name,
   X2.parent_Name Parent_Process_Name,
   X2.name Process_Name,
   X2.cmdline Cmd_Line,
   X2.sophos_pid SophosPID,
   X2.parent_sophos_pid Parent_SophosPID,
   X2.sha256 SHA256,
   X1.address Local_IP,
   X1.port Local_Port,
   X1.remote_address Remote_IP,
   X1.remote_port Remote_Port, 
   X1.protocol,
   X1.domain, 
   X1.clean_urls, 
   X2.path Path_Name,
   X2.ml_score ML_Score,
   X2.pua_score PUA_Score,
   X2.global_rep Global_Rep,
   X2.local_rep Local_Rep,
   X2.parent_path Parent_Path 
FROM
   xdr_data X1 LEFT JOIN xdr_data X2 ON X2.query_name = 'running_processes_windows_sophos' AND X1.meta_hostname = X2.meta_hostname AND X1.name = X2.name AND X1.pid = X2.pid
WHERE
   X1.query_name = 'listening_ports' AND  X1.address NOT IN ('::1','0.0.0.0','::')  AND
   UPPER(X2.cmdline) LIKE UPPER('%$$Command Line$$%') AND
   UPPER(X2.parent_Name) LIKE UPPER('%$$Parent Process Name$$%') AND
   UPPER(X2.name) LIKE UPPER('%$$Process Name$$%') AND
   UPPER(X1.remote_address) LIKE UPPER('%$$Remote IP_Address$$%') AND
   UPPER(CAST(X1.remote_port AS VARCHAR)) LIKE UPPER('%$$Remote Port$$%') AND
   '$$URI$$' = '%'
)

SELECT 
   epName, Device_Type, Table_Name, 
   MIN(Date_Time) FirstSeen,
   MAX(Date_Time) LastSeen,
   count(epName) Instances,
   User_Name, Parent_Process_Name, Process_Name, Cmd_Line, Local_IP, Local_Port, Remote_IP, Remote_Port, Protocol, clean_urls, 
   array_join(array_agg(SophosPID), CHR(10)) SophosPID_LIST,
   array_join(array_agg(Parent_SophosPID), CHR(10)) Parent_SophosPID_LIST,
   SHA256,
   Path_Name, ML_Score, PUA_Score, Global_Rep, Local_Rep, Parent_Path
FROM Full_List
GROUP BY epName, Device_Type, Table_Name, User_Name, Parent_Process_Name, Process_Name, Cmd_Line, Local_IP, Local_Port, Remote_IP, Remote_Port, Protocol, clean_urls, SHA256,
   Path_Name, ML_Score, PUA_Score, Global_Rep, Local_Rep, Parent_Path
ORDER BY LastSeen DESC