How do I check the logic of a classification rule if I suspect a detection is a false positive?

With the new Detections capability for XDR, hundreds of classification rules run against the stream of data reaching the data lake. Hundreds more run on the device itself, and machine learning models examine the data more comprehensively.

Any detection you see in the console was generated by one of these classification rules, and the majority come from the heuristic classification rules run in the cloud. Most activity relevant to MITRE ATT&CK can be classified simply by examining the parent/child relationship and the command line of the process that was started. These heuristic rules therefore do most of the work of efficiently mapping device and user activity to the MITRE ATT&CK framework. That efficiency lets us process over 10 billion events per day in the current EAP, and to scale further as the product moves to general availability and our customer base expands.

With billions of events reaching the lake every day, the rules have to be efficient. Sophos's approach is to create multiple types of analytics engines, called workers. A worker registers to receive data from one or more of the streams being sent to the lake, and it applies its set of classification rules to the data as it streams in. If a classification rule finds a match, the worker passes it to another set of processes that perform data enrichment. The result is then written to a data lake table that holds all detections.

More advanced rules that leverage machine learning and behavior analytics are also used on the endpoint device and in the data lake, but for now let's focus on the simple heuristic classification rules.

Detections performed in the cloud:
When a detection is performed in the cloud, it is generated by what we call 'workers'. A worker listens on one or more of the data streams coming from the sensor. For example, the 'DSL Worker' listens for all Windows, Mac and Linux process activity, as well as open socket and listening port data from the endpoint. This data streams into the lake from the devices regularly. Typically, every 20 seconds the device sends its running process information to the data lake. The forensic journals also add information for any process that started and stopped during the last 20 seconds, along with the new processes that are currently running.

The DSL Worker uses hundreds of classification rules to examine the data stream. If a rule finds a match, the worker generates a detection and creates a record in the xdr_ti_data table. With each entry, the worker logs the classification rule ID, risk score, description, MITRE ATT&CK information and other rule-specific information, along with the event information that matched the rule's logic. It also logs the rule itself. That is the information we want to reveal with a query, so we can understand exactly what the match criteria were for a specific detection.
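To make the worker pipeline concrete, here is a toy Python sketch. It assumes a worker is simply a loop that applies a list of heuristic rules (parent/child relationship plus command line) to each process event and emits a record that carries the rule's logic alongside the matching event. Everything in it is hypothetical: the ProcessEvent shape, the rule ID EXAMPLE-WIN-0001, its risk value and its logic string are invented for illustration. They are not Sophos rules or the Sophos implementation. Only the record keys borrow names from the xdr_ti_data columns used in the query later in this post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class ProcessEvent:
    """A process-start event as a worker might receive it (hypothetical shape)."""
    device_id: str
    parent_name: str
    process_name: str
    cmd_line: str


@dataclass(frozen=True)
class HeuristicRule:
    """A hypothetical heuristic classification rule."""
    rule_id: str
    description: str
    mitre_attack: str
    risk: int
    logic: str                               # human-readable match criteria, stored with each detection
    matches: Callable[[ProcessEvent], bool]  # executable form of the same criteria


@dataclass
class Worker:
    """Toy worker: applies every rule to every event on the stream it is registered for."""
    name: str
    rules: List[HeuristicRule]

    def process(self, stream: Iterable[ProcessEvent]) -> Iterator[Dict[str, object]]:
        for event in stream:
            for rule in self.rules:
                if rule.matches(event):
                    # Keys borrow xdr_ti_data column names; enrichment is not modelled here.
                    yield {
                        "ioc_worker_name": self.name,
                        "ioc_detection_id": rule.rule_id,
                        "ioc_severity": rule.risk,
                        "ioc_detection_mitre_attack": rule.mitre_attack,
                        "ioc_detection_eql": rule.logic,  # stand-in for the stored rule text
                        "description": rule.description,
                        "event": event,
                    }


# Hypothetical rule: an Office application launching PowerShell with an encoded command.
office_encoded_ps = HeuristicRule(
    rule_id="EXAMPLE-WIN-0001",
    description="Office application launched PowerShell with an encoded command",
    mitre_attack="T1059.001",
    risk=7,
    logic="parent_name IN (winword.exe, excel.exe) AND process_name = powershell.exe "
          "AND cmd_line CONTAINS '-enc'",
    matches=lambda e: (
        e.parent_name.lower() in {"winword.exe", "excel.exe"}
        and e.process_name.lower() == "powershell.exe"
        and "-enc" in e.cmd_line.lower()
    ),
)

worker = Worker(name="toy-dsl-worker", rules=[office_encoded_ps])
events = [
    ProcessEvent("dev-1", "explorer.exe", "powershell.exe", "powershell.exe -File backup.ps1"),
    ProcessEvent("dev-2", "WINWORD.EXE", "powershell.exe", "powershell.exe -enc SQBFAFgA"),
    ProcessEvent("dev-3", "excel.exe", "powershell.exe", "powershell.exe Get-Content report.csv -Encoding UTF8"),
]
detections = list(worker.process(events))
for d in detections:
    print(d["ioc_detection_id"], d["event"].device_id)
```

The third event shows why the stored logic matters. Its command line only contains -Encoding, yet the substring test for -enc matches it. Reading the rule's logic string is enough to see that this detection is a false positive.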

So why would you want to check a rule's logic? Perhaps you have a detection and, after looking at the event information that triggered it and reading the description, you think the rule made a mistake and falsely classified something. Be aware that a detection is not a conviction of the activity as malicious. It simply says that what was observed maps to one or more MITRE ATT&CK tactics, techniques or procedures. To check the logic used by a rule, you can run the query below to get the detection information and classification rule for a detection generated by the DSL Worker. Note that other workers and other detections may not have the rule's logic available (for example, a runtime IOC detection where the behavior engine performed the classification).

This is a data lake query that takes four variables. The query wraps each value in '%' wildcards, so it matches anywhere in the field, and you can also use '%' within a value.

The results contain the following columns:

Instances: The number of times the specific rule fired in the time range you query.

WorkerProcess: This identifies the type of worker that ran the rule. Here we are focused only on the DSL Worker.

Category: The category of the detection: Threat, Vulnerability or Classifier. Threats may be related to malicious activity. Vulnerabilities tend to be machine configuration problems, such as features being turned off or disabled. Classifiers are activities that map to MITRE ATT&CK and were identified by the behavior engine on the endpoint. They are analogous to threats, and like threats they are not necessarily malicious activity; they simply map to one or more MITRE ATT&CK TTPs.

Detection_Type: This names the basic type of activity that the rule was attempting to classify.

Type: This indicates the type of evidence that was evaluated, for example process activity, a vulnerability check result or a direct Windows log entry.

Risk: The risk score assigned to the detection, based on the Sophos managed threat teams' experience with thousands of investigations.

Classification_Rule: The ID of the classification rule that generated the detection.

Sigma_Details: When available, this shows the Sigma classification rule that was used.

EQL_Details: This is the rule itself in a structured form. It describes the logic that was performed by the worker.

ATTACK_Mapping: The MITRE ATT&CK tactics and techniques (TTPs) from the Enterprise matrix that this detection maps to.

Experimental: Indicates whether the classification rule is experimental. Experimental rules are still being tuned and scored. They may be very noisy (hundreds or thousands of matches) or make mistakes (producing false positives).

If you want to run the query in your own environment, here it is:

-- RULE details

-- VARIABLE $$Category$$              STRING
-- VARIABLE $$Classification Rule$$   STRING
-- VARIABLE $$Mitre Tactic$$          STRING
-- VARIABLE $$Worker Process$$        STRING

SELECT
   COUNT(*) Instances,
   ioc_worker_name WorkerProcess,
   ioc_detection_category Category,
   ioc_detection_attack Detection_Type,
   ioc_detection_type Type,
   ioc_severity Risk,
   ioc_detection_id Classification_Rule,
   ioc_detection_sigma Sigma_Details,
   ioc_detection_eql EQL_Details,
   ioc_detection_mitre_attack ATTACK_Mapping,
   ioc_detection_experiment_level Experimental 
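FROM xdr_ti_data
-- Each variable is lower-cased and wrapped in % wildcards, so partial, case-insensitive values match
WHERE LOWER(ioc_detection_id) LIKE LOWER('%$$Classification Rule$$%')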
   AND LOWER(ioc_detection_category) LIKE LOWER('%$$Category$$%')
   AND LOWER(ioc_worker_name) LIKE LOWER('%$$Worker Process$$%')
   AND LOWER(ioc_detection_mitre_attack) LIKE LOWER('%$$Mitre Tactic$$%')
GROUP BY ioc_worker_name, ioc_detection_category, ioc_severity, ioc_detection_id, ioc_detection_sigma, ioc_detection_eql, 
   ioc_detection_attack, ioc_detection_mitre_attack, ioc_detection_type, ioc_detection_experiment_level
ORDER BY 1 DESC
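To see how the four variables filter the results, here is a small Python sketch that runs the same WHERE/GROUP BY shape against an in-memory SQLite table. It assumes the data lake treats LOWER, LIKE and NULL the way standard SQL (and SQLite) does. The table keeps only four of the real column names, and the worker names, rule IDs and ATT&CK strings in the rows are made up for illustration. The '?' parameters stand in for the $$...$$ variables, and each value is wrapped in '%' just as the query text does.

```python
import sqlite3
from typing import List, Tuple

# Same filter shape as the data lake query; "?" parameters stand in for the $$...$$ variables.
RULE_DETAILS_SQL = """
SELECT COUNT(*) AS Instances,
       ioc_worker_name AS WorkerProcess,
       ioc_detection_category AS Category,
       ioc_detection_id AS Classification_Rule
FROM xdr_ti_data
WHERE LOWER(ioc_detection_id) LIKE LOWER(?)
  AND LOWER(ioc_detection_category) LIKE LOWER(?)
  AND LOWER(ioc_worker_name) LIKE LOWER(?)
  AND LOWER(ioc_detection_mitre_attack) LIKE LOWER(?)
GROUP BY ioc_worker_name, ioc_detection_category, ioc_detection_id
ORDER BY 1 DESC
"""


def build_sample_db() -> sqlite3.Connection:
    """In-memory stand-in for xdr_ti_data; every row value is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE xdr_ti_data (ioc_worker_name TEXT, ioc_detection_category TEXT, "
        "ioc_detection_id TEXT, ioc_detection_mitre_attack TEXT)"
    )
    conn.executemany(
        "INSERT INTO xdr_ti_data VALUES (?, ?, ?, ?)",
        [
            ("dsl_worker", "Threat", "WIN-EXAMPLE-0001", "Execution: T1059.001"),
            ("dsl_worker", "Threat", "WIN-EXAMPLE-0001", "Execution: T1059.001"),
            ("dsl_worker", "Threat", "WIN-EXAMPLE-0002", "Persistence: T1053.005"),
            ("other_worker", "Vulnerability", "VULN-EXAMPLE-0001", "Defense Evasion: T1562.001"),
            ("dsl_worker", "Classifier", "WIN-EXAMPLE-0003", None),  # no ATT&CK mapping stored
        ],
    )
    return conn


def rule_details(conn: sqlite3.Connection, category: str = "", rule: str = "",
                 tactic: str = "", worker: str = "") -> List[Tuple]:
    """Wrap each value in % on both sides, as the query text does with '%$$...$$%'."""
    params = [f"%{rule}%", f"%{category}%", f"%{worker}%", f"%{tactic}%"]
    return conn.execute(RULE_DETAILS_SQL, params).fetchall()


conn = build_sample_db()
print(rule_details(conn, worker="DSL", tactic="execution"))
print(rule_details(conn, rule="WIN-%-0002"))
print(sorted(r[3] for r in rule_details(conn)))
```

Two behaviours are worth knowing. A '%' typed inside a value (as in WIN-%-0002) acts as an extra wildcard, and so does '_', which matches any single character. And because LOWER(NULL) LIKE '%%' is not true, a row with no value in one of the filtered columns (WIN-EXAMPLE-0003 here, which has no ATT&CK mapping) is dropped even when that variable is left empty, assuming standard SQL NULL handling applies in the data lake.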



Spelling fix in the query col name
[edited by: Karl_Ackerman at 8:56 PM (GMT -7) on 21 Oct 2021]