Performing Suspect Power Shell Detection with a YARA Rule

Josh Sax from the Sophos Data Science team published a YARA file that encoded rules for evaluating powershells as suspect or benign back in 2020.  With XDR's ability run a YARA rule directly from a GIT location I wanted to build an experimental query that would run that YARA signature evaluation for any PowerShell scripts added to a machine in a given time period.  The variables allow you to set the start time for the search and the number of hours to search.  In order to identify the candidate scripts for evaluation we are leveraging the Sophos_File_Joural that maintains a record of all file system changes for the last 90 days.  We are only looking for files in the date range that were created and have a valid extension for powershells.   If the file is no-longer on the system the Journal will have a record but when we go to run the YARA rule it will not find the file and not report on it. 

For the query i show the classification status, file name and file path for all files evaluated.   

To make it even more powerful we also allow the admin to set the threshold value. 

What the model does is observe a number of word token features and then sum up // their counts, multiplying them each by learned coefficients. The Threshold value allows you to set how aggressive the classification should be. If you'd like this detector to be more aggressive (more FP prone, with a higher  detection rate), set that number of -1.941, which will yield an expected false positive rate of 2% on powershell script files. On a test set, this threshold achieved 99% detection. 

Other thresholds:
   1.024, 1% FPR / 95% detection rate
   5.582, 0.2% FPR, 70% detection rate.

For more info on the GIT project for on converting Machine Learning models to YARA see: 

https://github.com/sophos-ai/yaraml_rules

The YARA signiture file used is here:  https://raw.githubusercontent.com/sophos-ai/yaraml_rules/master/example_models/generic_powershell_detector_jan28_2020/generic_powershell_detector.rule

-- VARIABLE $$Begin Search on$$                                      DATE
-- VARIABLE $$Search for N hours$$                                   STRING

/***********************************************************************************\
| N.B.!                                                                             |
| This query uses a logistic regression model for detecting malicious powershell    |
| regularized with a L1 penalty.                                                    |
|                                                                                   |
| What the model does is observe a host of word token features and then sum up      |
| their counts, multiplying them each by learned coefficients.                      |
|                                                                                   |
| At the bottom of the yara file, you'll find a '>' sign followed by the number 0.  |
| If you'd like this detector to be more aggressive (more FP prone, with a higher   |
| detection rate), set that number to -1.941, which will yield an expected false    |
| positive rate of 2% on powershell script files.  On a test set, this threshold    |
| achieved 99% detection.                                                           |
|                                                                                   |
| Other thresholds:                                                                 |
| 1.024, 1% FPR / 95% detection rate,                                               |
| 5.582, 0.2% FPR, 70% detection rate.                                              |
\************************************************************************************/

-- Modify the Powershell classification rule to use the admin supplied threashold value
WITH Sig_Rule AS (
   SELECT REPLACE(result,'> 0', '> $$Threashold$$') Rule 
   FROM curl 
   WHERE URL = 'https://raw.githubusercontent.com/sophos-ai/yaraml_rules/master/example_models/generic_powershell_detector_jan28_2020/generic_powershell_detector.rule')

-- OSQuery File and YARA tables have some issues with the potential volume of data being evaluated so evaluate each path independently
-- NB Scripts no-longer on the drive are not evaluated
SELECT
   CASE WHEN (SELECT COUNT(*) FROM file F WHERE f.path = p.path)>0 
      THEN (SELECT CASE WHEN count > 0 THEN '1 - Malicious: '||matches ELSE '2 - Benign' END FROM yara Y WHERE y.path = P.path AND y.sigrule = (SELECT Rule FROM Sig_Rule))
      ELSE '3 - File no longer on disk at this location' 
   END Classification,
   replace(P.path, rtrim(P.path, replace(P.path, '\', '')), '') File_Name,
   spj.processname Creted_by_process,
   spj.cmdline,
   p.sophosPID,
   P.path Location_of_script
FROM Sophos_File_Journal P JOIN sophos_process_journal spj ON spj.sophosPID = p.SophosPID
WHERE p.Subject = 'FileOtherChanges' 
   AND p.event_type = 0 
   AND (p.time BETWEEN $$Begin Search on$$ AND $$Begin Search on$$ + 3600 * $$Search for N hours$$)
   AND p.path LIKE '%.p%1'
ORDER BY Classification ASC



Updated query to include the process name that wrote the file and the SophosPID.
[edited by: Karl_Ackerman at 5:41 PM (GMT -7) on 22 Aug 2021]