Query to check file contents

Hello,

I want a query that checks for possible PII. There is already a query that checks metadata, such as file names like password.docx or password.txt, but I want to look inside the files. Below is what I have so far; it works as long as you give it the exact file path and the exact pattern you are looking for. I tried % wildcards for the path, and %-, %%%-%%%-%%%% and even grep syntax for the pattern, to look for Social Security numbers in this example. Any help on getting this to scan directories for files containing specific patterns, either with a wildcard or by some other means, would be appreciated. Thank you!

SELECT *
FROM grep
WHERE path = '$$Location$$'
  AND pattern = '$$Data$$'
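A quick way to see why the wildcard attempts misbehave: osquery runs its SQL on SQLite, and in a SQLite LIKE pattern % matches any run of characters (including none) while _ matches exactly one character. So %%%-%%%-%%%% really only asks for "two hyphens somewhere", and US SSNs are written 3-2-4 (123-45-6789) anyway. This Python sketch uses the stdlib sqlite3 module to evaluate LIKE on a few made-up lines and compares it with a regex. I have not checked whether the Sophos grep table's pattern column takes LIKE syntax, a plain string, or a regex, so treat this only as an illustration of the wildcard semantics; the sample lines and the helper name like() are hypothetical.

```python
import re
import sqlite3

# Hypothetical sample lines; the SSN-shaped value is made up.
LINES = [
    "employee id 123-45-6789",
    "call 555-123-4567",
    "build-2024-final",
    "no dashes here",
]

conn = sqlite3.connect(":memory:")


def like(text, pattern):
    """Evaluate SQLite's LIKE operator (the engine osquery's SQL runs on)."""
    return conn.execute("SELECT ? LIKE ?", (text, pattern)).fetchone()[0] == 1


# '%' matches any run of characters (even none): this only requires two hyphens.
PERCENT_PATTERN = "%%%-%%%-%%%%"
# '_' matches exactly one character: this enforces the 3-2-4 shape, not digits.
UNDERSCORE_PATTERN = "%___-__-____%"
# A regex can also require digits and word boundaries.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

for line in LINES:
    print(
        f"{line!r:<28} "
        f"percent={like(line, PERCENT_PATTERN)} "
        f"underscore={like(line, UNDERSCORE_PATTERN)} "
        f"regex={bool(SSN_RE.search(line))}"
    )
```

The % pattern also flags "build-2024-final", and the _ pattern fixes the shape but would still accept something like abc-de-fghi; only the regex insists on digits.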



This thread was automatically locked due to age.
  • FormerMember in reply to TheNewGuy

    One note: this will be very slow. I wouldn't run it on more than one or two machines at a time. Also, keep the target folders tightly constrained; recursing down the directory tree will cause performance problems. If the scope is too broad, the query will fail when it hits the maximum query run time.

  • I agree, it could be limited to a few extensions (pretty crude), for example:

    select path as file, line as matched_text from grep
    where path in (
          select path from file
          where path like $$Location$$
          and split(path, '.', 1) in ('txt', 'config', 'policy')
    )
    and line like '$$Data$$'
    

    Where the variables are:

    $$Data$$ = %onExec%

    $$Location$$ = 'C:\ProgramData\Sophos\Management Communications System\Endpoint\%\%'

    The extensions searched are txt, config, and policy.

  • FormerMember in reply to Sophos User930

    The problem with splitting on '.' is that folder names can technically contain dots. C:\This.Folder\This\Folder.This\this.txt is a valid path, but there the second '.'-separated token is part of a folder name rather than the extension, so that file would not be processed by the query. Not sure whether the OP needs that kind of folder handling.

    At this stage it's more about thinking up the most robust solution.

  • Hence 'crude'. What about:

    select path as file, line as matched_text from grep
    where path in (
          select path from file
          where path like $$Location$$
          and regex_match(path, '([0-9a-z]+)(?:[\?#]|$)', 1) in ('txt', 'config', 'policy')
    )
    and line like '$$Data$$'

  • Actually, that works for files but not for paths containing '.'. Oh well, hopefully it gives the OP a few ideas.
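To compare the extension checks discussed in this thread outside Live Discover, here is a small Python sketch that applies three of them to a few hypothetical Windows paths: a simple emulation of split(path, '.', 1) (assuming osquery's index is 0-based, so 1 means the second token), the regex from the second query run through Python's re (osquery's regex engine may not behave identically), and ntpath.splitext, which only looks at the last path component. The sample paths, the ALLOWED set, and the helper names are made up for illustration.

```python
import ntpath
import re

# Hypothetical Windows-style sample paths (ntpath parses them on any OS).
PATHS = [
    r"C:\Temp\notes.txt",
    r"C:\This.Folder\This\Folder.This\this.txt",
    r"C:\Data\REPORT.TXT",
]

ALLOWED = {"txt", "config", "policy"}

EXT_RE = re.compile(r"([0-9a-z]+)(?:[\?#]|$)")


def ext_by_split(path):
    """Emulate split(path, '.', 1): the second '.'-separated token, if any."""
    parts = path.split(".")
    return parts[1] if len(parts) > 1 else None


def ext_by_regex(path):
    """Apply the thread's pattern with Python's re (may differ from osquery)."""
    m = EXT_RE.search(path)
    return m.group(1) if m else None


def ext_by_last_component(path):
    """Take the extension from the file name only, lower-cased."""
    _root, ext = ntpath.splitext(path)  # ignores dots in folder names
    return ext[1:].lower() or None


for p in PATHS:
    print(p)
    for fn in (ext_by_split, ext_by_regex, ext_by_last_component):
        ext = fn(p)
        print(f"  {fn.__name__:<22} {ext!r:<24} allowed={ext in ALLOWED}")
```

In this Python run the split emulation picks up part of the folder name for the dotted path and returns upper-case TXT for REPORT.TXT; the regex copes with the dotted folder but finds nothing in REPORT.TXT because [0-9a-z] is case-sensitive; splitext handles both. Whether regex_match behaves the same way on an endpoint is worth testing before relying on it.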