
Data Control - PDF file with US SSN or Banking information.

I set up a few Data Control policies for banking information, US SSNs, etc. The policies are working fine: whenever users email a document with SSN or banking information, I get a notification. However, if a user converts a document containing banking or SSN information to a PDF file, the policy doesn't recognize the file. I couldn't find any option for file type.

Any suggestions ?



  • Hello BopBop,

    so you've modified the default rules to apply to email applications? If you did not exclude certain file types (Edit rule, 5. Select files to exclude), the rules should work on PDFs as well (I just tested it). You can turn on verbose logging on the client (open the GUI, Configure data control) to see what is detected and what is not. What type (and version) are the documents, and how do users convert them?

    Christian

  • No, it doesn't scan the PDF file with the 'Bank routing numbers with qualifying terms' policy.

    I am not excluding any files (step 5, Select files to exclude).

    Here are the logs:

    A data control policy was infringed on machine SOPHOS-TEST by logged on user XXX\YYYY.

    An "allow file transfer" action was taken.

              Username: XXX\YYYY

              Rule names: 'Bank routing numbers with qualifying terms', 'Encrypt file before transfer', 'Encrypted archive or file formats', 'Microsoft Office documents'

              User action: File copy

              Data Control action: Allow

              File type: Spreadsheet (Microsoft Excel-OLE)

              Source path: C:\Documents and Settings\YYYYY\Desktop\data L\Sam Test Document.xls

              Destination path: E:\data L\Sam Test Document.xls

              Destination type: Removable storage

    If I turn on the 'Microsoft Office documents' policy, the PDF file is scanned:

    An "allow file transfer" action was taken.

              Username: XXX\YYYY

              Rule names: 'Encrypt file before transfer', 'Encrypted archive or file formats', 'Microsoft Office documents'

              User action: File copy

              Data Control action: Allow

              File type: Document (PDF)

              Source path: C:\Documents and Settings\YYYY\Desktop\data L\Copy of Sam Test Document.pdf

              Destination path: E:\data L\Copy of Sam Test Document.pdf

              Destination type: Removable storage

    Users are converting documents to PDF with the MS Office 2010 add-in.
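    To compare the two alerts above without eyeballing them, a small script can pull out the "Rule names" field and diff it. This is a hypothetical helper, not a Sophos tool: it assumes the alert body has one "Key: value" field per line exactly as quoted in this post, and the names `parse_alert` and `rule_names` are made up for illustration.

```python
import re

# Rule names in the alert are single-quoted and comma-separated.
RULE_NAME = re.compile(r"'([^']*)'")


def parse_alert(text):
    """Collect 'Key: value' fields from a Data Control alert body.

    Assumes one field per line, as in the alerts quoted above. Only the
    first colon splits key from value, so Windows paths survive intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def rule_names(fields):
    """Return the rule names listed in the 'Rule names' field."""
    return RULE_NAME.findall(fields.get("Rule names", ""))


# Abbreviated copies of the two alerts from this post.
xls_alert = r"""
Rule names: 'Bank routing numbers with qualifying terms', 'Encrypt file before transfer', 'Encrypted archive or file formats', 'Microsoft Office documents'
File type: Spreadsheet (Microsoft Excel-OLE)
Source path: C:\Documents and Settings\YYYYY\Desktop\data L\Sam Test Document.xls
"""

pdf_alert = r"""
Rule names: 'Encrypt file before transfer', 'Encrypted archive or file formats', 'Microsoft Office documents'
File type: Document (PDF)
Source path: C:\Documents and Settings\YYYY\Desktop\data L\Copy of Sam Test Document.pdf
"""

xls_rules = set(rule_names(parse_alert(xls_alert)))
pdf_rules = set(rule_names(parse_alert(pdf_alert)))
print(sorted(xls_rules - pdf_rules))  # ['Bank routing numbers with qualifying terms']
```

    The difference is exactly the Bank rule, which matches the xls copy but not the PDF copy of the same content.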

  • Hi,

    Can you contact Sophos support and send in some sample "converted" PDF documents, a verbose log (showing the file being scanned but not triggering the rule) and the exported data control rule set? It is possible that the conversion process is doing something odd to the PDF, and if that is the case we can investigate whether it is a content extraction or content analysis issue.

    Thanks,

    John Stringer

    Senior Product Manager

  • Please check the verbose data control log on the client (using the GUI or by viewing DataControl.txt). The email alert doesn't include the verbose log; it contains only the matched rules. Apparently 'Bank routing numbers with qualifying terms' did not match for the PDF file.

    The verbose log shows every content rule that matched at least partially, along with the details of the matches. If you don't get any match at all for the Bank rule (and you are definitely not excluding the document), you should engage Support. Otherwise the log might tell you why the rule hasn't been triggered. If your test document is "near the trigger level", the match in the xls could just as well be a "false positive". Or the document might include some non-printable cells or tables.

    Christian
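    The idea of a partial match versus a rule that is "near the trigger level" can be made concrete with a toy scoring model. This is a hypothetical sketch, not the actual definition of 'Bank routing numbers with qualifying terms': the qualifying terms, the 50-character window and the threshold of 2 are all assumed values chosen for illustration.

```python
import re

# Hypothetical qualifying terms and 9-digit pattern; the real definition
# behind 'Bank routing numbers with qualifying terms' is not shown here.
QUALIFIERS = re.compile(r"\b(?:routing|ABA|RTN|bank)\b", re.IGNORECASE)
NINE_DIGITS = re.compile(r"\b\d{9}\b")


def score(text, window=50):
    """Count 9-digit numbers with a qualifying term within `window` characters."""
    hits = 0
    for m in NINE_DIGITS.finditer(text):
        context = text[max(0, m.start() - window):m.end() + window]
        if QUALIFIERS.search(context):
            hits += 1
    return hits


def triggers(text, threshold=2):
    """The rule fires only once the hit count reaches `threshold` (assumed value)."""
    return score(text) >= threshold


doc = "Bank routing number: 123456780. Account 987654321."
one = "Routing number 123456780"
print(score(doc), triggers(doc))  # 2 True
print(score(one), triggers(one))  # 1 False
```

    Under this model, losing a single recognized number turns a firing rule into a mere partial match, which is the kind of case the verbose log helps to diagnose.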

  • I opened a case with Sophos. They sent it to their lab, and this is the response I received a few days ago:

    Text is placed on a PDF page by graphics positioning and drawing operators. The numbers are extracted by text extraction, but they are joined together with the preceding and following text, e.g. "...11-1111222222223333..." instead of "22222222" on its own. As a result, the routing number is not recognized as a distinct word. We use a heuristic to determine when a new word has started, and in this case the heuristic failed.

    We have a defect open on the issue to try and correct the heuristic but have no ETR at this time.
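    The lab's explanation can be illustrated with a toy model of PDF text reconstruction. This is a sketch under stated assumptions, not Sophos's extractor: each glyph is reduced to a character, an x position and a width, and a space is inserted only when the horizontal gap exceeds a fixed fraction of the font size (`gap_ratio` and `font_size` are hypothetical values). The 9-digit value 123456780 stands in for the lab's placeholder digits. A word-bounded pattern finds the number when the runs are spaced apart, and finds nothing when the same runs sit close enough that the heuristic glues them together.

```python
import re
from dataclasses import dataclass

# Word-bounded 9-digit pattern, standing in for a routing-number matcher.
ROUTING = re.compile(r"\b\d{9}\b")


@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page (text-space units)
    width: float  # horizontal advance of the glyph


def join_glyphs(glyphs, gap_ratio=0.25, font_size=10.0):
    """Rebuild a text line, inserting a space when the gap between glyphs
    exceeds gap_ratio * font_size (a hypothetical word-break heuristic)."""
    out = []
    prev_end = None
    for g in glyphs:
        if prev_end is not None and g.x - prev_end > gap_ratio * font_size:
            out.append(" ")
        out.append(g.char)
        prev_end = g.x + g.width
    return "".join(out)


def layout(runs, char_width=5.0, run_gap=20.0):
    """Place text runs left to right with `run_gap` units between runs."""
    glyphs = []
    x = 0.0
    for run in runs:
        for ch in run:
            glyphs.append(Glyph(ch, x, char_width))
            x += char_width
        x += run_gap
    return glyphs


runs = ["11-1111", "123456780", "3333"]
wide = join_glyphs(layout(runs, run_gap=20.0))
tight = join_glyphs(layout(runs, run_gap=1.0))

print(wide, ROUTING.findall(wide))    # 11-1111 123456780 3333 ['123456780']
print(tight, ROUTING.findall(tight))  # 11-11111234567803333 []
```

    Tuning the gap threshold trades one failure for another: too large and adjacent table cells merge as here, too small and a single number split across positioning operators gets broken apart.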
