
Data Control - PDF file with US SSN or Banking information.

I set up a few Data Control policies for banking information, US SSNs, etc. The policies are working fine: whenever a user emails a document containing an SSN or banking information, I get a notification. However, if a user converts a document containing banking information or an SSN to a PDF file, the policy doesn't recognize the file. I couldn't find any option for file type.

Any suggestions?



  • I opened a case with Sophos. They sent it to their lab, and this is the response I received a few days ago:

    Text is placed on a PDF page by graphics positioning and drawing operators. The numbers are being extracted by text extraction, but joined together with the preceding and following text, e.g. "...11-1111222222223333..." instead of "22222222" separately, so the routing number is not recognized as a distinct word. We use a heuristic to determine when a new word has been started. In this case the heuristic failed.

    We have a defect open on the issue to try and correct the heuristic but have no ETR at this time.
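
    To illustrate why fused text defeats this kind of matching, here is a minimal Python sketch. It is not Sophos's implementation: the pattern, the function name find_routing_numbers, and the sample strings are all assumptions made up for the example. It assumes a detector that looks for a 9-digit run standing alone as a word, and shows that the same digits are missed once the extractor joins them to their neighbours.

```python
import re

# Hypothetical detector (an assumption, not the product's real rule):
# a run of exactly 9 digits bounded by word boundaries on both sides.
ROUTING_PATTERN = re.compile(r"\b\d{9}\b")


def find_routing_numbers(text: str) -> list[str]:
    """Return every standalone 9-digit run found in the text."""
    return ROUTING_PATTERN.findall(text)


# Made-up sample data; 123456789 is a placeholder, not a real routing number.
spaced = "Acct 11-1111 123456789 3333"  # words kept separate by the extractor
fused = "Acct 11-11111234567893333"     # same digits, joined to their neighbours

print(find_routing_numbers(spaced))  # the 9-digit run is found
print(find_routing_numbers(fused))   # nothing is found: no word boundary around it
```

    In the fused string the placeholder number sits inside a longer run of digits, so there is no boundary on either side for the pattern to anchor on. That matches the behaviour described in the lab response, where detection depends on the extractor splitting words correctly.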
