@keaton-sublime
Member

Adding:

  • scan_pdf_obj_hash.py
  • a requirements.txt entry that pulls in the main branch of the PDF object hashing library, so the most up-to-date version is used
  • an update to scanners.yaml so that this scanner also runs on PDFs

Describe the change
This adds support for PDF object hashing, a method for creating a structural hash of a PDF's objects (think imphash or JA3). It lets us find exact matches with the object_hash value and partial matches by matching parts of the hash_string.
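
To illustrate the idea, here is a minimal sketch of how a structural hash like this could be derived. It is not the library's actual implementation: the function name `build_object_hash` and the example type lists are hypothetical, and MD5 is an assumption based only on the 32-hex-character object_hash in the sample output.

```python
import hashlib


def build_object_hash(object_types):
    """Return (hash_string, object_hash) for an ordered list of object type names."""
    # Each type name is followed by '|', mirroring the sample hash_string.
    hash_string = "".join(f"{name}|" for name in object_types)
    # ASSUMPTION: MD5 is used only because the sample digest is 32 hex chars;
    # the real library may compute its digest differently.
    object_hash = hashlib.md5(hash_string.encode("utf-8")).hexdigest()
    return hash_string, object_hash


# Hypothetical object type sequences for two PDFs.
doc_a = ["Page", "Pages", "Catalog", "Font/TrueType", "FontDescriptor"]
doc_b = ["Page", "Pages", "Catalog", "Title"]

string_a, hash_a = build_object_hash(doc_a)
string_b, hash_b = build_object_hash(doc_b)

# The structures differ, so the digests differ (no exact match).
print(hash_a == hash_b)
# Both strings still contain a common run of types (a partial match).
print("Page|Pages|Catalog|" in string_a and "Page|Pages|Catalog|" in string_b)
```

Because the digest covers the whole string, any structural difference breaks an exact match, which is why the readable hash_string is kept alongside it for partial matching.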

Describe testing procedures
I built the scanner in our Strelka Docker instance on a VM and used oneshot to send the file to the Strelka scanner. Once that was working, I added the fields to the platform and tested there against ~200 PDFs attached to emails.

Sample output

  "pdf_obj_hash": {
      "elapsed": 0.001209,
      "hash_string": "Filter|Filter|Filter|Page|ProcSet|N|None|StructTreeRoot|StructElem|StructElem|StructElem|StructElem|Pages|Catalog|None|Font/TrueType|FontDescriptor|Length1|Font/TrueType|Length|FontDescriptor|Length1|Title|",
      "object_hash": "e8e2199997a36a260df436937ba63f7f"
    },
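
As a rough sketch of how this output could be consumed, the snippet below checks a scan event for an exact match on object_hash first and falls back to a partial match on hash_string. The names `match_pdf`, `known_hashes`, and `fragments` are hypothetical and not part of the scanner; the event values are copied from the sample output above.

```python
import json

# Scan event wrapping the sample output shown above.
SAMPLE_EVENT = json.loads("""
{
  "pdf_obj_hash": {
    "elapsed": 0.001209,
    "hash_string": "Filter|Filter|Filter|Page|ProcSet|N|None|StructTreeRoot|StructElem|StructElem|StructElem|StructElem|Pages|Catalog|None|Font/TrueType|FontDescriptor|Length1|Font/TrueType|Length|FontDescriptor|Length1|Title|",
    "object_hash": "e8e2199997a36a260df436937ba63f7f"
  }
}
""")


def match_pdf(event, known_hashes, fragments):
    """Classify a scan event as an 'exact', 'partial', or 'none' match."""
    result = event.get("pdf_obj_hash", {})
    # Exact match: the whole structure has been seen before.
    if result.get("object_hash") in known_hashes:
        return "exact"
    # Partial match: some known run of object types appears in the string.
    hash_string = result.get("hash_string", "")
    if any(fragment in hash_string for fragment in fragments):
        return "partial"
    return "none"


print(match_pdf(SAMPLE_EVENT, {"e8e2199997a36a260df436937ba63f7f"}, []))
print(match_pdf(SAMPLE_EVENT, set(), ["StructTreeRoot|StructElem|StructElem|"]))
```

Checking the digest first keeps the common case cheap; the substring scan only runs when no exact match is found.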

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of and tested my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

adding a time out
oletools==0.60.1
opencv-python==4.8.1.78
opencv-contrib-python==4.8.1.78
pdf-object-hashing @ git+https://github.com/0xkyle/pdf_object_hashing.git
Member

I have to ask, since I don't know the answer at all: we may want to check that we're OK with using this package (I know it's yours). It's currently under the Apache 2.0 license, so I just wanted to make sure. I know MIT is fine, but I personally don't know much about the Apache license types and wanted to at least point it out.

Member Author

I don't know much about license types either, but it looks like Strelka itself is under Apache 2.0: https://github.com/target/strelka/blob/master/LICENSE
