|
| 1 | + |
| 2 | +# Strelka Documentation |
| 3 | + |
| 4 | +Welcome to the official documentation for Strelka, an advanced tool for automated malware analysis. This documentation aims to provide comprehensive insights into the functionality and usage of Strelka, facilitating ease of use and development. |
| 5 | + |
| 6 | +## Table of Contents |
| 7 | +- [Overview](#overview) |
| 8 | +- [How Docs Work](#how-docs-work) |
| 9 | +- [Running Docs Locally](#running-docs-locally) |
| 10 | +- [Automated Pipeline](#automated-pipeline) |
| 11 | +- [Documentation Format](#documentation-format) |
| 12 | + - [Scanners](#scanners) |
| 13 | + - [Scanner Class](#scanner-class) |
| 14 | + - [Scanner Functions](#scanner-functions) |
| 15 | + - [Features and Fields](#features-and-fields) |
| 16 | +- [Backend Configuration](#backend-configuration) |
| 17 | + |
| 18 | +## Overview |
| 19 | + |
| 20 | +Strelka is designed for detailed malware analysis, providing robust scanning capabilities across various file types. |
| 21 | +The project's documentation is automatically generated and updated through GitHub Actions to reflect the latest changes in the `strelka` repository. |
| 22 | + |
| 23 | +## How Docs Work |
| 24 | + |
| 25 | +Documentation for Strelka is automatically generated to ensure up-to-date information. Key sections include: |
| 26 | + |
| 27 | +- **Strelka Scanners**: Discusses the core analysis components. |
| 28 | + |
| 29 | +## Running Docs Locally |
| 30 | + |
| 31 | +To set up and view the documentation locally, follow these steps: |
| 32 | + |
| 33 | +1. **Install Poetry** |
| 34 | + |
| 35 | + Download and install Poetry, a tool for handling Python package dependencies. |
| 36 | + |
| 37 | + ```bash |
| 38 | + curl -sSL https://install.python-poetry.org | python3 - |
| 39 | + ``` |
| 40 | + |
| 41 | +2. **Clone the Strelka Repository** |
| 42 | + |
| 43 | + Obtain the latest version of the `strelka` code from its repository. |
| 44 | + |
| 45 | + ```bash |
| 46 | + git clone https://github.com/target/strelka |
| 47 | + ``` |
| 48 | +3. **Install Dependencies** |
| 49 | + |
| 50 | + Use Poetry to install the necessary dependencies for running the documentation locally. |
| 51 | + |
| 52 | + ```bash |
| 53 | + poetry install |
| 54 | + ``` |
| 55 | + |
| 56 | +4. **(Optional) Replace Scanners** |
| 57 | + |
| 58 | + If you need to develop or test documentation for specific scanners, modify the scanner in the `strelka/scanner` folder. |
| 59 | + |
| 60 | +5. **Build the Documentation** |
| 61 | + |
| 62 | + Generate the latest version of the documentation by running the build script. This will create new `.md` files based on all of the scanner code. |
| 63 | + |
| 64 | + ```bash |
| 65 | + python ./build_docs.py |
| 66 | + ``` |
| 67 | + |
| 68 | +6. **Start the Local MkDocs Server** |
| 69 | + |
| 70 | + Use Poetry to run the MkDocs server and view the documentation locally. |
| 71 | + |
| 72 | + ```bash |
| 73 | + poetry run mkdocs serve |
| 74 | + ``` |
| 75 | + |
| 76 | +7. **Access the Documentation** |
| 77 | + |
| 78 | + Open your web browser and go to `http://127.0.0.1:8000/target/strelka/` to view the local documentation. |
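Step 5 above says the build script creates `.md` files from the scanner code. The following is a minimal sketch of that idea, not the actual contents of `build_docs.py`: it assumes the script reads class docstrings from scanner source with Python's standard `ast` module and wraps each one in a page title. The names `extract_class_docstrings`, `render_markdown`, and `ScanExample` are hypothetical.

```python
import ast
import textwrap


def extract_class_docstrings(source: str) -> dict[str, str]:
    """Map each class name found in `source` to its cleaned docstring."""
    tree = ast.parse(source)
    docs = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            doc = ast.get_docstring(node)  # None when the class has no docstring
            if doc:
                docs[node.name] = doc
    return docs


def render_markdown(class_name: str, docstring: str) -> str:
    """Render one page: a title line followed by the docstring body."""
    return f"# {class_name}\n\n{docstring}\n"


# Hypothetical scanner source, used only for illustration.
EXAMPLE_SOURCE = textwrap.dedent('''
    class ScanExample:
        """
        Collects example metadata.

        Scanner Type: Collection
        """
        def scan(self, data, file, options, expire_at):
            pass
''')

pages = {
    name: render_markdown(name, doc)
    for name, doc in extract_class_docstrings(EXAMPLE_SOURCE).items()
}
print(pages["ScanExample"])
```

Parsing with `ast` avoids importing the scanner modules, so a sketch like this would not need each scanner's runtime dependencies installed.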
| 79 | + |
| 80 | +## Automated Pipeline |
| 81 | + |
| 82 | +`strelka-docs` builds and publishes new documentation to the `gh-pages` branch. This branch is hosted on [GitHub Pages](https://target.github.io/strelka-docs/). |
| 83 | + |
| 84 | +### Documentation Update Process |
| 85 | + |
| 86 | +1. **Pull Request** (`strelka` repo) |
| 87 | + - A user submits a PR which is then reviewed for integration. |
| 88 | + |
| 89 | +2. **Merge** (`strelka` repo) |
| 90 | + - The PR is approved and merged into the main branch. |
| 91 | + |
| 92 | +3. **Build Trigger** (`strelka` repo) |
| 93 | + - The merge triggers the Vela pipeline, which builds Strelka and commits to the `strelka-docs` repo. |
| 94 | + |
| 95 | +4. **Doc Build** (`strelka-docs` repo) |
| 96 | + - The `strelka-docs` pipeline generates documentation using the latest `strelka` repo. |
| 97 | + |
| 98 | +5. **Publish** (`strelka-docs` repo) |
| 99 | + - Newly generated documentation is published and made available to users. |
| 100 | + |
| 101 | +## Documentation Format |
| 102 | + |
| 103 | +### Scanners |
| 104 | + |
| 105 | +#### Scanner Class |
| 106 | + |
| 107 | +Scanner classes are documented following the Google docstring guidelines. Each class docstring includes: |
| 108 | + |
| 109 | +- **Description**: A concise overview of the scanner's purpose and functionality. |
| 110 | + - Includes **Scanner Type**: Collection or Malware |
| 111 | +- **Attributes**: The scanner's attributes that define its behavior, usually declared outside functions or inside `__init__`. (Can be None) |
| 112 | +- **Other Parameters**: The scanner's options, usually defined at the top of the `scan` class or inside the `backend.yml`. |
| 113 | +- **Detection Use Cases**: Examples of potential use cases for the scanner, highlighting its detection capabilities. |
| 114 | +- **Known Limitations**: Any limitations or areas for improvement in the scanner's functionality. (Can be None) |
| 115 | +- **To Do**: Potential improvements and future implementations. (Can be None) |
| 116 | +- **References**: References used to develop or describe the scanner. (Can be None) |
| 117 | +- **Contributors**: List of users that have assisted in the development of the scanner. |
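The section list above lends itself to an automated check. Below is a minimal sketch, assuming a simple substring test is enough to tell whether a section is present. The marker strings mirror the headings used in this document's docstring example and are an assumption, as are the names `REQUIRED_MARKERS`, `missing_sections`, and `ScanDemo`.

```python
import inspect

# Marker strings are assumptions based on the section names in this document.
REQUIRED_MARKERS = (
    "Scanner Type:",
    "Attributes:",
    "Other Parameters:",
    "## Detection Use Cases",
    "## Contributors",
)


def missing_sections(cls: type) -> list[str]:
    """Return the required markers that do not appear in the class docstring."""
    doc = inspect.getdoc(cls) or ""
    return [marker for marker in REQUIRED_MARKERS if marker not in doc]


class ScanDemo:  # hypothetical scanner, for illustration only
    """
    Demo scanner.

    Scanner Type: Collection

    Attributes:
        None

    Other Parameters:
        None

    ## Detection Use Cases
    """


print(missing_sections(ScanDemo))
```

A check like this could run before the documentation build so that an incomplete docstring is reported early rather than producing a page with missing sections.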
| 118 | + |
| 119 | +##### Example of a Class-based Docstring |
| 120 | + |
| 121 | +```python |
| 122 | +class ScanEmail(strelka.Scanner): |
| 123 | +    """ |
| 124 | +    Extracts and analyzes metadata, attachments, and optionally generates thumbnails from email messages. |
| 125 | + |
| 126 | +    This scanner processes email files to extract and analyze metadata, attachments, and optionally generates |
| 127 | +    thumbnail images of the email content for a visual overview. It supports both plain text and HTML emails, |
| 128 | +    including inline images. |
| 129 | + |
| 130 | +    Scanner Type: Collection |
| 131 | + |
| 132 | +    ## Options |
| 133 | + |
| 134 | +    Attributes: |
| 135 | +        None |
| 136 | + |
| 137 | +    Other Parameters: |
| 138 | +        create_thumbnail (bool): Indicates whether a thumbnail should be generated for the email content. |
| 139 | +        thumbnail_header (bool): Indicates whether email header information should be included in the thumbnail. |
| 140 | +        thumbnail_size (int): Specifies the dimensions for the generated thumbnail images. |
| 141 | + |
| 142 | +    ## Detection Use Cases |
| 143 | +    !!! info "Detection Use Cases" |
| 144 | +        - **Document Extraction** |
| 145 | +            - Extracts and analyzes documents, including attachments, from email messages for content review. |
| 146 | +        - **Thumbnail Generation** |
| 147 | +            - Optionally generates thumbnail images of email content for visual analysis, which can be useful for |
| 148 | +              quickly identifying the content of emails. |
| 149 | +        - **Email Header Analysis** |
| 150 | +            - Analyzes email headers for potential indicators of malicious activity, such as suspicious sender addresses |
| 151 | +              or subject lines. |
| 152 | + |
| 153 | +    ## Known Limitations |
| 154 | +    !!! warning "Known Limitations" |
| 155 | +        - **Email Encoding and Complex Structures** |
| 156 | +            - Limited support for certain email encodings or complex email structures. |
| 157 | +        - **Thumbnail Accuracy** |
| 158 | +            - Thumbnail generation may not accurately represent the email content in all cases, |
| 159 | +              especially for emails with complex layouts or embedded content. |
| 160 | +        - **Limited Output** |
| 161 | +            - Content is limited to a set amount of characters to prevent excessive output. |
| 162 | + |
| 163 | +    ## To Do |
| 164 | +    !!! question "To Do" |
| 165 | +        - **Improve Error Handling**: |
| 166 | +            - Enhance error handling for edge cases and complex email structures. |
| 167 | +        - **Enhance Support for Additional Email Encodings and Content Types**: |
| 168 | +            - Expand support for various email encodings and content types to improve scanning accuracy. |
| 169 | + |
| 170 | +    ## References |
| 171 | +    !!! quote "References" |
| 172 | +        - [Python Email Parsing Documentation](https://docs.python.org/3/library/email.html) |
| 173 | +        - [WeasyPrint Documentation](https://doc.courtbouillon.org/weasyprint/stable/) |
| 174 | +        - [PyMuPDF (fitz) Documentation](https://pymupdf.readthedocs.io/en/latest/) |
| 175 | + |
| 176 | +    ## Contributors |
| 177 | +    !!! example "Contributors" |
| 178 | +        - [Josh Liburdi](https://github.com/jshlbrd) |
| 179 | +        - [Paul Hutelmyer](https://github.com/phutelmyer) |
| 180 | +        - [Ryan O'Horo](https://github.com/ryanohoro) |
| 181 | + |
| 182 | +    """ |
| 183 | +``` |
| 184 | + |
| 185 | + |
| 186 | +#### Example of a Function-based Docstring |
| 187 | + |
| 188 | +Function docstrings outline each function's purpose, arguments, and return values. |
| 189 | + |
| 190 | +```python |
| 191 | +    """ |
| 192 | +    Performs the scan operation on batch file data, extracting and categorizing different types of tokens. |
| 193 | + |
| 194 | +    Args: |
| 195 | +        data (bytes): The batch file data as a byte string. |
| 196 | +        file (strelka.File): The file object to be scanned. |
| 197 | +        options (dict): Options for customizing the scan. These options can dictate specific behaviors |
| 198 | +            like which tokens to prioritize or ignore. |
| 199 | +        expire_at (datetime): Expiration timestamp for the scan result. This is used to determine when |
| 200 | +            the scan result should be considered stale or outdated. |
| 201 | +    """ |
| 202 | +``` |
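To show how an `Args:` section in this style can be read by tooling, here is a minimal sketch that splits it into `(name, type, description)` entries. It assumes each argument starts with `name (type): description` and that continuation lines do not match that pattern; `parse_args`, `ARG_LINE`, and the shortened `DOCSTRING` are hypothetical names used for illustration.

```python
import inspect
import re

# Shortened copy of the function docstring above, used as sample input.
DOCSTRING = """
Performs the scan operation on batch file data.

Args:
    data (bytes): The batch file data as a byte string.
    file (strelka.File): The file object to be scanned.
    options (dict): Options for customizing the scan. These options can dictate specific behaviors
        like which tokens to prioritize or ignore.
"""

# Matches "name (type): description" on a single stripped line.
ARG_LINE = re.compile(r"^(\w+) \(([^)]+)\): (.*)$")


def parse_args(docstring: str) -> list[tuple[str, str, str]]:
    """Extract (name, type, description) entries from a Google-style Args section."""
    entries: list[list[str]] = []
    in_args = False
    for line in inspect.cleandoc(docstring).splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if not in_args:
            continue
        if not line.startswith(" "):  # a blank or unindented line ends the section
            break
        match = ARG_LINE.match(line.strip())
        if match:
            entries.append(list(match.groups()))
        elif entries:
            # Continuation line: append to the previous entry's description.
            entries[-1][2] += " " + line.strip()
    return [(name, type_, desc) for name, type_, desc in entries]


for name, type_, desc in parse_args(DOCSTRING):
    print(f"{name} [{type_}]: {desc}")
```

Keeping argument entries on the `name (type): description` pattern is what makes this kind of parsing straightforward, which is one practical reason to follow the format consistently.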