|
| 1 | +| Status | Date | Author(s) | |
| 2 | +|:---------|:-----------|:-------------------------------------| |
| 3 | +| Proposed | 2025-01-29 | [@nscuro](https://github.com/nscuro) | |
| 4 | + |
| 5 | +## Context |
| 6 | + |
| 7 | +In [ADR-001], we surfaced the complication of transmitting large payloads (BOMs, notifications, analysis results). |
| 8 | + |
| 9 | +Neither message brokers nor RDBMSes are meant to store large blobs of data. |
| 10 | +The introduction of a Postgres-based workflow orchestration solution (see [ADR-002]) does not change this reality. |
| 11 | + |
| 12 | +To ensure our core systems stay performant, we should fall back to storing files externally, |
| 13 | +and instead only pass references to those files around. This strategy is followed by messaging |
| 14 | +services such as AWS SNS, whose extended client library offloads payloads to S3 if they exceed a size of `256KB`. |
| 15 | + |
| 16 | +We do not need fully-fledged filesystem capabilities. Pragmatically speaking, all we need is |
| 17 | +a glorified key-value store. An obvious choice would be to delegate this to an object store |
| 18 | +such as S3. However, we recognize that not all users are able or willing to deploy additional |
| 19 | +infrastructure. |
| 20 | + |
| 21 | +Thus, at minimum, the following storage solutions must be supported: |
| 22 | + |
| 23 | +1. **Local filesystem**. This option is viable for users running single-node clusters, or those with |
| 24 | + access to reasonably fast network storage. This should be the default, as it does not require any |
| 25 | + additional setup. |
| 26 | +2. **S3-compatible object storage**. This option is viable for users running multi-node clusters, |
| 27 | + operating in the cloud, and / or without access to network storage. It could also be required |
| 28 | + to support very large clusters with increased storage requirements. |
| 29 | + |
| 30 | +Optionally, the following solutions may be offered as well: |
| 31 | + |
| 32 | +* **In-memory**. For unit tests, integration tests, and single-node demo clusters. |
| 33 | +* **Non-S3-compatible object storage**. Like Azure Blob Storage and similar proprietary offerings. |
| 34 | + |
| 35 | +To reduce networking and storage costs, as well as network latencies, |
| 36 | +files *should* be compressed *prior* to sending them over the wire. |
| 37 | + |
| 38 | +When retrieving files from storage, providers *should* verify their integrity, |
| 39 | +to prevent processing of files that have been tampered with. |
| 40 | + |
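The two recommendations above (compress before sending, verify on retrieval) could be combined roughly as follows. This is a minimal sketch, not a prescribed implementation: the class `FileCodec`, its method names, and the choice of gzip and SHA-256 are assumptions made for illustration. The digest would have to be kept outside the storage backend (for example in the metadata held by the application) for the check to be meaningful.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; names and algorithms are illustrative assumptions.
final class FileCodec {

    private FileCodec() {
    }

    /** Compresses content with gzip before it is sent over the wire. */
    static byte[] compress(byte[] content) throws IOException {
        var buffer = new ByteArrayOutputStream();
        try (var gzip = new GZIPOutputStream(buffer)) {
            gzip.write(content);
        }
        return buffer.toByteArray();
    }

    /** Computes the Base64-encoded SHA-256 digest of the bytes as stored. */
    static String sha256Base64(byte[] stored) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(stored);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Verifies the stored bytes against the expected digest, then decompresses them. */
    static byte[] verifyAndDecompress(byte[] stored, String expectedDigest) throws IOException {
        byte[] expected = Base64.getDecoder().decode(expectedDigest);
        byte[] actual = Base64.getDecoder().decode(sha256Base64(stored));
        if (!MessageDigest.isEqual(expected, actual)) {
            throw new IOException("Integrity check failed: digest mismatch");
        }
        try (var gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            return gzip.readAllBytes();
        }
    }
}
```

A plain digest only detects modification if the digest itself is out of the attacker's reach. A keyed construction such as an HMAC would be the stronger option where that cannot be assumed.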
| 41 | +## Decision |
| 42 | + |
| 43 | +To communicate file references, we will leverage metadata objects. Metadata objects will hold a unique |
| 44 | +reference to a file, within the context of a given storage provider. To enable use cases such as encryption |
| 45 | +and integrity verification, we allow providers to attach additional metadata. |
| 46 | + |
| 47 | +Since our primary means of internal communication is based on Protobuf, we will define the file metadata |
| 48 | +in this format. It will allow us to easily attach it to other Protobuf messages. |
| 49 | + |
| 50 | +```protobuf linenums="1" |
| 51 | +syntax = "proto3"; |
| 52 | + |
| 53 | +// Metadata of a file stored by a storage provider. |
| 54 | +message FileMetadata { |
| 55 | + // Unique identifier of the file. |
| 56 | + string key = 1; |
| 57 | + |
| 58 | + // Name of the storage provider that hosts the file. |
| 59 | + string storage_name = 2; |
| 60 | + |
| 61 | + // Additional metadata of the storage provider, |
| 62 | + // e.g. values used for integrity verification. |
| 63 | + map<string, string> storage_metadata = 3; |
| 64 | +} |
| 65 | +``` |
| 66 | + |
| 67 | +The API surface will revolve around the `FileStorage` interface, which exposes methods to |
| 68 | +store, retrieve, and delete files: |
| 69 | + |
| 70 | +```java linenums="1" |
| 71 | +package org.dependencytrack.storage; |
| 72 | + |
| 73 | +import org.dependencytrack.plugin.api.ExtensionPoint; |
| 74 | +import org.dependencytrack.proto.storage.v1alpha1.FileMetadata; |
| 75 | + |
| 76 | +import java.io.IOException; |
| 77 | +import java.util.Collection; |
| 78 | + |
| 79 | +public interface FileStorage extends ExtensionPoint { |
| 80 | + |
| 81 | + /** |
| 82 | + * Persists data to a file in storage. |
| 83 | + * <br/> |
| 84 | + * Storage providers may transparently perform additional steps, |
| 85 | + * such as encryption and compression. |
| 86 | + * |
| 87 | + * @param name Name of the file. This name is not guaranteed to be reflected |
| 88 | + * in storage as-is. It may be modified or changed entirely. |
| 89 | + * @param content Data to store. |
| 90 | + * @return Metadata of the stored file. |
| 91 | + * @throws IOException When storing the file failed. |
| 92 | + */ |
| 93 | + FileMetadata store(String name, byte[] content) throws IOException; |
| 94 | + |
| 95 | + /** |
| 96 | + * Retrieves a file from storage. |
| 97 | + * <br/> |
| 98 | + * Storage providers may transparently perform additional steps, |
| 99 | + * such as integrity verification, decryption and decompression. |
| 100 | + * <br/> |
| 101 | + * Trying to retrieve a file from a different storage provider |
| 102 | + * is an illegal operation and yields an exception. |
| 103 | + * |
| 104 | + * @param fileMetadata Metadata of the file to retrieve. |
| 105 | + * @return The file's content. |
| 106 | + * @throws IOException When retrieving the file failed. |
| 107 | + */ |
| 108 | + byte[] get(FileMetadata fileMetadata) throws IOException; |
| 109 | + |
| 110 | + /** |
| 111 | + * Deletes a file from storage. |
| 112 | + * <br/> |
| 113 | + * Trying to delete a file from a different storage provider |
| 114 | + * is an illegal operation and yields an exception. |
| 115 | + * |
| 116 | + * @param fileMetadata Metadata of the file to delete. |
| 117 | + * @return {@code true} when the file was deleted, otherwise {@code false}. |
| 118 | + * @throws IOException When deleting the file failed. |
| 119 | + */ |
| 120 | + boolean delete(FileMetadata fileMetadata) throws IOException; |
| 121 | + |
| 122 | + /** |
| 123 | + * Deletes multiple files from storage. Some providers support batch deletes. |
| 124 | + * |
| 125 | + * @see #delete(FileMetadata) |
| 126 | + * @see <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html">S3 DeleteObjects API</a> |
| 127 | + */ |
| 128 | + void deleteMany(Collection<FileMetadata> fileMetadata) throws IOException; |
| 129 | + |
| 130 | +} |
| 131 | +``` |
| 132 | + |
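To make the contract of the interface more concrete, here is a minimal sketch of what an in-memory provider could look like. It is deliberately self-contained: `FileRef` is a hypothetical stand-in for the generated `FileMetadata` message, the class does not implement the real `FileStorage` or `ExtensionPoint` types, and throwing `IllegalArgumentException` for files of another provider is an assumption (the interface only says that an exception is raised).

```java
import java.io.IOException;
import java.nio.file.NoSuchFileException;
import java.util.Collection;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Protobuf FileMetadata message.
record FileRef(String key, String storageName, Map<String, String> storageMetadata) {
}

// Sketch of an in-memory provider that mirrors the store / get / delete contract.
final class InMemoryFileStorage {

    static final String NAME = "memory";

    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    FileRef store(String name, byte[] content) {
        // The requested name is only a hint; a random prefix keeps keys unique.
        String key = UUID.randomUUID() + "_" + name;
        files.put(key, content.clone());
        return new FileRef(key, NAME, Map.of());
    }

    byte[] get(FileRef ref) throws IOException {
        requireOwnFile(ref);
        byte[] content = files.get(ref.key());
        if (content == null) {
            throw new NoSuchFileException(ref.key());
        }
        return content.clone();
    }

    boolean delete(FileRef ref) {
        requireOwnFile(ref);
        return files.remove(ref.key()) != null;
    }

    void deleteMany(Collection<FileRef> refs) {
        // No native batch operation exists here, so fall back to single deletes.
        refs.forEach(this::delete);
    }

    private static void requireOwnFile(FileRef ref) {
        // Assumption: files of other providers are rejected with an unchecked exception.
        if (!NAME.equals(ref.storageName())) {
            throw new IllegalArgumentException(
                    "File is hosted by provider '%s', not '%s'".formatted(ref.storageName(), NAME));
        }
    }
}
```

Returning a fresh key instead of the caller's name reflects the Javadoc note that the name is not guaranteed to be reflected in storage as-is.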
| 133 | +To support multiple, configurable providers, we will leverage the plugin mechanism introduced in [hyades-apiserver/#805]. |
| 134 | +The mechanism, once its API is published as a separate Maven artifact, will allow users to develop their own storage |
| 135 | +providers if required, without requiring changes to the Dependency-Track codebase. |
| 136 | + |
| 137 | +This allows storage providers to be configured as follows: |
| 138 | + |
| 139 | +```properties linenums="1" |
| 140 | +# Defines the file storage extension to use. |
| 141 | +# When not set, an enabled extension will be chosen based on its priority. |
| 142 | +# It is recommended to explicitly configure an extension for predictable behavior. |
| 143 | +# |
| 144 | +# @category: Storage |
| 145 | +# @type: enum |
| 146 | +# @valid-values: [local, memory] |
| 147 | +file.storage.default.extension= |
| 148 | + |
| 149 | +# Whether the local file storage extension shall be enabled. |
| 150 | +# |
| 151 | +# @category: Storage |
| 152 | +# @type: boolean |
| 153 | +file.storage.extension.local.enabled=true |
| 154 | + |
| 155 | +# Defines the local directory where files shall be stored. |
| 156 | +# Has no effect unless file.storage.extension.local.enabled is `true`. |
| 157 | +# |
| 158 | +# @category: Storage |
| 159 | +# @default: ${alpine.data.directory}/storage |
| 160 | +# @type: string |
| 161 | +file.storage.extension.local.directory= |
| 162 | + |
| 163 | +# Whether the in-memory file storage extension shall be enabled. |
| 164 | +# |
| 165 | +# @category: Storage |
| 166 | +# @type: boolean |
| 167 | +file.storage.extension.memory.enabled=false |
| 168 | +``` |
| 169 | + |
| 170 | +Application code can interact with `FileStorage` via the `PluginManager`: |
| 171 | + |
| 172 | +```java linenums="1" |
| 173 | +package org.dependencytrack.foobar; |
| 174 | + |
| 175 | +import org.dependencytrack.plugin.PluginManager; |
| 176 | +import org.dependencytrack.proto.storage.v1alpha1.FileMetadata; |
| 177 | +import org.dependencytrack.storage.FileStorage; |
| 178 | + |
| 179 | +class Foo { |
| 180 | + |
| 181 | + void bar() throws Exception { |
| 182 | + try (var fileStorage = PluginManager.getInstance().getExtension(FileStorage.class)) { |
| 183 | + FileMetadata fileMetadata = fileStorage.store("filename", "content".getBytes()); |
| 184 | + |
| 185 | + byte[] fileContent = fileStorage.get(fileMetadata); |
| 186 | + |
| 187 | + fileStorage.delete(fileMetadata); |
| 188 | + } |
| 189 | + } |
| 190 | + |
| 191 | +} |
| 192 | +``` |
| 193 | + |
| 194 | +## Consequences |
| 195 | + |
| 196 | +* There is a non-zero chance of orphaned files remaining in storage. Crashes or service outages on either end |
| 197 | + can prevent Dependency-Track from deleting files if they're no longer needed. Some storage providers such as |
| 198 | + AWS S3 allow retention policies to be configured. This is not true for local file storage, however. |
| 199 | + As a consequence, storage providers should make an effort to make the creation timestamp of files obvious, |
| 200 | + e.g. as part of the file's name, if relying on the file system's metadata is not possible. |
| 201 | +* Storage operations are not atomic with database operations. This is an acceptable tradeoff, |
| 202 | + because it does not impact the integrity of the system. Application code is expected to gracefully |
| 203 | + deal with missing files, and perform compensating actions accordingly. Since file storage is not the |
| 204 | + primary system of record, files existing without the application knowing about them is not an issue. |
| 205 | + |
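For local file storage, the first consequence could be addressed along these lines: embed the creation time in the file name, and periodically remove files older than a retention period. This is a sketch under assumptions; `OrphanedFileSweeper`, the `<epochMillis>_<uuid>_<name>` key layout, and the retention-based policy are illustrative and not part of this decision.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical helper; the key layout and retention policy are assumptions.
final class OrphanedFileSweeper {

    private OrphanedFileSweeper() {
    }

    /** Builds a key of the form "<epochMillis>_<uuid>_<name>". */
    static String keyFor(String name, Instant createdAt) {
        return createdAt.toEpochMilli() + "_" + UUID.randomUUID() + "_" + name;
    }

    /** Extracts the creation time from a key, or empty if the key has another layout. */
    static Optional<Instant> createdAt(String key) {
        int separator = key.indexOf('_');
        if (separator <= 0) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.ofEpochMilli(Long.parseLong(key.substring(0, separator))));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Deletes regular files whose embedded creation time is older than the retention period. */
    static List<Path> sweep(Path directory, Duration retention, Instant now) throws IOException {
        Instant cutoff = now.minus(retention);
        List<Path> expired;
        try (Stream<Path> files = Files.list(directory)) {
            expired = files
                    .filter(Files::isRegularFile)
                    .filter(file -> createdAt(file.getFileName().toString())
                            .map(created -> created.isBefore(cutoff))
                            .orElse(false)) // Files with an unknown layout are left alone.
                    .toList();
        }
        for (Path file : expired) {
            Files.deleteIfExists(file);
        }
        return expired;
    }
}
```

Because application code is expected to cope with missing files anyway, a sweeper like this can err on the side of deleting, as long as the retention period comfortably exceeds the longest expected processing time.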
| 206 | +[ADR-001]: 001-drop-kafka-dependency.md |
| 207 | +[ADR-002]: 002-workflow-orchestration.md |
| 208 | + |
| 209 | +[hyades-apiserver/#805]: https://github.com/DependencyTrack/hyades-apiserver/pull/805 |