Commit 446971c — Add "ADR-004: File Storage Plugin"

Signed-off-by: nscuro <[email protected]>
1 parent 91084cd commit 446971c

2 files changed: +210 −0

architecture/decisions/004-file-storage-plugin.md (new file, 209 lines)
| Status   | Date       | Author(s)                            |
|:---------|:-----------|:-------------------------------------|
| Proposed | 2025-01-29 | [@nscuro](https://github.com/nscuro) |

## Context

In [ADR-001], we surfaced the complication of transmitting large payloads (BOMs, notifications, analysis results).

Neither message brokers nor RDBMSes are meant to store large blobs of data.
The introduction of a Postgres-based workflow orchestration solution (see [ADR-002]) does not change this reality.

To keep our core systems performant, we should store files externally,
and instead only pass references to those files around. This strategy is followed by messaging
services such as AWS SNS, which offloads payloads to S3 if they exceed a size of `256KB`.

We do not need fully-fledged filesystem capabilities. Pragmatically speaking, all we need is
a glorified key-value store. An obvious choice would be to delegate this to an object store
such as S3. However, we recognize that not all users are able or willing to deploy additional
infrastructure.

Thus, at minimum, the following storage solutions must be supported:

1. **Local filesystem**. This option is viable for users running single-node clusters, or those with
   access to reasonably fast network storage. This should be the default, as it does not require any
   additional setup.
2. **S3-compatible object storage**. This option is viable for users running multi-node clusters,
   operating in the cloud, and/or without access to network storage. It could also be required
   to support very large clusters with increased storage requirements.

Optionally, the following solutions may be offered as well:

* **In-memory**. For unit tests, integration tests, and single-node demo clusters.
* **Non-S3-compatible object storage**. Like Azure Blob Storage and similar proprietary offerings.

To reduce networking and storage costs, as well as network latencies,
files *should* be compressed *prior* to sending them over the wire.

When retrieving files from storage, providers *should* verify their integrity,
to prevent processing of files that have been tampered with.

## Decision

To communicate file references, we will leverage metadata objects. A metadata object holds a unique
reference to a file within the context of a given storage provider. To enable use cases such as encryption
and integrity verification, we allow providers to attach additional metadata.

Since our primary means of internal communication is based on Protobuf, we will define the file metadata
in this format. This allows us to easily attach it to other Protobuf messages.

```protobuf linenums="1"
syntax = "proto3";

// Metadata of a file stored by a storage provider.
message FileMetadata {
  // Unique identifier of the file.
  string key = 1;

  // Name of the storage provider that hosts the file.
  string storage_name = 2;

  // Additional metadata of the storage provider,
  // e.g. values used for integrity verification.
  map<string, string> storage_metadata = 3;
}
```

The API surface will revolve around the `FileStorage` interface, which exposes methods to
store, retrieve, and delete files:

```java linenums="1"
package org.dependencytrack.storage;

import org.dependencytrack.plugin.api.ExtensionPoint;
import org.dependencytrack.proto.storage.v1alpha1.FileMetadata;

import java.io.IOException;
import java.util.Collection;

public interface FileStorage extends ExtensionPoint {

    /**
     * Persists data to a file in storage.
     * <br/>
     * Storage providers may transparently perform additional steps,
     * such as encryption and compression.
     *
     * @param name Name of the file. This name is not guaranteed to be reflected
     *             in storage as-is. It may be modified or changed entirely.
     * @param content Data to store.
     * @return Metadata of the stored file.
     * @throws IOException When storing the file failed.
     */
    FileMetadata store(String name, byte[] content) throws IOException;

    /**
     * Retrieves a file from storage.
     * <br/>
     * Storage providers may transparently perform additional steps,
     * such as integrity verification, decryption, and decompression.
     * <br/>
     * Trying to retrieve a file from a different storage provider
     * is an illegal operation and yields an exception.
     *
     * @param fileMetadata Metadata of the file to retrieve.
     * @return The file's content.
     * @throws IOException When retrieving the file failed.
     */
    byte[] get(FileMetadata fileMetadata) throws IOException;

    /**
     * Deletes a file from storage.
     * <br/>
     * Trying to delete a file from a different storage provider
     * is an illegal operation and yields an exception.
     *
     * @param fileMetadata Metadata of the file to delete.
     * @return {@code true} when the file was deleted, otherwise {@code false}.
     * @throws IOException When deleting the file failed.
     */
    boolean delete(FileMetadata fileMetadata) throws IOException;

    /**
     * Deletes multiple files from storage. Some providers support batch deletes.
     *
     * @param fileMetadata Metadata of the files to delete.
     * @throws IOException When deleting the files failed.
     * @see #delete(FileMetadata)
     * @see <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html">S3 DeleteObjects API</a>
     */
    void deleteMany(Collection<FileMetadata> fileMetadata) throws IOException;

}
```
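To illustrate the contract, the following is a minimal sketch of what an in-memory provider *could* look like, combining the compression and integrity-verification steps described in the Context section (GZIP, and a SHA-256 digest carried in the storage metadata). The class name, the `sha256` metadata key, and the key-derivation scheme are illustrative assumptions, and simplified stand-ins are used for `FileMetadata` so the sketch is self-contained; this is not the actual Dependency-Track implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Simplified stand-in for the Protobuf-generated FileMetadata message.
record FileMetadata(String key, String storageName, Map<String, String> storageMetadata) {}

// Hypothetical in-memory provider sketch.
class InMemoryFileStorage {

    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    FileMetadata store(String name, byte[] content) throws IOException {
        // The given name is not used verbatim; a unique key is derived from it.
        String key = name + "/" + UUID.randomUUID();
        files.put(key, gzip(content));
        // Record a digest of the *uncompressed* content for integrity verification.
        return new FileMetadata(key, "memory", Map.of("sha256", sha256Hex(content)));
    }

    byte[] get(FileMetadata metadata) throws IOException {
        if (!"memory".equals(metadata.storageName())) {
            throw new IllegalArgumentException("File is not hosted by this provider");
        }
        byte[] compressed = files.get(metadata.key());
        if (compressed == null) {
            throw new IOException("No file exists for key " + metadata.key());
        }
        byte[] content = gunzip(compressed);
        String expectedDigest = metadata.storageMetadata().get("sha256");
        if (expectedDigest != null && !expectedDigest.equals(sha256Hex(content))) {
            throw new IOException("Integrity verification failed for " + metadata.key());
        }
        return content;
    }

    boolean delete(FileMetadata metadata) {
        return files.remove(metadata.key()) != null;
    }

    private static byte[] gzip(byte[] data) throws IOException {
        var bos = new ByteArrayOutputStream();
        try (var gzipOut = new GZIPOutputStream(bos)) {
            gzipOut.write(data);
        }
        return bos.toByteArray();
    }

    private static byte[] gunzip(byte[] data) throws IOException {
        try (var gzipIn = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gzipIn.readAllBytes();
        }
    }

    private static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available.
        }
    }
}
```

Note how the provider name (`memory`) travels inside the metadata object, which is what makes retrieving a file through the wrong provider detectable.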

To support multiple, configurable providers, we will leverage the plugin mechanism introduced in [hyades-apiserver/#805].
Once its API is published as a separate Maven artifact, the mechanism will allow users to develop their own storage
providers, without requiring changes to the Dependency-Track codebase.

This allows storage providers to be configured as follows:

```properties linenums="1"
# Defines the file storage extension to use.
# When not set, an enabled extension will be chosen based on its priority.
# It is recommended to explicitly configure an extension for predictable behavior.
#
# @category: Storage
# @type: enum
# @valid-values: [local, memory]
file.storage.default.extension=

# Whether the local file storage extension shall be enabled.
#
# @category: Storage
# @type: boolean
file.storage.extension.local.enabled=true

# Defines the local directory where files shall be stored.
# Has no effect unless file.storage.extension.local.enabled is `true`.
#
# @category: Storage
# @default: ${alpine.data.directory}/storage
# @type: string
file.storage.extension.local.directory=

# Whether the in-memory file storage extension shall be enabled.
#
# @category: Storage
# @type: boolean
file.storage.extension.memory.enabled=false
```
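The selection semantics described in the comments above (an explicit default takes precedence, otherwise an enabled extension is chosen by priority) could be sketched as follows. The `resolveDefaultExtension` helper and the hard-coded priority order are illustrative assumptions, not part of the actual plugin mechanism:

```java
import java.util.List;
import java.util.Properties;

class FileStorageConfig {

    // Illustrative priority order: the first enabled extension wins
    // when no default is configured explicitly.
    private static final List<String> PRIORITY = List.of("local", "memory");

    // Resolves the extension name per the property semantics above.
    static String resolveDefaultExtension(Properties props) {
        String explicit = props.getProperty("file.storage.default.extension", "").trim();
        if (!explicit.isEmpty()) {
            return explicit;
        }
        return PRIORITY.stream()
                .filter(name -> Boolean.parseBoolean(
                        props.getProperty("file.storage.extension." + name + ".enabled", "false")))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("No file storage extension is enabled"));
    }
}
```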

Application code can interact with `FileStorage` via the `PluginManager`:

```java linenums="1"
package org.dependencytrack.foobar;

import org.dependencytrack.plugin.PluginManager;
import org.dependencytrack.proto.storage.v1alpha1.FileMetadata;
import org.dependencytrack.storage.FileStorage;

import java.io.IOException;

class Foo {

    void bar() throws IOException {
        try (var fileStorage = PluginManager.getInstance().getExtension(FileStorage.class)) {
            FileMetadata fileMetadata = fileStorage.store("filename", "content".getBytes());

            byte[] fileContent = fileStorage.get(fileMetadata);

            fileStorage.delete(fileMetadata);
        }
    }

}
```

## Consequences

* There is a non-zero chance of orphaned files remaining in storage. Crashes or service outages on either end
  can prevent Dependency-Track from deleting files that are no longer needed. Some storage providers such as
  AWS S3 allow retention policies to be configured. This is not true for local file storage, however.
  As a consequence, storage providers should make an effort to make the creation timestamp of files obvious,
  e.g. as part of the file's name, if relying on the filesystem's metadata is not possible.
* Storage operations are not atomic with database operations. This is an acceptable tradeoff,
  because it does not impact the integrity of the system. Application code is expected to gracefully
  deal with missing files, and perform compensating actions accordingly. Since file storage is not the
  primary system of record, files existing without the application knowing about them is not an issue.

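For local storage, the creation-timestamp convention suggested above could be realized with a naming scheme along these lines. The epoch-seconds prefix and the helper names are assumptions for illustration, not a prescribed format:

```java
import java.time.Duration;
import java.time.Instant;

class TimestampedFileNames {

    // Embeds the creation timestamp in the stored file's name, so a retention
    // sweep can identify orphaned files without relying on filesystem metadata.
    static String toStoredName(String name, Instant createdAt) {
        return createdAt.getEpochSecond() + "_" + name;
    }

    // Extracts the creation timestamp back out of a stored file name.
    static Instant createdAt(String storedName) {
        long epochSecond = Long.parseLong(storedName.substring(0, storedName.indexOf('_')));
        return Instant.ofEpochSecond(epochSecond);
    }

    // True if the file is older than the given retention period and may be swept.
    static boolean isExpired(String storedName, Instant now, Duration retention) {
        return createdAt(storedName).plus(retention).isBefore(now);
    }
}
```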
[ADR-001]: 001-drop-kafka-dependency.md
[ADR-002]: 002-workflow-orchestration.md

[hyades-apiserver/#805]: https://github.com/DependencyTrack/hyades-apiserver/pull/805

mkdocs.yml (+1 line)

```diff
@@ -92,6 +92,7 @@ nav:
      - "ADR-001: Drop Kafka Dependency": architecture/decisions/001-drop-kafka-dependency.md
      - "ADR-002: Workflow Orchestration": architecture/decisions/002-workflow-orchestration.md
      - "ADR-003: Notification Publishing": architecture/decisions/003-notification-publishing.md
+     - "ADR-004: File Storage Plugin": architecture/decisions/004-file-storage-plugin.md
   - Design:
      - Workflow State Tracking: architecture/design/workflow-state-tracking.md
   - Operations:
```
