-
Notifications
You must be signed in to change notification settings - Fork 118
feat: implement Python storage cleaner with Appwrite integration #338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: implement Python storage cleaner with Appwrite integration #338
Conversation
WalkthroughAdds a Python storage-cleaner function under python/storage-cleaner: a Python-specific Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (4)
python/storage-cleaner/src/appwrite.py (4)
8-8
: Remove unused import.The
ID
import is not used anywhere in this module.-from appwrite.id import ID
17-24
: Validate required environment variables.The
__init__
method readsAPPWRITE_FUNCTION_API_ENDPOINT
andAPPWRITE_FUNCTION_PROJECT_ID
without validation. If these are missing or None, the Appwrite client may fail later during API calls with unclear errors.Consider adding validation:
def __init__(self, api_key: str): + endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT") + project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID") + + if not endpoint or not project_id: + raise ValueError("Missing required environment variables: APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID") + client = ( Client() - .set_endpoint(os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")) - .set_project(os.getenv("APPWRITE_FUNCTION_PROJECT_ID")) + .set_endpoint(endpoint) + .set_project(project_id) .set_key(api_key) ) self.storage = Storage(client)
43-43
: Use defensive dictionary access.Directly accessing
f["$id"]
will raise aKeyError
if the API response format changes or is malformed. Consider using.get()
with a fallback or explicit validation.- self.storage.delete_file(bucket_id, f["$id"]) + file_id = f.get("$id") + if file_id: + self.storage.delete_file(bucket_id, file_id)
45-46
: Consider more Pythonic empty check.Using
not files
is more idiomatic in Python thanlen(files) == 0
.- if len(files) == 0: + if not files: break
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
python/storage-cleaner/.gitignore
(1 hunks)python/storage-cleaner/README.md
(1 hunks)python/storage-cleaner/requirements.txt
(1 hunks)python/storage-cleaner/src/appwrite.py
(1 hunks)python/storage-cleaner/src/main.py
(1 hunks)python/storage-cleaner/src/utils.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
get_expiry_date
(9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
AppwriteService
(13-46)clean_bucket
(26-46)python/storage-cleaner/src/utils.py (1)
throw_if_missing
(26-40)
🪛 markdownlint-cli2 (0.18.1)
python/storage-cleaner/README.md
11-11: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/utils.py
40-40: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
python/storage-cleaner/README.md (1)
24-24
: Verify timeout sufficiency for large buckets.The 15-second timeout may be insufficient when cleaning buckets with thousands of old files. Each batch fetches 25 files, and with network latency for delete operations, larger buckets could exceed this limit and cause incomplete cleanup.
Consider testing with a representative bucket size, or document the expected maximum bucket size this function can handle within 15 seconds. You may need to increase the timeout or implement cursor-based resumption for very large buckets.
python/storage-cleaner/src/main.py (1)
7-10
: Environment validation logic approved.The use of
throw_if_missing
correctly validates required environment variables before proceeding with bucket operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
python/storage-cleaner/src/main.py (1)
9-14
: Fix header access to prevent KeyError.Line 9 uses direct dictionary access
context.req.headers["x-appwrite-key"]
, which will raise aKeyError
if the header is missing—before the check on Line 11 can execute. This exposes internal errors to clients.Apply this diff:
- api_key = context.req.headers["x-appwrite-key"] + api_key = context.req.headers.get("x-appwrite-key") if not api_key:
🧹 Nitpick comments (1)
python/storage-cleaner/src/appwrite.py (1)
17-24
: Consider validating required environment variables.The
__init__
method readsAPPWRITE_FUNCTION_API_ENDPOINT
andAPPWRITE_FUNCTION_PROJECT_ID
from the environment without validation. If these are missing, the Appwrite SDK client will likely fail with an unclear error during subsequent API calls.You can either:
- Let the SDK fail naturally (current approach—acceptable if calling code validates upstream).
- Add explicit validation here for clearer error messages:
def __init__(self, api_key: str): endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT") project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID") if not endpoint or not project_id: raise ValueError( "Missing required environment variables: " "APPWRITE_FUNCTION_API_ENDPOINT, APPWRITE_FUNCTION_PROJECT_ID" ) client = ( Client() .set_endpoint(endpoint) .set_project(project_id) .set_key(api_key) ) self.storage = Storage(client)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
python/storage-cleaner/requirements.txt
(1 hunks)python/storage-cleaner/src/appwrite.py
(1 hunks)python/storage-cleaner/src/main.py
(1 hunks)python/storage-cleaner/src/utils.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-30T10:09:36.561Z
Learnt from: zeus2611
PR: appwrite/templates#338
File: python/storage-cleaner/src/utils.py:26-40
Timestamp: 2025-09-30T10:09:36.561Z
Learning: In the Python storage-cleaner template at python/storage-cleaner/src/utils.py, the throw_if_missing function intentionally treats 0 as an invalid value to prevent a retention period of 0, which would cause files to be deleted immediately and risk accidental data loss.
Applied to files:
python/storage-cleaner/src/utils.py
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
get_expiry_date
(9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
AppwriteService
(12-65)clean_bucket
(26-65)python/storage-cleaner/src/utils.py (1)
throw_if_missing
(26-41)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/appwrite.py
45-47: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Use explicit conversion flag
Replace with conversion flag
(RUF010)
56-56: Do not catch blind exception: Exception
(BLE001)
63-65: Avoid specifying long messages outside the exception class
(TRY003)
python/storage-cleaner/src/main.py
23-23: Do not catch blind exception: Exception
(BLE001)
python/storage-cleaner/src/utils.py
41-41: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (5)
python/storage-cleaner/requirements.txt (1)
1-1
: LGTM!The pinned version constraint ensures reproducible builds and prevents breaking changes from future major releases.
python/storage-cleaner/src/main.py (1)
18-25
: LGTM!The exception handling properly wraps bucket operations, distinguishing between validation errors (400) and unexpected failures (500) while logging details for debugging.
python/storage-cleaner/src/utils.py (2)
9-23
: LGTM!The function uses timezone-aware datetime (Python 3.12+ compatible), handles invalid environment values gracefully, and returns a properly formatted ISO 8601 timestamp.
26-41
: LGTM!The validation logic correctly treats
0
as invalid to prevent immediate deletion and data loss. The inline comment on Line 36 documents this business rule for maintainers.python/storage-cleaner/src/appwrite.py (1)
41-65
: LGTM!The error handling properly wraps API calls, tracks successes and failures, and provides detailed diagnostics when deletions fail. The loop correctly exits when no more files are returned.
…ing API key gracefully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
python/storage-cleaner/src/main.py (1)
20-20
: Consider a more informative success message.The message "Buckets cleaned" is plural but only one bucket is cleaned. Consider making it singular or adding details about the operation.
- return context.res.text("Buckets cleaned", 200) + return context.res.text("Bucket cleaned successfully", 200)Alternatively, if you want to return structured data:
- return context.res.text("Buckets cleaned", 200) + return context.res.json({"message": "Bucket cleaned successfully", "bucket_id": os.environ["APPWRITE_BUCKET_ID"]}, 200)python/storage-cleaner/src/appwrite.py (1)
50-58
: File deletion loop handles errors well.The loop safely extracts file IDs, wraps deletions in error handling, and tracks both successes and failures. The use of the
file_id
variable in the exception handler correctly avoids the KeyError issue flagged in previous reviews.Optional improvement: Files without a
$id
field are silently skipped (line 53's conditional). Consider logging these cases for visibility:try: file_id = f.get("$id") if file_id: self.storage.delete_file(bucket_id, file_id) deleted_files_count += 1 + else: + print(f"Warning: File object missing $id in bucket {bucket_id}") except Exception as e: failed_files.append({"id": file_id, "error": str(e)})
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
python/storage-cleaner/src/appwrite.py
(1 hunks)python/storage-cleaner/src/main.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
AppwriteService
(12-65)clean_bucket
(26-65)python/storage-cleaner/src/utils.py (1)
throw_if_missing
(26-41)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
get_expiry_date
(9-23)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/main.py
23-23: Do not catch blind exception: Exception
(BLE001)
python/storage-cleaner/src/appwrite.py
45-47: Avoid specifying long messages outside the exception class
(TRY003)
46-46: Use explicit conversion flag
Replace with conversion flag
(RUF010)
56-56: Do not catch blind exception: Exception
(BLE001)
63-65: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (8)
python/storage-cleaner/src/main.py (3)
1-3
: LGTM! Clean and minimal imports.The imports are appropriate for the serverless function's requirements.
9-14
: Excellent API key validation!The safe header access using
.get()
combined with the empty-value check properly prevents KeyError exceptions and provides clear error messages.
18-25
: Exception handling properly addresses previous concerns.The try-except structure correctly catches and handles errors from the bucket cleaning operation, logs details internally, and returns sanitized error messages. The broad
Exception
catch is appropriate in this serverless context to prevent unhandled exceptions from crashing the function.Note: The static analysis warning about catching
Exception
is a false positive here - broad exception handling is appropriate for serverless entry points to ensure graceful degradation.python/storage-cleaner/src/appwrite.py (5)
1-9
: LGTM! Well-organized imports and documentation.The module docstring and imports are clean and appropriate for the storage cleanup service.
17-24
: Client initialization looks correct.The fluent API usage for configuring the Appwrite client is clean and idiomatic. Ensure that the required environment variables (
APPWRITE_FUNCTION_API_ENDPOINT
andAPPWRITE_FUNCTION_PROJECT_ID
) are validated at the entry point (see comment in main.py).
26-39
: Good query setup and pagination strategy.The batch size of 25 is reasonable for iterative deletion, and the counter initialization properly tracks success and failure states.
41-48
: Robust error handling for list operation.The try-except block properly catches API failures, preserves the exception chain with
from e
, and provides clear context. The broadException
catch is appropriate here since the Appwrite SDK may raise various exception types.Note: Static analysis warnings (TRY003, RUF010) are pedantic in this context - the current implementation is clear and maintainable.
59-65
: Appropriate error handling for partial failures.The loop termination logic is correct, and raising a
RuntimeError
for partial failures ensures callers are aware of issues. The detailed error message (including counts and failed file details) is valuable for debugging, despite the static analysis warning.Note: The current behavior raises an exception even if some files were successfully deleted. This is a conservative approach that ensures operators are notified of any failures. If you prefer to log warnings instead and only raise on total failure, consider adjusting the logic - but the current behavior is safer.
{"error": "Missing API key in x-appwrite-key header"}, 401 | ||
) | ||
|
||
appwrite = AppwriteService(api_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate all required environment variables.
The AppwriteService
constructor depends on APPWRITE_FUNCTION_API_ENDPOINT
and APPWRITE_FUNCTION_PROJECT_ID
from the environment, but these are not validated in the throw_if_missing
call on line 7. If either is missing, the Appwrite client initialization will fail with unclear errors.
Add the missing variables to the validation list:
- throw_if_missing(os.environ, ["RETENTION_PERIOD_DAYS", "APPWRITE_BUCKET_ID"])
+ throw_if_missing(os.environ, [
+ "RETENTION_PERIOD_DAYS",
+ "APPWRITE_BUCKET_ID",
+ "APPWRITE_FUNCTION_API_ENDPOINT",
+ "APPWRITE_FUNCTION_PROJECT_ID"
+ ])
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
appwrite = AppwriteService(api_key) | |
# Validate required environment variables before anything else | |
throw_if_missing(os.environ, [ | |
"RETENTION_PERIOD_DAYS", | |
"APPWRITE_BUCKET_ID", | |
"APPWRITE_FUNCTION_API_ENDPOINT", | |
"APPWRITE_FUNCTION_PROJECT_ID" | |
]) |
🤖 Prompt for AI Agents
In python/storage-cleaner/src/main.py around line 16, the AppwriteService is
instantiated but the required environment variables
APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID are not
validated; update the throw_if_missing call (around line 7) to include
"APPWRITE_FUNCTION_API_ENDPOINT" and "APPWRITE_FUNCTION_PROJECT_ID" so missing
values are caught early and a clear error is raised before creating the
AppwriteService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Meldiron In the Node.js implementation of the storage cleaner, we only validate the RETENTION_PERIOD_DAYS and APPWRITE_BUCKET_ID environment variables. My understanding is that APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID might be automatically populated at build or runtime. Could you please clarify if these two variables need explicit validation here, or if they are always guaranteed to be set by the environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
What does this PR do?
This PR implements a Python translation of the existing Node.js AppwriteService for cleaning storage buckets by removing files older than a retention period. It also provides a Python version of the serverless handler function that validates environment variables, authenticates the API key from request headers, and triggers the bucket cleanup process.
Test Plan
The changes were verified by running the Python service locally with mock environment variables and headers to ensure the cleanup function executes without error and correctly deletes files as intended. Additionally, error handling was tested by providing invalid or missing API keys to confirm appropriate error responses are returned.
Related PRs and Issues
#310
Have you read the Contributing Guidelines on issues?
Yes, I have read and followed the contributing guidelines.
Summary by CodeRabbit