Skip to content

Conversation

zeus2611
Copy link

@zeus2611 zeus2611 commented Sep 30, 2025

What does this PR do?

This PR implements a Python translation of the existing Node.js AppwriteService for cleaning storage buckets by removing files older than a retention period. It also provides a Python version of the serverless handler function that validates environment variables, authenticates the API key from request headers, and triggers the bucket cleanup process.

Test Plan

The changes were verified by running the Python service locally with mock environment variables and headers to ensure the cleanup function executes without error and correctly deletes files as intended. Additionally, error handling was tested by providing invalid or missing API keys to confirm appropriate error responses are returned.

Related PRs and Issues

#310

Have you read the Contributing Guidelines on issues?

Yes, I have read and followed the contributing guidelines.

Summary by CodeRabbit

  • New Features
    • Python Storage Cleaner function that deletes files older than a configurable retention period from a specified bucket; supports GET invocation and cron scheduling, validates inputs, and returns success or error responses.
  • Documentation
    • Added README describing usage, configuration, environment variables, runtime, permissions, cron, and timeout.
  • Chores
    • Added Python .gitignore and introduced the Appwrite SDK runtime dependency.

Copy link

coderabbitai bot commented Sep 30, 2025

Walkthrough

Adds a Python storage-cleaner function under python/storage-cleaner: a Python-specific .gitignore, README.md, and requirements.txt (adds appwrite>=13.1.0,<14.0.0). New source files: src/utils.py (get_expiry_date computes UTC expiry from RETENTION_PERIOD_DAYS; throw_if_missing validates keys), src/appwrite.py (AppwriteService initializes Client from APPWRITE_FUNCTION_API_ENDPOINT, APPWRITE_FUNCTION_PROJECT_ID and provided API key; clean_bucket lists files with createdAt < expiry in batches (limit 25), deletes them, and aggregates failures), and src/main.py (entrypoint main(context) validating env vars, checking x-appwrite-key header, invoking clean_bucket, and returning HTTP responses).

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely describes the primary feature added by the pull request, namely the implementation of a Python storage cleaner integrated with Appwrite, aligning closely with the objectives and scope of the changes.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (4)
python/storage-cleaner/src/appwrite.py (4)

8-8: Remove unused import.

The ID import is not used anywhere in this module.

-from appwrite.id import ID

17-24: Validate required environment variables.

The __init__ method reads APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID without validation. If these are missing or None, the Appwrite client may fail later during API calls with unclear errors.

Consider adding validation:

 def __init__(self, api_key: str):
+    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
+    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")
+    
+    if not endpoint or not project_id:
+        raise ValueError("Missing required environment variables: APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID")
+    
     client = (
         Client()
-        .set_endpoint(os.getenv("APPWRITE_FUNCTION_API_ENDPOINT"))
-        .set_project(os.getenv("APPWRITE_FUNCTION_PROJECT_ID"))
+        .set_endpoint(endpoint)
+        .set_project(project_id)
         .set_key(api_key)
     )
     self.storage = Storage(client)

43-43: Use defensive dictionary access.

Directly accessing f["$id"] will raise a KeyError if the API response format changes or is malformed. Consider using .get() with a fallback or explicit validation.

-            self.storage.delete_file(bucket_id, f["$id"])
+            file_id = f.get("$id")
+            if file_id:
+                self.storage.delete_file(bucket_id, file_id)

45-46: Consider more Pythonic empty check.

Using not files is more idiomatic in Python than len(files) == 0.

-        if len(files) == 0:
+        if not files:
             break
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79a7bcc and d786c8c.

📒 Files selected for processing (6)
  • python/storage-cleaner/.gitignore (1 hunks)
  • python/storage-cleaner/README.md (1 hunks)
  • python/storage-cleaner/requirements.txt (1 hunks)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
  • python/storage-cleaner/src/utils.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (13-46)
  • clean_bucket (26-46)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-40)
🪛 markdownlint-cli2 (0.18.1)
python/storage-cleaner/README.md

11-11: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.13.1)
python/storage-cleaner/src/utils.py

40-40: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
python/storage-cleaner/README.md (1)

24-24: Verify timeout sufficiency for large buckets.

The 15-second timeout may be insufficient when cleaning buckets with thousands of old files. Each batch fetches 25 files, and with network latency for delete operations, larger buckets could exceed this limit and cause incomplete cleanup.

Consider testing with a representative bucket size, or document the expected maximum bucket size this function can handle within 15 seconds. You may need to increase the timeout or implement cursor-based resumption for very large buckets.

python/storage-cleaner/src/main.py (1)

7-10: Environment validation logic approved.

The use of throw_if_missing correctly validates required environment variables before proceeding with bucket operations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
python/storage-cleaner/src/main.py (1)

9-14: Fix header access to prevent KeyError.

Line 9 uses direct dictionary access context.req.headers["x-appwrite-key"], which will raise a KeyError if the header is missing—before the check on Line 11 can execute. This exposes internal errors to clients.

Apply this diff:

-    api_key = context.req.headers["x-appwrite-key"]
+    api_key = context.req.headers.get("x-appwrite-key")
 
     if not api_key:
🧹 Nitpick comments (1)
python/storage-cleaner/src/appwrite.py (1)

17-24: Consider validating required environment variables.

The __init__ method reads APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID from the environment without validation. If these are missing, the Appwrite SDK client will likely fail with an unclear error during subsequent API calls.

You can either:

  1. Let the SDK fail naturally (current approach—acceptable if calling code validates upstream).
  2. Add explicit validation here for clearer error messages:
def __init__(self, api_key: str):
    endpoint = os.getenv("APPWRITE_FUNCTION_API_ENDPOINT")
    project_id = os.getenv("APPWRITE_FUNCTION_PROJECT_ID")
    
    if not endpoint or not project_id:
        raise ValueError(
            "Missing required environment variables: "
            "APPWRITE_FUNCTION_API_ENDPOINT, APPWRITE_FUNCTION_PROJECT_ID"
        )
    
    client = (
        Client()
        .set_endpoint(endpoint)
        .set_project(project_id)
        .set_key(api_key)
    )
    self.storage = Storage(client)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d786c8c and accc817.

📒 Files selected for processing (4)
  • python/storage-cleaner/requirements.txt (1 hunks)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
  • python/storage-cleaner/src/utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-30T10:09:36.561Z
Learnt from: zeus2611
PR: appwrite/templates#338
File: python/storage-cleaner/src/utils.py:26-40
Timestamp: 2025-09-30T10:09:36.561Z
Learning: In the Python storage-cleaner template at python/storage-cleaner/src/utils.py, the throw_if_missing function intentionally treats 0 as an invalid value to prevent a retention period of 0, which would cause files to be deleted immediately and risk accidental data loss.

Applied to files:

  • python/storage-cleaner/src/utils.py
🧬 Code graph analysis (2)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (12-65)
  • clean_bucket (26-65)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-41)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/appwrite.py

45-47: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Use explicit conversion flag

Replace with conversion flag

(RUF010)


56-56: Do not catch blind exception: Exception

(BLE001)


63-65: Avoid specifying long messages outside the exception class

(TRY003)

python/storage-cleaner/src/main.py

23-23: Do not catch blind exception: Exception

(BLE001)

python/storage-cleaner/src/utils.py

41-41: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
python/storage-cleaner/requirements.txt (1)

1-1: LGTM!

The pinned version constraint ensures reproducible builds and prevents breaking changes from future major releases.

python/storage-cleaner/src/main.py (1)

18-25: LGTM!

The exception handling properly wraps bucket operations, distinguishing between validation errors (400) and unexpected failures (500) while logging details for debugging.

python/storage-cleaner/src/utils.py (2)

9-23: LGTM!

The function uses timezone-aware datetime (Python 3.12+ compatible), handles invalid environment values gracefully, and returns a properly formatted ISO 8601 timestamp.


26-41: LGTM!

The validation logic correctly treats 0 as invalid to prevent immediate deletion and data loss. The inline comment on Line 36 documents this business rule for maintainers.

python/storage-cleaner/src/appwrite.py (1)

41-65: LGTM!

The error handling properly wraps API calls, tracks successes and failures, and provides detailed diagnostics when deletions fail. The loop correctly exits when no more files are returned.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
python/storage-cleaner/src/main.py (1)

20-20: Consider a more informative success message.

The message "Buckets cleaned" is plural but only one bucket is cleaned. Consider making it singular or adding details about the operation.

-        return context.res.text("Buckets cleaned", 200)
+        return context.res.text("Bucket cleaned successfully", 200)

Alternatively, if you want to return structured data:

-        return context.res.text("Buckets cleaned", 200)
+        return context.res.json({"message": "Bucket cleaned successfully", "bucket_id": os.environ["APPWRITE_BUCKET_ID"]}, 200)
python/storage-cleaner/src/appwrite.py (1)

50-58: File deletion loop handles errors well.

The loop safely extracts file IDs, wraps deletions in error handling, and tracks both successes and failures. The use of the file_id variable in the exception handler correctly avoids the KeyError issue flagged in previous reviews.

Optional improvement: Files without a $id field are silently skipped (line 53's conditional). Consider logging these cases for visibility:

             try:
                 file_id = f.get("$id")
                 if file_id:
                     self.storage.delete_file(bucket_id, file_id)
                     deleted_files_count += 1
+                else:
+                    print(f"Warning: File object missing $id in bucket {bucket_id}")
             except Exception as e:
                 failed_files.append({"id": file_id, "error": str(e)})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between accc817 and 0d103d6.

📒 Files selected for processing (2)
  • python/storage-cleaner/src/appwrite.py (1 hunks)
  • python/storage-cleaner/src/main.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
python/storage-cleaner/src/main.py (2)
python/storage-cleaner/src/appwrite.py (2)
  • AppwriteService (12-65)
  • clean_bucket (26-65)
python/storage-cleaner/src/utils.py (1)
  • throw_if_missing (26-41)
python/storage-cleaner/src/appwrite.py (1)
python/storage-cleaner/src/utils.py (1)
  • get_expiry_date (9-23)
🪛 Ruff (0.13.1)
python/storage-cleaner/src/main.py

23-23: Do not catch blind exception: Exception

(BLE001)

python/storage-cleaner/src/appwrite.py

45-47: Avoid specifying long messages outside the exception class

(TRY003)


46-46: Use explicit conversion flag

Replace with conversion flag

(RUF010)


56-56: Do not catch blind exception: Exception

(BLE001)


63-65: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (8)
python/storage-cleaner/src/main.py (3)

1-3: LGTM! Clean and minimal imports.

The imports are appropriate for the serverless function's requirements.


9-14: Excellent API key validation!

The safe header access using .get() combined with the empty-value check properly prevents KeyError exceptions and provides clear error messages.


18-25: Exception handling properly addresses previous concerns.

The try-except structure correctly catches and handles errors from the bucket cleaning operation, logs details internally, and returns sanitized error messages. The broad Exception catch is appropriate in this serverless context to prevent unhandled exceptions from crashing the function.

Note: The static analysis warning about catching Exception is a false positive here - broad exception handling is appropriate for serverless entry points to ensure graceful degradation.

python/storage-cleaner/src/appwrite.py (5)

1-9: LGTM! Well-organized imports and documentation.

The module docstring and imports are clean and appropriate for the storage cleanup service.


17-24: Client initialization looks correct.

The fluent API usage for configuring the Appwrite client is clean and idiomatic. Ensure that the required environment variables (APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID) are validated at the entry point (see comment in main.py).


26-39: Good query setup and pagination strategy.

The batch size of 25 is reasonable for iterative deletion, and the counter initialization properly tracks success and failure states.


41-48: Robust error handling for list operation.

The try-except block properly catches API failures, preserves the exception chain with from e, and provides clear context. The broad Exception catch is appropriate here since the Appwrite SDK may raise various exception types.

Note: Static analysis warnings (TRY003, RUF010) are pedantic in this context - the current implementation is clear and maintainable.


59-65: Appropriate error handling for partial failures.

The loop termination logic is correct, and raising a RuntimeError for partial failures ensures callers are aware of issues. The detailed error message (including counts and failed file details) is valuable for debugging, despite the static analysis warning.

Note: The current behavior raises an exception even if some files were successfully deleted. This is a conservative approach that ensures operators are notified of any failures. If you prefer to log warnings instead and only raise on total failure, consider adjusting the logic - but the current behavior is safer.

{"error": "Missing API key in x-appwrite-key header"}, 401
)

appwrite = AppwriteService(api_key)
Copy link

@coderabbitai coderabbitai bot Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate all required environment variables.

The AppwriteService constructor depends on APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID from the environment, but these are not validated in the throw_if_missing call on line 7. If either is missing, the Appwrite client initialization will fail with unclear errors.

Add the missing variables to the validation list:

-    throw_if_missing(os.environ, ["RETENTION_PERIOD_DAYS", "APPWRITE_BUCKET_ID"])
+    throw_if_missing(os.environ, [
+        "RETENTION_PERIOD_DAYS",
+        "APPWRITE_BUCKET_ID",
+        "APPWRITE_FUNCTION_API_ENDPOINT",
+        "APPWRITE_FUNCTION_PROJECT_ID"
+    ])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
appwrite = AppwriteService(api_key)
# Validate required environment variables before anything else
throw_if_missing(os.environ, [
"RETENTION_PERIOD_DAYS",
"APPWRITE_BUCKET_ID",
"APPWRITE_FUNCTION_API_ENDPOINT",
"APPWRITE_FUNCTION_PROJECT_ID"
])
🤖 Prompt for AI Agents
In python/storage-cleaner/src/main.py around line 16, the AppwriteService is
instantiated but the required environment variables
APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID are not
validated; update the throw_if_missing call (around line 7) to include
"APPWRITE_FUNCTION_API_ENDPOINT" and "APPWRITE_FUNCTION_PROJECT_ID" so missing
values are caught early and a clear error is raised before creating the
AppwriteService.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meldiron In the Node.js implementation of the storage cleaner, we only validate the RETENTION_PERIOD_DAYS and APPWRITE_BUCKET_ID environment variables. My understanding is that APPWRITE_FUNCTION_API_ENDPOINT and APPWRITE_FUNCTION_PROJECT_ID might be automatically populated at build or runtime. Could you please clarify if these two variables need explicit validation here, or if they are always guaranteed to be set by the environment?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant