external-resource-rds-logs actions #214
As for this PR, I really like the implementation and the refactoring of the Celery tasks. I would limit the amount of logs downloaded by default, though, as there can be a lot of them.
But I don't quite understand why we need this or who asked for it, when we can (and actually do) send RDS logs to CloudWatch, where it is quite convenient to analyze them and where we can have longer retention... Maybe there's a use case I'm missing, though...
def list_rds_logs(self, identifier: str) -> list[str]:
    """Lists the log files for a specified RDS instance."""
    response = self.rds_client.describe_db_log_files(
        DBInstanceIdentifier=identifier
    )
    return [
        log["LogFileName"]
        for log in response["DescribeDBLogFiles"]
        if log["LogFileName"]
    ]
How many logs does this get? All of them? That can be a non-trivial amount of data.
And if the response is paginated, this will only get the first page.
Fixed in 413d05b
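For reference, a paginated variant could look roughly like the sketch below. It assumes boto3's describe_db_log_files paginator and is only an illustration, not necessarily the exact change made in 413d05b.

def list_rds_logs(self, identifier: str) -> list[str]:
    """List all log files for an RDS instance, following pagination."""
    # boto3 provides a paginator for describe_db_log_files, so no manual Marker handling is needed
    paginator = self.rds_client.get_paginator("describe_db_log_files")
    return [
        log["LogFileName"]
        for page in paginator.paginate(DBInstanceIdentifier=identifier)
        for log in page["DescribeDBLogFiles"]
        if log["LogFileName"]
    ]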
* **Description**: Retrieves logs from an Amazon RDS instance and stores them in an S3 bucket.
* **Use Case**: Typically used for troubleshooting database issues, analyzing performance problems, or collecting logs for audit purposes.
* **Required Parameters**: The AWS account name and the RDS instance identifier.
* **Optional Parameters**: Expiration time in days (1-7, default: 7), S3 target file name (defaults to '{account}-{identifier}.zip').
I would clarify that this is the expiration of the URL you get as a result.
Using S3 to store large content is a pattern we need for complex tasks, but for this RDS log download there can be several problems:
- Cost: some logs can be very large; downloading them and uploading them to S3 can introduce significant traffic and storage costs, especially when people want to recheck logs and repeat the process multiple times.
- Security: RDS logs can contain PII, so storing them in S3 means the bucket needs a higher data security level and compliance for cross-account/region access.
It would be easier to create IAM policies like AmazonRDSReadOnlyAccess but limit the resource to the exact instance, and let people view the logs directly in the AWS console.
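For illustration, a policy document along those lines might look roughly like the following Python dict; the region, account id, and instance identifier are placeholders, and this policy is an assumption rather than something defined in this PR.

# Hypothetical scoped policy: read-only access to the log files of a single RDS instance.
rds_log_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBLogFiles",
                "rds:DownloadDBLogFilePortion",
            ],
            # Placeholder ARN: region, account id, and identifier would come from the tenant's setup.
            "Resource": "arn:aws:rds:us-east-1:123456789012:db:my-db-instance",
        }
    ],
}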
def generate_s3_download_url(
    self, bucket: str, s3_key: str, expiration_secs: int = 3600
) -> str:
    """Generate a pre-signed URL for downloading an object from S3."""
    return self.s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": s3_key},
        ExpiresIn=expiration_secs,
    )
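A hypothetical call site could look like the sketch below; the aws_api object, bucket name, and variable names are assumptions, and the days-to-seconds conversion only illustrates how the documented 1-7 day expiration could map onto this helper.

# Hypothetical values for illustration only.
account = "app-sre"
identifier = "my-db-instance"
expiration_days = 7  # documented range: 1-7
url = aws_api.generate_s3_download_url(      # aws_api: the client exposing the method above (assumed name)
    bucket="automated-actions-output",        # assumed bucket name
    s3_key=f"{account}-{identifier}.zip",     # matches the documented default target file name
    expiration_secs=expiration_days * 24 * 60 * 60,
)

SigV4 pre-signed URLs are capped at seven days, which lines up with the documented 1-7 day range.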
Do we have a lifecycle policy set to actually delete the file?
The lifecycle policy is on the buckets
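For context, a bucket lifecycle rule that expires these archives could be expressed roughly like this boto3 sketch; the bucket name, rule id, prefix, and 7-day window are assumptions rather than the actual bucket configuration.

import boto3

s3 = boto3.client("s3")
# Hypothetical lifecycle rule: expire uploaded log archives after 7 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="automated-actions-output",  # assumed bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-rds-log-archives",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to all objects in the bucket
                "Expiration": {"Days": 7},
            }
        ]
    },
)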
I don't expect such high usage of this action that the costs will become a factor.
The S3 bucket is read and write for the bucket owner only. I don't see any risk here.
This is still an option for our tenants, and no one prevents them from using it. This action will just be an addition.
/retest
Together with an upcoming
Summary
This PR implements a new automated action external-resource-rds-logs that enables users to retrieve and export RDS database logs to S3 storage. The action fetches logs from specified RDS instances and packages them into zip files for easy download and analysis.

Ticket: APPSRE-12231
Technical Implementation
Uses zipstream to create a streaming zip file without having everything in memory.
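A minimal sketch of that streaming pattern, assuming boto3's download_db_log_file_portion API and the python-zipstream ZipFile/write_iter interface; names and structure are illustrative, not the PR's actual code.

import zipstream  # python-zipstream assumed here; the PR may use a different zipstream variant


def iter_log_portions(rds_client, identifier: str, log_file: str):
    """Yield one RDS log file in chunks via download_db_log_file_portion."""
    marker = "0"
    while True:
        resp = rds_client.download_db_log_file_portion(
            DBInstanceIdentifier=identifier, LogFileName=log_file, Marker=marker
        )
        if resp.get("LogFileData"):
            yield resp["LogFileData"].encode()
        if not resp.get("AdditionalDataPending"):
            break
        marker = resp["Marker"]


def stream_logs_as_zip(rds_client, identifier: str, log_files: list[str]) -> zipstream.ZipFile:
    """Build a zip archive lazily; bytes are produced only while iterating over it."""
    archive = zipstream.ZipFile(mode="w", compression=zipstream.ZIP_DEFLATED)
    for log_file in log_files:
        archive.write_iter(log_file, iter_log_portions(rds_client, identifier, log_file))
    return archive  # iterate over the archive to upload chunks (e.g. via S3 multipart) without buffering everything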
Usage
Testing
Additionally
This PR splits the Celery external_resources.tasks into separate ones to maintain readability.

Dependencies