Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(keyv-mysql): use internal scheduler to delete expired keys #1291

Merged
merged 14 commits into from
Feb 1, 2025

Conversation

johaven
Copy link
Contributor

@johaven johaven commented Jan 26, 2025

Please check if the PR fulfills these requirements

  • Followed the Contributing guidelines.
  • Docs have been added / updated (for bug fixes / features)

** Proposal **
Currently a draft but works on all MySQL servers since version 5.1 (2008).

This uses the event_scheduler function (https://dev.mysql.com/doc/refman/8.4/en/create-event.html)

Activating ttlSupport becomes optional with my proposal, it might be necessary to be able to customize the interval but I find it an interesting alternative.

It's up to you to tell me if you want to validate the idea and we will finalize this PR

@jaredwray
Copy link
Owner

thanks. I would like to move forward with this but I am still reading up on events. Are there performance issues with events?

@jaredwray
Copy link
Owner

@johaven I am wondering if we will want to all them to set the interval value?

@johaven
Copy link
Contributor Author

johaven commented Jan 27, 2025

@jaredwray Scheduled events in MySQL are similar to stored procedures. So the performance is related to the queries that are executed. In this case, the query to remove expired keys is pretty simple, it shouldn't have any performance issues. That being said, I haven't tested on millions of keys. If the cache is a critical part of an application stack and it contains millions of entries I don't think you would choose MySQL in this case (although I think it would do the job well).

On my side I use it for a cache on a medium sized application, it works very well, the scheduler runs every 5 minutes. The keys that are expired are filtered during get/mget/ops and cleaned later.

You can check the state of the event with SHOW EVENTS in sql console, if you want to test it.

@johaven
Copy link
Contributor Author

johaven commented Jan 27, 2025

@johaven I am wondering if we will want to all them to set the interval value?

I ask myself the same question in this context, it is complicated to apply an interval that suits all needs, depending on the applications and the developers... That's why I thought we could leave the possibility of defining it.

@jaredwray
Copy link
Owner

I think instead of doing useInternalScheduler that we should have a property that is intervalExpiration where by default it is undefined. Then, if they set it to a number it enables it.

We should also make sure that we can set it and overwrite the task if they change or update the settings.

@johaven
Copy link
Contributor Author

johaven commented Jan 27, 2025

I think instead of doing useInternalScheduler that we should have a property that is intervalExpiration where by default it is undefined. Then, if they set it to a number it enables it.

Good idea.

We should also make sure that we can set it and overwrite the task if they change or update the settings.

This is why i do DROP EVENT IF EXISTS ... at startup, the task is removed and created with the updated settings.

@johaven
Copy link
Contributor Author

johaven commented Jan 27, 2025

It would also be more optimized to filter queries to handle the case where the record has not yet been purged by the scheduler.

async get<Value>(key: string) {
  const sql = `SELECT * FROM ${this.opts.table!} WHERE id = ? AND JSON_EXTRACT( value, '$.expiration' ) < UNIX_TIMESTAMP();`

@johaven
Copy link
Contributor Author

johaven commented Jan 27, 2025

A note regarding the keysize parameter, it must be limited to 768, otherwise it will cause errors when creating the table.

Most recent databases have a utf8mb4 charset which limits indexing (primary key). If you want to no longer depend on this limit, you must use a hash system.

Capture d’écran 2025-01-28 à 00 03 48

@jaredwray
Copy link
Owner

It would also be more optimized to filter queries to handle the case where the record has not yet been purged by the scheduler.

async get<Value>(key: string) {
  const sql = `SELECT * FROM ${this.opts.table!} WHERE id = ? AND JSON_EXTRACT( value, '$.expiration' ) < UNIX_TIMESTAMP();`

This would be useful if the expireInterval is turned on as otherwise it would grow without control

@jaredwray
Copy link
Owner

A note regarding the keysize parameter, it must be limited to 768, otherwise it will cause errors when creating the table.

Most recent databases have a utf8mb4 charset which limits indexing (primary key). If you want to no longer depend on this limit, you must use a hash system.

Capture d’écran 2025-01-28 à 00 03 48

I haven’t seen this issue come up. For right now lets not add this or do it as another PR after this change

@jaredwray
Copy link
Owner

@johaven - I added in the unit test and it looks like the event is not firing. Do you know why?

@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

@jaredwray I found a problem regarding the use of the mysql scheduler.

SET GLOBAL event_scheduler = ON;

This command may fail.
MySQL seems to have it enabled by default but MariaDB does not.

There are 3 ways to make it work if it is not enabled by default :

  • be connected as root on mysql server or
  • must have 'SUPER' privilege or
  • event_scheduler must be set to ON in my.cnf

@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

@johaven - I added in the unit test and it looks like the event is not firing. Do you know why?

I will try to test this week but on my side it works well (for other needs)
Have you tried the 'show events' command to see the task start date ?

edit: I forgot to cast the value column, this might solve the problem

@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

That's strange, I thought the key values ​​were all serialized with an expires and value field. On my side I use cache-manager, that might be the reason.

If we don't store the expiration, the scheduler can't do the job, I'll see about creating a timestamp field.
How do you purge values ​​currently? Never?

edit: i created the expires column to do the job, tests are ok

@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

There are still things to do:

  • if the SET GLOBAL event_scheduler = ON; command fails, we need to fall back to ttlSupport = false
  • documentation on 3 ways to enable the event scheduler if the server does not allow it with the user logged into the MySQL database.

@jaredwray
Copy link
Owner

jaredwray commented Feb 1, 2025

@johaven - Yeah this unit test still fails and I made sure that it is passing the expires and wraps it.

test.it('set intervalExpiration to 1 second', async t => {
	const keyvMySql = new KeyvMysql({uri, intervalExpiration: 1});
	const keyv = new Keyv({store: keyvMySql});
	await keyv.set('foo-interval1', 'bar-interval1');
	const value1 = await keyv.get('foo-interval1');
	t.expect(keyvMySql.ttlSupport).toBe(true);
	t.expect(value1).toBe('bar-interval1');
	await delay(1100);
	const value2 = await keyv.get('foo-interval1');
	t.expect(value2).toBeUndefined();
});

@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

Capture d’écran 2025-02-01 à 23 41 27

Copy link

codecov bot commented Feb 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (75741ca) to head (38833ec).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1291   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            9         9           
  Lines           43        43           
=========================================
  Hits            43        43           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jaredwray jaredwray merged commit 4c90bf0 into jaredwray:main Feb 1, 2025
7 checks passed
@johaven
Copy link
Contributor Author

johaven commented Feb 1, 2025

@jaredwray There were still a few things to review, isn't it too early to merge the feature ?

@jaredwray
Copy link
Owner

@johaven - i thought the last was documentation. Anything else?

@johaven
Copy link
Contributor Author

johaven commented Feb 2, 2025

There are still things to do:

  • if the SET GLOBAL event_scheduler = ON; command fails, we need to fall back to ttlSupport = false

@jaredwray
Copy link
Owner

@johaven - ok. I wont be pushing a new build. Do you want me to do a pull request with these changes or are you working on it?

@johaven
Copy link
Contributor Author

johaven commented Feb 3, 2025

@jaredwray This is not a big change, you should catch the error when you run the above command. For the error to be generated, you need to connect to the database with a user that does not have the "SUPER" privilege and change the ttlSupport to false (for this rollback, I don't know if it will be taken into account if it is changed at the time of connection, you must know better than me :) ).

I don't have much time this week, I can work on it this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants