Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Named tasks can not be recreated after they complete #164

Open
shaunco opened this issue Nov 21, 2021 · 4 comments
Open

Named tasks can not be recreated after they complete #164

shaunco opened this issue Nov 21, 2021 · 4 comments

Comments

@shaunco
Copy link
Contributor

shaunco commented Nov 21, 2021

I had left this comment in #106, but figured it was worth asking more directly...

I find it very strange that named tasks will always Exists once added to a queue when it seems like they should stop existing once successfully executed... but Storage only has a single Exists method, which means Consumer has no way to remove a successfully executed task from storage.

RedisStorage holds these for 24hr and LocalStorage will hold them until 128,000 additional tasks have been queued.

An example:

  1. Create a task and give it a name like spider_www_google_com
  2. Attempt to add a second task with the name spider_www_google_com - taskq rightfully returns taskq.ErrDuplicate
  3. spider_www_google_com task completes successfully and is consumer.Put is called which calls consumer.delete, which calls Queue.Delete ... this calls Redis.XDel on the ID for Redis and scheduler.Remove for memqueue
  4. Attempt to add a new task with the name spider_www_google_com - taskq returns taskq.ErrDuplicate

I would expect step 4 to succeed, but it fails because RedisStorage.Exists calls SetNX on the task's name whereas the call to XDel is on the task's ID ... which leaves the name lock lingering for 24hr. Similarly, LocalStorage.Exists puts the name in a LRU cache (of size 128,000) whereas scheduler.Remove is a map based with the task's pointer as its key.

The easy fix is to extend the Storage interface to have a Delete method that can be called when a task is completed successfully... however, for Redis it is probably best to simply pass both the task's ID and name to XDel to avoid needing a second round-trip to Redis and then leaving RedisStorage.Delete to do nothing.

Thoughts? Am I missing something here?

@vmihailenco
Copy link
Owner

The idea is to be able to execute certain tasks once a day / period. So we chose a task name like spider_www_google_com_20210101 and use Redis to deduplicate tasks by name. Looks like your expectations / requirements are a bit different.

@shaunco
Copy link
Contributor Author

shaunco commented Nov 26, 2021

Now that you say it, I realize I completely overlooked https://github.com/vmihailenco/taskq#message-deduplication and how OnceInPeriod, OnceWithDelay, and OnceWithSchedule craft message names around that behavior.

What are your thoughts on something like:

task.WithArgs(ctx, "my input", 42).OnceUntilComplete()

in order to trigger cleanup of Redis/LocalStorage dedupe cache on completion of the task

@vmihailenco
Copy link
Owner

That is possible, but I can't describe/understand what guarantees such logic provides...

@shaunco
Copy link
Contributor Author

shaunco commented Nov 27, 2021

I think the confusion on my side is that currently setting a name really acts as though I called a OncePerDay method, when I really just expect it to be "once in the queue at a time for the given name".

The proposed OnceUntilComplete() would just guarantee that any attempts to add the same named task to the queue prior to completion of the existing task would return taskq.ErrDuplicate.

Continuing with the spider example, imagine there is a website where people can request to have their site indexed. Sites are indexed in FIFO order. Users are free to request their site be re-indexed at any time, but to save memory/queue length I only want each requested site in the queue once (yes, this memory constraint is contrived - in my real use case taskq is being used on an IoT device that is actually memory constrained). So, if someone requests "www.google.com" to be indexed, and there is already a pending task for "www.google.com", the site would just return "your site will be indexed soon". If there was not a pending task for "www.google.com", a new task would be added to the end of the queue.

The alternative to this is that I have to maintain a totally separate list of what is currently in the queue to ensure I don't add duplicates to the queue. Given the Redis and LRU maps that already exist in taskq, extending Message to have a CleanupOnceOnComplete flag and extending Storage to have a Delete methods seems to be the most eloquent solution.

Happy to submit a PR for this, just want to make sure there is a chance you'll accept it before I do the work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants