
Implement Session Queuing and Prioritization for Resource-Constrained Environments #14

@mudassaralichouhan


Enhance the session management system to support session queuing and prioritization for resource-constrained environments.

Crawl sessions should be processed based on priority, and retries should follow defined policies.
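
A minimal sketch of priority-ordered session processing, written in Python purely for illustration (the project's actual language and class layout are not stated in this issue). The `Priority` enum, the `SessionQueue` class, and the fields on `CrawlSession` are hypothetical names; only the three levels LOW, NORMAL, and HIGH come from the acceptance criteria.

```python
import heapq
import itertools
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Levels taken from the issue; the numeric values are an assumption.
    LOW = 0
    NORMAL = 1
    HIGH = 2


@dataclass
class CrawlSession:
    # Hypothetical minimal shape; the real CrawlSession likely has more fields.
    session_id: str
    priority: Priority = Priority.NORMAL


class SessionQueue:
    """Highest priority first; FIFO among sessions of equal priority."""

    def __init__(self, max_size=None):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves insertion order
        self._max_size = max_size          # optional bound for constrained environments

    def enqueue(self, session):
        # Reject instead of growing without bound when the queue is full.
        if self._max_size is not None and len(self._heap) >= self._max_size:
            return False
        # heapq is a min-heap, so the priority is negated to pop HIGH first.
        heapq.heappush(self._heap, (-session.priority, next(self._counter), session))
        return True

    def dequeue(self):
        # Returns None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

The insertion counter keeps ordering stable within a priority level and ensures two `CrawlSession` objects are never compared directly. Rejecting on a full queue is one possible policy; blocking or evicting the lowest-priority entry would be equally valid choices.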

Motivation:

  • Optimize resource usage in environments with limited crawling capacity.
  • Ensure high-priority sessions are processed first.
  • Support retry policies for failed URLs.

Acceptance Criteria:

  • Sessions are queued in URLFrontier or an equivalent structure.
  • Each session can have a priority (LOW, NORMAL, HIGH).
  • A retry queue with exponential backoff is implemented.
  • The system handles resource limits gracefully.
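
One way the retry criterion could look, sketched in Python under stated assumptions: `backoff_delay`, `RetryQueue`, and all default values (base delay, factor, cap, maximum attempts) are hypothetical and not taken from the project. The caller passes the current time explicitly so the behavior stays deterministic in tests.

```python
import heapq
import itertools


def backoff_delay(attempt, base=1.0, factor=2.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * factor ** attempt)


class RetryQueue:
    """Holds failed session ids until their backoff delay has elapsed."""

    def __init__(self, max_attempts=5):
        self._heap = []                    # entries: (ready_at, insertion order, session_id)
        self._counter = itertools.count()
        self._attempts = {}                # session_id -> retries scheduled so far
        self._max_attempts = max_attempts

    def schedule_retry(self, session_id, now):
        """Schedule a retry; return its ready time, or None once retries are exhausted."""
        attempt = self._attempts.get(session_id, 0)
        if attempt >= self._max_attempts:
            return None
        self._attempts[session_id] = attempt + 1
        ready_at = now + backoff_delay(attempt)
        heapq.heappush(self._heap, (ready_at, next(self._counter), session_id))
        return ready_at

    def pop_ready(self, now):
        """Return every session id whose delay has elapsed, earliest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

With these defaults the delays run 1 s, 2 s, 4 s, 8 s, and so on up to the 60 s cap. Adding random jitter to each delay is a common refinement that this sketch leaves out.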

Suggested Tasks:

  1. Extend CrawlSession and CrawlerManager to support priorities.
  2. Implement queue data structure and session scheduling.
  3. Add retry and backoff logic for failed sessions.
  4. Update APIs and metrics to reflect session priorities.
  5. Write unit and integration tests for prioritization and queuing.
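
For tasks 2 and 4, a rough sketch of scheduling under a resource limit with priority-aware metrics might look like the following. `CapacityLimitedScheduler`, its methods, and the integer priority encoding (0 = LOW, 1 = NORMAL, 2 = HIGH) are illustrative assumptions and not the existing `CrawlerManager` API.

```python
import heapq
import itertools
from collections import Counter


class CapacityLimitedScheduler:
    """Starts pending sessions in priority order, never exceeding max_active."""

    def __init__(self, max_active):
        self._max_active = max_active
        self._active = set()
        self._pending = []                 # entries: (-priority, insertion order, session_id)
        self._counter = itertools.count()

    def submit(self, session_id, priority=1):
        # Assumed encoding: 0 = LOW, 1 = NORMAL, 2 = HIGH.
        heapq.heappush(self._pending, (-priority, next(self._counter), session_id))

    def dispatch(self):
        """Start as many pending sessions as free capacity allows; return their ids."""
        started = []
        while self._pending and len(self._active) < self._max_active:
            _, _, session_id = heapq.heappop(self._pending)
            self._active.add(session_id)
            started.append(session_id)
        return started

    def finish(self, session_id):
        # Frees one slot; the next dispatch() call can start another session.
        self._active.discard(session_id)

    def metrics(self):
        # Pending counts are broken down by priority level.
        pending = Counter(-neg_priority for neg_priority, _, _ in self._pending)
        return {"active": len(self._active), "pending": dict(pending)}
```

When capacity is exhausted, `dispatch()` returns an empty list and the remaining sessions stay queued, which is one interpretation of handling resource limits gracefully.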
