Test template #18

Hatef-Rostamkhani · 2025-10-22T01:06:50Z

No description provided.

- Added functionality to fetch and display sponsor payment accounts from an external JSON source in the `HomeController`.
- Introduced a new API endpoint `GET /api/v2/sponsor-payment-accounts` to retrieve active payment accounts.
- Updated the `sponsor.html` and `sponsor.js` files to integrate the new payment accounts feature, including a modal for displaying account details.
- Enhanced CSS styles for the payment accounts modal and added utility classes for better UI consistency.
- Updated `.cursorrules` documentation to include best practices for CSS and JavaScript, ensuring adherence to coding standards.

These changes improve the sponsor management capabilities and provide a better user experience when handling payment information.

- Created a new static library `mongodb_instance` for shared MongoDB instance management, improving code organization and reusability.
- Updated `CMakeLists.txt` to include the new library and adjusted dependencies for `MongoDBStorage` and `SponsorStorage`.
- Enhanced localization files to include sponsor-related text for better user engagement.
- Improved the `sponsor.html` template with a direct link to the GitHub repository for better visibility.
- Added CSS variables for theming support and improved styling for the sponsor link and other UI elements.

These changes enhance the application's MongoDB integration and sponsor management capabilities, providing a more cohesive user experience.

…service

- Added environment variables for SPA rendering configuration, including `SPA_RENDERING_ENABLED`, `SPA_RENDERING_TIMEOUT`, and `BROWSERLESS_URL` in `docker-compose.prod.yml`.
- Optimized browserless service settings, reducing `MAX_CONCURRENT_SESSIONS` and adjusting resource limits for better performance on constrained environments.
- Implemented health checks for the browserless service to ensure it is running correctly.
- Updated request timeout handling in `SearchController` to allow overriding via environment variables, improving flexibility for API requests.
- Enhanced logging for timeout configurations to aid in debugging and monitoring.

These changes improve the application's SPA rendering capabilities and overall Docker deployment efficiency.

- Added new environment variables for SPA rendering in `docker-compose.yml`, including `SPA_RENDERING_ENABLED`, `SPA_RENDERING_TIMEOUT`, and `BROWSERLESS_URL`.
- Increased `SEARCH_REDIS_POOL_SIZE` in `docker-compose.prod.yml` for improved Redis performance.
- Implemented resource limits for various services to optimize performance on constrained environments.
- Introduced a maximum session duration in `CrawlConfig` to prevent infinite crawling, with logging for session timeouts in `CrawlerManager`.
- Updated `PageFetcher` to use an environment variable for the SPA rendering timeout, enhancing flexibility.

These changes improve the application's Docker deployment and crawler session management, ensuring better resource utilization and operational efficiency.

- Updated the `subscribeEmail` method in the `mongodb` class to accept `ipAddress` and `userAgent` parameters, allowing for more detailed tracking of email subscriptions.
- Modified the MongoDB insertion logic to include the new fields along with the email and timestamp, improving data richness.
- Enhanced the `emailSubscribe` method in `HomeController` to extract and pass the user's IP address and user agent to the subscription method, ensuring comprehensive data collection.
- Implemented error handling for MongoDB operations to improve robustness and user feedback during subscription attempts.

These changes enhance the email subscription feature by capturing additional context, leading to better analytics and user engagement.

- Introduced a new `ApiRequestLog` structure for logging API request details, including endpoint, method, IP address, user agent, request body, and response status.
- Enhanced `ContentStorage` and `MongoDBStorage` classes to support storing and retrieving API request logs in MongoDB.
- Updated `SearchController` to log API requests and errors, capturing relevant metadata for improved analytics and debugging.
- Added helper methods for BSON conversion of `ApiRequestLog` to streamline database interactions.
- Improved the `CrawlerManager` to provide access to storage for logging purposes.

These changes enhance the application's ability to track and analyze API usage, leading to better insights and performance monitoring.

…idelines

- Expanded the `.cursorrules` documentation to include mandatory rules for configurable debug output using the `LOG_LEVEL` environment variable.
- Added detailed examples of proper logging practices, emphasizing the use of `LOG_DEBUG()` instead of `std::cout` for better performance and security.
- Introduced a checklist for migrating legacy debug output to structured logging.
- Included critical notes on lazy initialization patterns for controllers to prevent static initialization order issues.
- Updated common issues section with solutions related to MongoDB instance initialization and uWebSockets error handling.

These changes enhance the documentation's clarity and provide essential guidelines for maintaining high-quality code standards in the project.

…nfiguration

- Changed the MongoDB image from `mongodb/mongodb-enterprise-server:latest` to `mongo:7` for better compatibility and support.
- Updated environment variable names from `MONGODB_INITDB_ROOT_USERNAME` and `MONGODB_INITDB_ROOT_PASSWORD` to `MONGO_INITDB_ROOT_USERNAME` and `MONGO_INITDB_ROOT_PASSWORD` to align with the official MongoDB documentation.
- Adjusted build arguments in the GitHub Actions workflow to use the correct base image for MongoDB drivers.

These changes improve the Docker setup and ensure proper configuration for MongoDB services.

…ance

- Introduced constants for various timing and size parameters to enhance code readability and maintainability.
- Implemented comprehensive validation checks during crawler initialization to ensure all components are properly set up before starting.
- Refactored the `processURL` method into smaller, more manageable helper methods for better organization and clarity.
- Enhanced error handling and logging throughout the crawling process, providing more informative messages for debugging.
- Optimized result storage and counting using atomic operations to reduce mutex contention and improve performance.
- Added detailed documentation for new methods and refactored logic, ensuring clarity on the crawling workflow and configuration management.

These changes significantly improve the Crawler's structure, making it easier to maintain and extend while enhancing its performance and reliability.

- Introduced new issue templates for each phase of the Universal Job Manager project, including detailed descriptions, acceptance criteria, tasks, and success metrics.
- Created a comprehensive breakdown guide to facilitate incremental development and testing across phases.
- Added a universal job manager epic template to outline the overall goals and objectives of the job management system.
- Enhanced documentation for better clarity and organization, ensuring all phases are well-defined and easy to follow.

These additions improve project management and streamline the development process for the Universal Job Manager system.

- Changed placeholder URLs in the issue template configuration to point to the correct repository for the Universal Job Manager project.
- Updated links for the Job Manager Epic Overview and Implementation Guide to reflect the actual repository path, ensuring users have access to the right documentation.

These updates improve the clarity and accessibility of project documentation for contributors.

- Introduced the `EmailService` class to handle sending email notifications for crawling completion, supporting HTML templates and SMTP communication.
- Added `EmailController` to manage API endpoints for sending crawling notifications and generic emails.
- Implemented lazy initialization for `EmailService` to prevent static initialization order issues.
- Enhanced localization support by organizing language files and adding new JSON structures for email templates.
- Updated Docker configuration to include SMTP environment variables for email service configuration.

These additions improve the email notification functionality, enhancing user engagement and providing better tracking of crawling results.

- Introduced the `UnsubscribeService` class to manage email unsubscribe records, including token generation, validation, and MongoDB integration.
- Added `UnsubscribeController` to handle API endpoints for one-click unsubscribe functionality, compliant with RFC 8058.
- Enhanced `EmailService` to generate unsubscribe tokens for emails, incorporating `List-Unsubscribe` headers for better user experience.
- Updated localization files to include unsubscribe text for email notifications in both English and Farsi.
- Implemented lazy initialization for `UnsubscribeService` to prevent static initialization order issues.

These additions improve user engagement by providing a seamless unsubscribe experience and enhancing email notification management.
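As a rough illustration of the one-click unsubscribe headers described in this commit, the Python sketch below builds the two header fields that RFC 8058 defines. The project itself is C++; the function names, the `/u/<token>` URL layout, and the token length are assumptions made for this example and are not taken from the PR.

```python
import secrets


def generate_unsubscribe_token() -> str:
    """Return a random URL-safe token (hypothetical format, 32 random bytes)."""
    return secrets.token_urlsafe(32)


def build_unsubscribe_headers(base_url: str, token: str) -> dict:
    """Build the two header fields RFC 8058 requires for one-click unsubscribe."""
    # The "/u/<token>" path is a placeholder; the real endpoint path is not
    # stated in the commit message.
    url = f"{base_url.rstrip('/')}/u/{token}"
    return {
        "List-Unsubscribe": f"<{url}>",
        # Fixed value defined by RFC 8058.
        "List-Unsubscribe-Post": "List-Unsubscribe=One-Click",
    }
```

A mail client that supports RFC 8058 sends an HTTPS POST with the body `List-Unsubscribe=One-Click` to the URL in `List-Unsubscribe`, so the endpoint should act on POST without requiring any further confirmation step.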

- Introduced localization files for English and Farsi, providing translations for the home and about pages.
- Enhanced `HomeController` to serve localized content based on user language preferences, with fallback mechanisms for missing translations.
- Implemented a new about page endpoint to serve coming soon content, improving user experience and engagement.
- Updated CSS styles for better presentation of localized elements.

These additions enhance the application's accessibility and usability for a broader audience by supporting multiple languages.

- Added new search results page template to display search results dynamically, improving user experience.
- Implemented search functionality in the `SearchController`, including URL encoding for search queries and rendering results with elapsed time.
- Introduced `MongoDBStorage` methods for searching site profiles and counting search results, enhancing search capabilities with multilingual support.
- Updated localization files to include new keys for search-related content in both English and Farsi, ensuring a consistent user experience across languages.
- Enhanced link management by enforcing the use of `{{ base_url }}` for internal links, improving environment handling and flexibility.

These changes significantly improve the search feature and localization support, making the application more user-friendly and accessible to a wider audience.

- Introduced a helper function to truncate text descriptions to a maximum length of 300 characters, ensuring that long descriptions do not disrupt the layout of search results.
- Updated the `SearchController` to utilize the new truncation function when adding descriptions to site profiles and search results, improving the presentation of data.
- Enhanced user experience by preventing excessively long text from being displayed, maintaining a clean and readable interface.

These changes enhance the search functionality by improving the display of descriptions in search results.
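A minimal sketch of such a truncation helper, written in Python for brevity (the actual helper lives in the C++ `SearchController`). The 300-character limit comes from the commit message; the trailing ellipsis and the choice to count characters rather than bytes are assumptions.

```python
MAX_DESCRIPTION_LENGTH = 300  # limit stated in the commit message


def truncate_description(text: str, max_length: int = MAX_DESCRIPTION_LENGTH) -> str:
    """Return text unchanged if it fits, otherwise cut it and append an ellipsis."""
    if len(text) <= max_length:
        return text
    # Reserve three characters for the ellipsis so the result never
    # exceeds max_length (the ellipsis itself is an assumption).
    return text[: max_length - 3].rstrip() + "..."
```

For Farsi or other multi-byte text, a C++ implementation working on UTF-8 bytes would additionally need to avoid cutting in the middle of a code point; Python strings sidestep that because they index by code point.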

- Introduced `EmailLogsStorage` class to manage email sending logs in MongoDB, tracking email attempts with status and details.
- Updated `EmailService` to support asynchronous email processing, allowing for improved performance and user experience.
- Enhanced `EmailController` to utilize `EmailLogsStorage` for logging email statuses and errors, providing better tracking of email notifications.
- Added new environment variables for SMTP connection timeout and email async processing configuration in docker-compose.
- Improved localization support by adding sender names in both English and Farsi email templates.

These changes significantly enhance the email notification functionality, improve logging capabilities, and provide better user engagement through asynchronous processing.

- Updated `SearchController` to log API requests asynchronously, preventing response blocking and improving performance.
- Implemented error handling for MongoDB connection initialization in `ContentStorage`, ensuring robust connection management.
- Added retry logic with exponential backoff for storing API request logs in MongoDB, enhancing reliability during connection issues.
- Improved logging for API request errors, providing clearer insights into failures.

These changes significantly enhance the logging capabilities and MongoDB integration, improving overall system performance and reliability.
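The retry-with-exponential-backoff idea can be sketched as follows. This is an illustrative Python version, not the project's C++ code; the attempt count, the base delay, and the `retry_with_backoff` name are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. In production code the `except` clause would normally be narrowed to connection-related errors so that programming mistakes are not retried.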

…testing

- Updated docker-compose configuration to set MongoDB connection parameters, improving performance and connection management.
- Introduced a new script for testing server stability with concurrent API requests, allowing for better assessment of server performance under load.
- Added a maximum concurrent sessions limit in `CrawlerManager` to prevent MongoDB connection issues, enhancing stability during high-load scenarios.
- Implemented thread-safe MongoDB operations in `ContentStorage` and `MongoDBStorage` to prevent socket conflicts, ensuring reliable data access.

These changes significantly improve the robustness of MongoDB interactions and provide tools for testing server performance under concurrent load conditions.

- Added a new `retryCountMap` to track the number of retries for each URL, improving the management of retry logic.
- Updated `scheduleRetry` method to store the retry count for each normalized URL, ensuring accurate tracking.
- Implemented cleanup of retry counts in `markVisited` method after processing a URL, maintaining data integrity.
- Enhanced `getQueuedURLInfo` method to retrieve the current retry count from the tracking map, providing better insights into URL processing.

These changes improve the robustness of the URL frontier's retry mechanism, enhancing the crawler's efficiency and reliability.
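The bookkeeping described here (increment on retry, look up on query, erase on visit) can be sketched in Python as below. The class and method names mirror the C++ names in the commit only loosely, and `_normalize` is a stand-in for the frontier's real URL normalization, which the commit does not describe.

```python
class RetryTracker:
    """Tracks how many times each normalized URL has been scheduled for retry."""

    def __init__(self) -> None:
        self._retry_counts = {}  # normalized URL -> retry count

    @staticmethod
    def _normalize(url: str) -> str:
        # Placeholder normalization (assumption): trim whitespace and a
        # trailing slash. The real frontier uses its own normalizer.
        return url.strip().rstrip("/")

    def schedule_retry(self, url: str) -> int:
        """Record one more retry for the URL and return the new count."""
        key = self._normalize(url)
        self._retry_counts[key] = self._retry_counts.get(key, 0) + 1
        return self._retry_counts[key]

    def retry_count(self, url: str) -> int:
        """Return the current retry count, or 0 if the URL was never retried."""
        return self._retry_counts.get(self._normalize(url), 0)

    def mark_visited(self, url: str) -> None:
        """Drop the retry count once the URL has been processed."""
        self._retry_counts.pop(self._normalize(url), None)
```

Erasing the entry in `mark_visited` keeps the map from growing without bound over a long crawl. A C++ version shared between crawler threads would also need a mutex around the map.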

- Updated `CMakeLists.txt` to use newer MongoDB C and C++ driver versions (2.1.1 and 4.1.2).
- Added `UrlCanonicalizer` class for consistent URL handling, including normalization and deduplication.
- Integrated `UrlCanonicalizer` into `SiteProfile` for canonical URL storage, improving deduplication in search results.
- Enhanced `SearchController` and `URLFrontier` to utilize `UrlCanonicalizer` for URL normalization, ensuring consistent processing.
- Updated Docker configurations for improved service health checks and Redis stack integration.

These changes significantly enhance the robustness of URL handling and MongoDB driver integration, improving overall system performance and reliability.
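The commit does not list which normalization rules `UrlCanonicalizer` applies. As an illustration of the general technique, the Python sketch below applies a few common ones (lowercase scheme and host, drop default ports and fragments, sort query parameters). The rule set and the `canonicalize` name are assumptions, not the project's behavior.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a canonical form of url so equivalent URLs compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so their order does not create duplicates.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

This sketch discards any userinfo part and does not handle IPv6 literals; a production canonicalizer would also decide how to treat tracking parameters and trailing slashes.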

- Introduced intelligent content validation to ensure only high-quality, searchable content is indexed, including content type filtering, quality validation, and URL scheme validation.
- Updated the storage architecture to utilize `IndexedPage` instead of `SiteProfile` for better organization and clarity in content management.
- Enhanced MongoDB and Redis storage classes to support the new `IndexedPage` structure, improving data handling and retrieval processes.
- Improved documentation to reflect changes in content validation and storage mechanisms, ensuring clarity for future development.

These enhancements significantly improve the robustness of content handling and validation, leading to better search quality and storage efficiency.

- Removed unnecessary trailing spaces in `sponsor_payment_accounts.json` for cleaner formatting.
- Improved content validation documentation in `crawler_endpoint.md` and `content-storage-layer.md` by adding clarity to validation requirements and processes.
- Ensured consistent formatting and readability in documentation sections related to content type, quality, and URL validation.

These changes enhance the overall quality and maintainability of the documentation and JSON files.

- Added optional parameters `email` and `language` to the crawler endpoint for improved user notifications.
- Updated documentation in `crawler_endpoint.md` to reflect new parameters and enhance clarity.
- Improved email notification content in both English and Farsi localization files to include the application name "Hatef" for better branding.
- Enhanced email subject localization to dynamically include the number of pages indexed, improving user engagement.

These changes significantly improve the user experience by providing localized notifications and clearer documentation for the crawler API.

- Added `WebsiteProfileController` for managing website profile data with full CRUD operations.
- Introduced `EmailTrackingStorage` to handle email open tracking, including IP address and user agent logging.
- Implemented lazy initialization pattern for both `WebsiteProfileStorage` and `EmailTrackingStorage` to enhance performance and resource management.
- Created `TrackingController` to serve tracking pixel requests and retrieve tracking statistics.
- Updated `CMakeLists.txt` to include new storage classes and ensure proper linking.
- Added comprehensive API documentation for the Website Profile API, detailing endpoints and request/response formats.

These enhancements significantly improve the API's capabilities for managing website profiles and tracking email interactions, providing a robust solution for the Iranian e-commerce verification system.

…and testing

- Added `WebsiteProfileStorage` and `WebsiteProfileController` for managing website profile data with full CRUD operations.
- Implemented lazy initialization pattern for storage and controller to enhance performance.
- Created detailed API documentation covering all endpoints, request/response formats, and error handling.
- Developed a test script to validate all API endpoints with colored output for better readability.
- Updated build configuration to include new storage and controller files.

These enhancements provide a robust solution for managing website profiles in the Iranian e-commerce verification system, ensuring clarity and ease of use for developers and users alike.

- Updated regex in `TrackingController` to match hex characters case insensitively for better tracking ID extraction.
- Refactored `EmailTrackingStorage` to build update documents using the basic BSON builder for improved clarity and maintainability.
- Enhanced the update document structure by separating `$set` and `$push` operations, ensuring better organization of fields.

These changes enhance the robustness and readability of the email tracking functionality, contributing to a more maintainable codebase.

…figuration

- Introduced `BASE_URL` variable in `docker-compose.yml` to allow dynamic configuration of the base URL for internal API calls, enhancing deployment flexibility.
- Updated email notification templates to include recipient honorifics and improved greeting messages in both English and Farsi localization files.
- Enhanced `SearchController` to support recipient names in email notifications, providing a more personalized user experience.
- Implemented asynchronous crawling triggers in `WebsiteProfileController`, ensuring non-blocking operations during profile saves.

These changes improve the overall user experience and system configurability, making the application more adaptable to different environments.

- Added Crawler Scheduler service to both development and production Docker Compose files for automated task management.
- Created comprehensive documentation for the Crawler Scheduler, including usage guides, integration methods, and troubleshooting resources.
- Implemented necessary scripts for starting, stopping, and verifying the Crawler Scheduler setup.
- Enhanced the overall project structure by organizing documentation and ensuring all components are easily discoverable.

These changes significantly improve the deployment and management of the Crawler Scheduler, providing a robust solution for automated crawling tasks within the Search Engine Core project.

- Added automatic timezone detection and configurable timezone settings to the Crawler Scheduler, replacing the hardcoded Asia/Tehran value.
- Introduced `_detect_timezone()` function in `app/config.py` to determine timezone based on environment variables and system settings.
- Updated `app/celery_app.py` to use the new `Config.TIMEZONE` for task scheduling.
- Enhanced `docker-compose.yml` and `README.md` with timezone configuration options and examples.
- Created comprehensive documentation in `TIMEZONE_CONFIGURATION.md` and a changelog in `CHANGELOG_TIMEZONE.md`.
- Added a test script `scripts/test_timezone.sh` to verify timezone detection functionality.

These changes improve the flexibility and usability of the Crawler Scheduler, ensuring it operates correctly across different time zones.
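The commit says `_detect_timezone()` consults environment variables and system settings but does not name them. The sketch below shows one plausible lookup order; the `SCHEDULER_TIMEZONE` variable, the `/etc/timezone` file, and the `UTC` fallback are all assumptions, and only `TZ` is a standard variable.

```python
import os
from pathlib import Path


def detect_timezone(env=None, timezone_file="/etc/timezone", default="UTC"):
    """Return an IANA timezone name from the environment, the system, or a default."""
    env = os.environ if env is None else env
    # 1. Explicit overrides. SCHEDULER_TIMEZONE is a hypothetical name;
    #    TZ is the conventional POSIX variable.
    for variable in ("SCHEDULER_TIMEZONE", "TZ"):
        value = env.get(variable, "").strip()
        if value:
            return value
    # 2. System setting, as written by Debian-style images.
    try:
        system_value = Path(timezone_file).read_text(encoding="utf-8").strip()
    except OSError:
        system_value = ""
    # 3. Fall back to the default when nothing was found.
    return system_value or default
```

The returned name is not validated here. A stricter version could pass it to `zoneinfo.ZoneInfo` and fall back to the default on failure, which requires time zone data to be installed in the container.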

…eduler

- Updated the time window logic to ensure the end hour is inclusive, allowing processing through the entire hour (e.g., `WARMUP_END_HOUR=23` now processes until `23:59`).
- Created comprehensive documentation in `TIME_WINDOW_FIX.md` detailing the changes, including examples and migration notes for users.
- Added a new script `scripts/test_time_window.py` to validate the time window logic and ensure all edge cases are handled correctly.
- Enhanced the timezone detection mechanism in `app/config.py` to support timezone-aware datetime handling in the database and rate limiter.

These changes improve the accuracy and usability of the Crawler Scheduler, ensuring it operates correctly across different time windows and time zones.
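The inclusive end-hour rule can be expressed as a comparison on the hour component alone. The sketch below is an assumed shape for that check, not the scheduler's actual code; the function name is hypothetical, and the handling of windows that wrap past midnight is an extra assumption the commit does not mention.

```python
from datetime import time


def is_within_window(now: time, start_hour: int, end_hour: int) -> bool:
    """Return True if now falls in [start_hour:00, end_hour:59], end hour inclusive."""
    hour = now.hour
    if start_hour <= end_hour:
        # Comparing whole hours with <= makes the entire end hour count,
        # so end_hour=23 covers 23:00 through 23:59.
        return start_hour <= hour <= end_hour
    # Window wraps past midnight, e.g. 22 -> 2 (assumed behavior).
    return hour >= start_hour or hour <= end_hour
```

With an exclusive comparison (`hour < end_hour`), a setting of `WARMUP_END_HOUR=23` would stop processing at 22:59, which is the behavior the commit describes as fixed.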

- Added a blank line for improved readability in the `build-search-engine.yml` workflow file.
- This change enhances the clarity of the workflow configuration, making it easier to read and maintain.

…improved documentation

- Added timezone configuration options to both development and production Docker Compose files, allowing for automatic timezone detection and optional overrides via environment variables.
- Updated warm-up hour settings to default to a full day (0-23) for development, ensuring continuous processing.
- Enhanced comments throughout the configuration files to clarify the purpose and behavior of each setting, including the inclusivity of the end hour.
- Created comprehensive documentation in `DOCKER_COMPOSE_CONFIGURATION.md` and `DOCKER_COMPOSE_UPDATE_SUMMARY.md` to guide users in configuring the Crawler Scheduler effectively.

These changes improve the flexibility and usability of the Crawler Scheduler, ensuring it operates correctly across different time zones and configurations.

- Enhanced timezone settings in both development and production Docker Compose files to default to Asia/Tehran, ensuring consistent behavior across environments.
- Improved comments for clarity on timezone configuration options, allowing for easier overrides via environment variables.
- Updated Farsi localization for the crawling notification footer to reflect a more personalized message and corrected copyright year.

These changes enhance the usability and accuracy of the Crawler Scheduler's timezone handling and improve the localization experience for Farsi users.

- Changed the file processor to look for `.txt` files instead of `.json` files, ensuring compatibility with the expected input format.
- Updated the Docker Compose production configuration to set the logging level to `debug` for detailed email diagnostics.
- Modified the data volume mount for the crawler service to be configurable via an environment variable, enhancing flexibility in production environments.

These changes improve the file processing capabilities and logging configuration, ensuring better diagnostics and adaptability in the Crawler Scheduler's deployment.

- Added a mechanism to calculate a hash of relevant source files in the crawler-scheduler directory to determine if a rebuild is necessary.
- Introduced a `force_rebuild` input option to allow manual triggering of rebuilds, bypassing the cache.
- Updated the build workflow to check existing images against the calculated source hash, ensuring that images are only rebuilt when source files change.
- Enhanced documentation in the workflows directory to explain the new caching system and its usage.

These changes improve the efficiency of the CI/CD pipeline by preventing unnecessary rebuilds and ensuring that the latest code changes are reflected in the Docker images.
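The workflow itself is a GitHub Actions file and is not shown in this PR page. The Python sketch below only illustrates the underlying idea of a content hash that gates rebuilds; the file patterns, the SHA-256 choice, and all function names are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional


def hash_sources(files: Mapping[str, bytes]) -> str:
    """Return one SHA-256 digest over file names and contents, in sorted order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        # Include the name so renames change the hash, and a separator so
        # name/content boundaries are unambiguous.
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[name])
        digest.update(b"\0")
    return digest.hexdigest()


def collect_sources(root: str, patterns=("*.py", "Dockerfile", "requirements.txt")) -> dict:
    """Read matching files under root (the patterns are hypothetical)."""
    base = Path(root)
    found = {}
    for pattern in patterns:
        for path in base.rglob(pattern):
            if path.is_file():
                found[path.relative_to(base).as_posix()] = path.read_bytes()
    return found


def needs_rebuild(source_hash: str, image_hash: Optional[str], force_rebuild: bool = False) -> bool:
    """Rebuild when forced, when no image exists yet, or when the hashes differ."""
    return force_rebuild or image_hash is None or image_hash != source_hash
```

In a workflow, the hash of the existing image would typically be stored as an image label or tag and compared against `hash_sources(collect_sources("crawler-scheduler"))` before the build step runs.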

…nfiguration

- Introduced `MAX_CONCURRENT_SESSIONS` variable in both development and production Docker Compose files to allow dynamic configuration of the maximum number of concurrent crawler sessions, enhancing flexibility in resource management.
- Updated `CrawlerManager` to read the `MAX_CONCURRENT_SESSIONS` value from the environment, with a default of 5, and added error handling for invalid values.
- Improved logging to warn users when the maximum sessions limit is reached, ensuring better visibility into crawler operations.

These changes enhance the configurability and robustness of the Crawler Scheduler, allowing for better management of concurrent crawling tasks.
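A minimal sketch of reading such a limit defensively, in Python rather than the C++ of `CrawlerManager`. The variable name and the default of 5 come from the commit message; treating non-numeric and non-positive values as invalid and falling back to the default is an assumption about what "error handling for invalid values" means.

```python
import logging
import os

logger = logging.getLogger(__name__)

DEFAULT_MAX_CONCURRENT_SESSIONS = 5  # default stated in the commit message


def read_max_concurrent_sessions(env=None, default=DEFAULT_MAX_CONCURRENT_SESSIONS) -> int:
    """Read MAX_CONCURRENT_SESSIONS, falling back to default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("MAX_CONCURRENT_SESSIONS")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        logger.warning("Invalid MAX_CONCURRENT_SESSIONS=%r, using %d", raw, default)
        return default
    if value < 1:
        # Zero or negative limits would block all crawling (assumed invalid).
        logger.warning("MAX_CONCURRENT_SESSIONS must be positive, using %d", default)
        return default
    return value
```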

- Added a new method `encodeFromHeader` in `EmailService` to handle proper encoding of email headers according to RFC 5322 and RFC 2047 standards.
- Updated the `formatEmailHeaders` method to utilize the new encoding function, ensuring that email headers are correctly formatted for both ASCII and non-ASCII names.
- Enhanced the email-crawling notification template with improved styling for better visual presentation.

These changes enhance the email service's ability to handle diverse character sets in email headers, improving compatibility and user experience.
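To show what RFC 2047 encoding of a display name looks like, here is a small Python sketch. It is not the C++ `encodeFromHeader` implementation; the function names are hypothetical analogues, and the sketch always uses UTF-8 with Base64 ("B") encoding.

```python
import base64


def encode_display_name(name: str) -> str:
    """Return name as-is if ASCII, else as an RFC 2047 encoded-word (UTF-8, Base64)."""
    if name.isascii():
        # Quoting of RFC 5322 special characters is omitted in this sketch.
        return name
    encoded = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{encoded}?="


def format_from_header(name: str, address: str) -> str:
    """Build the value of a From header: display name followed by <address>."""
    return f"{encode_display_name(name)} <{address}>"
```

RFC 2047 limits each encoded-word to 75 characters, so long names have to be split into several encoded-words separated by whitespace; this sketch does not do that. In Python code, `email.utils.formataddr` from the standard library handles the common cases.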

Hatef-Rostamkhani added 30 commits August 30, 2025 16:18

Hatef-Rostamkhani added 10 commits October 17, 2025 23:55

chore: update GitHub Actions workflow for Docker build

7ddf2ab



2 participants
