[NUTCH-3103] Fixed custom max intervals for AdaptiveFetchSchedule
1) The loop in setHostSpecificIntervals is cleaned up, and a max interval set to default in the config is now treated correctly.
2) The functions getMinInterval and getMaxInterval are renamed to getCustomMinInterval and getCustomMaxInterval, respectively, and now return null if no custom interval has been set for the given URL's hostname. When either returns null, the corresponding default value is used to bound the calculated interval.
3) The custom interval values in the config are now allowed to equal the default values. For example, if the default min interval is 7200, then "0", "default" and "7200" are all valid values for the custom min interval in the config file, and they all have the same result.
4) The config file template is updated to account for these changes.
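The null-means-default behaviour described in point 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class name, the `bound` helper, the per-host maps, and the simplified `String` host parameter are all assumptions made for the example, and the default values are taken from the config template.

```java
import java.util.HashMap;
import java.util.Map;

public class IntervalBoundsSketch {
    // Assumed defaults, matching the values quoted in the config template.
    static final float DEFAULT_MIN = 60f;
    static final float DEFAULT_MAX = 31536000f;

    // Hypothetical per-host storage; a host without a custom value is simply absent.
    static final Map<String, Float> customMin = new HashMap<>();
    static final Map<String, Float> customMax = new HashMap<>();

    // Simplified signatures (assumption): return null when no custom interval is set.
    static Float getCustomMinInterval(String host) { return customMin.get(host); }
    static Float getCustomMaxInterval(String host) { return customMax.get(host); }

    // Hypothetical helper: clamp a calculated interval to the host's bounds,
    // falling back to the defaults whenever a custom bound is null.
    static float bound(String host, float interval) {
        Float min = getCustomMinInterval(host);
        Float max = getCustomMaxInterval(host);
        float lo = (min != null) ? min : DEFAULT_MIN;
        float hi = (max != null) ? max : DEFAULT_MAX;
        return Math.max(lo, Math.min(hi, interval));
    }

    public static void main(String[] args) {
        customMin.put("nutch.apache.org", 864000f);
        customMax.put("nutch.apache.org", 2160000f);
        // Host with custom bounds: a short interval is raised to the custom min.
        System.out.println(bound("nutch.apache.org", 100f));
        // Host without custom bounds: only the defaults apply.
        System.out.println(bound("www.example.com", 100f));
    }
}
```

Returning null rather than a sentinel number keeps "no custom value" distinct from a custom value that happens to equal the default, which is what point 3 relies on.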
martin committed Feb 6, 2025 (1 parent b52ec90, commit 931ba17)
Showing 2 changed files with 145 additions and 96 deletions.
@@ -1,14 +1,14 @@
-# This file defines a mapping that associates specific min. and max. refetching time intervals
-# to a host, that deviate from the default settings of the AdaptiveFetchSchedule class.
+# This file defines a mapping that associates specific min and max refetching intervals
+# with a host, that deviate from the default settings of the AdaptiveFetchSchedule class.
 #
-# Format: <hostname> <min_interval> <max_interval>.
+# Format: <hostname> <min_interval> <max_interval>
 #
-# The two values will be parsed as float and should be STRICTLY between
+# The two interval values will be parsed as float and should be between
 # db.fetch.schedule.adaptive.min_interval and db.fetch.schedule.adaptive.max_interval.
 #
-# To use default values, write "default" or "0".
-# The default min. is 60 (1 min) and default max. is 31536000 (1 year).
+# To use the default as a value, write either "default" or "0".
+# The default min is 60 (1 min), while the default max is 31536000 (1 year).
 #
-www.apache.org default 1728000
-www.example.org 1296000 0
-nutch.apache.org 864000 2160000
+www.example.com default 1728000
+www.apache.org 1296000 0
+nutch.apache.org 864000 2160000
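To make the template's line format concrete, here is a small sketch of how one `<hostname> <min_interval> <max_interval>` line could be read. It is not the parsing code from setHostSpecificIntervals: the class, the `Entry` type, and both method names are hypothetical, and mapping "default" or "0" to null is an assumption that mirrors the null-means-default convention from the commit message. Range checking against the configured min and max is left out.

```java
public class HostIntervalLineSketch {
    // Hypothetical holder for one parsed line; null means "use the default".
    static final class Entry {
        final String host;
        final Float min;
        final Float max;
        Entry(String host, Float min, Float max) {
            this.host = host;
            this.min = min;
            this.max = max;
        }
    }

    // Returns null when the token asks for the default ("default" or "0"),
    // otherwise the value parsed as a float.
    static Float parseInterval(String token) {
        if ("default".equals(token)) {
            return null;
        }
        float value = Float.parseFloat(token);
        return value == 0f ? null : value;
    }

    // Parses one line of the template; returns null for comments and blank lines.
    static Entry parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] parts = trimmed.split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new Entry(parts[0], parseInterval(parts[1]), parseInterval(parts[2]));
    }

    public static void main(String[] args) {
        Entry e = parseLine("www.apache.org 1296000 0");
        System.out.println(e.host + " min=" + e.min + " max=" + e.max);
    }
}
```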