URL Shorteners and Redirectors
Some URLs discovered by MailCleaner will be redirected immediately, or after a brief lookup or prompt by the redirecting host. In these cases, the original URL can obscure where the link will actually land the user and will make it difficult for MailCleaner to check the reputation of that host. This can make RBLs less effective.
A "click protection" service, which rewrites all links in an email to point at a secure portal so that each URL's reputation can be verified in real time, is a great addition to regular spam filtering. However, if it rewrites the links before MailCleaner has a chance to scan the message, this could limit our ability to filter out obviously malicious links.
If you use this type of service, be sure that it acts on the message after it has passed through MailCleaner. This will also ensure that we don't reject messages that fail DKIM/DMARC because they were modified before arriving at MailCleaner.
The setting `Configuration->Anti-spam->UriRBLs->Resolve URL shorteners/redirects` enables a module which does its best to determine the eventual destination of redirected links when checking RBLs (for both UriRBL and SpamC).
This does two main things. Both of these are kicked off by the `lib/MCDnsLists.pm` module, which contains the functions to perform RBL lookups.
If it finds a short domain name with a short alphanumeric code as the entire URL, like:
    bit.ly/abcdef
that domain is searched for in the file `etc/rbls/url_shorteners.txt` (please contribute additional shortening services if you find one that is not listed). These services should, without exception, respond to a shortened URL with a 3xx redirect code. For any link matching one of these domains, the module automatically performs a HEAD request, reads the redirected URL from the 3xx response and looks that URL up in the RBLs instead. If there is a URL shortening service that does not behave this way, it must be treated as a redirecting service, as described below.
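The shortener lookup described above can be sketched roughly as follows. This is only an illustration and not the code in `lib/MCDnsLists.pm`: the function name `resolve_shortener`, the `$shorteners` hash and the optional `$head` callback are all hypothetical names introduced here. The sketch uses the core `HTTP::Tiny` module with redirects disabled so that the 3xx response itself can be inspected.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper, not the MCDnsLists.pm implementation.
# $shorteners is a hash ref of known shortener domains (as might be loaded
# from etc/rbls/url_shorteners.txt). $head is an optional code ref that
# performs the HEAD request, so the logic can be exercised offline.
sub resolve_shortener {
    my ($url, $shorteners, $head) = @_;

    # Only act on "domain/code" links with a short alphanumeric code.
    my ($domain) = $url =~ m{^(?:https?://)?([^/]+)/[A-Za-z0-9]+$}
        or return $url;
    return $url unless $shorteners->{lc $domain};

    $head ||= sub {
        my $target = shift;
        # max_redirect => 0 stops HTTP::Tiny from following the redirect,
        # so the 3xx response itself is returned.
        return HTTP::Tiny->new(max_redirect => 0, timeout => 5)->head($target);
    };

    my $target = $url =~ m{^https?://} ? $url : "http://$url";
    my $res    = $head->($target);

    # A well-behaved shortener answers with 3xx and a Location header.
    if ($res->{status} =~ /^3\d\d$/ && defined $res->{headers}{location}) {
        return $res->{headers}{location};
    }
    return $url;    # not a redirect: leave the URL as it was
}
```

Passing the request function in as a parameter is a design choice made for this sketch only; it allows the matching and response handling to be checked without contacting a real shortening service.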
There is a more sophisticated module (`lib/URLRedirects.pm`) which is able to resolve the redirected URL for services using a more complex methodology. In general, these services encode the destination URL somewhere within the re-written URL, so this module works to decode them.
Each service needs a dedicated decoding function since there is no standardized method for encoding the destination URL. Within the module, there is a `getServices` function, which contains a hash with each known service. Each entry includes a `regex` (Regular Expression) to detect whether the passed URL is subject to this re-writing service, and a `decoder` method to extract and return the destination URL.
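To make the `regex`/`decoder` layout concrete, here is a minimal sketch of how such a hash might be walked. The names `get_services_sketch` and `decode_once`, the "Example Redirect" entry and the `redirect.example.com` domain are all hypothetical; the real dispatch logic in `lib/URLRedirects.pm` may well differ.

```perl
use strict;
use warnings;

# Illustrative stand-in for the hash returned by getServices. The single
# entry and its decoder are hypothetical, not copied from URLRedirects.pm.
sub get_services_sketch {
    return {
        "Example Redirect" => {
            "regex"   => qr#redirect\.example\.com/\?to=#,
            "decoder" => sub {
                my $url = shift;
                # Drop everything up to and including the "to=" parameter.
                $url =~ s#.*redirect\.example\.com/\?to=##;
                # Percent-decode using core Perl only.
                $url =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
                return $url;
            },
        },
    };
}

# Run the decoder of the first service whose regex matches; otherwise
# hand the URL back untouched.
sub decode_once {
    my ($url) = @_;
    my $services = get_services_sketch();
    for my $name (sort keys %$services) {
        my $service = $services->{$name};
        return $service->{decoder}->($url) if $url =~ $service->{regex};
    }
    return $url;
}
```

With this sketch, `decode_once('redirect.example.com/?to=http%3A%2F%2Fexample.org%2Fa')` yields `http://example.org/a`, while a URL that matches no service comes back unchanged.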
If you are aware of a service that is not currently supported, you can contribute a new one by basing it on one of the existing services, like:

    "Google Redirect" => {
        "regex" => qr#www\.google\.com/url\?q=#,
        "decoder" => sub {
            my $url = shift;
            $url =~ s#www\.google\.com/url\?q=(.*)#$1#;
            $url = uri_unescape($url);
            return $url;
        }
    },

In the above example, we are simply matching any of the Google Redirect links, which all begin with `www.google.com/url?q=`. Having matched, the module will feed the input link to the `decoder` method. In this case it captures everything after the `=` just mentioned, then it `uri_unescape`s that string (i.e. transforms URL escape sequences like `%20` back to their unescaped equivalents, in that case a space). This is a fairly simple example.
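As a worked illustration of those two steps, the following sketch runs the same substitution on a sample link. It is not the module's code: `unescape_sketch` is a hypothetical core-Perl stand-in for `uri_unescape` (which comes from the non-core `URI::Escape` module), and `google_decode` and the sample URL are made up for the demonstration.

```perl
use strict;
use warnings;

# Core-Perl stand-in for URI::Escape's uri_unescape, so the sketch has no
# non-core dependency.
sub unescape_sketch {
    my $str = shift;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

# Same two steps as the "Google Redirect" decoder: capture everything
# after "q=", then unescape it.
sub google_decode {
    my $url = shift;
    $url =~ s#www\.google\.com/url\?q=(.*)#$1#;
    return unescape_sketch($url);
}

my $in  = 'www.google.com/url?q=http%3A%2F%2Fexample.com%2Fsome%20page';
my $out = google_decode($in);
print "$out\n";    # http://example.com/some page
```

Note that the substitution only replaces text from `www.google.com` onwards, so anything in front of it in the input (such as a scheme prefix) would be left in place ahead of the captured destination. The sample input here deliberately has no such prefix.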
If the returned value is unmodified, the link will not be treated as a redirect. If it is modified, it will be treated as the destination URL after redirection.
Shortened and redirected links will be searched recursively until no new URL is discovered.
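The repeated resolution can be pictured with the sketch below, written as a loop rather than true recursion. The name `resolve_fully`, the `$step` callback, the depth limit and the `%seen` guard are assumptions added for this illustration to keep the loop finite; they are not taken from the MailCleaner source.

```perl
use strict;
use warnings;

# $step is a code ref that performs one round of shortener/redirect
# resolution and returns the URL unchanged when nothing applies.
# The depth limit and the %seen guard are assumptions of this sketch.
sub resolve_fully {
    my ($url, $step, $max_depth) = @_;
    $max_depth //= 10;
    my %seen = ($url => 1);

    for (1 .. $max_depth) {
        my $next = $step->($url);
        last if $next eq $url;     # unmodified: not a redirect
        last if $seen{$next}++;    # already visited: redirect loop
        $url = $next;
    }
    return $url;
}
```

For example, if one round maps `a` to `b` and `b` to `c`, then `resolve_fully('a', $step)` returns `c`, the last URL for which no new destination is discovered.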