SA Bugzilla – Bug 6335
[review] add domain-only and IP-only lookups for URIDNSBL, for Spamhaus DBL
Last modified: 2010-03-14 14:39:35 UTC
from Spamhaus: At MAAWG in a few days time Spamhaus will release the Domain Block List (DBL), a URI blocklist along the lines of SURBL/URIBL. The zone is currently up and running on our public mirrors (just not announced yet) as: dbl.spamhaus.org Obviously it's designed primarily for use in SpamAssassin so we'd like you guys to implement URIBL_DBL as quickly as possible :) however there is a technical difference to SURBL/URIBL in that the DBL only lists domains (domain.tld) and not IP addresses. This difference is technically important because we would like to actually prevent people making IP address queries to the DBL. Spamhaus's public DNSBL serving infrastructure is always under very heavy load so we do what we can to cut unnecessary traffic. A lot of users wrongly set their MTAs to query non-existent zones such as "rbl.spamhaus.org" or "bl.spamhaus.org", so we correct that quickly by using this DNS return to stop this sort of traffic before it can get going: 1.2.3.4.rbl.spamhaus.org IN A 127.0.0.2 1.2.3.4.rbl.spamhaus.org IN TXT "SPAMHAUS BLOCKLIST ADDRESS IS WRONG MUST FIX" We would like to do the same for the DBL zone, which -if the query was for an IP address- would produce this return: 1.2.3.4.dbl.spamhaus.org IN A 127.0.0.2 1.2.3.4.dbl.spamhaus.org IN TXT "SPAMHAUS DOMAIN BLOCKLIST WRONGLY USED AS AN IP BLOCKLIST, MUST FIX" SpamAssassin at least for URIBL makes both name and IP queries, so we don't know if SpamAssassin will by default try to query IP addresses against DBL too. Can you tell us if it will or not? What we would like is for Spam filters that analyze URIs to decide if the hostname is a domain name or an IP address, and query the appropriate DNSBL list, in our case that would be: Domain name: URIBL_DBL IP address: URIBL_SBL
[current SA code] will definitely query IPs as well, in order to deal with IP-addressed URLs. My thoughts are that while it'll take new code, we'd have no problem doing this in 3.3.x and trunk -- it'll give people a good reason to upgrade ;) I'm thinking of adding a new tflags keyword for 2 new lookup types: look up only domains, or look up only IPs, then adding new rules to do those.
If adding a tflag to only check domains is it also sensible/viable to add one for the util_rb_2tld and util_rb_3tld queries also? I know Jeff has spoken out about their usage in the past (not sure his current opinions however) and based on those comments I presume SpamHaus would rather not deal with *.blogspot.com et al queries. That way you could query everything at URIBL, IPs and anything with a registrar at SURBL, and registrar only domains at DBL.
(In reply to comment #2) > If adding a tflag to only check domains is it also sensible/viable to add one > for the util_rb_2tld and util_rb_3tld queries also? I know Jeff has spoken out > about their usage in the past (not sure his current opinions however) and based > on those comments I presume SpamHaus would rather not deal with *.blogspot.com > et al queries. that's not a bad idea. Let's see how easy it is....
(In reply to comment #2) > If adding a tflag to only check domains is it also sensible/viable to add one > for the util_rb_2tld and util_rb_3tld queries also? I know Jeff has spoken out > about their usage in the past (not sure his current opinions however) and based > on those comments I presume SpamHaus would rather not deal with *.blogspot.com > et al queries. > That way you could query everything at URIBL, IPs and anything with a registrar > at SURBL, and registrar only domains at DBL. SURBL might add support for checking 2/3/4 TLD soon. To catch up with the abused freehosters. So util_rb_3tld and so on will be used most likely pretty soon within SURBL. Raymond Dijkxhoorn - SURBL
That's good news. :) Spamhaus DBL does not require stripping either. Thus the additional tflags mentioned in comment 2 is not needed.
Created attachment 4672 [details] first draft this first draft adds a test to ensure we are implementing "tflags domains_only", and then does it. It doesn't yet do the opposite (IPs only), add the rules, the additional domain rules, etc.
For reference, SURBLs have IPs and domains because URIs have IPs and domains. As Raymond mentions, SURBL will be adding commonly abused freehosts at the third and fourth level probably next week. On our side, we will add the second and third level domains to our two-level-tlds file and a new three-level-tlds file. We will announce and document the changes on the SURBL site and announcement list.
As previously hinted, SURBL has added a three level tld list, and also added some frequently abused large web hosts to the two and three level tld lists. Please see the announcement at: http://lists.surbl.org/pipermail/announce/2010-February/000202.html It may be beneficial for SpamAssassin TLD data and TLD handling code to be updated accordingly. Some of the code may already be in place, for example for handling: http://rulesemporium.com/rules/90_3tld.cf Some of these issues may be appropriate for new tickets. SA folks, please follow up as needed.
(In reply to comment #8) > As previously hinted, SURBL has added a three level tld list, and also added > some frequently abused large web hosts to the two and three level tld lists. > Please see the announcement at: > > http://lists.surbl.org/pipermail/announce/2010-February/000202.html > > It may be beneficial for SpamAssassin TLD data and TLD handling code to be > updated accordingly. Some of the code may already be in place, for example for > handling: > > http://rulesemporium.com/rules/90_3tld.cf > > Some of these issues may be appropriate for new tickets. SA folks, please > follow up as needed. For simplicity pls consider http://www.rulesemporium.com/rules/90_2tld.cf which contains "extra" both 2ltld and 3ltld (incl. SA version check)
Created attachment 4677 [details] second draft getting there. There's a possibility the URIBL_DBL rule may even work if we can figure out a listed URI ;) "domains_only" and "ips_only" tflags implemented, tested and doc'ed.
(In reply to comment #10) > Created an attachment (id=4677) [details] > second draft > > getting there. There's a possibility the URIBL_DBL rule may even work if we > can figure out a listed URI ;) > Mar 1 00:59:11.407 [1098] dbg: check: tests=DATE_IN_FUTURE_06_12,RDNS_NONE,T_FSL_HELO_NON_FQDN_2,T_LAZY_LISTWASHING,T_TO_NO_BRKTS_NORDNS,URIBL_DBL so looks good.
hmm. actually, we need to pick a score, so it's not quite review-ready just yet. I'll commit it to trunk anyway. : 11...; svn commit -m "bug 6335: add support for 'tflags ips_only' and 'tflags domains_only', to control URIDNSBL lookup behaviour on a rule-by-rule basis; add URIBL_DBL rule for Spamhaus DBL, http://www.spamhaus.org/dbl/" Sending MANIFEST Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm Sending rules/25_uribl.cf Sending t/SATest.pm Sending t/data/spam/dnsbl.eml Adding t/data/spam/dnsbl_domsonly.eml Adding t/data/spam/dnsbl_ipsonly.eml Sending t/dnsbl.t Sending t/dnsbl_sc_meta.t Sending t/uribl.t Adding t/uribl_all_types.t Adding t/uribl_domains_only.t Adding t/uribl_ips_only.t Transmitting file data ............. Committed revision 917454.
(In reply to comment #12) > hmm. actually, we need to pick a score, so it's not quite review-ready just > yet. OK, I propose giving it 1.7 points, comparable to: score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2 score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2
(In reply to comment #13) > (In reply to comment #12) > > hmm. actually, we need to pick a score, so it's not quite review-ready just > > yet. > > OK, I propose giving it 1.7 points, comparable to: > > score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2 > score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2 (In reply to comment #13) > (In reply to comment #12) > > hmm. actually, we need to pick a score, so it's not quite review-ready just > > yet. > > OK, I propose giving it 1.7 points, comparable to: > > score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2 > score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2 +1
Created attachment 4682 [details] with a score OK, here's something suitable for 3.3.1 -- please vote...
Created attachment 4683 [details] redo for 3.3.1 oops. That patch was missing some files due to overlooking some "svn add"s :( redo, please vote.
Created attachment 4684 [details] third time lucky ugh. out of disk error meant that patch is incomplete! try this one...
(In reply to comment #13) > (In reply to comment #12) > > hmm. actually, we need to pick a score, so it's not quite review-ready just > > yet. > > OK, I propose giving it 1.7 points, comparable to: > > score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2 > score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2 1.7 score could be ok. DBL may not catch as much as SURBL, but may have lower FP rates.
> Created an attachment (id=4684) [details] > third time lucky Is trunk now in sync with the proposed patch?
oops. good catch Mark, it was still at "second draft". : 11...; svn commit -m "bug 6335: sync up trunk with latest patch, id 4684, for URIBL_DBL rule" rules/50_scores.cf Sending rules/50_scores.cf Transmitting file data . Committed revision 918126.
+1 for third time lucky patch though I am NOT using the DBL yet. Do we have the ability to test this on live systems yet? I'll likely want to do that before giving a +1 to 3.3.1.
> Do we have the ability to test this on live systems yet? > I'll likely want to do that before giving a +1 to 3.3.1. Seems it is working, using the current trunk - I've been using it for the last hour. Somehow the results are not impressive, I get about as many hits on spam as on ham (I don't claim all 'ham' is truly ham, but at least 2/3 of it should be). Even the last mail from Kevin <4B8D63D1.3020609@PCCC.com> hit URIBL_DBL. I had to reduce it score to 0.3 .
Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the time: dbg: async: completed in 0.365 s: URI-DNSBL, DNSBL:dbl.spamhaus.org.:129.206.32.114 dbg: uridnsbl: domain "delphij.net" listed (URIBL_DBL): 127.0.1.255 Somehow the query is an IP address.
(In reply to comment #23) > Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the > time: > > dbg: async: completed in 0.365 s: URI-DNSBL, > DNSBL:dbl.spamhaus.org.:129.206.32.114 > dbg: uridnsbl: domain "delphij.net" listed (URIBL_DBL): 127.0.1.255 > > Somehow the query is an IP address. I saw that too. I was wondering if I'd manage to fluff it somehow and zeroed the score for checking later but I guess I'll save the effort now. Querying directly gave NXDOMAIN on the domains as expected.
(In reply to comment #23) > Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the > time: Even after the issue causing this particular problem is fixed, as a general protection against SpamAssassin accidentally sending data that the DBL thinks is an IP address (correctly or otherwise), URIBL_DBL should never fire if the DBL returns 127.0.1.255. Since the DBL is fundamentally represented as a dnset via rbldnsd, the DBL's view of what is and is not an IP address is not perfect. For example, the DBL thinks that "example.com.1" is an IP address. $ host example.com.1.dbl.spamhaus.org example.com.1.dbl.spamhaus.org has address 127.0.1.255 For our local Exim MTA-level/DBL integration that we're using in production at Berkeley, we make sure we have a sane looking TLD before sending a query off to the DBL, and additionally, we ignore any 127.0.1.255 return codes.
I think I've got it -- the URIBL_SBL-style 3-stage NS->A->DNSBL lookups are triggering lookups on IPs in DBL after the A resolution.
Created attachment 4691 [details] additional fix this applies on top of "third time lucky". I think it'll fix the issue. please test!
Patch applied, run some test messages through: - I appear to be querying the DBL. - It would appear not to report avg.com as listed now. Appears to be a +1 here.
(In reply to comment #27) > Created an attachment (id=4691) [details] > additional fix > > this applies on top of "third time lucky". I think it'll fix the issue. > please test! I applied this patch to my current trunk copy (I hope I understood correctly that the rest is already rolled in), with the latest sa-update which does include the URIBL_DBL rule: uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 body URIBL_DBL eval:check_uridnsbl('URIBL_DBL') describe URIBL_DBL Contains an URL listed in the DBL blocklist tflags URIBL_DBL net domains_only score URIBL_DBL 0 1.7 0 1.7 But I'm not getting any queries sent out to dbl.spamhaus.org at all (only queries to multi.surbl.org, multi.uribl.com, dob.sibl.support-intelligence.net) Using a testpoint: http://dbltest.com/ in a mail body.
(In reply to comment #29) > (In reply to comment #27) > > Created an attachment (id=4691) [details] [details] > > additional fix > > > > this applies on top of "third time lucky". I think it'll fix the issue. > > please test! > > I applied this patch to my current trunk copy (I hope I understood > correctly that the rest is already rolled in), with the latest sa-update > which does include the URIBL_DBL rule: > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > body URIBL_DBL eval:check_uridnsbl('URIBL_DBL') > describe URIBL_DBL Contains an URL listed in the DBL blocklist > tflags URIBL_DBL net domains_only > score URIBL_DBL 0 1.7 0 1.7 > > But I'm not getting any queries sent out to dbl.spamhaus.org > at all (only queries to multi.surbl.org, multi.uribl.com, > dob.sibl.support-intelligence.net) > > Using a testpoint: http://dbltest.com/ in a mail body. uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 ??? what is 2130706688 ?
> uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > > what is 2130706688 ? Decimal for 0x7f000100, i.e. 127.0.1.0 mask. man Mail::SpamAssassin::Plugin::URIDNSBL The mask is poorly documented in that POD, the text does not explain what does it actually do, and whether ones or zeroes have the desired meaning (the classical Cisco dilema).
hmm. Karsten found a bug: 'I think I might have found a corner-case problem with the patch (in trunk) for bug 6335. The problem appears to be, that $dnsbl_lookup_ips and $is_ip are not independent. + my $cf = $scanner->{uridnsbl_active_rules_revipbl}; + my $dnsbl_lookup_ips = 0; + foreach my $rulename (keys %{$cf}) { + if ($tflags->{$rulename} !~ /\bdomains_only\b/) { + $dnsbl_lookup_ips++; $dnsbl_lookup_ips == 0 IFF *all* $tflags->{$rulename} *do* match /domains_only/ (assumption (1)). Due to + if ($dnsbl_lookup_ips && $dom =~ /^\d+\.\d+\.\d+\.\d+$/) { $is_ip then also is 0, even if $dom indeed *is* an IP. + next if ($is_ip && $tflags->{$rulename} =~ /\bdomains_only\b/); This test later on then fails to skip, because $is_ip is zero. Again, even in the case of $dom actually being an IP. And the respective tflags set to domains_only, as per the corner-case assumption of *all* such rules having that tflag set.'
(In reply to comment #29) > But I'm not getting any queries sent out to dbl.spamhaus.org > at all (only queries to multi.surbl.org, multi.uribl.com, > dob.sibl.support-intelligence.net) I can confirm that. :( needs more work. I also need to fix guenther's issue too.
Created attachment 4692 [details] redone OK -- I've addressed guenther's issue, and cleaned up that conditional code in a more consistent way with the existing code. Also, the reason the rule was now not hitting was because I used the wrong rule type in the first place for URIBL_DBL! doh. using "urirhssub" now seems to be working with a test msg here. applied to trunk: : 9...; svn commit -m "bug 6335: address guenther's issue, clean up ips_only/domains_only in a more consistent way, and fix case where URIBL_DBL was not hitting with new code due to use of the wrong rule type" Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm Sending rules/25_uribl.cf Transmitting file data .. Committed revision 918483. the attached patch is against vanilla 3.3, so doesn't depend on applying the earlier patches -- they're now all obsoleted.
(In reply to comment #31) > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > > > > what is 2130706688 ? > > Decimal for 0x7f000100, i.e. 127.0.1.0 mask. > > man Mail::SpamAssassin::Plugin::URIDNSBL > > The mask is poorly documented in that POD, the text does not > explain what does it actually do, and whether ones or zeroes > have the desired meaning (the classical Cisco dilema). It would be clearer if the number could be specified as an IP address.
(In reply to comment #35) > > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > > It would be clearer if the number could be specified as an IP address. IP addresses are exact match only. ANDed bitmask is an option. REs not supported here. That number is (almost) identical to 256 anyway. Yes, the AND bitmask is pretty fuzzy. :/
(In reply to comment #36) > (In reply to comment #35) > > > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > > > > It would be clearer if the number could be specified as an IP address. > > IP addresses are exact match only. ANDed bitmask is an option. REs not > supported here. > > That number is (almost) identical to 256 anyway. Yes, the AND bitmask is pretty > fuzzy. :/ Bitmask would be clearer than hex or decimal number.
> Committed revision 918483. Very good, I'm finally getting useful results. Btw, following my offline communication with João Gouveia, it may be a good idea to add another rule (e.g. URIBL_DBL_MISUSE, with low/symbolic score), matching a 127.0.1.255 - to make it obvious when something goes wrong with our IP classification logic or Spamhaus wildcarding.
(In reply to comment #37) > Bitmask would be clearer than hex or decimal number. It *is* a bitmask. 32 bit integer, decimal representation. Hmm, this is Perl. It should accept a 0x hex number, too. complete_dnsbl_lookup() doesn't though, unless $subtest has been converted to /\d+/ behind the scenes at that point.
(In reply to comment #39) > (In reply to comment #37) > > Bitmask would be clearer than hex or decimal number. > > It *is* a bitmask. 32 bit integer, decimal representation. > > Hmm, this is Perl. It should accept a 0x hex number, too. > complete_dnsbl_lookup() doesn't though, unless $subtest has been converted to > /\d+/ behind the scenes at that point. An IP address octet style bitmask would be clearer. It would look more like an IP address or IP address netmask than a raw number does.
(In reply to comment #31) > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688 > > > > what is 2130706688 ? > > Decimal for 0x7f000100, i.e. 127.0.1.0 mask. > > man Mail::SpamAssassin::Plugin::URIDNSBL > > The mask is poorly documented in that POD, the text does not > explain what does it actually do, and whether ones or zeroes > have the desired meaning (the classical Cisco dilema). I presume a netmask for the above would be 128.255.254.255 which looks much more reasonable than decimal or hex notation.
(In reply to comment #40) > An IP address octet style bitmask would be clearer. It would look more like an > IP address or IP address netmask than a raw number does. Yup. And the AND bitmask is too fuzzy anyway. But both these are different, unrelated issues. Feel free to file a bug and link into this discussion.
Created attachment 4693 [details] match r918549 Taking the chance to address guenther's concerns about the inefficiency of the unnecessary "foreach" loop, and a little extra code/doco cleanup while I was at it. This also includes a test improvement too to exercise more edge cases.
Sorry Justin, I hate to be a bitch like this... ;) + my ($is_ip, my $single_dnsbl);
$ svn diff Index: URIDNSBL.pm =================================================================== --- URIDNSBL.pm (revision 918562) +++ URIDNSBL.pm (working copy) @@ -664,7 +664,7 @@ my $tflags = $scanner->{conf}->{tflags}; my $cf = $scanner->{uridnsbl_active_rules_revipbl}; - my ($is_ip, my $single_dnsbl); + my ($is_ip, $single_dnsbl); if ($dom =~ /^\d+\.\d+\.\d+\.\d+$/) { my $IPV4_ADDRESS = IPV4_ADDRESS; my $IP_PRIVATE = IP_PRIVATE; Sending Plugin/URIDNSBL.pm Transmitting file data . Committed revision 918565.
In the last two hours or so I got 1900 hits on URIBL_DBL for spam, and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). Good!
> and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc and only sends out queries for 'co.cc' ???
(In reply to comment #47) > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). $ host co.cc.dbl.spamhaus.org co.cc.dbl.spamhaus.org has address 127.0.1.2 Hmm. :/ > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc > and only sends out queries for 'co.cc' ??? [guenther@monkey trunk]$ grep -rl co.cc . Doesn't appear to be listed as a second-level TLD, so that would be normal.
I've filed bug 6363. This work has impacted production 3.3.0 installs -- URIBL_DBL was pushed to the sa-update channel and is misfiring since the 3.3.1 changes are not in place and URIBL_DBL treats 127.0.1.255 as a valid return code from the DBL.
Created attachment 4695 [details] match r918599 include guenther's typo fix
Created attachment 4696 [details] match r918602 update with "if can()" fix
> > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc > and only sends out queries for 'co.cc' ??? Isn's dbl.spamhaus.org supposed to receive a full domain name, and do its own stripping/wildcarding? I could imagine for example that dbl would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.
(In reply to comment #52) > > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). > > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc > > and only sends out queries for 'co.cc' ??? > > Isn's dbl.spamhaus.org supposed to receive a full domain name, and do > its own stripping/wildcarding? I could imagine for example that dbl Yes. But that would require a *lot* more code and a whole new type, if we want to do that for DBL. Either way, I don't really see why they would list a 2tld, or a free sub-domain hoster, but not some of their hosted stuff. > would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc. co.cc.dbl.spamhaus.org has address 127.0.1.2 www.co.cc.dbl.spamhaus.org has address 127.0.1.2 Check out what that is. A free sub-domain hoster. *sigh* foo.co.cc.dbl.spamhaus.org has address 127.0.1.2 Does respond, but is free for grabs. osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2 www.osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2 And they do list them...
>> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc > Check out what that is. A free sub-domain hoster. *sigh* I know, but that does not mean they can't be hosting a couple of legitimate non-spam domains too, which could be excluded from a *.co.cc wildcard. > And they do list them... Yes, which is a pity, and a false positive for the osvisnet project.
Agreed, it is a FP with DBL. DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com. This is the same situation with a different sub-domain hoster.
I ran sa-update last night and have been monitoring DBL hits since then. Although it seems to be running fine, I'm finding that it randomly reports the wrong domain in the test result headers. For example: 3.5 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist [URIs: scridest.com] 3.5 URIBL_BLACK Contains an URL listed in the URIBL blacklist [URIs: scridest.com] 3.5 URIBL_DBL Contains an URL listed in the DBL blocklist [URIs: w.interia.pl] However, when I test the domain being reported by the DBL header manually, immediately after receiving the message, I get: host w.interia.pl.dbl.spamhaus.org Host w.interia.pl.dbl.spamhaus.org not found: 3(NXDOMAIN) But testing the domain that's reported in the headers by URIBL_BLACK or SURBL or other URIBL, I get: host scridest.com.dbl.spamhaus.org scridest.com.dbl.spamhaus.org has address 127.0.1.2 I know this is just a cosmetic issue, but would be nice to have this fixed. Bill
(In reply to comment #56) > I ran sa-update last night and have been monitoring DBL hits since then. > Although it seems to be running fine, I'm finding that it randomly reports the > wrong domain in the test result headers. For example: Last night? That's a side-effect of the bugs in the first cuts of the code. See today's comments and the code changes. I believe what you are seeing are FP hits due to the code, that has been corrected since. The reported domain just happens to have been checked first against DBL. Please update from SVN, and see if you still can reproduce this.
well, since this update was released via sa-update, shouldn't the fix be release there as well, especially since that is how most people update their SA installations? And since my SA install was done via yum: rpm -q spamassassin spamassassin-3.3.0-5.fc12.i686 I'd rather not do an SVN update over the top of my current running RPM. When is this fix planned to be available via sa-update? Bill
(In reply to comment #58) > well, since this update was released via sa-update, shouldn't the fix be > release there as well, especially since that is how most people update their SA > installations? See comment 49 and bug 6363. > I'd rather not do an SVN update over the top of my current running RPM. Yes. Sorry, I assumed you're running SVN... > When is this fix planned to be available via sa-update? Bug 6363 comment 4. Update has been done, should already be propagated to mirrors and DNS. Meanwhile, add score URIBL_DBL 0 to local.cf, until sa-update corrected this.
>> Isn's dbl.spamhaus.org supposed to receive a full domain name, and do >> its own stripping/wildcarding? I could imagine for example that dbl >> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc. > > Yes. But that would require a *lot* more code and a whole new type, > if we want to do that for DBL. How much is 'a *lot* more code'? > Either way, I don't really see why they would list a 2tld, or a free > sub-domain hoster, but not some of their hosted stuff. For precisely the cases like the one I stumbled across. Because some of their subdomains can be whitehats. The DNS technology covers such cases just fine (a wildcard, along with more specific exceptions). > DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com. > This is the same situation with a different sub-domain hoster. Let's modify this example a bit: spammer.live.com blacklisted, but live.com not. And our plugin strips out the first label, querying for live.com given an URL http://spammer.live.com/, and we miss a hit. What I'm driving at is - it doesn't take much stretching to come across cases of FP or FN due to the fact that URIDNSBL plugin can easily split/strip domain names differently than DBL does.
(In reply to comment #60) > >> Isn's dbl.spamhaus.org supposed to receive a full domain name, and do > >> its own stripping/wildcarding? I could imagine for example that dbl > >> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc. > > > > Yes. But that would require a *lot* more code and a whole new type, > > if we want to do that for DBL. > > How much is 'a *lot* more code'? A new URIDNSBL::uri* rule? And most likely some background code, since the existing PMS::get_uri_list() and PMS::get_uri_detail_list() do quite a bit of munging, cleaning, and adding variants that's undesired here. Definitely do not call Util::uri_list_canonify(), Util::uri_to_domain() and friends, which strip on the RegistrarBoundaries. > > Either way, I don't really see why they would list a 2tld, or a free > > sub-domain hoster, but not some of their hosted stuff. > > For precisely the cases like the one I stumbled across. Because some > of their subdomains can be whitehats. The DNS technology covers such > cases just fine (a wildcard, along with more specific exceptions). A blackhat that dark that it should be blacklisted. With whitehat customers as sub-domains... Note that you did not find a case like you just described. I have not found any co.cc sub-domain that was not blacklisted by DBL. Dunno if they would, honestly. All we've seen is *.co.cc blacklisted. > > DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com. > > This is the same situation with a different sub-domain hoster. > > Let's modify this example a bit: spammer.live.com blacklisted, but > live.com not. And our plugin strips out the first label, querying for > live.com given an URL http://spammer.live.com/, and we miss a hit. Using vanilla RegistrarBoundaries -- yes. Using the third-party 90_2tld.cf -- no. The same with URIBL and SURBL. Given the previous mud-slinging (as evidenced in this bugzilla) between these two URI DNSBLs, whether or not to list 2tld sub-domain hosters, involved traffic, etc, I don't think they would be particularly happy about RegistrarBoundaries stripped variants (including third-party settings), *plus* the unstripped ones. The latter, in the absence of extensive, agreed-upon lists of 2tld and 3tld sub-hoster domains, is exactly what DBL wants in a case like this, though. So to not being harassed by other URI DNSBLs, we either can re-use those boundaries and settings, or come up with code and a data-structure that keeps only un-altered, un-stripped URIs -- with the path, etc again stripped though.
In addition, the previously existing URI DNSBLs do NOT work the same as DBL does with respect to wildcard DNS. Just checked, spammer domain masked. example.net.multi.uribl.com has address 127.0.0.2 Host sub.example.net.multi.uribl.com not found: 3(NXDOMAIN) example.net.multi.surbl.org has address 127.0.0.6 Host sub.example.net.multi.surbl.org not found: 3(NXDOMAIN) They absolutely need the RegistrarBoundaries stripping. Thus, the above definitely would require new data structures and code to not strip anything but keep the full URIs for DBL.
It'd effectively be a pretty large quantity of DBL-specific code, alright.
Oh, and given Spamhaus chooses to actively fight useless traffic by deliberately returning FPs for IP lookups, they most likely wouldn't be happy about multiple lookups like co.cc and foo.co.cc either. Dilemma. Strip registrar boundaries, as we used to. Or introduce code to not do that, specifically for DBL.
> > > DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com. > > > This is the same situation with a different sub-domain hoster. > > > > Let's modify this example a bit: spammer.live.com blacklisted, but > > live.com not. And our plugin strips out the first label, querying for > > live.com given an URL http://spammer.live.com/, and we miss a hit. > > Using vanilla RegistrarBoundaries -- yes. > > Using the third-party 90_2tld.cf -- no. The same with URIBL and SURBL. > > Given the previous mud-slinging (as evidenced in this bugzilla) between these > two URI DNSBLs, whether or not to list 2tld sub-domain hosters, involved > traffic, etc, I don't think they would be particularly happy about > RegistrarBoundaries stripped variants (including third-party settings), *plus* > the unstripped ones. > > The latter, in the absence of extensive, agreed-upon lists of 2tld and 3tld > sub-hoster domains, is exactly what DBL wants in a case like this, though. So > to not being harassed by other URI DNSBLs, we either can re-use those > boundaries and settings, or come up with code and a data-structure that keeps > only un-altered, un-stripped URIs -- with the path, etc again stripped though. SURBL and URIBL have agreed on most of the 2ld and 3lds in our tld lists now. co.cc is included in the 2ld list, which means foo.co.cc would be checked, but co.cc would not. In other words, the right thing is done. Thus the request to use the revised TLD data for RegistrarBoundaries: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6361
(In reply to comment #64) > Oh, and given Spamhaus chooses to actively fight useless traffic by > deliberately returning FPs for IP lookups, they most likely wouldn't be happy > about multiple lookups like co.cc and foo.co.cc either. > > Dilemma. Strip registrar boundaries, as we used to. Or introduce code to not do > that, specifically for DBL. Spamhaus would like DBL to be thought of as a new type of list, so perhaps a new rule might be appropriate than trying to modify urirhssub. Some of the differences between a URIBL and DBL are: 1. No URI IPs in DBL. DBL is not a URI list; it's a domain list. 2. IPs in URIs should be checked in SBL instead of DBL. Obviously this complicates URI handling somewhat. 3. DBL uses wildcards, therefore domain stripping by the mail filter application is not necessary. This simplifies the application, but it may have a negative impact on on-rbldnsd DNS caching, for example BIND forwards. A relevant question is whether these differences are enough to justify a new rule.
(In reply to comment #48) > (In reply to comment #47) > > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). > > $ host co.cc.dbl.spamhaus.org > co.cc.dbl.spamhaus.org has address 127.0.1.2 > > Hmm. :/ > > > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc > > and only sends out queries for 'co.cc' ??? > > [guenther@monkey trunk]$ grep -rl co.cc . > > Doesn't appear to be listed as a second-level TLD, so that would be normal. It seems that co.cc is no longer listed in DBL: % host co.cc.dbl.spamhaus.org Host co.cc.dbl.spamhaus.org not found: 3(NXDOMAIN)
(In reply to comment #53) > (In reply to comment #52) > > > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/). > > > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc > > > and only sends out queries for 'co.cc' ??? > > > > Isn's dbl.spamhaus.org supposed to receive a full domain name, and do > > its own stripping/wildcarding? I could imagine for example that dbl > > Yes. But that would require a *lot* more code and a whole new type, if we want > to do that for DBL. Either way, I don't really see why they would list a 2tld, > or a free sub-domain hoster, but not some of their hosted stuff. > > > would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc. > > co.cc.dbl.spamhaus.org has address 127.0.1.2 > www.co.cc.dbl.spamhaus.org has address 127.0.1.2 > > Check out what that is. A free sub-domain hoster. *sigh* > > foo.co.cc.dbl.spamhaus.org has address 127.0.1.2 > > Does respond, but is free for grabs. > > osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2 > www.osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2 > > And they do list them... No longer listed it seems: host osvisnet.co.cc.dbl.spamhaus.org Host osvisnet.co.cc.dbl.spamhaus.org not found: 3(NXDOMAIN)
(In reply to comment #66) > Spamhaus would like DBL to be thought of as a new type of list, so perhaps a > new rule might be appropriate than trying to modify urirhssub. I considered adding a new rule keyword for this. My thoughts were that we already have a lot of different rule types, and in reality this is the same kind of RHSBL-lookup mechanism -- just with a different selection of what to query. Hence a tflag was more appropriate. Regarding the IP mask -- 0x7f000100 was chosen since initial plans were to return 127.0.0.2 for the "don't do that" listings for IP addresses. That would not match those, instead matching solely the 127.0.1.2 "real" listings. Since then, though, Spamhaus have instead gone with using 127.0.1.255, which is in the 127.0.1.x space, rendering that attempt futile. We could therefore now just use 2 as the mask for the URIBL_DBL rule, with the same effect and causing less confusion. devs, thoughts?
> Regarding the IP mask -- 0x7f000100 was chosen since initial plans were to > return 127.0.0.2 for the "don't do that" listings for IP addresses. That would > not match those, instead matching solely the 127.0.1.2 "real" listings. Since > then, though, Spamhaus have instead gone with using 127.0.1.255, which is in > the 127.0.1.x space, rendering that attempt futile. > > We could therefore now just use 2 as the mask for the URIBL_DBL rule, with the > same effect and causing less confusion. devs, thoughts? Please followups on this masking topic to Bug 6362.
Spamhaus DBL return codes / sections are aligned on human comprehensible decimal boundaries. Thus a bitmask does not have any advantage here. http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291 Instead of using a bitmask of 2, I'd rather use the expected 127.0.1.2 return code for the URIBL_DBL rule. This, unlike a bitmask, will also prevent the "IP queries prohibited!" .225 response from FP'ing.
(In reply to comment #69) > (In reply to comment #66) > > Spamhaus would like DBL to be thought of as a new type of list, so perhaps a > > new rule might be appropriate than trying to modify urirhssub. > > I considered adding a new rule keyword for this. My thoughts were that we > already have a lot of different rule types, and in reality this is the same > kind of RHSBL-lookup mechanism We already have a: urinsrhsbl,urinsrhssub vs. urifullnsrhsbl,urifullnsrhssub rules so a urifullrhsbl would fill-in a natural missing gap: urirhsbl vs. urifullrhsbl and avoid having to worry about matching zone boundaries for this type of a query.
we've lost track of status here -- this still isn't in 3.3.1.
> we've lost track of status here -- this still isn't in 3.3.1. I'd consider this bug a dependency on Bug 6362, along with a need to rewrite rule(s) according to the DBL documentation: http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291 Consider example in Bug 6362 comment 8 as a suggestion, and also Comment 38 in this bug. There is also a dependency on Bug 6361 to reduce chances on mismatched boundaries - and ideally a promise to turn these rules into urifullrhsbl eventually. I see the current stripping solution as just a stop-gap temporary solution - works most of the time, but is not according to Spamhaus specifications.
(In reply to comment #74) > > we've lost track of status here -- this still isn't in 3.3.1. > > I'd consider this bug a dependency on Bug 6362, along with a need > to rewrite rule(s) according to the DBL documentation: > http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291 those aren't yet in production, though. > Consider example in Bug 6362 comment 8 as a suggestion, and also > Comment 38 in this bug. > > There is also a dependency on Bug 6361 to reduce chances on mismatched > boundaries - and ideally a promise to turn these rules into urifullrhsbl > eventually. I see the current stripping solution as just a stop-gap > temporary solution - works most of the time, but is not according to > Spamhaus specifications. do you think the current patch needs reworking before it is suitable for release as 3.3.1?
> > according to the DBL documentation: > > http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291 > > those aren't yet in production, though. True, but the moment they are (like the PHISH and MALWARE ranges), we'd need to issue another code release, if we do not introduce subtest ranges now. > > Consider example in Bug 6362 comment 8 as a suggestion, and also > > Comment 38 in this bug. > do you think the current patch needs reworking before it is suitable for > release as 3.3.1? I think the code is alright as it is (barring a wish for enhancement), but I'd like to see rules augmented: separate phish and malware from spam, and provide a separate rule to catch the .1.255 error status, so that any future blunder will be detected soon.
> True, but the moment they are (like the PHISH and MALWARE ranges), > we'd need to issue another code release, if we do not introduce > subtest ranges now. Btw, is there a testpoint like dbltest.com for phish/malware ranges? It could facilitate testing.
(In reply to comment #76) > > > Consider example in Bug 6362 comment 8 as a suggestion, and also > > > Comment 38 in this bug. > > > do you think the current patch needs reworking before it is suitable for > > release as 3.3.1? > > I think the code is alright as it is (barring a wish for enhancement), > but I'd like to see rules augmented: separate phish and malware from > spam, and provide a separate rule to catch the .1.255 error status, > so that any future blunder will be detected soon. ok, I've done that. I haven't actually _added_ rules for phish and malware yet, as Spamhaus haven't yet started using those codes, and we can trivially add them in updates.
Created attachment 4699 [details] match r920291. : 114...; svn commit -m "bug 6335: rename URIBL_DBL to URIBL_DBL_SPAM, to make way for future DBL subtypes like phish or malware; use new IP-address style lookup result; add URIBL_DBL_ERROR to detect lookups of IP addresses due to SA bugs" Sending rules/25_uribl.cf Sending rules/50_scores.cf Transmitting file data .. Committed revision 920291. that's in trunk. patch attached.
> Created an attachment (id=4699) > match r920291. > that's in trunk. patch attached. +1
Btw, I have reviewed 10 days worth of our amavisd log, and I'm very pleased with the quality of the URIBL_DBL (now URIBL_DBL_SPAM) rule. Checked all messages that passed as ham and had a hit on DBL, and apart from mail of our own discussion on the topic with samples, and a few marginal cases, there were practically no hard false positives. I bumped our (local) score to 2.8 .
+1 for 3.3.x We're voting to backport for 3.3.1 right?
> We're voting to backport for 3.3.1 right? Indeed. Btw, when applying this patch, it may conflict with the just applied patch from Bug 6362, a hunk or two may need to be manually resolved (or do it the other way around: undo the r921800, apply this patch, and redo the backed-off change).
thanks - applied. : 312...; svn commit -m "bug 6335: add Spamhaus DBL as URIBL_DBL_SPAM rule, leaving room for future DBL subtypes like phish or malware; add URIBL_DBL_ERROR to detect lookups of IP addresses due to possible future bugs" Sending MANIFEST Sending lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm Sending rules/25_uribl.cf Sending rules/50_scores.cf Sending t/SATest.pm Sending t/data/spam/dnsbl.eml Adding t/data/spam/dnsbl_domsonly.eml Adding t/data/spam/dnsbl_ipsonly.eml Sending t/dnsbl.t Sending t/dnsbl_sc_meta.t Sending t/uribl.t Adding t/uribl_all_types.t Adding t/uribl_domains_only.t Adding t/uribl_ips_only.t Transmitting file data .............. Committed revision 921873.
Found a minor glitch in a test t/uribl_all_types.t, which shows up under perl 5.6.2 . $ perl t/uribl_all_types.t 1..3 # Running under perl version 5.006002 for freebsd # Current time local: Thu Mar 11 21:43:41 2010 # Current time GMT: Thu Mar 11 20:43:41 2010 # Using Test.pm version 1.24 /usr/local/bin/perl -T -w ../spamassassin.raw -C log/test_rules_copy --siteconfigpath log/localrules.tmp -p log/test_default.cf -t < data/spam/dnsbl.eml ok 1 Checking X_URIBL_DOMSONLY Not found: X_URIBL_DOMSONLY = X_URIBL_DOMSONLY [URIs: uribl-example-c.com] at t/uribl_all_types.t line 45. not ok 2 # Failed test 2 in t/SATest.pm at line 717 Checking X_URIBL_IPSONLY ok 3 Output can be examined in: log/d.uribl_all_types/1 Under 5.6.2 the result is: pts rule name description ---- ---------------------- -------------------------------------------------- 1.0 X_URIBL_IPSONLY X_URIBL_IPSONLY [URIs: 144.137.3.98] 5.0 TEST_INVALID_DATE Invalid Date: header (not RFC 2822) 0.0 T_FSL_HELO_NON_FQDN_2 T_FSL_HELO_NON_FQDN_2 0.0 T_LAZY_LISTWASHING Lazy spammer, painfully obvious bogus addresses 1.0 X_URIBL_DOMSONLY X_URIBL_DOMSONLY [URIs: uribl-example-b.com] 0.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date 0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL 5.0 TEST_NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL 0.0 T_TO_NO_BRKTS_NORDNS T_TO_NO_BRKTS_NORDNSl network by a host with no rDNS: under 5.10.1 the result is: pts rule name description ---- ---------------------- -------------------------------------------------- 1.0 X_URIBL_IPSONLY X_URIBL_IPSONLY [URIs: 144.137.3.98] 1.0 X_URIBL_DOMSONLY X_URIBL_DOMSONLY [URIs: uribl-example-c.com] 5.0 TEST_INVALID_DATE Invalid Date: header (not RFC 2822) 0.0 T_FSL_HELO_NON_FQDN_2 T_FSL_HELO_NON_FQDN_2 0.0 T_LAZY_LISTWASHING Lazy spammer, painfully obvious bogus addresses 0.0 DATE_IN_FUTURE_06_12 Date: is 6 to 12 hours after Received: date 0.0 NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL 5.0 TEST_NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL 1.3 RDNS_NONE Delivered to internal network by a host with no rDNS 0.0 T_TO_NO_BRKTS_NORDNS T_TO_NO_BRKTS_NORDNS Apart from reordered results, the key difference is 'b' vs 'c' in: [URIs: uribl-example-b.com] vs.: [URIs: uribl-example-c.com] As there are both links in the sample message: me too: http://uribl-example-b.com/ me too: http://uribl-example-c.com/ either hit should do to fulfill the %patterns requirement. Don't know why a different one is chosen under 5.6.2, but is not incorrect, so the test should probably be able to cope with either.
Btw, the reason for a different behaviour may not be with perl 5.6. Could just well be due to running the test on a virtual host behind a NAT, or some other environmental reason.
Created attachment 4710 [details] fix for nondeterministic test issue ok, here's a fix for that. : 172...; svn commit -m "bug 6335: avoid non-deterministic test issue" t/uribl_all_types.t Sending t/uribl_all_types.t Transmitting file data . Committed revision 922263. Do you think this needs to make it into 3.3.1? Note that those tests are not run by default (they require both "run_net_tests" and "run_long_tests" to be turned on.)
> Committed revision 922263. > > Do you think this needs to make it into 3.3.1? Note that those tests are not > run by default (they require both "run_net_tests" and "run_long_tests" to be > turned on.) Yes, I think it should make it into 3.3.1: - it's a trivial fix, can't hurt even when not needed; - as we do not yet know what is a cause behind this nondeterminism, it's better to play safe and not require something which may not be relied upon
3.3: Bug 6335: avoid non-deterministic test issue Sending t/uribl_all_types.t Committed revision 922695.
re-closing
Actually doing comment 90. :)