Bug 6335 - [review] add domain-only and IP-only lookups for URIDNSBL, for Spamhaus DBL
Summary: [review] add domain-only and IP-only lookups for URIDNSBL, for Spamhaus DBL
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins (show other bugs)
Version: 3.3.0
Hardware: Other All
: P1 enhancement
Target Milestone: 3.3.1
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: ready for commit
Keywords:
Depends on: 6362
Blocks:
  Show dependency tree
 
Reported: 2010-02-15 16:15 UTC by Justin Mason
Modified: 2010-03-14 14:39 UTC (History)
9 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
first draft patch None Justin Mason [HasCLA]
second draft patch None Justin Mason [HasCLA]
with a score patch None Justin Mason [HasCLA]
redo for 3.3.1 patch None Justin Mason [HasCLA]
third time lucky patch None Justin Mason [HasCLA]
additional fix patch None Justin Mason [HasCLA]
redone patch None Justin Mason [HasCLA]
match r918549 patch None Justin Mason [HasCLA]
match r918599 patch None Justin Mason [HasCLA]
match r918602 patch None Justin Mason [HasCLA]
match r920291. patch None Justin Mason [HasCLA]
fix for nondeterministic test issue patch None Justin Mason [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Justin Mason 2010-02-15 16:15:21 UTC
from Spamhaus:

At MAAWG in a few days time Spamhaus will release the Domain Block List (DBL), a URI blocklist along the lines of SURBL/URIBL. The zone is currently up and running on our public mirrors (just not announced yet) as: dbl.spamhaus.org

Obviously it's designed primarily for use in SpamAssassin so we'd like you guys to implement URIBL_DBL as quickly as possible :) however there is a technical difference to SURBL/URIBL in that the DBL only lists domains (domain.tld) and not IP addresses. This difference is technically important because we would like to actually prevent people making IP address queries to the DBL.

Spamhaus's public DNSBL serving infrastructure is always under very heavy load so we do what we can to cut unnecessary traffic. A lot of users wrongly set their MTAs to query non-existent zones such as "rbl.spamhaus.org" or "bl.spamhaus.org", so we correct that quickly by using this DNS return to stop this sort of traffic before it can get going:

1.2.3.4.rbl.spamhaus.org IN A 127.0.0.2
1.2.3.4.rbl.spamhaus.org IN TXT "SPAMHAUS BLOCKLIST ADDRESS IS WRONG MUST FIX"

We would like to do the same for the DBL zone, which -if the query was for an IP address- would produce this return:

1.2.3.4.dbl.spamhaus.org IN A 127.0.0.2
1.2.3.4.dbl.spamhaus.org IN TXT "SPAMHAUS DOMAIN BLOCKLIST WRONGLY USED AS AN IP BLOCKLIST, MUST FIX"

SpamAssassin at least for URIBL makes both name and IP queries, so we don't know if SpamAssassin will by default try to query IP addresses against DBL too. Can you tell us if it will or not?

What we would like is for Spam filters that analyze URIs to decide if the hostname is a domain name or an IP address, and query the appropriate DNSBL list, in our case that would be:

Domain name: URIBL_DBL
IP address: URIBL_SBL
Comment 1 Justin Mason 2010-02-15 16:17:08 UTC
[current SA code] will definitely query IPs as well, in order to deal with IP-addressed URLs.

My thoughts are that while it'll take new code, we'd have no problem
doing this in 3.3.x and trunk -- it'll give people a good reason to
upgrade ;)

I'm thinking of adding a new tflags keyword for 2
new lookup types: look up only domains, or look up only IPs, then
adding new rules to do those.
Comment 2 Kevin Golding 2010-02-16 09:31:30 UTC
If adding a tflag to only check domains is it also sensible/viable to add one for the util_rb_2tld and util_rb_3tld queries also?  I know Jeff has spoken out about their usage in the past (not sure his current opinions however) and based on those comments I presume SpamHaus would rather not deal with *.blogspot.com et al queries.

That way you could query everything at URIBL, IPs and anything with a registrar at SURBL, and registrar only domains at DBL.
Comment 3 Justin Mason 2010-02-16 10:15:26 UTC
(In reply to comment #2)
> If adding a tflag to only check domains is it also sensible/viable to add one
> for the util_rb_2tld and util_rb_3tld queries also?  I know Jeff has spoken out
> about their usage in the past (not sure his current opinions however) and based
> on those comments I presume SpamHaus would rather not deal with *.blogspot.com
> et al queries.

that's not a bad idea.  Let's see how easy it is....
Comment 4 Raymond Dijkxhoorn 2010-02-16 22:55:57 UTC
(In reply to comment #2)
> If adding a tflag to only check domains is it also sensible/viable to add one
> for the util_rb_2tld and util_rb_3tld queries also?  I know Jeff has spoken out
> about their usage in the past (not sure his current opinions however) and based
> on those comments I presume SpamHaus would rather not deal with *.blogspot.com
> et al queries.
> That way you could query everything at URIBL, IPs and anything with a registrar
> at SURBL, and registrar only domains at DBL.

SURBL might add support for checking 2/3/4 TLD soon. 
To catch up with the abused freehosters. 

So util_rb_3tld and so on will be used most likely pretty soon within SURBL.

Raymond Dijkxhoorn - SURBL
Comment 5 Karsten Bräckelmann 2010-02-17 10:00:39 UTC
That's good news. :)

Spamhaus DBL does not require stripping either. Thus the additional tflags mentioned in comment 2 is not needed.
Comment 6 Justin Mason 2010-02-18 22:54:43 UTC
Created attachment 4672 [details]
first draft

this first draft adds a test to ensure we are implementing "tflags domains_only", and then does it.  It doesn't yet do the opposite (IPs only), add the rules, the additional domain rules, etc.
Comment 7 Jeff Chan 2010-02-19 12:26:18 UTC
For reference, SURBLs have IPs and domains because URIs have IPs and domains. 

As Raymond mentions, SURBL will be adding commonly abused freehosts at the third and fourth level probably next week.  On our side, we will add the second and third level domains to our two-level-tlds file and a new three-level-tlds file.  We will announce and document the changes on the SURBL site and announcement list.
Comment 8 Jeff Chan 2010-02-26 09:27:52 UTC
As previously hinted, SURBL has added a three level tld list, and also added some frequently abused large web hosts to the two and three level tld lists.  Please see the announcement at:

  http://lists.surbl.org/pipermail/announce/2010-February/000202.html

It may be beneficial for SpamAssassin TLD data and TLD handling code to be updated accordingly.  Some of the code may already be in place, for example for handling:

  http://rulesemporium.com/rules/90_3tld.cf

Some of these issues may be appropriate for new tickets.  SA folks, please follow up as needed.
Comment 9 AXB 2010-02-26 09:38:09 UTC
(In reply to comment #8)
> As previously hinted, SURBL has added a three level tld list, and also added
> some frequently abused large web hosts to the two and three level tld lists. 
> Please see the announcement at:
> 
>   http://lists.surbl.org/pipermail/announce/2010-February/000202.html
> 
> It may be beneficial for SpamAssassin TLD data and TLD handling code to be
> updated accordingly.  Some of the code may already be in place, for example for
> handling:
> 
>   http://rulesemporium.com/rules/90_3tld.cf
> 
> Some of these issues may be appropriate for new tickets.  SA folks, please
> follow up as needed.

For simplicity pls consider http://www.rulesemporium.com/rules/90_2tld.cf which contains "extra" both 2ltld and 3ltld (incl. SA version check)
Comment 10 Justin Mason 2010-02-26 22:51:08 UTC
Created attachment 4677 [details]
second draft

getting there.  There's a possibility the URIBL_DBL rule may even work if we can figure out a listed URI ;)

"domains_only" and "ips_only" tflags implemented, tested and doc'ed.
Comment 11 Justin Mason 2010-03-01 01:00:00 UTC
(In reply to comment #10)
> Created an attachment (id=4677) [details]
> second draft
> 
> getting there.  There's a possibility the URIBL_DBL rule may even work if we
> can figure out a listed URI ;)
> 
Mar  1 00:59:11.407 [1098] dbg: check: tests=DATE_IN_FUTURE_06_12,RDNS_NONE,T_FSL_HELO_NON_FQDN_2,T_LAZY_LISTWASHING,T_TO_NO_BRKTS_NORDNS,URIBL_DBL

so looks good.
Comment 12 Justin Mason 2010-03-01 10:37:34 UTC
hmm.  actually, we need to pick a score, so it's not quite review-ready just yet.  I'll commit it to trunk anyway.

: 11...; svn commit -m "bug 6335: add support for 'tflags ips_only' and 'tflags domains_only', to control URIDNSBL lookup behaviour on a rule-by-rule basis; add URIBL_DBL rule for Spamhaus DBL, http://www.spamhaus.org/dbl/"
Sending        MANIFEST
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        rules/25_uribl.cf
Sending        t/SATest.pm
Sending        t/data/spam/dnsbl.eml
Adding         t/data/spam/dnsbl_domsonly.eml
Adding         t/data/spam/dnsbl_ipsonly.eml
Sending        t/dnsbl.t
Sending        t/dnsbl_sc_meta.t
Sending        t/uribl.t
Adding         t/uribl_all_types.t
Adding         t/uribl_domains_only.t
Adding         t/uribl_ips_only.t
Transmitting file data .............
Committed revision 917454.
Comment 13 Justin Mason 2010-03-01 16:01:04 UTC
(In reply to comment #12)
> hmm.  actually, we need to pick a score, so it's not quite review-ready just
> yet.

OK, I propose giving it 1.7 points, comparable to:

score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2
score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2
Comment 14 AXB 2010-03-01 16:12:02 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > hmm.  actually, we need to pick a score, so it's not quite review-ready just
> > yet.
> 
> OK, I propose giving it 1.7 points, comparable to:
> 
> score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2
> score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2

(In reply to comment #13)
> (In reply to comment #12)
> > hmm.  actually, we need to pick a score, so it's not quite review-ready just
> > yet.
> 
> OK, I propose giving it 1.7 points, comparable to:
> 
> score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2
> score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2

+1
Comment 15 Justin Mason 2010-03-01 23:06:37 UTC
Created attachment 4682 [details]
with a score

OK, here's something suitable for 3.3.1 -- please vote...
Comment 16 Justin Mason 2010-03-01 23:09:17 UTC
Created attachment 4683 [details]
redo for 3.3.1

oops.  That patch was missing some files due to overlooking some "svn add"s :(  redo, please vote.
Comment 17 Justin Mason 2010-03-01 23:12:04 UTC
Created attachment 4684 [details]
third time lucky

ugh.  out of disk error meant that patch is incomplete!  try this one...
Comment 18 Jeff Chan 2010-03-02 09:28:52 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > hmm.  actually, we need to pick a score, so it's not quite review-ready just
> > yet.
> 
> OK, I propose giving it 1.7 points, comparable to:
> 
> score URIBL_BLACK 0 1.775 0 1.725 # n=0 n=2
> score URIBL_WS_SURBL 0 1.659 0 1.608 # n=0 n=2

1.7 score could be ok.  DBL may not catch as much as SURBL, but may have lower FP rates.
Comment 19 Mark Martinec 2010-03-02 16:58:12 UTC
> Created an attachment (id=4684) [details]
> third time lucky

Is trunk now in sync with the proposed patch?
Comment 20 Justin Mason 2010-03-02 17:46:33 UTC
oops.  good catch Mark, it was still at "second draft".

: 11...; svn commit -m "bug 6335: sync up trunk with latest patch, id 4684, for URIBL_DBL rule" rules/50_scores.cf 
Sending        rules/50_scores.cf
Transmitting file data .
Committed revision 918126.
Comment 21 Kevin A. McGrail 2010-03-02 19:02:35 UTC
+1 for third time lucky patch though I am NOT using the DBL yet.  Do we have the ability to test this on live systems yet?  I'll likely want to do that before giving a +1 to 3.3.1.
Comment 22 Mark Martinec 2010-03-02 19:28:08 UTC
> Do we have the ability to test this on live systems yet?
> I'll likely want to do that before giving a +1 to 3.3.1.

Seems it is working, using the current trunk - I've been using
it for the last hour. Somehow the results are not impressive,
I get about as many hits on spam as on ham (I don't claim all
'ham' is truly ham, but at least 2/3 of it should be).

Even the last mail from Kevin <4B8D63D1.3020609@PCCC.com>
hit URIBL_DBL. I had to reduce it score to 0.3 .
Comment 23 Mark Martinec 2010-03-02 19:36:06 UTC
Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the time:

dbg: async: completed in 0.365 s: URI-DNSBL, DNSBL:dbl.spamhaus.org.:129.206.32.114
dbg: uridnsbl: domain "delphij.net" listed (URIBL_DBL): 127.0.1.255

Somehow the query is an IP address.
Comment 24 Kevin Golding 2010-03-02 20:03:09 UTC
(In reply to comment #23)
> Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the
> time:
> 
> dbg: async: completed in 0.365 s: URI-DNSBL,
> DNSBL:dbl.spamhaus.org.:129.206.32.114
> dbg: uridnsbl: domain "delphij.net" listed (URIBL_DBL): 127.0.1.255
> 
> Somehow the query is an IP address.

I saw that too.  I was wondering if I'd manage to fluff it somehow and zeroed the score for checking later but I guess I'll save the effort now.  Querying directly gave NXDOMAIN on the domains as expected.
Comment 25 Paul Fisher 2010-03-02 20:08:48 UTC
(In reply to comment #23)
> Sure enough, we are geting 127.0.1.255 "IP queries prohibited!" most of the
> time:

Even after the issue causing this particular problem is fixed, as a general protection against SpamAssassin accidentally sending data that the DBL thinks is an IP address (correctly or otherwise), URIBL_DBL should never fire if the DBL returns 127.0.1.255.

Since the DBL is fundamentally represented as a dnset via rbldnsd, the DBL's view of what is and is not an IP address is not perfect.

For example, the DBL thinks that "example.com.1" is an IP address.

$ host example.com.1.dbl.spamhaus.org
example.com.1.dbl.spamhaus.org has address 127.0.1.255

For our local Exim MTA-level/DBL integration that we're using in production at Berkeley, we make sure we have a sane looking TLD before sending a query off to the DBL, and additionally, we ignore any 127.0.1.255 return codes.
Comment 26 Justin Mason 2010-03-02 22:07:01 UTC
I think I've got it -- the URIBL_SBL-style 3-stage NS->A->DNSBL lookups are triggering lookups on IPs in DBL after the A resolution.
Comment 27 Justin Mason 2010-03-02 22:21:45 UTC
Created attachment 4691 [details]
additional fix

this applies on top of "third time lucky".  I think it'll fix the issue.  please test!
Comment 28 Kevin Golding 2010-03-03 10:19:10 UTC
Patch applied, run some test messages through:

- I appear to be querying the DBL.

- It would appear not to report avg.com as listed now.

Appears to be a +1 here.
Comment 29 Mark Martinec 2010-03-03 11:15:56 UTC
(In reply to comment #27)
> Created an attachment (id=4691) [details]
> additional fix
> 
> this applies on top of "third time lucky".  I think it'll fix the issue. 
> please test!

I applied this patch to my current trunk copy (I hope I understood
correctly that the rest is already rolled in), with the latest sa-update
which does include the URIBL_DBL rule:

uridnssub       URIBL_DBL       dbl.spamhaus.org.       A   2130706688
body            URIBL_DBL       eval:check_uridnsbl('URIBL_DBL')
describe        URIBL_DBL       Contains an URL listed in the DBL blocklist
tflags          URIBL_DBL       net domains_only
score URIBL_DBL 0 1.7 0 1.7

But I'm not getting any queries sent out to dbl.spamhaus.org
at all (only queries to multi.surbl.org, multi.uribl.com,
dob.sibl.support-intelligence.net)

Using a testpoint: http://dbltest.com/ in a mail body.
Comment 30 AXB 2010-03-03 11:25:05 UTC
(In reply to comment #29)
> (In reply to comment #27)
> > Created an attachment (id=4691) [details] [details]
> > additional fix
> > 
> > this applies on top of "third time lucky".  I think it'll fix the issue. 
> > please test!
> 
> I applied this patch to my current trunk copy (I hope I understood
> correctly that the rest is already rolled in), with the latest sa-update
> which does include the URIBL_DBL rule:
> 
> uridnssub       URIBL_DBL       dbl.spamhaus.org.       A   2130706688
> body            URIBL_DBL       eval:check_uridnsbl('URIBL_DBL')
> describe        URIBL_DBL       Contains an URL listed in the DBL blocklist
> tflags          URIBL_DBL       net domains_only
> score URIBL_DBL 0 1.7 0 1.7
> 
> But I'm not getting any queries sent out to dbl.spamhaus.org
> at all (only queries to multi.surbl.org, multi.uribl.com,
> dob.sibl.support-intelligence.net)
> 
> Using a testpoint: http://dbltest.com/ in a mail body.

uridnssub       URIBL_DBL       dbl.spamhaus.org.       A   2130706688 ???

what is 2130706688 ?
Comment 31 Mark Martinec 2010-03-03 11:31:18 UTC
> uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688
> 
> what is 2130706688 ?

Decimal for 0x7f000100, i.e. 127.0.1.0 mask.

man Mail::SpamAssassin::Plugin::URIDNSBL

The mask is poorly documented in that POD, the text does not
explain what does it actually do, and whether ones or zeroes
have the desired meaning (the classical Cisco dilema).
Comment 32 Justin Mason 2010-03-03 12:09:36 UTC
hmm.  Karsten found a bug:

'I think I might have found a corner-case problem with the patch (in
trunk) for bug 6335. The problem appears to be, that $dnsbl_lookup_ips
and $is_ip are not independent.

+  my $cf = $scanner->{uridnsbl_active_rules_revipbl};
+  my $dnsbl_lookup_ips = 0;
+  foreach my $rulename (keys %{$cf}) {
+    if ($tflags->{$rulename} !~ /\bdomains_only\b/) {
+      $dnsbl_lookup_ips++;

$dnsbl_lookup_ips == 0  IFF *all* $tflags->{$rulename} *do*
match /domains_only/ (assumption (1)).

Due to

+  if ($dnsbl_lookup_ips && $dom =~ /^\d+\.\d+\.\d+\.\d+$/) {

$is_ip then also is 0, even if $dom indeed *is* an IP.

+      next if ($is_ip && $tflags->{$rulename} =~ /\bdomains_only\b/);

This test later on then fails to skip, because $is_ip is zero. Again,
even in the case of $dom actually being an IP. And the respective tflags
set to domains_only, as per the corner-case assumption of *all* such
rules having that tflag set.'
Comment 33 Justin Mason 2010-03-03 14:07:33 UTC
(In reply to comment #29)
> But I'm not getting any queries sent out to dbl.spamhaus.org
> at all (only queries to multi.surbl.org, multi.uribl.com,
> dob.sibl.support-intelligence.net)

I can confirm that. :(   needs more work.  I also need to fix guenther's issue too.
Comment 34 Justin Mason 2010-03-03 14:27:24 UTC
Created attachment 4692 [details]
redone

OK -- I've addressed guenther's issue, and cleaned up that conditional code in a more consistent way with the existing code.  Also, the reason the rule was now not hitting was because I used the wrong rule type in the first place for URIBL_DBL!  doh.  using "urirhssub" now seems to be working with a test msg here.


applied to trunk: 
: 9...; svn commit -m "bug 6335: address guenther's issue, clean up ips_only/domains_only in a more consistent way, and fix case where URIBL_DBL was not hitting with new code due to use of the wrong rule type"
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        rules/25_uribl.cf
Transmitting file data ..
Committed revision 918483.

the attached patch is against vanilla 3.3, so doesn't depend on applying the earlier patches -- they're now all obsoleted.
Comment 35 Jeff Chan 2010-03-03 15:08:24 UTC
(In reply to comment #31)
> > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688
> > 
> > what is 2130706688 ?
> 
> Decimal for 0x7f000100, i.e. 127.0.1.0 mask.
> 
> man Mail::SpamAssassin::Plugin::URIDNSBL
> 
> The mask is poorly documented in that POD, the text does not
> explain what does it actually do, and whether ones or zeroes
> have the desired meaning (the classical Cisco dilema).

It would be clearer if the number could be specified as an IP address.
Comment 36 Karsten Bräckelmann 2010-03-03 15:16:51 UTC
(In reply to comment #35)
> > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688
> 
> It would be clearer if the number could be specified as an IP address.

IP addresses are exact match only. ANDed bitmask is an option. REs not supported here.

That number is (almost) identical to 256 anyway. Yes, the AND bitmask is pretty fuzzy. :/
Comment 37 Jeff Chan 2010-03-03 15:20:12 UTC
(In reply to comment #36)
> (In reply to comment #35)
> > > > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688
> > 
> > It would be clearer if the number could be specified as an IP address.
> 
> IP addresses are exact match only. ANDed bitmask is an option. REs not
> supported here.
> 
> That number is (almost) identical to 256 anyway. Yes, the AND bitmask is pretty
> fuzzy. :/

Bitmask would be clearer than hex or decimal number.
Comment 38 Mark Martinec 2010-03-03 15:23:47 UTC
> Committed revision 918483.

Very good, I'm finally getting useful results.

Btw, following my offline communication with João Gouveia, it may be a
good idea to add another rule (e.g. URIBL_DBL_MISUSE, with low/symbolic
score), matching a 127.0.1.255 - to make it obvious when something goes
wrong with our IP classification logic or Spamhaus wildcarding.
Comment 39 Karsten Bräckelmann 2010-03-03 15:28:55 UTC
(In reply to comment #37)
> Bitmask would be clearer than hex or decimal number.

It *is* a bitmask. 32 bit integer, decimal representation.

Hmm, this is Perl. It should accept a 0x hex number, too. complete_dnsbl_lookup() doesn't though, unless $subtest has been converted to /\d+/ behind the scenes at that point.
Comment 40 Jeff Chan 2010-03-03 15:39:37 UTC
(In reply to comment #39)
> (In reply to comment #37)
> > Bitmask would be clearer than hex or decimal number.
> 
> It *is* a bitmask. 32 bit integer, decimal representation.
> 
> Hmm, this is Perl. It should accept a 0x hex number, too.
> complete_dnsbl_lookup() doesn't though, unless $subtest has been converted to
> /\d+/ behind the scenes at that point.

An IP address octet style bitmask would be clearer.  It would look more like an IP address or IP address netmask than a raw number does.
Comment 41 Jeff Chan 2010-03-03 15:41:22 UTC
(In reply to comment #31)
> > uridnssub URIBL_DBL dbl.spamhaus.org. A 2130706688
> > 
> > what is 2130706688 ?
> 
> Decimal for 0x7f000100, i.e. 127.0.1.0 mask.
> 
> man Mail::SpamAssassin::Plugin::URIDNSBL
> 
> The mask is poorly documented in that POD, the text does not
> explain what does it actually do, and whether ones or zeroes
> have the desired meaning (the classical Cisco dilema).

I presume a netmask for the above would be 128.255.254.255 which looks much more reasonable than decimal or hex notation.
Comment 42 Karsten Bräckelmann 2010-03-03 15:43:53 UTC
(In reply to comment #40)
> An IP address octet style bitmask would be clearer.  It would look more like an
> IP address or IP address netmask than a raw number does.

Yup. And the AND bitmask is too fuzzy anyway.  But both these are different, unrelated issues. Feel free to file a bug and link into this discussion.
Comment 43 Justin Mason 2010-03-03 16:27:24 UTC
Created attachment 4693 [details]
match r918549

Taking the chance to address guenther's concerns about the inefficiency of the unnecessary "foreach" loop, and a little extra code/doco cleanup while I was at it.  This also includes a test improvement too to exercise more edge cases.
Comment 44 Karsten Bräckelmann 2010-03-03 16:43:08 UTC
Sorry Justin, I hate to be a bitch like this... ;)

+  my ($is_ip, my $single_dnsbl);
Comment 45 Karsten Bräckelmann 2010-03-03 16:51:13 UTC
$ svn diff
Index: URIDNSBL.pm
===================================================================
--- URIDNSBL.pm	(revision 918562)
+++ URIDNSBL.pm	(working copy)
@@ -664,7 +664,7 @@
   my $tflags = $scanner->{conf}->{tflags};
   my $cf = $scanner->{uridnsbl_active_rules_revipbl};
 
-  my ($is_ip, my $single_dnsbl);
+  my ($is_ip, $single_dnsbl);
   if ($dom =~ /^\d+\.\d+\.\d+\.\d+$/) {
     my $IPV4_ADDRESS = IPV4_ADDRESS;
     my $IP_PRIVATE = IP_PRIVATE;

Sending        Plugin/URIDNSBL.pm
Transmitting file data .
Committed revision 918565.
Comment 46 Mark Martinec 2010-03-03 17:02:26 UTC
In the last two hours or so I got 1900 hits on URIBL_DBL for spam,
and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).
Good!
Comment 47 Mark Martinec 2010-03-03 17:09:07 UTC
> and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).

Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
and only sends out queries for 'co.cc' ???
Comment 48 Karsten Bräckelmann 2010-03-03 17:22:01 UTC
(In reply to comment #47)
> > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).

$ host co.cc.dbl.spamhaus.org
co.cc.dbl.spamhaus.org has address 127.0.1.2

Hmm. :/

> Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
> and only sends out queries for 'co.cc' ???

[guenther@monkey trunk]$ grep -rl co.cc .

Doesn't appear to be listed as a second-level TLD, so that would be normal.
Comment 49 Paul Fisher 2010-03-03 17:44:48 UTC
I've filed bug 6363.  This work has impacted production 3.3.0 installs -- URIBL_DBL was pushed to the sa-update channel and is misfiring since the 3.3.1 changes are not in place and URIBL_DBL treats 127.0.1.255 as a valid return code from the DBL.
Comment 50 Justin Mason 2010-03-03 17:45:09 UTC
Created attachment 4695 [details]
match r918599

include guenther's typo fix
Comment 51 Justin Mason 2010-03-03 18:07:18 UTC
Created attachment 4696 [details]
match r918602

update with "if can()" fix
Comment 52 Mark Martinec 2010-03-03 18:41:02 UTC
> > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).
> Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
> and only sends out queries for 'co.cc' ???

Isn's dbl.spamhaus.org supposed to receive a full domain name, and do
its own stripping/wildcarding? I could imagine for example that dbl
would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.
Comment 53 Karsten Bräckelmann 2010-03-03 19:06:29 UTC
(In reply to comment #52)
> > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).
> > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
> > and only sends out queries for 'co.cc' ???
> 
> Isn's dbl.spamhaus.org supposed to receive a full domain name, and do
> its own stripping/wildcarding? I could imagine for example that dbl

Yes. But that would require a *lot* more code and a whole new type, if we want to do that for DBL. Either way, I don't really see why they would list a 2tld, or a free sub-domain hoster, but not some of their hosted stuff.

> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.

co.cc.dbl.spamhaus.org has address 127.0.1.2
www.co.cc.dbl.spamhaus.org has address 127.0.1.2

Check out what that is. A free sub-domain hoster. *sigh*

foo.co.cc.dbl.spamhaus.org has address 127.0.1.2

Does respond, but is free for grabs.

osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2
www.osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2

And they do list them...
Comment 54 Mark Martinec 2010-03-03 20:02:14 UTC
>> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc

> Check out what that is. A free sub-domain hoster. *sigh*

I know, but that does not mean they can't be hosting a couple
of legitimate non-spam domains too, which could be excluded
from a *.co.cc wildcard.

> And they do list them...

Yes, which is a pity, and a false positive for the osvisnet project.
Comment 55 Karsten Bräckelmann 2010-03-03 20:11:29 UTC
Agreed, it is a FP with DBL.

DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com. This is the same situation with a different sub-domain hoster.
Comment 56 Bill Landry 2010-03-03 20:54:05 UTC
I ran sa-update last night and have been monitoring DBL hits since then.  Although it seems to be running fine, I'm finding that it randomly reports the wrong domain in the test result headers.  For example:

3.5 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blocklist
    [URIs: scridest.com]
3.5 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    [URIs: scridest.com]
3.5 URIBL_DBL Contains an URL listed in the DBL blocklist
    [URIs: w.interia.pl]

However, when I test the domain being reported by the DBL header manually, immediately after receiving the message, I get:

host w.interia.pl.dbl.spamhaus.org
Host w.interia.pl.dbl.spamhaus.org not found: 3(NXDOMAIN)

But testing the domain that's reported in the headers by URIBL_BLACK or SURBL or other URIBL, I get:

host scridest.com.dbl.spamhaus.org
scridest.com.dbl.spamhaus.org has address 127.0.1.2

I know this is just a cosmetic issue, but would be nice to have this fixed.

Bill
Comment 57 Karsten Bräckelmann 2010-03-03 21:02:53 UTC
(In reply to comment #56)
> I ran sa-update last night and have been monitoring DBL hits since then. 
> Although it seems to be running fine, I'm finding that it randomly reports the
> wrong domain in the test result headers.  For example:

Last night?

That's a side-effect of the bugs in the first cuts of the code. See today's comments and the code changes.

I believe what you are seeing are FP hits due to the code, that has been corrected since. The reported domain just happens to have been checked first against DBL.

Please update from SVN, and see if you still can reproduce this.
Comment 58 Bill Landry 2010-03-03 21:27:30 UTC
well, since this update was released via sa-update, shouldn't the fix be release there as well, especially since that is how most people update their SA installations?

And since my SA install was done via yum:

rpm -q spamassassin
spamassassin-3.3.0-5.fc12.i686

I'd rather not do an SVN update over the top of my current running RPM.

When is this fix planned to be available via sa-update?

Bill
Comment 59 Karsten Bräckelmann 2010-03-03 21:36:24 UTC
(In reply to comment #58)
> well, since this update was released via sa-update, shouldn't the fix be
> release there as well, especially since that is how most people update their SA
> installations?

See comment 49 and bug 6363.

> I'd rather not do an SVN update over the top of my current running RPM.

Yes. Sorry, I assumed you're running SVN...

> When is this fix planned to be available via sa-update?

Bug 6363 comment 4. Update has been done, should already be propagated to mirrors and DNS.

Meanwhile, add  score URIBL_DBL 0  to local.cf, until sa-update corrected this.
Comment 60 Mark Martinec 2010-03-03 23:54:23 UTC
>> Isn's dbl.spamhaus.org supposed to receive a full domain name, and do
>> its own stripping/wildcarding? I could imagine for example that dbl
>> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.
>
> Yes. But that would require a *lot* more code and a whole new type,
> if we want to do that for DBL.

How much is 'a *lot* more code'?

> Either way, I don't really see why they would list a 2tld, or a free
> sub-domain hoster, but not some of their hosted stuff.

For precisely the cases like the one I stumbled across. Because some
of their subdomains can be whitehats. The DNS technology covers such
cases just fine (a wildcard, along with more specific exceptions).

> DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com.
> This is the same situation with a different sub-domain hoster.

Let's modify this example a bit: spammer.live.com blacklisted, but
live.com not. And our plugin strips out the first label, querying for
live.com given an URL http://spammer.live.com/, and we miss a hit.

What I'm driving at is - it doesn't take much stretching to come across
cases of FP or FN due to the fact that URIDNSBL plugin can easily
split/strip domain names differently than DBL does.
Comment 61 Karsten Bräckelmann 2010-03-04 00:26:21 UTC
(In reply to comment #60)
> >> Isn's dbl.spamhaus.org supposed to receive a full domain name, and do
> >> its own stripping/wildcarding? I could imagine for example that dbl
> >> would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.
> >
> > Yes. But that would require a *lot* more code and a whole new type,
> > if we want to do that for DBL.
> 
> How much is 'a *lot* more code'?

A new URIDNSBL::uri* rule? And most likely some background code, since the existing PMS::get_uri_list() and PMS::get_uri_detail_list() do quite a bit of munging, cleaning, and adding variants that's undesired here. Definitely do not call Util::uri_list_canonify(), Util::uri_to_domain() and friends, which strip on the RegistrarBoundaries.

> > Either way, I don't really see why they would list a 2tld, or a free
> > sub-domain hoster, but not some of their hosted stuff.
> 
> For precisely the cases like the one I stumbled across. Because some
> of their subdomains can be whitehats. The DNS technology covers such
> cases just fine (a wildcard, along with more specific exceptions).

A blackhat that dark that it should be blacklisted. With whitehat customers as sub-domains...

Note that you did not find a case like you just described. I have not found any co.cc sub-domain that was not blacklisted by DBL. Dunno if they would, honestly. All we've seen is *.co.cc blacklisted.

> > DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com.
> > This is the same situation with a different sub-domain hoster.
> 
> Let's modify this example a bit: spammer.live.com blacklisted, but
> live.com not. And our plugin strips out the first label, querying for
> live.com given an URL http://spammer.live.com/, and we miss a hit.

Using vanilla RegistrarBoundaries -- yes.

Using the third-party 90_2tld.cf -- no. The same with URIBL and SURBL.

Given the previous mud-slinging (as evidenced in this bugzilla) between these two URI DNSBLs, whether or not to list 2tld sub-domain hosters, involved traffic, etc, I don't think they would be particularly happy about RegistrarBoundaries stripped variants (including third-party settings), *plus* the unstripped ones.

The latter, in the absence of extensive, agreed-upon lists of 2tld and 3tld sub-hoster domains, is exactly what DBL wants in a case like this, though. So to not being harassed by other URI DNSBLs, we either can re-use those boundaries and settings, or come up with code and a data-structure that keeps only un-altered, un-stripped URIs -- with the path, etc again stripped though.
Comment 62 Karsten Bräckelmann 2010-03-04 00:39:35 UTC
In addition, the previously existing URI DNSBLs do NOT work the same as DBL does with respect to wildcard DNS. Just checked, spammer domain masked.

example.net.multi.uribl.com has address 127.0.0.2
Host sub.example.net.multi.uribl.com not found: 3(NXDOMAIN)

example.net.multi.surbl.org has address 127.0.0.6
Host sub.example.net.multi.surbl.org not found: 3(NXDOMAIN)

They absolutely need the RegistrarBoundaries stripping. Thus, the above definitely would require new data structures and code to not strip anything but keep the full URIs for DBL.
Comment 63 Justin Mason 2010-03-04 00:41:17 UTC
It'd effectively be a pretty large quantity of DBL-specific code, alright.
Comment 64 Karsten Bräckelmann 2010-03-04 01:01:38 UTC
Oh, and given Spamhaus chooses to actively fight useless traffic by deliberately returning FPs for IP lookups, they most likely wouldn't be happy about multiple lookups like co.cc and foo.co.cc either.

Dilemma. Strip registrar boundaries, as we used to. Or introduce code to not do that, specifically for DBL.
Comment 65 Jeff Chan 2010-03-04 10:45:54 UTC
> > > DBL lists $spammer.spaces.live.com, but they won't list spaces.live.com.
> > > This is the same situation with a different sub-domain hoster.
> > 
> > Let's modify this example a bit: spammer.live.com blacklisted, but
> > live.com not. And our plugin strips out the first label, querying for
> > live.com given an URL http://spammer.live.com/, and we miss a hit.
> 
> Using vanilla RegistrarBoundaries -- yes.
> 
> Using the third-party 90_2tld.cf -- no. The same with URIBL and SURBL.
> 
> Given the previous mud-slinging (as evidenced in this bugzilla) between these
> two URI DNSBLs, whether or not to list 2tld sub-domain hosters, involved
> traffic, etc, I don't think they would be particularly happy about
> RegistrarBoundaries stripped variants (including third-party settings), *plus*
> the unstripped ones.
> 
> The latter, in the absence of extensive, agreed-upon lists of 2tld and 3tld
> sub-hoster domains, is exactly what DBL wants in a case like this, though. So
> to not being harassed by other URI DNSBLs, we either can re-use those
> boundaries and settings, or come up with code and a data-structure that keeps
> only un-altered, un-stripped URIs -- with the path, etc again stripped though.

SURBL and URIBL have agreed on most of the 2ld and 3lds in our tld lists now.  co.cc is included in the 2ld list, which means foo.co.cc would be checked, but co.cc would not.  In other words, the right thing is done.  Thus the request to use the revised TLD data for RegistrarBoundaries:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6361
Comment 66 Jeff Chan 2010-03-04 10:56:21 UTC
(In reply to comment #64)
> Oh, and given Spamhaus chooses to actively fight useless traffic by
> deliberately returning FPs for IP lookups, they most likely wouldn't be happy
> about multiple lookups like co.cc and foo.co.cc either.
> 
> Dilemma. Strip registrar boundaries, as we used to. Or introduce code to not do
> that, specifically for DBL.

Spamhaus would like DBL to be thought of as a new type of list, so perhaps a new rule might be appropriate than trying to modify urirhssub.  Some of the differences between a URIBL and DBL are:

1.  No URI IPs in DBL.  DBL is not a URI list; it's a domain list.

2.  IPs in URIs should be checked in SBL instead of DBL.  Obviously this complicates URI handling somewhat.

3.  DBL uses wildcards, therefore domain stripping by the mail filter application is not necessary.  This simplifies the application, but it may have a negative impact on on-rbldnsd DNS caching, for example BIND forwards.

A relevant question is whether these differences are enough to justify a new rule.
Comment 67 Jeff Chan 2010-03-04 10:57:44 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).
> 
> $ host co.cc.dbl.spamhaus.org
> co.cc.dbl.spamhaus.org has address 127.0.1.2
> 
> Hmm. :/
> 
> > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
> > and only sends out queries for 'co.cc' ???
> 
> [guenther@monkey trunk]$ grep -rl co.cc .
> 
> Doesn't appear to be listed as a second-level TLD, so that would be normal.


It seems that co.cc is no longer listed in DBL:

% host co.cc.dbl.spamhaus.org
Host co.cc.dbl.spamhaus.org not found: 3(NXDOMAIN)
Comment 68 Jeff Chan 2010-03-04 10:59:59 UTC
(In reply to comment #53)
> (In reply to comment #52)
> > > > and exactly one false positive (on: co.cc, http://www.osvisnet.co.cc/).
> > > Is it normal that uridnsbl strips '*osvisnet' out of www.osvisnet.co.cc
> > > and only sends out queries for 'co.cc' ???
> > 
> > Isn's dbl.spamhaus.org supposed to receive a full domain name, and do
> > its own stripping/wildcarding? I could imagine for example that dbl
> 
> Yes. But that would require a *lot* more code and a whole new type, if we want
> to do that for DBL. Either way, I don't really see why they would list a 2tld,
> or a free sub-domain hoster, but not some of their hosted stuff.
> 
> > would NOT blacklist osvisnet.co.cc but keep blacklisting co.cc.
> 
> co.cc.dbl.spamhaus.org has address 127.0.1.2
> www.co.cc.dbl.spamhaus.org has address 127.0.1.2
> 
> Check out what that is. A free sub-domain hoster. *sigh*
> 
> foo.co.cc.dbl.spamhaus.org has address 127.0.1.2
> 
> Does respond, but is free for grabs.
> 
> osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2
> www.osvisnet.co.cc.dbl.spamhaus.org has address 127.0.1.2
> 
> And they do list them...

No longer listed it seems:

host osvisnet.co.cc.dbl.spamhaus.org
Host osvisnet.co.cc.dbl.spamhaus.org not found: 3(NXDOMAIN)
Comment 69 Justin Mason 2010-03-04 12:14:27 UTC
(In reply to comment #66)
> Spamhaus would like DBL to be thought of as a new type of list, so perhaps a
> new rule might be appropriate than trying to modify urirhssub. 

I considered adding a new rule keyword for this.  My thoughts were that we already have a lot of different rule types, and in reality this is the same kind of RHSBL-lookup mechanism -- just with a different selection of what to query.  Hence a tflag was more appropriate.


Regarding the IP mask -- 0x7f000100 was chosen since initial plans were to return 127.0.0.2 for the "don't do that" listings for IP addresses.  That would not match those, instead matching solely the 127.0.1.2 "real" listings.  Since then, though, Spamhaus have instead gone with using 127.0.1.255, which is in the 127.0.1.x space, rendering that attempt futile.

We could therefore now just use 2 as the mask for the URIBL_DBL rule, with the same effect and causing less confusion.  devs, thoughts?
Comment 70 Mark Martinec 2010-03-04 13:36:19 UTC
> Regarding the IP mask -- 0x7f000100 was chosen since initial plans were to
> return 127.0.0.2 for the "don't do that" listings for IP addresses.  That would
> not match those, instead matching solely the 127.0.1.2 "real" listings.  Since
> then, though, Spamhaus have instead gone with using 127.0.1.255, which is in
> the 127.0.1.x space, rendering that attempt futile.
> 
> We could therefore now just use 2 as the mask for the URIBL_DBL rule, with the
> same effect and causing less confusion.  devs, thoughts?

Please followups on this masking topic to Bug 6362.
Comment 71 Karsten Bräckelmann 2010-03-04 14:40:11 UTC
Spamhaus DBL return codes / sections are aligned on human comprehensible decimal boundaries. Thus a bitmask does not have any advantage here.
  http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291

Instead of using a bitmask of 2, I'd rather use the expected 127.0.1.2 return code for the URIBL_DBL rule. This, unlike a bitmask, will also prevent the "IP queries prohibited!" .225 response from FP'ing.
Comment 72 Mark Martinec 2010-03-04 16:48:32 UTC
(In reply to comment #69)
> (In reply to comment #66)
> > Spamhaus would like DBL to be thought of as a new type of list, so perhaps a
> > new rule might be appropriate than trying to modify urirhssub. 
> 
> I considered adding a new rule keyword for this.  My thoughts were that we
> already have a lot of different rule types, and in reality this is the same
> kind of RHSBL-lookup mechanism

We already have a:
  urinsrhsbl,urinsrhssub  vs.  urifullnsrhsbl,urifullnsrhssub  rules

so a urifullrhsbl would fill-in a natural missing gap:
  urirhsbl vs. urifullrhsbl

and avoid having to worry about matching zone boundaries for this type of a query.
Comment 73 Justin Mason 2010-03-05 16:02:38 UTC
we've lost track of status here -- this still isn't in 3.3.1.
Comment 74 Mark Martinec 2010-03-05 16:17:42 UTC
> we've lost track of status here -- this still isn't in 3.3.1.

I'd consider this bug a dependency on Bug 6362, along with a need
to rewrite rule(s) according to the DBL documentation:
  http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291

Consider example in Bug 6362 comment 8 as a suggestion, and also
Comment 38 in this bug.

There is also a dependency on Bug 6361 to reduce chances on mismatched
boundaries - and ideally a promise to turn these rules into urifullrhsbl
eventually. I see the current stripping solution as just a stop-gap
temporary solution - works most of the time, but is not according to
Spamhaus specifications.
Comment 75 Justin Mason 2010-03-05 16:36:32 UTC
(In reply to comment #74)
> > we've lost track of status here -- this still isn't in 3.3.1.
> 
> I'd consider this bug a dependency on Bug 6362, along with a need
> to rewrite rule(s) according to the DBL documentation:
>   http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291

those aren't yet in production, though.

> Consider example in Bug 6362 comment 8 as a suggestion, and also
> Comment 38 in this bug.
> 
> There is also a dependency on Bug 6361 to reduce chances on mismatched
> boundaries - and ideally a promise to turn these rules into urifullrhsbl
> eventually. I see the current stripping solution as just a stop-gap
> temporary solution - works most of the time, but is not according to
> Spamhaus specifications.

do you think the current patch needs reworking before it is suitable for release as 3.3.1?
Comment 76 Mark Martinec 2010-03-05 17:09:00 UTC
> > according to the DBL documentation:
> >   http://www.spamhaus.org/faq/answers.lasso?section=Spamhaus%20DBL#291
> 
> those aren't yet in production, though.

True, but the moment they are (like the PHISH and MALWARE ranges),
we'd need to issue another code release, if we do not introduce
subtest ranges now.

> > Consider example in Bug 6362 comment 8 as a suggestion, and also
> > Comment 38 in this bug.

> do you think the current patch needs reworking before it is suitable for
> release as 3.3.1?

I think the code is alright as it is (barring a wish for enhancement),
but I'd like to see rules augmented: separate phish and malware from
spam, and provide a separate rule to catch the .1.255 error status,
so that any future blunder will be detected soon.
Comment 77 Mark Martinec 2010-03-05 17:51:11 UTC
> True, but the moment they are (like the PHISH and MALWARE ranges),
> we'd need to issue another code release, if we do not introduce
> subtest ranges now.

Btw, is there a testpoint like dbltest.com for phish/malware ranges?
It could facilitate testing.
Comment 78 Justin Mason 2010-03-08 12:07:18 UTC
(In reply to comment #76)
> > > Consider example in Bug 6362 comment 8 as a suggestion, and also
> > > Comment 38 in this bug.
> 
> > do you think the current patch needs reworking before it is suitable for
> > release as 3.3.1?
> 
> I think the code is alright as it is (barring a wish for enhancement),
> but I'd like to see rules augmented: separate phish and malware from
> spam, and provide a separate rule to catch the .1.255 error status,
> so that any future blunder will be detected soon.

ok, I've done that.  I haven't actually _added_ rules for phish and malware yet, as Spamhaus haven't yet started using those codes, and we can trivially add them in updates.
Comment 79 Justin Mason 2010-03-08 12:09:08 UTC
Created attachment 4699 [details]
match r920291.

: 114...; svn commit -m "bug 6335: rename URIBL_DBL to URIBL_DBL_SPAM, to make way for future DBL subtypes like phish or malware; use new IP-address style lookup result; add URIBL_DBL_ERROR to detect lookups of IP addresses due to SA bugs"
Sending        rules/25_uribl.cf
Sending        rules/50_scores.cf
Transmitting file data ..
Committed revision 920291.

that's in trunk.  patch attached.
Comment 80 Mark Martinec 2010-03-10 02:42:30 UTC
> Created an attachment (id=4699)
> match r920291.
> that's in trunk.  patch attached.

+1
Comment 81 Mark Martinec 2010-03-10 23:51:26 UTC
Btw, I have reviewed 10 days worth of our amavisd log, and I'm very pleased
with the quality of the URIBL_DBL (now URIBL_DBL_SPAM) rule. Checked all
messages that passed as ham and had a hit on DBL, and apart from mail of our
own discussion on the topic with samples, and a few marginal cases, there
were practically no hard false positives. I bumped our (local) score to 2.8 .
Comment 82 Warren Togami 2010-03-11 03:09:11 UTC
+1 for 3.3.x

We're voting to backport for 3.3.1 right?
Comment 83 Mark Martinec 2010-03-11 11:16:22 UTC
> We're voting to backport for 3.3.1 right?

Indeed.

Btw, when applying this patch, it may conflict with the just applied
patch from Bug 6362, a hunk or two may need to be manually resolved
(or do it the other way around: undo the r921800, apply this patch,
and redo the backed-off change).
Comment 84 Justin Mason 2010-03-11 14:41:38 UTC
thanks - applied.

: 312...; svn commit -m "bug 6335: add Spamhaus DBL as URIBL_DBL_SPAM rule, leaving room for future DBL subtypes like phish or malware; add URIBL_DBL_ERROR to detect lookups of IP addresses due to possible future bugs"
Sending        MANIFEST
Sending        lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
Sending        rules/25_uribl.cf
Sending        rules/50_scores.cf
Sending        t/SATest.pm
Sending        t/data/spam/dnsbl.eml
Adding         t/data/spam/dnsbl_domsonly.eml
Adding         t/data/spam/dnsbl_ipsonly.eml
Sending        t/dnsbl.t
Sending        t/dnsbl_sc_meta.t
Sending        t/uribl.t
Adding         t/uribl_all_types.t
Adding         t/uribl_domains_only.t
Adding         t/uribl_ips_only.t
Transmitting file data ..............
Committed revision 921873.
Comment 85 Mark Martinec 2010-03-11 20:44:51 UTC
Found a minor glitch in a test t/uribl_all_types.t, which shows up
under perl 5.6.2 .


$ perl t/uribl_all_types.t
1..3
# Running under perl version 5.006002 for freebsd
# Current time local: Thu Mar 11 21:43:41 2010
# Current time GMT:   Thu Mar 11 20:43:41 2010
# Using Test.pm version 1.24
        /usr/local/bin/perl -T -w ../spamassassin.raw -C log/test_rules_copy  --siteconfigpath log/localrules.tmp -p log/test_default.cf  -t < data/spam/dnsbl.eml
ok 1
        Checking X_URIBL_DOMSONLY
        Not found: X_URIBL_DOMSONLY =  X_URIBL_DOMSONLY [URIs: uribl-example-c.com]  at t/uribl_all_types.t line 45.
not ok 2
# Failed test 2 in t/SATest.pm at line 717
        Checking X_URIBL_IPSONLY
ok 3
Output can be examined in: log/d.uribl_all_types/1




Under 5.6.2 the result is:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 X_URIBL_IPSONLY        X_URIBL_IPSONLY
                            [URIs: 144.137.3.98]
 5.0 TEST_INVALID_DATE      Invalid Date: header (not RFC 2822)
 0.0 T_FSL_HELO_NON_FQDN_2  T_FSL_HELO_NON_FQDN_2
 0.0 T_LAZY_LISTWASHING     Lazy spammer, painfully obvious bogus addresses
 1.0 X_URIBL_DOMSONLY       X_URIBL_DOMSONLY
                            [URIs: uribl-example-b.com]
 0.0 DATE_IN_FUTURE_06_12   Date: is 6 to 12 hours after Received: date
 0.0 NORMAL_HTTP_TO_IP      URI: Uses a dotted-decimal IP address in URL
 5.0 TEST_NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL
 0.0 T_TO_NO_BRKTS_NORDNS   T_TO_NO_BRKTS_NORDNSl network by a host with no rDNS:

under 5.10.1 the result is:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 X_URIBL_IPSONLY        X_URIBL_IPSONLY
                            [URIs: 144.137.3.98]
 1.0 X_URIBL_DOMSONLY       X_URIBL_DOMSONLY
                            [URIs: uribl-example-c.com]
 5.0 TEST_INVALID_DATE      Invalid Date: header (not RFC 2822)
 0.0 T_FSL_HELO_NON_FQDN_2  T_FSL_HELO_NON_FQDN_2
 0.0 T_LAZY_LISTWASHING     Lazy spammer, painfully obvious bogus addresses
 0.0 DATE_IN_FUTURE_06_12   Date: is 6 to 12 hours after Received: date
 0.0 NORMAL_HTTP_TO_IP      URI: Uses a dotted-decimal IP address in URL
 5.0 TEST_NORMAL_HTTP_TO_IP URI: Uses a dotted-decimal IP address in URL
 1.3 RDNS_NONE              Delivered to internal network by a host with no rDNS
 0.0 T_TO_NO_BRKTS_NORDNS   T_TO_NO_BRKTS_NORDNS


Apart from reordered results, the key difference is 'b' vs 'c' in:
  [URIs: uribl-example-b.com]
vs.:
  [URIs: uribl-example-c.com]

As there are both links in the sample message:
  me too: http://uribl-example-b.com/
  me too: http://uribl-example-c.com/
either hit should do to fulfill the %patterns requirement.

Don't know why a different one is chosen under 5.6.2, but is not incorrect,
so the test should probably be able to cope with either.
Comment 86 Mark Martinec 2010-03-12 01:10:24 UTC
Btw, the reason for a different behaviour may not be with perl 5.6.
Could just well be due to running the test on a virtual host behind
a NAT, or some other environmental reason.
Comment 87 Justin Mason 2010-03-12 14:09:39 UTC
Created attachment 4710 [details]
fix for nondeterministic test issue

ok, here's a fix for that. 

: 172...; svn commit -m "bug 6335: avoid non-deterministic test issue" t/uribl_all_types.t
Sending        t/uribl_all_types.t
Transmitting file data .
Committed revision 922263.

Do you think this needs to make it into 3.3.1?  Note that those tests are not run by default (they require both "run_net_tests" and "run_long_tests" to be turned on.)
Comment 88 Mark Martinec 2010-03-13 23:58:52 UTC
> Committed revision 922263.
> 
> Do you think this needs to make it into 3.3.1?  Note that those tests are not
> run by default (they require both "run_net_tests" and "run_long_tests" to be
> turned on.)

Yes, I think it should make it into 3.3.1:
- it's a trivial fix, can't hurt even when not needed;
- as we do not yet know what is a cause behind this nondeterminism,
  it's better to play safe and not require something which may not be
  relied upon
Comment 89 Mark Martinec 2010-03-14 00:02:47 UTC
3.3:

Bug 6335: avoid non-deterministic test issue
Sending t/uribl_all_types.t
Committed revision 922695.
Comment 90 Mark Martinec 2010-03-14 00:54:38 UTC
re-closing
Comment 91 Karsten Bräckelmann 2010-03-14 14:39:35 UTC
Actually doing comment 90. :)