--- /dev/null
+Message-Id: <199412081919.NAA23234@austin.BSDI.COM>
+To: Eric Allman <eric@cs.berkeley.edu>
+Subject: Re: sorting mailings lists with fastest delivery users first
+In-reply-to: Your message of Thu, 08 Dec 1994 06:08:33 PST.
+References: <199412081408.GAA06210@mastodon.CS.Berkeley.EDU>
+From: Tony Sanders <sanders@bsdi.com>
+Organization: Berkeley Software Design, Inc.
+Date: Thu, 08 Dec 1994 13:19:39 -0600
+Sender: sanders@austin.BSDI.COM
+
+Eric Allman writes:
+> Nope, that's a new one, so far as I know. Any interest in
+> contributing it? For small lists it seems overkill, but for
+> large lists it could be a major win.
+
+Sure, I will contribute it; after I sent you mail last night I went ahead
+and finished up what I thought needed to be done. I would like to get
+some feedback from you on a few items, if you have time.
+
+There are two programs, mailprio_mkdb and mailprio (source below).
+
+mailprio_mkdb reads maillog files and creates a DB file of address vs.
+delay. I'm not too happy with how it does the averages right now but this
+is just a quick hack. However, it should at least separate sites that take
+days from those that deliver on the first pass. One thing that
+would make this information a lot more accurate is if sendmail could log
+a "transaction delay" (on failures also), as well as total delivery delay.
+Perhaps, as an option, it could maintain the DB file itself?
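
To illustrate how quickly the current averaging moves, here is a minimal shell sketch of the update that mailprio_mkdb applies to each stored value, int((old + new) / 2). The function name update_avg and the sample delays are made up for illustration; neither is part of the programs below.

```shell
#!/bin/sh
# Hypothetical helper (not part of mailprio_mkdb): fold one new delay, in
# seconds, into the stored value the same way the Perl code does,
# int((old + new) / 2). Shell arithmetic truncates like int() for
# non-negative values.
update_avg() {
    echo $(( ($1 + $2) / 2 ))
}

# Made-up history for one address: fast, one day-long delay, fast, fast.
# The stored value starts at the first delay seen, as in mailprio_mkdb.
avg=30
for delay in 30 86400 30 30; do
    avg=$(update_avg "$avg" "$delay")
    echo "delay=$delay avg=$avg"
done
```

With these made-up delays the stored value goes 30, 43215, 21622, 10826: a single day-long delay dominates at once, and each later fast delivery only halves it.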
+
+mailprio then simply reads a list of addresses from stdin (the mailing
+list), and tries to prioritize them according to the info in the database.
+It collects comment lines and other junk at the top of the file. All
+mailprio does is reorder lines; the actual text of the file should
+be unchanged, to the extent that you can verify it with:
+ sort sorted_list > checkit; sort mailing-list | diff - checkit
+Users with no delay information are put next. The prioritized list is last.
+Of course, this function could also be built into sendmail (eventually).
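
Note that mailprio writes two "# ----- " marker lines into its output and drops such lines on input, so a list that has never been through mailprio will differ from the sorted copy by exactly those markers. A minimal sketch of the same check with the markers filtered out first follows. The temporary directory, file contents, and addresses are made up, and the hand-written sorted_list only stands in for real mailprio output.

```shell
#!/bin/sh
# Sketch only: made-up files; a hand-reordered copy stands in for the
# output of "mailprio < mailing-list > sorted_list".
dir=$(mktemp -d) || exit 1

cat > "$dir/mailing-list" <<'EOF'
# list maintainer: owner@example.com
alice@example.com
bob@example.org
carol@example.net
EOF

cat > "$dir/sorted_list" <<'EOF'
# list maintainer: owner@example.com
# ----- unprioritized users
carol@example.net
# ----- prioritized users
bob@example.org
alice@example.com
EOF

# Drop the "# ----- " marker lines, then compare the files as sorted sets.
grep -v '^# ----- ' "$dir/sorted_list" | sort > "$dir/checkit"
if grep -v '^# ----- ' "$dir/mailing-list" | sort | diff - "$dir/checkit" >/dev/null
then
    result="same lines"
else
    result="lines differ"
fi
echo "$result"
rm -rf "$dir"
```

If the two files hold the same lines apart from the markers, diff is silent and the sketch prints "same lines".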
+
+Putting "new account" info at the top with the current averaging function
+probably adversely affects the prioritized list (at least in the short
+term), but putting it at the bottom would not really give the new accounts
+a fair chance. I suspect this isn't that big of a problem. I'm running
+this here on a list with 461 accounts and about 10 messages per day so
+I'll see how it goes. I'll keep some stats on delay times and see what
+happens.
+
+Another thing that would help this situation is if sendmail had the queue
+ordered by site (but you already know this). If you ever get to do per-site
+queuing, you should consider "blocking" a queue for some short period
+of time if a connection to that site fails [sendmail does this inside a
+single process on a per-account basis now, right?]; this would allow multiple
+sendmails to quickly skip over those sites for people like me who run:
+
+ for i in 1 2 3 4 5 6 7 8 ; do daemon sendmail -q; done
+
+to flush a queue that has gotten behind. You could also do this inside
+sendmail with a parallelism option (when it is time to run the queue, how
+many processes to start).
+
+#! /bin/sh
+# This is a shell archive. Remove anything before this line, then unpack
+# it by saving it into a file and typing "sh file". To overwrite existing
+# files, type "sh file -c". You can also feed this as standard input via
+# unshar, or by typing "sh <file", e.g.. If this archive is complete, you
+# will see the following message at the end:
+# "End of shell archive."
+# Contents: mailprio mailprio_mkdb
+# Wrapped by sanders@austin.BSDI.COM on Fri Dec 9 18:07:02 1994
+PATH=/bin:/usr/bin:/usr/ucb ; export PATH
+if test -f 'mailprio' -a "${1}" != "-c" ; then
+ echo shar: Will not clobber existing file \"'mailprio'\"
+else
+echo shar: Extracting \"'mailprio'\" \(3093 characters\)
+sed "s/^X//" >'mailprio' <<'END_OF_FILE'
+X#!/usr/bin/perl
+X#
+X# mailprio -- setup mail priorities for a mailing list
+X#
+X# Sort mailing list by mailprio database:
+X# mailprio < mailing-list > sorted_list
+X# Double check against orig:
+X# sort sorted_list > checkit; sort mailing-list | diff - checkit
+X# If it checks out, install it.
+X#
+X# TODO:
+X# option to process mqueue files so we can reorder files in the queue!
+X$usage = "Usage: mailprio [-p priodb]\n";
+X$home = "/home/sanders/lists";
+X$priodb = "$home/mailprio";
+X
+Xif ($main'ARGV[0] =~ /^-/) {
+X $args = shift;
+X if ($args =~ m/\?/) { print $usage; exit 0; }
+X if ($args =~ m/p/) {
+X $priodb = shift || die $usage, "-p requires argument\n"; }
+X}
+X
+X# In shell script, it goes something like this:
+X# old_mailprio > /tmp/a
+X# fgrep -f lists/inet-access /tmp/a | sed -e 's/^.......//' > /tmp/b
+X# ; /tmp/b contains list of known users, faster delivery first
+X# fgrep -v -f /tmp/b lists/inet-access > /tmp/c
+X# ; put all unknown stuff at the top of new list for now
+X# echo '# -----' >> /tmp/c
+X# cat /tmp/b >> /tmp/c
+X
+X# Setup %list and @list
+Xlocal($addr, $canon);
+Xwhile ($addr = <STDIN>) {
+X chop $addr;
+X next if $addr =~ /^# ----- /; # that's our line
+X push(@list, $addr), next if $addr =~ /^\s*#/; # save comments
+X $canon = &canonicalize((&simplify_address($addr))[0]);
+X unless (defined $canon) {
+X warn "no address found: $addr\n";
+X push(@list, $addr); # save it anyway
+X next;
+X }
+X if (defined $list{$canon}) {
+X warn "duplicate: ``$addr -> $canon''\n";
+X push(@list, $addr); # save it anyway
+X next;
+X }
+X $list{$canon} = $addr;
+X}
+X
+Xlocal(*prio);
+Xdbmopen(%prio, $priodb, 0644) || die "$priodb: $!\n";
+Xforeach $to (keys %list) {
+X if (defined $prio{$to}) {
+X # add to list of found users (%userprio) and remove from %list
+X # so that we know what users were not yet prioritized
+X $userprio{$to} = $prio{$to}; # priority
+X $useracct{$to} = $list{$to}; # string
+X delete $list{$to};
+X }
+X}
+Xdbmclose(%prio);
+X
+X# Put all the junk we found at the very top
+X# (this might not always be a feature)
+Xprint join("\n", @list), "\n";
+X
+X# unprioritized users go next, slow accounts will get moved down quickly
+Xprint '# ----- unprioritized users', "\n";
+Xforeach $to (keys %list) { print $list{$to}, "\n"; }
+X
+X# finally, our prioritized list of users
+Xprint '# ----- prioritized users', "\n";
+Xforeach $to (sort { $userprio{$a} <=> $userprio{$b}; } keys %userprio) {
+X die "Oops! Something is seriously wrong with useracct: $to\n"
+X unless defined $useracct{$to};
+X print $useracct{$to}, "\n";
+X}
+X
+Xexit(0);
+X
+X# REPL-LIB ---------------------------------------------------------------
+X
+Xsub canonicalize {
+X local($addr) = @_;
+X # lowercase, strip leading/trailing whitespace
+X $addr =~ y/A-Z/a-z/; $addr =~ s/^\s+//; $addr =~ s/\s+$//; $addr;
+X}
+X
+X# @addrs = simplify_address($addr);
+Xsub simplify_address {
+X local($_) = shift;
+X 1 while s/\([^\(\)]*\)//g; # strip comments
+X 1 while s/"[^"]*"//g; # strip "quotes"
+X @_ = split(/,/); # split parts
+X foreach (@_) {
+X 1 while s/.*<(.*)>.*/$1/;
+X s/^\s+//;
+X s/\s+$//;
+X }
+X @_;
+X}
+END_OF_FILE
+if test 3093 -ne `wc -c <'mailprio'`; then
+ echo shar: \"'mailprio'\" unpacked with wrong size!
+fi
+chmod +x 'mailprio'
+# end of 'mailprio'
+fi
+if test -f 'mailprio_mkdb' -a "${1}" != "-c" ; then
+ echo shar: Will not clobber existing file \"'mailprio_mkdb'\"
+else
+echo shar: Extracting \"'mailprio_mkdb'\" \(3504 characters\)
+sed "s/^X//" >'mailprio_mkdb' <<'END_OF_FILE'
+X#!/usr/bin/perl
+X#
+X# mailprio_mkdb -- make mail priority database based on delay times
+X#
+X$usage = "Usage: mailprio_mkdb [-l maillog] [-p priodb]\n";
+X$home = "/home/sanders/lists";
+X$maillog = "/var/log/maillog";
+X$priodb = "$home/mailprio";
+X
+Xif ($main'ARGV[0] =~ /^-/) {
+X $args = shift;
+X if ($args =~ m/\?/) { print $usage; exit 0; }
+X if ($args =~ m/l/) {
+X $maillog = shift || die $usage, "-l requires argument\n"; }
+X if ($args =~ m/p/) {
+X $priodb = shift || die $usage, "-p requires argument\n"; }
+X}
+X
+Xlocal(*prio);
+X# We'll merge with existing information if it's already there.
+Xdbmopen(%prio, $priodb, 0644) || die "$priodb: $!\n";
+X&getlog_stats($maillog, *prio);
+X# foreach $addr (sort { $prio{$a} <=> $prio{$b}; } keys %prio) {
+X# printf("%06d %s\n", $prio{$addr}, $addr); }
+Xdbmclose(%prio);
+Xexit(0);
+X
+Xsub getlog_stats {
+X local($maillog, *stats) = @_;
+X local($to, $delay);
+X local($h, $m, $s);
+X open(MAILLOG, "< $maillog") || die "$maillog: $!\n";
+X while (<MAILLOG>) {
+X ($delay) = (m/, delay=([^,]*), /);
+X $delay || next;
+X ($h, $m, $s) = split(/:/, $delay);
+X $delay = ($h * 60 * 60) + ($m * 60) + $s;
+X
+X # deleting everything after ", " seems safe enough, though
+X # it is possible that it was inside "..."'s and that we will
+X # miss some addresses because of it. However, I'm not willing
+X # to do full parsing just for that case. If this bothers you
+X # you could do something like: s/, (delay|ctladdr)=.*//;
+X # but you have to make sure you catch all the possible names.
+X $to = $_; $to =~ s/^.* to=//; $to =~ s/, .*//;
+X foreach $addr (&simplify_address($to)) {
+X next unless $addr;
+X $addr = &canonicalize($addr);
+X # print $delay, " ", $addr, "\n";
+X $stats{$addr} = $delay unless defined $stats{$addr}; # init
+X
+X # This average function moves the value around quite rapidly
+X # which may or may not be a feature.
+X #
+X # This has at least one odd behavior because we currently only
+X # use the delay information from maillog which is only logged
+X # on actual delivery. This works backwards from what we really
+X # want to happen when a fast host goes down for a while and then
+X # comes back up.
+X #
+X # I spoke with Eric and he suggested adding an xdelay statistic
+X # for a per transaction delay which would help that situation
+X # a lot. What I believe you want in that case is roughly like:
+X # delay fast, xdelay fast: smokin', these hosts go first
+X # delay slow, xdelay fast: put host high on the list (back up?)
+X # delay fast, xdelay slow: host is down/having problems/slow
+X # delay slow, xdelay slow: poorly connected sites, very last
+X # Of course, you have to reorder the distribution list fairly
+X # often for that to help. Come to think of it, you should
+X # also reorder /var/spool/mqueue files also (if they aren't
+X # locked of course). Hmmm....
+X $stats{$addr} = int(($stats{$addr} + $delay) / 2);
+X }
+X }
+X close(MAILLOG);
+X}
+X
+X# REPL-LIB ---------------------------------------------------------------
+X
+Xsub canonicalize {
+X local($addr) = @_;
+X # lowercase, strip leading/trailing whitespace
+X $addr =~ y/A-Z/a-z/; $addr =~ s/^\s+//; $addr =~ s/\s+$//; $addr;
+X}
+X
+X# @addrs = simplify_address($addr);
+Xsub simplify_address {
+X local($_) = shift;
+X 1 while s/\([^\(\)]*\)//g; # strip comments
+X 1 while s/"[^"]*"//g; # strip "quotes"
+X @_ = split(/,/); # split parts
+X foreach (@_) {
+X 1 while s/.*<(.*)>.*/$1/;
+X s/^\s+//;
+X s/\s+$//;
+X }
+X @_;
+X}
+END_OF_FILE
+if test 3504 -ne `wc -c <'mailprio_mkdb'`; then
+ echo shar: \"'mailprio_mkdb'\" unpacked with wrong size!
+fi
+chmod +x 'mailprio_mkdb'
+# end of 'mailprio_mkdb'
+fi
+echo shar: End of shell archive.
+exit 0