DragonFly On-Line Manual Pages
ezmlm-archive(1) DragonFly General Commands Manual ezmlm-archive(1)
NAME
ezmlm-archive - create thread and author index for a mailing list
archive
SYNOPSIS
ezmlm-archive [ -cCFsSTvV ] [ -f msg1 ] [ -t msg2 ] dir
DESCRIPTION
ezmlm-archive reads the index files from a message archive, and creates
a subject index, a collection of subject files, and a collection of
author files. These files are suitable as an index for WWW access to,
and navigation through, a mailing list archive by ezmlm-cgi(1).
The index files read are created by ezmlm-idx(1) on a per-list basis
and by ezmlm-send(1) on a per-message basis for an indexed list.
The output files created are:
dir/archive/threads/yyyymm
The thread index. It contains one line per subject, starting
with the number of the first message with that subject within
the set investigated, ``:'', a 20-character subject hash, blank,
``[n]'' where ``n'' is the number of messages in the thread,
blank, and the subject. The file ``yyyymm'' contains entries
for all threads that have messages in the month ``yyyymm'' or
that have messages both before and after that month. The
subject hash is a key to the subject files; the message number
is a key to the index file. The lines are in ascending order by
message number when the index is created de novo on an existing
archive. When the messages are added one-by-one as in normal
archive operation, ``n'' is the number of messages in the thread
for the particular month, and the lines are ordered by each thread's
latest message, i.e. the most recently extended thread is shown last. The
message number accompanying a thread is always a message within
the thread. It is the first in archives created on existing
lists, and the last message in incrementally created archives.
Use the corresponding subject index file to get a list of all
messages in the thread in ascending order.
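A thread index line can be split into its fields with a single regular expression. This is a minimal Python sketch, assuming the layout exactly as described above (message number, ``:'', 20-character hash, blank, ``[n]'', blank, subject); the names THREAD_LINE, ThreadEntry and parse_thread_line are hypothetical, and the hash is only assumed to contain no blanks.

```python
import re
from typing import NamedTuple

# Assumed layout of one line of dir/archive/threads/yyyymm:
# msgnum ":" 20-char subject hash, blank, "[n]", blank, subject.
THREAD_LINE = re.compile(r"^(\d+):(\S{20}) \[(\d+)\] (.*)$")


class ThreadEntry(NamedTuple):
    msgnum: int        # key into the archive index file
    subject_hash: str  # key into dir/archive/subjects/
    count: int         # "n": messages in the thread (see caveats above)
    subject: str


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse one thread index line; raise ValueError if it does not match."""
    m = THREAD_LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a thread index line: {line!r}")
    num, subject_hash, n, subject = m.groups()
    return ThreadEntry(int(num), subject_hash, int(n), subject)


# Hypothetical line; the hash is a placeholder, not a real subject hash.
entry = parse_thread_line("1234:abcdefghijklmnopqrst [3] Re: release plans\n")
print(entry.msgnum, entry.subject_hash, entry.count, entry.subject)
```

The message number from such a line can then be looked up in the index file, and the hash used to locate the subject file listing every message of the thread.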
dir/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
A subject file. The first line is the subject hash, a space, and
the subject. This is followed by one line per message with this
subject, in the format message number, ``:'', date (yyyymm),
``:'', author hash, blank, author from line. The lines are
sorted by message number. The author hash is a key to the author
files; the message number is a key to the index file. The file
in the example would be for the subject hash
``xxyyyyyyyyyyyyyyyyyy''.
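A subject file can be read by splitting off the header line and then parsing each message line. The Python sketch below assumes the format given above (``hash subject'' on the first line, then ``msgnum:yyyymm:authorhash author'' per message); the helper names and the sample contents are hypothetical placeholders.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed layout of each message line: msgnum ":" yyyymm ":" author hash,
# blank, author From line (as described above).
MSG_LINE = re.compile(r"^(\d+):(\d{6}):(\S+) (.*)$")


class SubjectMessage(NamedTuple):
    msgnum: int       # key into the archive index file
    yyyymm: str       # month of the message
    author_hash: str  # key into dir/archive/authors/
    author: str       # author From line


def parse_subject_file(text: str) -> Tuple[str, str, List[SubjectMessage]]:
    """Return (subject hash, subject, messages) from a subject file's text."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty subject file")
    # First line: subject hash, a space, and the subject.
    subject_hash, _, subject = lines[0].partition(" ")
    messages = []
    for line in lines[1:]:
        m = MSG_LINE.match(line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        num, month, author_hash, author = m.groups()
        messages.append(SubjectMessage(int(num), month, author_hash, author))
    return subject_hash, subject, messages


# Hypothetical file contents; all hashes are placeholders.
SUBJECT_HASH = "xx" + "y" * 18
ALICE_HASH = "a" * 20
BOB_HASH = "b" * 20
sample = (
    f"{SUBJECT_HASH} Release plans\n"
    f"101:200301:{ALICE_HASH} Alice <alice@example.org>\n"
    f"107:200302:{BOB_HASH} Bob <bob@example.org>\n"
)
h, subject, msgs = parse_subject_file(sample)
print(h, subject, [m.msgnum for m in msgs])
```

An author file has the same shape with the roles of subject and author swapped, so the same approach applies with the regular expression's last two fields read as subject hash and subject.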
dir/archive/authors/xx/yyyyyyyyyyyyyyyyyy
An author file. The first line is the author hash, a space, and
the author from line. This is followed by one line per message
with this author, in the format message number, ``:'', date
(yyyymm), ``:'', subject hash, blank, subject. The lines are
sorted by message number. The subject hash is a key to the
subject files; the message number is a key to the index file.
The file in the example would be for the author hash
``xxyyyyyyyyyyyyyyyyyy''.
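The examples above show how a 20-character hash maps to a file name: the first two characters select a subdirectory and the remaining eighteen name the file. A minimal Python sketch of that mapping, assuming only the layout shown above; index_file_path and the list directory /var/lists/demo are hypothetical.

```python
from pathlib import Path


def index_file_path(list_dir: str, kind: str, key_hash: str) -> Path:
    """Map a 20-character hash to dir/archive/<kind>/<2 chars>/<18 chars>.

    kind is "subjects" or "authors", the two hash-keyed directories above.
    """
    if kind not in ("subjects", "authors"):
        raise ValueError(f"unknown index kind: {kind!r}")
    if len(key_hash) != 20:
        raise ValueError("expected a 20-character hash")
    # The first two characters name the subdirectory, the rest the file.
    return Path(list_dir) / "archive" / kind / key_hash[:2] / key_hash[2:]


author_hash = "xx" + "y" * 18  # placeholder hash from the example above
p = index_file_path("/var/lists/demo", "authors", author_hash)
print(p.as_posix())
```

Splitting on the first two characters keeps any single directory from holding one entry per subject or author in a large archive.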
dir/archnum keeps track of the last message processed. Normally,
ezmlm-archive processes entries for messages starting with the one after
the number recorded in this file, up to and including the message number
in dir/num.
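The default range is therefore archnum + 1 through num, inclusive. The Python sketch below computes it, assuming that each counter file begins with a decimal message number (anything after the digits is ignored) and that a missing dir/archnum means nothing has been processed yet; both assumptions, and the helper names, are not taken from ezmlm-archive itself.

```python
import re
import tempfile
from pathlib import Path


def leading_int(path: Path) -> int:
    """Return the decimal number at the start of a small counter file.

    Assumption: the file begins with a message number; anything after the
    digits is ignored.
    """
    m = re.match(r"\d+", path.read_text().strip())
    return int(m.group()) if m else 0


def default_range(list_dir: str) -> range:
    """Messages processed in normal (-C) mode: archnum + 1 through num."""
    d = Path(list_dir)
    archnum = d / "archnum"
    # Assumption: a missing archnum file means nothing has been processed.
    last_done = leading_int(archnum) if archnum.exists() else 0
    last_msg = leading_int(d / "num")
    return range(last_done + 1, last_msg + 1)  # upper bound is inclusive


# Hypothetical list directory with 5 unprocessed messages.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "archnum").write_text("100\n")
    Path(tmp, "num").write_text("105\n")
    pending = default_range(tmp)
print(list(pending))
```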
OPTIONS
ezmlm-archive writes its output files in a crash-proof manner when run
in normal mode. When the normal message range is overridden with any of
the options listed below, the normal sync(3) of the output files is
suppressed for efficiency. Should the computer crash during such a run,
the state of the indices is undefined. Use the -s option in the
(extremely rare) cases where this would be a problem.
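The trade-off above can be illustrated with the common write-to-temporary-file, fsync, rename pattern. This is a minimal Python sketch of that general technique only; it is not taken from ezmlm-archive's source, and the helper write_index_file and its sync parameter are hypothetical stand-ins for a normal-mode run and a run with a modified message range.

```python
import os
import tempfile


def write_index_file(path: str, data: bytes, sync: bool = True) -> None:
    """Replace path with data via a temporary file and an atomic rename.

    Hypothetical helper illustrating the technique, not ezmlm-archive code.
    With sync=True the data is forced to disk (os.fsync) before the rename,
    so after a crash the file holds either the old or the new contents.
    With sync=False that step is skipped: faster, but the on-disk state
    after a crash is not defined.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            if sync:
                f.flush()
                os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "200301")
    write_index_file(target, b"first version\n")                # synced
    write_index_file(target, b"second version\n", sync=False)  # not synced
    with open(target, "rb") as f:
        final = f.read()
    leftovers = [n for n in os.listdir(tmp) if n.startswith(".tmp-")]
print(final, leftovers)
```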
-c Create a new index. This overrides dir/archnum causing
ezmlm-archive to start with the first message in the archive.
Synonym for -f0. NOTE: ezmlm-archive does not remove files in
the index. While it will overwrite/update old files it will not
remove files that are obsolete for other reasons.
-C (Default.) Process entries starting with the message after the
message listed in dir/archnum.
-f msg1
Process messages from the archive section (set of 100 messages)
containing message msg1. This is useful if you have removed
part of the archive, as it will shorten processing time and
decrease memory use. NOTE: ezmlm-archive does not remove files
in the index. While it will overwrite/update old files it will
not remove files that are obsolete for other reasons. The number
of messages per thread will be incorrect when use of the -f
and -t switches leads to partial re-indexing of already indexed
messages.
-F (Default.) Do not change the starting message from the default
(see -C).
-s Always sync files.
-S (Default.) Sync files, except when one of the message range
modifying options is used.
-t msg2
Process messages to message msg2 instead of the last message in
the archive. Again, files written are corrected, but other files
are not explicitly removed.
-T (Default.) Process entries for messages up to the last message
in the archive.
-v Display ezmlm-archive version info.
-V Display ezmlm-archive version info.
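How the range options combine can be summarised as a small function. The Python sketch below is a hypothetical model of the options described above, not ezmlm-archive's code: the parameter names are invented, -f is assumed to round msg1 down to the start of its 100-message section (n // 100 * 100), and message numbers are assumed to start at 1.

```python
from typing import Optional, Tuple

SECTION_SIZE = 100  # "archive section (set of 100 messages)"


def resolve_range(archnum: int, num: int, create: bool = False,
                  from_msg: Optional[int] = None,
                  to_msg: Optional[int] = None,
                  always_sync: bool = False) -> Tuple[int, int, bool]:
    """Return (first, last, do_sync) under a hypothetical option model.

    create      ~ -c (synonym for -f0)
    from_msg    ~ -f msg1: start of the 100-message section holding msg1
    to_msg      ~ -t msg2: stop at msg2 instead of the last message
    always_sync ~ -s
    """
    if create:
        from_msg = 0
    if from_msg is None:
        first = archnum + 1  # default -C/-F: resume after dir/archnum
    else:
        first = (from_msg // SECTION_SIZE) * SECTION_SIZE
    last = num if to_msg is None else to_msg  # default -T: up to dir/num
    overridden = from_msg is not None or to_msg is not None
    # -S default: sync unless the range was modified; -s forces syncing.
    do_sync = always_sync or not overridden
    return max(first, 1), last, do_sync


print(resolve_range(1200, 1234))
print(resolve_range(1200, 1234, from_msg=250))
print(resolve_range(1200, 1234, create=True, to_msg=500, always_sync=True))
```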
MEMORY USAGE
ezmlm-archive stores its linked lists in memory. On a 32-bit
architecture, it uses 12 bytes per message, 28 bytes per thread (plus
one copy of the subject), and 20 bytes per author (plus one copy of the
author from line).
In normal list use, it processes only at most a few messages at a time,
but for initial processing of a large archive, considerable amounts of
memory may be used. Assuming 40 bytes per subject or From line, 5
messages per thread, 100,000 messages, and 1000 authors, this is about
2.5 MB. For 1,000,000 messages this is about 25 MB.
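The estimate above is simple arithmetic over the per-item costs. The Python sketch below reproduces it, assuming one 40-byte subject copy per thread, one 40-byte From line copy per author, and (since the text does not say otherwise) the same 1000 authors for the larger archive; estimate_bytes is a hypothetical helper.

```python
# Per-item costs on a 32-bit architecture, from the text above.
BYTES_PER_MESSAGE = 12
BYTES_PER_THREAD = 28
BYTES_PER_AUTHOR = 20


def estimate_bytes(messages: int, msgs_per_thread: int, authors: int,
                   line_len: int = 40) -> int:
    """Approximate memory use of ezmlm-archive's linked lists, in bytes."""
    threads = messages // msgs_per_thread
    return (messages * BYTES_PER_MESSAGE
            + threads * (BYTES_PER_THREAD + line_len)   # plus one subject
            + authors * (BYTES_PER_AUTHOR + line_len))  # plus one From line


small = estimate_bytes(100_000, 5, 1000)
large = estimate_bytes(1_000_000, 5, 1000)
print(f"{small / 2**20:.1f} MiB, {large / 2**20:.1f} MiB")
```

Under these assumptions the 100,000-message case comes to 2,620,000 bytes (about 2.5 MiB) and the 1,000,000-message case to 25,660,000 bytes (about 24.5 MiB); the thread term dominates once the archive is large.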
Thus, for large archives, it may be useful to use the -t switch to
process the archive in multiple subsets, starting with, e.g., the first
100,000 messages, then the next 100,000, and so on.
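The batching suggestion above can be scripted. This is a minimal Python sketch, assuming that each run resumes after the number stored in dir/archnum (the default -C behaviour), so only the -t limit changes between runs; the helper batched_commands, the batch size and the list directory /var/lists/demo are hypothetical. The commands are only built, not executed.

```python
from typing import List


def batched_commands(list_dir: str, last_msg: int,
                     batch: int = 100_000) -> List[List[str]]:
    """Build one ezmlm-archive argv per batch of at most `batch` messages.

    Assumes each run resumes after dir/archnum, so only the -t upper
    bound needs to change from run to run.
    """
    commands = []
    upper = batch
    while upper < last_msg:
        commands.append(["ezmlm-archive", "-t", str(upper), list_dir])
        upper += batch
    # Final run without -t processes up to the last message in dir/num.
    commands.append(["ezmlm-archive", list_dir])
    return commands


for argv in batched_commands("/var/lists/demo", 250_000):
    print(" ".join(argv))
```

Each argv list could be passed to subprocess.run(argv, check=True) in turn. The last run omits -t, so it is a normal-mode run that also syncs its output files.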
SEE ALSO
ezmlm-cgi(1), ezmlm-idx(1), ezmlm-send(1), ezmlm(5)