MU-INDEX(1) DragonFly General Commands Manual MU-INDEX(1)
NAME
mu index - index e-mail messages stored in Maildirs
SYNOPSIS
mu index [options]
DESCRIPTION
mu index is the mu command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can
then be queried using mu-find(1).
mu index understands Maildirs as defined by Daniel Bernstein for
qmail(7). In addition, it understands recursive Maildirs (Maildirs
within Maildirs) and Maildir++. It can also deal with VFAT-based
Maildirs, which use '!' as the separator instead of ':', as used by
Tinymail/Modest and some other e-mail programs.
E-mail messages which are not stored in something resembling a maildir
leaf-directory (cur and new) are ignored, as are the cache directories
for notmuch and gnus.
Symlinks are not followed.
If there is a file called .noindex in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be
useful to exclude certain directories from the indexing process, for
example directories with spam-messages.
If there is a file called .noupdate in a directory, the contents of
that directory and all of its subdirectories will be ignored, unless we
do a full rebuild (with --rebuild). This can be useful to speed things
up if you have some maildirs that never change. Note that you can
still search for these messages; the file only affects updating the
database.
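As a minimal sketch of the two marker files described above, the
following helpers create them with touch(1). The function names and the
example paths are illustrative assumptions, not part of mu:

```shell
#!/bin/sh
# Illustrative helpers (not part of mu): create the marker files that
# mu index looks for, as described above.

# Exclude a directory and all of its subdirectories from indexing.
# $1: directory to exclude
exclude_from_index() {
    touch "$1/.noindex"
}

# Skip a directory on normal runs; a full rebuild still scans it.
# $1: directory to skip
skip_on_update() {
    touch "$1/.noupdate"
}

# Example use (the paths are assumptions):
#   exclude_from_index ~/Maildir/Spam
#   skip_on_update ~/Maildir/Archive
```

Removing the marker file again makes the directory visible to the next
mu index run.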
The first run of mu index may take a few minutes if you have a lot of
mail (tens of thousands of messages). Fortunately, such a full scan
needs to be done only once; after that it suffices to index the
changes, which goes much faster. See the 'Note on performance' below
for more information.
The optional 'phase two' of the indexing-process is the removal of
messages from the database for which there is no longer a corresponding
file in the Maildir. If you do not want this, you can use -n,
--nocleanup.
When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM
(e.g., when you press Ctrl-C during the indexing process), it tries to
shut down gracefully: it tries to save and commit data, close the
database, and so on. If it receives another signal (e.g., when you press
Ctrl-C once more), mu index will terminate immediately.
OPTIONS
Note, some of the general options are described in the mu(1) man-page
and not here, as they apply to multiple mu commands.
-m, --maildir=<maildir>
starts searching at <maildir>. By default, mu uses whatever the
MAILDIR environment variable is set to; if it is not set, it
tries ~/Maildir. See the note on mixing sub-maildirs below.
--my-address=<my-email-address>
specifies that some e-mail address is 'my-address' (--my-address
can be used multiple times). This is used by mu cfind -- any e-
mail address found in the address fields of a message which also
has <my-email-address> in one of its address fields is
considered a personal e-mail address. This allows you, for
example, to filter out (mu cfind --personal) addresses which
were merely seen in mailing list messages.
--nocleanup
disables the database cleanup that mu does by default after
indexing.
--rebuild
clear all messages from the database before indexing. --rebuild
guarantees that after the indexing has finished, there are no
'old' messages in the database anymore, which is not true with
--reindex when indexing only a part of messages (using
--maildir). For this reason, it is necessary to run mu index
--rebuild when there is an upgrade in the database format. mu
index will issue a warning about this.
--autoupgrade
automatically use -y, --empty when mu notices that the database
version is not up-to-date. This option is for use in cron
scripts and the like, so they won't require any user
interaction, even when mu introduces a new database version.
--xbatchsize=<batch size>
set the maximum number of messages to process in a single Xapian
transaction. In practice, this option is only useful if you find
that mu is running out of memory while indexing; in that case,
you can set the batch size to (for example) 1000, which will
reduce memory consumption, but also substantially reduce the
indexing performance.
--max-msg-size=<max msg size>
set the maximum size (in bytes) for messages. The default
maximum (currently 50 MB) should be enough in most cases, but
if you encounter warnings from mu about ignoring messages
because they are too big, you may want to increase this. Note
that the reason for having a maximum size is that big messages
require big memory allocations, which may lead to problems.
NOTE: It is not recommended to mix maildirs and sub-maildirs
within the hierarchy in the same database; for example, it's
better not to index both with --maildir=~/MyMaildir and
--maildir=~/MyMaildir/foo, as this may lead to unexpected
results when searching with the 'maildir:' search parameter (see
below).
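The options above can be combined in a small wrapper script. The
following is a sketch only: the function names and the MU_BIN variable
are assumptions made for illustration, and --quiet is the general
option used in the benchmark commands in this page. The first function
runs an unattended index of one Maildir; the second lists files larger
than a given limit, which may help when deciding on a value for
--max-msg-size.

```shell
#!/bin/sh
# Illustrative helpers; the function names and MU_BIN are assumptions of
# this sketch, not part of mu.

# Command to run; defaults to the mu found in PATH.
MU_BIN=${MU_BIN:-mu}

# Index a Maildir without user interaction (e.g. from cron).
# $1: Maildir to index
index_unattended() {
    "$MU_BIN" index --autoupgrade --quiet --maildir="$1"
}

# List files larger than a size limit, to see which messages a given
# --max-msg-size value would cause mu to ignore.
# $1: Maildir root, $2: limit in bytes
find_oversized() {
    find "$1" -type f -size +"$2"c
}

# Example use (the path and the limit are assumptions):
#   index_unattended ~/Maildir
#   find_oversized ~/Maildir 50000000
```

Note that find_oversized reports every regular file under the Maildir
root, not only files that mu would treat as messages.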
A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with
no existing database, and a maildir with 27273 messages:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
(about 103 messages per second)
A second run, which is the more typical use case when there is a
database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
(about 2312 messages per second of wall-clock time; counting only
the 0,48s of user CPU time gives more than 56818 per second)
Note that each test flushes the caches first; a more common use case
might be to run mu index when new mail has arrived; the cache may stay
quite 'warm' in that case:
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
which is more than 30000 messages per second.
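The throughput figures can be reproduced from the message count and the
elapsed time reported by time. A minimal sketch, assuming awk(1) is
available; msgs_per_sec is a hypothetical helper name, and the times
must be given in seconds with a decimal point rather than a comma:

```shell
#!/bin/sh
# Messages per second, truncated to a whole number.
# $1: number of messages, $2: elapsed time in seconds
msgs_per_sec() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%d\n", n / t }'
}

# First run above: 27273 messages in 4:24,20 (264.20 seconds) of
# wall-clock time.
msgs_per_sec 27273 264.20   # prints 103
```

Which time column is passed in matters: wall-clock ("total") time
includes waiting for the disk, while user time counts only CPU time
spent in mu itself.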
A note on performance (ii)
As of June 2012, we did the same non-scientific benchmark, this time
with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir
with 22589 messages.
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
(about 813 messages per second of user CPU time, or about 367 per
second of wall-clock time)
A second run, which is the more typical use case when there is a
database already, goes much faster:
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
(more than 173000 messages per second of user CPU time, or about
10448 per second of wall-clock time)
In general, mu has been getting faster with each release, even with
relatively expensive new features such as text-normalization (for
case-insensitive/accent-insensitive matching). The profiles are now
dominated by operations in the Xapian database.
FILES
By default, mu index stores its message database in ~/.mu/xapian; the
database has an embedded version number, and mu will automatically
update it when it notices a different version. This allows for
automatic updating of mu-versions, without the need to clear out any
old databases.
However, note that versions of mu before 0.7 used a different scheme,
which puts the database in ~/.mu/xapian-<version>. These older
databases can safely be deleted. Starting from version 0.7, this manual
cleanup should no longer be needed.
mu stores logs of its operations and queries in <muhome>/mu.log (by
default, this is ~/.mu/mu.log). Upon startup, mu checks the size of
this log file. If it exceeds 1 MB, mu moves it to ~/.mu/mu.log.old,
overwriting any existing file of that name, and starts with an empty
log file. This scheme allows for continued use of mu without the need
for any manual maintenance of log files.
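The rotation scheme described above can be sketched in a few lines of
shell. This is an illustration of the described behaviour, not mu's own
code; rotate_log is a hypothetical name, and the default limit assumes
that 1 MB means 1048576 bytes:

```shell
#!/bin/sh
# Sketch of the log rotation scheme described above (not mu's own code).
# $1: log file
# $2: size limit in bytes (default 1048576, assuming 1 MB = 1024*1024)
rotate_log() {
    log=$1
    limit=${2:-1048576}
    [ -f "$log" ] || return 0
    # Arithmetic expansion strips the padding some wc versions print.
    size=$(( $(wc -c < "$log") ))
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"   # overwrites any existing .old file
        : > "$log"                # start again with an empty log
    fi
}

# Example use (the path is the default named above):
#   rotate_log ~/.mu/mu.log
```

Only one old log is kept, so the disk space used by logging stays
bounded at roughly twice the limit.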
ENVIRONMENT
mu index uses MAILDIR to find the user's Maildir if it has not been
specified explicitly with --maildir=<maildir>. If MAILDIR is not set,
mu index will try ~/Maildir.
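The lookup order described above can be written out as a short shell
function. This is a sketch that mirrors the documented order, not mu's
own code; resolve_maildir is a hypothetical name, and treating an empty
MAILDIR the same as an unset one is an assumption:

```shell
#!/bin/sh
# Mirrors the documented lookup order for the Maildir location.
# $1: value given with --maildir (empty if the option was not used)
resolve_maildir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"              # explicit --maildir wins
    elif [ -n "${MAILDIR:-}" ]; then
        printf '%s\n' "$MAILDIR"        # then the environment variable
    else
        printf '%s\n' "$HOME/Maildir"   # finally the default location
    fi
}

# Example use:
#   resolve_maildir ""            # uses MAILDIR, or ~/Maildir if unset
#   resolve_maildir /srv/mail/me  # explicit path (an assumed example)
```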
RETURN VALUE
mu index returns 0 upon successful completion; any other number
greater than 0 signals an error.
BUGS
Please report bugs if you find them: https://github.com/djcb/mu/issues
AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
SEE ALSO
maildir(5), mu(1), mu-find(1), mu-cfind(1)
User Manuals September 2013 MU-INDEX(1)