dieharder(1) DragonFly General Commands Manual dieharder(1)
NAME
dieharder - A testing and benchmarking tool for random number
generators.
SYNOPSIS
dieharder [-a] [-d dieharder test number] [-f filename] [-B]
[-D output flag [-D output flag] ... ] [-F] [-c separator]
[-g generator number or -1] [-h] [-k ks_flag] [-l]
[-L overlap] [-m multiply_p] [-n ntuple]
[-p number of p samples] [-P Xoff]
[-o filename] [-s seed strategy] [-S random number seed]
[-t number of test samples] [-v verbose flag]
[-W weak] [-X fail] [-Y Xtrategy]
[-x xvalue] [-y yvalue] [-z zvalue]
dieharder OPTIONS
-a runs all the tests with standard/default options to create a
user-controllable report. To control the formatting of the
report, see -D below. To control the power of the test (which
uses default values for tsamples that cannot generally be varied
and psamples which generally can) see -m below as a "multiplier"
of the default number of psamples (used only in a -a run).
-d test number - selects specific diehard test.
-f filename - generators 201 or 202 permit either raw binary or
formatted ASCII numbers to be read in from a file for testing.
generator 200 reads in raw binary numbers from stdin. Note
well: many tests with default parameters require a lot of rands!
To see a sample of the (required) header for ASCII formatted
input, run
dieharder -o -f example.input -t 10
and then examine the contents of example.input. Raw binary
input reads 32 bit increments of the specified data stream.
stdin_input_raw accepts a pipe from a raw binary stream.
-B binary mode (used with -o below) causes output rands to be written
in raw binary, not formatted ascii.
-D output flag - permits fields to be selected for inclusion in
dieharder output. Each flag can be entered as a binary number
that turns on a specific output field or header or by flag name;
flags are aggregated. To see all currently known flags use the
-F command.
-F - lists all known flags by name and number.
-c table separator - where separator is e.g. ',' (CSV) or ' '
(whitespace).
-g generator number - selects a specific generator for testing. Using
-g -1 causes all known generators to be printed out to the
display.
-h prints context-sensitive help -- usually Usage (this message) or a
test synopsis if entered as e.g. dieharder -d 3 -h.
-k ks_flag - ks_flag
0 is fast but slightly sloppy for psamples > 4999 (default).
1 is MUCH slower but more accurate for larger numbers of
psamples.
2 is slower still, but (we hope) accurate to machine precision
for any number of psamples up to some as yet unknown numerical
upper limit (it has been tested out to at least hundreds of
thousands).
3 is kuiper ks, fast, quite inaccurate for small samples,
deprecated.
-l list all known tests.
-L overlap
1 (use overlap, default)
0 (don't use overlap)
in operm5 or other tests that support overlapping and non-
overlapping sample modes.
-m multiply_p - multiply default # of psamples in -a(ll) runs to crank
up the resolution of failure.
-n ntuple - set ntuple length for tests on short bit strings that
permit the length to be varied (e.g. rgb bitdist).
-o filename - output -t count random numbers from current generator to
file.
-p count - sets the number of p-value samples per test (default 100).
-P Xoff - sets the number of psamples that will cumulate before
deciding that a generator is "good" and really, truly passes even a -Y 2
T2D run. Currently the default is 100000; eventually it will be
set from AES-derived T2D test failure thresholds for fully
automated reliable operation, but for now it is more a "boredom"
threshold set by how long one might reasonably want to wait on
any given test run.
-S seed - where seed is a uint. Overrides the default random seed
selection. Ignored for file or stdin input.
-s strategy - if strategy is the (default) 0, dieharder reseeds (or
rewinds) once at the beginning when the random number generator
is selected and then never again. If strategy is nonzero, the
generator is reseeded or rewound at the beginning of EACH TEST.
If -S seed was specified, or a file is used, this means every
test is applied to the same sequence (which is useful for
validation and testing of dieharder, but not a good way to test
rngs). Otherwise a new random seed is selected for each test.
-t count - sets the number of random entities used in each test, where
possible. Be warned -- some tests have fixed sample sizes;
others are variable but have practical minimum sizes. It is
suggested you begin with the values used in -a and experiment
carefully on a test by test basis.
-W weak - sets the "weak" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is
currently 0.005.
-X fail - sets the "fail" threshold to make the test(s) more or less
forgiving during e.g. a test-to-destruction run. Default is
currently 0.000001, which is basically "certain failure of the
null hypothesis", the desired mode of reproducible generator
failure.
-Y Xtrategy - the Xtrategy flag controls the new "test to failure"
(T2F) modes. These flags and their modes act as follows:
0 - just run dieharder with the specified number of tsamples
and psamples, do not dynamically modify a run based on results.
This is the way it has always run, and is the default.
1 - "resolve ambiguity" (RA) mode. If a test returns "weak",
this is an undesired result. What does that mean, after all?
If you run a long test series, you will see occasional weak
returns for a perfect generator because p is uniformly
distributed and will appear in any finite interval from time to
time. Even if a test run returns more than one weak result, you
cannot be certain that the generator is failing. RA mode adds
psamples (usually in blocks of 100) until the test result ends
up solidly not weak or proceeds to unambiguous failure. This is
morally equivalent to running the test several times to see if a
weak result is reproducible, but eliminates the bias of personal
judgement in the process since the default failure threshold is
very small and very unlikely to be reached by random chance even
in many runs.
This option should only be used with -k 2.
2 - "test to destruction" mode. Sometimes you just want to
know where or if a generator will ever fail a test (or test
series). -Y 2 causes psamples to be added 100 at a time until a
test returns an overall pvalue lower than the failure threshold
or a specified maximum number of psamples (see -P) is reached.
Note well! In this mode one may well fail due to the alternate
null hypothesis -- the test itself is a bad test and fails!
Many dieharder tests, despite our best efforts, are numerically
unstable or have only approximately known target statistics or
are straight up asymptotic results, and will eventually return a
failing result even for a gold-standard generator (such as AES),
or for the hypercautious the XOR generator with AES, threefish,
kiss, all loaded at once and xor'd together. It is therefore
safest to use this mode comparatively, executing a T2D run on
AES to get an idea of the test failure threshold(s) (something I
will eventually do and publish on the web so everybody doesn't
have to do it independently) and then running it on your target
generator. Failure with numbers of psamples within an order of
magnitude of the AES thresholds should probably be considered
possible test failures, not generator failures. Failures at
levels significantly less than the known gold standard generator
failure thresholds are, of course, probably failures of the
generator.
This option should only be used with -k 2.
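As a concrete illustration built only from the options described
above, a test-to-destruction run of a single test against the gold
standard AES generator might look like:
dieharder -d 0 -g 205 -k 2 -Y 2 -P 1000000
which keeps adding psamples in blocks of 100 until either the overall
pvalue drops below the -X failure threshold or one million psamples
have accumulated.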
-v verbose flag -- controls the verbosity of the output for debugging
only. Probably of little use to non-developers, and developers
can read the enum(s) in dieharder.h and the test sources to see
which flag values turn on output in which routines. 1 results in a
highly detailed trace of program activity.
-x,-y,-z number - Some tests have parameters that can safely be varied
from their default value. For example, in the diehard birthdays
test, one can vary the number of "birthdays" drawn and the number of
bits in the "year" from which they are drawn. -x 2048 -y 30 alters
these two values but should still run fine. These parameters should
be documented internally
(where they exist) in the e.g. -d 0 -h visible notes.
NOTE WELL: The assessment(s) for the rngs may, in fact, be
completely incorrect or misleading. There are still "bad tests"
in dieharder, although we are working to fix and improve them
(and try to document them in the test descriptions visible with
-d testnumber -h). In particular, 'Weak' pvalues should occur
one test in two hundred, and 'Failed' pvalues should occur one
test in a million with the default thresholds - that's what p
MEANS. Use them at your Own Risk! Be Warned!
Or better yet, use the new -Y 1 and -Y 2 resolve ambiguity or
test to destruction modes above, comparing to similar runs on
one of the as-good-as-it-gets cryptographic generators, AES or
threefish.
DESCRIPTION
dieharder
Welcome to the current snapshot of the dieharder random number tester.
It encapsulates all of the Gnu Scientific Library (GSL) random number
generators (rngs) as well as a number of generators from the R
statistical library, hardware sources such as /dev/*random, "gold
standard" cryptographic quality generators (useful for testing
dieharder and for purposes of comparison to new generators) as well as
generators contributed by users or found in the literature into a
single harness that can time them and subject them to various tests for
randomness. These tests are variously drawn from George Marsaglia's
"Diehard battery of random number tests", the NIST Statistical Test
Suite, and again from other sources such as personal invention, user
contribution, other (open source) test suites, or the literature.
The primary point of dieharder is to make it easy to time and test
(pseudo)random number generators, including both software and hardware
rngs, with a fully open source tool. In addition to providing
"instant" access to testing of all built-in generators, users can
choose one of three ways to test their own random number generators or
sources: a unix pipe of a raw binary (presumed random) bitstream; a
file containing a (presumed random) raw binary bitstream or formatted
ascii uints or floats; or embedding the generator in dieharder's GSL-
compatible rng harness and adding it to the list of built-in
generators. The stdin and file input methods are described below in
their own section, as is suggested "best practice" for newbies to
random number generator testing.
An important motivation for using dieharder is that the entire test
suite is fully Gnu Public License (GPL) open source code and hence
rather than being prohibited from "looking underneath the hood" all
users are openly encouraged to critically examine the dieharder code
for errors, add new tests or generators or user interfaces, or use it
freely as is to test their own favorite candidate rngs subject only to
the constraints of the GPL. As a result of its openness, literally
hundreds of improvements and bug fixes have been contributed by users
to date, resulting in a far stronger and more reliable test suite than
would have been possible with closed and locked down sources or even
open sources (such as STS) that lack the dynamical feedback mechanism
permitting corrections to be shared.
Even small errors in test statistics permit the alternative (usually
unstated) null hypothesis to become an important factor in rng testing
-- the unwelcome possibility that your generator is just fine but it is
the test that is failing. One extremely useful feature of dieharder is
that it is at least moderately self validating. Using the "gold
standard" aes and threefish cryptographic generators, you can observe
how these generators perform on dieharder runs to the same general
degree of accuracy that you wish to use on the generators you are
testing. In general, dieharder tests that consistently fail at any
given level of precision (selected with e.g. -a -m 10) on both of the
gold standard rngs (and/or the better GSL generators, mt19937, gfsr4,
taus) are probably unreliable at that precision and it would hardly be
surprising if they failed your generator as well.
Experts in statistics are encouraged to give the suite a try, perhaps
using any of the example calls below at first and then using it freely
on their own generators or as a harness for adding their own tests.
Novices (to either statistics or random number generator testing) are
strongly encouraged to read the next section on p-values and the null
hypothesis and running the test suite a few times with a more verbose
output report to learn how the whole thing works.
QUICK START EXAMPLES
Examples for how to set up pipe or file input are given below.
However, it is recommended that a user play with some of the built in
generators to gain familiarity with dieharder reports and tests before
tackling their own favorite generator or file full of possibly random
numbers.
To see dieharder's default standard test report for its default
generator (mt19937) simply run:
dieharder -a
To increase the resolution of possible failures of the standard -a(ll)
test, use the -m "multiplier" for the test default numbers of pvalues
(which are selected more to make a full test run take an hour or so
instead of days than because it is truly an exhaustive test sequence)
run:
dieharder -a -m 10
To test a different generator (say the gold standard AES_OFB) simply
specify the generator on the command line with a flag:
dieharder -g 205 -a -m 10
Arguments can be in any order. The generator can also be selected by
name:
dieharder -g AES_OFB -a
To apply only the diehard opso test to the AES_OFB generator, specify
the test by name or number:
dieharder -g 205 -d 5
or
dieharder -g 205 -d diehard_opso
Nearly every aspect or field in dieharder's output report format is
user-selectable by means of display option flags. In addition, the
field separator character can be selected by the user to make the
output particularly easy for them to parse (-c ' ') or import into a
spreadsheet (-c ','). Try:
dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues
to see an extremely terse, easy to import report or
dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D
description
to see a verbose report good for a "beginner" that includes a full
description of each test itself.
Finally, the dieharder binary is remarkably autodocumenting even if the
man page is not available. All users should try the following commands
to see what they do:
dieharder -h
(prints the command synopsis like the one above).
dieharder -a -h
dieharder -d 6 -h
(prints the test descriptions only for -a(ll) tests or for the specific
test indicated).
dieharder -l
(lists all known tests, including how reliable rgb thinks that they are
as things stand).
dieharder -g -1
(lists all known rngs).
dieharder -F
(lists all the currently known display/output control flags used with
-D).
Both beginners and experts should be aware that the assessment provided
by dieharder in its standard report should be regarded with great
suspicion. It is entirely possible for a generator to "pass" all tests
as far as their individual p-values are concerned and yet to fail
utterly when considering them all together. Similarly, it is probable
that a rng will at the very least show up as "weak" on 0, 1 or 2 tests
in a typical -a(ll) run, and may even "fail" 1 test in one such run in 10
or so. To understand why this is so, it is necessary to understand
something of rng testing, p-values, and the null hypothesis!
P-VALUES AND THE NULL HYPOTHESIS
dieharder returns "p-values". To understand what a p-value is and how
to use it, it is essential to understand the null hypothesis, H0.
The null hypothesis for random number generator testing is "This
generator is a perfect random number generator, and for any choice of
seed produces an infinitely long, unique sequence of numbers that have
all the expected statistical properties of random numbers, to all
orders". Note well that we know that this hypothesis is technically
false for all software generators as they are periodic and do not have
the correct entropy content for this statement to ever be true.
However, many hardware generators fail a priori as well, as they
contain subtle bias or correlations due to the deterministic physics
that underlies them. Nature is often unpredictable but it is rarely
random and the two words don't (quite) mean the same thing!
The null hypothesis can be practically true, however. Both software
and hardware generators can be "random" enough that their sequences
cannot be distinguished from random ones, at least not easily or with
the available tools (including dieharder!) Hence the null hypothesis is
a practical, not a theoretically pure, statement.
To test H0, one uses the rng in question to generate a sequence of
presumably random numbers. From these numbers one computes one or
more of a wide range of test statistics -- empirically computed
numbers that, if H0 is true, are random samples drawn from a known
distribution (samples that may or may not be covariant, depending on
whether overlapping sequences of random numbers are used to generate
successive samples). From a knowledge of the
target distribution of the statistic(s) and the associated cumulative
distribution function (CDF) and the empirical value of the randomly
generated statistic(s), one can read off the probability of obtaining
the empirical result if the sequence was truly random, that is, if the
null hypothesis is true and the generator in question is a "good"
random number generator! This probability is the "p-value" for the
particular test run.
For example, to test a coin (or a sequence of bits) we might simply
count the number of heads and tails in a very long string of flips. If
we assume that the coin is a "perfect coin", we expect the number of
heads and tails to be binomially distributed and can easily compute the
probability of getting any particular number of heads and tails. If we
compare our recorded number of heads and tails from the test series to
this distribution and find that the probability of getting the count we
obtained is very low with, say, way more heads than tails we'd suspect
the coin wasn't a perfect coin. dieharder applies this very test (made
mathematically precise) and many others that operate on this same
principle to the string of random bits produced by the rng being tested
to provide a picture of how "random" the rng is.
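To put rough numbers on the coin example: in 1,000,000 flips of a
fair coin the expected number of heads is 500,000 with a standard
deviation of sqrt(1000000 x 0.5 x 0.5) = 500, so a run that returns
502,000 heads is a four standard deviation excess that a perfect coin
would produce far less than one time in ten thousand trials; the
p-value for such a run is correspondingly tiny.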
Note that the usual dogma is that if the p-value is low -- typically
less than 0.05 -- one "rejects" the null hypothesis. In a word, it is
improbable that one would get the result obtained if the generator is a
good one. If it is any other value, one does not "accept" the
generator as good, one "fails to reject" the generator as bad for this
particular test. A "good random number generator" is hence one that we
haven't been able to make fail yet!
This criterion is, of course, naive in the extreme and cannot be used
with dieharder! It makes just as much sense to reject a generator that
has p-values of 0.95 or more! Both of these p-value ranges are equally
unlikely on any given test run, and should be returned for (on average)
5% of all test runs by a perfect random number generator. A generator
that fails to produce p-values less than 0.05 5% of the time it is
tested with different seeds is a bad random number generator, one that
fails the test of the null hypothesis. Since dieharder returns over
100 pvalues by default per test, one would expect any perfectly good
rng to "fail" such a naive test around five times by this criterion in
a single dieharder run!
The p-values themselves, as it turns out, are test statistics! By
their nature, p-values should be uniformly distributed on the range
0-1. In 100+ test runs with independent seeds, one should not be
surprised to obtain 0, 1, 2, or even (rarely) 3 p-values less than
0.01. On the other hand obtaining 7 p-values in the range 0.24-0.25,
or seeing that 70 of the p-values are greater than 0.5 should make the
generator highly suspect! How can a user determine when a test is
producing "too many" of any particular value range for p? Or too few?
Dieharder does it for you, automatically. One can in fact convert a
set of p-values into a p-value by comparing their distribution to the
expected one, using a Kolmogorov-Smirnov test against the expected
uniform distribution of p.
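To make that aggregation step concrete, the following sketch (in C;
it is an illustration of the idea, not dieharder's internal kstest
routine) computes a Kolmogorov-Smirnov p-value for a set of p-values
tested against the uniform distribution on [0,1):

#include <math.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a, y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Asymptotic Kolmogorov Q(lambda): probability of a D statistic this
   large or larger if the samples really are uniform. */
static double ks_q(double lambda)
{
    double sum = 0.0, term;
    int j;
    for (j = 1; j <= 100; j++) {
        term = 2.0 * ((j & 1) ? 1.0 : -1.0) * exp(-2.0 * j * j * lambda * lambda);
        sum += term;
        if (fabs(term) < 1.0e-12) break;
    }
    if (sum < 0.0) sum = 0.0;
    if (sum > 1.0) sum = 1.0;
    return sum;
}

/* Return the KS p-value for n p-values against uniform [0,1). */
double kstest_uniform(double *pvalue, int n)
{
    double d = 0.0, dplus, dminus, lambda;
    int i;
    qsort(pvalue, (size_t) n, sizeof(double), cmp_double);
    for (i = 0; i < n; i++) {
        dplus  = (double) (i + 1) / n - pvalue[i];   /* empirical CDF above the point */
        dminus = pvalue[i] - (double) i / n;         /* empirical CDF below the point */
        if (dplus  > d) d = dplus;
        if (dminus > d) d = dminus;
    }
    /* finite-n correction (Stephens), as used in Numerical Recipes */
    lambda = (sqrt((double) n) + 0.12 + 0.11 / sqrt((double) n)) * d;
    return ks_q(lambda);
}

A small p-value from a function like this, fed with the per-run
p-values of a single test, is exactly the kind of aggregate failure
described above.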
These p-values obtained from looking at the distribution of p-values
should in turn be uniformly distributed and could in principle be
subjected to still more KS tests in aggregate. The distribution of p-
values for a good generator should be idempotent, even across different
test statistics and multiple runs.
A failure of the distribution of p-values at any level of aggregation
signals trouble. In fact, if the p-values of any given test are
subjected to a KS test, and those p-values are then subjected to a KS
test, as we add more p-values to either level we will either observe
idempotence of the resulting distribution of p to uniformity, or we
will observe idempotence to a single p-value of zero! That is, a good
generator will produce a roughly uniform distribution of p-values, in
the specific sense that the p-values of the distributions of p-values
are themselves roughly uniform and so on ad infinitum, while a bad
generator will produce a non-uniform distribution of p-values, and as
more p-values drawn from the non-uniform distribution are added to its
KS test, at some point the failure will be absolutely unmistakeable as
the resulting p-value approaches 0 in the limit. Trouble indeed!
The question is, trouble with what? Random number tests are themselves
complex computational objects, and there is a probability that their
code is incorrectly framed or that roundoff or other numerical -- not
methodical -- errors are contributing to a distortion of the
distribution of some of the p-values obtained. This is not an idle
observation; when one works on writing random number generator testing
programs, one is always testing the tests themselves with "good" (we
hope) random number generators so that egregious failures of the null
hypothesis signal not a bad generator but an error in the test code.
The null hypothesis above is correctly framed from a theoretical point
of view, but from a real and practical point of view it should read:
"This generator is a perfect random number generator, and for any
choice of seed produces an infinitely long, unique sequence of numbers
that have all the expected statistical properties of random numbers, to
all orders and this test is a perfect test and returns precisely
correct p-values from the test computation." Observed "failure" of
this joint null hypothesis H0' can come from failure of either or both
of these disjoint components, and comes from the second as often or
more often than the first during the test development process. When
one cranks up the "resolution" of the test (discussed next) to where a
generator starts to fail some test one realizes, or should realize,
that development never ends and that new test regimes will always
reveal new failures not only of the generators but of the code.
With that said, one of dieharder's most significant advantages is the
control that it gives you over a critical test parameter. From the
remarks above, we can see that we should feel very uncomfortable about
"failing" any given random number generator on the basis of a 5%, or
even a 1%, criterion, especially when we apply a test suite like
dieharder that returns over 100 (and climbing) distinct test p-values
as of the last snapshot. We want failure to be unambiguous and
reproducible!
To accomplish this, one can simply crank up its resolution. If we ran
any given test against a random number generator and it returned a p-
value of (say) 0.007328, we'd be perfectly justified in wondering if it
is really a good generator. However, the probability of getting this
result isn't really all that small -- when one uses dieharder for hours
at a time numbers like this will definitely happen quite frequently and
mean nothing. If one runs the same test again (with a different seed
or part of the random sequence) and gets a p-value of 0.009122, and a
third time and gets 0.002669 -- well, that's three 1% (or less) shots
in a row and that should happen only one in a million times. One way
to clearly resolve failures, then, is to increase the number of
p-values generated in a test run. If the actual distribution of p
being returned by the test is not uniform, a KS test will eventually
return a p-value that is not some ambiguous 0.035517 but is instead
0.000000, with the latter produced time after time as we rerun.
For this reason, dieharder is extremely conservative about announcing
rng "weakness" or "failure" relative to any given test. It's internal
criterion for these things are currently p < 0.5% or p > 99.5% weakness
(at the 1% level total) and a considerably more stringent criterion for
failure: p < 0.05% or p > 99.95%. Note well that the ranges are
symmetric -- too high a value of p is just as bad (and unlikely) as too
low, and it is critical to flag it, because it is quite possible for a
rng to be too good, on average, and not to produce enough low p-values
on the full spectrum of dieharder tests. This is where the final
kstest is of paramount importance, and where the "histogram" option can
be very useful to help you visualize the failure in the distribution of
p -- run e.g.:
dieharder [whatever] -D default -D histogram
and you will see a crude ascii histogram of the pvalues that failed (or
passed) any given level of test.
Scattered reports of weakness or marginal failure in a preliminary
-a(ll) run should therefore not be immediate cause for alarm. Rather,
they are tests to repeat, to watch out for, to push the rng harder on
using the -m option to -a or simply increasing -p for a specific test.
Dieharder permits one to increase the number of p-values generated for
any test, subject only to the availability of enough random numbers
(for file based tests) and time, to make failures unambiguous. A test
that is truly weak at -p 100 will almost always fail egregiously at
some larger value of psamples, be it -p 1000 or -p 100000. However,
because dieharder is a research tool and is under perpetual development
and testing, it is strongly suggested that one always consider the
alternative null hypothesis -- that the failure is a failure of the
test code in dieharder itself in some limit of large numbers -- and
take at least some steps (such as running the same test at the same
resolution on a "gold standard" generator) to ensure that the failure
is indeed probably in the rng and not the dieharder code.
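For example, if (say) test 5 looks marginal for your generator at the
default resolution, one reasonable step is to rerun it at higher
resolution on both your generator and on AES and compare:
dieharder -d 5 -g 205 -p 1000 -k 2
dieharder -d 5 -g <your generator number> -p 1000 -k 2
If AES shows the same kind of marginal behavior, suspect the test; if
only your generator does, suspect the generator.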
Lacking a source of perfect random numbers to use as a reference,
validating the tests themselves is not easy and always leaves one with
some ambiguity (even aes or threefish). During development the best
one can usually do is to rely heavily on these "presumed good" random
number generators. There are a number of generators that we have
theoretical reasons to expect to be extraordinarily good and to lack
correlations out to some known underlying dimensionality, and that also
test out extremely well quite consistently. By using several such
generators and not just one, one can hope that those generators have
(at the very least) different correlations and should not all uniformly
fail a test in the same way and with the same number of p-values. When
all of these generators consistently fail a test at a given level, I
tend to suspect that the problem is in the test code, not the
generators, although it is very difficult to be certain, and many
errors in dieharder's code have been discovered and ultimately fixed in
just this way by myself or others.
One advantage of dieharder is that it has a number of these "good
generators" immediately available for comparison runs, courtesy of the
Gnu Scientific Library and user contribution (notably David Bauer, who
kindly encapsulated aes and threefish). I use AES_OFB, Threefish_OFB,
mt19937_1999, gfsr4, ranlxd2 and taus2 (as well as "true random"
numbers from random.org) for this purpose, and I try to ensure that
dieharder will "pass" in particular the -g 205 -S 1 -s 1 generator at
any reasonable p-value resolution out to -p 1000 or farther.
Tests (such as the diehard operm5 and sums test) that consistently fail
at these high resolutions are flagged as being "suspect" -- possible
failures of the alternative null hypothesis -- and they are strongly
deprecated! Their results should not be used to test random number
generators pending agreement in the statistics and random number
community that those tests are in fact valid and correct so that
observed failures can indeed safely be attributed to a failure of the
intended null hypothesis.
As I keep emphasizing (for good reason!) dieharder is community
supported. I therefore openly ask that the users of dieharder who are
expert in statistics to help me fix the code or algorithms being
implemented. I would like to see this test suite ultimately be
validated by the general statistics community in hard use in an open
environment, where every possible failure of the testing mechanism
itself is subject to scrutiny and eventual correction. In this way we
will eventually achieve a very powerful suite of tools indeed, ones
that may well give us very specific information not just about failure
but of the mode of failure as well, just how the sequence tested
deviates from randomness.
Thus far, dieharder has benefitted tremendously from the community.
Individuals have openly contributed tests, new generators to be tested,
and fixes for existing tests that were revealed by their own work with
the testing instrument. Efforts are underway to make dieharder more
portable so that it will build on more platforms and faster so that
more thorough testing can be done. Please feel free to participate.
FILE INPUT
The simplest way to use dieharder with an external generator that
produces raw binary (presumed random) bits is to pipe the raw binary
output from this generator (presumed to be a binary stream of 32 bit
unsigned integers) directly into dieharder, e.g.:
cat /dev/urandom | ./dieharder -a -g 200
Go ahead and try this example. It will run the entire dieharder suite
of tests on the stream produced by the linux built-in generator
/dev/urandom (using /dev/random is not recommended as it is too slow to
test in a reasonable amount of time).
Alternatively, dieharder can be used to test files of numbers produced
by a candidate random number generator:
dieharder -a -g 201 -f random.org_bin
for raw binary input or
dieharder -a -g 202 -f random.org.txt
for formatted ascii input.
A formatted ascii input file can accept either uints (integers in the
range 0 to 2^31-1, one per line) or decimal uniform deviates with at
least ten significant digits (that can be multiplied by UINT_MAX = 2^32
to produce a uint without dropping precision), also one per line.
Floats with fewer digits will almost certainly fail bitlevel tests,
although they may pass some of the tests that act on uniform deviates.
Finally, one can fairly easily wrap any generator in the same (GSL)
random number harness used internally by dieharder and simply test it
the same way one would any other internal generator recognized by
dieharder. This is strongly recommended where it is possible, because
dieharder needs to use a lot of random numbers to thoroughly test a
generator. A built in generator can simply let dieharder determine how
many it needs and generate them on demand, where a file that is too
small will "rewind" and render the test results where a rewind occurs
suspect.
Note well that file input rands are delivered to the tests on demand,
but if the test needs more than are available it simply rewinds the
file and cycles through it again, and again, and again as needed.
Obviously this significantly reduces the sample space and can lead to
completely incorrect results for the p-value histograms unless there
are enough rands to run EACH test without repetition (it is harmless to
reuse the sequence for different tests). Let the user beware!
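One way to experiment with file input using a known-good source first
(built only from the options described above) is to write a large raw
binary sample from a built-in generator and read it back:
dieharder -g 205 -B -o -f aes_sample.bin -t 10000000
dieharder -g 201 -f aes_sample.bin -d 0
The file name and count here are arbitrary placeholders; make the
file large enough that the tests you intend to run never rewind it.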
BEST PRACTICE
A frequently asked question from new users wishing to test a generator
they are working on for fun or profit (or both) is "How should I get
its output into dieharder?" This is a nontrivial question, as
dieharder consumes enormous numbers of random numbers in a full test
cycle, and then there are features like -m 10 or -m 100 that let one
effortlessly demand 10 or 100 times as many to stress a new generator
even more.
Even with large file support in dieharder, it is difficult to provide
enough random numbers in a file to really make dieharder happy. It is
therefore strongly suggested that you either:
a) Edit the output stage of your random number generator and get it to
write its production to stdout as a random bit stream -- basically
create 32 bit unsigned random integers and write them directly to
stdout as e.g. char data or raw binary. Note that this is not the same
as writing raw floating point numbers (that will not be random at all
as a bitstream) and that "endianness" of the uints should not matter
for the null hypothesis of a "good" generator, as random bytes are
random in any order. Crank the generator and feed this stream to
dieharder in a pipe as described above.
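A minimal sketch of option a) in C, assuming a hypothetical generator
function my_rng_next() that returns 32 bit unsigned integers (replace
its body with calls into your own generator):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder generator: a simple xorshift standing in for yours. */
static uint32_t my_rng_next(void)
{
    static uint32_t s = 2463534242u;   /* arbitrary nonzero seed */
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    return s;
}

int main(void)
{
    uint32_t buf[1024];
    size_t i;
    for (;;) {
        for (i = 0; i < 1024; i++)
            buf[i] = my_rng_next();
        /* write raw 32 bit uints to stdout; stop if the write fails
           (e.g. the consuming pipe has been closed) */
        if (fwrite(buf, sizeof buf[0], 1024, stdout) != 1024)
            return EXIT_FAILURE;
    }
}

Compiled as e.g. mygen, this would be driven exactly like the
/dev/urandom example: ./mygen | dieharder -g 200 -a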
b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap your
generator (or calls to your generator's hardware interface). Follow
the examples in the ./dieharder source directory to add it as a "user"
generator in the command line interface, rebuild, and invoke the
generator as a "native" dieharder generator (it should appear in the
list produced by -g -1 when done correctly). The advantage of doing it
this way is that you can then (if your new generator is highly
successful) contribute it back to the dieharder project if you wish!
Not to mention the fact that it makes testing it very easy.
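For option b), the core of a GSL-compatible wrapper looks roughly
like the sketch below (the names my_rng and my_state_t and the
xorshift body are placeholders for your own generator; registering
the resulting type in dieharder's generator list follows the examples
in the source directory and is not shown):

#include <gsl/gsl_rng.h>
#include <stdint.h>

typedef struct { uint32_t s; } my_state_t;    /* your generator's state */

static void my_set(void *vstate, unsigned long int seed)
{
    my_state_t *state = (my_state_t *) vstate;
    state->s = (uint32_t) (seed ? seed : 1);  /* avoid the all-zero state */
}

static unsigned long int my_get(void *vstate)
{
    my_state_t *state = (my_state_t *) vstate;
    state->s ^= state->s << 13;               /* placeholder update step */
    state->s ^= state->s >> 17;
    state->s ^= state->s << 5;
    return state->s;
}

static double my_get_double(void *vstate)
{
    return my_get(vstate) / 4294967296.0;     /* uniform in [0,1) */
}

static const gsl_rng_type my_type = {
    "my_rng",          /* name that would show up in the -g -1 list */
    0xffffffffUL,      /* max value returned by my_get */
    0,                 /* min value */
    sizeof(my_state_t),
    &my_set,
    &my_get,
    &my_get_double
};

const gsl_rng_type *gsl_rng_my_rng = &my_type;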
Most users will probably go with option a) at least initially, but be
aware that b) is probably easier than you think. The dieharder
maintainers may be able to give you a hand with it if you get into
trouble, but no promises.
WARNING!
A warning for those who are testing files of random numbers. dieharder
is a tool that tests random number generators, not files of random
numbers! It is extremely inappropriate to try to "certify" a file of
random numbers as being random just because it fails to "fail" any of
the dieharder tests in e.g. a dieharder -a run. To put it bluntly, if
one rejects all such files that fail any test at the 0.05 level (or any
other), the one thing one can be certain of is that the files in
question are not random, as a truly random sequence would fail any
given test at the 0.05 level 5% of the time!
To put it another way, any file of numbers produced by a generator that
"fails to fail" the dieharder suite should be considered "random", even
if it contains sequences that might well "fail" any given test at some
specific cutoff. One has to presume that, in the broader testing of
the generator itself, the p-values for the test involved were
determined to be globally correctly distributed, so that e.g. failure at
the 0.01 level occurs neither more nor less than 1% of the time, on
average, over many many tests. If one particular file generates a
failure at this level, one can therefore safely presume that it is a
random file pulled from many thousands of similar files the generator
might create that have the correct distribution of p-values at all
levels of testing and aggregation.
To sum up, use dieharder to validate your generator (via input from
files or an embedded stream). Then by all means use your generator to
produce files or streams of random numbers. Do not use dieharder as an
accept/reject tool to validate the files themselves!
EXAMPLES
To demonstrate all tests, run on the default GSL rng, enter:
dieharder -a
To demonstrate a test of an external generator of a raw binary stream
of bits, use the stdin (raw) interface:
cat /dev/urandom | dieharder -g 200 -a
To use it with an ascii formatted file:
dieharder -g 202 -f testrands.txt -a
(testrands.txt should consist of a header such as:
#==================================================================
# generator mt19937_1999 seed = 1274511046
#==================================================================
type: d
count: 100000
numbit: 32
3129711816
85411969
2545911541
etc.).
To use it with a binary file
dieharder -g 201 -f testrands.bin -a
or
cat testrands.bin | dieharder -g 200 -a
An example that demonstrates the use of "prefixes" on the output lines
that make it relatively easy to filter off the different parts of the
output report and chop them up into numbers that can be used in other
programs or in spreadsheets, try:
dieharder -a -c ',' -D default -D prefix
DISPLAY OPTIONS
As of version 3.x.x, dieharder has a single output interface that
produces tabular data per test, with common information in headers.
The display control options and flags can be used to customize the
output to your individual specific needs.
The options are controlled by binary flags. The flags, and their text
versions, are displayed if you enter:
dieharder -F
by itself on a line.
The flags can be entered all at once by adding up all the desired
option flags. For example, a very sparse output could be selected by
adding the flags for the test_name (8) and the associated pvalues (128)
to get 136:
dieharder -a -D 136
Since the flags are cumulated from zero (unless no flag is entered and
the default is used) you could accomplish the same display via:
dieharder -a -D 8 -D pvalues
Note that you can enter flags by value or by name, in any combination.
Because people use dieharder to obtain values and then wish to export
them into spreadsheets (comma separated values) or into filter scripts,
you can change the field separator character. For example:
dieharder -a -c ',' -D default -D -1 -D -2
produces output that is ideal for importing into a spreadsheet (note
that one can subtract field values from the base set of fields provided
by the default option as long as it is given first).
An interesting option is the -D prefix flag, which turns on a field
identifier prefix to make it easy to filter out particular kinds of
data. However, it is equally easy to turn on any particular kind of
output to the exclusion of others directly by means of the flags.
Two other flags of interest to novices to random number generator
testing are the -D histogram (turns on a histogram of the underlying
pvalues, per test) and -D description (turns on a complete test
description, per test). These flags turn the output table into more of
a series of "reports" of each test.
PUBLICATION RULES
dieharder is entirely original code and can be modified and used at
will by any user, provided that:
a) The original copyright notices are maintained and that the source,
including all modifications, is made publically available at the time
of any derived publication. This is open source software according to
the precepts and spirit of the Gnu Public License. See the
accompanying file COPYING, which also must accompany any
redistribution.
b) The primary author of the code (Robert G. Brown) is appropriately
acknowledged and referenced in any derived publication. It is strongly
suggested that George Marsaglia and the Diehard suite and the various
authors of the Statistical Test Suite be similarly acknowledged,
although this suite shares no actual code with these random number test
suites.
c) Full responsibility for the accuracy, suitability, and
effectiveness of the program rests with the users and/or modifiers. As
is clearly stated in the accompanying copyright.h:
THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
ACKNOWLEDGEMENTS
The author of this suite gratefully acknowledges George Marsaglia (the
author of the diehard test suite) and the various authors of NIST
Special Publication 800-22 (which describes the Statistical Test Suite
for testing pseudorandom number generators for cryptographic
applications), for excellent descriptions of the tests therein. These
descriptions enabled this suite to be developed with a GPL.
The author also wishes to reiterate that the academic correctness and
accuracy of the implementation of these tests is his sole
responsibility and not that of the authors of the Diehard or STS
suites. This is especially true where he has seen fit to modify those
tests from their strict original descriptions.
COPYRIGHT
GPL 2b; see the file COPYING that accompanies the source of this
program. This is the "standard Gnu General Public License version 2 or
any later version", with the one minor (humorous) "Beverage"
modification listed below. Note that this modification is probably not
legally defensible and can be followed really pretty much according to
the honor rule.
As to my personal preferences in beverages, red wine is great, beer is
delightful, and Coca Cola or coffee or tea or even milk acceptable to
those who for religious or personal reasons wish to avoid stressing my
liver.
The Beverage Modification to the GPL:
Any satisfied user of this software shall, upon meeting the primary
author(s) of this software for the first time under the appropriate
circumstances, offer to buy him or her or them a beverage. This
beverage may or may not be alcoholic, depending on the personal ethical
and moral views of the offerer. The beverage cost need not exceed one
U.S. dollar (although it certainly may at the whim of the offerer:-)
and may be accepted or declined with no further obligation on the part
of the offerer. It is not necessary to repeat the offer after the
first meeting, but it can't hurt...
dieharder Copyright 2003 Robert G. Brown dieharder(1)