dieharder(1)							  dieharder(1)

NAME
       dieharder  -  A testing and benchmarking tool for random number genera-
       tors.

SYNOPSIS
       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
		 [-D output flag [-D output flag] ... ] [-F] [-c separator]
		 [-g generator number or -1] [-h] [-k ks_flag] [-l]
		 [-L overlap] [-m multiply_p] [-n ntuple]
		 [-p number of p samples] [-P Xoff]
		 [-o filename] [-s seed strategy] [-S random number seed]
		 [-t number of test samples] [-v verbose flag]
		 [-W weak] [-X fail] [-Y Xtrategy]
		 [-x xvalue] [-y yvalue] [-z zvalue]

dieharder OPTIONS
       -a runs all the tests with standard/default options to create a
	      user-controllable report.  To  control  the  formatting  of  the
	      report,  see  -D below.  To control the power of the test (which
	      uses default values for tsamples that cannot generally be varied
	      and psamples which generally can) see -m below as a "multiplier"
	      of the default number of psamples (used only in a -a run).

       -d test number -  selects specific diehard test.

       -f filename - generators 201 or 202 permit either raw binary or
	      formatted ASCII numbers to be read in from a file  for  testing.
	      generator  200  reads  in  raw  binary numbers from stdin.  Note
	      well: many tests with default parameters require a lot of rands!
	      To  see  a  sample  of the (required) header for ASCII formatted
	      input, run

		       dieharder -o -f example.input -t 10

	      and then examine the  contents  of  example.input.   Raw	binary
	      input  reads  32	bit  increments  of the specified data stream.
	      stdin_input_raw accepts a pipe from a raw binary stream.

       -B binary mode (used with -o below) causes output rands to  be  written
       in raw binary, not formatted ascii.

       -D output flag - permits fields to be selected for inclusion in
	      dieharder  output.   Each flag can be entered as a binary number
	      that turns on a specific output field or header or by flag name;
	      flags  are aggregated.  To see all currently known flags use the
	      -F command.

       -F - lists all known flags by name and number.

       -c table separator - where separator is e.g. ',' (CSV) or '  '  (white-
       space).

       -g generator number - selects a specific generator for testing.	Using
	      -g  -1 causes all known generators to be printed out to the dis-
	      play.

       -h prints context-sensitive help -- usually Usage (this message) or a
	      test synopsis if entered as e.g. dieharder -d 3 -h.

       -k ks_flag - selects the Kolmogorov-Smirnov (KS) test variant; ks_flag
              is one of:

	      0 is fast but slightly sloppy for psamples > 4999 (default).

	      1 is MUCH slower but more accurate for larger numbers  of  psam-
	      ples.

	      2  is  slower still, but (we hope) accurate to machine precision
	      for any number of psamples up to some as yet  unknown  numerical
	      upper  limit  (it  has  been  tested out to at least hundreds of
	      thousands).

	      3 is kuiper ks, fast, quite inaccurate for small samples, depre-
	      cated.

       -l list all known tests.

       -L overlap

	      1 (use overlap, default)

	      0 (don't use overlap)

	      in  operm5 or other tests that support overlapping and non-over-
	      lapping sample modes.

       -m multiply_p - multiply default # of psamples in -a(ll) runs to crank
              up the resolution of failure.

       -n ntuple - set ntuple length for tests on short bit strings that
              permit the length to be varied (e.g. rgb bitdist).

       -o filename - output -t count random numbers from current generator  to
       file.

       -p count - sets the number of p-value samples per test (default 100).

       -P Xoff - sets the number of psamples that will accumulate before deciding
	      that a generator is "good" and really, truly passes even a -Y  2
	      T2D run.	Currently the default is 100000; eventually it will be
	      set from AES-derived T2D test failure thresholds for fully auto-
	      mated  reliable  operation,  but	for now it is more a "boredom"
	      threshold set by how long one might reasonably want to  wait  on
	      any given test run.

       -S seed - where seed is a uint.	Overrides the default random seed
	      selection.  Ignored for file or stdin input.

       -s strategy - if strategy is the (default) 0, dieharder reseeds (or
	      rewinds)	once at the beginning when the random number generator
	      is selected and then never again.  If strategy is  nonzero,  the
	      generator  is reseeded or rewound at the beginning of EACH TEST.
	      If -S seed was specified, or a file is used,  this  means  every
	      test  is applied to the same sequence (which is useful for vali-
	      dation and testing of dieharder, but not	a  good  way  to  test
	      rngs).  Otherwise a new random seed is selected for each test.

       -t count - sets the number of random entities used in each test, where
	      possible.  Be warned -- some tests have fixed sample sizes; oth-
	      ers are variable but have practical minimum sizes.  It  is  sug-
	      gested you begin with the values used in -a and experiment care-
	      fully on a test by test basis.

       -W weak - sets the "weak" threshold to make the test(s) more or less
	      forgiving during e.g. a  test-to-destruction  run.   Default  is
	      currently 0.005.

       -X fail - sets the "fail" threshold to make the test(s) more or less
	      forgiving  during  e.g.  a  test-to-destruction run.  Default is
	      currently 0.000001, which is basically "certain failure  of  the
	      null  hypothesis",  the  desired	mode of reproducible generator
	      failure.

       -Y Xtrategy - the Xtrategy flag controls the new "test to failure" (T2F)
	      modes.  These flags and their modes act as follows:

		0  -  just run dieharder with the specified number of tsamples
	      and psamples, do not dynamically modify a run based on  results.
	      This is the way it has always run, and is the default.

		1  - "resolve ambiguity" (RA) mode.  If a test returns "weak",
	      this is an undesired result.  What does that  mean,  after  all?
	      If  you  run  a  long  test series, you will see occasional weak
              returns for a perfect generator because p is uniformly distributed
              and will appear in any finite interval from time to time.
	      Even if a test run returns more than one weak result, you cannot
	      be certain that the generator is failing.  RA mode adds psamples
	      (usually in blocks of 100) until the test result ends up solidly
	      not  weak  or  proceeds to unambiguous failure.  This is morally
	      equivalent to running the test several times to see  if  a  weak
	      result  is  reproducible,  but  eliminates  the bias of personal
	      judgement in the process since the default failure threshold  is
	      very small and very unlikely to be reached by random chance even
	      in many runs.

	      This option should only be used with -k 2.

		2 - "test to destruction" mode.  Sometimes you	just  want  to
              know where or if a generator will ever fail a test (or test
	      series).	-Y 2 causes psamples to be added 100 at a time until a
	      test  returns an overall pvalue lower than the failure threshold
	      or a specified maximum number of psamples (see -P) is reached.

	      Note well!  In this mode one may well fail due to the  alternate
	      null  hypothesis	--  the  test  itself is a bad test and fails!
	      Many dieharder tests, despite our best efforts, are  numerically
	      unstable	or  have only approximately known target statistics or
	      are straight up asymptotic results, and will eventually return a
	      failing result even for a gold-standard generator (such as AES),
	      or for the hypercautious the XOR generator with AES,  threefish,
	      kiss,  all  loaded  at once and xor'd together.  It is therefore
              safest to use this mode comparatively, executing a T2D run on
	      AES to get an idea of the test failure threshold(s) (something I
	      will eventually do and publish on the web so  everybody  doesn't
	      have  to do it independently) and then running it on your target
	      generator.  Failure with numbers of psamples within an order  of
	      magnitude  of  the  AES thresholds should probably be considered
	      possible test failures, not  generator  failures.   Failures  at
	      levels significantly less than the known gold standard generator
	      failure thresholds are, of course, probably failures of the gen-
	      erator.

	      This option should only be used with -k 2.

       -v verbose flag -- controls the verbosity of the output for debugging
	      only.   Probably of little use to non-developers, and developers
	      can read the enum(s) in dieharder.h and the test sources to  see
              which flag values turn on output on which routines.  A value of 1
              results in a highly detailed trace of program activity.

       -x,-y,-z number - Some tests have parameters that can safely be varied
              from their default value.  For example, in the diehard birthdays
              test, one can vary the number of "birthdays" drawn from a "year"
              of some length, which can also be varied.  -x 2048 -y 30 alters
              these two values but should still run
	      fine.   These  parameters should be documented internally (where
	      they exist) in the e.g. -d 0 -h visible notes.

	      NOTE WELL: The assessment(s) for the rngs may, in fact, be  com-
	      pletely incorrect or misleading.	There are still "bad tests" in
	      dieharder, although we are working to fix and improve them  (and
              try to document them in the test descriptions visible with -d
	      testnumber -h).  In particular, 'Weak' pvalues should occur  one
	      test  in two hundred, and 'Failed' pvalues should occur one test
	      in a million with the default thresholds - that's what p	MEANS.
	      Use them at your Own Risk!  Be Warned!

	      Or  better  yet,	use the new -Y 1 and -Y 2 resolve ambiguity or
	      test to destruction modes above, comparing to  similar  runs  on
	      one  of  the as-good-as-it-gets cryptographic generators, AES or
	      threefish.

DESCRIPTION
       dieharder

       Welcome to the current snapshot of the dieharder random number  tester.
       It encapsulates all of the GNU Scientific Library (GSL) random number
       generators (rngs) as well as a number of generators from the R  statis-
       tical  library,	hardware sources such as /dev/*random, "gold standard"
       cryptographic quality generators (useful for testing dieharder and  for
       purposes  of  comparison  to new generators) as well as generators con-
       tributed by users or found in the literature into a single harness that
       can  time them and subject them to various tests for randomness.  These
       tests are variously drawn from George Marsaglia's "Diehard  battery  of
       random  number  tests", the NIST Statistical Test Suite, and again from
       other sources such as  personal	invention,  user  contribution,  other
       (open source) test suites, or the literature.

       The  primary  point  of	dieharder  is to make it easy to time and test
       (pseudo)random number generators, including both software and  hardware
       rngs,  with  a  fully  open  source  tool.   In	addition  to providing
       "instant" access to testing  of	all  built-in  generators,  users  can
       choose  one of three ways to test their own random number generators or
       sources:  a unix pipe of a raw binary (presumed	random)  bitstream;  a
       file  containing  a (presumed random) raw binary bitstream or formatted
       ascii uints or floats; and embedding your generator in dieharder's GSL-
       compatible  rng	harness  and adding it to the list of built-in genera-
       tors.  The stdin and file input methods are described  below  in  their
       own section, as is suggested "best practice" for newbies to random num-
       ber generator testing.
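
       As a rough illustration of the raw binary file route mentioned above,
       the following Python sketch writes a file of 32-bit values that could
       then be handed to dieharder with -f.  The helper name
       write_raw_uint32, the file name, and the choice of native byte order
       are assumptions made for this example, and the generator number to
       pair with -f should be checked against the output of dieharder -g -1.

```python
import os
import random
import struct
import tempfile


def write_raw_uint32(path, values):
    """Write each value as one unsigned 32-bit word (native byte order, an assumption)."""
    with open(path, "wb") as fh:
        for v in values:
            fh.write(struct.pack("=I", v & 0xFFFFFFFF))


# Demo: 1000 words from Python's own Mersenne Twister, written to a temp dir.
rng = random.Random(12345)
demo_path = os.path.join(tempfile.mkdtemp(), "rands.bin")
write_raw_uint32(demo_path, (rng.getrandbits(32) for _ in range(1000)))
print(demo_path, os.path.getsize(demo_path), "bytes")

# Possible follow-up (generator number assumed to be the raw-file reader):
#   dieharder -g 201 -f rands.bin -a
```

       A file this small is only useful for checking the plumbing; as the -f
       option notes, many tests with default parameters require a lot of
       rands.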

       An important motivation for using dieharder is  that  the  entire  test
       suite is fully GNU General Public License (GPL) open source code and hence
       rather than being prohibited from "looking  underneath  the  hood"  all
       users  are  openly  encouraged to critically examine the dieharder code
       for errors, add new tests or generators or user interfaces, or  use  it
       freely  as is to test their own favorite candidate rngs subject only to
       the constraints of the GPL.  As a result  of  its  openness,  literally
       hundreds  of  improvements and bug fixes have been contributed by users
       to date, resulting in a far stronger and more reliable test suite  than
       would  have  been  possible with closed and locked down sources or even
       open sources (such as STS) that lack the dynamical  feedback  mechanism
       permitting corrections to be shared.

       Even  small  errors  in test statistics permit the alternative (usually
       unstated) null hypothesis to become an important factor in rng  testing
       -- the unwelcome possibility that your generator is just fine but it is
       the test that is failing.  One extremely useful feature of dieharder is
       that  it is at least moderately self validating.  Using the "gold stan-
       dard" aes and threefish cryptographic generators, you can  observe  how
       these  generators  perform on dieharder runs to the same general degree
       of accuracy that you wish to use on the generators you are testing.  In
       general,  dieharder  tests that consistently fail at any given level of
       precision (selected with e.g. -a -m 10) on both of  the	gold  standard
       rngs (and/or the better GSL generators, mt19937, gfsr4, taus) are prob-
       ably unreliable at that precision and it would hardly be surprising  if
       they failed your generator as well.

       Experts	in  statistics are encouraged to give the suite a try, perhaps
       using any of the example calls below at first and then using it	freely
       on  their  own  generators  or as a harness for adding their own tests.
       Novices (to either statistics or random number generator  testing)  are
       strongly encouraged to read the section below on p-values and the null
       hypothesis and to run the test suite a few times with a more verbose
       output report to learn how the whole thing works.

QUICK START EXAMPLES
       Examples  for  how  to set up pipe or file input are given below.  How-
       ever, it is recommended that a user play with some of the built in gen-
       erators	to  gain  familiarity  with dieharder reports and tests before
       tackling their own favorite generator or file full of  possibly	random
       numbers.

       To see dieharder's default standard test report for its default genera-
       tor (mt19937) simply run:

	  dieharder -a

       To increase the resolution of possible failures of the standard -a(ll)
       test, use the -m "multiplier" to scale up the default number of
       p-values per test (the defaults are chosen to make a full test run
       take an hour or so instead of days, not because they give a truly
       exhaustive test sequence):

	  dieharder -a -m 10

       To test a different generator (say the gold  standard  AES_OFB)	simply
       specify the generator on the command line with a flag:

	  dieharder -g 205 -a -m 10

       Arguments  can  be in any order.  The generator can also be selected by
       name:

	  dieharder -g AES_OFB -a

       To apply only the diehard opso test to the AES_OFB  generator,  specify
       the test by name or number:

	  dieharder -g 205 -d 5

       or

	  dieharder -g 205 -d diehard_opso

       Nearly  every  aspect  or  field in dieharder's output report format is
       user-selectable by means of display option  flags.   In	addition,  the
       field  separator character can be selected by the user to make the out-
       put particularly easy for them to parse (-c  '  ')  or  import  into  a
       spreadsheet (-c ',').  Try:

	  dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues

       to see an extremely terse, easy to import report or

	  dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description

       to see a verbose report good for a  "beginner"  that  includes  a  full
       description of each test itself.
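
       When such runs are scripted, it can help to assemble the command line
       from its parts.  The sketch below is one possible way to do that in
       Python; the function dieharder_argv and its parameter names are
       hypothetical, and it only emits flags documented in this page (-g,
       -a, -d, -m, -c, -D).

```python
def dieharder_argv(generator, test=None, run_all=False, multiplier=None,
                   separator=None, display_flags=()):
    """Build an argument list for a dieharder run (hypothetical helper)."""
    argv = ["dieharder", "-g", str(generator)]
    if run_all:
        argv.append("-a")
    if test is not None:
        argv += ["-d", str(test)]
    if multiplier is not None:
        argv += ["-m", str(multiplier)]
    if separator is not None:
        argv += ["-c", separator]
    for flag in display_flags:
        argv += ["-D", str(flag)]
    return argv


# The terse, comma-separated opso report from the example above.
terse = dieharder_argv(205, test="diehard_opso", separator=",",
                       display_flags=["test_name", "pvalues"])
print(terse)
```

       The resulting list can be passed unchanged to subprocess.run, which
       avoids shell quoting problems with separators such as ' '.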

       Finally, the dieharder binary is remarkably autodocumenting even if the
       man page is not available. All users should try the following  commands
       to see what they do:

	  dieharder -h

       (prints the command synopsis like the one above).

	  dieharder -a -h
	  dieharder -d 6 -h

       (prints the test descriptions only for -a(ll) tests or for the specific
       test indicated).

	  dieharder -l

       (lists all known tests, including how reliable rgb thinks that they are
       as things stand).

	  dieharder -g -1

       (lists all known rngs).

	  dieharder -F

       (lists  all  the currently known display/output control flags used with
       -D).

       Both beginners and experts should be aware that the assessment provided
       by  dieharder in its standard report should be regarded with great sus-
       picion.	It is entirely possible for a generator to "pass" all tests as
       far  as their individual p-values are concerned and yet to fail utterly
       when considering them all together.  Similarly, it is probable  that  a
       rng  will  at  the very least show up as "weak" on 0, 1 or 2 tests in a
       typical -a(ll) run, and may even "fail" 1 test in one such run in 10 or
       so.   To understand why this is so, it is necessary to understand some-
       thing of rng testing, p-values, and the null hypothesis!

P-VALUES AND THE NULL HYPOTHESIS
       dieharder returns "p-values".  To understand what a p-value is and  how
       to use it, it is essential to understand the null hypothesis, H0.

       The null hypothesis for random number generator testing is "This gener-
       ator is a perfect random number generator, and for any choice  of  seed
       produces an infinitely long, unique sequence of numbers that have all
       the expected statistical properties of random numbers, to all  orders".
       Note  well  that  we know that this hypothesis is technically false for
       all software generators as they are periodic and do not have  the  cor-
       rect entropy content for this statement to ever be true.  However, many
       hardware generators fail a priori as well, as they contain subtle  bias
       or  correlations  due to the deterministic physics that underlies them.
       Nature is often unpredictable but it is rarely random and the two words
       don't (quite) mean the same thing!

       The  null  hypothesis  can be practically true, however.  Both software
       and hardware generators can be "random"	enough	that  their  sequences
       cannot  be  distinguished from random ones, at least not easily or with
       the available tools (including dieharder!).  Hence the null hypothesis is
       a practical, not a theoretically pure, statement.

       To test H0, one uses the rng in question to generate a sequence of
       presumably random numbers.  From these numbers one can compute any
       one of a wide range of test statistics: empirically computed numbers
       that, subject to H0, are themselves random samples drawn from a known
       distribution.  (Successive samples may or may not be covariant,
       depending on whether overlapping sequences of random numbers are used
       to generate them.)  From a knowledge of the target distribution of the
       statistic(s), the associated cumulative distribution function (CDF),
       and the empirical value of the randomly generated statistic(s), one
       can read off the probability of obtaining the empirical result if the
       sequence was truly random, that is, if the null hypothesis is true and
       the generator in question is a "good" random number generator!  This
       probability is the "p-value" for the particular test run.

       For example, to test a coin (or a sequence of  bits)  we  might	simply
       count the number of heads and tails in a very long string of flips.  If
       we assume that the coin is a "perfect coin", we expect  the  number  of
       heads and tails to be binomially distributed and can easily compute the
       probability of getting any particular number of heads and tails.  If we
       compare	our recorded number of heads and tails from the test series to
       this distribution and find that the probability of getting the count we
       obtained  is very low with, say, way more heads than tails we'd suspect
       the coin wasn't a perfect coin.	dieharder applies this very test (made
       mathematically precise) and many others that operate on this same prin-
       ciple to the string of random bits produced by the rng being tested  to
       provide a picture of how "random" the rng is.
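
       The coin example can be made concrete with a few lines of Python.
       This is only a sketch of the idea, not dieharder's own bit-counting
       code; the function name two_sided_binomial_p is an assumption, and
       the p-value here is taken to be the probability, for a fair coin, of
       a head count at least as far from the expected half as the one
       observed.

```python
from math import comb


def two_sided_binomial_p(heads, flips):
    """P(head count at least as far from flips/2 as observed | fair coin)."""
    deviation = abs(heads - flips / 2)
    favourable = sum(comb(flips, k) for k in range(flips + 1)
                     if abs(k - flips / 2) >= deviation)
    return favourable / 2 ** flips


for heads in (50, 60, 70):
    print(heads, "heads in 100 flips -> p =", two_sided_binomial_p(heads, 100))
```

       Fifty heads gives p = 1, sixty heads gives a p-value a little above
       0.05, and seventy heads gives one below 0.001, which is the point at
       which we would suspect the coin.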

       Note  that  the	usual dogma is that if the p-value is low -- typically
       less than 0.05 -- one "rejects" the null hypothesis.  In a word, it  is
       improbable that one would get the result obtained if the generator is a
       good one.  If it is any other value, one does not "accept" the  genera-
       tor  as	good, one "fails to reject" the generator as bad for this par-
       ticular test.  A "good random number generator" is hence  one  that  we
       haven't been able to make fail yet!

       This  criterion	is, of course, naive in the extreme and cannot be used
       with dieharder!	It makes just as much sense to reject a generator that
       has p-values of 0.95 or more!  Both of these p-value ranges are equally
       unlikely on any given test run, and should be returned for (on average)
       5%  of all test runs by a perfect random number generator.  A generator
       that fails to produce p-values less than 0.05 5%  of  the  time	it  is
       tested  with different seeds is a bad random number generator, one that
       fails the test of the null hypothesis.  Since  dieharder  returns  over
       100  pvalues  by  default per test, one would expect any perfectly good
       rng to "fail" such a naive test around five times by this criterion  in
       a single dieharder run!
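
       The arithmetic behind that claim is short enough to check directly.
       The sketch below assumes the p-values are independent and uniformly
       distributed (which is what H0 predicts); the function name
       naive_failures is hypothetical.

```python
def naive_failures(n_pvalues, alpha=0.05):
    """Expected count of p-values below alpha, and chance of seeing at least one."""
    expected = n_pvalues * alpha
    at_least_one = 1.0 - (1.0 - alpha) ** n_pvalues
    return expected, at_least_one


expected, at_least_one = naive_failures(100)
print("expected p-values below 0.05:", expected)
print("probability of at least one:", at_least_one)
```

       With 100 p-values a perfect generator is expected to produce about
       five below 0.05, and is almost certain (better than 99%) to produce
       at least one.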

       The  p-values  themselves,  as  it  turns out, are test statistics!  By
       their nature, p-values should be uniformly  distributed	on  the  range
       0-1.   In 100+ test runs with independent seeds, one should not be sur-
       prised to obtain 0, 1, 2, or even (rarely) 3 p-values less  than  0.01.
       On  the other hand obtaining 7 p-values in the range 0.24-0.25, or see-
       ing that 70 of the p-values are greater than 0.5 should make the gener-
       ator highly suspect!  How can a user determine when a test is producing
       "too many" of any particular value range for p?	Or too few?

       Dieharder does it for you, automatically.  One can in  fact  convert  a
       set  of	p-values into a p-value by comparing their distribution to the
       expected one, using a Kolmogorov-Smirnov test against the expected uni-
       form distribution of p.

       These  p-values	obtained  from looking at the distribution of p-values
       should in turn be uniformly distributed and could in principle be  sub-
       jected to still more KS tests in aggregate.  The distribution of p-val-
       ues for a good generator should be idempotent,  even  across  different
       test statistics and multiple runs.

       A  failure  of the distribution of p-values at any level of aggregation
       signals trouble.  In fact, if the p-values of any given test  are  sub-
       jected  to  a  KS  test,  and those p-values are then subjected to a KS
       test, as we add more p-values to either level we  will  either  observe
       idempotence  of	the  resulting	distribution of p to uniformity, or we
       will observe idempotence to a single p-value of zero!  That is, a  good
       generator  will	produce a roughly uniform distribution of p-values, in
       the specific sense that the p-values of the distributions  of  p-values
       are themselves roughly uniform and so on ad infinitum, while a bad gen-
       erator will produce a non-uniform distribution of p-values, and as more
       p-values  drawn	from  the non-uniform distribution are added to its KS
       test, at some point the failure will be absolutely unmistakeable as the
       resulting p-value approaches 0 in the limit.  Trouble indeed!
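
       The behaviour described above can be sketched in Python with a
       textbook Kolmogorov-Smirnov test against the uniform distribution.
       This is an illustration under stated assumptions, not dieharder's
       implementation (see -k for the variants it offers): the p-value uses
       the plain asymptotic Kolmogorov series, the function names are
       hypothetical, and the "bad" p-values are an artificial set of evenly
       spaced quantiles skewed toward 0.

```python
import math
import random


def ks_statistic_uniform(pvalues):
    """Largest gap between the empirical CDF of pvalues and the uniform CDF."""
    xs = sorted(pvalues)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov tail probability Q(sqrt(n) * d); approximate."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the tail probability is indistinguishable from 1 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_pvalue(pvalues):
    """Convert a set of p-values into a single p-value."""
    return kolmogorov_pvalue(ks_statistic_uniform(pvalues), len(pvalues))


def skewed(n):
    """n artificial p-values whose distribution is slightly non-uniform."""
    return [((i + 0.5) / n) ** 1.2 for i in range(n)]


rng = random.Random(1)  # seeded so the run is repeatable
uniform_p = ks_pvalue([rng.random() for _ in range(100)])
skewed_p = [ks_pvalue(skewed(n)) for n in (100, 1000, 10000)]
print("uniform sample:", uniform_p)
print("skewed samples (n = 100, 1000, 10000):", skewed_p)
```

       With 100 skewed p-values the KS p-value is unremarkable; with 1000
       it is already small, and with 10000 it is zero to many decimal
       places, which is the unmistakable failure the text describes.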

       The question is, trouble with what?  Random number tests are themselves
       complex computational objects, and there is a  probability  that  their
       code  is  incorrectly framed or that roundoff or other numerical -- not
       methodical -- errors are contributing to a distortion of the  distribu-
       tion  of  some  of the p-values obtained.  This is not an idle observa-
       tion; when one works on writing random number  generator  testing  pro-
       grams, one is always testing the tests themselves with "good" (we hope)
       random number generators so that egregious failures of the null hypoth-
       esis  signal  not  a  bad generator but an error in the test code.  The
       null hypothesis above is correctly framed from a theoretical  point  of
       view, but from a real and practical point of view it should read: "This
       generator is a perfect random number generator, and for any  choice  of
       seed produces an infinitely long, unique sequence of numbers that have
       all the expected statistical  properties  of  random  numbers,  to  all
       orders and this test is a perfect test and returns precisely correct p-
       values from the test computation."  Observed "failure"  of  this  joint
       null  hypothesis  H0'  can come from failure of either or both of these
       disjoint components, and comes from the second as often or  more  often
       than the first during the test development process.  When one cranks up
       the "resolution" of the test (discussed	next)  to  where  a  generator
       starts to fail some test one realizes, or should realize, that develop-
       ment never ends and that new test regimes will always reveal new  fail-
       ures not only of the generators but of the code.

       With  that  said, one of dieharder's most significant advantages is the
       control that it gives you over a critical  test	parameter.   From  the
       remarks	above, we can see that we should feel very uncomfortable about
       "failing" any given random number generator on the basis of  a  5%,  or
       even  a	1%,  criterion,  especially  when  we  apply a test suite like
       dieharder that returns over 100 (and climbing) distinct	test  p-values
       as  of the last snapshot.  We want failure to be unambiguous and repro-
       ducible!

       To accomplish this, one can simply crank up the resolution of the test.  If we ran
       any  given  test against a random number generator and it returned a p-
       value of (say) 0.007328, we'd be perfectly justified in wondering if it
       is  really  a good generator.  However, the probability of getting this
       result isn't really all that small -- when one uses dieharder for hours
       at a time numbers like this will definitely happen quite frequently and
       mean nothing.  If one runs the same test again (with a  different  seed
       or  part  of the random sequence) and gets a p-value of 0.009122, and a
       third time and gets 0.002669 -- well, that's three 1% (or  less)  shots
       in  a  row and that should happen only one in a million times.  One way
       to clearly resolve failures, then, is to increase the number of	p-val-
       ues  generated  in  a  test run.  If the actual distribution of p being
       returned by the test is not uniform, a KS test will eventually return a
       p-value	that  is  not some ambiguous 0.035517 but is instead 0.000000,
       with the latter produced time after time as we rerun.

       For this reason, dieharder is extremely conservative  about  announcing
       rng "weakness" or "failure" relative to any given test.  Its internal
       criteria for these things are currently p < 0.5% or p > 99.5% for weakness
       (at the 1% level total) and a considerably more stringent criterion for
       failure: p < 0.05% or p > 99.95%.  Note well that the ranges  are  sym-
       metric  --  too	high a value of p is just as bad (and unlikely) as too
       low, and it is critical to flag it, because it is quite possible for  a
       rng  to be too good, on average, and not to produce enough low p-values
       on the full spectrum of dieharder  tests.   This  is  where  the  final
       kstest is of paramount importance, and where the "histogram" option can
       be very useful to help you visualize the failure in the distribution of
       p -- run e.g.:

	 dieharder [whatever] -D default -D histogram

       and you will see a crude ascii histogram of the pvalues that failed (or
       passed) any given level of test.
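
       The symmetric criteria described above amount to a simple two-sided
       check.  The Python sketch below is an illustration only: the function
       name assess and the labels it returns are assumptions, and the
       default thresholds are the percentages quoted in this section (they
       can be changed on the command line with -W and -X).

```python
def assess(p, weak=0.005, fail=0.0005):
    """Classify a p-value using symmetric two-sided thresholds."""
    if p < fail or p > 1.0 - fail:
        return "FAILED"
    if p < weak or p > 1.0 - weak:
        return "WEAK"
    return "PASSED"


for p in (0.5, 0.003, 0.997, 0.0001, 0.9999):
    print(p, assess(p))
```

       Note that 0.997 is flagged as weak and 0.9999 as failed: a p-value
       that is too high is treated exactly like one that is too low.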

       Scattered reports of weakness or  marginal  failure  in	a  preliminary
       -a(ll)  run should therefore not be immediate cause for alarm.  Rather,
       they are tests to repeat, to watch out for, to push the rng  harder  on
       using  the -m option to -a or simply increasing -p for a specific test.
       Dieharder permits one to increase the number of p-values generated  for
       any  test,  subject  only  to the availability of enough random numbers
       (for file based tests) and time, to make failures unambiguous.  A  test
       that  is  truly	weak  at -p 100 will almost always fail egregiously at
       some larger value of psamples, be it -p 1000 or	-p  100000.   However,
       because dieharder is a research tool and is under perpetual development
       and testing, it is strongly suggested  that  one  always  consider  the
       alternative  null  hypothesis  --  that the failure is a failure of the
       test code in dieharder itself in some limit of  large  numbers  --  and
       take  at  least	some  steps (such as running the same test at the same
       resolution on a "gold standard" generator) to ensure that  the  failure
       is indeed probably in the rng and not the dieharder code.

       Lacking a source of perfect random numbers to use as a reference, vali-
       dating the tests themselves is not easy and always leaves one with some
       ambiguity (even aes or threefish).  During development the best one can
       usually do is to rely heavily on these "presumed  good"	random	number
       generators.   There are a number of generators that we have theoretical
       reasons to expect to be extraordinarily good and to  lack  correlations
       out  to	some  known  underlying dimensionality, and that also test out
       extremely well quite consistently.  By using  several  such  generators
       and  not just one, one can hope that those generators have (at the very
       least) different correlations and should not all uniformly fail a  test
       in  the	same  way  and	with the same number of p-values.  When all of
       these generators consistently fail a test at a given level, I  tend  to
       suspect	that  the  problem  is	in  the test code, not the generators,
       although it is very  difficult  to  be  certain,  and  many  errors  in
       dieharder's code have been discovered and ultimately fixed in just this
       way by myself or others.

       One advantage of dieharder is that it has a number of these "good  gen-
       erators" immediately available for comparison runs, courtesy of the Gnu
       Scientific Library and user  contribution  (notably  David  Bauer,  who
       kindly  encapsulated aes and threefish).  I use AES_OFB, Threefish_OFB,
       mt19937_1999, gfsr4, ranldx2 and taus2 (as well as "true  random"  num-
       bers  from  random.org)	for  this  purpose,  and  I try to ensure that
       dieharder will "pass" in particular the -g 205 -S 1 -s 1  generator  at
       any reasonable p-value resolution out to -p 1000 or farther.
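
       A comparison run of this kind can be scripted.  The Python sketch below
       only builds the two command lines (candidate and reference) and does
       not run them.  The helper name comparison_commands, the test number 0
       and the candidate generator number 13 are placeholders chosen for
       illustration; -g 205 -S 1 -s 1 is the reference named above, and -d and
       -p are the options listed in the SYNOPSIS.

```python
def comparison_commands(test_number, psamples, candidate_gen, reference_gen=205):
    """Build argv lists that run one test at one resolution on two generators."""
    commands = []
    for gen in (candidate_gen, reference_gen):
        argv = ["dieharder", "-d", str(test_number), "-g", str(gen),
                "-p", str(psamples)]
        if gen == reference_gen:
            # Fixed seed and seed strategy for the reference, as in the text.
            argv += ["-S", "1", "-s", "1"]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Placeholder test and generator numbers; substitute your own.
    for argv in comparison_commands(test_number=0, psamples=1000, candidate_gen=13):
        print(" ".join(argv))
```

       If the candidate and the reference both fail the same test at the same
       resolution, the test code itself becomes the first suspect.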

       Tests (such as the diehard operm5 and sums test) that consistently fail
       at these high resolutions are flagged as being  "suspect"  --  possible
       failures  of  the  alternative null hypothesis -- and they are strongly
       deprecated!  Their results should not be used  to  test	random	number
       generators pending agreement in the statistics and random number commu-
       nity that those tests are in fact valid and correct  so	that  observed
       failures  can  indeed safely be attributed to a failure of the intended
       null hypothesis.

       As I keep emphasizing (for good reason!), dieharder is community
       supported.  I therefore openly ask the users of dieharder who are
       expert in statistics to help me fix the code or algorithms being
       implemented.  I would like to see this test suite ultimately be validated by
       the general statistics community in hard use in	an  open  environment,
       where every possible failure of the testing mechanism itself is subject
       to scrutiny and eventual correction.  In this way  we  will  eventually
       achieve	a very powerful suite of tools indeed, ones that may well give
       us very specific information not just about failure but about the mode
       of failure as well: just how the sequence tested deviates from randomness.

       Thus  far,  dieharder  has  benefitted tremendously from the community.
       Individuals have openly contributed tests, new generators to be tested,
       and  fixes for existing tests that were revealed by their own work with
       the testing instrument.	Efforts are underway to  make  dieharder  more
       portable  so  that  it  will build on more platforms and faster so that
       more thorough testing can be done.  Please feel free to participate.

FILE INPUT
       The simplest way to use dieharder with an external generator that  pro-
       duces  raw binary (presumed random) bits is to pipe the raw binary out-
       put from this generator (presumed to be	a  binary  stream  of  32  bit
       unsigned integers) directly into dieharder, e.g.:

	 cat /dev/urandom | ./dieharder -a -g 200

       Go  ahead and try this example.	It will run the entire dieharder suite
       of tests on  the  stream  produced  by  the  Linux  built-in  generator
       /dev/urandom (using /dev/random is not recommended as it is too slow to
       test in a reasonable amount of time).

       Alternatively, dieharder can be used to test files of numbers  produced
       by a candidate random number generator:

	 dieharder -a -g 201 -f random.org_bin

       for raw binary input or

	 dieharder -a -g 202 -f random.org.txt

       for formatted ascii input.

       A  formatted  ascii input file can accept either uints (integers in the
       range 0 to 2^32-1, one per line) or decimal uniform  deviates  with  at
       least ten significant digits (that can be multiplied by UINT_MAX = 2^32-1
       to produce a uint without  dropping  precision),  also  one  per  line.
       Floats  with  fewer  digits  will almost certainly fail bitlevel tests,
       although they may pass some of the tests that act on uniform deviates.
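
       The figure of ten significant digits follows from simple counting: a
       deviate written with d decimal digits can take at most 10^d distinct
       values, and 32 bits need 2^32 of them.  The Python sketch below works
       this out.  The function names are illustrative, and deviate_to_uint is
       an assumption about the scaling described above, not dieharder's own
       conversion code.

```python
UINT_MAX = 2**32 - 1  # largest 32-bit unsigned integer


def digits_needed(bits=32):
    """Smallest number of decimal digits d such that 10**d >= 2**bits."""
    d = 0
    while 10**d < 2**bits:
        d += 1
    return d


def deviate_to_uint(u):
    """Scale a uniform deviate in [0, 1) to a 32-bit unsigned integer."""
    return int(u * UINT_MAX)


if __name__ == "__main__":
    print(digits_needed())           # decimal digits needed to cover 32 bits
    print(10**4, "of", 2**32)        # values reachable with only 4 digits
    print(deviate_to_uint(0.5))
```

       With fewer digits most 32 bit patterns can never occur, which is why
       such input tends to fail tests that look at individual bits.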

       Finally, one can fairly easily wrap any generator  in  the  same  (GSL)
       random  number  harness used internally by dieharder and simply test it
       the same way one would  any  other  internal  generator	recognized  by
       dieharder.   This is strongly recommended where it is possible, because
       dieharder needs to use a lot of random numbers  to  thoroughly  test  a
       generator.  A built-in generator can simply let dieharder determine how
       many it needs and generate them on demand, whereas a file that is too
       small will "rewind", rendering suspect the results of any test in
       which a rewind occurs.

       Note well that file input rands are delivered to the tests  on  demand,
       but  if	the  test  needs more than are available it simply rewinds the
       file and cycles through it again,  and  again,  and  again  as  needed.
       Obviously  this	significantly reduces the sample space and can lead to
       completely incorrect results for the p-value  histograms  unless  there
       are enough rands to run EACH test without repetition (it is harmless to
       reuse the sequence for different tests).  Let the user beware!
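
       Whether a given file is large enough can be estimated before a run.
       The Python sketch below counts the 32 bit values in a raw binary file
       and reports how many rewinds a test would force.  The number of rands
       a test requires (rands_required) is an assumption you must supply
       yourself; the function names are illustrative only.

```python
import os

BYTES_PER_RAND = 4  # raw binary input is read in 32-bit increments


def rands_in_file(path):
    """Number of whole 32-bit values stored in a raw binary file."""
    return os.path.getsize(path) // BYTES_PER_RAND


def rewinds_needed(rands_available, rands_required):
    """How many times the file must be rewound to supply rands_required."""
    if rands_available <= 0:
        raise ValueError("file contains no complete 32-bit values")
    # Integer ceiling division: number of passes over the file.
    passes = -(-rands_required // rands_available)
    return max(passes - 1, 0)


if __name__ == "__main__":
    # Hypothetical numbers: a 1,000,000-rand file and a test needing 3,500,000.
    print(rewinds_needed(1000000, 3500000))
```

       Any result other than zero means the test would see repeated data and
       its p-values should not be trusted.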

BEST PRACTICE
       A frequently asked question from new users wishing to test a  generator
       they  are  working  on for fun or profit (or both) is "How should I get
       its  output  into  dieharder?"	This  is  a  nontrivial  question,  as
       dieharder  consumes  enormous  numbers of random numbers in a full test
       cycle, and then there are features like -m 10 or -m 100	that  let  one
       effortlessly  demand  10 or 100 times as many to stress a new generator
       even more.

       Even with large file support in dieharder, it is difficult  to  provide
       enough  random numbers in a file to really make dieharder happy.  It is
       therefore strongly suggested that you either:

       a) Edit the output stage of your random number generator and get it  to
       write its production to stdout as a random bit stream -- basically cre-
       ate 32 bit unsigned random integers and write them directly  to	stdout
       as  e.g.  char  data  or raw binary.  Note that this is not the same as
       writing raw floating point numbers (that will not be random at all as a
       bitstream) and that "endianness" of the uints should not matter for the
       null hypothesis of a "good" generator, as random bytes  are  random  in
       any  order.  Crank the generator and feed this stream to dieharder in a
       pipe as described above.

       b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap your
       generator  (or  calls  to your generator's hardware interface).	Follow
       the examples in the ./dieharder source directory to add it as a	"user"
       generator in the command line interface, rebuild, and invoke the gener-
       ator as a "native" dieharder generator (it should appear  in  the  list
       produced by -g -1 when done correctly).	The advantage of doing it this
       way is that you can then (if your new generator is  highly  successful)
       contribute  it  back to the dieharder project if you wish!  Not to men-
       tion the fact that it makes testing it very easy.

       Most users will probably go with option a) at least initially,  but  be
       aware  that  b) is probably easier than you think.  The dieharder main-
       tainers may be able to give you a hand with it if you get into trouble,
       but no promises.
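
       Option a) can be sketched in a few lines of Python.  The toy linear
       congruential generator below is only a stand-in for your own generator
       and is not a recommendation; the function names and the script name
       used in the usage line are hypothetical.  The script writes 32 bit
       unsigned integers to stdout as raw binary until the reader closes the
       pipe.

```python
import os
import struct
import sys


def toy_lcg(seed):
    """Toy 32-bit linear congruential generator; a stand-in for your own rng."""
    state = seed & 0xFFFFFFFF
    while True:
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        yield state


def pack_uints(values):
    """Pack 32-bit unsigned integers as raw bytes in native byte order."""
    values = list(values)
    return struct.pack("=%dI" % len(values), *values)


def write_rands(out, gen, total=None, chunk=4096):
    """Write `total` values from gen to binary stream out (forever if None)."""
    written = 0
    while total is None or written < total:
        n = chunk if total is None else min(chunk, total - written)
        out.write(pack_uints(next(gen) for _ in range(n)))
        written += n


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 mygen.py SEED | dieharder -a -g 200
    try:
        write_rands(sys.stdout.buffer, toy_lcg(int(sys.argv[1])))
    except BrokenPipeError:
        # The reader stopped; point stdout at the null device so the
        # interpreter's final flush does not raise again.
        os.dup2(os.open(os.devnull, os.O_WRONLY), sys.stdout.fileno())
```

       Writing in chunks rather than one value at a time keeps the overhead
       of the pipe small, which matters given how many rands a full run
       consumes.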

WARNING!
       A warning for those who are testing files of random numbers.  dieharder
       is a tool that tests random number generators, not files of random num-
       bers!  It is extremely inappropriate to try to "certify" a file of ran-
       dom numbers as being random just because it fails to "fail" any of  the
       dieharder  tests in e.g. a dieharder -a run.  To put it bluntly, if one
       rejects all such files that fail any test at the  0.05  level  (or  any
       other),	the one thing one can be certain of is that the files in ques-
       tion are not random, as a truly random sequence would  fail  any  given
       test at the 0.05 level 5% of the time!

       To put it another way, any file of numbers produced by a generator that
       "fails to fail" the dieharder suite should be considered "random", even
       if  it contains sequences that might well "fail" any given test at some
       specific cutoff.  One has to presume that, in passing the broader tests
       of the generator itself, it was determined that the p-values for the test
       involved were globally correctly distributed, so that e.g. failure at
       the  0.01  level  occurs  neither more nor less than 1% of the time, on
       average, over many many tests.  If  one	particular  file  generates  a
       failure	at  this  level, one can therefore safely presume that it is a
       random file pulled from many thousands of similar files	the  generator
       might create that have the correct distribution of p-values at all lev-
       els of testing and aggregation.
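
       The arithmetic behind this warning is easy to check.  The Python
       sketch below treats the p-value of each file from a perfect generator
       as uniform on [0, 1) and, as a simplification, counts a "failure"
       whenever it falls below the cutoff.  The independence of tests, the
       count of 30 tests and the function names are all assumptions made for
       illustration.

```python
import random


def family_failure_probability(alpha, n_tests):
    """Chance a perfect generator 'fails' at least one of n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulated_failure_rate(alpha, n_files, seed=1):
    """Fraction of simulated files whose uniform p-value falls below alpha."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n_files) if rng.random() < alpha)
    return failures / n_files


if __name__ == "__main__":
    print(simulated_failure_rate(0.05, 100000))   # close to 0.05
    print(family_failure_probability(0.05, 30))   # 1 - 0.95**30
```

       A filter that rejects every file with any failure therefore throws
       away most output of a perfect generator, and what survives no longer
       has the p-value distribution that randomness requires.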

       To sum up, use dieharder to validate your  generator  (via  input  from
       files  or an embedded stream).  Then by all means use your generator to
       produce files or streams of random numbers.  Do not use dieharder as an
       accept/reject tool to validate the files themselves!

EXAMPLES
       To demonstrate all tests, run on the default GSL rng, enter:

	 dieharder -a

       To  demonstrate	a test of an external generator of a raw binary stream
       of bits, use the stdin (raw) interface:

	 cat /dev/urandom | dieharder -g 200 -a

       To use it with an ascii formatted file:

	 dieharder -g 202 -f testrands.txt -a

       (testrands.txt should consist of a header such as:

	#==================================================================
	# generator mt19937_1999  seed = 1274511046
	#==================================================================
	type: d
	count: 100000
	numbit: 32
	3129711816
	  85411969
	2545911541

       etc.).
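
       A file of this shape can be produced with a short script.  The Python
       sketch below reproduces the header fields shown above (type, count,
       numbit) and writes one unsigned integer per line.  The function name,
       the default generator label and the length of the "=" rule are
       illustrative; compare the result against a file written by dieharder
       itself with -o before relying on it.

```python
def format_ascii_input(values, generator_name="example", seed=0):
    """Return the text of an ASCII input file of 32-bit unsigned integers."""
    values = list(values)
    for v in values:
        if not 0 <= v <= 0xFFFFFFFF:
            raise ValueError("value out of 32-bit unsigned range: %r" % v)
    rule = "#" + "=" * 66  # decorative rule; length chosen for illustration
    lines = [
        rule,
        "# generator %s  seed = %d" % (generator_name, seed),
        rule,
        "type: d",
        "count: %d" % len(values),
        "numbit: 32",
    ]
    # Right-justify in 10 columns, matching the sample above.
    lines += ["%10d" % v for v in values]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_ascii_input([3129711816, 85411969, 2545911541]), end="")
```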

       To use it with a binary file:

	 dieharder -g 201 -f testrands.bin -a

       or

	 cat testrands.bin | dieharder -g 200 -a

       For an example that demonstrates the use of "prefixes" on the output
       lines, which make it relatively easy to filter off the different parts
       of the output report and chop them up into numbers that can be used in
       other programs or in spreadsheets, try:

	 dieharder -a -c ',' -D default -D prefix

DISPLAY OPTIONS
       As  of version 3.x.x, dieharder has a single output interface that pro-
       duces tabular data per test, with common information in	headers.   The
       display	control  options and flags can be used to customize the output
       to your specific needs.

       The options are controlled by binary flags.  The flags, and their  text
       versions, are displayed if you enter:

	 dieharder -F

       by itself on a line.

       The  flags  can	be  entered  all  at once by adding up all the desired
       option flags.  For example, a very sparse output could be  selected  by
       adding the flags for the test_name (8) and the associated pvalues (128)
       to get 136:

	 dieharder -a -D 136

       Since the flags are cumulated from zero (unless no flag is entered  and
       the default is used) you could accomplish the same display via:

	 dieharder -a -D 8 -D pvalues
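
       The flag arithmetic can be sketched in Python.  Only the two flag
       values quoted above are included; the full table comes from
       dieharder -F.  The function name is illustrative, and the use of a
       bitwise OR (so that naming a flag twice does not count it twice) is a
       choice of this sketch, not a statement about dieharder's internals.

```python
# Only the two flag values quoted in the text; `dieharder -F` lists the rest.
FLAGS = {"test_name": 8, "pvalues": 128}


def combine_flags(*selected):
    """Combine flags given by name or by value into a single -D argument."""
    total = 0
    for item in selected:
        value = FLAGS[item] if isinstance(item, str) else int(item)
        total |= value
    return total


if __name__ == "__main__":
    print(combine_flags("test_name", "pvalues"))  # 8 + 128
    print(combine_flags(8, "pvalues"))            # same display, mixed forms
```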

       Note  that you can enter flags by value or by name, in any combination.
       Because people use dieharder to obtain values and then wish to  export
       them into spreadsheets (comma separated values) or into filter scripts,
       you can change the field separator character.  For example:

	 dieharder -a -c ',' -D default -D -1 -D -2

       produces output that is ideal for importing into  a  spreadsheet  (note
       that one can subtract field values from the base set of fields provided
       by the default option as long as it is given first).

       An interesting option is the -D prefix flag, which  turns  on  a  field
       identifier  prefix  to  make  it easy to filter out particular kinds of
       data.  However, it is equally easy to turn on any  particular  kind  of
       output to the exclusion of others directly by means of the flags.

       Two other flags of interest to novices to random number generator test-
       ing are the -D histogram (turns on a histogram of the underlying  pval-
       ues,  per  test)  and -D description (turns on a complete test descrip-
       tion, per test).  These flags turn the output  table  into  more  of  a
       series of "reports" of each test.

PUBLICATION RULES
       dieharder  is  entirely	original  code and can be modified and used at
       will by any user, provided that:

	 a) The original copyright notices are maintained and that the source,
       including  all  modifications, is made publicly available at the time
       of any derived publication.  This is open source software according  to
       the  precepts and spirit of the Gnu Public License.  See the accompany-
       ing file COPYING, which also must accompany any redistribution.

	 b) The primary author of the code (Robert G. Brown) is  appropriately
       acknowledged and referenced in any derived publication.	It is strongly
       suggested that George Marsaglia and the Diehard suite and  the  various
       authors	of  the  Statistical  Test  Suite  be  similarly acknowledged,
       although this suite shares no actual code with these random number test
       suites.

	 c)  Full responsibility for the accuracy, suitability, and effective-
       ness of the program rests with  the  users  and/or  modifiers.	As  is
       clearly stated in the accompanying copyright.h:

       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFT-
       WARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND  FITNESS,
       IN  NO  EVENT  SHALL  THE  COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL,
       INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES  WHATSOEVER  RESULTING
       FROM  LOSS  OF  USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
       NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF  OR	IN  CONNECTION
       WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

ACKNOWLEDGEMENTS
       The  author of this suite gratefully acknowledges George Marsaglia (the
       author of the diehard test suite) and the various authors of NIST  Spe-
       cial Publication 800-22 (which describes the Statistical Test Suite for
       testing pseudorandom number generators for cryptographic applications),
       for  excellent  descriptions  of the tests therein.  These descriptions
       enabled this suite to be developed with a GPL.

       The author also wishes to reiterate that the academic  correctness  and
       accuracy  of the implementation of these tests is his sole responsibil-
       ity and not that of the authors of the Diehard or STS suites.  This  is
       especially  true where he has seen fit to modify those tests from their
       strict original descriptions.

COPYRIGHT
       GPL 2b; see the file COPYING that accompanies the source of  this  pro-
       gram.   This  is  the "standard Gnu General Public License version 2 or
       any later version", with the one minor (humorous) "Beverage"  modifica-
       tion listed below.  Note that this modification is probably not legally
       defensible and can be followed really  pretty  much  according  to  the
       honor rule.

       As  to my personal preferences in beverages, red wine is great, beer is
       delightful, and Coca Cola or coffee or tea or even milk	acceptable  to
       those  who for religious or personal reasons wish to avoid stressing my
       liver.

       The Beverage Modification to the GPL:

       Any satisfied user of this software shall,  upon  meeting  the  primary
       author(s)  of  this  software  for the first time under the appropriate
       circumstances, offer to buy him or her or them a beverage.  This bever-
       age  may or may not be alcoholic, depending on the personal ethical and
       moral views of the offerer.  The beverage cost need not exceed one U.S.
       dollar (although it certainly may at the whim of the offerer:-) and may
       be accepted or declined with no further obligation on the part  of  the
       offerer.  It is not necessary to repeat the offer after the first meet-
       ing, but it can't hurt...

dieharder		Copyright 2003 Robert G. Brown		  dieharder(1)