bernoulli

The bernoulli operator selects a random subset of cells from an array using a Bernoulli process.

Synopsis

bernoulli( array, probability [, seed] )

Summary

The bernoulli operator evaluates each cell by generating a random number and checking whether it lies in the range (0, probability). If it does, the cell is included in the result.

Use the optional integer seed parameter to reproduce results: using the same seed on the same array with the same configuration returns identical results. Different instance configurations and different array distributions yield different results, even with the same seed.

The bernoulli operator does not accept replicated input, so when it is given replicated input its behavior depends on the context in which it is used. For example, if used inside a store, as in store(bernoulli(array,...), OUT, distribution: row_cyclic), the selection behavior of bernoulli is the same as for a row_cyclic input array.
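
The per-cell selection and the role of the seed can be modeled with a short Python sketch. This is an illustration only, not SciDB's implementation: the function name bernoulli_sample, the dict used as a stand-in for an array, and Python's random.Random generator are all assumptions made for the example, so the cells it selects will not match SciDB's output for the same seed.

```python
import random

def bernoulli_sample(cells, probability, seed=None):
    """Keep each cell independently with the given probability.

    cells: dict mapping (i, j) coordinates to values (hypothetical stand-in
           for an array).
    seed:  optional integer; the same seed on the same input reproduces
           the same selection.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    rng = random.Random(seed)  # seed=None seeds from system entropy
    selected = {}
    for coords in sorted(cells):        # fixed visiting order keeps seeded runs repeatable
        if rng.random() < probability:  # draw in [0, 1); keep the cell if below probability
            selected[coords] = cells[coords]
    return selected

# A 5x5 array holding the values 1 through 25, as in the Examples section.
data = {(i, j): i * 5 + 1 + j for i in range(5) for j in range(5)}

first = bernoulli_sample(data, 0.5, seed=15)
second = bernoulli_sample(data, 0.5, seed=15)
print(first == second)  # True: same seed, same input, same selection
```

In this model the visiting order of the cells plays the part that array distribution and instance configuration play in SciDB: change the order in which random numbers are consumed and the same seed selects different cells.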

Use the bernoulli operator when a sample statistic is sufficient. For a very large array, computing a statistic on a Bernoulli sample instead of on every cell yields a fast, approximate result.
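
As a rough illustration of that trade-off, the Python sketch below estimates a mean from a 1% Bernoulli sample and compares it with the exact value. The helper name approximate_mean, the synthetic data, and the 1% probability are assumptions chosen for the example; nothing here calls SciDB.

```python
import random

def approximate_mean(values, probability, seed):
    """Estimate the mean of values from a Bernoulli sample (hypothetical helper)."""
    rng = random.Random(seed)
    # Keep each value independently with the given probability.
    sample = [v for v in values if rng.random() < probability]
    if not sample:
        return None  # an empty sample carries no estimate
    return sum(sample) / len(sample)

values = list(range(1, 100001))     # synthetic data: 1 through 100000
exact = sum(values) / len(values)   # 50000.5, computed from every value
estimate = approximate_mean(values, 0.01, seed=15)  # averages only the ~1% of values kept
print(exact, estimate)
```

The estimate is close to the exact mean but not equal to it, and its accuracy improves as the probability (and therefore the sample size) grows. In AFL the same idea would be expressed by applying an aggregate to the output of bernoulli rather than to the full array.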

Examples

To select a Bernoulli random sample of cells from a 5×5 array with an inclusion probability of 0.5, do the following:

  1. Create an array called data:

    AFL% CREATE ARRAY data<val:double>[i=0:4; j=0:4];

  2. Store values of 1–25 in data:

    AFL% store(build(data,i*5+1+j), data); 


    The output is:

    {i,j} val
    {0,0} 1
    {0,1} 2
    {0,2} 3
    {0,3} 4
    {0,4} 5
    {1,0} 6
    {1,1} 7
    {1,2} 8
    {1,3} 9
    {1,4} 10
    {2,0} 11
    {2,1} 12
    {2,2} 13
    {2,3} 14
    {2,4} 15
    {3,0} 16
    {3,1} 17
    {3,2} 18
    {3,3} 19
    {3,4} 20
    {4,0} 21
    {4,1} 22
    {4,2} 23
    {4,3} 24
    {4,4} 25

     

  3. Select cells at random with a probability of 0.5 that a cell will be included. Because no seed is given, successive calls to bernoulli generally return different results.

    AFL% bernoulli(data,0.5); 


    The output might look like this:

    {i,j} val
    {0,1} 2
    {0,3} 4
    {0,4} 5
    {1,3} 9
    {2,0} 11
    {2,1} 12
    {2,2} 13
    {2,3} 14
    {3,1} 17
    {3,4} 20
    {4,2} 23
    {4,4} 25


    Repeating the query:

    AFL% bernoulli(data,0.5); 


    Yields different output:

    {i,j} val
    {0,3} 4
    {1,2} 8
    {1,3} 9
    {2,0} 11
    {2,1} 12
    {2,2} 13
    {2,4} 15
    {3,0} 16
    {3,4} 20
    {4,1} 22
    {4,3} 24
    {4,4} 25


  4. To generate the same results repeatedly, use the same distribution and the same seed value. The seed must be an integer in the interval [0, INT_MAX].

    AFL% bernoulli(data,0.5,15); 


    The output for the hashed distribution and seed=15 is:

    {i,j} val
    {0,2} 3
    {1,0} 6
    {1,4} 10
    {2,0} 11
    {2,3} 14
    {3,1} 17
    {3,3} 19
    {4,1} 22
    {4,2} 23
    {4,3} 24


    Repeating the query:

    AFL% bernoulli(data,0.5,15);


    Yields the same output:

    {i,j} val
    {0,2} 3
    {1,0} 6
    {1,4} 10
    {2,0} 11
    {2,3} 14
    {3,1} 17
    {3,3} 19
    {4,1} 22
    {4,2} 23
    {4,3} 24
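
Note that the number of cells returned is itself random: the two unseeded runs above returned 12 cells each, and the seeded run returned 10. Assuming each cell is selected independently (an assumption of this sketch, in line with the per-cell description in the Summary), the sample size follows a binomial distribution, and its expected value and spread can be computed as below. The helper name sample_size_stats is hypothetical.

```python
import math

def sample_size_stats(cell_count, probability):
    """Expected size and standard deviation of a Bernoulli sample,
    assuming independent per-cell selection."""
    expected = cell_count * probability
    std_dev = math.sqrt(cell_count * probability * (1.0 - probability))
    return expected, std_dev

# 25 cells with an inclusion probability of 0.5, as in the example above.
expected, std_dev = sample_size_stats(25, 0.5)
print(expected, std_dev)  # 12.5 2.5
```

Sample sizes of 10 and 12 are therefore both within one standard deviation of the expected 12.5 cells.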
