quantile

 The quantile operator returns the quantiles of the specified array.

Synopsis

quantile( srcArray, q_num [ , attribute [ , dimension_n ... ] ] )

Summary

The q-quantiles of a sorted data set are the points, taken at regular intervals, that divide the data set into q subsets of nearly equal size. The quantiles are the data values marking the boundaries between consecutive subsets.

You specify the source array (srcArray) and the number of quantiles (q_num). Optionally, you can specify an attribute and one or more dimensions for grouping. To group by one or more dimensions, you must also specify the attribute.

Note the following: 

  • The quantile operator returns q_num+1 values: the minimum, the q_num-1 boundaries between consecutive subsets, and the maximum. If you group by dimensions, it returns q_num+1 values for each group.
  • The returned quantile values have the same data type as the attribute.
  • The q_num argument must be a positive integer. Otherwise, SciDB returns an error.
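
The notes above can be illustrated with a short Python sketch (not SciDB code). It lists the percentage attached to each of the q_num+1 returned values, assuming the boundaries sit at k/q_num for k = 0 through q_num, which agrees with the percentage column in the examples on this page. The name quantile_percentages is hypothetical, and the ValueError only mirrors the positive-integer rule; it is not SciDB's actual error.

```python
def quantile_percentages(q_num):
    """Return the q_num + 1 boundary positions k/q_num for k = 0..q_num.

    Hypothetical helper for illustration only; it is not part of SciDB.
    """
    # Mirror the documented rule that q_num must be a positive integer.
    if isinstance(q_num, bool) or not isinstance(q_num, int) or q_num < 1:
        raise ValueError("q_num must be a positive integer")
    return [k / q_num for k in range(q_num + 1)]


print(quantile_percentages(2))  # [0.0, 0.5, 1.0]
print(quantile_percentages(4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With q_num set to 2, the three positions are the minimum, the median, and the maximum.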

Inputs

The quantile operator takes the following arguments:

  • srcArray: A source array with one or more attributes and one or more dimensions.
  • q_num: The number of quantiles.
  • attribute: An optional attribute to use for the quantiles. If you don't specify an attribute, SciDB uses the first one.
  • dimension_n: An optional list of one or more dimensions to group by. To group by dimensions, you must also specify the attribute.

Examples

Calculate the 2-Quantile for a 1-Dimensional Array

To calculate the 2-quantile for a 1-dimensional array, do the following:

  1. Create a 1-dimensional array called quantile_array:

    AFL% create array quantile_array <val:int64>[i=0:10];


    The output is:

    Query was executed successfully.

  2. Put eleven numerical values between 0 and 11 into quantile_array:

    AFL% store(build(quantile_array, '[10,3,0,3,4,5,9,11,7,3,3]', true), quantile_array);


    The output is:

    {i} val
    {0} 10
    {1} 3
    {2} 0
    {3} 3
    {4} 4
    {5} 5
    {6} 9
    {7} 11
    {8} 7
    {9} 3
    {10} 3

     

  3. Find the 2-quantile of quantile_array:

    AFL% quantile(quantile_array,2);  


    The output is:

    {quantile} percentage,val_quantile
    {0} 0,0
    {1} 0.5,4
    {2} 1,11


  4. Remove quantile_array:

    AFL% remove(quantile_array);  


    The output is:

    Query was executed successfully.
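
The 2-quantile result in step 3 can be reproduced outside SciDB with a short Python sketch. It assumes a nearest-rank rule: sort the values and, for each k from 0 through q_num, take the value at rank ceil(k * n / q_num), clamped to at least 1. That rule is an assumption that agrees with the output shown in this example; it is not taken from the SciDB source. The name nearest_rank_quantiles is hypothetical.

```python
def nearest_rank_quantiles(values, q_num):
    """Return (percentage, value) pairs for k = 0..q_num.

    Assumption: nearest-rank selection on the sorted values. This matches
    the output in this example but is not confirmed SciDB behavior.
    """
    if q_num < 1:
        raise ValueError("q_num must be a positive integer")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    result = []
    for k in range(q_num + 1):
        # Integer ceiling of k * n / q_num, clamped so k = 0 picks the minimum.
        rank = max(-(-k * n // q_num), 1)
        result.append((k / q_num, ordered[rank - 1]))
    return result


data = [10, 3, 0, 3, 4, 5, 9, 11, 7, 3, 3]
print(nearest_rank_quantiles(data, 2))  # [(0.0, 0), (0.5, 4), (1.0, 11)]
```

The sorted data is 0, 3, 3, 3, 3, 4, 5, 7, 9, 10, 11, so the middle boundary falls on the sixth value, 4.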

The Group-by-Dimension Parameter

To see how the group-by-dimension parameter works, do the following:

  1. Create a 5x5 array with a single integer attribute:

    AFL% create array m5x5<val:int32>[i=0:4; j=0:4];


    The output is:

    Query was executed successfully.

  2. Initialize the data in the array:

    AFL% store(build(m5x5, '[[16,13,22,7,13],[11,19,23,21,24],[16,21,15,7,16],[10,19,0,23,23],[12,7,18,7,8]]', true), m5x5);


    The output is:

    {i,j} val
    {0,0} 16
    {0,1} 13
    {0,2} 22
    {0,3} 7
    {0,4} 13
    {1,0} 11
    {1,1} 19
    {1,2} 23
    {1,3} 21
    {1,4} 24
    {2,0} 16
    {2,1} 21
    {2,2} 15
    {2,3} 7
    {2,4} 16
    {3,0} 10
    {3,1} 19
    {3,2} 0
    {3,3} 23
    {3,4} 23
    {4,0} 12
    {4,1} 7
    {4,2} 18
    {4,3} 7
    {4,4} 8
  3. Find the 2-quantile of the whole array, then grouped by the first dimension (i), and then grouped by the second dimension (j):

    AFL% quantile(m5x5,2);  


    The output is:

    {quantile} percentage,val_quantile
    {0} 0,0
    {1} 0.5,16
    {2} 1,24 


    AFL% quantile(m5x5,2,val,i); 


    The output is:

    {i,quantile} percentage,val_quantile
    {0,0} 0,7
    {0,1} 0.5,13
    {0,2} 1,22
    {1,0} 0,11
    {1,1} 0.5,21
    {1,2} 1,24
    {2,0} 0,7
    {2,1} 0.5,16
    {2,2} 1,21
    {3,0} 0,0
    {3,1} 0.5,19
    {3,2} 1,23
    {4,0} 0,7
    {4,1} 0.5,8
    {4,2} 1,18


    AFL% quantile(m5x5,2,val,j);


    The output is:

    {j,quantile} percentage,val_quantile
    {0,0} 0,10
    {0,1} 0.5,12
    {0,2} 1,16
    {1,0} 0,7
    {1,1} 0.5,19
    {1,2} 1,21
    {2,0} 0,0
    {2,1} 0.5,18
    {2,2} 1,23
    {3,0} 0,7
    {3,1} 0.5,7
    {3,2} 1,23
    {4,0} 0,8
    {4,1} 0.5,16
    {4,2} 1,24

  4. Remove the array:

    AFL% remove(m5x5);  


    The output is:

    Query was executed successfully.
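
Grouping by a dimension amounts to computing the quantiles separately for each value of that dimension. The Python sketch below reproduces the three results from step 3 for the m5x5 data. It assumes a nearest-rank rule (the value at rank ceil(k * n / q_num) in the sorted group, clamped to at least 1), which agrees with the outputs shown here but is not taken from the SciDB source. The helper name nearest_rank is hypothetical.

```python
def nearest_rank(values, q_num):
    """Return the q_num + 1 quantile values of one group.

    Assumption: nearest-rank selection, which matches the outputs in this
    example but is not confirmed SciDB behavior.
    """
    ordered = sorted(values)
    n = len(ordered)
    # -(-a // b) is the integer ceiling of a / b; clamp so k = 0 is the minimum.
    return [ordered[max(-(-k * n // q_num), 1) - 1] for k in range(q_num + 1)]


m5x5 = [
    [16, 13, 22, 7, 13],
    [11, 19, 23, 21, 24],
    [16, 21, 15, 7, 16],
    [10, 19, 0, 23, 23],
    [12, 7, 18, 7, 8],
]

# No grouping: all 25 cells form a single group.
whole = nearest_rank([v for row in m5x5 for v in row], 2)

# Group by i: one group per row.
by_i = [nearest_rank(row, 2) for row in m5x5]

# Group by j: one group per column.
by_j = [nearest_rank(col, 2) for col in zip(*m5x5)]

print(whole)  # [0, 16, 24]
print(by_i)   # [[7, 13, 22], [11, 21, 24], [7, 16, 21], [0, 19, 23], [7, 8, 18]]
print(by_j)   # [[10, 12, 16], [7, 19, 21], [0, 18, 23], [7, 7, 23], [8, 16, 24]]
```

Each inner list corresponds to the three rows of output that share the same i or j coordinate.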