summarize

The summarize operator quickly computes chunk density, size and skew statistics over an array.

Synopsis

summarize( array [, by_instance:true/false] [, by_attribute:true/false] );


The two optional boolean parameters default to false.

Summary

summarize returns several metrics that describe how an array is stored. You can also break these metrics out by instance, by attribute, or by both, by setting the boolean parameters by_instance and by_attribute.

The metrics returned are described as follows:

  • inst (dimension): the logical ID of the instance returning the data, or 0 when by_instance is false
  • attid (dimension): the attribute ID, or 0 when returning totals across attributes
  • att: the attribute name as a string, or 'all' when returning totals across attributes
  • count: the total number of non-empty cells; every attribute covers the same cells, so this value is not summed across attributes
  • bytes: the total number of bytes used by the chunks, summed across attributes
  • chunks: the total number of chunks, summed across attributes
  • min/avg/max_count: the minimum, average, and maximum number of non-empty cells per chunk
  • min/avg/max_bytes: the minimum, average, and maximum chunk size in bytes
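The rollup rules above can be checked against the example outputs later in this section. The Python below is a minimal model, not SciDB's implementation: it assumes summarize works from one record per stored chunk (instance, attribute, non-empty cell count, bytes) and groups those records by instance and/or attribute according to the two flags. The names ChunkInfo and summarize_model are hypothetical, and the per-chunk sizes and the 3/3/2/2 placement of chunks on instances are copied from the example outputs rather than derived.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChunkInfo:
    inst: int    # instance that stores the chunk
    att: str     # attribute the chunk belongs to
    count: int   # non-empty cells in the chunk
    nbytes: int  # bytes used by the chunk


def summarize_model(chunks, attr_ids, by_instance=False, by_attribute=False):
    """Hypothetical model of summarize's grouping; returns {(inst, attid): row}."""
    groups = defaultdict(list)
    for c in chunks:
        # Collapse a dimension to 0 when its flag is off.
        key = (c.inst if by_instance else 0, attr_ids[c.att] if by_attribute else 0)
        groups[key].append(c)

    id_to_name = {attid: name for name, attid in attr_ids.items()}
    rows = {}
    for key in sorted(groups):
        group = groups[key]
        counts = [c.count for c in group]
        sizes = [c.nbytes for c in group]
        # Every attribute covers the same cells, so count is the cell total of
        # one attribute, not a sum over attributes (the totals are all equal,
        # so max simply picks one of them).
        cells_per_att = defaultdict(int)
        for c in group:
            cells_per_att[c.att] += c.count
        rows[key] = {
            "att": id_to_name[key[1]] if by_attribute else "all",
            "count": max(cells_per_att.values()),
            "bytes": sum(sizes),   # summed across attributes
            "chunks": len(group),  # summed across attributes
            "min_count": min(counts),
            "avg_count": sum(counts) / len(counts),
            "max_count": max(counts),
            "min_bytes": min(sizes),
            "avg_bytes": sum(sizes) / len(sizes),
            "max_bytes": max(sizes),
        }
    return rows


# Assumed example data, copied from the example outputs in this section.
ATTR_IDS = {"val": 0, "val2": 1, "val3": 2, "EmptyTag": 3}
CHUNK_BYTES = {"val": 8000072, "val2": 9000072, "val3": 80, "EmptyTag": 48}
PLACEMENT = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]  # instance holding each of the 10 chunks

chunks = [ChunkInfo(inst, att, 1_000_000, CHUNK_BYTES[att])
          for inst in PLACEMENT for att in ATTR_IDS]

total = summarize_model(chunks, ATTR_IDS)[(0, 0)]
print(total["count"], total["bytes"], total["chunks"], total["avg_bytes"])
# prints: 10000000 170002720 40 4250068.0
```

With both flags left false, the model reproduces the single 'all' row of the first example: 10000000 cells, 170002720 bytes, and 40 chunks, with avg_bytes equal to bytes divided by chunks (SciDB prints 4250068 as 4.25007e+06).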

Examples

Using the Operator

To demonstrate the summarize operator, do the following:

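  1. Create an array of 10,000,000 cells in chunks of 1,000,000 cells, with a random double attribute (val), a string attribute that alternates between 'abc' and 'def' (val2), and an attribute filled with zeros (val3):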

    AFL% store(apply(build(<val:double> [x=1:10000000,1000000,0], random()), val2, iif(x%2=0, 'abc','def'), val3, 0), temp);
  2. Summarize the array.

    AFL% summarize(temp);


    The output is:

    {inst,attid}   att,    count,     bytes, chunks, min_count, avg_count, max_count, min_bytes,   avg_bytes, max_bytes
    {0,0}        'all', 10000000, 170002720,     40,   1000000,     1e+06,   1000000,        48, 4.25007e+06,   9000072

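    The output shows totals and per-chunk statistics for the whole array, combined across all instances and attributes. The chunk count is 40 because each of the array's 10 chunks is stored once per attribute, including EmptyTag (see step 4).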

  3. Summarize by instance.

    AFL% summarize(temp, by_instance:true);


    The output is:

    {inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
    {0,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
    {1,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
    {2,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
    {3,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
  4. Summarize by attribute.

    AFL% summarize(temp, by_attribute:true);


    The output is:

    {inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
    {0,0} 'val',10000000,80000720,10,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
    {0,1} 'val2',10000000,90000720,10,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
    {0,2} 'val3',10000000,800,10,1000000,1e+06,1000000,80,80,80
    {0,3} 'EmptyTag',10000000,480,10,1000000,1e+06,1000000,48,48,48
  5. Summarize by both instance and attribute.

    AFL% summarize(temp, by_attribute:true, by_instance:true);


    The output is:

    {inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
    {0,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
    {0,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
    {0,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
    {0,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
    {1,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
    {1,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
    {1,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
    {1,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
    {2,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
    {2,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
    {2,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
    {2,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
    {3,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
    {3,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
    {3,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
    {3,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
  6. Remove the array:

    AFL% remove(temp);
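The by_instance output is what makes summarize useful for spotting skew. As a sketch, the Python below parses the compact rows printed in step 3 and computes a max-to-mean ratio across instances. The parser, the ratio, and the names parse_row and imbalance are illustrative assumptions rather than part of SciDB, and the parser handles only the compact comma-separated layout shown in steps 3 to 5, not the padded layout of step 2.

```python
# Rows copied verbatim from the by_instance output in step 3.
BY_INSTANCE_OUTPUT = """\
{0,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{1,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{2,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
{3,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
"""

# Column order after the {inst,attid} key and the att name.
COLUMNS = ["count", "bytes", "chunks", "min_count", "avg_count", "max_count",
           "min_bytes", "avg_bytes", "max_bytes"]


def parse_row(line):
    """Split "{inst,attid} 'att',v1,v2,..." into (key, att, numeric values)."""
    key_text, rest = line.split(" ", 1)
    key = tuple(int(part) for part in key_text.strip("{}").split(","))
    fields = rest.split(",")
    att = fields[0].strip("'")
    values = [float(f) for f in fields[1:]]  # float() also accepts "1e+06"
    return key, att, values


def imbalance(values):
    """Max-to-mean ratio: 1.0 is perfectly even; larger means more skew."""
    mean = sum(values) / len(values)
    return max(values) / mean


rows = [parse_row(line) for line in BY_INSTANCE_OUTPUT.splitlines()]
bytes_per_inst = [vals[COLUMNS.index("bytes")] for _, _, vals in rows]
chunks_per_inst = [vals[COLUMNS.index("chunks")] for _, _, vals in rows]
print(round(imbalance(bytes_per_inst), 3), round(imbalance(chunks_per_inst), 3))
# prints: 1.2 1.2
```

In this example the ratio of 1.2 comes from the chunk count rather than the data: 10 chunks cannot be spread over 4 instances more evenly than 3/3/2/2, and 3 divided by the mean of 2.5 is 1.2.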