summarize
The summarize operator quickly computes chunk density, size and skew statistics over an array.
Synopsis
summarize( array [, by_instance:true/false] [, by_attribute:true/false] );
The two optional boolean parameters default to false.
Summary
summarize returns metrics that describe how an array is stored: cell counts, byte sizes, and chunk counts. By default the metrics are totals for the whole array. Set by_instance, by_attribute, or both to true to break the metrics out by instance, by attribute, or by both.
The metrics returned are described as follows:
- inst (dimension): the logical ID of the instance returning the data
- attid (dimension): the attribute ID or 0 when returning totals across attributes
- att: the string attribute name or 'all'
- count: the total number of non-empty cells; this value is the same for every attribute
- bytes: the total number of used bytes
- chunks: the total number of chunks, summed across attributes
- min_count, avg_count, max_count: the minimum, average, and maximum number of non-empty cells per chunk
- min_bytes, avg_bytes, max_bytes: the minimum, average, and maximum chunk size in bytes
Examples
Using the Operator
To demonstrate the summarize operator, do the following:
Create an array with a random-valued attribute (val), a text attribute that alternates between two values (val2), and an attribute filled with zeros (val3):
AFL% store(apply(build(<val:double> [x=1:10000000,1000000,0], random()), val2, iif(x%2=0, 'abc','def'), val3, 0), temp)
Summarize the array.
AFL% summarize(temp);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'all',10000000,170002720,40,1000000,1e+06,1000000,48,4.25007e+06,9000072
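The output shows totals and per-chunk statistics across all instances and attributes of the array. The chunks value of 40 is summed across attributes: 10 chunks each for val, val2, val3, and EmptyTag, which appears as a separate attribute in the by-attribute output.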
Summarize by instance.
AFL% summarize(temp, by_instance:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{1,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{2,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
{3,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
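One way to read the per-instance rows for skew is to compare the fullest chunk on each instance with the average chunk. The following is a minimal sketch rather than part of the original example: it assumes the output of summarize can be passed to apply like any other array, and the attribute name count_ratio is made up for illustration.

```
AFL% apply(summarize(temp, by_instance:true), count_ratio, double(max_count) / avg_count);
```

A count_ratio close to 1 on every instance suggests evenly filled chunks. For the temp array, every row should give 1, because min_count, avg_count, and max_count are all 1000000.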
Summarize by attribute.
AFL% summarize(temp, by_attribute:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'val',10000000,80000720,10,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{0,1} 'val2',10000000,90000720,10,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{0,2} 'val3',10000000,800,10,1000000,1e+06,1000000,80,80,80
{0,3} 'EmptyTag',10000000,480,10,1000000,1e+06,1000000,48,48,48
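The per-attribute rows also make it easy to estimate storage cost per cell. As a hedged sketch, assuming apply can be wrapped around summarize and using the hypothetical attribute name bytes_per_cell:

```
AFL% apply(summarize(temp, by_attribute:true), bytes_per_cell, double(bytes) / double(count));
```

For the figures in the by-attribute output for temp, this works out to roughly 8 bytes per cell for val (80000720 / 10000000) and roughly 9 for val2, while val3 and EmptyTag come to a small fraction of a byte per cell.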
Summarize by both simultaneously.
AFL% summarize(temp, by_attribute:true, by_instance:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{0,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{0,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
{0,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
{1,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{1,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{1,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
{1,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
{2,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{2,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{2,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
{2,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
{3,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{3,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{3,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
{3,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
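When the combined breakdown is longer than needed, the result can be narrowed to a single attribute. This is a sketch under the assumption that the standard filter operator accepts the output of summarize as its input; it is not part of the original example.

```
AFL% filter(summarize(temp, by_attribute:true, by_instance:true), att = 'val2');
```

For the temp array, this should leave only the four val2 rows, one per instance.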
Remove the array:
AFL% remove(temp);