summarize
The summarize operator quickly computes chunk density, size and skew statistics over an array.
Synopsis
summarize( array [, by_instance:true/false] [, by_attribute:true/false] );
The two optional boolean parameters default to false.
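As a rough illustration of the synopsis, the Python sketch below assembles the query text for each combination of the two flags. The helper name `summarize_query` is hypothetical and not part of SciDB; it only builds a string and does not contact a server.

```python
def summarize_query(array, by_instance=False, by_attribute=False):
    """Build the AFL text for a summarize call (hypothetical helper)."""
    args = [array]
    # Both flags default to false, so only the ones that are set are emitted.
    if by_instance:
        args.append("by_instance:true")
    if by_attribute:
        args.append("by_attribute:true")
    return "summarize(" + ", ".join(args) + ");"


if __name__ == "__main__":
    print(summarize_query("temp"))
    print(summarize_query("temp", by_instance=True))
    print(summarize_query("temp", by_instance=True, by_attribute=True))
```

Leaving out a flag is treated here as equivalent to passing false, which follows from the stated defaults.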
Summary
summarize returns chunk-level metrics for an array. By default the metrics are totals across all instances and attributes. Set by_instance, by_attribute, or both to true to break the metrics out by instance, by attribute, or by both.
The metrics returned are described as follows:
inst (dimension): the logical ID of the instance returning the data
attid (dimension): the attribute ID or 0 when returning totals across attributes
att: the string attribute name or 'all'
count: the total number of non-empty cells; the value is the same for every attribute, so it is not summed across attributes
bytes: the total number of used bytes
chunks: the total number of chunks, summed across attributes
min_count, avg_count, max_count: the minimum, average, and maximum number of non-empty cells per chunk
min_bytes, avg_bytes, max_bytes: the minimum, average, and maximum chunk size in bytes
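To make the relationship between these columns concrete, here is a minimal Python sketch that derives the same kind of metrics for a single attribute from a list of per-chunk (cell count, byte size) pairs. It is an illustrative model of the definitions above, not SciDB's implementation; the function name `chunk_metrics` and the sample numbers are made up.

```python
def chunk_metrics(chunks):
    """Summarize a list of (cell_count, byte_size) pairs, one pair per chunk."""
    counts = [cells for cells, _ in chunks]
    sizes = [size for _, size in chunks]
    n = len(chunks)
    return {
        "count": sum(counts),          # total non-empty cells
        "bytes": sum(sizes),           # total used bytes
        "chunks": n,                   # number of chunks
        "min_count": min(counts),
        "avg_count": sum(counts) / n,
        "max_count": max(counts),
        "min_bytes": min(sizes),
        "avg_bytes": sum(sizes) / n,
        "max_bytes": max(sizes),
    }


if __name__ == "__main__":
    # Hypothetical attribute with two full chunks and one partly filled chunk.
    sample = [(1000, 8072), (1000, 8072), (400, 3272)]
    for name, value in chunk_metrics(sample).items():
        print(name, value)
```

A wide gap between the min and max columns is one sign that chunks are unevenly filled or unevenly sized.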
Examples
Using the Operator
To demonstrate the summarize operator, do the following:
Create an array with an attribute of random values, a string attribute that alternates between two text values, and an attribute filled with zeros:
AFL% store(apply(build(<val:double> [x=1:10000000,1000000,0], random()), val2, iif(x%2=0, 'abc','def'), val3, 0), temp);
Summarize the array.
AFL% summarize(temp);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'all',10000000,170002720,40,1000000,1e+06,1000000,48,4.25007e+06,9000072
The output shows the totals and averages across all the cells in the array.
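The figures in this row can be cross-checked by hand. The Python sketch below copies the numbers from the output and rests on two assumptions that are not stated in the text: that avg_bytes is bytes divided by chunks, and that the 40 chunks come from 10 chunks along x for each of four stored attributes (val, val2, val3, and the empty tag that appears in the by-attribute output). The function names are illustrative only.

```python
# Values copied from the 'all' row of the summarize(temp) output.
ROW = {"count": 10000000, "bytes": 170002720, "chunks": 40}


def average_chunk_bytes(total_bytes, chunks):
    """Assumed relationship: avg_bytes = bytes / chunks."""
    return total_bytes / chunks


def expected_chunks(cells, chunk_length, attributes):
    """Chunks along one dimension times the number of stored attributes."""
    per_attribute = -(-cells // chunk_length)  # ceiling division
    return per_attribute * attributes


if __name__ == "__main__":
    avg = average_chunk_bytes(ROW["bytes"], ROW["chunks"])
    # %g keeps six significant digits, the same style as the printed avg_bytes.
    print("%g" % avg)
    print(expected_chunks(10000000, 1000000, 4))
```

Under these assumptions the computed average formats as 4.25007e+06 and the chunk count comes to 40, both matching the row. Note that avg_count stays at 1e+06 because it is a per-chunk figure, while count is reported once rather than summed over attributes.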
Summarize by instance.
AFL% summarize(temp, by_instance:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{1,0} 'all',3000000,51000816,12,1000000,1e+06,1000000,48,4.25007e+06,9000072
{2,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
{3,0} 'all',2000000,34000544,8,1000000,1e+06,1000000,48,4.25007e+06,9000072
Summarize by attribute.
AFL% summarize(temp, by_attribute:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'val',10000000,80000720,10,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{0,1} 'val2',10000000,90000720,10,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{0,2} 'val3',10000000,800,10,1000000,1e+06,1000000,80,80,80
{0,3} 'EmptyTag',10000000,480,10,1000000,1e+06,1000000,48,48,48
Summarize by both simultaneously.
AFL% summarize(temp, by_attribute:true, by_instance:true);
The output is:
{inst,attid} att,count,bytes,chunks,min_count,avg_count,max_count,min_bytes,avg_bytes,max_bytes
{0,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{0,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{0,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
{0,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
{1,0} 'val',3000000,24000216,3,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{1,1} 'val2',3000000,27000216,3,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{1,2} 'val3',3000000,240,3,1000000,1e+06,1000000,80,80,80
{1,3} 'EmptyTag',3000000,144,3,1000000,1e+06,1000000,48,48,48
{2,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{2,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{2,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
{2,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
{3,0} 'val',2000000,16000144,2,1000000,1e+06,1000000,8000072,8.00007e+06,8000072
{3,1} 'val2',2000000,18000144,2,1000000,1e+06,1000000,9000072,9.00007e+06,9000072
{3,2} 'val3',2000000,160,2,1000000,1e+06,1000000,80,80,80
{3,3} 'EmptyTag',2000000,96,2,1000000,1e+06,1000000,48,48,48
Remove the array:
AFL% remove(temp);
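The breakdowns in this example are consistent with the single-row totals, which can be checked with a short Python sketch. The numbers are copied from the by_instance and by_attribute outputs above; the helper names `column_sum`, `bytes_per_cell`, and `byte_share` are illustrative and not part of SciDB.

```python
# Totals from summarize(temp).
TOTAL = {"count": 10000000, "bytes": 170002720, "chunks": 40}

# (inst, count, bytes, chunks) copied from the by_instance output.
BY_INSTANCE = [
    (0, 3000000, 51000816, 12),
    (1, 3000000, 51000816, 12),
    (2, 2000000, 34000544, 8),
    (3, 2000000, 34000544, 8),
]

# (att, count, bytes, chunks) copied from the by_attribute output.
BY_ATTRIBUTE = [
    ("val", 10000000, 80000720, 10),
    ("val2", 10000000, 90000720, 10),
    ("val3", 10000000, 800, 10),
    ("EmptyTag", 10000000, 480, 10),
]


def column_sum(rows, index):
    """Add up one column of a list of row tuples."""
    return sum(row[index] for row in rows)


def bytes_per_cell(rows):
    """Average stored bytes per non-empty cell, keyed by the first column."""
    return {row[0]: row[2] / row[1] for row in rows}


def byte_share(rows):
    """Fraction of the total bytes held by each row, keyed by the first column."""
    total = column_sum(rows, 2)
    return {row[0]: row[2] / total for row in rows}


if __name__ == "__main__":
    # Instances partition the cells, so count, bytes, and chunks all add up.
    print(column_sum(BY_INSTANCE, 1), column_sum(BY_INSTANCE, 2), column_sum(BY_INSTANCE, 3))
    # Attributes share the same cells: bytes and chunks add up, count does not.
    print(column_sum(BY_ATTRIBUTE, 2), column_sum(BY_ATTRIBUTE, 3))
    print(bytes_per_cell(BY_ATTRIBUTE))
    print(byte_share(BY_INSTANCE))
```

In this data, instances 0 and 1 each hold about 30% of the bytes and instances 2 and 3 about 20%, which reflects the 12/12/8/8 split of chunks. The val3 and EmptyTag attributes take a small fraction of a byte per cell, which suggests constant values are stored compactly, although the output alone does not say how.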