flatten
The flatten operator can convert arrays to data frames, or data frames to one-dimensional arrays.
Synopsis
flatten( source [, dimension_name ] [, cells_per_chunk: N ] )
Inputs
- source: An existing array or data frame.
- dimension_name: Name of the single dimension used in the output array, when flatten is used to produce an array from a data frame.
- cells_per_chunk: N: Keyword parameter that specifies the chunk size, in cells, of the resulting array or data frame.
Summary
Converting Arrays to Data Frames
To convert an array of any number of dimensions to a data frame, use
flatten(ARRAY)
The resulting data frame will have the same cells as the source array, except that each cell will have additional int64 not null attributes, one for each input array dimension, recording the position of the cell in the original input array. These attributes will come first in the attribute list, just as they do for the unpack operator.
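The mapping from array cells to data frame cells can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics described above, not SciDB code: the dictionary representation of an array and the helper name flatten_array are assumptions made for this example.

```python
# Illustrative model only: a sparse array is represented as a dict that maps
# a coordinate tuple to a tuple of attribute values. This is not a SciDB API.

def flatten_array(dim_names, attr_names, cells):
    """Return (attribute_names, rows) for the data frame form of an array.

    Each output row is the cell's coordinates followed by its original
    attribute values, so the dimension-derived attributes come first.
    """
    out_attrs = list(dim_names) + list(attr_names)
    rows = [tuple(coords) + tuple(values) for coords, values in cells.items()]
    return out_attrs, rows


# A small two-dimensional array with dimensions i, j and one attribute val.
cells = {
    (-1, 3): ("The caged",),
    (0, 5): ("bird sings",),
    (2, 0): ("fearful trill",),
}

attrs, rows = flatten_array(["i", "j"], ["val"], cells)
print(attrs)
for row in rows:
    # Row order carries no meaning: data frame cells are unordered.
    print(row)
```

In this model the coordinates stop being keys and become ordinary leading values in each row, which is the change the operator makes to the schema.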
SciDB will automatically choose a chunk size for the data frame result, but you can specify one using the cells_per_chunk: keyword parameter.
flatten(ARRAY, cells_per_chunk: 5000000)
You can also use this form of the operator to change the chunk size of an existing data frame. When you provide a data frame as input, no additional cell position attributes are produced.
Converting Data Frames to Arrays
To convert a data frame to a one-dimensional array, specify a dimension name in the parameter list.
flatten(DATAFRAME, seqno)
The resulting array will have a single seqno=0:* dimension. This is essentially the behaviour of the unpack operator, but unlike unpack, no coordinate values are added as attributes in the resulting 1-D array, because data frame cells are unordered and do not have coordinates.
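The reverse direction can be modeled the same way. The sketch below is a plain-Python illustration of the behaviour just described, not SciDB code; the helper name flatten_dataframe and the list-of-tuples representation of a data frame are assumptions of this example.

```python
# Illustrative model only: a data frame is a collection of rows with no
# coordinates; the 1-D array is a dict keyed by a single integer coordinate.

def flatten_dataframe(rows):
    """Assign consecutive positions 0, 1, 2, ... to data frame rows.

    No attributes are added: each array cell holds exactly the values of one
    data frame row. The order used here is simply iteration order, because a
    data frame does not define one.
    """
    return {seqno: tuple(row) for seqno, row in enumerate(rows)}


dataframe_rows = [
    (-1, 3, "The caged"),
    (0, 5, "bird sings"),
    (0, 6, "with a"),
]

array_1d = flatten_dataframe(dataframe_rows)
for seqno, values in array_1d.items():
    print(seqno, values)
```

Each cell keeps the same number of values it had in the data frame; only the position along the new dimension is introduced.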
You can use the cells_per_chunk: keyword parameter to dictate the chunk size of the result array.
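As a rough illustration of what a chunk size means for a one-dimensional result, the following Python sketch groups positions into chunks. It assumes a dimension that starts at 0 and holds consecutive positions, so that position p falls into chunk p // N; the helper names chunk_of and group_into_chunks are made up for this example and are not part of SciDB.

```python
# Illustrative model only: with a dimension that starts at 0 and a chunk
# length of N, position p falls into chunk number p // N.

def chunk_of(position, cells_per_chunk):
    """Return the index of the chunk that holds the given position."""
    if cells_per_chunk <= 0:
        raise ValueError("cells_per_chunk must be positive")
    return position // cells_per_chunk


def group_into_chunks(positions, cells_per_chunk):
    """Group positions by chunk index, preserving position order."""
    chunks = {}
    for p in positions:
        chunks.setdefault(chunk_of(p, cells_per_chunk), []).append(p)
    return chunks


# Eight rows with a chunk size of 4, as in a schema like [row=0:*:0:4].
chunks = group_into_chunks(range(8), 4)
print(chunks)
```

Under these assumptions, eight rows with a chunk size of 4 occupy two chunks, while the same rows with a much larger chunk size would all land in a single chunk.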
You can also convert a data frame to an array using redimension, if you provide a target schema with dimensions derived from the attributes of the input data frame.
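The idea behind the redimension route is that selected attributes of each data frame cell are reinterpreted as coordinates. The Python sketch below models that idea only; the helper name redimension_rows is hypothetical, and the way it handles two rows landing on the same position is a choice made for this model, not a statement about what SciDB does.

```python
# Illustrative model only: rows are tuples, and the result is a dict keyed by
# the values of the attributes chosen as dimensions. This is not a SciDB API.

def redimension_rows(attr_names, rows, dim_names):
    """Turn rows into cells keyed by the values of the chosen attributes."""
    dim_idx = [attr_names.index(d) for d in dim_names]
    val_idx = [k for k in range(len(attr_names)) if k not in dim_idx]
    cells = {}
    for row in rows:
        coords = tuple(row[k] for k in dim_idx)
        if coords in cells:
            # This model simply rejects collisions.
            raise ValueError(f"duplicate cell position {coords}")
        cells[coords] = tuple(row[k] for k in val_idx)
    return cells


rows = [
    (-1, 3, "The caged"),
    (4, -2, "of things"),
]

cells = redimension_rows(["i", "j", "val"], rows, ["i", "j"])
print(cells)
```

Applied to rows produced by flattening an array, this recovers the original coordinate-to-value mapping, which is why the target schema needs dimensions named after the relevant attributes.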
Examples
Two-dimensional array to data frame
Create a two-dimensional array from a small data file:
$ cat /tmp/lines.tsv
-1	3	The caged
0	5	bird sings
0	6	with a
2	0	fearful trill
4	-2	of things
4	2	unknown
5	1	but longed
6	0	for still
$ iquery -a
AFL% create array A <val:string> [i=-10:10:0:20; j=-10:10:0:20];
Query was executed successfully
AFL% store(redimension(input(<i:int64, j:int64, val:string>, '/tmp/lines.tsv', format:'tsv'), A), A);
Query was executed successfully
AFL%
AFL% scan(A);
{i,j} val
{-1,3} 'The caged'
{0,5} 'bird sings'
{0,6} 'with a'
{2,0} 'fearful trill'
{4,-2} 'of things'
{4,2} 'unknown'
{5,1} 'but longed'
{6,0} 'for still'
AFL%
Flatten the array, and store it as a data frame.
AFL% store(flatten(A), DF);
Query was executed successfully
AFL%
In the result data frame, the old cell positions are now attributes, not dimension coordinates. (Coordinates in curly braces are not displayed when you scan a data frame.)
AFL% scan(DF);
i,j,val
-1,3,'The caged'
0,5,'bird sings'
0,6,'with a'
2,0,'fearful trill'
4,-2,'of things'
4,2,'unknown'
5,1,'but longed'
6,0,'for still'
AFL%
The list operator shows the schemas of the array input and data frame result. Notice that the data frame does not have a dimension specification.
AFL% list();
{No} name,uaid,aid,schema,availability,temporary,namespace,distribution
{0} 'A',21,22,'A<val:string> [i=-10:10:0:20; j=-10:10:0:20]',true,false,'public','hashed'
{1} 'DF',23,24,'DF<i:int64 NOT NULL,j:int64 NOT NULL,val:string>',true,false,'public','dataframe'
AFL%
Data frame to one-dimensional array
Using the data frame created above, convert it to a one-dimensional array by using flatten with a dimension name parameter.
AFL% flatten(DF, row);
{row} i,j,val
{0} -1,3,'The caged'
{1} 0,5,'bird sings'
{2} 0,6,'with a'
{3} 2,0,'fearful trill'
{4} 4,-2,'of things'
{5} 4,2,'unknown'
{6} 5,1,'but longed'
{7} 6,0,'for still'
AFL%
Do the same conversion, but specify a chunk size and store the result, for instance with store(flatten(DF, row, cells_per_chunk: 4), B). The list output then shows the new array B with a chunk size of 4 in its row dimension.
AFL% list();
{No} name,uaid,aid,schema,availability,temporary,namespace,distribution
{0} 'A',21,22,'A<val:string> [i=-10:10:0:20; j=-10:10:0:20]',true,false,'public','hashed'
{1} 'B',25,26,'B<i:int64 NOT NULL,j:int64 NOT NULL,val:string> [row=0:*:0:4]',true,false,'public','hashed'
{2} 'DF',23,24,'DF<i:int64 NOT NULL,j:int64 NOT NULL,val:string>',true,false,'public','dataframe'
AFL%
AFL% scan(B);
{row} i,j,val
{0} -1,3,'The caged'
{1} 0,5,'bird sings'
{2} 0,6,'with a'
{3} 2,0,'fearful trill'
{4} 4,-2,'of things'
{5} 4,2,'unknown'
{6} 5,1,'but longed'
{7} 6,0,'for still'
AFL%
Clean up the example arrays.
AFL% remove(A); remove(B); remove(DF);