SciDB Database Arrays
A SciDB database is organized into arrays. Each array has:
- A name. Each array in a SciDB database has an identifier that distinguishes it from all other arrays in the same database. An array name may include alphanumeric characters and the underscore (_) character, but must begin with a non-numeric character.
- A schema, which is the array structure. The schema contains array attributes and dimensions.
  - Attributes give the names and types of the data stored in the cells of an array. A cell can contain multiple attributes.
  - Dimensions give the names and types of index values. At the most basic level, a dimension value of an array is represented using a 64-bit signed integer. The number of index values in a dimension is referred to as the size of the dimension.
- A distribution. The array distribution determines where in the SciDB cluster the chunks of data are located. For details see Array Distribution.
- A compression algorithm for the empty tag attribute. Valid values are 'zlib' and 'bzlib'. The default is no compression. Most users should stay with the default value.
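The naming rule and the definition of dimension size given in the list above can be sketched in Python. This is a minimal illustration, not part of SciDB: the helper names `is_valid_array_name` and `dimension_size` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and dimension bounds are assumed to be inclusive (as in `i=1:10`). SciDB itself may enforce a narrower coordinate range than the full 64-bit range checked here.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
# The first character may be a letter or underscore, but not a digit.
_ARRAY_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Range of a 64-bit signed integer, the basic representation of a dimension value.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def is_valid_array_name(name: str) -> bool:
    """Return True if name contains only letters, digits, and underscores
    and does not begin with a digit."""
    return _ARRAY_NAME.fullmatch(name) is not None


def dimension_size(low: int, high: int) -> int:
    """Return the number of index values in a dimension low:high.

    Assumption: both bounds are inclusive.
    """
    for bound in (low, high):
        if not INT64_MIN <= bound <= INT64_MAX:
            raise ValueError(f"{bound} does not fit in a 64-bit signed integer")
    if high < low:
        raise ValueError("upper bound must not be below lower bound")
    return high - low + 1


print(is_valid_array_name("tempArray"))  # a name used later on this page
print(is_valid_array_name("1stArray"))   # begins with a digit
print(dimension_size(1, 10))             # dimension i=1:10
```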
Creating Arrays
You create SciDB arrays and data frames with the create_array operator in AFL. Its syntax is
create_array ( new_array_name , schema , isTempArray [ , distribution [ , emptyTagCompression ] ] )
new_array_name ::= identifier
schema ::= < attributes > [ \[ dimensions \] ]
isTempArray ::= true | false
emptyTagCompression ::= 'zlib' | 'bzlib'
- For the syntax of attributes, above, see Array Attributes.
- For the syntax of dimensions, above, see Array Dimensions.
- For the syntax of distribution, above, see Array Distribution.
- See also the create_array operator.
Square brackets [ ] surround optional elements. Backslash-escaped square brackets \[ \] are literal square brackets.
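To make the nesting of the optional arguments concrete, the following Python sketch assembles a create_array call as a string, following the syntax above. The function `create_array_query` is hypothetical and only formats text; it does not contact SciDB. The distribution argument is passed through unchanged because its syntax is described in Array Distribution, not here.

```python
from typing import Optional


def create_array_query(
    name: str,
    schema: str,
    is_temp: bool,
    distribution: Optional[str] = None,
    empty_tag_compression: Optional[str] = None,
) -> str:
    """Format a create_array(...) call according to the syntax shown above."""
    if empty_tag_compression is not None:
        # emptyTagCompression is nested inside the optional distribution
        # argument, so it can only appear when a distribution is given.
        if distribution is None:
            raise ValueError("emptyTagCompression requires a distribution")
        if empty_tag_compression not in ("zlib", "bzlib"):
            raise ValueError("emptyTagCompression must be 'zlib' or 'bzlib'")

    args = [name, schema, "true" if is_temp else "false"]
    if distribution is not None:
        args.append(distribution)
    if empty_tag_compression is not None:
        # The compression name is a quoted string literal in the syntax.
        args.append(f"'{empty_tag_compression}'")
    return "create_array(" + ", ".join(args) + ")"


# Operator form of a temporary array with one attribute and one dimension.
print(create_array_query("tempArray", "<a:int32>[i=1:10]", True))
```

Keeping the distribution as an opaque string avoids guessing at its grammar; a caller would supply whatever Array Distribution specifies.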
You may also use the CREATE ARRAY statement in either AFL or AQL, although its use in AFL is deprecated.
The CREATE ARRAY statement syntax is as follows:
create_array_statement ::= CREATE [ TEMP ] ARRAY new_array_name schema [ DISTRIBUTION distribution ]
create_array_statement ::= CREATE [ TEMP ] ARRAY new_array_name schema DISTRIBUTION distribution [ EMPTYTAG COMPRESSION 'zlib' ]
schema ::= < attributes > [ \[ dimensions \] ]
- For the syntax of attributes, above, see Array Attributes.
- For the syntax of dimensions, above, see Array Dimensions.
- For the syntax of distribution, above, see Array Distribution.
- See also the create_array operator.
The keywords CREATE, ARRAY, TEMP, DISTRIBUTION, and EMPTYTAG COMPRESSION are allowed in both AFL and AQL. They need not be in all caps.
Square brackets [ ] surround optional elements. Backslash-escaped square brackets \[ \] are literal square brackets.
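The two productions above can be read as one rule: EMPTYTAG COMPRESSION 'zlib' may appear only when a DISTRIBUTION clause is present. The Python sketch below encodes that reading. The function `create_array_statement` and its parameter names are hypothetical, the function only formats text, and the distribution value is passed through as an opaque string (see Array Distribution for its syntax).

```python
from typing import Optional


def create_array_statement(
    name: str,
    schema: str,
    temp: bool = False,
    distribution: Optional[str] = None,
    zlib_empty_tag: bool = False,
) -> str:
    """Format a CREATE ARRAY statement according to the syntax shown above."""
    if zlib_empty_tag and distribution is None:
        # In the grammar, EMPTYTAG COMPRESSION follows a mandatory
        # DISTRIBUTION clause, so it cannot be used on its own.
        raise ValueError("EMPTYTAG COMPRESSION requires a DISTRIBUTION clause")

    parts = ["CREATE"]
    if temp:
        parts.append("TEMP")
    parts += ["ARRAY", name, schema]
    if distribution is not None:
        parts += ["DISTRIBUTION", distribution]
    if zlib_empty_tag:
        parts += ["EMPTYTAG", "COMPRESSION", "'zlib'"]
    return " ".join(parts)


# Statement form of a temporary array with one attribute and one dimension.
print(create_array_statement("tempArray", "<a:int32>[i=1:10]", temp=True))
```

The keywords are emitted in upper case for readability only; as noted above, they need not be in all caps.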
Temporary Arrays
Temporary arrays can improve performance, but they do not offer the transactional guarantees of persistent arrays. Like other arrays, temporary arrays are visible only to SciDB users who have permission to see them, and they remain available until they are deleted. Unlike other arrays, temporary arrays are not persistent: they are not saved to disk. As a result, temporary arrays become corrupted if a SciDB instance fails. When a SciDB cluster restarts, all temporary arrays are marked as unavailable but are not deleted; you must delete them explicitly. In addition, temporary arrays do not have versions: any update to a temporary array overwrites the existing attribute values.
Use a temporary array when you are willing to sacrifice atomicity, consistency, isolation, and durability (ACID) guarantees for speed. For example, suppose you are using SciDB to multiply two matrices from within a Python program, and you are sending the operand matrices to SciDB through SciDB-Py. You want the resulting matrix product sent back to the Python program, but you don't need the matrix product persisted in the SciDB database. In this case, a temporary array is appropriate. Similarly, use temporary arrays when running iterative algorithms whose intermediate results are arrays.
Creating Temporary Arrays
To create a temporary array, use the optional TEMP keyword with the CREATE ARRAY statement syntax shown above, or set the isTempArray argument of the create_array operator to true.
Example:
$ iquery -aq "create temp array tempArray <a:int32>[i=1:10]"
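If the same statement is issued from a Python program rather than typed at a shell prompt, one possible approach is to build the iquery argument list and hand it to the standard subprocess module. The sketch below only constructs and prints the command. The helper `iquery_argv` is hypothetical, and actually running the command assumes that iquery is on the PATH and that a SciDB instance is reachable.

```python
import shlex


def iquery_argv(afl_query: str) -> list:
    """Return the argument list for running one AFL query with iquery.

    Uses the same -aq flags as the shell example above.
    """
    return ["iquery", "-aq", afl_query]


argv = iquery_argv("create temp array tempArray <a:int32>[i=1:10]")

# Show the command as it would be typed in a shell, with safe quoting.
print(shlex.join(argv))

# To execute it (requires iquery and a running SciDB instance):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the query as a single list element means the angle and square brackets in the schema need no shell escaping.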
Data Frames
Data frames are SciDB arrays whose dimensions do not have to be specified, which makes them similar to relational tables. SciDB manages a data frame's dimensions implicitly and does not display the internal dimension coordinates. Think of a data frame as an unordered collection of SciDB cells.
Since data frames are a kind of SciDB array, most remarks in this documentation that refer to arrays also apply to data frames.
SciDB data frames can be used nearly anywhere a SciDB array can be used. They are primarily a notational convenience, simplifying import of linear data (such as from a CSV file) that will later be reshaped into an array.
Since data frame cells are unordered, operators that depend on cell position, such as between or slice, won't work with data frames.
Creating Data Frames
To create a data frame, use the same syntax for creating an array or temporary array, but omit the dimension portion of the schema.
Example:
$ iquery -a
AFL% create array dataFrame <a:int32>;
Query was executed successfully
AFL% create temp array tempDataFrame <value:double>;
Query was executed successfully
AFL%
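The only difference between an array schema and a data frame schema is whether the dimension portion is present. The Python sketch below formats both from the same inputs. The function `format_schema` is hypothetical, and the separators it uses between multiple attributes (a comma) and between multiple dimensions (a semicolon) are assumptions; Array Attributes and Array Dimensions give the authoritative syntax.

```python
from typing import Iterable, Tuple


def format_schema(
    attributes: Iterable[Tuple[str, str]],
    dimensions: Iterable[Tuple[str, int, int]] = (),
) -> str:
    """Format a schema string from (name, type) attributes and
    (name, low, high) dimensions.

    With no dimensions, the result is a data frame schema.
    """
    # Assumption: multiple attributes are separated by commas.
    attr_part = "<" + ", ".join(f"{name}:{typ}" for name, typ in attributes) + ">"
    dims = list(dimensions)
    if not dims:
        return attr_part  # data frame: dimension portion omitted
    # Assumption: multiple dimensions are separated by semicolons.
    dim_part = "[" + "; ".join(f"{name}={low}:{high}" for name, low, high in dims) + "]"
    return attr_part + dim_part


print(format_schema([("a", "int32")]))                  # data frame schema
print(format_schema([("a", "int32")], [("i", 1, 10)]))  # array schema
```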
You can create a data frame from an array using the flatten operator, and add cells to it using the append operator.