SciDB Database Arrays

A SciDB database is organized into arrays. Each array has:

  • A name. Each array in a SciDB database has an identifier that distinguishes it from all other arrays in the same database.
  • A schema, which is the array structure. The schema contains array attributes and dimensions.
    • Each attribute contains data being stored in the cells of the array. A cell can contain multiple attributes.
    • Each dimension consists of a list of index values. At the most basic level, the dimension of an array is represented using 64-bit signed integers. The number of index values in a dimension is referred to as the size of the dimension.
  • A distribution. The array distribution determines where in the SciDB cluster the chunks of data are located.  For details see Array Distribution.
  • An optional compression algorithm for the empty tag attribute. Valid values are 'zlib' and 'bzlib'. The default is no compression, and most users should keep the default.
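
The relationship between dimension bounds and dimension size can be sketched in a few lines of Python. This is an illustrative model only: the helper names dimension_size and cell_count are hypothetical and are not part of any SciDB API. It assumes inclusive low:high bounds, as in the i=1:10 dimension used later on this page.

```python
# Illustrative model only; these helpers are hypothetical, not SciDB functions.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def dimension_size(low, high):
    """Return the number of index values in a dimension with inclusive bounds low:high."""
    for bound in (low, high):
        # Dimension index values are represented as 64-bit signed integers.
        if not INT64_MIN <= bound <= INT64_MAX:
            raise ValueError(f"bound {bound} does not fit in a signed 64-bit integer")
    if high < low:
        raise ValueError("high bound must not be less than low bound")
    return high - low + 1


def cell_count(dimensions):
    """Return the largest possible number of cells: the product of all dimension sizes."""
    total = 1
    for low, high in dimensions.values():
        total *= dimension_size(low, high)
    return total


print(dimension_size(1, 10))                    # size of dimension i=1:10
print(cell_count({"i": (0, 9), "j": (0, 4)}))   # a 10 x 5 array
```

The product is an upper bound on the number of cells the array can address, not a count of the cells that actually hold data.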

Creating Arrays 

You create SciDB arrays and data frames with the CREATE ARRAY statement, available in both AFL and AQL. The CREATE ARRAY syntax is as follows:

create_array_statement ::= CREATE [ TEMP ] ARRAY
                                            new_array_name
                                            schema
                                            [ DISTRIBUTION <distribution> ]

create_array_statement ::= CREATE [ TEMP ] ARRAY
                                            new_array_name
                                            schema
                                            DISTRIBUTION <distribution>
                                            [ EMPTYTAG compression 'zlib' ]

schema                 ::= < attributes >  [ \[ dimensions \] ]

The keywords CREATE, ARRAY, TEMP, and DISTRIBUTION are allowed in both AFL and AQL.  They need not be in all caps.

Square brackets [ ] surround optional elements.  Backslash-escaped square brackets \[ \] are literal square brackets.
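
As a rough illustration of how the optional elements in the grammar above combine, the following Python sketch assembles a CREATE ARRAY statement as a string. The function create_array_statement and its parameters are hypothetical helpers written for this page; they are not part of SciDB or any SciDB client library, and the distribution value passed in the last call ("hashed") is an assumed example. See Array Distribution for the values SciDB actually accepts.

```python
def create_array_statement(name, attributes, dimensions=None, temp=False,
                           distribution=None, emptytag_compression=None):
    """Build a CREATE ARRAY statement string (hypothetical helper, not a SciDB API).

    attributes and dimensions are passed through as already-formatted text,
    for example "a:int32" and "i=1:10".
    """
    # In the grammar, the EMPTYTAG clause appears only after a DISTRIBUTION clause.
    if emptytag_compression is not None and distribution is None:
        raise ValueError("EMPTYTAG compression requires a DISTRIBUTION clause")

    words = ["CREATE"]
    if temp:
        words.append("TEMP")                  # optional TEMP keyword
    words += ["ARRAY", name]

    schema = f"<{attributes}>"
    if dimensions is not None:
        schema += f"[{dimensions}]"           # literal square brackets
    words.append(schema)

    if distribution is not None:
        words += ["DISTRIBUTION", distribution]
    if emptytag_compression is not None:
        words += ["EMPTYTAG", "compression", f"'{emptytag_compression}'"]
    return " ".join(words)


# A temporary array with one attribute and one dimension.
print(create_array_statement("tempArray", "a:int32", "i=1:10", temp=True))
# A schema with no dimensions.
print(create_array_statement("dataFrame", "a:int32"))
# "hashed" is an assumed distribution name used only for illustration.
print(create_array_statement("A", "v:double", "i=0:9",
                             distribution="hashed", emptytag_compression="zlib"))
```

The resulting string could then be passed to a client such as iquery. The sketch does no validation of names, types, or bounds.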

Temporary Arrays

Temporary arrays can improve performance, but they do not offer the transactional guarantees of persistent arrays. Like other arrays, temporary arrays are visible only to SciDB users who have permission to see them, and they remain available until they are deleted. Unlike other arrays, temporary arrays are not persistent: they are not saved to disk. As a result, temporary arrays become corrupted if a SciDB instance fails. When a SciDB cluster restarts, all temporary arrays are marked as unavailable but are not deleted; you must delete them explicitly. In addition, temporary arrays do not have versions: any update to a temporary array overwrites the existing attribute values.

Use a temporary array when you are willing to sacrifice atomicity, consistency, isolation, and durability (ACID) guarantees for speed. For example, suppose you are using SciDB to multiply two matrices from within a Python program, and you are sending the operand matrices to SciDB through SciDB-Py. You want the resulting matrix product sent back to the Python program, but you do not need the matrix product persisted in the SciDB database. In this case, a temporary array is appropriate. Similarly, use temporary arrays when running iterative algorithms whose intermediate results are arrays.

Creating Temporary Arrays 

To create a temporary array, use the optional TEMP keyword with the CREATE ARRAY statement syntax shown above.

Example:

$ iquery -aq "create temp array tempArray <a:int32>[i=1:10]"

Data Frames

Data frames are SciDB arrays whose dimensions do not have to be specified, which makes them similar to relational tables. SciDB manages the dimensions of a data frame implicitly and does not display the internal dimension coordinates. Think of a data frame as an unordered collection of SciDB cells.

Since data frames are a kind of SciDB array, most remarks in this documentation that refer to arrays also apply to data frames.

SciDB data frames can be used nearly anywhere a SciDB array can be used.  They are primarily a notational convenience, simplifying import of linear data (such as from a CSV file) that will later be reshaped into an array.

Because data frame cells are unordered, operators that depend on cell position, such as between or slice, do not work with data frames.
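
One way to picture an unordered collection of cells is as a multiset. The Python sketch below is a conceptual model only, not how SciDB stores data frames; the data_frame helper is hypothetical. It uses a Counter so that a repeated cell would simply be counted, on the assumption that the same attribute values may appear more than once.

```python
from collections import Counter


def data_frame(cells):
    """Model a data frame as a multiset of cells (tuples of attribute values)."""
    return Counter(cells)


# The same three cells, arriving in two different orders.
loaded_first = data_frame([(1, "a"), (2, "b"), (2, "b")])
loaded_second = data_frame([(2, "b"), (1, "a"), (2, "b")])

# Order carries no meaning, so the two collections compare equal.
print(loaded_first == loaded_second)

# The model can say how many times a cell occurs, but not where it sits.
print(loaded_first[(2, "b")])
```

In this model there is no "third cell" or "cell at coordinate 5" to ask for, which is why a position-based operator has nothing to select on.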

Creating Data Frames

To create a data frame, use the same syntax as for creating an array or temporary array, but omit the dimension portion of the schema.

Example:

$ iquery -a
AFL% create array dataFrame <a:int32>;
Query was executed successfully
AFL% create temp array tempDataFrame <value:double>;
Query was executed successfully
AFL% 

You can create a data frame from an array using the flatten operator, and add cells to it using the append operator.
