Array Attributes

Every SciDB array requires at least one attribute. Attributes store the individual data values in each array cell.

Each attribute consists of:

  • A name. The maximum length of an attribute name is 1024 bytes. Attribute names can contain only alphanumeric characters and underscores (_), and cannot begin with a numeric character.
  • A data type. Any data type supported by SciDB, including user-defined types. Use list('types') to see the available data types.
  • Nullability (Optional). Specify 'NULL' or 'NOT NULL' to indicate whether null values are allowed for the attribute. The default is 'NULL', meaning null values are allowed. To convert a nullable attribute into a non-nullable one, use the substitute operator.
  • A default value (Optional). Specify the value that SciDB substitutes automatically when you do not supply one explicitly. If unspecified, SciDB chooses the value using the following rules:
    • If the attribute is nullable, use null.
    • Otherwise, use 0 for numeric types, or an empty string "" for the string type.
  • A compression type (Optional). Specify one of two widely used compression techniques: zlib or bzlib. In general, bzlib compresses data more densely but consumes more CPU time than zlib. Results depend heavily on the nature of the data being compressed.
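
The naming rule in the list above can be checked mechanically. The following Python sketch is illustrative only and is not part of SciDB: the helper name is_valid_attribute_name is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits and that the 1024-byte limit is measured on the UTF-8 encoding of the name.

```python
import re

MAX_NAME_BYTES = 1024  # limit quoted in the text above

# Assumption: "alphanumeric" means ASCII letters and digits.
# The first character may be a letter or an underscore, never a digit.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_attribute_name(name: str) -> bool:
    """Return True if name satisfies the naming rules described above."""
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        return False
    return _NAME_RE.fullmatch(name) is not None
```

Under these assumptions, val1 and _x are accepted, while 1val (leading digit) and my-attr (hyphen) are rejected.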

Specifying Attributes

The CREATE ARRAY statement includes a list of attributes, whose syntax is as follows:

attributes                 ::= attribute [, attributes ]
attribute                  ::= attribute_name : attribute_type 
                             [ nullable ] [ default ] [ compression ]
nullable                   ::= NULL | NOT NULL
default                    ::= DEFAULT default_value
compression                ::= COMPRESSION compression-type
compression-type           ::= 'zlib' | 'bzlib'
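
As an illustration of the grammar, the following Python sketch assembles an attribute list as text. The helpers format_attribute and format_attributes are hypothetical and are not a SciDB API. The sketch assumes that the compression type is written with the single quotes shown in the grammar, and it writes no spaces around the colon or after the comma.

```python
from typing import Iterable, Optional


def format_attribute(name: str, type_name: str,
                     nullable: Optional[bool] = None,
                     default: Optional[str] = None,
                     compression: Optional[str] = None) -> str:
    """Build one attribute clause; optional parts are omitted when None."""
    parts = [f"{name}:{type_name}"]
    if nullable is not None:
        parts.append("NULL" if nullable else "NOT NULL")
    if default is not None:
        # default is passed through as text, e.g. "double(9999)"
        parts.append(f"DEFAULT {default}")
    if compression is not None:
        if compression not in ("zlib", "bzlib"):
            raise ValueError(f"unknown compression type: {compression!r}")
        # Assumption: the quotes in the grammar are part of the syntax.
        parts.append(f"COMPRESSION '{compression}'")
    return " ".join(parts)


def format_attributes(clauses: Iterable[str]) -> str:
    """Join attribute clauses into a comma-separated attribute list."""
    return ",".join(clauses)
```

For instance, joining format_attribute("val1", "double", nullable=False, default="double(9999)") and format_attribute("val2", "double", default="missing(30)") yields the text val1:double NOT NULL DEFAULT double(9999),val2:double DEFAULT missing(30).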

Examples

SciDB provides special handling for missing data, which covers empty cells, null values, and default values.

Suppose the content of /tmp/m2x2.scidb is:

[
[ (1,1), (   ) ],
[ ( ,2), (3, ) ]
]


The cell at {0,1} is empty; that is, it does not exist at all. The cells at {1,0} and {1,1} are not empty, but each is missing one field, so the missing field receives a default value when the file is loaded. In the query below, the two defaults differ: val1 receives 0 because it is not nullable, and val2 receives null because it is nullable.

AFL% input(<val1:double NOT NULL,val2:double>[x=0:1; y=0:1], '/tmp/m2x2.scidb');


The output is:

{x,y} val1,val2
{0,0} 1,1
{1,0} 0,2
{1,1} 3,null
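
The substitution rules at work here can be modeled outside SciDB. The following Python sketch is a simplified model, not SciDB's loader: the names MISSING, implicit_default, and load_cells are hypothetical, None stands in for null, the file contents are transcribed by hand, and every type other than string is assumed to be numeric.

```python
MISSING = object()  # sentinel (hypothetical): a field left blank in the load file


def implicit_default(type_name, nullable):
    """Default used when the attribute declares none, per the rules in this section."""
    if nullable:
        return None  # None stands in for SciDB null
    # Assumption: every type other than string is treated as numeric here.
    return "" if type_name == "string" else 0


def load_cells(grid, attrs):
    """Return {(x, y): values} for the non-empty cells of a 2-D grid.

    grid  -- rows of cells; a cell is None (empty) or a tuple of field values
    attrs -- one (type_name, nullable, default) triple per attribute, where
             default is MISSING if no explicit default was declared
    """
    result = {}
    for x, row in enumerate(grid):
        for y, cell in enumerate(row):
            if cell is None:
                continue  # an empty cell does not exist, so nothing is stored
            values = []
            for value, (type_name, nullable, default) in zip(cell, attrs):
                if value is MISSING:
                    value = (implicit_default(type_name, nullable)
                             if default is MISSING else default)
                values.append(value)
            result[(x, y)] = tuple(values)
    return result


# The contents of /tmp/m2x2.scidb, transcribed by hand.
grid = [
    [(1, 1), None],
    [(MISSING, 2), (3, MISSING)],
]
attrs = [("double", False, MISSING),  # val1:double NOT NULL
         ("double", True, MISSING)]   # val2:double
cells = load_cells(grid, attrs)
```

In this model, cells holds three entries, matching the three rows of output above: the empty cell at {0,1} is absent, val1 at {1,0} becomes 0, and val2 at {1,1} becomes None.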


If you specify default values, SciDB uses your values:

AFL% input(<val1:double NOT NULL DEFAULT double(9999),val2:double DEFAULT missing(30)>[x=0:1; y=0:1], '/tmp/m2x2.scidb');


The output is:

{x,y} val1,val2
{0,0} 1,1
{1,0} 9999,2
{1,1} 3,?30


In the example above, the missing() function generates a null value that carries a missing reason. Such a null value is displayed as a question mark followed by the missing reason.

The related function missing_reason() takes a null value and returns its integer missing reason:

AFL% apply(build(<v:double>[i=0:0], missing(30)), reason, missing_reason(v));


The output is:

{i} v,reason
{0} ?30,30
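
When query output like the above is post-processed outside SciDB, the question-mark notation can be decoded with a small helper. The following Python sketch is an assumption-laden illustration rather than a SciDB facility: the function name reason_from_display is hypothetical, and it assumes a field is displayed exactly as a question mark followed by decimal digits.

```python
import re

# Assumption: a null with a missing reason is displayed as '?' plus decimal digits.
_MISSING_RE = re.compile(r"\?(\d+)")


def reason_from_display(text):
    """Return the integer missing reason from text such as '?30', else None."""
    match = _MISSING_RE.fullmatch(text.strip())
    return int(match.group(1)) if match else None
```

Applied to the field ?30 it returns 30; for any other text, such as null or an ordinary number, it returns None.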