Binary Files

A SciDB binary file represents a dense, 1-dimensional SciDB array. The array has no empty cells, although nullable attributes can hold null values. The file represents each cell of the array in turn and, within each cell, each attribute in turn.

Binary File Format

Several SciDB operators and system macros – load(), input(), and save() – can ingest or produce files that conform to the SciDB binary file format. The rules for binary formatting of a SciDB attribute are:

  • Attributes in a binary file appear in the same left-to-right order as the attributes of the corresponding array's schema.
  • A fixed-size attribute of length n is represented by n bytes (in little-endian order).
  • A variable-size attribute of length n is represented by a four-byte unsigned integer of value n, followed by the n data bytes of the attribute. String attributes include the terminating NUL character (ASCII 0x00) in both the length count and the data bytes.
  • Whether fixed or variable length, each value of a nullable attribute is preceded by a single prefix byte. If a particular attribute value is null, the prefix byte contains the "missing reason code", a value between 0 and 127 inclusive. If the value is not null, the prefix byte must contain 0xFF (-1).
  • Even if a nullable attribute value is in fact null, the prefix byte is still followed by a representation of the attribute: n zero bytes for fixed-length attributes of size n, or four zero bytes for variable-length attributes (representing a length count of zero).
  • Binary data does not contain attribute separators or cell separators.
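The rules above can be sketched in Python with the standard struct module. This is an illustrative sketch rather than SciDB code: the helper names (null_prefix, encode_fixed, encode_string) are hypothetical, the four-byte length is assumed to be little-endian like the other values, and UTF-8 is assumed for string data.

```python
import struct

NOT_NULL = 0xFF  # prefix byte for a non-null value of a nullable attribute


def null_prefix(is_null, reason=0):
    """Return the one-byte prefix written before a nullable attribute value."""
    if not is_null:
        return bytes([NOT_NULL])
    if not 0 <= reason <= 127:
        raise ValueError("missing reason code must be in 0..127")
    return bytes([reason])


def encode_fixed(fmt, value, nullable=False, reason=0):
    """Encode a fixed-size attribute; fmt is a struct code such as 'b', 'h', 'q'."""
    size = struct.calcsize("<" + fmt)
    if value is None:
        if not nullable:
            raise ValueError("attribute does not allow nulls")
        # A null still occupies the full width, filled with zero bytes.
        return null_prefix(True, reason) + bytes(size)
    body = struct.pack("<" + fmt, value)  # '<' means little-endian, no padding
    return (null_prefix(False) if nullable else b"") + body


def encode_string(value, nullable=False, reason=0):
    """Encode a string attribute: 4-byte length, then the data plus a NUL."""
    if value is None:
        if not nullable:
            raise ValueError("attribute does not allow nulls")
        # A null string is a prefix byte followed by a length count of zero.
        return null_prefix(True, reason) + struct.pack("<I", 0)
    data = value.encode("utf-8") + b"\x00"  # the NUL is counted in the length
    body = struct.pack("<I", len(data)) + data
    return (null_prefix(False) if nullable else b"") + body
```

A cell is then the concatenation of its encoded attributes in schema order, and a file is the concatenation of its cells, with no separators in between.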

Sample Array and Corresponding Binary File

The first diagram shows a very simple array: one dimension, four attributes, and only two cells:

Here is the AFL statement that created the array.  It shows which attributes disallow null values.

AFL% create array flatArray <A:int8 NOT NULL,B:int16,C:string,D:string NOT NULL>[row=0:1];


This diagram shows the layout of this array within the corresponding binary load file:

The layout reveals the following characteristics of a binary load file:

  • Each cell of the array is represented in contiguous bytes.
  • Remember, some programs that create binary files pad certain values so they align on word boundaries. This figure does not show such values. Use the SKIP keyword to skip over such padding. For more information, see The SKIP Directive in Binary Format Strings.
  • There are no end-of-cell delimiters. The first byte of the representation of the first attribute value of cell n begins immediately after the last byte of the last attribute of cell n-1.
  • A fixed-length data type that allows null values always consumes one more byte than the data type requires, regardless of whether the value is null or non-null. For example, a nullable int8 consumes two bytes and a nullable int64 consumes nine bytes. (In the figure, see bytes 2-4 or 18-20.)
  • A fixed-length data type that disallows null values always consumes exactly as many bytes as that data type requires. For example, an int8 consumes 1 byte and an int64 consumes 8 bytes. (See byte 1 or 17.)
  • A string data type that disallows nulls is always preceded by four bytes indicating the string length. (See bytes 10-13 or 28-31.)
  • A string data type that allows nulls is always preceded by five bytes: a null byte indicating whether a value is present and four bytes indicating the string length. For values that are null, the string length is zero. (See bytes 5-9 or 21-25.)
  • The length of a null string is recorded as zero. (See bytes 5-9.)
  • Every non-null string value is terminated by the NUL character. (See bytes 16, 27, and 35.) Even a zero-length string includes this character.
  • The length of a non-null string value includes the terminating NUL character. (See bytes 10-13, 22-25, and 28-31.) Consequently, the length of a zero-length string is recorded as 1.
  • If a nullable attribute contains a non-null value, the preceding null byte is 0xFF (-1 when read as a signed byte). (See byte 2 or 21.)
  • If a nullable attribute contains a null value, the preceding null byte contains the missing reason code, which must be between 0 and 127 inclusive. (See byte 5 or 18.)
  • The file does not contain index values for the dimension of the array that the LOAD command populates. The command reads the file sequentially and creates the cells of the array accordingly. The first cell is assigned the first index value of the dimension, and each successive cell receives the next index value.
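To make these characteristics concrete, the following Python sketch walks a 35-byte buffer laid out as described for flatArray and assigns row indexes sequentially. The attribute values in SAMPLE are invented for illustration; only the byte positions follow the list above. The function names are hypothetical, and little-endian lengths and UTF-8 string data are assumptions.

```python
import struct

# Hypothetical contents for flatArray. Byte numbers in the comments are
# 1-based, matching the references in the list above.
SAMPLE = (
    b"\x01"                             # byte 1: A (int8, no nulls)
    b"\xff\x02\x00"                     # bytes 2-4: B non-null, value 2
    b"\x00\x00\x00\x00\x00"             # bytes 5-9: C null, length 0
    b"\x03\x00\x00\x00" b"hi\x00"       # bytes 10-16: D, length 3
    b"\x07"                             # byte 17: A
    b"\x00\x00\x00"                     # bytes 18-20: B null, reason code 0
    b"\xff\x02\x00\x00\x00" b"x\x00"    # bytes 21-27: C non-null, length 2
    b"\x04\x00\x00\x00" b"abc\x00"      # bytes 28-35: D, length 4
)


def read_fixed(buf, pos, fmt, nullable):
    """Return (value, new_pos); a null value is reported as None."""
    is_null = False
    if nullable:
        is_null = buf[pos] != 0xFF  # anything but 0xFF is a reason code
        pos += 1
    (value,) = struct.unpack_from("<" + fmt, buf, pos)
    return (None if is_null else value), pos + struct.calcsize("<" + fmt)


def read_string(buf, pos, nullable):
    """Return (text, new_pos); the trailing NUL is stripped from the text."""
    is_null = False
    if nullable:
        is_null = buf[pos] != 0xFF
        pos += 1
    (length,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    raw = buf[pos:pos + length]
    return (None if is_null else raw[:-1].decode("utf-8")), pos + length


def read_flat_array(buf, first_index=0):
    """Read cells back to back; each cell gets the next index value."""
    cells, pos, row = {}, 0, first_index
    while pos < len(buf):
        a, pos = read_fixed(buf, pos, "b", nullable=False)
        b, pos = read_fixed(buf, pos, "h", nullable=True)
        c, pos = read_string(buf, pos, nullable=True)
        d, pos = read_string(buf, pos, nullable=False)
        cells[row] = (a, b, c, d)
        row += 1
    return cells


cells = read_flat_array(SAMPLE)
```

Because the file carries no delimiters or index values, the reader depends entirely on the schema to know how many bytes each attribute consumes and where the next cell begins.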


SciDB assumes that the storage for each value of a given type is in x86_64 little-endian byte order.
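As a small illustration of what little-endian order means for a program that writes such a file, the following Python sketch packs the same 16-bit value in both byte orders. The variable names are arbitrary. With the struct module, the "<" prefix produces little-endian output without alignment padding on any host.

```python
import struct

value = 258  # 0x0102

little = struct.pack("<h", value)  # least significant byte first
big = struct.pack(">h", value)     # most significant byte first

# An int64 written for a binary load file: eight bytes, low byte first.
as_int64 = struct.pack("<q", 1)
```

Here little holds the bytes 0x02 0x01 and big holds 0x01 0x02. Only the first form matches the order described above.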

All values in the file must conform to SciDB-recognized data types. This includes native types, types defined in SciDB extensions, and user-defined types. To list the types recognized by your SciDB installation, run the following query:

AFL% list('types');