Array Distribution

Specifying the Distribution

The CREATE ARRAY statement may be followed by an optional distribution phrase. The syntax is:

distribution_phrase ::=  [ DISTRIBUTION  <distribution_name>];
distribution_name   ::=  'hashed' | 'replicated' | 'row_cyclic' | 'col_cyclic'

If no distribution phrase is supplied, the array will have hashed distribution.

Properties of Specific Distributions


  • hashed – Each primary chunk is stored on a single instance.  The instance is determined by hashing the chunk coordinates.
  • replicated – Each primary chunk is duplicated onto all instances.  This speeds up operators that would otherwise have to replicate the data each time they use it, at the cost of more storage: a classic time/space tradeoff.  See the operator documentation for details about the time advantage for particular operators.  The operators that can take special advantage of the replicated distribution are:
  • row_cyclic – Each primary chunk is stored on a single instance.  The instance is determined by the first coordinate of the chunk, R, its low value, Lr, its chunk length, Cr, and the number of instances, NI.  The assigned instance is I = ((R - Lr) / Cr) % NI.
  • col_cyclic – Each primary chunk is stored on a single instance.  The instance is determined by the second coordinate of the chunk, C, its low value, Lc, its chunk length, Cc, and the number of instances, NI.  The assigned instance is I = ((C - Lc) / Cc) % NI.
    • The spgemm operator can take advantage of col_cyclic.
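
The cyclic formulas above can be tried out with a small Python sketch. This is an illustration of the arithmetic only, not SciDB code: the function name cyclic_instance and the sample layout (low value 0, chunk length 10, 4 instances) are hypothetical, and the sketch assumes the division in the formula is integer division.

```python
def cyclic_instance(coord: int, low: int, chunk_len: int, num_instances: int) -> int:
    """Instance index for a chunk under row_cyclic or col_cyclic.

    coord is the chunk's first coordinate (row_cyclic) or second
    coordinate (col_cyclic). Integer division is an assumption here:
    I = ((coord - low) / chunk_len) % num_instances.
    """
    return ((coord - low) // chunk_len) % num_instances


# Hypothetical layout: coordinates 0..99, chunk length 10, 4 instances.
# Each chunk is identified by the coordinate where it starts.
assignments = [cyclic_instance(r, 0, 10, 4) for r in range(0, 100, 10)]
print(assignments)  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

Under these assumptions, consecutive chunks along the chosen dimension are dealt to instances in round-robin order, which is what makes the distribution cyclic.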

Example

AFL% CREATE ARRAY A <v: int64> [i=0:99] DISTRIBUTION replicated;

This statement creates an array named A whose chunks are stored on every instance in the cluster.
