Array Distribution
Specifying the Distribution
The CREATE ARRAY statement may be followed by an optional distribution phrase. The syntax is:
distribution_phrase ::= [ DISTRIBUTION <distribution_name> ];
distribution_name ::= 'hashed' | 'replicated' | 'row_cyclic' | 'col_cyclic'
If no distribution phrase is supplied, the array will have hashed distribution.
Properties of Specific Distributions
- hashed – Each primary chunk is stored on a single instance. The instance is determined by hashing the chunk coordinates.
- replicated – Each primary chunk is duplicated onto all instances. This gives a speed advantage for operators that would otherwise have to replicate the data each time they use it, but the storage cost is higher. This is a classic time/space tradeoff. See operator documentation for details about the time advantage for particular operators. The operators that can take special advantage of the replicated distribution are:
- row_cyclic – Each primary chunk is stored on a single instance. Given a chunk at position {R, C, ...}, the instance is determined by the chunk's first coordinate R, the low value Lr and chunk length Cr of the corresponding dimension, and the number of instances NI. The assigned instance is I = ((R - Lr) / Cr) % NI.
- col_cyclic – Each primary chunk is stored on a single instance. Given a chunk at position {R, C, ...}, the instance is determined by the chunk's second coordinate C, the low value Lc and chunk length Cc of the corresponding dimension, and the number of instances NI. The assigned instance is I = ((C - Lc) / Cc) % NI.
- The spgemm operator can take advantage of col_cyclic.
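The two cyclic formulas above can be sketched in a few lines of Python. This is an illustration only, not SciDB code: the function names and the `dims` layout (a list of `(low, chunk_length)` pairs, one per dimension) are hypothetical, and the sketch assumes that the division in the formula is integer division and that instances are numbered from 0.

```python
def cyclic_instance(coord, dim_low, chunk_len, num_instances):
    """Apply I = ((coord - low) / chunk_len) % NI to one chunk coordinate.

    Assumes integer division and coord >= dim_low (illustrative only).
    """
    # Ordinal of the chunk along the chosen dimension (0, 1, 2, ...).
    chunk_index = (coord - dim_low) // chunk_len
    # Chunks are dealt out to instances in round-robin order.
    return chunk_index % num_instances


def row_cyclic_instance(chunk_pos, dims, num_instances):
    """row_cyclic: use the first coordinate R and its dimension (Lr, Cr)."""
    low, chunk_len = dims[0]
    return cyclic_instance(chunk_pos[0], low, chunk_len, num_instances)


def col_cyclic_instance(chunk_pos, dims, num_instances):
    """col_cyclic: use the second coordinate C and its dimension (Lc, Cc)."""
    low, chunk_len = dims[1]
    return cyclic_instance(chunk_pos[1], low, chunk_len, num_instances)
```

For example, with `dims = [(0, 10), (0, 20)]` and 4 instances, the chunk at position {30, 40} maps to instance 3 under row_cyclic, since ((30 - 0) / 10) % 4 = 3, and to instance 2 under col_cyclic, since ((40 - 0) / 20) % 4 = 2. All chunks that share a first coordinate land on the same instance under row_cyclic, and likewise for the second coordinate under col_cyclic.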
Example
AFL% CREATE ARRAY A <v:int64> [i=0:99] DISTRIBUTION replicated;
This statement creates an array named A whose chunks are stored on every instance in the cluster.