cross_join

 The cross_join operator provides a cross-product join with equality predicates.

Synopsis

cross_join(left_array [ as left-alias], right_array [ as right-alias ], [left-alias.]left_dim1, [right-alias.]right_dim1,...);

Summary

The cross_join operator calculates the cross product join of two arrays, A (with dimensions a1 ... am) and B (with dimensions b1 ... bn) while applying equality predicates to pairs of dimensions, one from each input.

Assume p such predicates in the cross_join specifying equality of p pairs of dimensions.  Then the result of a cross_join is an m+n-p dimensional array in which each of the m dimensions of A is present, as well as each of the n-p dimensions of B which are not named in the p equality predicates.  For each cell in the output array, it is possible to identify two source cells, one in each of the input arrays.  The position of the source cell in the left input array is given by the values of the output cell position for each of the m dimensions from A.  The position of the source cell in the right input array is given by the values of the output cell position for each of the n-p dimensions from B, as well as the p dimensions of A that were named as equal to p dimensions in B.  The contents of the output cell is the concatenation of the attributes of the two source cells.

For example, for a 2-dimensional array A with dimensions i, j, and a 1-dimensional array B with dimension k, cross_join(A, B, j, k) results in a 2-dimensional array with coordinates {i, j} in which the cell at coordinate position {i, j} of the output is computed as the concatenation of cells {i, j} of A with cell at coordinate {k=j} of B.

If the join dimensions are different lengths, the cross-join returns the smaller dimension for the join points.

The space and time efficiency of the cross_join operator is sensitive to the order of arguments: always pass the larger array as the first argument.

If right_array has replicated distribution, cross_join can be faster.

Note the following:

  • cross_join performs an inner join on selected dimensions.
  • SciDB Dimensions do not match by order (as in join) or by name, but by pairings that you explicitly provide.
  • Array operands can have unmatched dimensions.
  • Matching dimensions must have the same:
    • Chunk overlap
    • Starting dimension value
    • Chunk sizes may differ, and if so, SciDB reconciles them by inserting a repart operator into the query plan.  This has a performance cost.

Examples

To return the cross-join of a 3×3 array with a vector of length 3, do the following:

  1. Create an array called left_array:

    AFL% CREATE ARRAY left_array<val:double>[i=0:2; j=0:2]; 
  2. Store values of 0–8 into left array:

    AFL% store(build(left_array,i*3+j),left_array); 


    The output is:

    [(0),(1),(2)],[(3),(4),(5)],[(6),(7),(8)] 
  3. Create an array called right_array:

    AFL% CREATE ARRAY right_array<val:double>[k=0:5];  
  4. Store values of 101–106 into right_array:

    AFL% store(build(right_array,k+101),right_array);  


    The output is:

    [(101),(102),(103),(104),(105),(106)]  


  5. Perform a cross-join on left_array and right_array along dimension j of left_array:

    AFL% cross_join(left_array,right_array,j,k);  

     
    The output is:

    [
    [(0,101),(1,102),(2,103)],
    [(3,101),(4,102),(5,103)],
    [(6,101),(7,102),(8,103)]
    ]  

    Finally some clean-up

    AFL% remove(left_array);
    AFL% remove(right_array);

Example showing cross_join on multiple dimensions

  1. Create an array called left_array with three dimensions:

    AFL% CREATE ARRAY left_array<val1:double>[i=0:1; j=0:2; k=0:1]; 
  2. Store some random values into left_array:

    AFL% store(build(left_array,random()),left_array); 


    The output is:

    {i,j,k} val1
    {0,0,0} 1.06335e+09
    {0,0,1} 2.42378e+08
    {0,1,0} 2.98593e+08
    {0,1,1} 7.7831e+08
    {0,2,0} 9.85542e+08
    {0,2,1} 1.88051e+09
    {1,0,0} 1.31616e+09
    {1,0,1} 4.75694e+08
    {1,1,0} 2.02149e+08
    {1,1,1} 4.67378e+08
    {1,2,0} 6.27106e+08
    {1,2,1} 3.9883e+07
  3. Create an array called right_array with two dimensions:

    AFL% CREATE ARRAY right_array<val:double>[dim1=0:0; dim2=0:3];
  4. Store some values into right_array:

    AFL% store(build(right_array, dim1*100+dim2), right_array);


    The output is:

    {dim1,dim2} val
    {0,0} 0
    {0,1} 1
    {0,2} 2
    {0,3} 3


  5. Perform a cross-join on left_array and right_array along dimensions j, k of left_array and dimensions dim1, dim2 of right_array:

    AFL% cross_join(left_array as A,right_array as B, A.j, B.dim1, A.k, B.dim2);

     
    The output is:

    {i,j,k} val,val
    {0,0,0} 1.06335e+09,0
    {0,0,1} 2.42378e+08,1
    {1,0,0} 1.31616e+09,0
    {1,0,1} 4.75694e+08,1

    Note how dimension names of right_array are not carried over to the final result. 
     

  6. Also you can vary the order of dimensions in the cross_join operation e.g. perform a cross_join on left_array and right_array along dimensions i, k of left_array and dimensions dim1, dim2 of right_array:

    AFL% cross_join(left_array as A,right_array as B, A.i, B.dim1, A.k, B.dim2);

    The output is:

    {i,j,k} val,val
    {0,0,0} 1.06335e+09,0
    {0,0,1} 2.42378e+08,1
    {0,1,0} 2.98593e+08,0
    {0,1,1} 7.7831e+08,1
    {0,2,0} 9.85542e+08,0
    {0,2,1} 1.88051e+09,1
  7. Remove the arrays.

    AFL% remove(left_array);
    AFL% remove(right_array);