Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

By default, both of these are set to false. Setting both left_outer:true and right_outer:true will result in a full outer join.

Output names

If desired, user can set a list of output names to disambiguate:

  • out_names:(a,b,c,...)
    The number of provided tokens must match the number of attributes in the output (num left attrs + num right attrs - num join keys). By default, the names are copied from the input arrays, left array taking precedence for join keys.

Additional filter on the output:

...

Note, builtin_equi_join(..., 'filter:expression') is equivalent to filter(builtin_equi_join(...), expression) except the operator is materializing and the former will apply filtering prior to materialization. This is an efficiency improvement in cases where the join on keys increases the size of the data before filtering. If out_names: is set, then the expression will refer to the names provided in out_names.

Other settings:

  • chunk_size:S: for the output

  • keep_dimensions:false/true: true if the output should contain all the input dimensions, converted to attributes. 0 is default, meaning dimensions are only retained if they are join keys.

  • hash_join_threshold:MB: a threshold on the array size used to choose the algorithm; see next section for details; defaults to the merge-sort-buffer config

  • bloom_filter_size:bits: the size of the bloom filters to use, in units of bits; TBD: clean this up

  • algorithm:name: a hard override on how to perform the join, currently supported values are below; see next section for details

    • hash_replicate_left: copy the entire left array to every instance and perform a hash join

    • hash_replicate_right: copy the entire right array to every instance and perform a hash join

    • merge_left_first: redistribute the left array by hash first, then perform either merge or hash join

    • merge_right_first: redistribute the right array by hash first, then perform either merge or hash join

...