...
By default, both of these are set to false. Setting both left_outer:true
and right_outer:true
will result in a full outer join.
Output names
If desired, user can set a list of output names to disambiguate:
out_names:(a,b,c,...)
The number of provided tokens must match the number of attributes in the output (num left attrs + num right attrs - num join keys). By default, the names are copied from the input arrays, left array taking precedence for join keys.
Additional filter on the output:
...
Note, builtin_equi_join(..., 'filter:expression')
is equivalent to filter(builtin_equi_join(...), expression)
except the operator is materializing and the former will apply filtering prior to materialization. This is an efficiency improvement in cases where the join on keys increases the size of the data before filtering. If out_names:
is set, then the expression will refer to the names provided in out_names
.
Other settings:
chunk_size:S
: for the outputkeep_dimensions:false/true
:true
if the output should contain all the input dimensions, converted to attributes. 0 is default, meaning dimensions are only retained if they are join keys.hash_join_threshold:MB
: a threshold on the array size used to choose the algorithm; see next section for details; defaults to themerge-sort-buffer
configbloom_filter_size:bits
: the size of the bloom filters to use, in units of bits; TBD: clean this upalgorithm:name
: a hard override on how to perform the join, currently supported values are below; see next section for detailshash_replicate_left
: copy the entire left array to every instance and perform a hash joinhash_replicate_right
: copy the entire right array to every instance and perform a hash joinmerge_left_first
: redistribute the left array by hash first, then perform either merge or hash joinmerge_right_first
: redistribute the right array by hash first, then perform either merge or hash join
...