index_lookup

The index_lookup operator uses an attribute from input array A to look up the coordinate value of that attribute in the index array B.  It applies a new attribute containing the result of the lookup.

Synopsis

index_lookup (input_array, index_array, input_array.attribute_name
              [,output_attribute_name] [,index_sorted: BOOLEAN ]
              [,memory_limit: MEMORY_LIMIT] );

Summary

The required parameters for the index_lookup operator are:

  • The input_array can have any attributes or dimensions.
  • The index_array must have a single dimension and a single non-nullable attribute. The index array data must contain unique values. Empty cells are permitted, either between populated cells, or beyond all populated cells.
  • The input attribute, input_array.attribute_name, must correctly refer to one of the attributes of the input array (the attribute to look up). This attribute must have the same data type as the sole attribute of the index array. The comparison function "<" (less than) must be registered in SciDB for this data type.

The optional parameters are:

  • Use output_attribute_name to provide a name for the output attribute. The index_lookup operator returns all of the attributes in the input_array, followed by the newly created output attribute. The new attribute is named input_attribute_name_index by default, or takes the optional, provided name.
  • The index_sorted named parameter is FALSE by default.  That is, the index array is assumed to be unsorted, and index_lookup will perform an internal sort.  But if the index array is known to be already sorted and dense, then set the index_sorted parameter to TRUE, and index_lookup will forego the internal sort. ("Dense" means that there are no empty cells between the populated cells of the index array.)
  • Set the memory_limit named parameter to increase or decrease the size of the memory cache.  The value you give is the number of mebibytes for the cache.  The index_lookup operator uses memory to cache a part of the index_array for fast lookup of values. By default, the size of this cache is set to MEM_ARRAY_THRESHOLD mebibytes. This is in addition to the memory already consumed by cached MemArrays as the query is running.  If specified, MEMORY_LIMIT must be greater than or equal to 1.

If specified, the output attribute must:

  • Have data type int64 and be nullable.
  • Contain the respective coordinates of the corresponding input attribute in the index_array.
  • If the corresponding input attribute is null, or if no value for the input attribute exists in the index_array, the output attribute at that position is set to null.

If index_array has replicated distribution, index_lookup can be faster.


Examples

Setup an Example Input Array and Index Array

To setup an example Input Array and Index Array, do the following:

  1. Create a simple example of an array containing information about stock trades by entering:

    AFL% store(
                       join(
                             join(
                                   build(
                                           <symbol:string>[i=0:9],
                                           '[BAC, BAC, AAPL, C, BAC, ZNGA, C, MSFT, FB, JPM]',
                                           true),
                                   build(
                                           <ms:int64>[i=0:9],
                                           '[ 34665774, 36774769, 56512800, 55403661, 56395968,
                                                   30741156, 56377439, 40979647, 40515039, 39816561]',
                                           true)),
                             join(
                                   build(
                                           <volume:uint64>[i=0:9],
                                           '[900, 11300, 100, 100, 100, 500, 200, 100, 100, 100]',
                                           true),
                                   build(
                                           <price:double>[i=0:9],
                                           '[ 12.7, 19.7, 438.7, 46.5, 18.6,
                                           7.1, 44.85, 36.65, 27.9, 55.5]',
                                           true))),
                            trades);


    The output is:

    {i} symbol,ms,volume,price
    {0} 'BAC',34665774,900,12.7
    {1} 'BAC',36774769,11300,19.7
    {2} 'AAPL',56512800,100,438.7
    {3} 'C',55403661,100,46.5
    {4} 'BAC',56395968,100,18.6
    {5} 'ZNGA',30741156,500,7.1
    {6} 'C',56377439,200,44.85
    {7} 'MSFT',40979647,100,36.65
    {8} 'FB',40515039,100,27.9
    {9} 'JPM',39816561,100,55.5 
  2. Use the uniq() operator to create an index of the Stock symbols by entering:

    AFL% store(uniq(sort(project(trades,symbol))),stock_symbols); 


    The output is:

    {i} symbol
    {0} 'AAPL'
    {1} 'BAC'
    {2} 'C'
    {3} 'FB'
    {4} 'JPM'
    {5} 'MSFT'
    {6} 'ZNGA' 

Return the Index Value for the Stock Symbol Attribute

To return the index value for the stock symbol attribute of a small array populated with stock trading data, use index_lookup() to return the index value for each row of the trades array.  Note the new final attribute named "index" in the output array.


AFL% index_lookup(trades, stock_symbols, trades.symbol, index); 


The output is:

{i} symbol,ms,volume,price,index
{0} 'BAC',34665774,900,12.7,1
{1} 'BAC',36774769,11300,19.7,1
{2} 'AAPL',56512800,100,438.7,0
{3} 'C',55403661,100,46.5,2
{4} 'BAC',56395968,100,18.6,1
{5} 'ZNGA',30741156,500,7.1,6
{6} 'C',56377439,200,44.85,2
{7} 'MSFT',40979647,100,36.65,5
{8} 'FB',40515039,100,27.9,3
{9} 'JPM',39816561,100,55.5,4 


Â