{"type":"doc","content":[{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" operator selects cells of an input array according to coordinates specified by one or more secondary arrays, called ","type":"text"},{"text":"pick arrays","type":"text","marks":[{"type":"em"}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"Synopsis","type":"text"}]},{"type":"paragraph","content":[{"text":"subarray ( INPUT, PICK_0 [ , PICK_n ]* [ , join:","type":"text","marks":[{"type":"strong"}]},{"text":"BOOLEAN","type":"text","marks":[{"type":"strong"},{"type":"em"}]},{"text":" ] [, strict:","type":"text","marks":[{"type":"strong"}]},{"text":"BOOLEAN","type":"text","marks":[{"type":"strong"},{"type":"em"}]},{"text":" ] [, inverse:","type":"text","marks":[{"type":"strong"}]},{"text":"BOOLEAN","type":"text","marks":[{"type":"strong"},{"type":"em"}]},{"text":" ]","type":"text","marks":[{"type":"strong"}]},{"type":"hardBreak"},{"text":" ","type":"text"},{"text":" [ , algorithm:","type":"text","marks":[{"type":"strong"}]},{"text":"STRING ","type":"text","marks":[{"type":"strong"},{"type":"em"}]},{"text":"] )","type":"text","marks":[{"type":"strong"}]}]},{"type":"heading","attrs":{"level":1},"content":[{"text":"Summary","type":"text"}]},{"type":"paragraph","content":[{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" produces a sparse result array with the same dimension specification as its input array (but without overlaps). 
Only input cells selected by the pick array(s) appear in the output.","type":"text"}]},{"type":"bulletList","content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"If the ","type":"text"},{"text":"join:true","type":"text","marks":[{"type":"code"}]},{"text":" option is present, the output cells will have additional attributes from the pick array cell(s) that selected them.","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"If the ","type":"text"},{"text":"strict:true","type":"text","marks":[{"type":"code"}]},{"text":" option is present:","type":"text"}]},{"type":"bulletList","content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Out-of-bounds or null pick values will cause an error. By default, null or out-of-bounds picks are ignored.","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"If ","type":"text"},{"text":"join:true","type":"text","marks":[{"type":"code"}]},{"text":" is also set, an ambiguous joined attribute will cause an error. See ","type":"text"},{"text":"Duplicated Or Out-of-order Picks ","type":"text","marks":[{"type":"em"}]},{"text":"below.","type":"text"}]}]}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"If the ","type":"text"},{"text":"inverse:true","type":"text","marks":[{"type":"code"}]},{"text":" option is present, the result contains those input cells ","type":"text"},{"text":"not","type":"text","marks":[{"type":"em"}]},{"text":" selected by the pick array(s). When this option is set, ","type":"text"},{"text":"join:true","type":"text","marks":[{"type":"code"}]},{"text":" is not allowed.","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"algorithm:","type":"text","marks":[{"type":"code"}]},{"text":" option forces use of a particular algorithm regardless of system configuration. Ordinarily you should not need to specify this option. 
See ","type":"text"},{"text":"Algorithm Configuration ","type":"text","marks":[{"type":"em"}]},{"text":"below.","type":"text"}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Pick Arrays","type":"text"}]},{"type":"paragraph","content":[{"text":"A pick array specifies input cell coordinates by:","type":"text"}]},{"type":"bulletList","content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Having an ","type":"text"},{"text":"int64","type":"text","marks":[{"type":"code"}]},{"text":" attribute with the same name as a dimension of the input array. A list of coordinates is built from the (non-null) values of the attribute.","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"Having a dimension with the same name as a dimension of the input array. A list of coordinates is built from the locations of non-empty cells along that pick array dimension.","type":"text"}]}]}]},{"type":"paragraph","content":[{"text":"A particular input array dimension name may be matched by zero or one pick arrays. If the name appears in more than one pick array schema, an error occurs.","type":"text"}]},{"type":"paragraph","content":[{"text":"For example, given an input array with schema ","type":"text"},{"text":"A[row=0:*; col=0:*]","type":"text","marks":[{"type":"code"}]},{"text":", the pick array ","type":"text"},{"text":"P[i=0:*]","type":"text","marks":[{"type":"code"}]}]},{"type":"codeBlock","content":[{"text":"AFL% scan(P);\n{i} row\n{0} 5\n{1} 17","type":"text"}]},{"type":"paragraph","content":[{"text":"selects rows five and seventeen by matching an attribute name with one of the dimensions of the input array. 
Likewise, the sparse pick array ","type":"text"},{"text":"Q[row=0:*]","type":"text","marks":[{"type":"code"}]}]},{"type":"codeBlock","content":[{"text":"AFL% scan(Q);\n{row} s\n{5} 'Hello'\n{17} 'world'","type":"text"}]},{"type":"paragraph","content":[{"text":"selects the same rows by having non-empty cells along a matching dimension name.","type":"text"}]},{"type":"paragraph","content":[{"text":"Each pick array is processed in turn to build up internal lists of coordinates, called ","type":"text"},{"text":"picks","type":"text","marks":[{"type":"em"}]},{"text":", for each input array dimension. If no picks are given for a particular input array dimension, then all coordinates along that input dimension are treated as picked. An input array dimension with no picks is called a ","type":"text"},{"text":"wildcard dimension","type":"text","marks":[{"type":"em"}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Interpreting Picks: By Grid vs. By Cell","type":"text"}]},{"type":"paragraph","content":[{"text":"The ","type":"text"},{"text":"fields","type":"text","marks":[{"type":"em"}]},{"text":" (that is, the attributes and dimensions) of a pick array can match one or more dimensions of the input array. When each of the provided pick arrays matches only a single input array dimension, ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" selects input cells ","type":"text"},{"text":"by grid","type":"text","marks":[{"type":"em"}]},{"text":". 
If any of the pick arrays matches more than one input dimension, then ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" selects input cells ","type":"text"},{"text":"by cell position","type":"text","marks":[{"type":"em"}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Grid Selection","type":"text"}]},{"type":"paragraph","content":[{"text":"Selection by grid means that the positions of the selected input cells are elements of the grid formed by the Cartesian product of the per-dimension picks.","type":"text"}]},{"type":"paragraph","content":[{"text":"For example, consider the input array ","type":"text"},{"text":"A[row=0:*; col=0:*]","type":"text","marks":[{"type":"code"}]}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1432,"id":"2054b554-322f-41cc-a16a-9836064a392b","collection":"contentId-3395881545","type":"file","height":281}}]},{"type":"paragraph","content":[{"text":"Given a pick array ","type":"text"},{"text":"R","type":"text","marks":[{"type":"em"}]},{"text":" for the rows","type":"text"}]},{"type":"codeBlock","content":[{"text":"AFL% show(R);\n{i} schema,distribution,etcomp\n{0} 'R [i=0:*:0:1000000]','hashed','none'\nAFL% scan(R);\n{i} row\n{0} 1\n{1} 2","type":"text"}]},{"type":"paragraph","content":[{"text":"and ","type":"text"},{"text":"C","type":"text","marks":[{"type":"em"}]},{"text":" for the columns","type":"text"}]},{"type":"codeBlock","content":[{"text":"AFL% show(C);\n{i} schema,distribution,etcomp\n{0} 'C [col=0:*:0:4]','hashed','none'\nAFL% scan(C);\n{col} s\n{0} 'AT'\n{2} 'GAT'\nAFL%","type":"text"}]},{"type":"paragraph","content":[{"text":"then ","type":"text"},{"text":"subarray(A, R, C)","type":"text","marks":[{"type":"em"}]},{"text":" uses “grid selection” to choose the input 
cells:","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":888,"id":"20a1a235-cbda-4226-8e4b-0caff719373b","collection":"contentId-3395881545","type":"file","height":619}}]},{"type":"paragraph","content":[{"text":"R","type":"text","marks":[{"type":"em"}]},{"text":" picks rows 1 and 2 by ","type":"text"},{"text":"row","type":"text","marks":[{"type":"code"}]},{"text":" attribute value. ","type":"text"},{"text":"C","type":"text","marks":[{"type":"em"}]},{"text":" picks columns 0 and 2 because cells are present at those coordinates along the ","type":"text"},{"text":"col","type":"text","marks":[{"type":"code"}]},{"text":" dimension. So ","type":"text"},{"text":"subarray(A, R, C)","type":"text","marks":[{"type":"em"}]},{"text":" yields","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1432,"id":"49aa0d56-dfa8-483c-9839-1345d356802e","collection":"contentId-3395881545","type":"file","height":281}}]},{"type":"paragraph","content":[{"type":"hardBreak"},{"text":"where ε denotes an empty cell position.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Cell-wise Selection","type":"text"}]},{"type":"paragraph","content":[{"text":"When a pick array has fields matching more than one input array dimension, each cell of the pick array is interpreted as a fixed tuple of picks for those dimensions. These pick tuples don't define a grid; instead, they enumerate particular full or partial coordinates of the desired input cells.","type":"text"}]},{"type":"paragraph","content":[{"text":"Consider the two pick arrays ","type":"text"},{"text":"R","type":"text","marks":[{"type":"em"}]},{"text":" and ","type":"text"},{"text":"C","type":"text","marks":[{"type":"em"}]},{"text":" from the previous example. 
Used independently in ","type":"text"},{"text":"subarray(A, R, C)","type":"text","marks":[{"type":"em"}]},{"text":" (or ","type":"text"},{"text":"subarray(A, C, R)","type":"text","marks":[{"type":"em"}]},{"text":"), they invoke grid selection using the Cartesian product of the picked indices:","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1432,"id":"0947be10-b589-4d5a-b9b7-e68b0556a9fd","collection":"contentId-3395881545","type":"file","height":60}}]},{"type":"paragraph","content":[{"text":"However, if the picks are combined in the same array, they denote a set of input cell coordinates rather than a Cartesian product. Here the single array ","type":"text"},{"text":"RC","type":"text","marks":[{"type":"em"}]},{"text":" is ","type":"text"},{"text":"not","type":"text","marks":[{"type":"underline"}]},{"text":" equivalent to the two arrays ","type":"text"},{"text":"R","type":"text","marks":[{"type":"em"}]},{"text":" and ","type":"text"},{"text":"C","type":"text","marks":[{"type":"em"}]},{"text":" above:","type":"text"}]},{"type":"codeBlock","content":[{"text":"AFL% show(RC);\n{i} schema,distribution,etcomp\n{0} 'RC [col=0:*:0:4]','hashed','none'\nAFL% scan(RC);\n{col} s,row\n{0} 'AT',1\n{2} 'GAT',2\nAFL% subarray(A, RC);\n{row,col} v\n{1,0} 4\n{2,2} 10\nAFL%","type":"text"}]},{"type":"paragraph","content":[{"text":"Since the pick fields ","type":"text"},{"text":"row","type":"text","marks":[{"type":"code"}]},{"text":" and ","type":"text"},{"text":"col","type":"text","marks":[{"type":"code"}]},{"text":" of ","type":"text"},{"text":"RC","type":"text","marks":[{"type":"em"}]},{"text":" cover all of the dimensions of the input array, ","type":"text"},{"text":"RC","type":"text","marks":[{"type":"em"}]},{"text":" serves as an explicit list of input cells to be selected.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Hybrid Grid/Cell 
Selection","type":"text"}]},{"type":"paragraph","content":[{"text":"Pick fields within the same pick array may constitute a set of full coordinates, as in the previous section. They may also constitute a set of partial coordinates.","type":"text"}]},{"type":"paragraph","content":[{"text":"For example, given an input array ","type":"text"},{"text":"VARIANT","type":"text","marks":[{"type":"em"}]},{"text":" with schema ","type":"text"},{"text":"[varid; chrom; pos]","type":"text","marks":[{"type":"code"}]},{"text":", a pick array ","type":"text"},{"text":"CHROM_POS","type":"text","marks":[{"type":"em"}]},{"text":" with schema ","type":"text"},{"text":"[i]","type":"text","marks":[{"type":"code"}]},{"text":" can be used to select particular ","type":"text"},{"text":"(chrom, pos)","type":"text","marks":[{"type":"em"}]},{"text":" pairs, either across all variant ids (no other pick arrays specified, so ","type":"text"},{"text":"varid","type":"text","marks":[{"type":"code"}]},{"text":" is a wildcard dimension) or across a selection of ","type":"text"},{"text":"varid","type":"text","marks":[{"type":"code"}]},{"text":" values provided in another pick array.","type":"text"}]},{"type":"paragraph","content":[{"text":"This hybrid approach selects input array cells from the Cartesian product of the pick fields ","type":"text"},{"text":"as grouped by pick array","type":"text","marks":[{"type":"em"}]},{"text":". 
In this example the product is ","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1432,"id":"00f81922-7f41-49d8-a030-1e74f3736137","collection":"contentId-3395881545","type":"file","height":60}}]},{"type":"paragraph","content":[{"text":"and not","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1432,"id":"d530aa2e-a974-41cf-9992-dce336132e82","collection":"contentId-3395881545","type":"file","height":60}}]},{"type":"paragraph","content":[{"text":"If no pick array with a ","type":"text"},{"text":"varid","type":"text","marks":[{"type":"code"}]},{"text":" field is provided, these products are the same: the {varid} set is just larger, because the dimension is wildcarded.","type":"text"}]},{"type":"paragraph","content":[{"text":"Given a ","type":"text"},{"text":"CHROM_POS","type":"text","marks":[{"type":"em"}]},{"text":" array and ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" invocation like this","type":"text"}]},{"type":"codeBlock","content":[{"text":"AFL% scan(CHROM_POS);\n{i} chrom,pos\n{0} 0,4\n{1} 1,3\n{2} 3,1\nAFL% subarray(VARIANT, CHROM_POS);\n...","type":"text"}]},{"type":"paragraph","content":[{"text":"the resulting selection would be wildcarded on the ","type":"text"},{"text":"varid","type":"text","marks":[{"type":"code"}]},{"text":" dimension, and look something like","type":"text"}]},{"type":"mediaSingle","attrs":{"layout":"center"},"content":[{"type":"media","attrs":{"width":1081,"id":"7a68de66-00a3-45b3-a94b-8384e7f68ccd","collection":"contentId-3395881545","type":"file","height":788}}]},{"type":"paragraph","content":[{"text":"CHROM_POS","type":"text","marks":[{"type":"em"}]},{"text":" selects particular points in the ","type":"text"},{"text":"chrom-pos","type":"text","marks":[{"type":"em"}]},{"text":" plane, and the 
","type":"text"},{"text":"varid","type":"text","marks":[{"type":"em"}]},{"text":" dimension is a wildcarded term in the Cartesian product, so all variant ids in the input array are selected.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Duplicated Or Out-of-order Picks","type":"text"}]},{"type":"paragraph","content":[{"text":"During ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" execution, pick arrays are scanned to obtain the pick values prior to any scanning of the input array. The index values (or, for hybrid selection, value tuples) picked for each input array dimension are sorted and duplicate values removed. Unlike array indexing in R or NumPy, ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" cannot be used to introduce duplicate cells, or to rearrange the rows or columns of the input array.","type":"text"}]},{"type":"paragraph","content":[{"text":"When there are duplicate picks and the ","type":"text"},{"text":"join:true","type":"text","marks":[{"type":"code"}]},{"text":" option is present, the choice of pick array cell to use to obtain the joined attributes is non-deterministic. For example, suppose a pick dataframe contains two cells ","type":"text"},{"text":"(17, 'Alice')","type":"text","marks":[{"type":"code"}]},{"text":" and ","type":"text"},{"text":"(17, 'Bob')","type":"text","marks":[{"type":"code"}]},{"text":" that both pick the ","type":"text"},{"text":"x","type":"text","marks":[{"type":"em"}]},{"text":" dimension value 17. There is no way to know which ","type":"text"},{"text":"who","type":"text","marks":[{"type":"code"}]},{"text":" value, ","type":"text"},{"text":"'Alice'","type":"text","marks":[{"type":"code"}]},{"text":" or ","type":"text"},{"text":"'Bob'","type":"text","marks":[{"type":"code"}]},{"text":", will be used to generate the result array. 
If the ","type":"text"},{"text":"strict:true","type":"text","marks":[{"type":"code"}]},{"text":" option is present, this situation is considered an error and the query is aborted.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Dataframes As Pick Arrays","type":"text"}]},{"type":"paragraph","content":[{"text":"SciDB dataframes can be used as pick arrays, but not as input arrays. Like ordinary pick arrays, pick indices and tuples taken from dataframe attributes are sorted and de-duplicated prior to scanning the input array. (Since dataframe cells are unordered, this behavior is especially necessary to avoid non-deterministic results.)","type":"text"}]},{"type":"paragraph","content":[{"text":"Dataframe dimensions are a hidden internal detail and cannot be used for pick indices.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Input Array Overlaps Are Ignored","type":"text"}]},{"type":"paragraph","content":[{"text":"If the input array contains overlap areas, they are ignored. No overlap areas appear in ","type":"text"},{"text":"subarray","type":"text","marks":[{"type":"em"}]},{"text":" output, and the overlap parameter for all ","type":"text"},{"text":"subarray ","type":"text","marks":[{"type":"em"}]},{"text":"result schema dimensions is always zero. For example, the three-cell overlap areas from an input array with dimensions ","type":"text"},{"text":"[x=0:*:3:1024; y=0:*:3:128]","type":"text","marks":[{"type":"code"}]},{"text":" are dropped, and the result schema dimensions will be ","type":"text"},{"text":"[x=0:*:0:1024; y=0:*:0:128]","type":"text","marks":[{"type":"code"}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Algorithm Configuration","type":"text"}]},{"type":"paragraph","content":[{"text":"Subarray ","type":"text","marks":[{"type":"em"}]},{"text":"chooses its algorithm based on an estimate of whether or not the pick array data will fit in memory. 
By default, ","type":"text"},{"text":"subarray ","type":"text","marks":[{"type":"em"}]},{"text":"computes the total estimated in-memory size of all pick array data and compares it to the configured value of ","type":"text"},{"text":"subarray-arena-limit","type":"text","marks":[{"type":"code"}]},{"text":" or, if that value isn’t configured, to ","type":"text"},{"text":"merge-sort-buffer","type":"text","marks":[{"type":"code"}]},{"text":". If the total is over the allowed limit, then the largest pick array will be loaded into a “MemArray” data structure that can spill to disk, and a new total is computed for the remaining pick arrays. When all remaining pick arrays are within the limit, they will be loaded into faster in-memory hash tables. This hybrid approach is suitable for most queries.","type":"text"}]},{"type":"paragraph","content":[{"text":"You can force all pick arrays to load in a particular way by using the ","type":"text"},{"text":"algorithm:","type":"text","marks":[{"type":"code"}]},{"text":" option. If specified, the value of this option must be one of the following strings:","type":"text"}]},{"type":"bulletList","content":[{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"'hash'","type":"text","marks":[{"type":"code"}]},{"text":" – All pick arrays will be loaded into in-memory hash tables regardless of size. If they do not actually fit in memory, an error will occur and the query will abort. You may need to adjust the ","type":"text"},{"text":"subarray-arena-limit","type":"text","marks":[{"type":"code"}]},{"text":" configuration parameter to use this algorithm.","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"'memarray'","type":"text","marks":[{"type":"code"}]},{"text":" – All pick arrays will be loaded into spill-to-disk MemArray data structures. Used primarily during testing for ensuring correctness of the MemArray representation. 
","type":"text"}]}]},{"type":"listItem","content":[{"type":"paragraph","content":[{"text":"'shapedpickarray'","type":"text","marks":[{"type":"code"}]},{"text":" – A legacy algorithm used during early development. Not recommended.","type":"text"}]}]}]},{"type":"paragraph"}],"version":1}
