load

load

The load operator is a macro that loads data from a file into a named array. It combines the functionality of the input operator and the store operator.

In Enterprise Edition, file I/O restrictions may apply.

Synopsis

load( output_array, input_file
      [,instance_id [,format [,max_errors [,isStrict ]]]] )

load( output_array, input_file
      [, instance: instance_id ]  [, format: format ]
      [, max_errors: max_errors ]  [, strict: isStrict ] )

Summary

The AFL load macro loads data from input_file into the cells of a SciDB array, output_array. The output_array can be either a temporary array or a persistent array.

Use the second form to specify parameters with named keywords.  For example, you can say load(A, '/tmp/mydata', format: 'tsv') to load TSV data without having to remember that -2 is the default value of instance_id

load() is not technically an operator, but rather a macro that expands to store(input()). Because macro expansion precedes other query processing, you can use load() as you would any operator. The output of list('macros') includes load and the output of list('operators') does not.

Inputs

The load operator takes the following arguments: 

  • output_array: the name of a SciDB array to hold the data.

  • input_file: the complete path to the file containing the source data for the array.

  • instance_id: Optional. Specifies the instance or instances for performing the load. The default is to load all data from the coordinator instance, that is, the instance to which the client program is connected. The value must be one of the following:

For CSV and TSV formats, the second part of the format string consists of a colon followed by one or two characters that controls how the input file is interpreted.

 

When loading in CSV format, the load() macro guesses the correct quote character based on the first buffer of input data. If the macro finds no quote character (either single-quote or double-quote) in the first buffer, it assumes single-quoting. To force a particular quote character, use :d or :s.

 

  • max_errors: Optional. Specifies the maximum number of errors before the operator fails. The default value is 0, meaning that if any errors are encountered, the operation fails.

  • isStrict: Optional. If true, this flag does two things. It restricts the incoming data to contain no out-of-order cell values, where order is row-major as defined by the left-to-right declaration of dimensions in the array schema. It also prevents collisions. If the flag is true and either of these conditions occurs, the load operation fails. By default, this flag is true.

Example

  1. Create a text file /tmp/intensity_data.csv with the following content:

    'High',0,100 'High',2,97 'Medium',2,null 'Medium',3,67 'Low',3,null 'Low',4,55
  2. Create an array to hold the array data:

    AFL% CREATE ARRAY intensityFlat <exposure:string,elapsedTime:int64,measuredIntensity:int64>[i=0:5];
  3. Load the data:

    AFL% load(intensityFlat ,'/tmp/intensity_data.csv', -2, 'CSV'); 

     

  4. Scan the loaded array:

    AFL% scan(intensityFlat); {i} exposure,elapsedTime,measuredIntensity {0} 'High',0,100 {1} 'High',2,97 {2} 'Medium',2,null {3} 'Medium',3,67 {4} 'Low',3,null {5} 'Low',4,55 

     

  5. Suppose your schema has a measuredIntensity attribute that does not allow nulls.  In that case, you would expect errors when loading /tmp/intensity_data.csv.  You can add a value for the max_errors parameter.

    AFL% CREATE ARRAY intensityFlatNotNull<exposure:string,elapsedTime:int64,measuredIntensity:int64 NOT NULL>[i=0:5];
    AFL% load( intensityFlatNotNull, '/tmp/intensity_data.csv', -2, 'csv', 99);
  6. Scan the loaded array:

    AFL% scan(intensityFlatNotNull); {i} exposure,elapsedTime,measuredIntensity {0} 'High',0,100 {1} 'High',2,97 {2} 'Medium',2,0 {3} 'Medium',3,67 {4} 'Low',3,0 {5} 'Low',4,55 

     

  7. Remove the arrays:

    AFL% remove(intensityFlat); remove(intensityFlatNotNull);