save

The save operator saves array data to a file.

In Enterprise Edition, file I/O restrictions may apply.

Synopsis

save(src_array, file_path [, instance_id [, format]] )

save(src_array, file_path [, instance: instance_id ] [, format: format ] )

Summary

The AFL save operator saves the data from the cells of a SciDB array into a file. By default, it saves the data in SciDB text format. To specify a different output format, use the format parameter.

Use the second form to specify parameters with named keywords.  For example, you can write save(A, '/tmp/mydata', format: 'tsv') to save TSV data without having to remember that -2 is the default value of instance_id.

The save operator saves the latest version of the array by default.  To save a particular version, use arrayName@versionNumber for the source array, for example MYARRAY@5.  To save all versions of an array, see the -a/--all-versions option of scidb_backup.py.
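
The keyword form and the arrayName@versionNumber syntax can be combined when queries are generated by a script. The following Python sketch only builds the query text; build_save_query is a hypothetical helper name, not part of SciDB or any client library, and the way it rejects embedded single quotes is an assumption made to keep the sketch small.

```python
def build_save_query(array, file_path, version=None, instance=None, fmt=None):
    """Return an AFL save() call in the keyword form (hypothetical helper)."""
    # Quoting rules for embedded quotes are not covered here, so refuse them.
    if "'" in file_path or (fmt is not None and "'" in fmt):
        raise ValueError("single quotes are not supported by this sketch")
    # With an instance_id of -1, the file_path must be a relative path.
    if instance == -1 and file_path.startswith("/"):
        raise ValueError("instance -1 requires a relative file_path")
    # arrayName@versionNumber selects one version; the latest is the default.
    source = array if version is None else f"{array}@{version}"
    args = [source, f"'{file_path}'"]
    if instance is not None:
        args.append(f"instance: {instance}")
    if fmt is not None:
        args.append(f"format: '{fmt}'")
    return "save(" + ", ".join(args) + ")"


print(build_save_query("A", "/tmp/mydata", fmt="tsv"))
print(build_save_query("MYARRAY", "/tmp/v5.out", version=5, fmt="csv+"))
```

The resulting string can be passed to iquery -naq in the same way as the hand-written queries in the Examples section.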

Inputs

  • src_array:  The source array containing the data you want to save to a file.
  • file_path:  The complete path to the file to receive the data.
  • instance_id:   Optional. Specifies the instance for performing the save. The default saves all data using the query coordinator instance, that is, the instance to which the client program is connected. The value must be one of the following:
    • -2  save all data on the coordinator instance of the query.  This is the default.
    • -1  save data as it is distributed, that is, each instance concurrently saves its own portion of data to a file in the per-instance data directory.  With instance_id of -1, the file_path argument must be a relative path.
    • 0, 1, ...  save all data via the specified physical instance.
    • (x, y)  save all data via the instance specified by the (server_id, server_instance_id) pair (x, y).  For example, if the cluster config.ini file contains the lines server-1=srv42.example.com,2-3 and base-path=/vdisk/scidb, you can save data to the data directory /vdisk/scidb/1/3 using save(ARRAY, 'filename', (1,3)).
  • format:   Optional. The format string specifies how to save the data. The default format is SciDB-formatted text. In the positional form of save, you must include the instance_id parameter in order to specify an output format; the keyword form has no such requirement. The format string has two parts. The first part indicates the type of file to create and must be one of the following:
    • Binary save: When saving a one-dimensional array into a binary file, SciDB uses the format string as a guide for organizing the data in the file. For a complete description of the binary file format and binary format strings, see Binary Files.
    • Opaque save: The string must be opaque.  The opaque format is a proprietary format that is likely to be the fastest format for saving the data, but it is not transportable to other environments.
    • Text save: The text formats are considered deprecated; see the warning below. You can save the data in any of several variations of the SciDB text format.  For a complete list of formats, enter iquery --help at the Linux command prompt and look for the --format option.  The main text formats are:
      • text – SciDB text format.  This is the default.
      • sparse – Do not write "()" placeholders for empty cells.
      • dense – The input array is presumed to contain no empty cells, so do not write "{x,y,...}" coordinate information.
      • store – Uses maximum floating point precision and records overlap regions.
    • CSV save: The string must be csv or csv+ (not case sensitive).  SciDB saves data in comma-separated-value format.  When '+' is used, cell coordinates are saved as the first fields of each data record.
    • TSV save: The string must be tsv or tsv+ (not case sensitive).  SciDB saves data in the tab-separated value LinearTSV dialect.  When '+' is used, cell coordinates are saved as the first fields of each data record.

Even though text is the default format, use a CSV or TSV format to save large amounts of data in non-binary form.  The text family of formats has subtle dependencies on array chunk sizes, and is not suitable for data interchange.



To preserve the binary value of floating point numbers, save your output in the opaque or binary format. Saving floating point values in any other format can be lossy. In prior releases, files saved in text, CSV, and TSV formats used only six significant decimal digits. SciDB 16.9 saves double values with 15 significant decimal digits in text, CSV, and TSV formats.
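
The effect of the digit count can be seen outside SciDB. The following Python sketch uses Python's own %g-style formatting as a stand-in for the text writers (an assumption; SciDB's exact formatting code is not shown here) and compares it with an 8-byte binary round trip.

```python
import math
import struct

x = math.pi  # a double that needs 16 significant digits to round-trip as text

six = float(format(x, ".6g"))         # six significant digits, as in prior releases
fifteen = float(format(x, ".15g"))    # 15 significant digits, as in SciDB 16.9
seventeen = float(format(x, ".17g"))  # 17 digits always round-trip an IEEE 754 double

# Binary storage keeps all 64 bits, so the value survives unchanged.
binary = struct.unpack("<d", struct.pack("<d", x))[0]

print(format(x, ".6g"), six == x)
print(format(x, ".15g"), fifteen == x)
print(seventeen == x, binary == x)
```

Fifteen digits are enough for many values, but not for every double, which is why the binary and opaque formats are the safe choice when exact values matter.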


For CSV and TSV formats, the second (optional) part of the format string consists of a colon followed by one or two option specifier characters.  Most of these options control how null values appear in the saved file. Note that these options affect only null values whose missing reason code is zero (?0). For null values with other missing reason codes (?1 - ?127), the output is unaffected. For example, a null value of ?42 will always appear in the TSV or CSV file as ?42.  See Special Values for Attributes for more about missing reason codes.
The specifier characters are:
    • E  the null value ?0 appears in the TSV or CSV file as an unquoted empty string.
    • N  the null value ?0 appears in the TSV or CSV file as \N .  This is the default for TSV formats.
    • n  the null value ?0 appears in the TSV or CSV file as the unquoted lowercase token "null".  This is the default for CSV formats.
    • ?  the null value ?0 appears in the TSV or CSV file as ?0
    • l (lowercase L, not a numeral)  the save() operator writes a label line (sometimes called a nameline or header line) before writing the rest of the data. The default behavior is to omit label lines.
    • d  use double quotes around string attributes (CSV only).
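
The null-handling rules above can be summarized as a small lookup. The following Python sketch is a model of the documented behavior, not SciDB code; null_token is a hypothetical name, and rejecting a format string that carries two null specifiers is an assumption, since this page does not say how conflicting specifiers are resolved.

```python
# Token written for a ?0 null under each specifier, as listed above.
NULL_TOKENS = {"E": "", "N": "\\N", "n": "null", "?": "?0"}
# Defaults stated above: N for TSV formats, n for CSV formats.
DEFAULT_SPECIFIER = {"tsv": "N", "csv": "n"}


def null_token(format_string, missing_reason=0):
    """Return the text expected for a null with the given missing reason code."""
    base, _, options = format_string.partition(":")
    family = base.lower().rstrip("+")  # 'TSV+' -> 'tsv'
    if family not in DEFAULT_SPECIFIER:
        raise ValueError("option specifiers apply to CSV and TSV formats only")
    if not 0 <= missing_reason <= 127:
        raise ValueError("missing reason codes range from 0 to 127")
    # Nulls with a nonzero missing reason code are never affected by the options.
    if missing_reason != 0:
        return f"?{missing_reason}"
    chosen = [c for c in options if c in NULL_TOKENS]
    if len(chosen) > 1:
        # Assumption: conflicting null specifiers are treated as an error here.
        raise ValueError("more than one null specifier given")
    spec = chosen[0] if chosen else DEFAULT_SPECIFIER[family]
    return NULL_TOKENS[spec]


print(repr(null_token("tsv+:l")))     # TSV default
print(repr(null_token("csv:E")))      # empty string
print(repr(null_token("tsv:E", 42)))  # nonzero reason code is unaffected
```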

Examples

For these examples, create a sparse two-dimensional array from a data file.  Some of the attribute values are null (?0, ?42).

$ cat /tmp/example.tsv
0|0|?0|5.0
1|1|Marie Curie|3.14159265358979
2|2|Carl Sagan|?42
$ iquery -a
AFL% create array saveMe<who:string,val:double>[i=0:2; j=0:2];
Query was executed successfully
AFL% store(
        redimension(
            input(<i:int64,j:int64,who:string,val:double>[row=0:*],
                  '/tmp/example.tsv', -2, 'tsv:p'), -- :p means pipe '|' is field separator
            saveMe),
     saveMe);
Query was executed successfully
AFL% scan(saveMe);
{i,j} who,val
{0,0} null,5
{1,1} 'Marie Curie',3.14159
{2,2} 'Carl Sagan',?42
AFL% exit;

Save in SciDB 'text' Format

The default text format:

$ iquery -naq "save(saveMe, '/tmp/text.out')"
Query was executed successfully
$ cat /tmp/text.out
{0,0}[[(null,5),(),()],[(),('Marie Curie',3.14159265358979),()],[(),(),('Carl Sagan',?42)]]

Save in SciDB 'sparse' Format

The sparse format writes no "()" placeholders for empty cells; each non-contiguous cell is marked with its {i,j} coordinates.

$ iquery -naq "save(saveMe, '/tmp/sparse.out', -2, 'sparse')"
Query was executed successfully
$ cat /tmp/sparse.out
{0,0}[[{0,0}(null,5),{1,1}('Marie Curie',3.14159265358979),{2,2}('Carl Sagan',?42)]]

Save in CSV Format

The csv format.  Notice that cell coordinates are not saved.

$ iquery -naq "save(saveMe, '/tmp/csv.out', -2, 'csv')"
Query was executed successfully
$ cat /tmp/csv.out
null,5
'Marie Curie',3.14159265358979
'Carl Sagan',?42

Save in TSV+ Format

The tsv+ format records the cell coordinates.  The :l ("colon ell") option specifier produces a nameline.  Notice that the ordinary null appears as \N, but the null with missing reason code 42 appears as ?42.  Option specifiers can change how the ordinary null (?0) is written, but not how nulls with other missing reason codes are written.

$ iquery -naq "save(saveMe, '/tmp/tsvplus.out', -2, 'tsv+:l')"
Query was executed successfully
$ cat /tmp/tsvplus.out
i	j	who	val
0	0	\N	5
1	1	Marie Curie	3.14159265358979
2	2	Carl Sagan	?42
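
A file saved this way can be read back by a short script. The following Python sketch assumes the file was written with 'tsv+:l' and the default null token; read_tsv_plus is a hypothetical helper, it does not decode LinearTSV escape sequences, and it cannot tell a null such as ?42 apart from a string attribute whose value happens to be the text ?42.

```python
import io


def read_tsv_plus(stream):
    """Yield one dict per record from a file saved with format 'tsv+:l'."""
    names = stream.readline().rstrip("\n").split("\t")  # the label line
    for line in stream:
        fields = line.rstrip("\n").split("\t")
        record = {}
        for name, text in zip(names, fields):
            if text == "\\N":
                record[name] = None  # ordinary null (?0)
            elif text.startswith("?") and text[1:].isdigit():
                record[name] = ("missing", int(text[1:]))  # for example ?42
            else:
                record[name] = text  # kept as text; no type conversion
        yield record


# The content of /tmp/tsvplus.out shown above.
sample = (
    "i\tj\twho\tval\n"
    "0\t0\t\\N\t5\n"
    "1\t1\tMarie Curie\t3.14159265358979\n"
    "2\t2\tCarl Sagan\t?42\n"
)
for rec in read_tsv_plus(io.StringIO(sample)):
    print(rec)
```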

Save in Binary Format

Schema attributes are nullable by default, but types in binary format strings are not, so the format string below marks each type with null.  See Binary Files for how to interpret the od program output.

$ iquery -naq "save(saveMe, '/tmp/binary.out', -2, '(string null, double null)')"
Query was executed successfully
$ od -c /tmp/binary.out 
0000000  \0  \0  \0  \0  \0 377  \0  \0  \0  \0  \0  \0 024   @ 377  \f
0000020  \0  \0  \0   M   a   r   i   e       C   u   r   i   e  \0 377
0000040 021   -   D   T 373   !  \t   @ 377  \v  \0  \0  \0   C   a   r
0000060   l       S   a   g   a   n  \0   *  \0  \0  \0  \0  \0  \0  \0
0000100  \0
0000101
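
The dump can also be decoded programmatically. The following Python sketch is based only on the layout visible in the od output above, and each part of that layout is an assumption to check against Binary Files: every nullable value starts with one flag byte (octal 377 for a value that is present, otherwise the missing reason code), a string is a 4-byte little-endian length followed by that many bytes including a terminating NUL, and a double is 8 little-endian bytes. The function name parse_cells is hypothetical.

```python
import struct

# The 65 bytes of /tmp/binary.out, transcribed from the od output above.
DUMP = (
    b"\x00\x00\x00\x00\x00\xff\x00\x00\x00\x00\x00\x00\x14\x40\xff\x0c"
    b"\x00\x00\x00Marie Curie\x00\xff"
    b"\x11\x2d\x44\x54\xfb\x21\x09\x40\xff\x0b\x00\x00\x00Car"
    b"l Sagan\x00\x2a\x00\x00\x00\x00\x00\x00\x00"
    b"\x00"
)

PRESENT = 0xFF  # flag byte (octal 377) that precedes a non-null value


def parse_cells(buf):
    """Decode records of the form (string null, double null)."""
    pos, cells = 0, []
    while pos < len(buf):
        # Nullable string: flag byte, 4-byte length, then the bytes themselves.
        flag = buf[pos]
        (length,) = struct.unpack_from("<I", buf, pos + 1)
        raw = buf[pos + 5:pos + 5 + length]
        pos += 5 + length
        who = raw[:-1].decode() if flag == PRESENT else ("null", flag)
        # Nullable double: flag byte, then 8 bytes that are zero for a null.
        flag = buf[pos]
        (number,) = struct.unpack_from("<d", buf, pos + 1)
        pos += 9
        val = number if flag == PRESENT else ("null", flag)
        cells.append((who, val))
    return cells


for cell in parse_cells(DUMP):
    print(cell)
```

Because the double is stored as its raw 8 bytes, the value read back is bit-for-bit the value that was stored in the array.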


Remove the array:

$ iquery -naq "remove(saveMe)"