23.10 Release Notes

Known Issues, Limitations, Behavior Changes

Disable pushdown optimization

Versions affected: SciDB 23.10.0–4

Pushdown optimization should be manually disabled in versions 23.10.0 through 23.10.4. If left on, it can lead to various issues including server crashes and high memory use.

To disable it manually, edit the SciDB config file to include the line:

enable-optimize-pushdown=0

It is disabled by default in SciDB 23.10.5.
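To confirm that a config file carries the setting, a small check along these lines may help. This is a minimal sketch, not a SciDB tool: it assumes plain `key=value` lines, treats lines starting with `#`, `;`, or `[` as comments or section headers, and assumes the last occurrence of a key wins. The function name `pushdown_disabled` and the sample text are illustrative.

```python
def pushdown_disabled(config_text):
    """Return True only if the last enable-optimize-pushdown setting is 0."""
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and section headers (assumed syntax).
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "enable-optimize-pushdown":
            value = val.strip()  # assumption: a later line overrides an earlier one
    return value == "0"


# Illustrative config text; a real file would be read from disk instead.
sample = "[mydb]\nenable-optimize-pushdown=0\n"
print(pushdown_disabled(sample))  # True
```

A missing setting returns False, which is the conservative answer for 23.10.0 through 23.10.4, where the optimization is on unless explicitly disabled.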

Avoid curl 8.6 in HTTP API clients

Versions affected: SciDB 21.x–

There is a bug in version 8.6 of the “curl” library that causes HTTP clients to fail with this error when connecting to SciDB’s Client API over HTTPS:

curl: (56) OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 0

(See SDB-8459 for details.)

A fix for this bug was merged into the curl source code on GitHub within days of the curl 8.6 release; however, as of 2024-03-21 it has not been included in a curl release. The fix is expected in curl 8.7.

If you have a client using curl 8.6, you should upgrade to curl 8.7 if available; if not, downgrade to curl 8.5.

You should instruct package managers and dependency frameworks (aptitude, conda, dnf, yum, etc.) to exclude version 8.6 when installing curl/libcurl.
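A client can also detect the affected version at startup. The sketch below parses text in the style of `curl --version` output and flags any 8.6.x version. The function name `curl_is_affected` and the sample string are illustrative, and the output layout is an assumption based on what curl typically prints.

```python
import re


def curl_is_affected(version_output):
    """Return True if the reported curl/libcurl version is 8.6.x."""
    # Prefer the libcurl token, since the library carries the bug.
    match = re.search(r"libcurl/(\d+)\.(\d+)", version_output)
    if match is None:
        # Fall back to the leading "curl X.Y.Z" token.
        match = re.search(r"\bcurl (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no curl version found in output")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (8, 6)


# Illustrative sample; real text could come from running `curl --version`.
sample = "curl 8.6.0 (x86_64-pc-linux-gnu) libcurl/8.6.0 OpenSSL/3.0.13"
print(curl_is_affected(sample))  # True
```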

Patch History

Patch notes for SciDB 23.10.8

Release date: July 18, 2024

SciDB SHA: 11b236b

Changes include:

Patch notes for SciDB 23.10.7

Release date: June 12, 2024

SciDB SHA: 0769f38

Changes include:

Patch notes for SciDB 23.10.6

Release date: April 29, 2024

SciDB SHA: cc615d4

Changes include:

Patch notes for SciDB 23.10.5

Release date: March 27, 2024

SciDB SHA: 56b2ad5

Changes include:

Patch notes for SciDB 23.10.4

Release date: March 6, 2024

SciDB SHA: bf30ffc

Fixes include:

Patch notes for SciDB 23.10.3

Release date: February 14, 2024

SciDB SHA: 1a41e7e

Major changes:

  • https://paradigm4.atlassian.net/browse/SDB-8360

    • The namespaces library is always loaded by default. load_library('namespaces') still works, but does nothing.

    • You can now start up scidb directly in security=password mode — you no longer have to start it in trust mode, load the namespaces library, and restart in password mode.

    • The old security=trust mode is deprecated.

Other fixes include:

Patch notes for SciDB 23.10.2

This release was rejected and should not be used.

Release date: January 29, 2024

SciDB SHA: 2603c1c

Patch notes for SciDB 23.10.1

Release date: November 21, 2023

SciDB SHA: 3fb47aa

Fixes include:


Release Notes for SciDB 23.10.0

Release date: October 31, 2023

SciDB commit SHA: 5d0895e

These release notes apply to version 23.10.0 of SciDB; they cover all features and changes since version 21.8.

Release notes for version 21.8 cover all features and changes since version 20.10 and can be found here.

Supported Operating Systems

SciDB 23.10 supports the following operating systems:

  • RedHat8

  • Rocky8

Support for CentOS7 and RedHat7 has been discontinued in this release.

SciDB Features and Changes

HTTP API

SciDB now has an HTTP/HTTPS interface allowing full querying and data transfer support. When SciDB has security mode enabled, only secure HTTPS connections are allowed and you must configure an X.509 certificate to enable the new interface. See https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395882601 for instructions.

Log file rotation behavior and new log file location

In previous versions, log files were written to each instance’s data directory. Now, by default, they are in a logs/ subdirectory of the data directory (i.e. ${base_path}/${server_number}/${instance_number}/logs/scidb.log, where ${base_path} is defined in the configuration file).
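The default location can be assembled from the three path components named above. This is a minimal sketch; the function name `default_log_path` and the sample values are illustrative, and it only covers the default layout (not a customized log configuration).

```python
from pathlib import PurePosixPath


def default_log_path(base_path, server_number, instance_number):
    """Build ${base_path}/${server_number}/${instance_number}/logs/scidb.log."""
    return (
        PurePosixPath(base_path)
        / str(server_number)
        / str(instance_number)
        / "logs"
        / "scidb.log"
    )


# Illustrative values only; use the base path from your own config file.
print(default_log_path("/data/scidb", 0, 1))  # /data/scidb/0/1/logs/scidb.log
```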

The log configuration file is now named log4cxx-conf.xml instead of log4cxx.properties; it uses an XML configuration format. See the XML examples on https://logging.apache.org/log4cxx/latest_stable/configuration-samples.html for guidance with using these settings.

By default, log files are rotated hourly and compressed. The current log file has the name scidb.log; rotated logs are in the same directory and are named scidb.log.<timestamp>.gz. To change this behavior, edit the log configuration file. Examples of different settings can be found in /opt/scidb/${SCIDB_VER}/share/scidb/logconf-examples/*.xml.
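Scripts that used to read a single log file now need to handle the compressed rotated files as well. The sketch below reads the rotated logs first and the current scidb.log last. It assumes the default naming described above and assumes that the <timestamp> part of the file name sorts chronologically as text; the function name `iter_log_lines` is illustrative.

```python
import gzip
from pathlib import Path


def iter_log_lines(log_dir):
    """Yield lines from rotated logs (sorted by file name), then scidb.log."""
    log_dir = Path(log_dir)
    # Assumption: sorting by name puts rotated files in chronological order.
    for rotated in sorted(log_dir.glob("scidb.log.*.gz")):
        with gzip.open(rotated, "rt", encoding="utf-8", errors="replace") as fh:
            yield from fh
    current = log_dir / "scidb.log"
    if current.exists():
        with open(current, "r", encoding="utf-8", errors="replace") as fh:
            yield from fh
```

For example, `sum(1 for _ in iter_log_lines(path))` counts all log lines in one instance's logs/ directory.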

All timestamps in log files and filenames now use the UTC timezone and are formatted according to ISO-8601, with a Z suffix to indicate UTC (e.g., 2023-05-04T15:40:03Z). This does not affect how dates and times are printed within SciDB.
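Tools that parse or generate these timestamps can use a fixed format string. The sketch below assumes whole-second precision, as in the example above; the names `LOG_TS_FORMAT`, `format_log_timestamp`, and `parse_log_timestamp` are illustrative.

```python
from datetime import datetime, timezone

# Assumption: second precision, matching the example 2023-05-04T15:40:03Z.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_log_timestamp(moment):
    """Format a timezone-aware datetime as UTC ISO-8601 with a Z suffix."""
    return moment.astimezone(timezone.utc).strftime(LOG_TS_FORMAT)


def parse_log_timestamp(text):
    """Parse a Z-suffixed timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(text, LOG_TS_FORMAT).replace(tzinfo=timezone.utc)


stamp = parse_log_timestamp("2023-05-04T15:40:03Z")
print(stamp.isoformat())            # 2023-05-04T15:40:03+00:00
print(format_log_timestamp(stamp))  # 2023-05-04T15:40:03Z
```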

The builtin_equi_join operator

See also https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3032940583

The CS-developed plugin equi_join has been ported into core SciDB with minor changes:

  • support for filter-pushdown and projection-pushdown optimizations.

  • the output is a SciDB dataframe.

  • keys are specified as “left_keys:” and “right_keys:” using attribute/dimension names interpreted in the context of the left or right input array only, reducing the need for array aliases and casts.

  • the “out_names:” keyword is not allowed, as this relies on the positions of attributes and thus interacts confusingly with projection-pushdown which eliminates unused attributes.

  • the implementation uses a different operator name and different symbol names so that it can coexist with the plugin equi_join() without confusion.

The builtin_grouped_aggregate operator

See also https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3033333783

The CS-developed plugin grouped_aggregate has been ported into core SciDB with minor changes:

  • support for filter-pushdown and projection-pushdown optimizations.

  • the output is a SciDB dataframe.

  • the implementation uses a different operator name and different symbol names so that it can coexist with the plugin grouped_aggregate() without confusion.

Support for subarray operator

See also https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395881545

The subarray() operator selects elements of an input array according to coordinates specified by the contents of one or more secondary input arrays, called pick arrays.

This operator is intended to add support for these use cases:

  • to produce a sparse subset of cells from an input array for feeding to downstream linear algebra operations, and

  • to produce a sparse or dense subset of cells, sometimes with attached pick array attributes, for conversion to R or NumPy arrays within REVEAL™ applications.

Please refer to the SciDB Reference Guide for details of how to use subarray(), including syntax and examples, here: https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395881545 .

The subdelete operator

See also https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395881600

The subdelete() operator deletes cells from an array using the same cell selection criteria as subarray().

Please refer to the SciDB Reference Guide for details of how to use subdelete(), including syntax and examples, here: https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395881600 .

Removal of mquery() operator

The mquery() operator was deprecated in release 22.5 and has been removed in this release. Instead of mquery(), use transactions via the begin(), commit(), and rollback() operators. See the section “Transactions and Transaction Operators” in the SciDB Reference Guide for details on how to use these operators.

Added dimension keyword to some operators

Several operators which output an array with a new single dimension now allow the user to override the default dimension name. These operators include:

  • aggregate() (for grand aggregate only)

  • help()

  • list()

  • show()

  • uniq()

Example:

AFL% limit(list('operators', dimension: MyDimName), 5);
{MyDimName} name,library
{0} 'add_attributes','scidb'
{1} 'add_instances','system'
{2} 'aggregate','scidb'
{3} 'apply','scidb'
{4} 'attributes','scidb'

Improvement to the remove_versions() operator

See also https://paradigm4.atlassian.net/wiki/spaces/SD/pages/2828830356/remove+versions

The remove_versions() operator can now remove an arbitrary half-open interval [first, last) of array versions. Here last can be max_version, allowing the most recent version(s) to be removed. Please refer to the SciDB Reference Guide for details of how to use remove_versions(), including syntax and examples.
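The half-open convention means that first is included and last is excluded. The sketch below illustrates that arithmetic on plain integers. It is not the operator's implementation and does not model how max_version is interpreted; the function name `versions_removed` is illustrative.

```python
def versions_removed(existing_versions, first, last):
    """Return the versions v with first <= v < last (half-open interval)."""
    return [v for v in existing_versions if first <= v < last]


versions = [1, 2, 3, 4, 5]
print(versions_removed(versions, 2, 4))  # [2, 3]
print(versions_removed(versions, 3, 3))  # []
```

An interval with first equal to last is empty, so nothing would be selected in that case.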

Low disk space warnings

This version of SciDB introduces a new configuration parameter, low-disk-space-threshold-mb. Units are MiB, and the default is 1024 MiB == 1 GiB.

Before SciDB enlarges any datastore file, it will check the available free space on the device hosting the data store. If there is less than low-disk-space-threshold-mb mebibytes available on the device, SciDB will prevent further WRITE queries by taking the global array lock (GAL). This is the same catalog lock used to implement the lock_arrays operator, but unlike lock_arrays, the lock will be taken using a flag that causes subsequent WRITE queries to abort rather than block. When the low disk space condition has been addressed by a system administrator, WRITE queries can be re-enabled using lock_arrays(false).

By “WRITE query” we mean store, insert, delete, subdelete, add_attributes, and the like. However, remove and remove_versions are still permitted, since these operators can free up disk space.
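Administrators may want to watch for the same condition from outside SciDB, before WRITE queries start to abort. The sketch below compares a device's free space with a threshold expressed in MiB. It is a monitoring aid under stated assumptions, not SciDB's internal check; the function names are illustrative, and the free-space figure is whatever the operating system reports for the given path.

```python
import shutil

MIB = 1024 * 1024  # one mebibyte in bytes


def is_below_threshold(free_bytes, threshold_mb=1024):
    """Return True if free_bytes is less than threshold_mb mebibytes."""
    return free_bytes < threshold_mb * MIB


def device_is_low_on_space(path, threshold_mb=1024):
    """Check the device hosting `path` against the threshold (default 1 GiB)."""
    return is_below_threshold(shutil.disk_usage(path).free, threshold_mb)


print(is_below_threshold(512 * MIB))   # True: 512 MiB is under the 1024 MiB default
print(is_below_threshold(2048 * MIB))  # False
```

Running `device_is_low_on_space` against each instance's data directory with a threshold somewhat above low-disk-space-threshold-mb would give early warning.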

From a user’s perspective, the first WRITE query to cross the threshold receives this error (edited for width):

Other in-progress WRITE queries may receive the same error, or they may complete successfully so long as they do not try to grow a datastore file.

Once the condition is detected, subsequent WRITE queries fail immediately with this error:

WRITE queries will continue to fail in this way until an administrative user re-enables them with lock_arrays(false). Ideally, before that the problem storage device will have been reprovisioned with more space, or existing space will have been freed.

See also https://paradigm4.atlassian.net/wiki/spaces/scidb/pages/3395883236

Discontinued the de_rle plugin

Starting in 23.10, SciDB no longer ships with the de_rle plugin.

Performance enhancements and bug fixes

Filter pushdown optimization of logical query plan.

Operators which filter down to a subset of the input cells, known as “cell-filters”, are eligible for pushdown optimization, to be executed early in a query. The optimization is enabled by default, but can be globally disabled by the boolean configuration option “enable-optimize-pushdown”.

Due to outstanding issues we have uncovered with filter pushdown, we recommend that this feature be disabled wherever possible when deploying at customer sites.

Edit the config file (/opt/scidb/23.10/service/config-0-mydb if running SciDB as a service, /opt/scidb/23.10/etc/config.ini otherwise) and make sure it contains the line enable-optimize-pushdown=0. If you changed this value, you may need to restart SciDB for it to take effect. This feature is disabled by default starting in SciDB 23.10.5.

This applies to the following operators:

  • between()

  • cross_between()

  • filter()

  • subarray() without the join keyword

Projection pushdown optimization of logical query plan.

With knowledge of the flow of dimension/attribute data through each operator, the optimizer can do a top-down analysis to find unused attributes, and then can rewrite the plan to eliminate those attributes early in the query. Ideally, this will reduce the quantity of data used in relatively expensive operators such as join, redimension, and aggregate which require data shuffling.
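The top-down analysis can be pictured with a toy model. The sketch below is an illustration only and does not reflect SciDB's optimizer or its LogicalOperator API. It assumes a linear plan in which each operator declares which input attributes each output attribute depends on, plus any attributes it always reads (such as a filter predicate). All names are hypothetical.

```python
def needed_attributes(plan, final_outputs):
    """Walk a linear plan from the last operator to the first.

    `plan` lists (name, deps, always_reads) in execution order:
      deps         maps an output attribute to the input attributes it reads;
                   attributes not listed pass through unchanged.
      always_reads is a set of inputs the operator reads regardless of output.
    Returns, per operator, the sorted input attributes that must be kept.
    """
    needed = set(final_outputs)
    keep = []
    for name, deps, always_reads in reversed(plan):
        inputs = set(always_reads)
        for out_attr in needed:
            inputs |= deps.get(out_attr, {out_attr})
        keep.append((name, sorted(inputs)))
        needed = inputs  # the upstream operator must supply these
    keep.reverse()
    return keep


# Hypothetical plan: scan -> filter on region -> apply total = f(price, qty).
plan = [
    ("scan", {}, set()),
    ("filter", {}, {"region"}),
    ("apply", {"total": {"price", "qty"}}, set()),
]
for name, attrs in needed_attributes(plan, ["total"]):
    print(name, attrs)
```

In this toy plan only price, qty, and region are needed from the scan, so any other attribute of the scanned array could be dropped before the more expensive operators run.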

Changes to LogicalOperator API to support pushdown optimizations.

The pushdown optimization framework relies on an abstract model of dataflow through each operator. Operators with the default behavior will be treated pessimistically, inhibiting opportunities for optimization. The commonly-used builtin operators fully support optimization; plugin operators can also be enhanced to support optimization, but this requires detailed operator-specific knowledge about the relationships between input attrs/dims and output attrs/dims for the operator.

Relaxed restrictions for creating io-paths-list subdirectories

Non-privileged users can now create subdirectories of the io-paths-list directories when saving files there. Formerly only users with admin privilege could do this.

Miscellaneous performance improvements

  • The secure_scan() operator now caches “permissions array” information, resulting in better performance for sites with large permissions arrays.