Configuring SciDB

SciDB configuration is achieved through a config.ini file.

File Format

The SciDB configuration file uses the INI file format. The values assigned to the keys in config.ini should contain only upper and lower case letters, numbers, and the following characters:  .,/-_  (period, comma, forward slash, hyphen, underscore).

Basic SciDB Configuration

This table describes the basic configuration file settings for SciDB.

Key

Scope

Value

[<clustername>]

Cluster

Name of the SciDB cluster. The cluster name must appear as a section heading in the config.ini file, e.g. [cluster1]. The square brackets are literal; <clustername> is the name you choose. To avoid possible problems, use all lowercase for the cluster name.

db-user

Cluster

User name to use in the catalog connection string, for example test1user. To avoid possible problems, use all lowercase for the db-user parameter.

install-root

Cluster

Full path to the SciDB installation directory.

logconf

Cluster

Full path to the log4cxx logging configuration file.

pluginsdir

Cluster

Full path to the SciDB plugins directory containing all server plugins.

requests

Cluster

The maximum number of client query requests queued for execution on any given instance. Any requests in excess of the limit return to the client with an error. The default value is 1,000.

security

Cluster

Sometimes called the "security mode", this parameter can have one of these values:

  • trust means that user authorization, or "namespaces mode", is disabled.

  • password means that password-based user authentication is required to use the cluster.

  • pam means that pluggable authentication modules will be used for user authentication.

See Enabling Security Mode for a complete description of how to switch between security modes.

server-N

Server

The host name or IP address of server N, where N = 0, 1, 2, ..., followed by a comma, followed by the index of the last  instance to launch on the server.  Instance indices are zero-based.  For example, a server directive

server-2=host.example.com,3

says that server-2 has a hostname of host.example.com and will launch four SciDB instances, with instance numbers 0 through 3.

Note that the total number of instances on all servers cannot exceed the maximum number of Postgres connections allowed. This number is specified in the postgresql.conf file.

In releases prior to 15.12, the meaning of the number after the comma was different for server-1 and higher.  For server-0 the number was as described above, but for other servers it represented a count rather than an index, and instances were numbered starting from one rather than from zero.

The full format of this setting is:

server-N=IP|Hostname,[n,]m-p,q-s, ...

It specifies a set of instances with instance indices 0-n, m-p, q-s on server N, identified by IP or Hostname, where n, m, p, q, s are non-negative integers such that 0 <= n < m <= p < q <= s.
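The expansion of a server-N value into instance indices can be sketched in Python as follows. This is only an illustration of the format described above, not SciDB's own parser; the function name parse_server_value is hypothetical, and the sketch assumes that a bare number n stands for the range 0-n and that each m-p element is an inclusive range.

```python
def parse_server_value(value):
    """Split a server-N value such as 'host,3' or 'host,1,4-5,8-9'
    into (host, list of instance indices).

    Illustrative only: a bare number n is read as the range 0..n and
    'm-p' as the inclusive range m..p, as in the format description.
    """
    host, *parts = [p.strip() for p in value.split(",") if p.strip()]
    indices = []
    for part in parts:
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
        else:
            first, last = 0, int(part)
        if first > last:
            raise ValueError(f"bad range {part!r}")
        indices.extend(range(first, last + 1))
    if len(set(indices)) != len(indices):
        raise ValueError("instance ranges overlap")
    return host, indices


# The simple form from the example above: four instances, 0 through 3.
print(parse_server_value("host.example.com,3"))
```

Under these assumptions, a value such as 10.0.0.5,1,4-5,8-9 would describe six instances with indices 0, 1, 4, 5, 8 and 9.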

Cluster Configuration

This table describes the cluster configuration file parameters and how to set them.

Key

Scope

Value

base-path

Cluster

The root data directory for each SciDB instance. Each SciDB instance uses an enumerated data directory below the base-path. The list('instances') command shows all instances and their data directories for a running SciDB cluster.

base-port (optional)

Cluster

Base port number. The SciDB instances communicate via the TCP port = base-port + instance index. Clients can connect to any of the instances on their corresponding ports. The default base-port is 1239.
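As a small worked example of the port rule above, the following Python sketch computes the TCP port for an instance. The helper name instance_port is hypothetical, and the sketch assumes that "instance index" means the zero-based index of the instance on its server.

```python
DEFAULT_BASE_PORT = 1239  # default base-port stated above


def instance_port(instance_index, base_port=DEFAULT_BASE_PORT):
    """Return base-port + instance index for one SciDB instance."""
    if instance_index < 0:
        raise ValueError("instance index must be zero or greater")
    return base_port + instance_index


# Ports for four instances on one server with the default base-port.
print([instance_port(i) for i in range(4)])
```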

data-dir-prefix (optional)

SciDB Instance

The SciDB administrator can provide file system directories for reference to multiple disks connected to a single server. The advantage to using the data-dir-prefix parameter is that you can arbitrarily assign physical storage and the filesystem locations to SciDB instances.

For example, if there are 4 disks and 8 instances on server-0, your configuration could be as follows:

data-dir-prefix-0-0=/datadisk1/myserver.000.0
data-dir-prefix-0-1=/datadisk2/myserver.000.1
data-dir-prefix-0-2=/datadisk3/myserver.000.2
data-dir-prefix-0-3=/datadisk4/myserver.000.3
data-dir-prefix-0-4=/datadisk1/myserver.000.4
data-dir-prefix-0-5=/datadisk2/myserver.000.5
data-dir-prefix-0-6=/datadisk3/myserver.000.6
data-dir-prefix-0-7=/datadisk4/myserver.000.7

You need not specify this parameter for each instance. For any omitted instance, SciDB creates a folder using the default naming scheme. If you do specify a value for this parameter, you must ensure that the specified folder exists and that it is completely empty. Otherwise, errors occur when you try to initialize SciDB.

If a server has multiple storage disks, and you want to assign more than one instance to each disk, you must set the data-dir-prefix parameter for the instances on that server.

In release 15.12 and later, instance numbers are always zero-based.  In the example above, the eight instances on server-0 have data-dir-prefix-<server>-<instance>=/path directives with instance numbers from zero to seven.  In prior releases, instance numbers for instances on servers other than server-0 were one-based.  See the description of the server-N configuration key above.
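For servers with many instances, the data-dir-prefix lines can be generated rather than typed by hand. The Python sketch below spreads instances across disks in round-robin order and reproduces the eight-line example above. The function name and the myserver.NNN.I directory naming are taken from that example for illustration only; SciDB does not require this naming pattern, and the directories must still be created empty before initialization.

```python
def data_dir_prefix_lines(server, disks, instances, name="myserver"):
    """Build data-dir-prefix-<server>-<instance>=<path> lines,
    assigning instances to disks in round-robin order.

    Instance numbers are zero-based (release 15.12 and later).
    """
    lines = []
    for i in range(instances):
        disk = disks[i % len(disks)]
        lines.append(
            f"data-dir-prefix-{server}-{i}={disk}/{name}.{server:03d}.{i}"
        )
    return lines


disks = ["/datadisk1", "/datadisk2", "/datadisk3", "/datadisk4"]
for line in data_dir_prefix_lines(0, disks, 8):
    print(line)
```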



io-paths-list (optional)

Cluster

A colon-separated list of absolute directory paths that non-administrative users are allowed to access. The list is empty by default. A typical setting might be /tmp:/dev/shm.

Ordinarily, non-administrative users must load and save data using relative pathnames that refer to per-user subdirectories of the instance data directories.  To allow load and save from absolute path names, those path names must be "covered" by entries on the io-paths-list.  See File I/O Restrictions.
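One way to picture "covered" is a directory-prefix check, sketched below in Python. This is an assumption made for illustration: the function name is_covered is hypothetical, and the exact matching rules SciDB applies are described in File I/O Restrictions, not here.

```python
import posixpath


def is_covered(path, io_paths_list):
    """Return True if an absolute path lies in or under one of the
    colon-separated directories in io_paths_list.

    Assumption: "covered" is treated as a plain directory-prefix match
    after normalizing the path; SciDB's real rules may differ.
    """
    if not posixpath.isabs(path):
        return False
    target = posixpath.normpath(path)
    for entry in io_paths_list.split(":"):
        entry = entry.strip()
        if not entry:
            continue
        root = posixpath.normpath(entry)
        if target == root or target.startswith(root.rstrip("/") + "/"):
            return True
    return False


print(is_covered("/tmp/load/data.csv", "/tmp:/dev/shm"))
print(is_covered("/home/user/data.csv", "/tmp:/dev/shm"))
```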

key-file-list (optional)

Server



Comma-separated list of filenames that include keys for ssh authentication. Default: /home/scidb/.ssh/id_rsa and id_dsa.

If you do not use the default name and path for the key, you must set a value for this parameter.

low-disk-space-threshold-mb

Cluster

SciDB will go into read-only mode if the free disk space for any datastore device drops below this threshold. Default: 1024 MiB.

If the low disk space threshold is crossed, a warning message is logged with the name of the device. A system administrator can free space or provision more space, and then restore normal operation using the lock_arrays operator.

no-watchdog (optional)

Cluster

Set this to true to avoid automatic restart of the SciDB instance on an OS process failure. Default: false.

By default, each SciDB instance spawns two Linux processes: one for the instance itself, and one as a watchdog. The watchdog process detects if the instance process fails. If it detects a failure, it forks a new replacement process to bring the instance back online.

pg-port (optional)

Cluster

The listening TCP port of Postgres—the port on which Postgres accepts incoming connections. Default: 5432.

redundancy (optional)

Cluster

Indicates the number of replications of array data. The value of this parameter specifies the number of extra copies of array data to store. Setting this parameter to a positive value is part of creating a fault-tolerant SciDB cluster. Default: 0 (meaning SciDB stores only one copy of data).

The number of data copies equals 1 + the redundancy value. The maximum value for the redundancy parameter is one less than the number of servers in your SciDB cluster, and is at most eight. The number and identities of the servers are specified by the server-N configuration entries.

You can change the value throughout the lifetime of a cluster.

In SciDB versions 16.x and later, the redundancy refers to the data copies maintained by the servers in a SciDB cluster rather than by the instances.
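The arithmetic behind the redundancy limits can be sketched as follows. The function names are hypothetical and the code only restates the rules given above (copies = 1 + redundancy; maximum redundancy is one less than the number of servers, capped at eight).

```python
MAX_REDUNDANCY = 8  # upper bound stated above


def max_redundancy(server_count):
    """Largest redundancy value allowed for a cluster of this size."""
    return max(0, min(server_count - 1, MAX_REDUNDANCY))


def data_copies(redundancy, server_count):
    """Number of copies of array data stored for a redundancy value."""
    if not 0 <= redundancy <= max_redundancy(server_count):
        raise ValueError("redundancy out of range for this cluster")
    return 1 + redundancy


# A four-server cluster allows redundancy 0..3, i.e. 1 to 4 copies.
print(max_redundancy(4), data_copies(2, 4))
```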



secure-scan-config (optional)

Cluster

Special configuration for the secure_scan operator.   See Configuring The secure_scan Operator.

ssh-port (optional)

Cluster

The TCP port ssh uses for communications within the cluster. Default: 22.

HTTP and HTTPS Configuration

Important HTTPS configuration parameters

Key

Scope

Default

Value

http-base-port
(optional)

Cluster

8239

Port number for HTTP or HTTPS connections to the first SciDB instance. Subsequent instances use successive port numbers in sequence.

This port uses HTTPS if SciDB has security mode enabled (security=password or security=pam) or if it has an X.509 certificate; it uses HTTP if SciDB is running with security=trust and no X.509 certificate is configured.
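The rule for choosing between HTTP and HTTPS, together with the per-instance port sequence, can be sketched in Python. The function names http_scheme and instance_url are hypothetical, and the sketch assumes that "successive port numbers" means http-base-port plus the zero-based instance index.

```python
def http_scheme(security, https_cert=None):
    """Return 'https' or 'http' following the rule described above."""
    if security not in ("trust", "password", "pam"):
        raise ValueError(f"unknown security mode: {security!r}")
    if security in ("password", "pam") or https_cert:
        return "https"
    return "http"


def instance_url(host, instance_index, security, https_cert=None,
                 http_base_port=8239):
    """Build the base URL for one instance (illustrative helper)."""
    scheme = http_scheme(security, https_cert)
    return f"{scheme}://{host}:{http_base_port + instance_index}"


print(instance_url("host.example.com", 0, "trust"))
print(instance_url("host.example.com", 2, "pam"))
```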

https-cert
(required in security mode)

Cluster

Path to an X.509 certificate file. This is required if SciDB is running with security mode enabled (security=password or security=pam).

When you set this, you must also set https-key. If your certificate file contains both a certificate and a private key, you need to set both https-cert and https-key to the same path, and the file must have its permissions restricted so that only the scidb user can read it (chown scidb:scidb; chmod 600). Otherwise, SciDB will refuse to start (see the log file for informational messages if this happens).

See Configuring HTTPS access for instructions on creating an X.509 certificate or migrating one from an existing Shim installation.

https-key
(required in security mode)

Cluster

Path to an X.509 private key file. If you are using a single file that contains both the certificate and the private key, this must be set to the same path as https-cert.

See Configuring HTTPS access for instructions on creating an X.509 certificate or migrating one from an existing Shim installation.

Javascript security parameters (CORS)

In order for a Javascript application running in a browser to connect to SciDB, the server must authorize the browser to run Javascript from the origin site where the Javascript file is hosted. This is known as the CORS (Cross-Origin Resource Sharing) browser scripting security model.

If you have Javascript applications that you want to run against SciDB, you must set the http-allow-origins configuration parameter to authorize the sites that host the Javascript files:

Key

Scope

Default

Value

http-allow-origins

Cluster

(none)

A JSON list of server URLs (origins) that are allowed to provide Javascript that a web browser can run on this SciDB cluster.

For example: ['http://my.example.com', 'https://my.other.example.com'].

The default setting is empty, meaning that Javascript won't work against this cluster from a normal browser until this option is set.

Use * to accept all origins, but this is not recommended as it leaves the server open to unauthorized Javascript clients and cross-origin attacks.

If you don’t have any applications that connect to SciDB using Javascript that runs in a user’s web browser, you don’t need to set this parameter.

Network parameters

In most cases, you shouldn’t need to set these. However, you might need to modify these parameters in a few cases:

  • if network security settings are preventing users from connecting to SciDB

  • if a user is connecting to SciDB from a browser or an application that has specific security requirements

Key

Scope

Default

Value

dns-suffix

Cluster

(none)

DNS suffix for the subnet that the SciDB instances reside in. A client should be able to use the instance name followed by this suffix to refer to an instance from a wide-area network.

If not provided, clients from outside of the subnet might not be able to access some URLs generated by SciDB.

http-bind-hostname

Cluster

::

 

http-auth-cookie-name

Cluster

__Host-SciDB-Auth

Name of the authorization cookie used to continue an HTTP session. Prefix with __Host- to prevent browsers from sharing the cookie with other hosts or over unencrypted connections; prefix with __Secure- to allow sharing with other hosts but prevent browsers from sending the cookie over unencrypted connections.

http-auth-cookie-attributes

Cluster

SameSite=None; Secure; HttpOnly; Path=/

Attributes for the authorization cookie issued by SciDB, separated with semicolons (;). See "Using HTTP cookies" on MDN for possible values. These might need to be adjusted for web applications that connect to SciDB from certain browsers.

http-headers

Cluster

(none)

A JSON structure specifying additional HTTP headers to include in every response. Extra headers might be needed to get past proxy servers, caches, or firewalls on the network, or to satisfy security requirements in a client application or browser.

In the SciDB configuration file, the JSON must be surrounded by double or single quotes, and all lines except the first must be indented. For example, this is how a two-line configuration for http-headers might appear in the configuration file:

http-headers='{"Cache-Control": "no-store",
    "X-Scidb-Cluster-Name": "scidb-101"}'
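If the headers are maintained in a script, the configuration line can be generated so that the quoting and indentation rules above are always respected. The Python sketch below is one possible approach: the function name http_headers_line is hypothetical, it always wraps the JSON in single quotes, and it places one header per line with every line after the first indented.

```python
import json


def http_headers_line(headers, indent="    "):
    """Format a dict of header names and values as an http-headers
    setting: JSON wrapped in single quotes, one header per line,
    with continuation lines indented.
    """
    items = [f"{json.dumps(k)}: {json.dumps(v)}" for k, v in headers.items()]
    body = "{" + (",\n" + indent).join(items) + "}"
    if "'" in body:
        # A single quote inside the JSON would end the quoted value early.
        raise ValueError("header names and values must not contain '")
    return f"http-headers='{body}'"


print(http_headers_line({
    "Cache-Control": "no-store",
    "X-Scidb-Cluster-Name": "scidb-101",
}))
```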

Timeout settings

You might need to adjust these parameters if queries or user sessions are timing out before they are finished.

Key

Scope

Default

Value

http-startup-timeout

Cluster

20

Maximum number of seconds to wait while attempting to start up the HTTP server listener. If this is exceeded, the port is likely blocked or busy running another listener. In general, you should not need to change this setting.

http-connection-idle-timeout

Cluster

600
(10 minutes)

Timeout (in seconds) for disconnecting an inactive HTTP client. This just manages how long connections are cached in the connection pool — clients can transparently reconnect and resume their sessions without reauthenticating, as long as their authorization cookie is still valid (see http-auth-cookie-expiration).

http-auth-cookie-expiration

Cluster

900
(15 minutes)

Default time (in seconds) after which an authorization cookie expires. A client receives a new authorization cookie with a new expiration timestamp every time it interacts with the server; the only way for the client’s authorization to expire is if it doesn’t interact with the server at all for at least this long.

http-idle-session-timeout

Cluster

7200
(2 hours)

An HTTP session expires if the client doesn't interact with it for this many seconds and if no query is actively executing in the session. This means that interim query results and temporary arrays created in the session will be lost.
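The difference between cookie expiration and session expiration can be illustrated with a short Python sketch. It only restates the two rules described above using the default values; the function names are hypothetical and this is not how the server is implemented.

```python
COOKIE_EXPIRATION = 900      # http-auth-cookie-expiration default, seconds
IDLE_SESSION_TIMEOUT = 7200  # http-idle-session-timeout default, seconds


def cookie_expired(idle_seconds, expiration=COOKIE_EXPIRATION):
    """The cookie is renewed on every interaction, so it expires only
    after the client has been idle for at least `expiration` seconds."""
    return idle_seconds >= expiration


def session_expired(idle_seconds, query_running,
                    timeout=IDLE_SESSION_TIMEOUT):
    """The session expires only if the client has been idle for
    `timeout` seconds and no query is executing in the session."""
    return idle_seconds >= timeout and not query_running


# After 20 idle minutes the cookie has expired, but the session has not.
print(cookie_expired(1200), session_expired(1200, query_running=False))
```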

http-access-control-max-age

Cluster

3600
(1 hour)

Value for the Access-Control-Max-Age header (how long, in seconds, browsers can cache the Access-Control headers from CORS preflight requests against this server).

This is only relevant for web applications running in Javascript in a user’s browser.

http-default-lock-timeout

Cluster

60
(1 minute)

Default timeout (in seconds) for the server to prepare a query on the HTTP API. If this timeout is exceeded, it is likely because another query has a lock on an array that the query needs to use.

http-default-first-page-timeout

Cluster

3600
(1 hour)

Default timeout for the server to return the first page of results from a query on the HTTP API, measured in seconds since the query was prepared (for non-paged queries) or since the client requested the page (for paged queries). This timeout includes query execution and fetching the first page of results.

http-default-next-page-timeout

Cluster

600
(10 minutes)

Default timeout for the server to return a page other than the first, measured in seconds since the client requested the page. (This is set to a relatively small value because the fetch time for each page is expected to be small after the first page is fetched.)

http-default-query-inactivity-timeout

Cluster

300
(5 minutes)

Amount of time that the client has to request the next page of query results before the query times out, measured in seconds since the last response was sent to the client. (This is deliberately set to a small value to ensure that the query gets cleaned up quickly if the SciDBR client crashes.)

HTTP performance-related settings

Key

Scope

Default

Value

http-threads

Cluster

2

Number of HTTP server threads per instance.

http-aio-save-buffer-size-bytes

Cluster

1 MB

When the output format uses the aio_save() operator of the accelerated_io_tools plugin (for example: Apache Arrow format), this is the size of the buffer that HTTP API uses to read from aio_save, in bytes.

http-aio-save-flush-bytes

Cluster

5 MB

When the output format uses the aio_save() operator of the accelerated_io_tools plugin (for example: Apache Arrow format), data is sent to the client in batches of this many bytes.

Query Optimization Settings

Key

Scope

Default

Value

enable-optimize-pushdown

Cluster

true

Enables filter-pushdown and projection-pushdown.

Performance Configuration

This table describes the configuration file elements for tuning your system performance.

Key

Applies To

Value

admin-queries (optional)

SciDB Instance

The number of administrative queries you can schedule to execute in parallel. The number must be less than execution-threads. Default: 1.

buffermgr-threads-write (optional)

SciDB Instance

Determines the total number of DB and TMP buffer writes which are processed concurrently. (Prior to 20.10.0, controlled by result-prefetch-queue-size). Default: 4.

chunk-load-look-ahead (optional)

Query

Changes ChunkLoader::getPrefetch() which controls the maximum number of chunks read from a load file concurrently in one thread while those chunks are loaded into the system in another. (Prior to 20.10.0, controlled by result-prefetch-queue-size). SciDB text format loading has a known bug if set below 3.  Default: 3.

client-queries (optional)

SciDB Instance

The number of queries you can schedule to execute in parallel. The number must be less than (execution-threads - admin-queries). Default:  execution-threads - admin-queries - 1.

execution-threads (optional)

SciDB Instance

Size of the thread pool available for query execution. This is a shared pool of threads used by all queries for I/O and various other query execution tasks. Set this value to two more than the maximum number of concurrent queries you want. This parameter value should exceed the value of result-prefetch-threads. Default: 5.
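The relationship between execution-threads, admin-queries and client-queries can be checked with a small Python sketch. It restates the constraints and defaults given for these three keys; the function name resolve_query_limits is hypothetical and SciDB performs its own validation.

```python
def resolve_query_limits(execution_threads=5, admin_queries=1,
                         client_queries=None):
    """Apply the documented defaults and constraints:
    admin-queries < execution-threads, and
    client-queries < execution-threads - admin-queries
    (default: execution-threads - admin-queries - 1).
    """
    if admin_queries >= execution_threads:
        raise ValueError("admin-queries must be less than execution-threads")
    if client_queries is None:
        client_queries = execution_threads - admin_queries - 1
    if client_queries >= execution_threads - admin_queries:
        raise ValueError(
            "client-queries must be less than "
            "execution-threads - admin-queries"
        )
    return {
        "execution-threads": execution_threads,
        "admin-queries": admin_queries,
        "client-queries": client_queries,
    }


# With all defaults: 5 threads, 1 admin query, 3 client queries.
print(resolve_query_limits())
```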

large-memalloc-limit (optional)

SciDB Instance

Threshold limit on the maximum number of simultaneous large allocations for glibc malloc(). It has the same effect as the M_MMAP_MAX setting for malloc. Default: 65,536. Removed as of 23.x due to a change to a different malloc library.

max-memory-limit (optional, but required in practice)

SciDB Instance

Approximate maximum amount of memory, in mebibytes, that the scidb process can use.

When a query and its operators attempt to allocate memory that would cause the total amount of allocated memory to exceed max-memory-limit, the query is cancelled.

Setting this to a value somewhat smaller than the total physical memory per server divided by the number of instances on that server usually prevents scidb or other processes from being killed by the kernel's OOM killer. Instead, the SciDB query is cancelled.

It has historically been set by our Customer Services group, who have experience determining how large the “somewhat smaller than” margin needs to be in practice.

Though scidb will run without it being set, in practice it should not be left unset in a production environment.

Default: unlimited.
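The sizing rule described above (physical memory per server divided by the number of instances, minus some margin) can be written as a short calculation. In this Python sketch the function name is hypothetical and the headroom fraction is a placeholder that the caller must choose; this document does not state a recommended margin.

```python
def max_memory_limit_mib(physical_mib, instances_per_server,
                         headroom_fraction):
    """Per-instance share of physical memory, reduced by a margin.

    headroom_fraction is an assumption supplied by the caller
    (for example 0.25 to keep a quarter of each share free).
    """
    if instances_per_server < 1:
        raise ValueError("need at least one instance per server")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    per_instance = physical_mib / instances_per_server
    return int(per_instance * (1 - headroom_fraction))


# 64 GiB server, 8 instances, 25% margin chosen for illustration.
print(max_memory_limit_mib(65536, 8, 0.25))
```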

materialized-cache-size (optional)

SciDB Instance

Sets MaterializedArray::_cacheSize. (Prior to 20.10.0, controlled by result-prefetch-queue-size.) Default: 4.

mem-array-threshold (optional)

SciDB Instance

Maximum size in MB of temporary data to cache in memory before writing to temporary disk files. Default: 1024 MB.

merge-sort-buffer (optional)

Thread

Size of memory buffer used for sorting, in MB. Default: 512 MB.

operator-job-queue-threads (optional)

Query

Controls the amount of concurrent processing by load(), input(), and SortArray. Note that SortArray is used by sort() and redimension(). (Prior to 20.11.0, this was controlled by result-prefetch-threads.) Default: 1.

replication-receive-queue-size (optional)

SciDB Instance

The maximum size – in number of chunks – of the receive queue for the replica chunks on each SciDB instance. Default: 16.

replication-send-queue-size (optional)

SciDB Instance

The maximum size – in number of chunks – of the send queue for the replica chunks on each SciDB instance. Default: 4.

result-prefetch-queue-size (optional)

Query

Eliminated in 20.10.0. But see materialized-cache-size, chunk-load-look-ahead.

result-prefetch-threads (optional)

SciDB Instance

Eliminated in 20.10.0. But see operator-job-queue-threads.

sg-receive-queue-size (optional)

Query

A limit – in number of chunks – on how much of any SciDB instance's receive queue can be occupied by the chunks of an individual query. Default: 8.

sg-send-queue-size (optional)

Query

A limit – in number of chunks – on how much of any SciDB instance's send queue can be occupied by the chunks of an individual query. Default: 16.

small-memalloc-size (optional)

SciDB Instance

Small allocation threshold size, in bytes, for the glibc memory allocator, i.e. malloc(). malloc() treats all memory allocations larger than this size as "large" and passes them through to Linux mmap. It has the same effect as the M_MMAP_THRESHOLD setting for malloc. Default: 268,435,456 bytes (256 MB). Removed as of 23.x due to a change to a different malloc library.

smgr-cache-size (optional)

SciDB Instance

Size of memory in MB allocated to the shared cache of array chunks. The cache is used only for the chunks belonging to persistent arrays. Default: 256 MB.