Configuring SciDB

SciDB configuration is achieved through a config.ini file.

File Format

The SciDB configuration file uses the INI file format. Values assigned to keys in config.ini should contain only uppercase and lowercase letters, digits, and the following characters: . , / - _ (period, comma, forward slash, hyphen, underscore).
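The character rule can be checked before SciDB ever reads the file. The following Python sketch uses the standard configparser module to read config.ini text and report every value that contains a character outside the listed set. The function name invalid_values and its exempt parameter are hypothetical and not part of SciDB. The io-paths-list key described below takes a colon-separated value, which falls outside the listed characters, so the sketch exempts that key by default. Adjust the exemption set to match your own configuration.

```python
import configparser
import re

# Letters, digits, and the five punctuation characters listed above: . , / - _
ALLOWED_VALUE = re.compile(r"[A-Za-z0-9.,/_-]*")


def invalid_values(ini_text: str,
                   exempt: frozenset = frozenset({"io-paths-list"})) -> list:
    """Return (section, key, value) for each value with a disallowed character.

    Hypothetical helper; keys listed in `exempt` are skipped (an assumption,
    since io-paths-list uses colons).
    """
    # interpolation=None stops configparser from treating '%' specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if key not in exempt and not ALLOWED_VALUE.fullmatch(value):
                problems.append((section, key, value))
    return problems
```

For example, a logconf path that contains a space, such as logconf=/opt/scidb/log 1.xml in a [cluster1] section, is reported as ('cluster1', 'logconf', '/opt/scidb/log 1.xml').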

Basic SciDB Configuration

This table describes the basic configuration file settings for SciDB.

Key | Scope | Value
[<clustername>] | Cluster | Name of the SciDB cluster. The cluster name must appear as a section heading in the config.ini file, for example [cluster1]. The square brackets are literal; <clustername> is the name you choose. To avoid possible problems, use all lowercase letters for the cluster name.
db-user | Cluster | User name to use in the catalog connection string, for example test1user. To avoid possible problems, use all lowercase letters for the db-user value.
install-root | Cluster | Full path to the SciDB installation directory.
logconf | Cluster | Full path to the log4cxx logging configuration file. As of SciDB 22.6, this can be an XML file. A sample XML log configuration that retains logs indefinitely, writing one file per hour: log1.xml
pluginsdir | Cluster | Full path to the SciDB plugins directory containing all server plugins.
requests | Cluster | The maximum number of client query requests queued for execution on any given instance. Requests in excess of the limit return to the client with an error. Default: 1,000.
security | Cluster |

Sometimes called the "security mode", this parameter can have one of these values:

  • trust means that user authorization, or "namespaces mode", is disabled.
  • password means that password-based user authentication is required to use the cluster.
  • pam means that pluggable authentication modules will be used for user authentication.

See Enabling Security Mode for a complete description of how to switch between security modes.

server-N | Server |

The host name or IP address of server N, where N = 0, 1, 2, ..., followed by a comma and the index of the last instance to launch on that server. Instance indices are zero-based. For example, the server directive

server-2=host.example.com,3

says that server-2 has a hostname of host.example.com and will launch four SciDB instances, with instance numbers 0 through 3.

Note that the total number of instances on all servers cannot exceed the maximum number of Postgres connections allowed. This number is set by the max_connections parameter in the postgresql.conf file.

In releases prior to 15.12, the meaning of the number after the comma was different for server-1 and higher.  For server-0 the number was as described above, but for other servers it represented a count rather than an index, and instances were numbered starting from one rather than from zero.

The full format of this setting is:

server-N=IP|Hostname,[n,]m-p,q-s, ...

This specifies the instances with indices 0-n, m-p, q-s, ... on server N, identified by IP address or host name, where n, m, p, q, s are non-negative integers such that 0 <= n < m <= p < q <= s.
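The server-N syntax can be expanded mechanically into instance indices. This Python sketch uses a hypothetical helper, parse_server_directive, to turn a directive into its server number, host, and list of zero-based instance indices. It treats a bare leading number n as the range 0-n and rejects ranges that violate 0 <= n < m <= p < q <= s. It follows the numbering used in release 15.12 and later and is an illustration, not SciDB's own parser.

```python
import re


def parse_server_directive(line: str) -> tuple:
    """Expand 'server-N=host,[n,]m-p,...' into (N, host, instance indices).

    Hypothetical helper for illustration; not part of SciDB.
    """
    key, sep, value = line.partition("=")
    match = re.fullmatch(r"server-(\d+)", key.strip())
    if not sep or match is None:
        raise ValueError(f"not a server-N directive: {line!r}")
    host, *specs = [part.strip() for part in value.split(",")]
    if not host or not specs:
        raise ValueError("expected a host followed by instance indices")
    indices = []
    for position, spec in enumerate(specs):
        if "-" in spec:
            low, high = (int(x) for x in spec.split("-", 1))
        elif position == 0:
            # A bare leading number n is shorthand for the range 0-n.
            low, high = 0, int(spec)
        else:
            raise ValueError(f"expected a range m-p, got {spec!r}")
        # Enforce 0 <= n < m <= p < q <= s: each range ascends and
        # starts after the previous one ends.
        if low > high or (indices and low <= indices[-1]):
            raise ValueError(f"ranges must ascend without overlap: {spec!r}")
        indices.extend(range(low, high + 1))
    return int(match.group(1)), host, indices
```

For example, parse_server_directive("server-2=host.example.com,3") returns (2, 'host.example.com', [0, 1, 2, 3]), and parse_server_directive("server-1=10.0.0.5,1,4-5") returns (1, '10.0.0.5', [0, 1, 4, 5]).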

Cluster Configuration

This table describes the cluster configuration file parameters and how to set them.

Key | Scope | Value
base-path | Cluster | The root data directory for each SciDB instance. Each SciDB instance uses an enumerated data directory below the base-path. The list('instances') command shows all instances and their data directories for a running SciDB cluster.
base-port (optional) | Cluster | Base port number. Each SciDB instance communicates via TCP port base-port + instance index. Clients can connect to any instance on its corresponding port. Default: 1239.
data-dir-prefix (optional) | SciDB Instance |

The SciDB administrator can use this parameter to place instance data directories on multiple disks connected to a single server. The advantage of data-dir-prefix is that you can assign physical storage and file system locations to SciDB instances arbitrarily.

For example, if there are 4 disks and 8 instances on server-0, your configuration could be as follows:

data-dir-prefix-0-0=/datadisk1/myserver.000.0
data-dir-prefix-0-1=/datadisk2/myserver.000.1
data-dir-prefix-0-2=/datadisk3/myserver.000.2
data-dir-prefix-0-3=/datadisk4/myserver.000.3
data-dir-prefix-0-4=/datadisk1/myserver.000.4
data-dir-prefix-0-5=/datadisk2/myserver.000.5
data-dir-prefix-0-6=/datadisk3/myserver.000.6
data-dir-prefix-0-7=/datadisk4/myserver.000.7

You need not specify this parameter for every instance. For any omitted instance, SciDB creates a directory using the default naming scheme. If you do specify a value for this parameter, you must ensure that the specified directory exists and is completely empty. Otherwise, errors occur when you initialize SciDB.

If a server has multiple storage disks, and you want to assign more than one instance to each disk, you must set the data-dir-prefix parameter for the instances on that server.

In release 15.12 and later, instance numbers are always zero-based.  In the example above, the eight instances on server-0 have data-dir-prefix-<server>-<instance>=/path directives with instance numbers from zero to seven.  In prior releases, instance numbers for instances on servers other than server-0 were one-based.  See the description of the server-N configuration key above.

io-paths-list (optional) | Cluster |

A colon-separated list of absolute directory paths that non-administrative users are allowed to access. The list is empty by default. A typical setting might be /tmp:/dev/shm.

Ordinarily, non-administrative users must load and save data using relative pathnames that refer to per-user subdirectories of the instance data directories.  To allow load and save from absolute path names, those path names must be "covered" by entries on the io-paths-list.  See File I/O Restrictions.

key-file-list (optional) | Server |

Comma-separated list of files containing keys for ssh authentication. Default: /home/scidb/.ssh/id_rsa and /home/scidb/.ssh/id_dsa.

If you do not use the default name and path for the key, you must set a value for this parameter.

no-watchdog (optional) | Cluster |

Set this to true to prevent automatic restart of a SciDB instance when its OS process fails. Default: false.

By default, each SciDB instance spawns two Linux processes: one for the instance itself and one for a watchdog. The watchdog process detects whether the instance process fails. If it detects a failure, it forks a replacement process to bring the instance back online.

pg-port (optional) | Cluster | The TCP port on which Postgres listens for incoming connections. Default: 5432.
redundancy (optional) | Cluster |

Specifies the number of extra copies of array data that SciDB stores. Setting this parameter to a positive value is part of creating a fault-tolerant SciDB cluster. Default: 0 (SciDB stores only one copy of the data).

The number of data copies equals 1 + the redundancy value. The maximum value for the redundancy parameter is one less than the number of servers in your SciDB cluster, and it is at most eight. The number and identities of the servers are specified by the server-N configuration entries.

You can change the value throughout the lifetime of a cluster.

In SciDB 16.x and later, redundancy refers to the data copies maintained by the servers in a SciDB cluster rather than by the instances.

secure-scan-config (optional) | Cluster | Special configuration for the secure_scan operator. See Configuring The secure_scan Operator.
ssh-port (optional) | Cluster | The TCP port that ssh uses for communication within the cluster. Default: 22.
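Two rules from this table lend themselves to a short calculation. The Python sketch below generates data-dir-prefix lines that assign instances to disks round-robin, reproducing the four-disk, eight-instance example for server-0, and computes the largest permitted redundancy value. The function names, the /datadiskK mount points, and the <name>.<three-digit server>.<instance> directory pattern are assumptions copied from that example, not a SciDB convention. The sketch only produces configuration text; each directory must still exist and be empty before you initialize SciDB.

```python
def data_dir_prefix_lines(server: int, instances: int, disks: int,
                          name: str = "myserver") -> list:
    """Assign instances to /datadisk1 .. /datadiskK round-robin.

    Hypothetical helper; the path layout mirrors the example above.
    """
    lines = []
    for instance in range(instances):  # zero-based instance indices (15.12+)
        disk = instance % disks + 1    # disks are numbered from 1
        path = f"/datadisk{disk}/{name}.{server:03d}.{instance}"
        lines.append(f"data-dir-prefix-{server}-{instance}={path}")
    return lines


def max_redundancy(num_servers: int) -> int:
    """One less than the number of servers, and never more than eight."""
    if num_servers < 1:
        raise ValueError("a cluster needs at least one server")
    return min(num_servers - 1, 8)
```

Calling data_dir_prefix_lines(server=0, instances=8, disks=4) yields the eight data-dir-prefix-0-* lines of the example. With three servers, max_redundancy(3) is 2, so the cluster can hold up to 1 + 2 = 3 copies of each array.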

Performance Configuration

This table describes the configuration file elements for tuning your system performance.

Key | Scope | Applies To | Value
admin-queries (optional) | Cluster | SciDB Instance | The number of administrative queries you can schedule to execute in parallel. The number must be less than execution-threads. Default: 1.

buffermgr-threads-write (optional) | Cluster | SciDB Instance | Determines the total number of DB and TMP buffer writes that are processed concurrently. (Prior to 20.10.0, controlled by result-prefetch-queue-size.) Default: 4.

chunk-load-look-ahead (optional) | Cluster | Query | Changes ChunkLoader::getPrefetch(), which controls the maximum number of chunks read concurrently from a load file in one thread while those chunks are loaded into the system in another. (Prior to 20.10.0, controlled by result-prefetch-queue-size.) SciDB text format loading has a known bug if this is set below 3. Default: 3.
client-queries (optional) | Cluster | SciDB Instance |

The number of queries you can schedule to execute in parallel. The number must be less than (execution-threads - admin-queries). Default: execution-threads - admin-queries - 1.

execution-threads (optional) | Cluster | SciDB Instance | Size of the thread pool available for query execution: a shared pool of threads used by all queries for I/O and various other query execution tasks. Set this value to two more than the maximum number of concurrent queries you want. This value should exceed the value of result-prefetch-threads. Default: 5.
large-memalloc-limit (optional) | Cluster | SciDB Instance | Threshold limit on the maximum number of simultaneous large allocations for glibc malloc(). It has the same effect as the M_MMAP_MAX setting for malloc. Default: 65,536.
max-memory-limit (optional) | Cluster | SciDB Instance | The hard limit, in MB, on the amount of memory that the SciDB instance can allocate. If the instance requests more memory than this from the operating system, the allocation fails with an exception. Default: no limit.

materialized-cache-size (optional) | Cluster | SciDB Instance | Sets MaterializedArray::_cacheSize. (Prior to 20.10.0, controlled by result-prefetch-queue-size.) Default: 4.
mem-array-threshold (optional) | Cluster | SciDB Instance | Maximum size in MB of temporary data to cache in memory before writing it to temporary disk files. Default: 1024 MB.
merge-sort-buffer (optional) | Cluster | Thread | Size of the memory buffer used for sorting, in MB. Default: 512 MB.
operator-job-queue-threads (optional) | Cluster | Query |

Controls the amount of concurrent processing by load(), input(), and SortArray. Note that SortArray is used by sort() and redimension(). (Prior to 20.11.0, was controlled by result-prefetch-threads.) Default: 1.

replication-receive-queue-size (optional) | Cluster | SciDB Instance | The maximum size – in number of chunks – of the receive queue for replica chunks on each SciDB instance. Default: 16.
replication-send-queue-size (optional) | Cluster | SciDB Instance | The maximum size – in number of chunks – of the send queue for replica chunks on each SciDB instance. Default: 4.
result-prefetch-queue-size (optional) | Cluster | Query | Eliminated in 20.10.0. See instead materialized-cache-size, chunk-load-look-ahead, and buffermgr-threads-write.
result-prefetch-threads (optional) | Cluster | SciDB Instance | Eliminated in 20.10.0. See instead operator-job-queue-threads.
sg-receive-queue-size (optional) | Cluster | Query | A limit – in number of chunks – on how much of any SciDB instance's receive queue can be occupied by the chunks of an individual query. Default: 8.
sg-send-queue-size (optional) | Cluster | Query | A limit – in number of chunks – on how much of any SciDB instance's send queue the chunks of an individual query can occupy. Default: 16.
small-memalloc-size (optional) | Cluster | SciDB Instance | Small-allocation threshold size, in bytes, for the glibc memory allocator malloc(). malloc() treats all allocations larger than this size as "large" and passes them through to Linux mmap. It has the same effect as the M_MMAP_THRESHOLD setting for malloc. Default: 268,435,456 bytes (256 MB).
smgr-cache-size (optional) | Cluster | SciDB Instance | Size of memory in MB allocated to the shared cache of array chunks. The cache is used only for chunks belonging to persistent arrays. Default: 256 MB.
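The limits tying execution-threads, admin-queries, and client-queries together can be checked before you edit config.ini. The Python sketch below applies the documented defaults (execution-threads 5, admin-queries 1, and client-queries equal to execution-threads - admin-queries - 1) and the documented "must be less than" rules. The function name resolve_query_slots is hypothetical, and its requirement of at least one client query slot is an assumption rather than a documented rule.

```python
from typing import Optional


def resolve_query_slots(execution_threads: int = 5,
                        admin_queries: int = 1,
                        client_queries: Optional[int] = None) -> dict:
    """Fill in the client-queries default and enforce the documented limits.

    Hypothetical helper; keys mirror the config.ini names.
    """
    if admin_queries >= execution_threads:
        raise ValueError("admin-queries must be less than execution-threads")
    limit = execution_threads - admin_queries
    if client_queries is None:
        # Documented default: execution-threads - admin-queries - 1.
        client_queries = limit - 1
    if client_queries >= limit:
        raise ValueError("client-queries must be less than "
                         "execution-threads - admin-queries")
    if client_queries < 1:
        # Assumption, not a documented rule: keep at least one client slot.
        raise ValueError("no thread left for client queries")
    return {
        "execution-threads": execution_threads,
        "admin-queries": admin_queries,
        "client-queries": client_queries,
    }
```

With the defaults, resolve_query_slots() returns a client-queries value of 3. Raising execution-threads to 10 and admin-queries to 2 gives a default of 7.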