Configuring SciDB

SciDB is configured through a config.ini file.

File Format

The SciDB configuration file uses the INI file format. The values assigned to keys in config.ini should contain only uppercase and lowercase letters, digits, and the following characters: . , / - _ (period, comma, forward slash, hyphen, underscore).
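The character rule can be checked before SciDB reads the file. The Python function below is a minimal sketch and a hypothetical helper, not a SciDB tool. It assumes that an empty value is invalid and that "letters" means ASCII letters.

```python
import re

# ASCII letters, digits, and the five listed punctuation characters (. , / - _).
# The hyphen is placed last in the character class so it is matched literally.
_ALLOWED = re.compile(r"[A-Za-z0-9.,/_-]+")


def is_valid_value(value: str) -> bool:
    """Return True if value is non-empty and uses only the allowed characters."""
    return _ALLOWED.fullmatch(value) is not None


if __name__ == "__main__":
    for v in ["host.example.com,3", "/datadisk1/myserver.000.0", "my cluster", "pass=word"]:
        print(f"{v!r}: {is_valid_value(v)}")
```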

Basic SciDB Configuration

This table describes the basic configuration file settings for SciDB.

Each entry below gives the key, its scope in parentheses, and a description of its value.
Cluster name (scope: Cluster)
Name of the SciDB cluster. The cluster name must appear as a section heading in the config.ini file, for example [cluster1]. To avoid possible problems, use all lowercase letters in the cluster name.

security (scope: Cluster)

Sometimes called the "security mode", this parameter can have one of these values:

  • trust means that user authorization, or "namespaces mode", is disabled.
  • password means that password-based user authentication is required to use the cluster.
  • pam means that pluggable authentication modules will be used for user authentication.

See Enabling Security Mode for a complete description of how to switch between security modes.
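The sketch below shows the cluster name acting as the config.ini section heading. It also checks the security value against the three modes listed above. It uses Python's standard configparser module and assumes config.ini is plain INI that configparser can read. The sample content and the function read_security_mode are hypothetical and not part of SciDB.

```python
import configparser

# The three security modes described above.
SECURITY_MODES = {"trust", "password", "pam"}

# Hypothetical config.ini content: the section heading is the cluster name.
SAMPLE = """\
[cluster1]
security=password
"""


def read_security_mode(text: str, cluster: str) -> str:
    """Return the security value from the [cluster] section, rejecting unknown modes."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(cluster):
        raise KeyError(f"config.ini has no [{cluster}] section")
    mode = parser.get(cluster, "security")
    if mode not in SECURITY_MODES:
        raise ValueError(f"unknown security mode: {mode!r}")
    return mode


if __name__ == "__main__":
    print(read_security_mode(SAMPLE, "cluster1"))  # prints: password
```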

server-N (scope: Server)

The host name or IP address of server N, where N = 0, 1, 2, ..., followed by a comma and the index of the last instance to launch on that server. Instance indices are zero-based. For example, the server directive

server-2=host.example.com,3

says that server-2 has a hostname of host.example.com and will launch four SciDB instances, with instance numbers 0 through 3.

Note that the total number of instances on all servers cannot exceed the maximum number of Postgres connections allowed. This number is set by the max_connections parameter in the postgresql.conf file.


In releases prior to 15.12, the meaning of the number after the comma was different for server-1 and higher.  For server-0 the number was as described above, but for other servers it represented a count rather than an index, and instances were numbered starting from one rather than from zero.

The full format of this setting is:

server-N=IP|Hostname,[n,]m-p,q-s, ...

It specifies the set of instances with instance indices 0-n, m-p, q-s, ... on server N, identified by its IP address or host name. Here n, m, p, q, s are non-negative integers such that 0 <= n < m <= p < q <= s.
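Here is a minimal Python sketch of a parser for the server-N grammar. The function names parse_server and total_instances are hypothetical, and this is not SciDB's own parser. It assumes a bare number may appear only in the first position, as in the [n,] form. The total it reports is the figure to compare against the Postgres connection limit noted above.

```python
def parse_server(value: str) -> tuple[str, list[int]]:
    """Parse 'IP|Hostname,[n,]m-p,q-s,...' into (host, instance indices)."""
    host, *specs = [part.strip() for part in value.split(",")]
    if not host or not specs:
        raise ValueError(f"malformed server-N value: {value!r}")
    indices: list[int] = []
    for position, spec in enumerate(specs):
        if "-" in spec:
            low_text, high_text = spec.split("-", 1)
            low, high = int(low_text), int(high_text)
        elif position == 0:
            # A bare leading number n is shorthand for the range 0-n.
            low, high = 0, int(spec)
        else:
            raise ValueError(f"expected a range such as m-p, got {spec!r}")
        # Ranges must be non-empty, ascending, and must not overlap.
        if low > high or (indices and low <= indices[-1]):
            raise ValueError(f"ranges out of order in {value!r}")
        indices.extend(range(low, high + 1))
    return host, indices


def total_instances(servers: dict[str, str]) -> int:
    """Total number of instances across all server-N entries."""
    return sum(len(parse_server(value)[1]) for value in servers.values())


if __name__ == "__main__":
    servers = {
        "server-0": "host0.example.com,3",
        "server-1": "host1.example.com,1,4-5,7-8",
    }
    for key, value in servers.items():
        print(key, parse_server(value))
    # Compare this total against max_connections in postgresql.conf.
    print("total instances:", total_instances(servers))
```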


db_user (scope: Cluster)
User name to use in the catalog connection string, for example test1user. To avoid possible problems, use all lowercase letters for the db_user value.

install_root (scope: Cluster)
Full path to the SciDB installation directory.

pluginsdir (scope: Cluster)
Full path to the SciDB plugins directory, which contains all server plugins.

logconf (scope: Cluster)
Full path to the log4cxx logging configuration file.

requests (scope: Cluster)
The maximum number of client query requests queued for execution on any given instance. Requests in excess of the limit are returned to the client with an error. The default value is 1,000.
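The following sketch reads the basic keys from a cluster section with Python's configparser. It illustrates the settings under stated assumptions and is not SciDB behavior. The user name and paths are placeholders. It treats db_user, install_root, pluginsdir, and logconf as required, and applies the documented default of 1,000 when requests is absent.

```python
import configparser
import posixpath

# Hypothetical [cluster1] section; the user name and paths are placeholders.
SAMPLE = """\
[cluster1]
db_user=test1user
install_root=/opt/scidb
pluginsdir=/opt/scidb/plugins
logconf=/opt/scidb/etc/log4cxx.properties
"""

# Assumption for this sketch: these keys have no default and must be present.
REQUIRED_KEYS = ("db_user", "install_root", "pluginsdir", "logconf")
PATH_KEYS = ("install_root", "pluginsdir", "logconf")  # documented as full paths
DEFAULT_REQUESTS = 1000  # documented default for requests


def load_basic_settings(text: str, cluster: str) -> dict:
    """Read the basic keys from [cluster], applying the requests default."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[cluster]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"[{cluster}] is missing: {', '.join(missing)}")
    settings = {key: section[key] for key in REQUIRED_KEYS}
    for key in PATH_KEYS:
        if not posixpath.isabs(settings[key]):
            raise ValueError(f"{key} must be a full path, got {settings[key]!r}")
    settings["requests"] = section.getint("requests", fallback=DEFAULT_REQUESTS)
    return settings


if __name__ == "__main__":
    print(load_basic_settings(SAMPLE, "cluster1")["requests"])  # prints: 1000
```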

Cluster Configuration

This table describes the cluster configuration file parameters and how to set them.

Each entry below gives the key, its scope in parentheses, and a description of its value.
base-path (scope: Cluster)
The root data directory for each SciDB instance. Each SciDB instance uses an enumerated data directory below the base-path. For a running SciDB cluster, the list('instances') command shows all instances and their data directories.

base-port (optional; scope: Cluster)
Base port number. Each SciDB instance communicates via TCP port base-port + instance index, and clients can connect to any instance on its corresponding port. Default: 1239.
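Each instance's port is derived from base-port, so the ports a server uses can be computed directly. This Python sketch assumes the instance index is the zero-based per-server index from the server-N entry. The function instance_ports is hypothetical.

```python
DEFAULT_BASE_PORT = 1239  # documented default for base-port
MAX_TCP_PORT = 65535


def instance_ports(indices: list[int], base_port: int = DEFAULT_BASE_PORT) -> dict[int, int]:
    """Map each instance index on one server to its TCP port (base-port + index)."""
    ports = {i: base_port + i for i in indices}
    too_high = [p for p in ports.values() if p > MAX_TCP_PORT]
    if too_high:
        raise ValueError(f"ports exceed {MAX_TCP_PORT}: {too_high}")
    return ports


if __name__ == "__main__":
    # server-2=host.example.com,3 launches instances 0 through 3.
    print(instance_ports([0, 1, 2, 3]))  # prints: {0: 1239, 1: 1240, 2: 1241, 3: 1242}
```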

data-dir-prefix (optional; scope: SciDB instance)

With this parameter, the SciDB administrator can place instance data directories on multiple disks connected to a single server. The advantage of the data-dir-prefix parameter is that you can assign physical storage and file system locations to SciDB instances arbitrarily.

For example, if there are 4 disks and 8 instances on server-0, your configuration could be as follows:

data-dir-prefix-0-0=/datadisk1/myserver.000.0
data-dir-prefix-0-1=/datadisk2/myserver.000.1
data-dir-prefix-0-2=/datadisk3/myserver.000.2
data-dir-prefix-0-3=/datadisk4/myserver.000.3
data-dir-prefix-0-4=/datadisk1/myserver.000.4
data-dir-prefix-0-5=/datadisk2/myserver.000.5
data-dir-prefix-0-6=/datadisk3/myserver.000.6
data-dir-prefix-0-7=/datadisk4/myserver.000.7

You need not specify this parameter for every instance. For any omitted instance, SciDB creates a directory using the default naming scheme. If you do specify a value for this parameter, you must ensure that the specified directory exists and is completely empty; otherwise, errors occur when you initialize SciDB.

If a server has multiple storage disks, and you want to assign more than one instance to each disk, you must set the data-dir-prefix parameter for the instances on that server.

In release 15.12 and later, instance numbers are always zero-based.  In the example above, the eight instances on server-0 have data-dir-prefix-<server>-<instance>=/path directives with instance numbers from zero to seven.  In prior releases, instance numbers for instances on servers other than server-0 were one-based.  See the description of the server-N configuration key above.
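The disk layout in the data-dir-prefix example follows a round-robin pattern: instance i goes to disk i mod 4. The Python sketch below generates such directives. It also lists directories that would violate the exists-and-empty requirement. The helper names and the directory naming scheme are hypothetical. The naming is copied from the example, not required by SciDB.

```python
import os


def data_dir_prefix_lines(server: int, instances: int, disks: list[str], tag: str) -> list[str]:
    """Spread instances over disks round-robin and emit data-dir-prefix directives.

    The directory naming (<disk>/<tag>.<server as 3 digits>.<instance>) copies
    the example above; it is not a naming rule SciDB requires.
    """
    lines = []
    for instance in range(instances):
        disk = disks[instance % len(disks)]
        path = f"{disk}/{tag}.{server:03d}.{instance}"
        lines.append(f"data-dir-prefix-{server}-{instance}={path}")
    return lines


def unusable_dirs(paths: list[str]) -> list[str]:
    """Return the paths that do not exist or are not empty."""
    return [p for p in paths if not os.path.isdir(p) or os.listdir(p)]


if __name__ == "__main__":
    disks = ["/datadisk1", "/datadisk2", "/datadisk3", "/datadisk4"]
    lines = data_dir_prefix_lines(0, 8, disks, "myserver")
    print("\n".join(lines))
    # Before initializing SciDB, every listed directory should exist and be empty.
    print(unusable_dirs([line.split("=", 1)[1] for line in lines]))
```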

io-paths-list (optional; scope: Cluster)

A colon-separated list of absolute directory paths that non-administrative users are allowed to access. The list is empty by default. A typical setting might be /tmp:/dev/shm.

Ordinarily, non-administrative users must load and save data using relative pathnames that refer to per-user subdirectories of the instance data directories.  To allow load and save from absolute path names, those path names must be "covered" by entries on the io-paths-list.  See File I/O Restrictions.
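The documentation does not define "covered" precisely. A reasonable reading is that the path, once normalized, equals or lies below one of the listed directories. The Python sketch below implements that assumption. The function is_covered is hypothetical. It does not resolve symbolic links, which a real check would likely need to do.

```python
import posixpath


def is_covered(path: str, io_paths_list: str) -> bool:
    """Return True if path lies at or below an entry of the colon-separated list.

    Assumption: "covered" means containment after normalization. Symbolic
    links are NOT resolved here.
    """
    target = posixpath.normpath(path)  # collapses '..' and duplicate slashes
    for entry in filter(None, io_paths_list.split(":")):
        root = posixpath.normpath(entry)
        # Compare on a trailing '/' so that /tmpfoo is not treated as under /tmp.
        if target == root or target.startswith(root.rstrip("/") + "/"):
            return True
    return False


if __name__ == "__main__":
    allowed = "/tmp:/dev/shm"
    for p in ["/tmp/data.csv", "/tmpfoo/data.csv", "/tmp/../etc/passwd"]:
        print(p, is_covered(p, allowed))
```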

pg-port (optional; scope: Cluster)
The TCP port on which Postgres listens for incoming connections. Default: 5432.

redundancy (optional; scope: Cluster)

The number of extra copies of array data that SciDB stores. Setting this parameter to a positive value is part of creating a fault-tolerant SciDB cluster. Default: 0 (SciDB stores only one copy of the data).

The number of data copies equals 1 + the redundancy value. The maximum value for the redundancy parameter is one less than the number of servers in your SciDB cluster, and never more than eight. The number and identities of the servers are specified by the server-N configuration entries.

You can change the value throughout the lifetime of a cluster.

In SciDB versions 16.x and later, the redundancy refers to the data copies maintained by the servers in a SciDB cluster rather than by the instances.
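The redundancy rules reduce to simple arithmetic. The number of copies is 1 + redundancy. Redundancy may range from 0 up to the smaller of (servers - 1) and 8. The Python sketch below encodes those two rules. The function names are hypothetical.

```python
MAX_REDUNDANCY = 8  # documented upper bound on redundancy


def max_redundancy(num_servers: int) -> int:
    """Largest allowed redundancy: one less than the server count, capped at 8."""
    return min(num_servers - 1, MAX_REDUNDANCY)


def data_copies(redundancy: int, num_servers: int) -> int:
    """Number of stored copies of array data (1 + redundancy)."""
    if not 0 <= redundancy <= max_redundancy(num_servers):
        raise ValueError(
            f"redundancy must be between 0 and {max_redundancy(num_servers)} "
            f"for {num_servers} servers, got {redundancy}"
        )
    return 1 + redundancy


if __name__ == "__main__":
    for servers in (1, 3, 12):
        print(servers, "servers -> max redundancy", max_redundancy(servers))
```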

ssh-port (optional; scope: Cluster)
The TCP port that ssh uses for communication within the cluster. Default: 22.

key-file-list (optional; scope: Server)

Comma-separated list of filenames that include keys for ssh authentication. Default: /home/scidb/.ssh/id_rsa and id_dsa.

If you do not use the default name and path for the key, you must set a value for this parameter.

no-watchdog (optional; scope: Cluster)
Set this to true to disable automatic restart of a SciDB instance when its OS process fails. Default: false.

By default, each SciDB instance spawns two Linux processes: one for the instance itself and one as a watchdog. The watchdog process detects whether the instance process fails. If it detects a failure, it forks a replacement process to bring the instance back online.
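The watchdog behavior can be illustrated with a small supervisor loop. This Python sketch is not SciDB's implementation. It launches a child with subprocess rather than fork() and treats any non-zero exit status, including death by signal, as a failure. It also caps the number of restarts through max_restarts, a hypothetical parameter, so it always terminates.

```python
import subprocess
import sys


def run_with_watchdog(cmd: list[str], max_restarts: int = 3) -> tuple[int, int]:
    """Run cmd and start a replacement each time it exits with a failure status.

    Returns (restarts performed, final exit status). max_restarts bounds the
    loop so that this sketch cannot restart a crashing process forever.
    """
    restarts = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status == 0 or restarts >= max_restarts:
            return restarts, status
        restarts += 1  # failure detected: launch a replacement process


if __name__ == "__main__":
    # A child that always fails, standing in for a crashing instance process.
    crashing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_with_watchdog(crashing, max_restarts=2))  # prints: (2, 1)
```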

Performance Configuration

This table describes the configuration file elements for tuning your system performance.

Each entry below gives the key, its scope and the entity it applies to in parentheses, and a description of its value.

mem-array-threshold (optional; scope: Cluster; applies to: SciDB instance)
Maximum size, in MB, of temporary data to cache in memory before writing it to temporary disk files. Default: 1024 MB.

smgr-cache-size (optional; scope: Cluster; applies to: SciDB instance)
Size of memory, in MB, allocated to the shared cache of array chunks. The cache is used only for chunks belonging to persistent arrays. Default: 256 MB.

max-memory-limit (optional; scope: Cluster; applies to: SciDB instance)
The hard maximum amount of memory, in MB, that the SciDB instance can allocate. If the instance requests more memory from the operating system, the allocation fails with an exception. Default: no limit.

merge-sort-buffer (optional; scope: Cluster; applies to: Thread)
Size of the memory buffer used for sorting, in MB. Default: 512 MB.

small-memalloc-size (optional; scope: Cluster; applies to: SciDB instance)
Small-allocation threshold size, in bytes, for the glibc memory allocator, malloc(). malloc() treats every allocation larger than this size as "large" and passes it through to Linux mmap. This has the same effect as the M_MMAP_THRESHOLD setting for malloc. Default: 268,435,456 bytes (256 MB).

large-memalloc-limit (optional; scope: Cluster; applies to: SciDB instance)
Limit on the maximum number of simultaneous large allocations for glibc malloc(). This has the same effect as the M_MMAP_MAX setting for malloc. Default: 65,536.

replication-receive-queue-size (optional; scope: Cluster; applies to: SciDB instance)
The maximum size, in chunks, of the receive queue for replica chunks on each SciDB instance. Default: 16.

replication-send-queue-size (optional; scope: Cluster; applies to: SciDB instance)
The maximum size, in chunks, of the send queue for replica chunks on each SciDB instance. Default: 4.

sg-receive-queue-size (optional; scope: Cluster; applies to: Query)
A limit, in chunks, on how much of any SciDB instance's receive queue the chunks of an individual query can occupy. Default: 8.

sg-send-queue-size (optional; scope: Cluster; applies to: Query)
A limit, in chunks, on how much of any SciDB instance's send queue the chunks of an individual query can occupy. Default: 16.

execution-threads (optional; scope: Cluster; applies to: SciDB instance)
Size of the thread pool available for query execution: a shared pool of threads that all queries use for I/O and various other query execution tasks. Set this value to two more than the maximum number of concurrent queries you want. This value should exceed the value of result-prefetch-threads. Default: 5.

admin-queries (optional; scope: Cluster; applies to: SciDB instance)
The number of administrative queries you can schedule to execute in parallel. The number must be less than execution-threads. Default: 1.

client-queries (optional; scope: Cluster; applies to: SciDB instance)
The number of queries you can schedule to execute in parallel. The number must be less than (execution-threads - admin-queries). Default: execution-threads - admin-queries - 1.

operator-threads (optional; scope: Cluster; applies to: Query)
Number of threads used per query. This setting limits the number of threads allocated to each multithreaded operator in a query. If operator-threads is unspecified, SciDB detects the number of CPU cores and uses that value. When running multiple instances on each server, you must set operator-threads lower than the number of CPU cores, because the SciDB instances share the same set of CPU cores. Default: number of CPU cores.

result-prefetch-threads (optional; scope: Cluster; applies to: SciDB instance)
Per-instance threads available for certain asynchronous tasks, including data sorting. Default: 4.

result-prefetch-queue-size (optional; scope: Cluster; applies to: Query)
Per-query number of asynchronous tasks that can be submitted to the prefetch thread pool in parallel. Default: 4.
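Several of the thread settings above constrain one another. The Python sketch below collects the documented relationships in one place:

- execution-threads should exceed result-prefetch-threads.
- admin-queries must be less than execution-threads.
- client-queries must be less than execution-threads minus admin-queries.

The class ThreadSettings and the helper execution_threads_for are hypothetical. The sketch treats "should exceed" as a strict inequality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreadSettings:
    execution_threads: int = 5  # documented defaults
    admin_queries: int = 1
    client_queries: Optional[int] = None  # None means "use the documented default"
    result_prefetch_threads: int = 4

    def effective_client_queries(self) -> int:
        """client-queries, falling back to execution-threads - admin-queries - 1."""
        if self.client_queries is None:
            return self.execution_threads - self.admin_queries - 1
        return self.client_queries

    def problems(self) -> list[str]:
        """Return a message for each documented constraint that is violated."""
        errs = []
        if self.execution_threads <= self.result_prefetch_threads:
            errs.append("execution-threads should exceed result-prefetch-threads")
        if self.admin_queries >= self.execution_threads:
            errs.append("admin-queries must be less than execution-threads")
        if self.effective_client_queries() >= self.execution_threads - self.admin_queries:
            errs.append("client-queries must be less than execution-threads - admin-queries")
        return errs


def execution_threads_for(max_concurrent_queries: int) -> int:
    """Documented advice: two more than the desired number of concurrent queries."""
    return max_concurrent_queries + 2


if __name__ == "__main__":
    defaults = ThreadSettings()
    print(defaults.effective_client_queries(), defaults.problems())  # prints: 3 []
```

With admin-queries at its default of 1, setting execution-threads to c + 2 makes the default client-queries equal to (c + 2) - 1 - 1 = c. This matches the advice to set execution-threads two above the desired number of concurrent queries.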





