Configuring SciDB
SciDB configuration is achieved through a config.ini file.
File Format
The SciDB configuration file uses the INI file format. The values assigned to the keys in config.ini should contain only upper and lower case letters, numbers, and the following characters: .,/-_ (period, comma, forward slash, hyphen, underscore).
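The character rule above can be checked mechanically before a file is deployed. The following is a minimal sketch, not part of SciDB: the function name is_plain_value is hypothetical, and the regular expression encodes only the characters listed in this section.

```python
import re

# Letters, digits, and the five punctuation characters listed above: . , / - _
_ALLOWED = re.compile(r"[A-Za-z0-9.,/\-_]*")


def is_plain_value(value: str) -> bool:
    """Return True if value uses only the characters this section lists."""
    return _ALLOWED.fullmatch(value) is not None


print(is_plain_value("host.example.com,3"))  # letters, digits, period, comma
print(is_plain_value("my data dir"))         # contains spaces
```

Settings whose values are JSON, such as http-headers, necessarily contain other characters, so a check like this is probably best treated as a warning rather than a hard failure.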
Basic SciDB Configuration
This table describes the basic configuration file settings for SciDB.
Key | Scope | Value |
---|---|---|
[<clustername>] | Cluster | Name of the SciDB cluster. The cluster name must appear as a section heading in the config.ini file, e.g. [cluster1]. To avoid possible problems, using all lowercase for the cluster name is recommended. (The square brackets are literal. <clustername> is the name you choose.) |
db-user | Cluster | User name to use in the catalog connection string. This example uses test1user. To avoid possible problems, using all lowercase for the db-user parameter is recommended. |
install-root | Cluster | Full path to the SciDB installation directory. |
logconf | Cluster | Full path to the log4xx logging configuration file. |
pluginsdir | Cluster | Full path to the SciDB plugins directory containing all server plugins. |
requests | Cluster | The maximum number of client query requests queued for execution on any given instance. Any requests in excess of the limit return to the client with an error. The default value is 1,000. |
security | Cluster | Sometimes called the "security mode", this parameter selects which security mode the cluster runs in. See Enabling Security Mode for a complete description of how to switch between security modes. |
server-N | Server | The host name or IP address of server N, where N = 0, 1, 2, ..., followed by a comma, followed by the index of the last instance to launch on the server. Instance indices are zero-based. For example, a server-2 entry with the value host.example.com,3 says that server-2 has a hostname of host.example.com and will launch four SciDB instances, with instance numbers 0 through 3. Note that the total number of instances on all servers cannot exceed the maximum number of Postgres connections allowed. This number is specified in the postgresql.conf file. In releases prior to 15.12, the meaning of the number after the comma was different for server-1 and higher. For server-0 the number was as described above, but for other servers it represented a count rather than an index, and instances were numbered starting from one rather than from zero. A server-N entry can also specify a set of instances with instance indices 0-n,m-p,q-s on server N identified by IP or Hostname, where n, m, p, q, s are positive integers such that 0 <= n < m <= p < q <= s. |
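The simple form of a server-N value (host, comma, index of the last instance) can be unpacked as sketched below. This is an illustration only: parse_server_entry is a hypothetical helper, it follows the zero-based meaning used in release 15.12 and later, and it does not handle the range form with multiple index intervals.

```python
def parse_server_entry(value: str) -> tuple[str, list[int]]:
    """Split 'host,last_index' into the host and its zero-based instance indices."""
    host, _, last = value.partition(",")
    last_index = int(last)  # raises ValueError if the index is missing or not a number
    if not host.strip() or last_index < 0:
        raise ValueError(f"malformed server entry: {value!r}")
    # The number after the comma is the index of the LAST instance,
    # so the server launches last_index + 1 instances.
    return host.strip(), list(range(last_index + 1))


host, instances = parse_server_entry("host.example.com,3")
print(host, instances)
```

Summing the lengths of the instance lists over all server-N entries gives the total instance count to compare against the Postgres connection limit.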
Cluster Configuration
This table describes the cluster configuration file parameters and how to set them.
Key | Scope | Value |
---|---|---|
base-path | Cluster | The root data directory for each SciDB instance. Each SciDB instance uses an enumerated data directory below the base-path. The list('instances') command shows all instances and their data directories for a running SciDB cluster. |
base-port (optional) | Cluster | Base port number. The SciDB instances communicate via the TCP port = base-port + instance index. Clients can connect to any of the instances on their corresponding ports. The default base-port is 1239. |
data-dir-prefix (optional) | SciDB Instance | The SciDB administrator can provide file system directories that refer to multiple disks connected to a single server. The advantage of using the data-dir-prefix parameter is that you can arbitrarily assign physical storage and filesystem locations to SciDB instances. For example, if there are 4 disks and 8 instances on server-0, your configuration could include entries such as data-dir-prefix-0-0=/datadisk1/myserver.000.0 for the instances. You need not specify this parameter for each instance. For any omitted instance, SciDB creates a folder using the default naming scheme. If you do specify a value for this parameter, you must ensure that the specified folder exists and that it is completely empty. Otherwise, errors occur when you try to initialize SciDB. If a server has multiple storage disks, and you want to assign more than one instance to each disk, you must set the data-dir-prefix parameter for the instances on that server. In release 15.12 and later, instance numbers are always zero-based. In the example above, the eight instances on server-0 have instance numbers 0 through 7. |
io-paths-list (optional) | Cluster | A colon-separated list of absolute directory paths that non-administrative users are allowed to access. The list is empty by default. Ordinarily, non-administrative users must load and save data using relative pathnames that refer to per-user subdirectories of the instance data directories. To allow load and save from absolute path names, those path names must be "covered" by entries on the io-paths-list. See File I/O Restrictions. |
key-file-list (optional) | Server | Comma-separated list of filenames that include keys for ssh authentication. Default: /home/scidb/.ssh/id_rsa and id_dsa. If you do not use the default name and path for the key, you must set a value for this parameter. |
low-disk-space-threshold-mb | Cluster | SciDB will go into read-only mode if the free disk space for any datastore device drops below this threshold. Default: 1024 MiB. If the low disk space threshold is crossed, a warning message is logged with the name of the device. A system administrator can free space or provision more space, and then restore normal operation using the lock_arrays operator. |
no-watchdog (optional) | Cluster | Set this to true to avoid automatic restart of the SciDB instance on an OS process failure. Default: false. By default, each SciDB instance spawns two Linux processes; one for the instance itself, and one as a watchdog. The watchdog process detects if the instance process fails. If it detects a failure, it forks a new, replacement process, to bring the instance back online. |
pg-port (optional) | Cluster | The listening TCP port of Postgres—the port on which Postgres accepts incoming connections. Default: 5432. |
redundancy (optional) | Cluster | The number of extra copies of array data to store. Setting this parameter to a positive value is part of creating a fault-tolerant SciDB cluster. Default: 0 (meaning SciDB stores only one copy of data). The number of data copies equals 1 + the redundancy value. The maximum value for the redundancy parameter is one less than the number of servers in your SciDB cluster, and at most eight. The number and identities of the servers are specified by the server-N configuration entries. You can change the value throughout the lifetime of a cluster. In SciDB versions 16.x and later, redundancy refers to the data copies maintained by the servers in a SciDB cluster rather than by the instances. |
secure-scan-config | Cluster | Special configuration for the secure_scan operator. See Configuring The secure_scan Operator. |
ssh-port (optional) | Cluster | The TCP port ssh uses for communications within the cluster. Default: 22. |
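Two of the rows above carry small arithmetic rules: the port of an instance is base-port plus its instance index, and the redundancy value is bounded by the number of servers. The sketch below restates them; the function names are hypothetical and the defaults are the ones quoted in the table.

```python
def instance_port(instance_index: int, base_port: int = 1239) -> int:
    """TCP port of an instance: base-port plus the instance index."""
    return base_port + instance_index


def max_redundancy(server_count: int) -> int:
    """Largest allowed redundancy: one less than the server count, capped at eight."""
    return max(0, min(server_count - 1, 8))


def data_copies(redundancy: int) -> int:
    """Number of stored copies of array data: 1 + redundancy."""
    return 1 + redundancy


print(instance_port(3))     # port for instance index 3 with the default base-port
print(max_redundancy(4))    # upper bound for a four-server cluster
print(data_copies(0))       # default redundancy stores a single copy
```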
HTTP and HTTPS Configuration
Important HTTPS configuration parameters
Key | Scope | Default | Value |
---|---|---|---|
http-base-port | Cluster | 8239 | Port number for HTTP or HTTPS connections to the first SciDB instance. Subsequent instances use successive port numbers in sequence. This port uses HTTPS if SciDB has security mode enabled. |
https-cert | Cluster | (none) | Path to an X.509 certificate file. This is required if SciDB is running with security mode enabled. When you set this, you must also set https-key. See Configuring HTTPS access for instructions on creating an X.509 certificate or migrating one from an existing Shim installation. |
https-key | Cluster | (none) | Path to an X.509 private key file. If you are using a single file that contains both the certificate and the private key, this must be set to the same path as https-cert. See Configuring HTTPS access for instructions on creating an X.509 certificate or migrating one from an existing Shim installation. |
Javascript security parameters (CORS)
In order for a Javascript application running in a browser to connect to SciDB, the server must authorize the browser to run Javascript from the origin site where the Javascript file is hosted. This is known as the CORS (Cross-Origin Resource Sharing) browser scripting security model.
If you have Javascript applications that you want to run against SciDB, you must set the http-allow-origins configuration parameter to authorize the sites that host the Javascript files:
Key | Scope | Default | Value |
---|---|---|---|
http-allow-origins | Cluster | (none) | A JSON list of server URLs (origins) that are allowed to provide Javascript that a web browser can run on this SciDB cluster. The default setting is empty, meaning that Javascript won't work against this cluster from a normal browser until this option is set. |
If you don’t have any applications that connect to SciDB using Javascript that runs in a user’s web browser, you don’t need to set this parameter.
Network parameters
In most cases, you shouldn’t need to set these. However, you might need to modify these parameters in a few cases:
if network security settings are preventing users from connecting to SciDB
if a user is connecting to SciDB from a browser or an application that has specific security requirements
Key | Scope | Default | Value |
---|---|---|---|
dns-suffix | Cluster | (none) | DNS suffix for the subnet that the SciDB instances reside in. A client should be able to use If not provided, clients from outside of the subnet might not be able to access some URLs generated by SciDB. |
http-bind-hostname | Cluster | | |
http-auth-cookie-name | Cluster | | Name of the authorization cookie used to continue an HTTP session. Prefix with |
http-auth-cookie-attributes | Cluster | | Attributes for the authorization cookie issued by SciDB, separated with semicolons (;). See Using HTTP cookies - HTTP | MDN for possible values. These might need to be adjusted for web applications that connect to SciDB from certain browsers. |
http-headers | Cluster | (none) | A JSON structure specifying additional HTTP headers to include in every response. Extra headers might be needed to get past proxy servers, caches, or firewalls on the network, or to satisfy security requirements in a client application or browser. In the SciDB configuration file, the JSON must be surrounded by double or single quotes, and all lines except the first must be indented. For example, a two-line configuration puts http-headers='{"Cache-Control": "no-store", on the first line and the indented continuation "X-Scidb-Cluster-Name": "scidb-101"}' on the second. |
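The quoting and indentation rule for http-headers can be tried out with a generic INI parser. The sketch below uses Python's configparser as a stand-in; it is an assumption that SciDB's own parser treats continuation lines the same way, and the cluster name cluster1 and the helper read_http_headers are illustrative only.

```python
import configparser
import json

# A two-line http-headers value: quoted JSON, with the second line indented.
SAMPLE = """\
[cluster1]
http-headers='{"Cache-Control": "no-store",
    "X-Scidb-Cluster-Name": "scidb-101"}'
"""


def read_http_headers(ini_text: str, cluster: str) -> dict:
    """Read http-headers from an INI string and decode the quoted JSON."""
    # Only '=' separates keys from values, so ':' inside the JSON is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(ini_text)
    raw = parser[cluster]["http-headers"].strip()
    # Remove the surrounding single or double quotes required by the config format.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        raw = raw[1:-1]
    return json.loads(raw)


headers = read_http_headers(SAMPLE, "cluster1")
print(headers)
```

If the second line were not indented, configparser would read it as a new key rather than a continuation, which is one way to see why the indentation matters.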
Timeout settings
You might need to adjust these parameters if queries or user sessions are timing out before they are finished.
Key | Scope | Default | Value |
---|---|---|---|
http-startup-timeout | Cluster | 20 | Maximum number of seconds to wait while attempting to start up the HTTP server listener. If this is exceeded, the port is likely blocked or busy running another listener. In general, you should not need to change this setting. |
http-connection-idle-timeout | Cluster | 600 | Timeout (in seconds) for disconnecting an inactive HTTP client. This only manages how long connections are cached in the connection pool; clients can transparently reconnect and resume their sessions without reauthenticating, as long as their authorization cookie is still valid (see http-auth-cookie-expiration). |
http-auth-cookie-expiration | Cluster | 900 | Default time (in seconds) after which an authorization cookie expires. A client receives a new authorization cookie with a new expiration timestamp every time it interacts with the server; the only way for the client’s authorization to expire is if it doesn’t interact with the server at all for at least this long. |
http-idle-session-timeout | Cluster | 7200 | An HTTP session expires if the client doesn't interact with it for this many seconds and if no query is actively executing in the session. This means that interim query results and temporary arrays created in the session will be lost. |
http-access-control-max-age | Cluster | 3600 | Value for the Access-Control-Max-Age response header. This is only relevant for web applications running in Javascript in a user’s browser. |
http-default-lock-timeout | Cluster | 60 | Default timeout (in seconds) for the server to prepare a query on the HTTP API. If this timeout is exceeded, it is likely because another query has a lock on an array that the query needs to use. |
http-default-first-page-timeout | Cluster | 3600 | Default timeout for the server to return the first page of results from a query on the HTTP API, measured in seconds since the query was prepared (for non-paged queries) or since the client requested the page (for paged queries). This timeout includes query execution and fetching the first page of results. |
http-default-next-page-timeout | Cluster | 600 | Default timeout for the server to return a page other than the first, measured in seconds since the client requested the page. (This is set to a relatively small value because the fetch time for each page is expected to be small after the first page is fetched.) |
http-default-query-inactivity-timeout | Cluster | 300 | Amount of time that the client has to request the next page of query results before the query times out, measured in seconds since the last response was sent to the client. (This is deliberately set to a small value to ensure that the query gets cleaned up quickly if the SciDBR client crashes.) |
HTTP performance-related settings
Key | Scope | Default | Value |
---|---|---|---|
http-threads | Cluster | 2 | Number of HTTP server threads per instance. |
http-aio-save-buffer-size-bytes | Cluster | 1 MB | When the output format uses the |
http-aio-save-flush-bytes | Cluster | 5 MB | When the output format uses the |
Query Optimization Settings
Key | Scope | Default | Value |
---|---|---|---|
enable-optimize-pushdown | Cluster | true | Enables filter-pushdown and projection-pushdown. |
Performance Configuration
This table describes the configuration file elements for tuning your system performance.
Key | Applies To | Value |
---|---|---|
admin-queries (optional) | SciDB Instance | The number of administrative queries you can schedule to execute in parallel. The number must be less than execution-threads. Default: 1. |
buffermgr-threads-write (optional) | SciDB Instance | Determines the total number of DB and TMP buffer writes which are processed concurrently. (Prior to 20.10.0, controlled by result-prefetch-queue-size). Default: 4. |
chunk-load-look-ahead (optional) | Query | Changes ChunkLoader::getPrefetch() which controls the maximum number of chunks read from a load file concurrently in one thread while those chunks are loaded into the system in another. (Prior to 20.10.0, controlled by result-prefetch-queue-size). SciDB text format loading has a known bug if set below 3. Default: 3. |
client-queries (optional) | SciDB Instance | The number of queries you can schedule to execute in parallel. The number must be less than (execution-threads - admin-queries). Default: execution-threads - admin-queries - 1. |
execution-threads (optional) | SciDB Instance | Size of the thread pool available for query execution. This shared pool of threads is used by all queries for I/O and various other query execution tasks. Set this value to two more than the maximum number of concurrent queries you want. This parameter value should exceed the value of result-prefetch-threads. Default: 5. |
large-memalloc-limit (optional) | SciDB Instance | Threshold limit on the maximum number of simultaneous large allocations for glibc malloc(). It has the same effect as the M_MMAP_MAX setting for malloc. Default: 65,536. Removed as of 23.x due to change to a different malloc library. |
max-memory-limit (optional) but required in practice | SciDB Instance | When a query and its operators attempt to allocate memory that would cause the total amount of allocated memory to exceed max-memory-limit, the query is cancelled. Setting this to a value somewhat smaller than the total amount of physical memory per server divided by the number of instances usually prevents SciDB or other processes from being killed by the kernel's OOM killer; instead, the SciDB query is cancelled. The value has historically been set by our Customer Services group, who have experience determining how large “somewhat smaller than” needs to be in practice. Although SciDB will run without this being set, it should not be left unset in a production environment. Default: unlimited. |
materialized-cache-size (optional) | SciDB Instance | Sets MaterializedArray::_cacheSize. (Prior to 20.10.0, controlled by result-prefetch-queue-size.) Default: 4. |
mem-array-threshold (optional) | SciDB Instance | Maximum size in MB of temporary data to cache in memory before writing to temporary disk files. Default: 1024 MB. |
merge-sort-buffer (optional) | Thread | Size of memory buffer used for sorting, in MB. Default: 512 MB. |
operator-job-queue-threads (optional) | Query | Controls the amount of concurrent processing by load(), input(), and SortArray. Note that SortArray is used by sort() and redimension(). (Prior to 20.11.0, was controlled by result-prefetch-threads.) Default: 1. |
replication-receive-queue-size (optional) | SciDB Instance | The maximum size – in number of chunks – of the receive queue for the replica chunks on each SciDB instance. Default: 16. |
replication-send-queue-size (optional) | SciDB Instance | The maximum size – in number of chunks – of the send queue for the replica chunks on each SciDB instance. Default: 4. |
result-prefetch-queue-size (optional) | Query | Eliminated in 20.10.0. See buffermgr-threads-write, chunk-load-look-ahead, and materialized-cache-size instead. |
result-prefetch-threads (optional) | SciDB Instance | Eliminated in 20.10.0. See operator-job-queue-threads instead. |
sg-receive-queue-size (optional) | Query | A limit – in number of chunks – on how much of any SciDB instance's receive queue can be occupied by the chunks of an individual query. Default: 8. |
sg-send-queue-size (optional) | Query | A limit – in number of chunks – on how much of any SciDB instance's send queue the chunks of an individual query can occupy. Default: 16. |
small-memalloc-size (optional) | SciDB Instance | Small allocation threshold size in bytes for the glibc memory allocator, i.e. malloc(). malloc() treats all memory allocations larger than this size as "large" and passes them through to Linux mmap. It has the same effect as the M_MMAP_THRESHOLD setting for malloc. Default: 268,435,456 bytes (256 MB). Removed as of 23.x due to change to a different malloc library. |
smgr-cache-size (optional) | SciDB Instance | Size of memory in MB allocated to the shared cache of array chunks. The cache is used only for the chunks belonging to persistent arrays. Default: 256 MB. |
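The constraints that tie admin-queries, client-queries, and execution-threads together, and the sizing guidance for max-memory-limit, can be written down as simple checks. The sketch below is illustrative only: the function names are hypothetical, and the headroom fraction of 0.8 is an assumption standing in for "somewhat smaller than", not a recommended value.

```python
def default_client_queries(execution_threads: int = 5, admin_queries: int = 1) -> int:
    """Default for client-queries: execution-threads - admin-queries - 1."""
    return execution_threads - admin_queries - 1


def check_query_limits(execution_threads: int, admin_queries: int,
                       client_queries: int) -> list[str]:
    """Return a list of violated constraints (empty if the settings are consistent)."""
    problems = []
    if not admin_queries < execution_threads:
        problems.append("admin-queries must be less than execution-threads")
    if not client_queries < execution_threads - admin_queries:
        problems.append(
            "client-queries must be less than execution-threads - admin-queries")
    return problems


def suggested_max_memory_limit(server_memory: float, instances_per_server: int,
                               headroom: float = 0.8) -> int:
    """Per-instance share of physical memory, scaled down by an ASSUMED headroom.

    The result is in the same unit as server_memory.
    """
    return int(server_memory / instances_per_server * headroom)


print(default_client_queries())       # with the table defaults
print(check_query_limits(5, 1, 4))    # client-queries set too high
```

The right headroom depends on what else runs on the server, so the fraction should be chosen per deployment rather than taken from this sketch.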