Configuring SciDB
SciDB is configured through a config.ini file.
File Format
The SciDB configuration file uses the INI file format. The values assigned to the identifiers in config.ini should contain only uppercase and lowercase letters, numbers, and the following characters: . , / - _ (period, comma, forward slash, hyphen, underscore).
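
The character rule above can be checked mechanically before a configuration is deployed. The following is a minimal sketch, assuming Python's standard configparser module is an acceptable reader for the file; the sample section, the install_root and base-path values, and the bad_key entry are hypothetical and exist only to show one passing and one failing case.

```python
import configparser
import re

# Characters the text above lists as safe for values:
# letters, digits, and . , / - _
ALLOWED_VALUE = re.compile(r"[A-Za-z0-9.,/_-]*")

# Hypothetical sample; section name, paths, and bad_key are illustrative only.
SAMPLE_CONFIG = """
[cluster1]
db_user=test1user
install_root=/opt/scidb
base-path=/data/scidb
bad_key=some value!
"""


def find_suspect_values(config_text):
    """Return (section, key, value) triples whose value uses other characters."""
    # interpolation=None keeps '%' in values from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    suspects = []
    for section in parser.sections():
        for key, value in parser.items(section):
            if not ALLOWED_VALUE.fullmatch(value):
                suspects.append((section, key, value))
    return suspects


if __name__ == "__main__":
    for section, key, value in find_suspect_values(SAMPLE_CONFIG):
        print(f"[{section}] {key}: suspicious value {value!r}")
```

This is a lint-style check, not a statement of what SciDB itself accepts or rejects; it only flags values that fall outside the recommended character set.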
Basic SciDB Configuration
This table describes the basic configuration file settings for SciDB.
Key | Scope | Value |
---|---|---|
Cluster name | Cluster | Name of the SciDB cluster. The cluster name must appear as a section heading in the config.ini file, e.g. [cluster1]. To avoid possible problems, use an all-lowercase cluster name. |
security | Cluster | Sometimes called the "security mode", this parameter can have one of these values:
See Enabling Security Mode for a complete description of how to switch between security modes. |
server-N | Server | The host name or IP address of server N, where N = 0, 1, 2, ..., followed by a comma, followed by the index of the last instance to launch on the server. Instance indices are zero-based. For example, the server directive server-2=host.example.com,3 says that server-2 has a hostname of host.example.com and will launch four SciDB instances, with instance numbers 0 through 3. Note that the total number of instances on all servers cannot exceed the maximum number of Postgres connections allowed. This number is specified in the postgresql.conf file. In releases prior to 15.12, the meaning of the number after the comma was different for server-1 and higher: for server-0 the number was as described above, but for other servers it represented a count rather than an index, and instances were numbered starting from one rather than from zero. A server directive can also specify a set of instances with instance indices 0-n,m-p,q-s on server N, identified by IP address or hostname, where n, m, p, q, s are positive integers such that 0 <= n < m <= p < q <= s. |
db_user | Cluster | User name to use in the catalog connection string. This example uses test1user. To avoid possible problems, using all lowercase for the db_user parameter is recommended. |
install_root | Cluster | Full path to the SciDB installation directory. |
pluginsdir | Cluster | Full path to the SciDB plugins directory containing all server plugins. |
logconf | Cluster | Full path to the log4cxx logging configuration file. |
requests | Cluster | The maximum number of client query requests queued for execution on any given instance. Any requests in excess of the limit return to the client with an error. The default value is 1,000. |
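
The server-N rule in the table above (a host, a comma, then the zero-based index of the last instance) can be illustrated with a short sketch. It assumes the 15.12-and-later meaning of the number and handles only the simple host,last_index form, not the range form. The function names and the host0.example.com entry are hypothetical, and the connection limit is passed in by the caller rather than read from postgresql.conf.

```python
def parse_server_entry(value):
    """Split 'host,last_index' into (host, [0, 1, ..., last_index])."""
    host, sep, last = value.partition(",")
    if not sep or not host.strip():
        raise ValueError(f"expected 'host,last_index', got {value!r}")
    last_index = int(last)  # raises ValueError for the range form, unsupported here
    if last_index < 0:
        raise ValueError("last instance index must be >= 0")
    return host.strip(), list(range(last_index + 1))


def total_instances(server_entries):
    """Count the instances across all server-N entries."""
    return sum(len(parse_server_entry(v)[1]) for v in server_entries.values())


def fits_postgres_limit(server_entries, max_connections):
    """True if the instance count does not exceed the Postgres connection limit."""
    return total_instances(server_entries) <= max_connections


if __name__ == "__main__":
    # Hypothetical two-server cluster; server-2 mirrors the example in the table.
    servers = {
        "server-0": "host0.example.com,3",
        "server-2": "host.example.com,3",
    }
    print(parse_server_entry(servers["server-2"]))
    print(total_instances(servers), fits_postgres_limit(servers, 100))
```

Counting instances this way before initializing the cluster makes it easier to spot a configuration that would exhaust the Postgres connection limit.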
Cluster Configuration
This table describes the cluster configuration file parameters and how to set them.
Key | Scope | Value |
---|---|---|
base-path | Cluster | The root data directory for each SciDB instance. Each SciDB instance uses an enumerated data directory below the base-path. The list('instances') command shows all instances and their data directories for a running SciDB cluster. |
base-port (optional) | Cluster | Base port number. The SciDB instances communicate via the TCP port = base-port + instance index. Clients can connect to any of the instances on their corresponding ports. The default base-port is 1239. |
data-dir-prefix (optional) | SciDB Instance | The SciDB administrator can provide file system directories for reference to multiple disks connected to a single server. The advantage of using the data-dir-prefix parameter is that you can arbitrarily assign physical storage and file system locations to SciDB instances. For example, if there are 4 disks and 8 instances on server-0, your configuration could be as follows: data-dir-prefix-0-0=/datadisk1/myserver.000.0 You need not specify this parameter for each instance. For any omitted instance, SciDB creates a folder using the default naming scheme. If you do specify a value for this parameter, you must ensure that the specified folder exists and that it is completely empty. Otherwise, errors occur when you try to initialize SciDB. If a server has multiple storage disks, and you want to assign more than one instance to each disk, you must set the data-dir-prefix parameter for the instances on that server. In release 15.12 and later, instance numbers are always zero-based. In the example above, the eight instances on server-0 have |
io-paths-list (optional) | Cluster | A colon-separated list of absolute directory paths that non-administrative users are allowed to access. The list is empty by default. A typical setting might be Ordinarily, non-administrative users must load and save data using relative pathnames that refer to per-user subdirectories of the instance data directories. To allow load and save from absolute path names, those path names must be "covered" by entries on the io-paths-list. See File I/O Restrictions. |
pg-port (optional) | Cluster | The listening TCP port of Postgres—the port on which Postgres accepts incoming connections. Default: 5432. |
redundancy (optional) | Cluster | The number of extra copies of array data to store. Setting this parameter to a positive value is part of creating a fault-tolerant SciDB cluster. Default: 0 (meaning SciDB stores only one copy of data). The number of data copies equals 1 + the redundancy value. The maximum value for the redundancy parameter is one less than the number of servers in your SciDB cluster, and is at most eight. The number and identities of the servers are specified by the server-N configuration entries. You can change the value throughout the lifetime of a cluster. In SciDB versions 16.x and later, redundancy refers to the data copies maintained by the servers in a SciDB cluster rather than by the instances. |
ssh-port (optional) | Cluster | The TCP port ssh uses for communications within the cluster. Default: 22. |
key-file-list (optional) | Server | Comma-separated list of filenames that include keys for ssh authentication. Default: /home/scidb/.ssh/id_rsa and id_dsa. If you do not use the default name and path for the key, you must set a value for this parameter. |
no-watchdog (optional) | Cluster | Set this to true to avoid automatic restart of the SciDB instance on an OS process failure. Default: false. By default, each SciDB instance spawns two Linux processes; one for the instance itself, and one as a watchdog. The watchdog process detects if the instance process fails. If it detects a failure, it forks a new, replacement process, to bring the instance back online. |
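
Two of the rules in the table above are simple arithmetic: an instance listens on base-port + instance index, and redundancy is bounded by both the server count and a cap of eight. The sketch below restates them, assuming the 16.x-and-later server-based meaning of redundancy; the function and constant names are hypothetical helpers, not part of SciDB.

```python
DEFAULT_BASE_PORT = 1239   # default base-port from the table above
MAX_REDUNDANCY_CAP = 8     # "at most eight" from the table above


def instance_port(instance_index, base_port=DEFAULT_BASE_PORT):
    """TCP port of an instance: base-port + instance index."""
    return base_port + instance_index


def max_redundancy(server_count):
    """Largest allowed redundancy: one less than the server count, capped at 8."""
    return max(0, min(server_count - 1, MAX_REDUNDANCY_CAP))


def data_copies(redundancy):
    """Total stored copies of array data: 1 + redundancy."""
    return 1 + redundancy


if __name__ == "__main__":
    for index in range(4):
        print(f"instance {index} -> port {instance_port(index)}")
    print("max redundancy for 4 servers:", max_redundancy(4))
    print("copies at redundancy 2:", data_copies(2))
```

With the default base-port, the first four instance indices map to ports 1239 through 1242, which is useful when opening firewall ranges for client connections.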
Performance Configuration
This table describes the configuration file elements for tuning your system performance.
Key | Scope | Applies To | Value |
---|---|---|---|
mem-array-threshold (optional) | Cluster | SciDB Instance | Maximum size in MB of temporary data to cache in memory before writing to temporary disk files. Default: 1024 MB. |
smgr-cache-size (optional) | Cluster | SciDB Instance | Size of memory in MB allocated to the shared cache of array chunks. The cache is used only for the chunks belonging to persistent arrays. Default: 256 MB. |
max-memory-limit (optional) | Cluster | SciDB Instance | The hard-limit maximum amount of memory in MB that the SciDB instance can allocate. If the instance requests more memory from the operating system the allocation fails with an exception. Default: No limit. |
merge-sort-buffer (optional) | Cluster | Thread | Size of memory buffer used for sorting, in MB. Default: 512 MB. |
small-memalloc-size (optional) | Cluster | SciDB Instance | Small allocation threshold size in bytes for the glibc memory allocator, i.e. malloc(). malloc() treats all memory allocations larger than this size as "large" and passes them through to Linux mmap. It has the same effect as the M_MMAP_THRESHOLD setting for malloc. Default: 268,435,456 bytes (256 MB). |
large-memalloc-limit (optional) | Cluster | SciDB Instance | Threshold limit on the maximum number of simultaneous large allocations for glibc malloc(). It has the same effect as the M_MMAP_MAX setting for malloc. Default: 65,536. |
replication-receive-queue-size (optional) | Cluster | SciDB Instance | The maximum size – in number of chunks – of the receive queue for the replica chunks on each SciDB instance. Default: 16 |
replication-send-queue-size (optional) | Cluster | SciDB Instance | The maximum size – in number of chunks – of the send queue for the replica chunks on each SciDB instance. Default: 4 |
sg-receive-queue-size (optional) | Cluster | Query | A limit – in number of chunks – on how much of any SciDB instance's receive queue can be occupied by the chunks of an individual query. Default: 8 |
sg-send-queue-size (optional) | Cluster | Query | A limit – in number of chunks – on how much of any SciDB instance's send queue can be occupied by the chunks of an individual query. Default: 16 |
execution-threads (optional) | Cluster | SciDB Instance | Size of the thread pool available for query execution. This pool is shared by all queries for doing I/O and various other query execution tasks. Set this value to two more than the maximum number of concurrent queries you want. This parameter value should exceed the value of result-prefetch-threads. Default: 5. |
admin-queries (optional) | Cluster | SciDB Instance | The number of administrative queries you can schedule to execute in parallel. The number must be less than execution-threads. Default: 1. |
client-queries (optional) | Cluster | SciDB Instance | The number of queries you can schedule to execute in parallel. The number must be less than (execution-threads - admin-queries). Default: execution-threads - admin-queries - 1. |
operator-threads (optional) | Cluster | Query | Number of threads used per query. Limit the number of threads allocated per (multi-threaded) operator in a query. If operator-threads is unspecified, SciDB automatically detects the number of CPU cores and uses that value. When running multiple instances on each server, you must set operator-threads lower than the number of CPU cores since multiple SciDB instances share the same set of CPU cores. Default: Number of CPU cores. |
result-prefetch-threads (optional) | Cluster | SciDB Instance | Per-instance threads available for certain asynchronous tasks including data sorting. Default: 4. |
result-prefetch-queue-size (optional) | Cluster | Query | Per-query number of asynchronous tasks that can be submitted to the prefetch thread-pool in parallel. Default: 4. |
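
The thread-related parameters in the table above constrain one another: admin-queries must be less than execution-threads, client-queries must be less than execution-threads - admin-queries, and execution-threads should exceed result-prefetch-threads. The sketch below checks a proposed set of values against those three statements. The function names are hypothetical, and the last check reflects a recommendation in the table rather than a hard requirement.

```python
# Defaults taken from the table above.
DEFAULTS = {
    "execution-threads": 5,
    "admin-queries": 1,
    "result-prefetch-threads": 4,
}


def resolve_thread_settings(overrides=None):
    """Merge overrides with defaults and derive client-queries if it is unset."""
    settings = dict(DEFAULTS)
    settings.update(overrides or {})
    if "client-queries" not in settings:
        # Documented default: execution-threads - admin-queries - 1
        settings["client-queries"] = (
            settings["execution-threads"] - settings["admin-queries"] - 1
        )
    return settings


def check_thread_settings(settings):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    execution = settings["execution-threads"]
    admin = settings["admin-queries"]
    client = settings["client-queries"]
    prefetch = settings["result-prefetch-threads"]
    if not admin < execution:
        problems.append("admin-queries must be less than execution-threads")
    if not client < execution - admin:
        problems.append(
            "client-queries must be less than execution-threads - admin-queries"
        )
    if not execution > prefetch:
        problems.append("execution-threads should exceed result-prefetch-threads")
    return problems


if __name__ == "__main__":
    print(check_thread_settings(resolve_thread_settings()))
    print(check_thread_settings(resolve_thread_settings({"execution-threads": 4})))
```

Returning a list of problems rather than raising on the first one lets an administrator see every inconsistency in a single pass.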