This guide covers the common SciDB system administration tasks required to provide high availability for your Enterprise Edition SciDB database. These include:

Before continuing, familiarize yourself with the SciDB terms in the glossary below.

Administrative Glossary

The following are key SciDB administrative concepts.

Redundancy Parameter

A configuration parameter that specifies the number of extra copies of array data that SciDB maintains in the cluster. You must set this parameter for SciDB to keep any extra copies. The maximum value is one less than the number of servers in your cluster, regardless of how many instances run on each server. For example, with eight servers you can set the parameter from 0 to 7, and a setting of 7 makes SciDB store a copy of all array data on every server.
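The limit above can be expressed as a small check. This is a minimal sketch, not a SciDB tool: it assumes the cluster is described as a hypothetical `{host: instance_count}` mapping (the `server-N` host names are made up) and shows that only the number of distinct servers, not the instance count, determines the maximum redundancy.

```python
def max_redundancy(cluster: dict[str, int]) -> int:
    """Largest allowed redundancy for a cluster given as {host: instance_count}.

    Only the number of distinct servers (hosts) matters; the instance
    count per host is deliberately ignored.
    """
    num_servers = len(cluster)
    if num_servers < 1:
        raise ValueError("cluster must contain at least one server")
    return num_servers - 1


def validate_redundancy(redundancy: int, cluster: dict[str, int]) -> int:
    """Raise ValueError if redundancy is outside 0..max_redundancy(cluster)."""
    upper = max_redundancy(cluster)
    if not 0 <= redundancy <= upper:
        raise ValueError(f"redundancy must be between 0 and {upper}, got {redundancy}")
    return redundancy


def copies_per_chunk(redundancy: int) -> int:
    """Total stored copies: the original plus `redundancy` extra copies."""
    return redundancy + 1


# Hypothetical eight-server cluster with four instances per server.
cluster = {f"server-{i}": 4 for i in range(8)}
print(max_redundancy(cluster))   # 7
print(copies_per_chunk(7))       # 8, one copy on each of the 8 servers
```

Adding instances to an existing host leaves `max_redundancy` unchanged; only adding servers raises the limit.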

Reconnect-timeout Parameter

A configuration parameter that sets the time, in seconds, to wait before reconnecting to peer instances upon restart (default: 3 seconds).

Liveness-timeout Parameter

A configuration parameter that sets the time, in seconds, to wait before declaring a silent instance down (default: 120 seconds).
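To make the two timeouts concrete, the sketch below reads them from INI-style text and models the liveness rule. It is an illustration, not SciDB's implementation: the section name `mydb`, the `load_timeouts` helper, and the `LivenessTracker` class are hypothetical, and only the parameter names and the documented defaults (3 and 120 seconds) come from this guide.

```python
import configparser

# Documented defaults for the two parameters.
DEFAULT_RECONNECT_TIMEOUT_S = 3.0
DEFAULT_LIVENESS_TIMEOUT_S = 120.0


def load_timeouts(config_text: str, section: str) -> tuple[float, float]:
    """Return (reconnect_timeout, liveness_timeout) from INI-style text,
    falling back to the documented defaults when a key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    s = parser[section]
    reconnect = s.getfloat("reconnect-timeout", fallback=DEFAULT_RECONNECT_TIMEOUT_S)
    liveness = s.getfloat("liveness-timeout", fallback=DEFAULT_LIVENESS_TIMEOUT_S)
    return reconnect, liveness


class LivenessTracker:
    """Illustrative model of liveness-timeout: a peer counts as down once
    it has been silent for longer than the timeout."""

    def __init__(self, liveness_timeout: float) -> None:
        self.liveness_timeout = liveness_timeout
        self.last_heard: dict[int, float] = {}

    def heard_from(self, instance_id: int, now: float) -> None:
        self.last_heard[instance_id] = now

    def is_down(self, instance_id: int, now: float) -> bool:
        last = self.last_heard.get(instance_id)
        # A peer we have never heard from is treated as down in this sketch.
        return last is None or now - last > self.liveness_timeout


# Hypothetical section "mydb"; only liveness-timeout is overridden.
reconnect, liveness = load_timeouts("[mydb]\nliveness-timeout=60\n", "mydb")
tracker = LivenessTracker(liveness)
tracker.heard_from(instance_id=1, now=0.0)
print(reconnect, tracker.is_down(1, now=30.0), tracker.is_down(1, now=61.0))
# 3.0 False True
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would read a clock instead.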

Read Quorum

If a SciDB instance fails, you can still execute read-only queries as long as you have a read quorum. The read quorum depends on the redundancy value: each chunk is stored on redundancy + 1 servers, so once failures affect more servers than the redundancy value, SciDB can no longer guarantee access to all of the chunks of an array. For example, if you set redundancy=1 and instance failures then affect 2 servers, you no longer have a read quorum.
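The quorum rule and the reason it is only a guarantee can be sketched as follows. This assumes a hypothetical round-robin placement (primary copy on server `chunk_id % num_servers`, each extra copy on the next server); SciDB's actual chunk distribution may differ, and the function names are illustrative.

```python
def has_read_quorum(redundancy: int, affected_servers: int) -> bool:
    """Guaranteed-read check: each chunk lives on redundancy + 1 distinct
    servers, so every chunk stays reachable while at most `redundancy`
    servers are affected by failures."""
    return affected_servers <= redundancy


def replica_servers(chunk_id: int, num_servers: int, redundancy: int) -> set[int]:
    """Hypothetical placement: the primary copy goes to server
    chunk_id % num_servers and each extra copy to the next server."""
    return {(chunk_id + k) % num_servers for k in range(redundancy + 1)}


def unreachable_chunks(num_chunks: int, num_servers: int, redundancy: int,
                       failed: set[int]) -> list[int]:
    """Chunks whose every copy sits on a failed server."""
    return [c for c in range(num_chunks)
            if replica_servers(c, num_servers, redundancy) <= failed]


# redundancy=1 on a hypothetical four-server cluster
print(has_read_quorum(1, 1), has_read_quorum(1, 2))   # True False
print(unreachable_chunks(8, 4, 1, {0, 1}))             # [0, 4]
print(unreachable_chunks(8, 4, 1, {0, 2}))             # []
```

Losing servers 0 and 2 happens not to strand any chunk under this placement, while losing 0 and 1 does. That is why the quorum is defined by the redundancy value: beyond it, SciDB cannot guarantee that every chunk is reachable.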

Single Point of Failure

SciDB clusters have a single point of failure: the system catalog (your Postgres server). If the Postgres server is unavailable, you can no longer use your cluster, so you should take steps to mitigate this risk. Postgres provides its own replication mechanisms; see the Postgres documentation for details.