Managing SciDB Instances

This topic discusses the steps for provisioning, initializing, and eventually retiring sets of SciDB instances. 

This section assumes that you are running SciDB as a service.  See Running SciDB for details.

SciDB Instance Concepts and Terms 

To effectively manage a cluster of SciDB instances, you must understand the difference between configuration, membership, and liveness.

  • Configuration – refers to server machines and SciDB instances that the system catalog knows about.  It is added to the catalog by the scidbctl.py init-cluster command during cluster initialization, based on the contents of the config.ini file.  See Configuring SciDB for details.
  • Membership – refers to the subset of configured SciDB instances that currently participate in query processing. An instance in the configuration might not belong to the current membership: it might have been taken offline administratively. SciDB provides tools for updating and tracking the current membership. (When you issue scidbctl.py start-server ..., SciDB starts all of the instances specified in config.ini, but only the instances in the current membership will process client queries. Note that scidbctl.py stop-server ... stops all instances in config.ini, but does not affect either the membership or the configuration.)
  • Liveness refers to the online state of the SciDB instances in the current membership. The Paradigm4 system plugin implements a scalable, peer-to-peer keepalive mechanism between instances within the SciDB cluster. An instance is considered online if the instance process is running and successfully exchanging messages with other SciDB instances. When a member instance is added, removed, started, stopped, or crashes, other instances detect and propagate a liveness change event. Soon thereafter, all instances converge on a consistent view of liveness for the current membership. If the set of live instances is smaller than the set of active members, SciDB's level of service degrades to read-only mode. All in-flight transactions abort and a rollback reverts any pending changes to array storage and the system catalog, restoring the database to a prior consistent state.
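
Once the system library is loaded (see step 5 under Initializing an Entire Installation), you can view all three states side by side.  As a sketch, the following query projects the relevant attributes of list_instances() (the attribute names match the sample output shown later in this topic):

$ iquery -aq "project(list_instances(), server_id, server_instance_id, membership, liveness)"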

Initializing an Entire Installation

In the examples that follow, text in angle brackets <> denotes values you supply.  For example, for <cluster_name> you might substitute prod1.

To initialize a SciDB installation, do the following:

  1. Create a config.ini file containing a list of Server IDs, and for each server, a list of Server Instance ID entries. See Configuring SciDB for details.
  2. Initialize the installation by provisioning the SciDB instances and adding them to the membership by running:

    $ scidbctl.py --config <config.ini> init-cluster <cluster_name>


    The scidbctl.py init-cluster command performs all necessary setup tasks: it checks that referenced directories exist, that links to the executable are correct, and that file permissions are set appropriately.
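
    For example, with the cluster name prod1 suggested above and a hypothetical config file path:

    $ scidbctl.py --config /opt/scidb/etc/config.ini init-cluster prod1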


  3. Push the configuration to and enable the SciDB service on all servers by running:

    $ scidbctl.py --config <config.ini> register-service <cluster_name>

    Steps 1, 2, and 3 are usually part of the installation process and can be repeated.

  4. Start the SciDB instances:

    $ scidbctl.py --config <config.ini> start-server <cluster_name>
  5. Load the P4 system library, which contains the system management operators:

    $ iquery -aq "load_library('system')"
  6. To confirm that all the instances are alive, run:

    $ until iquery -naq 'sync()' > /dev/null 2>&1; do sleep 1; done
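
    The loop exits once sync() succeeds, that is, once the cluster is answering queries.  As a further check, the following query (using the liveness attribute described above) should return no rows once every member instance is online:

    $ iquery -aq "filter(list_instances(), liveness=false)"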

Adding Additional Instances

Adding instances to an existing SciDB installation is a three-stage process.

  • Create a delta file to specify the new servers and server instances you're adding to the configuration. 
  • Use the delta file to modify the configuration by provisioning the new instances, configuring the data directories and registering the new instances with the SciDB catalog (but not yet activating them).
  • Bring the new instances online and incorporate them into the membership.

To add additional SciDB Instances, do the following:

  1. Create a configuration delta file for the new servers and instances.  The delta file should contain one or more server entries specifying server number, host, and instance numbers.  The format is:

    [<cluster_name>]
    server-<X>=<IP_or_HOST>,0-n,m-p,q-s, ...

    To make particular instances use specific data directories, the delta file can optionally contain lines of the form

    data-dir-prefix-<X>-<Y>=<per_instance_dir>

    where <X> is the server number as before, and <Y> is one of the values in the ranges 0-n, m-p, q-s, etc., that is, the server instance id numbers listed in the server entry.
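
    For example, a delta file adding a new server 2 (the hypothetical host host2.example.com used elsewhere in this topic) with server instance ids 2-3 and placeholder per-instance data directories might look like this:

    [prod1]
    server-2=host2.example.com,2-3
    data-dir-prefix-2-2=/data/scidb2
    data-dir-prefix-2-3=/data/scidb3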

  2. Add the delta file config_add_delta.ini to the configuration as follows:

    $ scidbctl.py --config <old_config.ini> config-server --add config_add_delta.ini --output <new_config.ini> <cluster_name>

    This generates the new_config.ini file and provisions, configures, and registers the new instances.  Move the new config file into place, keeping the old one as a backup.  (We'll continue to use <new_config.ini> and <old_config.ini> in this example, just to be explicit.)

  3. Start each new instance in the configuration:

    $ scidbctl.py --config <new_config.ini> start-server --server-id <server_id> --only <new_server_instance_ids> <cluster_name>

    Here, <new_server_instance_ids> is a comma-separated list of server instance ids or ranges of server instance ids.  Issue this command once for each server line in the config_add_delta.ini file, specifying only the server instance ids for that server line.  (The --only option was formerly called --filter.)
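
    Continuing the delta-file example above, the two new instances on server 2 would be started with:

    $ scidbctl.py --config <new_config.ini> start-server --server-id 2 --only 2-3 <cluster_name>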

    For a list of all SciDB instances, run:

    $ iquery -aq "list_instances()" 
  4. Record the new configuration and reset each P4 service to use the new config file.

    $ scidbctl.py --config <old_config.ini> unregister-service <cluster_name>
    $ scidbctl.py --config <new_config.ini> register-service <cluster_name>

    The new instances are now part of the configuration, but they are not yet part of the membership.

  5. Adding the new instances to the membership allows them to process queries.  To add them, you must first determine their physical instance ids.  These are not the same as the server instance ids used in the various .ini files.  To find out the physical instance ids of the instances you have added, use a list_instances() query to examine the configuration:

    $ cat instinfo.afl
    project(
      apply(
        apply(list_instances(),
              (iid_hi, instance_id / 4294967296),
              (iid_lo, instance_id % 4294967296)),
          (phys_iid, 's' + string(iid_hi) + '-i' + string(iid_lo))),
      server_id,
      server_instance_id,
      membership,
      host,
      port,
      instance_id,
      phys_iid)
    $ cat instinfo.afl | iquery -o tsv:l -a
    server_id	server_instance_id	membership	host	port	instance_id	phys_iid
    0	0	member	host1.example.com	1239	0	s0-i0
    1	1	member	host1.example.com	1240	4294967297	s1-i1
    2	3	registered	host2.example.com	1242	8589934594	s2-i2
    2	2	registered	host2.example.com	1241	8589934595	s2-i3
    $

      Use the "instance_id" column.  If you have just added instances to the configuration using the entry server-2-2=host2.example.com,2-3 then you can add them to the membership using

    $ iquery -aq "add_instances(8589934594, 8589934595)"

    (The "phys_id" column shows a representation of the physical instance_id split into its high and low parts.  SciDB error messages use this notation to indicate which instance raised the error.  For an explanation of instance_id vs. server_instance_id, see Finding the instance directory (Mapping Instance IDs.)

  6. Verify that the instances are now in the membership.

    $ until iquery -naq 'sync()' > /dev/null 2>&1; do sleep 1; done
    $ iquery -aq "list_instances()" 

    Now the newly added instances should be listed as members.
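
    Equivalently, if the following query returns no rows, every configured instance is a member (this assumes the membership values shown in the sample output above, where non-members appear as registered):

    $ iquery -aq "filter(list_instances(), membership='registered')"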

  • Optional: copy any existing arrays to the new membership as a manual step.  This rebalances the array data across the new membership.

    $ iquery -naq "store ( Array, Array_tmp ); remove ( Array ); rename ( Array_tmp, Array )"


    The newly-added instance(s) can now participate in queries involving Array.

Detecting Dead Instances or Entire Dead Servers

Instances may be temporarily dead due to power or network outages, software defects, or administrative actions. They might be permanently dead due to hardware failures or similar events. 

To list dead instances, run the following query:

$ iquery -aq "filter(list_instances(), membership='member' and liveness=false)" 


Removing Dead Instances or Entire Dead Servers

Recovering data from dead instances is straightforward if the cluster was configured with a redundancy value greater than the number of servers hosting the dead instances.  In that case, the cluster can still operate in read-only mode, and you can access and save the data.  However, non-zero redundancy has a cost in both disk space and in the time required to store results.
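
As a point of reference, redundancy is a cluster-wide setting in config.ini; the sketch below assumes the section-header format shown earlier in this topic (see Configuring SciDB for the full file format):

[<cluster_name>]
...
redundancy=1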

If you did not configure redundancy beforehand, you must rely on some SciDB-external means of recovering the data, such as backups or highly available vdisks.  Enterprise Edition customers should consult with Paradigm4 concerning data recovery strategies.

To remove unwanted instances, do the following:

  1. Stop the instances (or entire servers) you are removing (if they are still running).

    $ scidbctl.py --config <old_config.ini> stop-server --server-id <server_id> --only <dead_server_instance_ids> <cluster_name>
  2. Get the list of dead instances:

    $ iquery -aq "filter(list_instances(), membership='member' and liveness=false)" 
  3. Remove the dead/offline instances from the membership, all at once:

    $ iquery -aq "remove_instances(<instance_id_list>)" 


  4. Wait for the new membership to settle:

    $ until iquery -naq 'sync()' > /dev/null 2>&1; do sleep 1; done


  5. If your cluster has redundancy configured, you can copy degraded arrays to the new membership to make them writable again, but you should delay doing so until any replacement instance(s) are online.

    $ iquery -naq "store ( oldArray, newArray ); remove ( oldArray ); rename ( newArray, oldArray )"


    The following query lets you identify arrays you may want to migrate to the new membership to maintain accessibility and/or the desired redundancy level.  For each array in the system, the output contains:

    • its redundancy (redundancy_num),
    • the number of instances to which the array is distributed, i.e., its residency (num_in_residency),
    • the number of instances from the residency that are currently active members (num_in_membership), and
    • the number of active members that are alive (num_live).


    $ cat find_migratable.afl
    join(
      redimension(
        cross_join(
          list_array_residency(),
          redimension(
            project(
              apply(list_instances(),
                    (mem, iif(membership='member', int64(1), int64(0))),
                    (iid, int64(instance_id)),
                    (liv, iif(liveness, int64(1), int64(0)))),
              iid, mem, liv),
            <mem:int64, liv:int64>[iid=0:*,4611686018427,0]),
          instance_id, iid),
        <redundancy_num:int64,
         num_in_residency:uint64,
         num_in_membership:int64,
         num_live:int64>[array_id=0:*,1000000,0],
        min(redundancy) as redundancy_num,
        count(redundancy) as num_in_residency,
        sum(mem) as num_in_membership,
        sum(liv) as num_live),
      redimension(
        filter(list('arrays',true), uaid=aid),
        <name:string> [uaid=0:*,1000000,0]))
    $ cat find_migratable.afl | iquery -a
    ...
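
    Broadly, an array whose num_in_membership or num_live value is smaller than its num_in_residency resides on one or more instances that are no longer active members (or are not alive), making it a candidate for migration.
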
  6. Create a configuration delta file listing the instances that should be removed.  The delta file should contain one or more server entries specifying server number, host, and instance numbers.  The format is:

    server-<X>=<IP_or_HOST>,0-n,m-p,q-s, ...
  7. Remove the dead SciDB instances from the configuration using the delta file.

    $ scidbctl.py --config <old_config.ini> config-server --remove config_remove_delta.ini --output <new_config.ini> <cluster_name> 


    This generates the new_config.ini file and unregisters the dead instances.

    IMPORTANT: Before running this command, the instances specified in config_remove_delta.ini must already have been removed from the membership, and they must not hold any resident arrays.


  8. Record the new configuration and reset each P4 service to use the new config file:

    $ scidbctl.py --config <new_config.ini> register-service <cluster_name>