Troubleshooting

MPI Issues

MPICH is a high-performance and widely portable implementation of the Message Passing Interface (MPI) standard. SciDB depends on a working installation and configuration of mpich2-1.2. This section provides a checklist for verifying that MPICH can communicate with SciDB.

  • If you encounter errors or problems related to MPI, try running mpi_init(). For details, see the mpi_init reference.
  • Passwordless SSH must be set up for the scidb user from each server to 0.0.0.0, 127.0.0.1, localhost, and every server in the cluster. For each pair of servers:
    • Make sure that the scidb user's public SSH key has been copied to the target server's authorized keys. If you need assistance, contact Paradigm4 support.
    • Log in once with ssh scidb@<server>, and answer yes when asked whether to continue connecting.
    • Confirm that it works by running ssh scidb@<server> again. A shell prompt on the server should appear without a password prompt. If it does not, SSH is not yet set up for this server.
  • You must configure DNS on all SciDB servers. In particular, verify that each server can resolve every other server's host name to the correct IP address.
  • If a host has a static IP, replace 127.0.1.1 in its /etc/hosts file with the static IP. In general, /etc/hosts should not map the same name to multiple IPs. For example, consider the following excerpt from /etc/hosts on a machine whose host name is test-u1204-c2-vm1:

    127.0.0.1 localhost
    127.0.1.1 test-u1204-c2-vm1

    The second line, which maps the host name to the loopback address 127.0.1.1, can confuse MPI. Comment it out, or replace 127.0.1.1 with the actual static IP of test-u1204-c2-vm1. Mapping the host name to 127.0.1.1 is a Debian convention that Ubuntu also follows. For details, see the Debian Reference.

  • Configurations with multiple network interface cards (NICs), such as eth0 and eth1, have not been tested. Disabling all but one NIC is recommended. If you cannot do this, make sure that the DNS names resolve to the correct IPs.
  • Make sure there is enough shared memory available. On each SciDB server, run the following command to see your shared memory usage:

    $ df -h /dev/shm


    The output appears similar to the following:

    Filesystem      Size  Used Avail Use% Mounted on
    none            3.8G  320K  3.8G   1% /run/shm


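    SciDB requires at least 512 MB of available shared memory for each SciDB instance on the host. For example, a host that runs 4 instances needs at least 2048 MB (2 GB). You can change the size of /dev/shm by adding a line such as the following to the /etc/fstab file (size=48G is only an example value):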

    # shared memory device
    none      /dev/shm         tmpfs    defaults,size=48G   0 0


    Running out of shared memory in /dev/shm usually shows up as a SciDB process being killed by SIGBUS (signal 7), as reported in the SciDB error log (scidb-stderr.log):

    2013-5-13 23:17:10 (ppid=23581): Started.
    2013-5-14 0:2:0 (ppid=23581): SciDB child (pid=23604) terminated by signal = 7, core dumped

    Set the max-memory-limit parameter in your SciDB config.ini file high enough that the SciDB processes can allocate the shared memory they need. Note that max-memory-limit applies per instance (that is, per process), not per host.
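
Parts of the checklist above can be automated. The following shell sketch is not part of SciDB. All function names are hypothetical, and it assumes bash, ssh, getent (glibc), awk, and a POSIX df. It checks only the current server's view of the cluster, so run it on every server.

```shell
#!/usr/bin/env bash
# Sketch of the MPI checklist as shell functions (not a SciDB tool).
# Sourcing this file only defines functions; call check_host_setup to run it.

# Succeeds only if key-based SSH as scidb works with no prompt at all:
# BatchMode=yes makes ssh fail instead of asking for a password or for
# confirmation of an unknown host key.
ssh_ok() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "scidb@$1" true </dev/null >/dev/null 2>&1
}

# Prints the first address the system resolver returns for a host name.
resolve_host() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Prints the names a hosts file maps to 127.0.1.1.
loopback_127011_names() {
  awk '$1 == "127.0.1.1" { for (i = 2; i <= NF && $i !~ /^#/; i++) print $i }' "$1"
}

# Prints the names a hosts file maps to more than one IPv4 address
# (IPv6 lines such as "::1 localhost" are ignored).
duplicate_ipv4_names() {
  awk '$1 !~ /^#/ && $1 !~ /:/ && NF >= 2 {
         for (i = 2; i <= NF && $i !~ /^#/; i++) {
           key = $i SUBSEP $1
           if (!(key in seen)) { seen[key] = 1; count[$i]++ }
         }
       }
       END { for (name in count) if (count[name] > 1) print name }' "$1"
}

# Shared memory SciDB needs on one host: 512 MB per instance.
required_shm_mb() {
  echo $(( 512 * $1 ))
}

# Space currently available on /dev/shm, in MB (POSIX df, 1 KB blocks).
available_shm_mb() {
  df -Pk /dev/shm | awk 'NR == 2 { print int($4 / 1024) }'
}

# Counts SIGBUS (signal 7) terminations recorded in scidb-stderr.log.
sigbus_count() {
  grep -cE 'terminated by signal = 7([^0-9]|$)' "$1"
}

# Runs the checks from the current server. $1 = SciDB instances on this
# host; remaining arguments = SSH targets (0.0.0.0, 127.0.0.1, localhost,
# and every server name).
check_host_setup() {
  local instances=$1 target ip name need have
  shift
  for target in "$@"; do
    if ssh_ok "$target"; then echo "SSH    ok      scidb@$target"
    else echo "SSH    FAILED  scidb@$target"; fi
    case $target in
      *[!0-9.]*)  # a host name, not an IPv4 literal: check DNS as well
        ip=$(resolve_host "$target")
        echo "DNS    $target -> ${ip:-UNRESOLVED}" ;;
    esac
  done
  for name in $(loopback_127011_names /etc/hosts); do
    echo "HOSTS  $name is mapped to 127.0.1.1"
  done
  for name in $(duplicate_ipv4_names /etc/hosts); do
    echo "HOSTS  $name is mapped to more than one IPv4 address"
  done
  need=$(required_shm_mb "$instances")
  have=$(available_shm_mb)
  if [ "${have:-0}" -ge "$need" ]; then echo "SHM    ok      ${have} MB free, ${need} MB needed"
  else echo "SHM    LOW     ${have:-?} MB free, ${need} MB needed"; fi
}
```

For example, on a server that runs 4 instances in a two-server cluster (mpi_checks.sh, server1, server2, and the log path are placeholders):

    $ source mpi_checks.sh
    $ check_host_setup 4 0.0.0.0 127.0.0.1 localhost server1 server2
    $ sigbus_count /path/to/scidb-stderr.log

The functions only report problems. Fixing /etc/hosts, DNS, SSH keys, or /etc/fstab is still a manual step.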

Configuration Issues

When setting up the config.ini file, use actual IP addresses rather than localhost whenever possible, especially on multi-server SciDB installations.
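
To find places where an existing config.ini still refers to localhost or a loopback address, a plain text search is enough. This is a minimal sketch with a hypothetical function name. It makes no assumption about which config.ini keys hold host names, so it flags any non-comment line that mentions localhost or a 127.x.x.x address.

```shell
#!/usr/bin/env bash
# Prints config.ini lines (as line-number:text) that refer to localhost or to
# a 127.x.x.x loopback address. Lines starting with # or ; are skipped as comments.
loopback_config_lines() {
  grep -nE '(^|[=,[:space:]])(localhost|127(\.[0-9]{1,3}){3})([,:[:space:]]|$)' "$1" |
    grep -vE '^[0-9]+:[[:space:]]*[#;]'
}
```

Usage (the path is a placeholder for the location of your config.ini):

    $ loopback_config_lines /path/to/config.ini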

Out of Memory Issues

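If you encounter out-of-memory issues, try adjusting the value of the max-memory-limit configuration parameter. Keep in mind that the limit applies to each instance, so the combined limit for a host is max-memory-limit multiplied by the number of instances running on it.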
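
Because the limit is per instance, it can help to compute the per-host total before changing it. The following sketch uses hypothetical function names and rests on assumptions: that max-memory-limit is given in megabytes, that each config.ini section is named after a cluster (the same name passed to scidb.py), and that the host runs Linux (for /proc/meminfo).

```shell
#!/usr/bin/env bash
# Sketch: compute the combined max-memory-limit for one host so it can be
# compared with physical memory. Assumes the value is in megabytes and that
# each config.ini section is named after a cluster.

# Prints max-memory-limit from one section. $1 = config.ini, $2 = section name.
max_memory_limit_mb() {
  awk -v want="[$2]" '
    /^[[:space:]]*\[/ { s = $0; gsub(/[[:space:]]/, "", s); insec = (s == want); next }
    insec && /^[[:space:]]*max-memory-limit[[:space:]]*=/ {
      v = $0
      sub(/^[^=]*=[[:space:]]*/, "", v)
      sub(/[[:space:]]*$/, "", v)
      print v
      exit
    }' "$1"
}

# Prints limit x instances. $1 = config.ini, $2 = section, $3 = instances on this host.
host_memory_limit_mb() {
  local limit
  limit=$(max_memory_limit_mb "$1" "$2")
  [ -n "$limit" ] || return 1
  echo $(( limit * $3 ))
}

# Prints physical memory in MB (Linux only).
physical_memory_mb() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}
```

Usage, for a host that runs 4 instances (the path is a placeholder):

    $ host_memory_limit_mb /path/to/config.ini scidb_cluster_name 4
    $ physical_memory_mb

Comparing the two numbers shows how much of the host's memory the configured limits could claim in total.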

Other Issues

If you notice unexpected behavior or performance degradation, try restarting your SciDB cluster. As the scidb user, run the following command from a shell prompt to stop the cluster (replace scidb_cluster_name with the name of your SciDB cluster):

$ scidb.py stopall scidb_cluster_name

After the cluster stops, start it again:

$ scidb.py startall scidb_cluster_name
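
If you restart often, the two steps can be wrapped so that the cluster is started only after stopall succeeds. This is a minimal sketch. restart_cluster and the SCIDB_PY override are hypothetical names, and the function uses only the stopall and startall subcommands shown above.

```shell
#!/usr/bin/env bash
# Sketch: stop a SciDB cluster, then start it only if the stop succeeded.
# SCIDB_PY is a hypothetical override for the scidb.py command (defaults to
# scidb.py on the PATH).
restart_cluster() {
  if [ $# -ne 1 ]; then
    echo "usage: restart_cluster CLUSTER_NAME" >&2
    return 2
  fi
  local scidb_py=${SCIDB_PY:-scidb.py}
  if ! "$scidb_py" stopall "$1"; then
    echo "stopall failed for $1; not starting" >&2
    return 1
  fi
  "$scidb_py" startall "$1"
}
```

Usage (restart_cluster.sh is a placeholder file name):

    $ source restart_cluster.sh
    $ restart_cluster scidb_cluster_name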
