Collecting log files with `scidbctl.py collect-diags`

The scidbctl.py script can collect SciDB log files and other diagnostic information into a tar(1) archive that can be sent to Paradigm4 for analysis. Use the -h option to see the syntax for this subcommand:

    $ scidbctl.py collect-diags -h
    usage: scidbctl.py collect-diags [-h] [-l] [cluster]

    positional arguments:
      cluster      SciDB cluster name. Must name a section in the config.ini
                   file (see -c/--config option). If not specified, use
                   SCIDB_NAME environment variable if set, else use the first
                   cluster in config.ini.

    optional arguments:
      -h, --help   show this help message and exit
      -l, --light  Skip large objects such as binaries and core files
    $

If you are running SciDB as a service, you must also specify the location of the SciDB configuration file, for example scidbctl.py --config /opt/scidb/23.10/service/config-0-mydb collect-diags.

The collect-diags subcommand collects the following information:

  • All scidb.log* files from all instances. At present there is no ability to select a time range.

  • Contents of the etc and share subdirectories under /opt/scidb/23.10.

  • A “system report” that includes output from the following commands:

    "sysctl -a",        # Kernel parameters
    "ip a",             # Interfaces and addresses
    "netstat -i",       # NIC statistics
    "netstat -r -n",    # Routes
    "arp -an",          # ARP cache
    "vmstat -s",        # Memory statistics
    "vmstat -a",        # Active/inactive memory
    "sudo vmstat -m",   # Slab stats
    "vmstat -d -w",     # Disk usage
    "dmesg",            # Kernel ring buffer

  • MD5 checksums of all files in the base installation directory, for example /opt/scidb/23.10.

  • Stack traces of all currently running SciDB instances (if GDB is installed and gstack(1) is available).

  • Stack traces of any core* files found in the instance data directories.

  • Unless the -l/--light option is specified, the core* files themselves will also be collected.
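
To illustrate the checksum item in the list above, here is a minimal Python sketch that computes an MD5 digest for every regular file under a directory. It is not the code used by scidbctl.py, and the format of the checksum report inside the diagnostics archive is not described here; the function names md5_of_file and md5_tree are hypothetical helpers introduced only for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(base_dir):
    """Map each regular file under base_dir (by relative path) to its MD5."""
    base = Path(base_dir)
    return {
        str(p.relative_to(base)): md5_of_file(p)
        for p in sorted(base.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }
```

Calling md5_tree("/opt/scidb/23.10") on two hosts and comparing the resulting dictionaries is one way to spot files that differ between installations.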

Current best practice is to configure SciDB core dumps to be written to the /var/crash/scidb directory. Because collect-diags looks for core* files only in the instance data directories, it will not find core files written there.
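
Core files kept in /var/crash/scidb therefore have to be gathered separately if they are needed for analysis. The following Python sketch shows one possible way to do that. It is not part of scidbctl.py: the function name archive_core_files is hypothetical, and the core* file name pattern is an assumption (the actual names depend on the kernel's core_pattern setting on your hosts).

```python
import glob
import os
import tarfile


def archive_core_files(crash_dir, out_tar, pattern="core*"):
    """Add every regular file in crash_dir matching pattern to a gzip'd tar.

    Returns the archived file names (without the directory part).
    The default pattern is an assumption; adjust it to your core_pattern.
    """
    paths = sorted(
        p for p in glob.glob(os.path.join(crash_dir, pattern))
        if os.path.isfile(p)
    )
    with tarfile.open(out_tar, "w:gz") as tar:
        for path in paths:
            # Store only the base name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(p) for p in paths]
```

For example, archive_core_files("/var/crash/scidb", "/tmp/scidb-cores.tar.gz") would have to be run on each host, by a user allowed to read the core files. Core files can be very large, so check available disk space first.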

A sample collect-diags run on a very small cluster looks like this:

    $ scidbctl.py collect-diags
    [scidbctl] Collecting diagnostics at 2024-03-22T220505
    [scidbctl-0-0-mydb] Producing diagnostics...
    [scidbctl-0-1-mydb] Producing diagnostics...
    [scidbctl-0-1-mydb] Diagnostics generated in <datadir>/diags/2024-03-22T220505
    [scidbctl-0-0-mydb] Tracing stack for running SciDB pid 215092 ...
    [scidbctl-0-0-mydb] Tracing stack for running SciDB pid 215093 ...
    [scidbctl-0-0-mydb] Diagnostics generated in <datadir>/diags/2024-03-22T220505
    [scidbctl] Gathering collected diagnostics
    [scidbctl] Diagnostics in /data/scidb/0/0/diags/all-2024-03-22T220505.tar
    $
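
If the run is driven from a script, the location of the final archive can be picked out of the captured output. The Python sketch below assumes the last status line has exactly the form shown in the sample run above ("[scidbctl] Diagnostics in <path>.tar"); that format is taken from this one example and may differ between releases. The name find_archive_path is hypothetical.

```python
import re

# Matches only the top-level "[scidbctl]" line, not the per-instance
# "[scidbctl-0-0-mydb] Diagnostics generated in ..." lines.
ARCHIVE_LINE = re.compile(r"\[scidbctl\] Diagnostics in (\S+\.tar)\b")


def find_archive_path(output):
    """Return the all-cluster archive path from collect-diags output, or None."""
    match = ARCHIVE_LINE.search(output)
    return match.group(1) if match else None
```

The output string could be captured, for instance, with subprocess.run(..., capture_output=True, text=True) around the scidbctl.py command line.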

Each individual instance’s diagnostics are placed into a compressed tar archive, and those are placed in turn into an uncompressed all-cluster archive (since running compression twice can actually expand data):