Configuring The secure_scan Operator

This section describes how to configure secured arrays and permissions arrays for installations that use the secure_scan operator.

This section describes the secure_scan operator integrated with the core SciDB database engine, not the deprecated external plugin version. If you are upgrading a cluster that used the plugin version, YOU MUST RUN unload_library('secure_scan') BEFORE PERFORMING THE UPGRADE. See the Release Notes.

Secured Arrays

Secured arrays are ordinary stored arrays that are divided into datasets along a dataset dimension. By default the dataset dimension is called dataset_id, but other names can be configured (see below).

Best practice is to make the dataset dimension the first dimension of the secured array's schema and to give that dimension a chunk size of 1. For example:

<red:int16, green:int16, blue:int16>[dataset_id=0:*:0:1; x=0:1919:0:1920; y=0:1079:0:1080]


This way, each physical chunk of the secured array represents data from exactly one dataset, and SciDB does not need to filter individual chunks to separate cells from different datasets.
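
To illustrate why a chunk size of 1 on the dataset dimension matters, the following Python sketch models how dataset_id coordinates fall into chunks along that dimension. It is a simplified model of regular chunking, not SciDB code, and the helper names chunk_index and datasets_per_chunk are hypothetical.

```python
def chunk_index(coord, low, chunk_interval):
    """Index of the chunk, along one dimension, that holds coordinate `coord`."""
    return (coord - low) // chunk_interval


def datasets_per_chunk(dataset_ids, low, chunk_interval):
    """Group dataset ids by the chunk they land in along the dataset dimension."""
    groups = {}
    for dataset_id in dataset_ids:
        index = chunk_index(dataset_id, low, chunk_interval)
        groups.setdefault(index, []).append(dataset_id)
    return groups


# dataset_id=0:*:0:1 means low 0 and chunk interval 1: one dataset per chunk.
one_per_chunk = datasets_per_chunk(range(6), low=0, chunk_interval=1)

# A chunk interval of 4 would place datasets 0..3 in the same chunk,
# so cells from different datasets would have to be separated within a chunk.
mixed = datasets_per_chunk(range(6), low=0, chunk_interval=4)
```

With a chunk interval of 1, every group holds a single dataset id, which is the property the recommended schema relies on.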

Since the public namespace is world-readable, secured arrays must reside in a non-public namespace. Ordinary cluster users should be granted list access (but not read access) to the secured array namespace. See Namespaces and Permissions.

There are three circumstances when a secure_scan user can access all datasets in a secured array:

  • User scidbadmin can read all of a secured array.

  • Any user with the admin role can read all of a secured array. See Roles.

  • Any user with read access to the secured array's namespace can read all arrays in that namespace.

Granting namespace read access to a user allows the user to read the entire array. This may be desirable for certain privileged users who are permitted full access to all datasets in the secured namespace. However, it is contrary to the motivation for using secure_scan in the first place.

Permissions Arrays

A permissions array is a two-dimensional array with user and dataset dimensions. The first bool attribute in a permissions array cell determines access. Here is an example permissions array schema:

AFL% create array perms.permarray <allow:bool> [ user_id=-1:*; dataset_id=0:* ] ;

If allow is true in the cell at location {12,5}, then user 12 will be able to read dataset 5 when she calls secure_scan on any secured array bound to permissions array perms.permarray. (You can discover the user ids of particular users with the list('users') operator.)

The user id -1 is special. If the permissions array cell at {-1,7} is true, then dataset 7 is a public dataset accessible to all secure_scan users.
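
The lookup rule described above can be modeled in a few lines of Python. This is only a sketch of the decision, not the operator's implementation: the function name can_read and the dictionary representation of the permissions array are hypothetical, and treating a missing cell as "no access" is an assumption. It also leaves out the administrator and namespace-read cases, which bypass the permissions array entirely.

```python
PUBLIC_USER_ID = -1  # special user id: a true cell here makes the dataset public


def can_read(perm_cells, user_id, dataset_id):
    """Return True if `user_id` may read `dataset_id`.

    `perm_cells` maps (user_id, dataset_id) -> allow flag. A missing cell
    is treated as "no access" (an assumption of this sketch).
    """
    return bool(
        perm_cells.get((user_id, dataset_id), False)
        or perm_cells.get((PUBLIC_USER_ID, dataset_id), False)
    )


# Cells mirroring the examples in the text: user 12 may read dataset 5,
# and dataset 7 is public.
perm_cells = {(12, 5): True, (12, 6): False, (-1, 7): True}

user_12_reads_5 = can_read(perm_cells, 12, 5)
user_3_reads_7 = can_read(perm_cells, 3, 7)
user_3_reads_5 = can_read(perm_cells, 3, 5)
```

Here user 12 can read dataset 5, any user can read the public dataset 7, and user 3 cannot read dataset 5.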

In SciDB release 22.5 and earlier, create permissions arrays with distribution replicated to reduce overhead for secure_scan calls that reference them:

AFL% create array perms.permarray <allow:bool> [ user_id=-1:*; dataset_id=0:* ]
CON> distribution replicated;

Since permissions arrays are typically small, the storage cost of keeping them replicated is also small. If the permissions array is not stored replicated, secure_scan will replicate (but not store) it during each invocation. (In later releases, permissions arrays are cached in memory, so the performance penalty for not replicating is eliminated for most queries.)

Permissions arrays should reside in a namespace where only administrators have access.

Neither secured arrays nor permissions arrays may reside in the public namespace.

Permissions arrays may not be temp arrays. They should not be empty.

secure-scan-config Configuration Parameter

The secure_scan operator can work only on secured arrays that are bound to permissions arrays.

Secured arrays are bound to permissions arrays using the secure-scan-config SciDB configuration parameter in the cluster config.ini file. The value of secure-scan-config is a JSON object describing all permissions arrays and secured array bindings.

You must restart the SciDB cluster for any change to the secure-scan-config parameter to take effect. Changes made with _setopt will have no effect.

After restarting the cluster, run a few secure_scan test queries and examine the coordinator instance scidb.log file for errors. Some configuration errors can't be detected until secure_scan tries to use the permissions array binding.

Here is an example config.ini secure-scan-config setting:

  • The entire multi-line JSON string is enclosed in single quotes.

  • There are two sections, permissions and secured. Each section contains a JSON array of descriptor objects.

  • Descriptor objects contain an array entry with the fully qualified name of the referent secured array or permissions array.

  • The perms.permissions array on line 3 is the default permissions array. If a secure_scan secured array is not mentioned in the secured section, this permissions array will be used. There are no dataset-dim or user-dim qualifiers on this entry, so perms.permissions uses the default names for these dimensions, dataset_id and user_id.

  • The perms.sis_id permissions array on line 4 has a custom dataset dimension name, sis_id. Secured arrays that use this permissions array must have a matching dataset dimension name.

  • The perms.uid array on line 5 has a custom user dimension name, uid. The secure_scan operator will use this dimension rather than user_id to match against the calling user's actual id. You can customize either or both dimension names for each permissions array.

  • The secured section starts on line 6 and contains bindings from secured data arrays to permissions arrays. When secure_scan is called on an array that doesn't have an entry in the secured section, secure_scan uses the default permissions array if one is defined; otherwise it aborts the query with an error.

  • Wildcarding the array name portion of a secured entry's fully qualified name using * (lines 7 and 9) binds all arrays in that namespace to the permissions array named in the perm element.

  • Even if a namespace-wide binding is in effect, individual arrays within a namespace can always independently specify their own permissions array binding (lines 8 and 10).

  • No permissions array or secured array can be in the public namespace. Any secure-scan-config entries that name arrays in the public namespace are ignored.
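
The binding rules in the list above can be sketched in Python. The setting embedded in the sketch is hypothetical: it is laid out so that its lines match the line numbers cited in the list, the key names (permissions, secured, array, perm, dataset-dim, user-dim) are taken from the text, and the secured array names under ns_a and ns_b are invented for illustration. How the default permissions array is marked is not stated here, so the sketch simply treats the first permissions entry as the default; that is an assumption, as are the function names.

```python
import json

# Hypothetical config.ini setting; ns_a and ns_b array names are invented.
CONFIG_INI_VALUE = """\
secure-scan-config='{
  "permissions": [
    { "array": "perms.permissions" },
    { "array": "perms.sis_id", "dataset-dim": "sis_id" },
    { "array": "perms.uid", "user-dim": "uid" } ],
  "secured": [
    { "array": "ns_a.*", "perm": "perms.sis_id" },
    { "array": "ns_a.special", "perm": "perms.uid" },
    { "array": "ns_b.*", "perm": "perms.uid" },
    { "array": "ns_b.special", "perm": "perms.sis_id" } ]
}'
"""


def parse_secure_scan_config(text):
    """Strip the key and the enclosing single quotes, then parse the JSON object."""
    key, _, quoted = text.strip().partition("=")
    if key != "secure-scan-config":
        raise ValueError("not a secure-scan-config setting")
    return json.loads(quoted.strip("'"))


def in_public_namespace(array_name):
    """True if a fully qualified array name lives in the public namespace."""
    return array_name.partition(".")[0] == "public"


def resolve_permissions_array(config, secured_array):
    """Pick the permissions array for a fully qualified secured array name."""
    if in_public_namespace(secured_array):
        raise LookupError("secured arrays may not reside in the public namespace")
    # Entries that name arrays in the public namespace are ignored.
    perms = [p["array"] for p in config["permissions"]
             if not in_public_namespace(p["array"])]
    bindings = {s["array"]: s["perm"] for s in config["secured"]
                if not in_public_namespace(s["array"])}
    namespace = secured_array.partition(".")[0]
    # An array's own binding takes precedence over a namespace-wide one.
    for candidate in (secured_array, namespace + ".*"):
        if candidate in bindings:
            return bindings[candidate]
    if perms:
        return perms[0]  # ASSUMPTION: the first permissions entry is the default
    raise LookupError("no binding and no default permissions array")


config = parse_secure_scan_config(CONFIG_INI_VALUE)
```

For example, resolve_permissions_array(config, "ns_a.images") follows the namespace-wide binding, "ns_a.special" follows its own entry, and an array in an unlisted namespace falls back to the default.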