Introduction

Provisioning is the process by which the Delphix Engine creates a new, virtual copy of a data source.

To initiate a provision, a Delphix user first selects a snapshot of data on the Delphix Engine that they want to copy: this snapshot is called the parent snapshot for the provision. The Delphix Engine clones the data in the parent snapshot to create a new copy called a virtual dataset. A virtual dataset is cheap to make, fully readable and writable, and requires no extra storage to maintain until changes are made to it. It is referred to as "virtual" because most of the data that appears to belong to a virtual dataset actually belongs to its parent snapshot. 

At the same time as a Delphix user selects a parent snapshot, they also select a target environment for the provision. A target environment is an environment suitable for hosting a virtual copy of a data source. After creating the virtual dataset, the Delphix Engine mounts it to the target environment.

The mounted virtual dataset is then configured on the target environment. Configuration is the process by which the Delphix Engine takes the raw copy of the data stored in the parent snapshot and transforms it into a useful copy of the original data source. For a database, this process involves bringing data files back to consistency and administering the database so that it can accept queries.

After a successful provision, a virtual dataset mimics the original data source. It is accessible on the target environment, but all reads and writes performed against it will access mounted storage provided by the Delphix Engine.

Virtual Dataset Operations

The Delphix Engine provides storage for virtual datasets and therefore must be able to manage them. Note that this management differs from that of data sources because the Delphix Engine does not provide storage for data sources.

The operations available to manage a virtual dataset are referred to collectively as virtual dataset operations. The Delphix Engine consumes these operations internally to provide Delphix Data-as-a-Service (DaaS) and to handle advanced scenarios such as a Delphix Engine reboot or a target environment reboot. A subset of these operations is also available to Delphix users through the Delphix Management application (GUI), CLI, and API.

During the provisioning process, source configs and repositories are used as follows:

  • The provisioning process requires a repository as input (provisioning targets a repository). The provisioning process results in the creation of a virtual dataset after the data has been copied and configured.
    The repository's corresponding data dependency is used during provisioning to perform configuration on the copied data.
  • The provisioning process creates a source config that corresponds to the new virtual dataset.

Glossary

Term                 Definition
Virtual Dataset      A dataset that has been provisioned from another dataset.
Virtual Source       An object representing data about a virtual dataset.
Target Environment   An environment hosting a virtual dataset.


Here is a summary of the various operations that may be performed on a virtual dataset that are relevant to the toolkit writer.

Operation   Summary
provision   Initial creation of a virtual dataset. A virtual dataset is provisioned from a snapshot of another dataset (which could be a dSource or a virtual dataset).
stop        Halting of any use of the virtual dataset on the target environment. For example, this might involve stopping various processes that are interacting with the dataset.
start       Beginning of any use of the virtual dataset on the target environment. For example, this might involve starting DBMS processes to interact with the dataset.
disable     Removal of the dataset from the target environment.
enable      Appearance of the dataset on the target environment.
rewind      Moving a dataset "back in time" so that it appears the same as it did at some point in the past.
refresh     Replacing the dataset's contents with the contents of its parent dataset.


Virtual Source Definition

Virtual dataset management is coordinated by a virtualSourceDefinition that is provided by the toolkit. The virtualSourceDefinition has two important parts:

  • Parameters: A specification of custom metadata required for this toolkit to manage virtual datasets.
  • Hooks: A collection of Lua scripts that coordinate operations that happen on the provisioned virtual dataset or the target host.

In their main.json file, toolkits specify which parameters are required and the type of each. Users fill in values for these parameters when they provision a new virtual dataset, and these values are available to all of the hooks.

Here is an example of a virtualSourceDefinition that defines two parameters.

{
    "type": "ToolkitVirtualSource",
    "parameters": {
        "type": "object",
        "properties": {
            "virtualDbName": {
                "type": "string",
                "prettyName": "Virtual DB Name",
                "description": "The name of the virtual database to create.",
                "default": "virtualDB"
            },
            "port": {
                "type": "integer",
                "prettyName": "Port",
                "description": "The port to be used by the virtual database.",
                "default": 1234
            }
        }
    }
}


Provisioning Hooks

Toolkits customize Delphix's data configuration and management by providing scripts for the following hook points:

Each hook is listed below with its input, output, description, and purpose.

configure
  • Input: resources, source, parameters, repository, snapshot
  • Output: SourceConfig
  • Description: Executed just after cloning the captured data and mounting it to a target environment. Specifically, this hook is run during provision and refresh, prior to taking the initial snapshot of the clone. This toolkit hook is run before the user-customizable Configure Clone and Before Refresh hooks are run. It must return a sourceConfig object that represents the new dataset.
  • Purpose: Configure the data to be usable on the target environment. For database data files, this may mean recovering from a crash-consistent format or backup. For application files, this may mean reconfiguring XML files or rewriting hostnames and symlinks.

unconfigure
  • Input: resources, source, parameters, repository, config, delete flag
  • Output: None
  • Description: Executed when a dataset is about to be disabled. This includes cases when a currently enabled dataset is about to be deleted.
  • Purpose: Prepare the target environment for the disappearance of the data. For example, this may involve "unregistering" the dataset from a DBMS.

reconfigure
  • Input: resources, source, parameters, repository, config, snapshot
  • Output: SourceConfig
  • Description: Executed just after a dataset has been re-enabled on the target host. This includes the re-enabling that happens as part of a rewind. This hook is passed the current sourceConfig object, and must also return a sourceConfig object to represent the new status of the dataset. The passed-in object can be returned as-is if there is no need to make any changes to it.
  • Purpose: Configure the potentially changed data to be usable on the target environment.

start
  • Input: resources, source, parameters, repository, config
  • Output: None/Error
  • Description: Executed whenever the data should be placed in a "running" state. Specifically, this hook is run:
    • when you click the Start button in the Delphix admin application
    • when the vFiles is enabled from a previously disabled state
    • after a vFiles is rewound
  • Purpose: Start any processes which should run on top of the mounted data, such as a DBMS.

stop
  • Input: resources, source, parameters, repository, config
  • Output: None/Error
  • Description: Executed whenever the data should be placed in a "stopped" state and unmounted. It is important that this hook stops all processes from accessing the mounted data; otherwise, subsequent unmount commands may fail. Specifically, this hook is run:
    • when you click the Stop button in the Delphix admin application
    • when the vFiles is disabled from a previously enabled state
    • when the vFiles is about to be refreshed, rewound, or deleted
  • Purpose: Stop any processes which are running on top of the mounted data.

preSnapshot
  • Input: resources, source, parameters, repository, config
  • Output: None/Error
  • Description: Executed prior to taking a ZFS snapshot of the mounted data.
  • Purpose: Quiesce the data so it can be snapshotted. Stage any files which should be included in the snapshot.

postSnapshot
  • Input: resources, source, parameters, repository, config
  • Output: Snapshot Metadata
  • Description: Executed after taking a ZFS snapshot of the mounted data. This toolkit hook is always run, regardless of the success of the snapshot or the preSnapshot hook. If the toolkit has provided a snapshotSchema, then this hook needs to return a Lua table with values as specified there.
  • Purpose: Undo any work done by the preSnapshot hook.

status
  • Input: resources, source, parameters, repository, config
  • Output: "ACTIVE" or "INACTIVE"
  • Description: Periodically executed to determine the state of the vFiles. The output of this script should be a single JSON string: "ACTIVE" or "INACTIVE". See Output from Lua Functions. Errors are reported by returning a non-zero exit code from an executed PowerShell or Bash script.
  • Purpose: Alert Delphix users to data management problems before they affect end users.

Note: The output of the status script must be a JSON string, that is, "ACTIVE" (with the double quotes) and not ACTIVE.
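The quoting requirement is easy to get wrong from a shell script, because the double quotes must be part of the output itself. The following is a minimal sketch of one way to produce the value; the function name status_to_json is hypothetical, and treating any status text that contains "running" as active is an assumption borrowed from the DelphixDB example on this page.

```shell
# Hypothetical helper: map raw status text to the JSON string that the
# status hook must emit. The "running" substring convention is an assumption.
status_to_json() {
    case "$1" in
        *running*) printf '%s\n' '"ACTIVE"' ;;
        *)         printf '%s\n' '"INACTIVE"' ;;
    esac
}

# Possible usage inside a status script (DLPX_OUTPUT_FILE is the output file
# used by the sample scripts on this page):
#   status_to_json "$("$DELPHIXDB" status "$DBNAME")" > "$DLPX_OUTPUT_FILE"
```

Single-quoting the literal keeps the inner double quotes intact without backslash escapes.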

Provisioning DelphixDB

This section walks through an example of defining virtual dataset behavior for a toolkit designed for the fictional DelphixDB.

Provision Parameters

Parameter Name   Description                                   Type
port             Port that provisioned database should use     Integer
dbName           Name to use for newly provisioned database    String

In the main.json file, the ToolkitVirtualSource will be:

{
    "type": "ToolkitVirtualSource",
    "parameters": {
        "type": "object",
        "properties": {
            "port": {
                "type": "integer",
                "prettyName": "Port",
                "description": "Port that provisioned database should use."
            },
            "dbName": {
                "type": "string",
                "prettyName": "Database Name",
                "description": "Name to use for newly provisioned database."
            }
        }
    }
}
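
The schema above constrains only the JSON type of each parameter. A hook script that wants stricter checks, such as a valid TCP port range or a non-empty database name, could perform them itself before touching the data. The following is a sketch under that assumption; the function name validate_provision_params is hypothetical and not part of any Delphix API.

```shell
# Hypothetical validation helper for the two provision parameters.
# Returns 0 when both values look usable, non-zero otherwise.
validate_provision_params() {
    vp_port=$1
    vp_name=$2
    # The port must be one to five decimal digits...
    case "$vp_port" in
        ''|*[!0-9]*|??????*)
            echo "port must be an integer: '$vp_port'" >&2
            return 1 ;;
    esac
    # ...and fall within the valid TCP port range.
    if [ "$vp_port" -lt 1 ] || [ "$vp_port" -gt 65535 ]; then
        echo "port out of range: $vp_port" >&2
        return 1
    fi
    # The database name must not be empty.
    if [ -z "$vp_name" ]; then
        echo "dbName must not be empty" >&2
        return 1
    fi
    return 0
}

# Possible usage at the top of a hook script:
#   validate_provision_params "$PORT" "$DBNAME" || exit 1
```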

Hooks

Hook                     DelphixDB-Specific Steps
configure/reconfigure
  1. Reconfigure the configuration file with the specified port and database name.
  2. Register the database with the installation.
  3. Start the database by running "delphixdb start".
unconfigure
  1. Unregister the database from the installation.
start
  1. Start the database by running "delphixdb start".
stop
  1. Stop the database by running "delphixdb stop".
preSnapshot
  1. Flush all pending writes to disk.
  2. Quiesce the database by running "delphixdb quiesce".
postSnapshot
  1. Unquiesce the database by running "delphixdb unquiesce".
status
  1. Check the status by running "delphixdb status".


Below are the Lua and bash scripts.

Shell scripts

# The quoted 'EOF' delimiter keeps $VARIABLES literal, so they are expanded
# when the hook runs the script, not when these files are created.
cat > resources/reconfigure_config_file.sh <<'EOF'
# shell code omitted for brevity
# config file is found at "$DATAPATH/config.txt"
# replace config file port with $PORT and database name with $DBNAME
EOF

cat > resources/register_database.sh <<'EOF'
"$DELPHIXDB" register "$DATAPATH"
EOF

cat > resources/start_database.sh <<'EOF'
"$DELPHIXDB" start "$DBNAME"
EOF

cat > resources/stop_database.sh <<'EOF'
"$DELPHIXDB" stop "$DBNAME"
EOF

cat > resources/flush_database.sh <<'EOF'
"$DELPHIXDB" flush "$DBNAME"
EOF

cat > resources/quiesce_database.sh <<'EOF'
"$DELPHIXDB" quiesce "$DBNAME"
EOF

cat > resources/unquiesce_database.sh <<'EOF'
"$DELPHIXDB" unquiesce "$DBNAME"
EOF

cat > resources/query_database_status.sh <<'EOF'
# Check if the output of status contains the string "running"
status=$("$DELPHIXDB" status "$DBNAME")
if [[ $status == *"running"* ]]
then
    echo '"ACTIVE"' > "$DLPX_OUTPUT_FILE"
else
    echo '"INACTIVE"' > "$DLPX_OUTPUT_FILE"
fi
EOF
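
The body of reconfigure_config_file.sh is omitted in the listing above. Purely as an illustration, the sketch below shows how such a script might rewrite the file if config.txt held one key=value setting per line with keys named port and dbname. That file format, the key names, and the function name are all assumptions; the real DelphixDB configuration format is not specified on this page.

```shell
# Hypothetical sketch: rewrite the "port=" and "dbname=" lines of a key=value
# config file, leaving every other line untouched.
reconfigure_config_file() {
    rc_file=$1
    rc_port=$2
    rc_name=$3
    rc_tmp="$rc_file.tmp.$$"
    # Copy each line through, substituting the two settings we manage.
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r rc_line || [ -n "$rc_line" ]; do
        case "$rc_line" in
            port=*)   printf 'port=%s\n' "$rc_port" ;;
            dbname=*) printf 'dbname=%s\n' "$rc_name" ;;
            *)        printf '%s\n' "$rc_line" ;;
        esac
    done < "$rc_file" > "$rc_tmp" &&
    # Replace the original only after the new file was written successfully.
    mv "$rc_tmp" "$rc_file"
}

# Possible usage inside the script:
#   reconfigure_config_file "$DATAPATH/config.txt" "$PORT" "$DBNAME"
```

Writing to a temporary file and renaming it avoids leaving a half-written configuration behind if the script is interrupted, and avoiding sed sidesteps escaping problems with unusual database names.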

virtual/configure.lua

-- Environment variables passed to each shell script below
envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}

-- Step 1: rewrite the configuration file with the requested port and name
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["reconfigure_config_file.sh"],
    variables       = envMap
}

-- Step 2: register the database with the installation
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["register_database.sh"],
    variables       = envMap
}

-- Step 3: start the database
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["start_database.sh"],
    variables       = envMap
}

-- Return the newly provisioned source config
return {
    dataPath    = source.dataDirectory,
    port        = parameters.port,
    dbName      = parameters.dbName
}

virtual/start.lua

envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}
 
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["start_database.sh"],
    variables       = envMap
}

virtual/stop.lua

envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["stop_database.sh"],
    variables       = envMap
}
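
Because the stop hook must leave no process accessing the mounted data, a stop script may want to confirm that the database process has actually exited before it returns. The sketch below polls for that with a timeout. It assumes the script can learn the process ID by some means (for example, a pid file), which this page does not specify, and the function name wait_for_exit is hypothetical.

```shell
# Hypothetical helper: wait until process $1 has exited, polling once per
# second for up to $2 seconds. Returns 0 once the process is gone and 1 if
# it is still alive when the timeout expires.
wait_for_exit() {
    we_pid=$1
    we_left=$2
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$we_pid" 2>/dev/null; do
        if [ "$we_left" -le 0 ]; then
            return 1
        fi
        sleep 1
        we_left=$((we_left - 1))
    done
    return 0
}

# Possible usage after requesting a stop (DB_PID is an assumed variable):
#   "$DELPHIXDB" stop "$DBNAME"
#   wait_for_exit "$DB_PID" 60 || exit 1
```

Exiting non-zero when the process is still alive surfaces the problem in the stop hook itself rather than in a later unmount failure.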

virtual/preSnapshot.lua

envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["flush_database.sh"],
    variables       = envMap
}
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["quiesce_database.sh"],
    variables       = envMap
}

virtual/postSnapshot.lua

envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}
RunBash {
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    command         = resources["unquiesce_database.sh"],
    variables       = envMap
}
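
Since the postSnapshot hook runs even when the preSnapshot hook or the snapshot itself failed, the unquiesce step may be invoked on a database that was never quiesced. One possible way to make that safe is to record a marker when quiescing succeeds and to unquiesce only when the marker is present. The sketch below illustrates the idea; the marker file, the QUIESCE_MARKER variable, and both function names are assumptions rather than part of the toolkit API.

```shell
# Hypothetical sketch: remember whether quiesce succeeded so that the
# postSnapshot step is safe to run after a failed preSnapshot.
# QUIESCE_MARKER is an assumed path chosen by the toolkit author.

# Run the given quiesce command; leave a marker only if it succeeded.
quiesce_with_marker() {
    "$@" && : > "$QUIESCE_MARKER"
}

# Run the given unquiesce command only if the marker exists, then clear it.
unquiesce_if_needed() {
    if [ -f "$QUIESCE_MARKER" ]; then
        "$@" && rm -f "$QUIESCE_MARKER"
    fi
}

# Possible usage:
#   quiesce_with_marker "$DELPHIXDB" quiesce "$DBNAME"      # preSnapshot
#   unquiesce_if_needed "$DELPHIXDB" unquiesce "$DBNAME"    # postSnapshot
```

If the database's own unquiesce command is already harmless on a running database, the marker is unnecessary.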

virtual/status.lua

envMap = {
    DELPHIXDB   = repository.installationPath,
    DATAPATH    = source.dataDirectory,
    PORT        = parameters.port,
    DBNAME      = parameters.dbName
}

-- The script writes "ACTIVE" or "INACTIVE" as a JSON string to its output
-- file; outputSchema declares the type of that value.
status = RunBash {
    command         = resources["query_database_status.sh"],
    environment     = source.environment,
    user            = source.environmentUser,
    host            = source.host,
    variables       = envMap,
    outputSchema    = { type = "string" }
}
return status

More Information

Gotcha: Consider both dSource- and vFiles-based provisioning

When filling out the provision hook for your data management toolkit, be sure to take into account that provisioning from a dSource might be different from provisioning from a vFiles.

  • During a dSource sync, certain files and directories may have been explicitly excluded from the set of data captured using the Exclude Paths linking option. This same set of files and directories will not automatically be excluded from snapshots of vFiles. Consequently, this data may be present in certain snapshots.
  • vFiles provision operations may edit the target environment in a way that will break subsequent provisions or refreshes to the environment.

Be sure to add logic to handle these cases at the beginning of your provision operations so that your toolkit can provision robustly.
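
As one possible way to handle the first case, a provisioning script could remove paths that were excluded at link time but that may be present in a snapshot taken from a vFiles. The sketch below assumes the toolkit author supplies the list of excluded paths, relative to the data directory; how that list is stored and passed to the script is left open, and the function name remove_excluded_paths is hypothetical.

```shell
# Hypothetical sketch: delete previously excluded paths under the data
# directory before configuring the provisioned copy.
# $1 is the data directory; the remaining arguments are relative paths.
remove_excluded_paths() {
    re_root=$1
    shift
    for re_rel in "$@"; do
        case "$re_rel" in
            # Refuse empty, absolute, or parent-escaping paths.
            ''|/*|..|../*|*/..|*/../*)
                echo "skipping unsafe path: $re_rel" >&2
                continue ;;
        esac
        rm -rf -- "$re_root/$re_rel"
    done
}

# Possible usage at the start of provisioning:
#   remove_excluded_paths "$DATAPATH" logs tmp/cache
```

Rejecting absolute and parent-relative entries keeps a malformed list from deleting anything outside the mounted data directory.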