Introduction
Provisioning is the process by which the Delphix Engine creates a new, virtual copy of a data source.
To initiate a provision, a Delphix user first selects a snapshot of data on the Delphix Engine that they want to copy: this snapshot is called the parent snapshot for the provision. The Delphix Engine clones the data in the parent snapshot to create a new copy called a virtual dataset. A virtual dataset is cheap to make, fully readable and writable, and requires no extra storage to maintain until changes are made to it. It is referred to as "virtual" because most of the data that appears to belong to a virtual dataset actually belongs to its parent snapshot.
When selecting a parent snapshot, the Delphix user also selects a target environment for the provision. A target environment is an environment suitable for hosting a virtual copy of a data source. After creating the virtual dataset, the Delphix Engine mounts it to the target environment.
The mounted virtual dataset is then configured on the target environment. Configuration is the process by which the Delphix Engine takes the raw copy of the data stored in the parent snapshot and transforms it into a useful copy of the original data source. For a database, this process involves bringing data files back to consistency and administering the database so that it can accept queries.
After a successful provision, a virtual dataset mimics the original data source. It is accessible on the target environment, but all reads and writes performed against it will access mounted storage provided by the Delphix Engine.
Virtual Dataset Operations
The Delphix Engine provides storage for virtual datasets and therefore must be able to manage them. Note that this management differs from that of data sources because the Delphix Engine does not provide storage for data sources.
The operations available to manage a virtual dataset are referred to collectively as virtual dataset operations. The Delphix Engine uses these operations internally to provide Delphix Data-as-a-Service (DaaS) and to handle advanced scenarios such as a Delphix Engine reboot or a target environment reboot. A subset of these operations is also available to Delphix users through the Delphix Management application (GUI), the CLI, and the API.
During the provisioning process, source configs and repositories are used as follows:
- The provisioning process requires a repository as input (provisioning targets a repository). The provisioning process results in the creation of a virtual dataset after the data has been copied and configured. The repository's corresponding data dependency is used during provisioning to perform configuration on the copied data.
- The provisioning process creates a source config that corresponds to the new virtual dataset.
Glossary
Term | Definition |
---|---|
Virtual Dataset | A dataset that has been provisioned from another dataset. |
Virtual Source | An object representing data about a virtual dataset. |
Target Environment | An environment hosting a virtual dataset. |
The following table summarizes the virtual dataset operations that are relevant to toolkit writers.
Operation | Summary |
---|---|
provision | Initial creation of a virtual dataset. A virtual dataset is provisioned from a snapshot of another dataset (which could be a dSource or a virtual dataset) |
stop | Halting of any use of the virtual dataset on the target environment. For example, this might involve stopping various processes that are interacting with the dataset. |
start | Beginning of any use of the virtual dataset on the target environment. For example, this might involve starting DBMS processes to interact with the dataset. |
disable | Removal of the dataset from the target environment. |
enable | Appearance of the dataset on the target environment. |
rewind | Moving a dataset "back in time" so that it appears the same as it did at some point in the past. |
refresh | Replacing the dataset's contents with the contents of its parent dataset. |
Virtual Source Definition
Virtual dataset management is coordinated by a virtualSourceDefinition that is provided by the toolkit. The virtualSourceDefinition has two important parts:
- Parameters: A specification of the custom metadata that this toolkit requires to manage virtual datasets.
- Hooks: A collection of Lua scripts that coordinate operations on the provisioned virtual dataset or the target host.
In their main.json file, toolkits specify the parameters they require and the type of each. Users fill in values for these parameters when they provision a new virtual dataset, and these values are available to all of the hooks.
Here is an example of a virtualSourceDefinition that defines two parameters.
{
    "type": "ToolkitVirtualSource",
    "parameters": {
        "type": "object",
        "properties": {
            "virtualDbName": {
                "type": "string",
                "prettyName": "Virtual DB Name",
                "description": "The name of the virtual database to create.",
                "default": "virtualDB"
            },
            "port": {
                "type": "integer",
                "prettyName": "Port",
                "description": "The port to be used by the virtual database.",
                "default": 1234
            }
        }
    }
}
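Because parameter values are supplied by users and then passed on to the hooks, a hook's shell script may want to check them before acting on them. The following is a minimal sketch, assuming the values reach the script as the environment variables PORT and DBNAME (as in the DelphixDB example later in this document). The function names and the restriction of database names to letters, digits, and underscores are illustrative assumptions, not part of the toolkit API.

```shell
# Hypothetical helpers: validate provisioning parameters that a hook
# passes to a shell script as the environment variables PORT and DBNAME.

# Returns 0 if $1 is a decimal integer in the TCP port range 1-65535.
is_valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;      # empty, or contains a non-digit
    esac
    [ "${#1}" -le 5 ] || return 1     # reject overly long digit strings
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Returns 0 if $1 is non-empty and contains only letters, digits, and
# underscores. This naming rule is an assumption made for the sketch.
is_valid_db_name() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    return 0
}

# Checks both parameters and reports the first problem on stderr.
validate_parameters() {
    is_valid_port "${PORT:-}" || {
        echo "invalid port: '${PORT:-}'" >&2
        return 1
    }
    is_valid_db_name "${DBNAME:-}" || {
        echo "invalid database name: '${DBNAME:-}'" >&2
        return 1
    }
}
```

A script could call `validate_parameters || exit 1` as its first step so that a bad value fails the hook early with a readable message.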
Provisioning Hooks
Toolkits customize Delphix's data configuration and management by providing scripts for the following hook points:
Hook | Input | Output | Description | Purpose |
---|---|---|---|---|
configure | | SourceConfig | Executed just after cloning the captured data and mounting it to a target environment. Specifically, this hook is run during provision and refresh, prior to taking the initial snapshot of the clone. This toolkit hook is run before the user-customizable Configure Clone and Before Refresh hooks are run. It must return a | Configure the data to be usable on the target environment. For database data files, this may mean recovering from a crash-consistent format or backup. For application files, this may mean reconfiguring XML files or rewriting hostnames and symlinks. |
unconfigure | resources, source, parameters, repository, config, delete flag | None | Executed when a dataset is about to be disabled. This includes cases when a currently enabled dataset is about to be deleted. | Prepare the target environment for the disappearance of the data. For example, this may involve "unregistering" the dataset from a DBMS. |
reconfigure | | SourceConfig | Executed just after a dataset has been re-enabled on the target host. This includes the re-enabling that happens as part of a rewind. This hook is passed the current sourceConfig object and must also return a sourceConfig object to represent the new status of the dataset. The passed-in object can be returned as-is if there is no need to change it. | Configure the potentially changed data to be usable on the target environment. |
start | | None/Error | Executed whenever the data should be placed in a "running" state. Specifically, this hook is run: | Start any processes that should run on top of the mounted data, such as a DBMS. |
stop | | None/Error | Executed whenever the data should be placed in a "stopped" state and unmounted. It is important that this hook stops all processes from accessing the mounted data; otherwise, subsequent unmount commands may fail. Specifically, this hook is run: | Stop any processes that are running on top of the mounted data. |
preSnapshot | | None/Error | Executed prior to taking a ZFS snapshot of the mounted data. | Quiesce the data so it can be snapshotted. Stage any files that should be included in the snapshot. |
postSnapshot | | Snapshot Metadata | Executed after taking a ZFS snapshot of the mounted data. This toolkit hook is always run regardless of the success of the snapshot or If the toolkit has provided a | Undo any work done by the preSnapshot hook. |
status | | "ACTIVE" or "INACTIVE" | Periodically executed to determine the state of the vFiles. The output of this script should be a single JSON string: "ACTIVE" or "INACTIVE". See Output from Lua Functions. Errors are reported by returning a non-zero exit code from an executed PowerShell or Bash script. | Alert Delphix users to data management problems before they affect end users. |
The output of the status script must be a JSON string, including the double quotes: "ACTIVE", not ACTIVE.
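The quoting requirement is easy to get wrong in a shell script, where the shell consumes unescaped double quotes. The sketch below shows one way to write the value so that the quotes end up in the output. The function name write_status is an illustrative assumption; DLPX_OUTPUT_FILE is the output file variable used by the status script later in this document.

```shell
# Hypothetical helper: write a status value to the hook's output file as a
# JSON string, so that the file contains "ACTIVE" rather than ACTIVE.
write_status() {
    case "$1" in
        ACTIVE|INACTIVE)
            # The single-quoted format keeps the literal double quotes.
            printf '"%s"\n' "$1" > "$DLPX_OUTPUT_FILE"
            ;;
        *)
            echo "unexpected status: $1" >&2
            return 1
            ;;
    esac
}
```

Rejecting anything other than the two allowed values keeps a typo in the script from silently producing output that does not match the expected string.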
Provisioning DelphixDB
This section walks through an example of defining virtual dataset behavior for a toolkit designed for the fictional DelphixDB.
Provision Parameters
Parameter Name | Description | Type |
---|---|---|
port | Port that provisioned database should use | Integer |
dbName | Name to use for newly provisioned database | String |
In the main.json file, the ToolkitVirtualSource will be:
{
    "type": "ToolkitVirtualSource",
    "parameters": {
        "type": "object",
        "properties": {
            "port": {
                "type": "integer",
                "prettyName": "Port",
                "description": "Port that provisioned database should use."
            },
            "dbName": {
                "type": "string",
                "prettyName": "Database Name",
                "description": "Name to use for newly provisioned database."
            }
        }
    }
}
Hooks
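Hook | DelphixDB-Specific Steps |
---|---|
configure/reconfigure | Rewrite the config file with the new port and database name, register the database, and start it. |
unconfigure | |
start | Start the database. |
stop | Stop the database. |
preSnapshot | Flush the database, then quiesce it. |
postSnapshot | Unquiesce the database. |
status | Query the database status and report "ACTIVE" if the output contains "running", otherwise "INACTIVE". |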
Below are the Lua and bash scripts.
Shell scripts
# The heredoc delimiter is quoted ('EOF') so that variables such as $DBNAME
# are expanded when the hook runs, not when these files are written.
cat > resources/reconfigure_config_file.sh <<'EOF'
# shell code omitted for brevity
# config file is found at "$DATAPATH/config.txt"
# replace config file port with $PORT and database name with $DBNAME
EOF

cat > resources/register_database.sh <<'EOF'
$DELPHIXDB register $DATAPATH
EOF

cat > resources/start_database.sh <<'EOF'
$DELPHIXDB start $DBNAME
EOF

cat > resources/stop_database.sh <<'EOF'
$DELPHIXDB stop $DBNAME
EOF

cat > resources/flush_database.sh <<'EOF'
$DELPHIXDB flush $DBNAME
EOF

cat > resources/quiesce_database.sh <<'EOF'
$DELPHIXDB quiesce $DBNAME
EOF

cat > resources/unquiesce_database.sh <<'EOF'
$DELPHIXDB unquiesce $DBNAME
EOF

cat > resources/query_database_status.sh <<'EOF'
# Check if the output of status contains the string "running"
status=$($DELPHIXDB status $DBNAME)
if [[ $status == *"running"* ]]
then
    echo "\"ACTIVE\"" > "$DLPX_OUTPUT_FILE"
else
    echo "\"INACTIVE\"" > "$DLPX_OUTPUT_FILE"
fi
EOF
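The body of reconfigure_config_file.sh is omitted above. As a rough illustration only, here is one way it could be written, assuming the config file holds simple key=value lines with the keys port and dbname. That file format, the key names, and the function name are assumptions made for this sketch; the real DelphixDB config format is not described in this document.

```shell
# Hypothetical sketch: rewrite the "port=" and "dbname=" lines of a
# key=value config file, leaving every other line unchanged.
# Usage: reconfigure_config_file <config file> <port> <database name>
reconfigure_config_file() {
    local config=$1
    local port=$2
    local dbname=$3
    local tmp="$config.tmp.$$"

    [ -f "$config" ] || {
        echo "config file not found: $config" >&2
        return 1
    }

    # Write the edited copy to a temporary file first, then move it into
    # place, so a failure does not leave a half-written config file behind.
    awk -v port="$port" -v dbname="$dbname" '
        /^port=/   { print "port=" port; next }
        /^dbname=/ { print "dbname=" dbname; next }
        { print }
    ' "$config" > "$tmp" && mv "$tmp" "$config"
}
```

Inside the hook script this would be invoked as `reconfigure_config_file "$DATAPATH/config.txt" "$PORT" "$DBNAME"`, using the variables that the Lua hooks pass in.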
virtual/configure.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["reconfigure_config_file.sh"],
    variables = envMap
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["register_database.sh"],
    variables = envMap
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["start_database.sh"],
    variables = envMap
}

-- Return the newly provisioned source config
return {
    dataPath = source.dataDirectory,
    port = parameters.port,
    dbName = parameters.dbName
}
virtual/start.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["start_database.sh"],
    variables = envMap
}
virtual/stop.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["stop_database.sh"],
    variables = envMap
}
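The stop hook is expected to leave no process accessing the mounted data, since a later unmount may otherwise fail. If the database's stop command can return before shutdown has finished, the stop script may need to wait. The sketch below polls a status command until its output no longer contains "running", the same string that the status script in this document checks for. The function name, the attempt limit, and the one-second interval are illustrative assumptions.

```shell
# Hypothetical helper: poll a status command until its output no longer
# contains "running", giving up after a fixed number of attempts.
# Usage: wait_until_stopped <max attempts> <status command> [args...]
wait_until_stopped() {
    local attempts=$1
    shift
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        case "$("$@" 2>/dev/null)" in
            *running*) sleep 1 ;;   # still up: wait and check again
            *) return 0 ;;          # no longer reported as running
        esac
        i=$((i + 1))
    done
    echo "still running after $attempts attempts" >&2
    return 1
}
```

In stop_database.sh this could follow the stop command, for example `wait_until_stopped 30 "$DELPHIXDB" status "$DBNAME" || exit 1`, so that the hook fails visibly instead of letting an unmount fail later.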
virtual/preSnapshot.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["flush_database.sh"],
    variables = envMap
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["quiesce_database.sh"],
    variables = envMap
}
virtual/postSnapshot.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

RunBash {
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    command = resources["unquiesce_database.sh"],
    variables = envMap
}
virtual/status.lua
envMap = {
    DELPHIXDB = repository.installationPath,
    DATAPATH = source.dataDirectory,
    PORT = parameters.port,
    DBNAME = parameters.dbName
}

status = RunBash {
    command = resources["query_database_status.sh"],
    environment = source.environment,
    user = source.environmentUser,
    host = source.host,
    variables = envMap,
    outputSchema = { type = "string" }
}

return status
More Information
Gotcha: Consider both dSource- and vFiles-based provisioning
When writing the provisioning logic for your data management toolkit, keep in mind that provisioning from a dSource may differ from provisioning from vFiles.
- During a dSource sync, certain files and directories may have been explicitly excluded from the set of data captured using the Exclude Paths linking option. This same set of files and directories will not automatically be excluded from snapshots of vFiles. Consequently, this data may be present in certain snapshots.
- vFiles provision operations may edit the target environment in a way that will break subsequent provisions or refreshes to the environment.
Be sure to add logic to handle these cases at the beginning of your provision operations so that your toolkit can provision robustly.
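For the first case, one possible approach is to remove the unwanted paths from the mounted data at the start of configuration, whether or not they are present. The following is a minimal sketch; the function name and the idea of passing the paths as arguments are assumptions for illustration, and the toolkit author would need to supply the actual list of paths.

```shell
# Hypothetical helper: delete paths, given relative to the data directory,
# that should not be carried over into a newly provisioned dataset.
# Usage: remove_excluded_paths <data directory> <relative path>...
remove_excluded_paths() {
    local data_dir=$1
    shift
    local rel

    [ -d "$data_dir" ] || {
        echo "data directory not found: $data_dir" >&2
        return 1
    }

    for rel in "$@"; do
        case "$rel" in
            ''|.|/*|..|../*|*/..|*/../*)
                # Refuse empty, absolute, or parent-relative paths so that
                # nothing outside the data directory can be deleted.
                echo "skipping unsafe path: '$rel'" >&2
                continue
                ;;
        esac
        # rm -rf succeeds when the path is absent, so this is a no-op for
        # snapshots in which the path was already excluded.
        rm -rf -- "$data_dir/$rel"
    done
    return 0
}
```

Because the removal is idempotent, the same call works whether the parent snapshot came from a dSource (where the paths were excluded at sync time) or from vFiles (where they may still be present).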