Introduction

Discovery is the process by which the Delphix Engine identifies data sources and data dependencies on a remote environment. Discovery is run when an environment is added to the Delphix Engine or when an already added environment is refreshed.

After running discovery, the Delphix Engine tracks data sources as source configs and tracks data dependencies as repositories.

Source configs are objects that outline intrinsic properties of data sources. Source configs are used to uniquely identify remote data sources, assist the linking process when importing data into the Delphix Engine, and uniquely identify virtual copies of data created during provisioning.

Repositories are objects that outline intrinsic properties of data dependencies. Repositories are used to uniquely identify remote data dependencies, assist the linking process when importing data into the Delphix Engine, and assist the provisioning process when configuring virtual copies of data.
 

Glossary

  • Discovery – The process by which the Delphix Engine identifies data sources and data dependencies on a remote environment.
  • Data source – A dataset that exists outside the Delphix Engine.
  • Data dependency – Additional data needed to correctly interact with data sources. For databases, this data is often a DBMS necessary for taking a backup or recovering virtual copies of data files. Certain data sources have no data dependencies. For example, the Oracle EBS application binaries do not have a data dependency.
  • Source config – An object on the Delphix Engine that uniquely identifies a data source.
  • Repository – An object on the Delphix Engine that uniquely identifies a data dependency for some data source.


During discovery, a toolkit supplies metadata that defines the layout of the repository and source config objects. For each of these object types, this metadata consists of three parts:

  • Object schema – A specification of all of the fields belonging to the object.
  • Identity fields – A set of fields that, together, will define the identity of the object. That is, no two objects will have identical values for all of these fields.
  • Name field – A single field that can be used as a user-visible name for an object.

The relationship between an environment, a repository, and a source config is as follows:

  • An environment can contain many repositories.
  • Each repository can support zero or more source configs.
    Equivalently, multiple source configs can depend on a single repository.

During the linking process, source configs and repositories are used as follows:

  • The linking process requires a discovered source config as input and results in the creation of a dSource after data has been imported.
    The source config still exists on the Delphix Engine after the linking process is completed; its corresponding data source is now considered linked.
  • The linking process references the source config's associated repository throughout the data import process.

During the provisioning process, source configs and repositories are used as follows:

  • The provisioning process requires a repository as input (provisioning targets a repository). The provisioning process results in the creation of a vFiles dataset after data has been copied and configured.
    The repository's corresponding data dependency is used during provisioning to perform configuration on the copied data.
  • The provisioning process creates a source config that corresponds to the new vFiles.

Discovering a Data Platform

A toolkit can support either of two types of discovery: manual or automatic. Manual discovery means that the user must manually specify data sources. Automatic discovery means that the toolkit can find the data sources on its own.

Supporting Automatic Discovery

The following is a walkthrough of how to implement discovery for the fictional database platform, Delphix DB.

Delphix DB, like most databases, provides binaries (the DBMS) that must be used to correctly read from, write to, and administer the data stored in Delphix DB instances. In other words, Delphix DB instances depend on the Delphix DB binaries for management.

Thus, the Delphix DB binaries will correspond to the repository object defined by our Delphix DB toolkit. Delphix DB instances will correspond to the source config objects.

Define the Repository Object

Start by defining a repository object schema. Here, the toolkit defines which properties are needed to identify and use the Delphix DB binaries.

"repositorySchema": {
    "type": "object",
    "required": ["installPath", "version"],
    "ordering": ["installPath", "version"],
    "additionalProperties": false,
    "properties": {
        "installPath": {
            "type": "string",
            "prettyName": "Delphix DB Binary Installation Path",
            "description": "The path to the Delphix DB installation binaries"
        },
        "version": {
            "type": "string",
            "prettyName": "Version",
            "description": "The version of the Delphix DB binaries"
        }
    }
}

The Delphix Engine needs to be able to display a name for this repository object. The toolkit must specify which one of the repository object's fields should be used as a name.

Similarly, the Delphix Engine needs to be able to compare repository objects for identity. So, the toolkit must specify a list of fields that together uniquely identify a repository. Two repository objects that share identical values for all of these "identity fields" are considered the same repository.

"repositoryIdentityFields": ["installPath"],
"repositoryNameField": "installPath"

Choose the identity fields carefully. Imagine that you discover a Delphix DB repository on an environment and later upgrade the binaries on that environment. The upgraded binaries still represent the same data dependency as before the upgrade, because they still manage the same database instances. If the Delphix Engine reruns discovery, the currently tracked repository should simply be updated so that its "version" field reflects the upgrade. That is, the result should mean "the same repository still exists, but has a new version" rather than "the old repository no longer exists, and there is now a different repository". Limiting the identity field set to "installPath" (and excluding "version") ensures this behavior.
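
The effect of the identity field choice can be sketched in a few lines of Bash. This is an illustration only, not how the Delphix Engine compares objects internally: the identity_key helper, the "field=value" argument format, and the install path and version values are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# identity_key FIELDS OBJECT...
#   FIELDS  space-separated identity field names
#   OBJECT  the object's properties as "field=value" arguments
# Prints the identity fields and their values joined into one comparable string.
identity_key() {
    local fields="$1" field pair key=""
    shift
    for field in $fields; do
        for pair in "$@"; do
            if [[ "${pair%%=*}" == "$field" ]]; then
                key+="${pair};"
            fi
        done
    done
    printf '%s\n' "$key"
}

# The same install before and after a binary upgrade (hypothetical values).
before=("installPath=/opt/delphixdb" "version=1.0")
after=("installPath=/opt/delphixdb" "version=2.0")

# Identity on installPath only: the keys match, so this is one repository.
if [[ "$(identity_key "installPath" "${before[@]}")" == "$(identity_key "installPath" "${after[@]}")" ]]; then
    echo "installPath only: same repository"
fi

# Identity on installPath and version: the keys differ after the upgrade.
if [[ "$(identity_key "installPath version" "${before[@]}")" != "$(identity_key "installPath version" "${after[@]}")" ]]; then
    echo "installPath + version: treated as a different repository"
fi
```

With "version" in the identity set, an upgrade would look like one repository disappearing and another appearing, which is the outcome the paragraph above warns against.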

Define the Source Config Object

Start by defining a source config schema. Here, the toolkit defines which properties are needed to identify and use a data source.

"sourceConfigSchema": {
    "type": "object",
    "required": ["dbName", "dataPath", "port"],
    "ordering": ["dbName", "dataPath", "port"],
    "additionalProperties": false,
    "properties": {
        "dbName": {
            "type": "string",
            "prettyName": "Delphix DB Name",
            "description": "The name of the Delphix DB instance"
        },
        "dataPath": {
            "type": "string",
            "prettyName": "Data Path",
            "description": "The path to the Delphix DB instance's data"
        },
        "port": {
            "type": "string",
            "prettyName": "Port",
            "description": "The port of the Delphix DB"
        }
    }
}

As with repositories above, the toolkit must decide on a "name" field and a set of "identity" fields.

"sourceConfigIdentityFields": ["dataPath"],
"sourceConfigNameField": "dataPath"

As before, be careful in your choice of identity fields! Imagine a situation in which either the port or name of a running Delphix DB instance is modified. The modified database instance will still represent the same data as it did prior to being updated. If the Delphix Engine ever reruns discovery, the currently tracked source config should simply be updated such that its "port" and "dbName" fields reflect the update. Therefore, we would not want to include these fields in the set of identity fields.
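
A rough sketch of what this means when discovery is rerun: each discovered source config is matched against the tracked ones by "dataPath" alone, so a renamed instance or a changed port leads to an update rather than a new object. The flat "dataPath|dbName|port" record format, the classify helper, and the paths and ports are hypothetical; the Delphix Engine's own reconciliation logic is not described in this document.

```shell
#!/usr/bin/env bash
# Records are "dataPath|dbName|port". This flat format is for illustration only.
tracked=(
    "/data/skywalker|skywalker|5000"
    "/data/vader|vader|5001"
)
discovered=(
    "/data/skywalker|luke|5050"   # same dataPath, new name and port
    "/data/obiwan|obiwan|5002"    # dataPath not seen before
)

# classify RECORD: print "update" if a tracked record has the same dataPath,
# otherwise print "create". Only the identity field (dataPath) is compared.
classify() {
    local candidate="$1" record
    for record in "${tracked[@]}"; do
        if [[ "${record%%|*}" == "${candidate%%|*}" ]]; then
            echo "update"
            return
        fi
    done
    echo "create"
}

for candidate in "${discovered[@]}"; do
    printf '%s %s\n' "$(classify "$candidate")" "${candidate%%|*}"
done
```

Had "dbName" or "port" been part of the identity fields, the first discovered record would have been classified as a new source config even though it points at the same data.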

Supply the Hook for Discovering Repositories

A toolkit must supply a Lua hook which will identify data dependencies on an environment, and return information about them so that the Delphix Engine can create a repository object. This Lua hook will be run any time the Delphix Engine needs to discover the environment (i.e. when adding a new environment, or when refreshing an existing one). This Lua hook must be named repositoryDiscovery.lua and work as outlined below.

repositoryDiscovery.lua

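Available Global State:

  • resources – The resources object described in How to Write Lua Hooks.
  • remote – The remote object described in How to Write Lua Hooks.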

Expected Output: A list of Lua tables where each table represents a repository that corresponds to a data dependency found on the host.

  • Each of these tables must contain the fields that were specified in the toolkit's repository schema.

Execution Conditions:

  • Repository discovery is run whenever you add a new environment to the Delphix Engine.
    Repository discovery will also be rerun when you refresh an environment.

Whenever you add or refresh any environment, discovery is run for all the toolkits installed on the Delphix Engine.

Tutorial

At the root of your toolkit, create a directory named discovery and another directory named resources (if these directories do not already exist).

Inside the discovery directory, create a file named repositoryDiscovery.lua. This file will contain Lua code that implements repository discovery. See How to Write Lua Hooks for an introduction to Lua.
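
The layout described above can be created from a shell as sketched here. The toolkit root name delphixdb-toolkit and the TOOLKIT_ROOT variable are placeholders chosen for this example; find_installs.sh is the script name used later in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical toolkit root; substitute the directory that holds your toolkit.
toolkit_root="${TOOLKIT_ROOT:-./delphixdb-toolkit}"

# Create the two directories; -p makes this a no-op if they already exist.
mkdir -p "$toolkit_root/discovery" "$toolkit_root/resources"

# Create empty placeholders for the Lua hook and the Bash script it will run.
touch "$toolkit_root/discovery/repositoryDiscovery.lua"
touch "$toolkit_root/resources/find_installs.sh"

# List the files that now exist under the toolkit root.
find "$toolkit_root" -type f | sort
```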

Typically, most of the logic of repository discovery is done by a Bash script that runs on the environment in question. The repositoryDiscovery.lua hook coordinates running this script and returns the information to the Delphix Engine.

There are two ways to supply such a Bash script. One is simply to write it, inline, as a string in your Lua hook. However, because the Bash code will often be sizable, inlining would result in code that is less readable and harder to test. Typically a better option is to supply the script in its own file in the toolkit's resources directory. The script will then be accessible from the Lua hook via the supplied resources object. See How to Write Lua Hooks for more information about the resources object.

In the example below, we've created a Bash script named find_installs.sh. When our Lua hook runs, the find_installs.sh Bash logic will be executed on the remote environment that is being discovered. The logic will use a CLI tool to find installed Delphix DB binaries. For each set of binaries found, we will use the jq command to build up a JSON object describing the binaries. At the end of the script, we will write the JSON to $DLPX_OUTPUT_FILE to pass the data back to Lua.

After the Bash logic finishes executing, the Delphix Engine will validate that the data written to $DLPX_OUTPUT_FILE matches the outputSchema defined in our Lua hook. If the data is malformed, execution will stop and an error will be displayed. If the data is well formed, the JSON will be converted to a Lua table which can be manipulated further in repositoryDiscovery.lua.

In our case, no additional Lua logic is needed after executing the find_installs.sh Bash logic. The Lua script simply needs to return the repository objects to the Delphix Engine.

After the Lua script finishes executing, the Delphix Engine will validate that the returned data is an array of objects matching the repositorySchema defined earlier in this documentation. If the data is malformed, execution will stop and an error will be displayed. If the data is well formed, the repository objects will be persisted on the Delphix Engine.

discovery/repositoryDiscovery.lua

installs = RunBash {
    command      = resources["find_installs.sh"],
    environment  = remote.environment,
    user         = remote.environmentUser,
    host         = remote.host,
    variables    = {},
    outputSchema = {
        type  = "array",
        items = {
            type       = "object",
            properties = {
                installPath = { type = "string" },
                version     = { type = "string" }
            }
        }
    }
}
return installs

resources/find_installs.sh

# Add the directory containing jq to path so that invoking jq is less painful
PATH="$(dirname "$DLPX_BIN_JQ"):${PATH}"

# This function escapes its first argument and surrounds it with quotes
function quote {
    jq -R '.' <<< "$1"
}

# create empty output list
repoList='[]'

# get the list of install paths
installs=$(/usr/bin/delphixDB list-installs)

# for each install path, get the version and add the repo object to the array
for install in $installs; do
    version=$("$install" --version)
    repo='{}'
    repo=$(jq ".installPath = $(quote "$install")" <<< "$repo")
    repo=$(jq ".version = $(quote "$version")" <<< "$repo")
    repoList=$(jq ". + [$repo]" <<< "$repoList")
done

echo "$repoList" > "$DLPX_OUTPUT_FILE"
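
Because discovery runs for every installed toolkit whenever any environment is added or refreshed, find_installs.sh may run on hosts where Delphix DB is not installed. One way to handle that case is to report zero repositories instead of failing, as sketched below. The sketch assumes that an empty JSON array is an acceptable result for the hook's outputSchema; the skip_if_cli_missing function and the scratch-file fallback for $DLPX_OUTPUT_FILE are hypothetical additions for local experiments.

```shell
#!/usr/bin/env bash
# Outside the Delphix Engine DLPX_OUTPUT_FILE is unset, so fall back to a
# scratch file. This fallback is an assumption made for local dry runs.
DLPX_OUTPUT_FILE="${DLPX_OUTPUT_FILE:-$(mktemp)}"

# skip_if_cli_missing CLI_PATH
# Returns 0 after writing an empty JSON array when CLI_PATH is not executable.
# Returns 1 (and writes nothing) when the CLI is present.
skip_if_cli_missing() {
    local cli="$1"
    [[ -x "$cli" ]] && return 1
    echo '[]' > "$DLPX_OUTPUT_FILE"
}

# In find_installs.sh this check would come before the install loop, and the
# script would exit 0 right after it succeeds.
if skip_if_cli_missing "/usr/bin/delphixDB"; then
    echo "Delphix DB CLI not found; reported zero repositories" >&2
fi
```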


Supply the Hook for Discovering Source Configs

As with repositories above, a toolkit must also supply a Lua hook which will identify datasets on an environment, and return information about them so that the Delphix Engine can create source config objects. This Lua hook must be named sourceConfigDiscovery.lua and work as outlined below.

sourceConfigDiscovery.lua

Available Global State:

  • resources – The resources object described in How to Write Lua Hooks.
  • remote – The remote object described in How to Write Lua Hooks.
  • repository – A repository object described by the toolkit's repository schema. Specifically, this object corresponds to the repository being discovered.

Expected Output: A list of Lua tables where each table corresponds to a source config.

  • Each of these tables must contain the fields specified in the toolkit's source config schema.

Execution Conditions:

  • Source config discovery is run for each repository discovered on an environment.
    This script will be run zero or more times after repository discovery is run.

Tutorial

At the root of your toolkit, create a directory named discovery and another directory named resources (if these directories do not already exist).

Inside the discovery directory, create a file named sourceConfigDiscovery.lua. This file will contain Lua code that implements source config discovery. See How to Write Lua Hooks for an introduction to Lua.

Inside the resources directory, create a file named find_instances.sh. As with repository discovery, this script runs on the host. It is in charge of outputting information about any datasets found that match the given repository. The sourceConfigDiscovery.lua hook will coordinate the execution of this Bash logic by referencing it via the resources object. See How to Write Lua Hooks for more information about the resources object.

The Delphix Engine will execute sourceConfigDiscovery.lua once for every discovered repository. The repository object will give us access to information about the particular repository being discovered. In our example, we will unpack the repository's installPath into an environment variable, $INSTALLPATH, so that it can be referenced by the Bash logic in find_instances.sh.

When our Lua code runs, the find_instances.sh Bash logic will use a CLI tool to find running Delphix DB instances belonging to $INSTALLPATH. For each instance found, we will use the jq command to build up a JSON object describing the instance. At the end of the script, we will write the JSON to $DLPX_OUTPUT_FILE to pass the data back to Lua.

After the Bash logic finishes executing, the Delphix Engine will validate that the data written to $DLPX_OUTPUT_FILE matches the outputSchema defined in our Lua hook. If the data is malformed, execution will stop and an error will be displayed. If the data is well formed, the JSON will be converted into a Lua table which can be manipulated further in sourceConfigDiscovery.lua.

In our case, no additional Lua logic is needed after executing the find_instances.sh Bash logic, so our Lua script simply returns the source config objects to the Delphix Engine.

After the Lua script finishes executing, the Delphix Engine will validate that the returned data is an array of objects, each of which matches the sourceConfigSchema defined earlier in this documentation. If the data is malformed, execution will stop and an error will be displayed. If the data is well formed, the source config objects will be persisted on the Delphix Engine.

discovery/sourceConfigDiscovery.lua

instances = RunBash {
    command      = resources["find_instances.sh"],
    environment  = remote.environment,
    user         = remote.environmentUser,
    host         = remote.host,
    variables    = {
        -- When the command is run, $INSTALLPATH will be an environment variable
        INSTALLPATH = repository.installPath
    },
    outputSchema = {
        type  = "array",
        items = {
            type       = "object",
            properties = {
                dbName   = { type = "string" },
                dataPath = { type = "string" },
                port     = { type = "string" }
            }
        }
    }
}
return instances

resources/find_instances.sh

# Add the directory containing jq to path so that invoking jq is less painful
PATH="$(dirname "$DLPX_BIN_JQ"):${PATH}"

# This function escapes its first argument and surrounds it with quotes
function quote {
    jq -R '.' <<< "$1"
}

# create empty output list
sourceConfigList='[]'

# get the list of instances that belong to this install
instances=$("$INSTALLPATH" list-instances)

# for each instance, get its port and data path and add the source config object to the array
for instance in $instances; do
    port=$("$INSTALLPATH" get-port "$instance")
    dataPath=$("$INSTALLPATH" get-data-path "$instance")
    sourceConfig='{}'
    sourceConfig=$(jq ".dbName = $(quote "$instance")" <<< "$sourceConfig")
    sourceConfig=$(jq ".dataPath = $(quote "$dataPath")" <<< "$sourceConfig")
    sourceConfig=$(jq ".port = $(quote "$port")" <<< "$sourceConfig")
    sourceConfigList=$(jq ". + [$sourceConfig]" <<< "$sourceConfigList")
done

echo "$sourceConfigList" > "$DLPX_OUTPUT_FILE"
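
The script above relies on $INSTALLPATH being passed in through the variables table of the Lua hook. If the variable is missing or misspelled there, the script would otherwise fail in a less obvious way, so a small guard near the top can help. The require_var helper below is a hypothetical addition, shown as a sketch rather than as part of the toolkit API.

```shell
#!/usr/bin/env bash
# require_var NAME
# Returns 1 with a message on stderr when the named variable is unset or empty.
require_var() {
    local name="$1"
    if [[ -z "${!name:-}" ]]; then
        echo "required variable $name is not set" >&2
        return 1
    fi
}

# In find_instances.sh these checks would sit at the top and stop the script:
#   require_var INSTALLPATH || exit 1
#   require_var DLPX_OUTPUT_FILE || exit 1
if require_var INSTALLPATH; then
    echo "discovering instances for $INSTALLPATH"
fi
```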

Delphix DB Example

Adding an environment with a single Delphix DB installation located at "/usr/bin/delphixdb" with three databases called "skywalker," "vader," and "obiwan" will result in three source configs being discovered on the environment, one for each database.

Manual Discovery

For data platforms that do not support automated, scriptable discovery, you must go through a process of manual discovery to create repository and source config objects. This means that the user will have to manually specify the information required to create source config objects.

For a toolkit to support manual discovery, it must meet the following requirements:

  • The repositorySchema must not be empty. At a minimum, the schema should contain a single string field, "name".
  • The repositoryIdentityFields must not be empty. At a minimum, the set of identity fields should reference the schema's "name" field.
  • The repositoryNameField must not be empty. At a minimum, the name field should reference the schema's "name" field.
  • The "repositoryDiscovery.lua" script must return a Lua object containing a "name" field. At a minimum, the value of "name" should match the toolkit's prettyName.
  • The sourceConfigSchema must be empty.
  • The sourceConfigIdentityFields must be empty.
  • The sourceConfigNameField must be the empty string.
  • The "sourceConfigDiscovery.lua" script must return an empty Lua array.

Please see Build a Direct Toolkit for an example of a toolkit that only supports manual discovery.

When a toolkit uses manual discovery, the user must specify the source config information in the Delphix Management application as follows:

  1. Log in to the Delphix Management application.
  2. Click Manage.
  3. Select Environment.
  4. Click the environment which contains the data source you want to link.
  5. Click the Database tab for that environment.
  6. Click Add Dataset Home.
  7. From the drop-down menu, select the repository associated with your data platform.
  8. Fill in the appropriate fields.
    1. For direct-linked data platforms, enter a Name for the source config and the Path to the data to be imported into the Delphix Engine.
    2. For staged-linked data platforms, enter the Name for the source config.
  9. Click the Check icon to persist the source config to the Delphix Engine. The source config should appear underneath the previously selected repository. You should now be able to link the source config.