This new feature makes Delphix Engines easier to monitor and diagnose by integrating them natively with Splunk Enterprise. After you provide details about your Splunk instance, one or more Delphix Engines automatically send structured JSON logs to Splunk that capture their activity. These logs include Delphix events (Actions, Job Events, Faults, and Alerts), performance metrics (CPU, disk, network, TCP, dataset, NFS, and iSCSI), and capacity metrics. In Splunk you can search and visualize this information, get a centralized view of Delphix activity that cross-references data from multiple Delphix Engines, and build your own operational intelligence for your Delphix installation.
Before you configure the Delphix Engine, you will need to configure and make a note of the following in Splunk:
In the Splunk web UI, enable SSL in your global HTTP Event Collector (HEC) settings. This is optional, but it is best practice for security.
The Splunk hostname or IP Address.
The HEC port number for your Splunk instance (default 8088).
Enable the HTTP Event Collector in Splunk, and create a new HEC token with a new Splunk index set as an allowed index for the token. Make sure Enable Indexer Acknowledgement is unchecked for the token. If you wish, you can use a separate Splunk index for performance and capacity metrics; otherwise, the same index is used for both events and metrics. If you are using Splunk 7.0 or later, we recommend creating this second index as a special “Metrics” type index, which is optimized for indexing and searching metrics data.
Note the HEC Token Value and the Allowed Indexes for the token.
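The hostname, port, and SSL setting noted above combine into a standard HEC endpoint URL, which is handy for testing the collector from another machine. The sketch below assumes Splunk's standard HEC event path `/services/collector/event`; the helper name `hec_event_url` is hypothetical and not part of Delphix or Splunk.

```python
def hec_event_url(host: str, port: int = 8088, use_ssl: bool = True) -> str:
    """Build a Splunk HEC event endpoint URL (hypothetical helper).

    Assumes the standard HEC path /services/collector/event.
    IPv6 literals would need [brackets]; that case is not handled here.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    # The scheme must match the global HEC SSL setting in Splunk.
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/services/collector/event"


print(hec_event_url("splunk.example.com"))
# prints https://splunk.example.com:8088/services/collector/event
```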
Configuring Delphix for Splunk
Log in to the Delphix Server Setup UI as the sysadmin.
From the Preferences menu, select Splunk Configuration.
In the Splunk Configuration window, enter your Splunk values.
To reduce the volume of data that will be sent to Splunk, you can optionally uncheck Enable Metrics.
Splunk hostname or IP address.
The TCP port number for the Splunk HTTP Event Collector (HEC).
The token for the Splunk HTTP Event Collector (HEC).
Main Index: The Splunk index that events are sent to. It must be set as an allowed index for the HEC token.
Events Push Frequency: How often, in seconds, events are pushed to Splunk.
Whether to use HTTPS to connect to Splunk. This must match your HTTP Event Collector settings in Splunk.
The Splunk index that metrics are sent to. If none is specified, the Main Index is also used for metrics. It must be set as an allowed index for the HEC token.
Metrics Push Frequency: How often, in seconds, performance metrics are pushed to Splunk.
Performance Data Granularity: The resolution of the performance metrics data sent to Splunk, that is, how often snapshots of system performance data are taken.
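The rules in these field descriptions (a shared allowed-index requirement, and the metrics index falling back to the main index) can be checked before saving. The following is a minimal sketch under stated assumptions: the class `SplunkConfig`, its attribute names, and the example values are hypothetical, not Delphix's API or defaults; only the 8088 default port comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SplunkConfig:
    """Hypothetical model of the Splunk Configuration fields (not a Delphix API)."""
    host: str
    hec_token: str
    main_index: str
    events_push_frequency_s: int
    metrics_push_frequency_s: int
    hec_port: int = 8088            # Splunk's default HEC port
    use_https: bool = True          # must match the HEC SSL setting in Splunk
    metrics_index: Optional[str] = None
    enable_metrics: bool = True

    def effective_metrics_index(self) -> str:
        # With no metrics index specified, metrics go to the main index too.
        return self.metrics_index or self.main_index

    def problems(self, allowed_indexes: List[str]) -> List[str]:
        """Return human-readable configuration problems (empty list if none)."""
        issues: List[str] = []
        if not 1 <= self.hec_port <= 65535:
            issues.append(f"invalid HEC port: {self.hec_port}")
        if self.events_push_frequency_s <= 0:
            issues.append("events push frequency must be a positive number of seconds")
        if self.main_index not in allowed_indexes:
            issues.append(f"main index {self.main_index!r} is not an allowed index for the HEC token")
        if self.enable_metrics:
            if self.metrics_push_frequency_s <= 0:
                issues.append("metrics push frequency must be a positive number of seconds")
            # Only a separately named metrics index needs its own check.
            if self.metrics_index and self.metrics_index not in allowed_indexes:
                issues.append(f"metrics index {self.metrics_index!r} is not an allowed index for the HEC token")
        return issues


# Example values only; these are not Delphix defaults.
cfg = SplunkConfig(host="splunk.example.com", hec_token="<token>", main_index="delphix",
                   events_push_frequency_s=60, metrics_push_frequency_s=60,
                   metrics_index="delphix_metrics")
print(cfg.problems(["delphix"]))
# prints ["metrics index 'delphix_metrics' is not an allowed index for the HEC token"]
```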
Click Send Test Data to verify the values you entered.
This sends a test event, using the provided token, to the configured indexes.
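If the test fails, it can help to send a comparable event from another machine to see whether the problem lies with the token, index, or network. This is a minimal sketch of a generic HEC request, not the request Delphix itself sends: it uses HEC's documented `Authorization: Splunk <token>` header and JSON body with `event`, `index`, and `sourcetype`, while the function names and the event contents are hypothetical. With a self-signed certificate on the HEC port, `urlopen` will reject the HTTPS connection unless you supply a suitable SSL context.

```python
import json
import urllib.request


def build_hec_request(url: str, token: str, index: str, event: dict) -> urllib.request.Request:
    """Build a POST to a Splunk HEC event endpoint (sketch, not Delphix's own code)."""
    body = json.dumps({"event": event, "index": index, "sourcetype": "_json"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # HEC expects the token as "Authorization: Splunk <token>".
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return Splunk's JSON reply, e.g. {"text": "Success", "code": 0}."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_hec_request("https://splunk.example.com:8088/services/collector/event", "<token>", "delphix", {"message": "test"}))` posts one event; an HTTP error such as 403 usually points at the token, and a 400 response can indicate an index the token is not allowed to use.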
Click Save to enable the Splunk configuration and begin sending all new Actions, Job Events, Faults, Alerts, and Metrics to your Splunk instance.
Use Splunk search to analyze your data and to enumerate the items in a metrics index. For more about searching a metrics index, refer to the Splunk documentation.
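To experiment with metrics searches before Delphix data arrives, you can post a hand-made data point to a metrics index through HEC. The sketch below builds a payload in Splunk's JSON format for HEC metrics (`"event": "metric"`, with `metric_name` and `_value` inside `fields` and any other `fields` keys treated as dimensions). It is not necessarily the exact format Delphix Engines send; the function name, host, and dimension names are hypothetical.

```python
import json
import time
from typing import Dict, Optional


def metric_payload(metric_name: str, value: float, index: str, host: str,
                   dims: Optional[Dict[str, str]] = None,
                   timestamp: Optional[float] = None) -> str:
    """Build one HEC metrics data point as a JSON string (hypothetical helper)."""
    fields = dict(dims or {})  # extra keys become dimensions of the metric
    if "metric_name" in fields or "_value" in fields:
        raise ValueError("dimensions must not use the reserved keys metric_name or _value")
    fields["metric_name"] = metric_name
    fields["_value"] = value
    return json.dumps({
        "time": time.time() if timestamp is None else timestamp,
        "event": "metric",   # marks this HEC event as a metric data point
        "host": host,
        "index": index,      # must be a metrics index allowed for the token
        "fields": fields,
    })


print(metric_payload("system.cpu.util.pct", 12.5, "delphix_metrics", "test-host",
                     {"cpu": "0"}, timestamp=1700000000))
```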
Search Examples - Metrics
The following examples show how to view metrics in Splunk 7.x.
To get a list of all Metrics:
| mcatalog values(metric_name)
To list all dimensions of a given metric, for example CPU utilization percentage:
| mcatalog values(_dims) where metric_name="system.cpu.util.pct"
To view the average overall CPU utilization percentage across all hosts in 30-second intervals (replace delphix_metrics with the name of your metrics index):
| mstats avg(_value) WHERE index=delphix_metrics AND metric_name=system.cpu.util.pct span=30s
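Conceptually, `span=30s` groups every matching data point, from all hosts, into 30-second time buckets and averages the values in each bucket. The Python sketch below illustrates that idea for intuition only; it is not how Splunk implements `mstats`, and Splunk's exact bucket alignment may differ. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def avg_by_span(points: Iterable[Tuple[float, float]], span_s: int = 30) -> Dict[int, float]:
    """Average (timestamp, value) samples into fixed-width time buckets.

    Illustrates the idea behind `mstats avg(_value) ... span=30s`; samples from
    all hosts are pooled because the search has no `by` clause.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        bucket = int(ts // span_s) * span_s  # start time of the bucket containing ts
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


print(avg_by_span([(0, 10.0), (10, 20.0), (35, 40.0)]))
# prints {0: 15.0, 30: 40.0}
```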
You can also display results in a chart by using a wildcard to match all CPU metrics:
| mstats perc85(_value) AS val85 avg(_value) AS val where metric_name="system.cpu.*" span=1s by data.kernel, data.user, data.idle
| eval total='data.kernel' + 'data.user' + 'data.idle'
| eval sys_pct=(('data.kernel'/total) * 100)
| eval usr_pct=(('data.user'/total) * 100)
| eval idle_pct=(('data.idle'/total) * 100)
| timechart span=10m avg(val) as "cpu.overall", avg(val85) as "cpu.overall 85th Percentile", avg(sys_pct) as "cpu.system", avg(usr_pct) as "cpu.user", avg(idle_pct) as "cpu.idle"
This type of search can be used to stack different CPU metrics that together add up to 100%. Here is a sample screenshot of this stacked CPU chart for Delphix Engines.
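The `eval` steps in the stacked-CPU search divide the kernel, user, and idle values by their sum and multiply by 100, which is why the stacked series total 100%. This minimal Python sketch reproduces that arithmetic for a single time bucket; the function name is hypothetical, and the zero-total check is an addition the search itself does not perform.

```python
from typing import Dict


def cpu_stack_percentages(kernel: float, user: float, idle: float) -> Dict[str, float]:
    """Mirror the search's eval steps: express kernel, user, and idle as shares of 100%."""
    total = kernel + user + idle
    if total == 0:
        raise ValueError("kernel + user + idle must be non-zero")
    return {
        "cpu.system": kernel / total * 100,
        "cpu.user": user / total * 100,
        "cpu.idle": idle / total * 100,
    }


shares = cpu_stack_percentages(2, 3, 5)
print(shares)  # roughly 20% system, 30% user, 50% idle
```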