This topic describes how to perform a sample performance investigation with one statistic from the Performance Analytics tool.

Introduction

The Delphix Engine uses Network File System (NFS) as the transport for Oracle installations. An increase in NFS latency can cause sluggishness in applications running on top of Virtual Databases. This case study shows how to find the root cause of that problem using the analytics infrastructure. The investigation uses a single statistic and drills down iteratively: at each step you inspect additional axes of the statistic and filter the data so that it only includes the operations that appear slow. This technique is valuable for determining which usage patterns of a resource might be making the system sluggish. If you isolate a performance issue using this approach but aren't sure what is causing it or how to fix it, Delphix Support can assist with your investigation.

The following example inspects the statistic that provides information about NFS I/O operations on the Delphix Engine. This statistic can be collected at most once per second. The axes it can collect include:

  • latency, a histogram of wait times between NFS requests and NFS responses
  • size, a histogram of the NFS I/O sizes requested
  • op, whether the NFS requests were reads or writes
  • client, the network address of the NFS client which was making requests

Roughly the same performance information can be obtained from the iSCSI interface as well.

Investigation

  1. Begin the performance investigation by examining a high-level statistic such as latency.

    1. Create a slice with statistic type NFS_OPS.
    2. Set the slice to collect the latency axis.
    3. Do not add any constraints.
    4. Set the collection interval.
      Any interval of at least one second will work, but ten seconds gives good data resolution without using much storage to persist the collected data. The rest of this example assumes a collection interval of ten seconds for all other slices, but any value could be used.

      /analytics
      
      create
      set name=step1
      set statisticType=NFS_OPS
      set collectionInterval=10
      set collectionAxes=latency
      commit

      This will collect a time-series of histograms describing NFS latencies as measured from inside the Delphix Engine, where each histogram shows how many NFS I/O operations fell into each latency bucket during every ten-second interval. After a short period of time, read the data from the statistic slice:

      select step1
      getData
      setopt format=json
      commit
      setopt format=text

      The setopt steps are optional but allow you to see the output better via the CLI. The output looks like this:

      {
          "type": "DatapointSet",
          "collectionEvents": [],
          "datapointStreams": [{
              "type": "NfsOpsDatapointStream",
              "datapoints": [{
                  "type": "IoOpsDatapoint",
                  "latency": {
                      "32768": "16",
                      "65536": "10"
                  },
                  "timestamp": "2013-05-14T15:51:40.000Z"
              }, ...]
          }],
          "resolution": 10
      }

      The data is returned as a set of datapoint streams. Streams hold the fields which are shared by all the datapoints they contain. Later on in this example, the op and client fields will be added to the streams, and multiple streams will be returned. Streams are described in more detail in Performance Analytics Tool Overview. The resolution field indicates the number of seconds that corresponds to each datapoint, which in our case matches the requested collectionInterval. The collectionEvents field is not used in this example, but lists when the slice was paused and resumed, to distinguish between moments when no data was collected because the slice was paused, and moments when there was no data to collect.
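
The histograms in this output can also be summarized offline. The following Python sketch is not part of the Delphix tooling. It assumes the JSON printed by getData has been saved as text, that histogram keys are latency bucket bounds in nanoseconds (consistent with the nanosecond constraint values used later in this example), and that counts arrive as strings as shown above. The function name slow_op_counts is hypothetical, and the threshold in the usage lines is chosen only to fit the sample data.

```python
import json

# Sample copied from the output shown above (the elided datapoints are omitted).
SAMPLE = """
{
    "type": "DatapointSet",
    "collectionEvents": [],
    "datapointStreams": [{
        "type": "NfsOpsDatapointStream",
        "datapoints": [{
            "type": "IoOpsDatapoint",
            "latency": {"32768": "16", "65536": "10"},
            "timestamp": "2013-05-14T15:51:40.000Z"
        }]
    }],
    "resolution": 10
}
"""


def slow_op_counts(datapoint_set, threshold_ns):
    """Return (timestamp, total_ops, slow_ops) for every datapoint.

    slow_ops counts operations in buckets whose key is at or above
    threshold_ns. Assumption: bucket keys are nanosecond bounds.
    """
    rows = []
    for stream in datapoint_set["datapointStreams"]:
        for point in stream["datapoints"]:
            histogram = point.get("latency", {})
            total = sum(int(count) for count in histogram.values())
            slow = sum(
                int(count)
                for bucket, count in histogram.items()
                if int(bucket) >= threshold_ns
            )
            rows.append((point["timestamp"], total, slow))
    return rows


if __name__ == "__main__":
    data = json.loads(SAMPLE)
    for timestamp, total, slow in slow_op_counts(data, threshold_ns=65536):
        print(timestamp, "total:", total, "at or above threshold:", slow)
```

If the slow count stays at zero for every interval, NFS latency as seen by the Delphix Engine is unlikely to be the source of the sluggishness, and the later steps are not needed.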

  2. If the latency distributions show some slow NFS operations, the next step would be to determine whether the slow operations are reads or writes.

    1. Create a new NFS_OPS slice that collects the op and latency axes.
    2. To limit output to the long-running operations, create a constraint on the latency axis that prohibits the collection of data on operations with latency less than 100ms.

      /analytics
      
      create
      set name=step2
      set statisticType=NFS_OPS
      set collectionInterval=10
      set collectionAxes=op,latency
       
      edit axisConstraints.0
      set axisName=latency
      set type=IntegerGreaterThanConstraint
      set greaterThan=100000000
      back
       
      commit

      The greaterThan field is 100ms converted into nanoseconds.

      Reading the data proceeds in the same way as the first step, but there will be two streams of datapoints, one where op=write, and one where op=read.

      Because we constrained output to operations with latencies higher than 100ms, none of the latency histograms will have any buckets for latencies lower than 100ms.
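
Because the constraint value is expressed in nanoseconds, it is easy to add or drop a zero by mistake. The Python sketch below converts a millisecond threshold and prints the constraint commands in the same form used in this step. The helper names ms_to_ns and latency_constraint_commands are hypothetical. The commands themselves are the ones shown above, not an additional CLI feature.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def ms_to_ns(milliseconds):
    """Convert a whole number of milliseconds to nanoseconds."""
    return milliseconds * NS_PER_MS


def latency_constraint_commands(index, min_latency_ms):
    """Build the CLI lines for a latency constraint at the given index.

    The lines mirror the commands used in this example; only the
    constraint index and the threshold vary.
    """
    return [
        f"edit axisConstraints.{index}",
        "set axisName=latency",
        "set type=IntegerGreaterThanConstraint",
        f"set greaterThan={ms_to_ns(min_latency_ms)}",
        "back",
    ]


if __name__ == "__main__":
    print("\n".join(latency_constraint_commands(0, 100)))
```

Generating the lines this way keeps the threshold readable in milliseconds while the slice still receives the nanosecond value it expects.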

  3. After inspecting the two data streams, you might find that almost all slow operations are writes, so it could be valuable to determine which clients are requesting the slow writes, and how large each of the writes is.

    1. To collect this data, create a new NFS_OPS slice which collects the size and client axes.
    2. Add constraints so that the op axis only collects data for write operations, and the latency axis filters out operations taking less than 100ms.

      Because the constraint on the op axis dictates that it will always have the value write, it is not necessary to collect the op axis anymore.

      /analytics
      
      create
      set name=step3
      set statisticType=NFS_OPS
      set collectionInterval=10
      set collectionAxes=size,client
       
      edit axisConstraints.0
      set type=IntegerGreaterThanConstraint
      set axisName=latency
      set greaterThan=100000000
      back
       
      edit axisConstraints.1
      set type=StringEqualConstraint
      set axisName=op
      set equals=write
      back
       
      commit

      Reading the data proceeds in the same way as the first two steps, but there will be one stream for every NFS client. Each stream will be a time-series of histograms showing the sizes of the slow writes that occurred during each ten-second interval.

      Continuing to use this approach will allow you to narrow down the slow writes to a particular NFS client, and you may be able to tune that client in some way to speed it up.
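
To rank clients by the number of slow writes, the per-client streams can be totaled offline. The Python sketch below rests on several assumptions: each stream carries a client field (as described for streams earlier), each datapoint holds its histogram under a size key, and counts arrive as strings. The exact JSON layout for this slice is not shown in this topic, so check it against real output. The function name and the sample addresses and counts are invented for illustration.

```python
from collections import Counter


def slow_writes_per_client(datapoint_set):
    """Return (client, slow_write_count) pairs, busiest client first.

    Assumptions: streams expose a "client" field and datapoints expose a
    "size" histogram whose values are operation counts stored as strings.
    """
    totals = Counter()
    for stream in datapoint_set["datapointStreams"]:
        client = stream.get("client", "unknown")
        for point in stream["datapoints"]:
            histogram = point.get("size", {})
            totals[client] += sum(int(count) for count in histogram.values())
    return totals.most_common()


# Invented sample data; the addresses and counts are not from a real system.
SAMPLE_SET = {
    "datapointStreams": [
        {"client": "192.0.2.10",
         "datapoints": [{"size": {"8192": "3", "131072": "40"}}]},
        {"client": "192.0.2.11",
         "datapoints": [{"size": {"8192": "2"}}]},
    ]
}

if __name__ == "__main__":
    for client, count in slow_writes_per_client(SAMPLE_SET):
        print(client, count)
```

A client that dominates this ranking is a reasonable first candidate for tuning, and its size histogram shows whether the slow writes are mostly large or small.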