Overview

This topic describes the purpose and function of system faults.

System faults describe states and configurations that may negatively impact the functionality of the Delphix Engine and that can be resolved only through active user intervention. When you log in to the Delphix Admin application as delphix_admin, the number of outstanding system faults appears on the right-hand side of the navigation bar at the top of the screen. Faults serve as a record of all issues impacting the Delphix Engine and can never be deleted. However, ignored and resolved faults are not displayed in the faults list.

System Faults indicator in the navigation bar

Delphix Object-Based Environment Monitor Faults

Delphix now has a self-contained, Java-based discovery infrastructure that is consolidated with environment monitoring, communicates through a common framework, and can provide feedback.

Previously, the environment monitor created faults only for hosts and sources. Several faults more logically apply to other Delphix objects, such as repositories (database installations), and posting them against sources resulted in duplicate faults. The environment monitor now posts faults against the correct objects and re-associates existing faults with those objects. As a result, users see fewer faults, and those faults are easier to diagnose.
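The duplication problem can be illustrated with a small, hypothetical Python sketch: when several sources share one repository, a problem with that installation produces one fault per source if faults are posted against sources, but only one fault if it is posted against the repository. The `Source` class, the helper functions, and the installation paths are illustrative assumptions, not Delphix APIs or real configurations.

```python
from dataclasses import dataclass

# Hypothetical model: sources that may share a repository (a database
# installation). Names and paths are illustrative, not Delphix APIs.
@dataclass(frozen=True)
class Source:
    name: str
    repository: str

def faults_posted_against_sources(sources, broken_repository):
    """Old behavior: one fault per source that uses the broken repository."""
    return [("source", s.name) for s in sources if s.repository == broken_repository]

def faults_posted_against_repository(sources, broken_repository):
    """Object-based behavior: a single fault on the repository itself."""
    if any(s.repository == broken_repository for s in sources):
        return [("repository", broken_repository)]
    return []

sources = [
    Source("sales_db", "/u01/app/oracle/product/19c"),
    Source("hr_db", "/u01/app/oracle/product/19c"),
    Source("ops_db", "/u01/app/oracle/product/12c"),
]
print(len(faults_posted_against_sources(sources, "/u01/app/oracle/product/19c")))     # 2
print(len(faults_posted_against_repository(sources, "/u01/app/oracle/product/19c")))  # 1
```

The more sources share an installation, the larger the reduction, which is why posting against the correct object makes the faults list shorter and easier to read.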

Viewing Faults

To view the list of active system faults:
  1. In the top navigation bar, click Faults.
  2. Click any fault in the list to expand it and see its details.

Each fault comprises six parts:

  • Severity – How much of an impact the fault will have on the system. A fault can have a severity of either Warning or Critical.
    • Warning – The system can continue to operate despite the fault but may not perform optimally in all scenarios.
    • Critical – The fault breaks certain functionality and must be resolved before some or all functions of the Delphix Engine can be performed.
  • Date – The date that the Delphix Engine diagnosed the fault.
  • Target Object – The object against which the fault was posted. Faults are posted against hosts for incorrect environment configurations, against sources for problems with the database, and against repositories for issues with the database installation.
  • Title – A short descriptive summary of the fault.
  • Details – A detailed description of the cause of the fault.
  • User Action – The action you can take to resolve the fault.
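The six parts can be pictured as a simple record type. The following is a hypothetical Python model for illustration only; the class names, field names, and enum values are assumptions and do not describe the Delphix Engine's internal representation or API. The example instance reuses the fault from the Fault Lifecycle Example in this topic, but its severity and date are assumed, because the topic does not state them.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Severity(Enum):
    WARNING = "Warning"    # system continues, but may not perform optimally
    CRITICAL = "Critical"  # some functionality is broken until resolved

class TargetType(Enum):
    HOST = "host"              # incorrect environment configuration
    SOURCE = "source"          # problem with the database
    REPOSITORY = "repository"  # problem with the database installation

@dataclass(frozen=True)
class Fault:
    """Hypothetical model of the six parts of a fault."""
    severity: Severity
    date: datetime          # when the engine diagnosed the fault
    target_type: TargetType
    target_name: str        # together with target_type: the Target Object
    title: str
    details: str
    user_action: str

# Illustrative values only: severity and date are assumed for this sketch.
fault = Fault(
    severity=Severity.WARNING,
    date=datetime(2024, 1, 15, 9, 30),
    target_type=TargetType.HOST,
    target_name="frodo.dcenter.delphix.com",
    title="TCP slot table entries below recommended minimum",
    details="sunrpc.tcp_slot_table_entries is below the recommended minimum of 128.",
    user_action="Increase sunrpc.tcp_slot_table_entries to at least 128.",
)
print(f"[{fault.severity.value}] {fault.title} ({fault.target_name})")
```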


Parts of each system fault


Addressing Faults

After viewing a fault and deciding on the appropriate course of action, you can address the fault through the user interface (UI). You can mark a fault as Ignored or Resolved. If you have fixed the underlying cause of the fault, mark it as Resolved. Note that if the fault condition persists, it will be detected in the future and re-diagnosed. You can mark the fault as Ignored if it meets the following criteria:
  • The fault is caused by a well-understood issue that cannot be changed
  • Its impact to the Delphix Engine is well understood and acceptable

In this case, the fault will not be re-diagnosed even if the fault condition persists. You will receive no further notifications.

To address a fault, follow the steps below.

  1. In the top navigation bar, click Faults.
  2. In the list of faults, click a fault date/name to view the fault details.
  3. If the fault condition has been resolved, click Mark Resolved.
    Note that if the fault condition persists, it will be detected in the future and re-diagnosed.
  4. If the fault condition describes a configuration with well-understood impact to the Delphix Engine that cannot be changed, you can ignore the fault by clicking Ignore.
    Note that an ignored fault will not be diagnosed again even if the underlying condition persists.
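The lifecycle described above (faults are never deleted, resolved faults are re-diagnosed if the condition persists, ignored faults are not, and only outstanding faults are listed) can be sketched as a small state model. This is a hypothetical Python illustration, not the Delphix Engine's implementation. In particular, modeling a re-diagnosis as a new record appended to an append-only log, and keying faults by a condition string, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum

class FaultState(Enum):
    ACTIVE = "active"
    RESOLVED = "resolved"
    IGNORED = "ignored"

@dataclass
class FaultRecord:
    condition: str                     # hypothetical key for the fault condition
    state: FaultState = FaultState.ACTIVE

class FaultLog:
    """Hypothetical model of the documented lifecycle, not the Delphix API."""

    def __init__(self):
        self.records = []  # append-only: faults are never deleted

    def _latest(self, condition):
        for record in reversed(self.records):
            if record.condition == condition:
                return record
        return None

    def diagnose(self, condition):
        """Engine detected `condition`; return True if a new fault is posted."""
        latest = self._latest(condition)
        if latest is not None and latest.state is not FaultState.RESOLVED:
            # Already active, or ignored: no new fault and no new notification.
            return False
        # First occurrence, or a resolved condition that persists: re-diagnose.
        self.records.append(FaultRecord(condition))
        return True

    def mark_resolved(self, condition):
        self._latest(condition).state = FaultState.RESOLVED

    def ignore(self, condition):
        self._latest(condition).state = FaultState.IGNORED

    def visible(self):
        """Only active faults are shown in the faults list."""
        return [r.condition for r in self.records if r.state is FaultState.ACTIVE]

log = FaultLog()
log.diagnose("tcp_slot_table_low")      # True: new fault
log.mark_resolved("tcp_slot_table_low")
log.diagnose("tcp_slot_table_low")      # True: condition persisted, re-diagnosed
log.ignore("tcp_slot_table_low")
log.diagnose("tcp_slot_table_low")      # False: ignored faults are not re-diagnosed
print(log.visible(), len(log.records))  # [] 2
```

The key difference between the two actions is visible in `diagnose`: a Resolved fault allows a new diagnosis, while an Ignored fault suppresses it permanently.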

By default, when a critical or warning fault occurs, the Delphix Engine immediately sends an email to the delphix_admin. Make sure you have configured an SMTP server and defined an appropriate email address for delphix_admin. See Setting Up the Delphix Engine for more information.

By default, emails are also sent for critical or warning alerts (also known as events). You can modify this default behavior by changing the alert profile with the CLI. See the CLI Cookbook topic Creating Alert Profiles for more information.

Fault Lifecycle Example

Below is an image of the fault card for the fault "TCP slot table entries below recommended minimum."

The Details section of the fault explains that the sunrpc.tcp_slot_table_entries property on frodo.dcenter.delphix.com is set to a value below the recommended minimum of 128. The User Action section instructs you to increase the value of the sunrpc.tcp_slot_table_entries property to at least the recommended minimum. The process for adjusting this property differs between operating systems. To find instructions, you might search the web for "how to adjust sunrpc.tcp_slot_table_entries"; in this example, the second result links to a Delphix community forum post describing how to resolve the issue. After following the instructions that apply to your operating system, return to the Delphix UI and mark the fault Resolved.
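Before marking the fault Resolved, you may want to confirm the new value on the host. The following is a hypothetical Python sketch for Linux hosts only; it assumes the sunrpc kernel module is loaded, in which case the current value of sunrpc.tcp_slot_table_entries is exposed at /proc/sys/sunrpc/tcp_slot_table_entries. It is not how the Delphix Engine diagnoses the fault, and other operating systems expose this setting differently.

```python
from pathlib import Path

RECOMMENDED_MINIMUM = 128  # minimum quoted in the fault's Details section

# Linux-only assumption: with the sunrpc module loaded, the current value is
# exposed at this procfs path. Other operating systems use other mechanisms.
PROC_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")

def read_slot_table_entries(path=PROC_PATH):
    """Return the current value as an int, or None if the file does not exist."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None

def below_minimum(value, minimum=RECOMMENDED_MINIMUM):
    """True when a known value is below the recommended minimum."""
    return value is not None and value < minimum

if __name__ == "__main__":
    current = read_slot_table_entries()
    if current is None:
        print("sunrpc.tcp_slot_table_entries not found (not Linux, or sunrpc not loaded)")
    elif below_minimum(current):
        print(f"{current} is below the recommended minimum of {RECOMMENDED_MINIMUM}")
    else:
        print(f"{current} meets the recommended minimum of {RECOMMENDED_MINIMUM}")
```

If the check still reports a value below 128, the fault condition persists, and a fault marked Resolved would be diagnosed again.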