Author/s

Methodology

Tools

Tom et al. 2009 [21]

Based on statistical information retrieved from an instrumented version of the program under analysis.

Zoltar, a tool that applies a technique for localizing software faults.

Alessandra Gorla, 2009 [22]

Automatically locates the faults underlying the failures, derives assertions to effectively detect functional failures, and identifies sequences of actions alternative to the failing sequence to bring the system back to an acceptable behavior.

Techniques to build software systems that can automatically heal such failures.

Tammo Krueger, 2010 [23]

Intercepts requests and decides on a per-token basis whether a token requires automatic “healing”.

TokDoc (the token doctor), a protocol-aware reverse HTTP proxy with an intelligent mangling technique that, based on the decisions of previously trained anomaly detectors, replaces suspicious parts of requests with benign data the system has seen in the past.

Jens Ehlers, 2011 [24]

Time series analysis of operation response times, combined with architectural information about the diagnosed software system, is employed for anomaly localization. OCL-based monitoring rules, which draw on quality-of-service data such as response times, resource utilization, and anomaly scores, specify the adaptive monitoring coverage.

An approach for localizing performance anomalies in software systems employing self-adaptive monitoring, implemented as part of the Kieker monitoring and analysis framework.

Boris Koldehofe, 2013 [25]

Rollback-recovery that eliminates the need for persistent checkpoints while allowing recovery from multiple simultaneous operator failures.

Event processing model to determine savepoints, and algorithms for their coordination in a distributed operator network.

Thorat et al. 2015 [26]

Rapid recovery (RR) mechanism that performs immediate link recovery at the switch level without overburdening the controller.

Self-healing SDN framework that optimizes recovery by applying autonomic principles, and an analytical model for calculating the failure recovery time and the backup flow rules required for recovery.

Katti et al. 2015 [14]

Algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable.

Compares two novel failure detection and consensus algorithms.
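
To make the gossip-based entry in the last row concrete, the following is a minimal sketch of a generic gossip-style heartbeat failure detector. It is not the algorithm of Katti et al. [14], whose details are not given here. The class `GossipNode`, the function `gossip_round`, the timeout `t_fail`, and the round-based clock are all hypothetical names and simplifications introduced for illustration only.

```python
import random


class GossipNode:
    """One member of a gossip-style heartbeat failure detector (illustrative)."""

    def __init__(self, name, members, t_fail=5):
        self.name = name
        self.t_fail = t_fail  # hypothetical timeout, measured in rounds
        # member -> [highest heartbeat seen, local round when it last increased]
        self.table = {m: [0, 0] for m in members}

    def beat(self, now):
        """Increment this node's own heartbeat counter."""
        self.table[self.name][0] += 1
        self.table[self.name][1] = now

    def merge(self, remote_table, now):
        """Adopt any strictly newer heartbeat found in a peer's table."""
        for member, (counter, _) in remote_table.items():
            if counter > self.table[member][0]:
                self.table[member] = [counter, now]

    def suspected(self, now):
        """Members whose heartbeat has not advanced for more than t_fail rounds."""
        return {m for m, (_, seen) in self.table.items()
                if m != self.name and now - seen > self.t_fail}


def gossip_round(nodes, alive, now, rng):
    """Each live node beats, then pushes its table to one random peer."""
    for name in sorted(alive):
        node = nodes[name]
        node.beat(now)
        peer = rng.choice([p for p in nodes if p != name])
        if peer in alive:  # messages sent to crashed nodes are simply lost
            nodes[peer].merge(node.table, now)
```

Under these assumptions, no single node acts as a coordinator: every node spreads what it knows to a random peer each round, so the loss of any one node or message does not stop failure information from propagating. This is the property the table summarizes as inherently fault-tolerant and scalable.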