Many IT service providers have clients who use their services around the clock. When a service issue arises, identifying and resolving it quickly, in other words reducing the Mean Time to Repair (MTTR), is crucial, and reducing MTTR depends largely on how fast the problem can be located.
Easy service definition, fast problem detection, and the resulting reduction in MTTR are key promises of the Moein monitoring platform. The following explains how this feature is achieved.
Suppose you have a service consisting of a cluster of application servers, web servers, and several types of databases. The structure of this hypothetical service is shown below.
This structure is common in large and mid-sized organizations whose services handle a substantial number of transactions. Instead of an Apache web server, other popular products such as Nginx or IIS might be used; likewise, instead of a WebLogic application server, products such as Tomcat, WebSphere, or JBoss might be deployed. The same applies to the relational and non-relational databases. The critical aspect of this structure is the variety and number of tools that the service administrator must continuously monitor in operations. When an issue occurs, the administrator has to inspect each tool and evaluate its performance to find the root cause. This complexity grows significantly when the administrator is responsible for several systems.
The Moein development team, aware of these monitoring challenges, has introduced a new feature to the Moein platform called "System And Complex Objects". This feature lets administrators define systems or services. Correlations between objects are established with rules, and based on those rules the platform identifies the root cause of a problem and displays it in a user-friendly tree format.
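To make the idea of rule-based correlation more concrete, here is a minimal Python sketch of status propagation over a service tree. It is not Moein's implementation or API: the `Node` class, the `worst_of` and `cluster` rules, and the component names are hypothetical, chosen to mirror the example service described above. Leaf objects carry a measured status, composite objects derive their status from their children through a rule, and root-cause candidates are found by following unhealthy branches down to the failing leaves.

```python
# Hypothetical sketch of rule-based status propagation; NOT the Moein platform's API.
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List, Optional


class Status(IntEnum):
    """Ordered so that max() returns the worst status."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


# A rule maps the statuses of a node's children to the node's own status.
Rule = Callable[[List[Status]], Status]


def worst_of(children: List[Status]) -> Status:
    """The node is as bad as its worst child (no redundancy)."""
    return max(children, default=Status.OK)


def cluster(min_healthy: int) -> Rule:
    """Redundant cluster: DEGRADED if any member fails, DOWN if fewer than min_healthy are OK."""
    def rule(children: List[Status]) -> Status:
        healthy = sum(1 for s in children if s == Status.OK)
        if healthy == len(children):
            return Status.OK
        return Status.DEGRADED if healthy >= min_healthy else Status.DOWN
    return rule


@dataclass
class Node:
    name: str
    status: Status = Status.OK           # measured status, used only by leaf objects
    rule: Optional[Rule] = None          # None marks a leaf (a directly monitored object)
    children: List["Node"] = field(default_factory=list)

    def evaluate(self) -> Status:
        """Leaves report their measured status; composite nodes apply their rule."""
        if self.rule is None:
            return self.status
        return self.rule([c.evaluate() for c in self.children])


def root_causes(node: Node) -> List[str]:
    """Follow unhealthy branches down the tree and return the unhealthy leaves."""
    if node.evaluate() == Status.OK:
        return []
    if node.rule is None:
        return [node.name]
    causes: List[str] = []
    for child in node.children:
        causes.extend(root_causes(child))
    return causes


def render(node: Node, depth: int = 0) -> None:
    """Print the tree with each node's evaluated status, indented by depth."""
    print("  " * depth + f"{node.name}: {node.evaluate().name}")
    for child in node.children:
        render(child, depth + 1)


# Hypothetical service mirroring the web / application / database structure above.
service = Node("Service", rule=worst_of, children=[
    Node("Web tier", rule=cluster(min_healthy=1), children=[
        Node("apache-1"), Node("apache-2"),
    ]),
    Node("App tier", rule=cluster(min_healthy=2), children=[
        Node("weblogic-1"), Node("weblogic-2", status=Status.DOWN), Node("weblogic-3"),
    ]),
    Node("Databases", rule=worst_of, children=[
        Node("relational-db"), Node("nosql-db"),
    ]),
])

if __name__ == "__main__":
    render(service)
    print("Root cause candidates:", root_causes(service))
```

With one of three application servers down, the app tier is only degraded, and following the unhealthy branches from the service node leads straight to `weblogic-2` instead of requiring the administrator to check every tool by hand. Rules in a real monitoring platform would likely also account for thresholds, timing, and dependencies between tiers; those are deliberately left out of this sketch.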
The following figure shows a tree representation of a hypothetical service in the Moein platform. With this view, users can quickly locate the source of a problem when one occurs.