DB2 for z/OS is a relational database deployed on IBM mainframe hardware, used for storing and retrieving large volumes of data. IBM also offers this database under the same name for Windows and Linux operating systems (collectively referred to as LUW), which have been discussed in other sections. However, for environments with high transaction volumes, the use of mainframe hardware is recommended. Therefore, many leading global banks and government banks in Iran, due to their need for stability and high processing power, use mainframes and the DB2 for z/OS. Currently, DB2 is an integral part of the core banking systems of many government banks in Iran, and setting up and maintaining it comes with challenges such as monitoring tools and expertise.
This article addresses some of the important and influential metrics in monitoring the DB2 for z/OS. (The examination of the DB2 for LUW database is done in another article; click on this link to read it). Before delving into the subject, it should be noted that the DB2 database in the z/OS environment can be deployed with different configurations. The Distributed Data Facility (DDF) is one of the DB2 database services for z/OS, used for connecting applications with JDBC and ODBC drivers, and it is assumed that this service is in the started state for monitoring purposes.
For any database administrator, ensuring the availability of the database is of utmost importance. Database availability means that the database is started, the connection port is open, and applications can connect to the database and execute queries. When a problem arises, the first step administrators typically take is to test the port and check if the database port is accessible. However, this test does not necessarily guarantee the database's availability to applications, as the port might be open but requests or queries might not be executable. Therefore, database availability testing must go beyond port testing. Executing a simple query with minimal overhead can provide administrators with the assurance that the database is accessible from the perspective of the applications. Periodic query execution tests and alert notifications in case of failures can inform administrators of issues within the database. Failure to receive a response might indicate network connectivity problems, access issues, operating system problems, the database not being started, the relevant port not being open, or issues with command execution, all of which need prompt investigation.
In addition to availability metrics, the database's response time to a specific request over time is also crucial. Currently, these two metrics are periodically extracted from the DB2 database by the Moein monitoring platform.
One of the important parameters in connecting through DDF is the thread statistics. These statistics can be extracted in z/OS using the DISPLAY DDF DETAIL command. The default output of this command is shown below:
In this illustration, the items highlighted in red are key metrics. This diagram provides comprehensive information on thread statistics and connections, and currently, the Moein system captures and stores many of these parameters as metrics. This allows administrators to analyze the database status over time accurately.
If the STATUS=STOPDQ, it indicates that the DDF port is closed. In this state, no application can connect to the database through DDF. The normal status for connections is STATUS=STARTD. For more detailed information on various statuses, refer to the DB2 Messages book.
One of the most important parameters is ADBAT. According to the definition provided in this link, the ADBAT metric includes the number of active threads and disconnected threads. However, the value of the DT parameter is also critical in analyzing this metric. Threads are responsible for executing commands and operations requested by clients and applications. Therefore, there could be thousands of connections to the database, and the threads are tasked with servicing each connection.
The total number of allowable threads can be found in the MDBAT parameter, which can be configured in ZPARM. Hence, the maximum number of threads is equivalent to MDBAT, and ADBAT cannot exceed it. It is advisable to extract the normal ADBAT statistics during regular times and monitor them. A decrease below the normal value indicates a drop in application connections or a reduction in users, while an increase above normal indicates the database's failure to respond promptly to requests. Observing a decrease or increase in connection statistics relative to the normal value should prompt an investigation into the root cause, which could be external or internal database factors.
The real-time statistics of disconnected threads are provided in the DSCDBAT parameter. With this in mind, the difference between ADBAT and DSCDBAT becomes an important metric. Since the total number of active threads, both active and disconnected, is included in ADBAT, the higher this difference, the more it indicates that threads are not being released and are occupied. Therefore, if the ADBAT value approaches MDBAT (the thread limit) and the difference between ADBAT and DSCDBAT is also close to ADBAT (indicating a negligible number of DSCDBATs), it signifies that threads are not being released and are busy, necessitating an investigation into the reason for the thread occupation.
When the difference between ADBAT and DSCDBAT increases (assuming ADBAT is close to MDBAT), the database stops responding to new connections, effectively queuing requests in DDF. This queue size is reflected in the QUEDBAT parameter. However, QUEDBAT is cumulative and increases from the time the DDF module is activated.
Therefore, if a request queue forms in the database for any reason, it indicates that threads are busy. Currently, all the mentioned parameters are monitored by the Moein platform, allowing administrators to understand the performance of the DB2 database on z/OS based on the trends of these parameters. The platform also enables threshold settings for the specified metrics, sending notifications to users when thresholds are breached. It's important to note that since QUEDBAT is cumulative, the changes are calculated differentially in each period to make the change in queued requests more apparent to the system administrator.
It's worth noting that an increase in ADBAT alone does not necessarily indicate adverse conditions. If this increase is accompanied by a decrease in DSCDBAT count, a queue of requests may form in DDF after a while. Therefore, Moein has established a separate metric called "ADBAT and DSCDBAT Difference" to provide a basis for assessing the overall database status regarding thread conditions.
Finally, it should be noted that the DB2 database configuration might include multiple members. In this case, DDF information for all members is extracted, and the aforementioned metrics are calculated for each member and the entire database.
DB2 stores all events related to data storage in log files. The timely storage of information and offloading (Offload) and archiving of these files have a significant impact on DB2 performance. In summary, DB2 usually has three pairs of log files, where data is written and offloaded in a circular manner. After each file fills up, its data is stored in other files called archive files, and the data is cleared, preparing for the next cycle. Therefore, it is essential to continuously monitor these files and ensure they are emptied.
In some cases, DB2 may experience significant delays in writing log files or archiving them due to I/O problems, making log file monitoring critical. To check the log status, the DISPLAY LOG command is used. The result of executing this command contains the amount of free space and the names of the files being written to. In the Moein platform, the total number of active logs and the number of offloads not yet performed are extracted from the database. Therefore, if the number of unfinished offloads increases, it indicates that there is an issue with archiving the log files, and further investigation is required.
By executing the DISPLAY LOG command, the following information is retrieved:
One of the important aspects in DB2 is the status of tablespace space in databases. The DISPLAY DATABASE(dbname) SPACENAM(*) RESTRICT command is used to show the tablespace status, and if there is any unusual status, it helps resolve the issue by showing it. The ideal response is that there is no tablespace status in the response, indicating that all tablespace spaces in the database are normal.
From a monitoring perspective, it's important to note that tablespace space may experience issues preventing reading and writing. In such cases, programs will encounter problems, and users will receive error messages. In the Moein monitoring platform, the status of each database and its tablespace is monitored, ensuring that the administrator is informed of the normal status of these spaces.
The mentioned items are some of the key indicators affecting the performance of DB2 under Z, and monitoring them can provide insights into the overall health of the database, helping operators identify issues. Timely monitoring of these indicators is very important, especially since this database is used in large organizations with various clients and programs. Knowing when a problem occurs in the database can help identify the root cause.
It is recommended to monitor these database indicators alongside the health indicators of the z/OS operating system. Currently, the Moein platform also provides the ability to monitor this operating system via the standard RMF interface, and the details of these indicators can be found in this link. Therefore, by monitoring the operating system alongside the database health indicators, the correct performance of this crucial database can be ensured. If you need to know all the monitored indicators for the DB2 For z/OS database, you can refer to this link.