What is a Buffer Pool (Bufferpool or Cache)?
A buffer pool is a section of the system's main memory allocated by the database. A database allocates part of the memory for various sections to enhance performance. Here, we will focus on one aspect that is crucial for database performance. When a database receives a Select command, it first searches for the data in the buffer pool. If the desired data is present in the buffer pool, it is sent to the user. If the data is not in the buffer pool, a read request (I/O) is made from the disk, and the data is loaded into the buffer pool before being sent to the user. For write commands, the pages are also moved to the memory, and changes are made to the pages, but for performance enhancement, these changes are not necessarily written to the file on the disk immediately.
Advantages of Using a Buffer Pool
When a command is sent to the database, if the desired data is in the buffer pool, the user receives the data more quickly without waiting for a disk read (I/O). This effectively reduces response time and decreases the number of I/O operations. The more useful and frequently accessed data is stored in the buffer pool, the faster users receive their responses, thereby increasing the speed of storage and retrieval operations.
What is a Page?
In relational databases like SQL Server, MySQL, Oracle, DB2, etc., data is often stored in rows within pages. Data can also be stored in columns, which has its applications but is not the focus here. If the data is written in rows within a page, it is physically written to the disk in the same manner, and during data retrieval, instead of reading one or several rows, the pages containing the records are read and transferred to the buffer pool. Thus, even if only one field of a record or one row of a table is read, an entire page is read. Pages usually come in sizes of 4K, 8K, or 16K, and the choice of page size depends on the length and number of records, design, and the administrator's preference.
Terminology
- Buffer Pool Size: The size of the buffer pool is determined by the database administrator. It typically has minimum, maximum, and default values. The minimum value is the amount that, once allocated by the database, is not released. The maximum value is the amount to which the buffer pool can be allocated, and if not used, its space is freed and made available to the operating system.
- Dirty Pages, Updated Pages: Pages that are modified are termed dirty or updated pages. These pages are written to the disk at an appropriate time.
- Clean Pages, Available Pages: When the data of a changed page is written to the disk, they can be rewritten or deleted.
- Hit Ratio: The ratio of pages found in the buffer to the total pages read is called HR. This metric is used to evaluate the buffer pool. The closer the number is to 100, the more optimal the use of the buffer pool. It means that more data from tables and indexes were read from memory, putting less load on the disks and resulting in fewer I/O operations. A lower HR indicates a slower database, high response time for users, and poorly executed commands with voluminous results.
Hit Ratio=(total get pages−total pages read)/total get pages
- Offload, Page Write: Sometimes, the database decides to write the changed pages to the disk. Some databases have parameters such as the number of commits per minute, the fullness of the buffer pool, or specific schedules. When the time comes, or parameters reach their thresholds, the database writes the modified pages to the disk, making the pages ready for reuse (Rewrite).
- Page Stealing: When the buffer pool is full and new pages need to be read into the buffer pool, and if additional memory allocation is not possible, the database candidates pages for removal. Usually, the LRU algorithm (Least Recently Used) is employed, which uses pages that are used less, or FIFO (First In, First Out) uses pages that arrived first.
How is Buffer Pool Freed?
As mentioned in the "Page Stealing" section, the main memory is usually smaller compared to the size of tables and indexes. Therefore, allocating high capacity for buffer pools is not feasible. When the database needs free space in the buffer pool, it frees up pages, or even when the pages are not in use, it removes them. This way, the main memory is freed until it reaches the minimum size.
Setting the Buffer Pool Size
The initial sizing and determining the amount of buffer pool has a significant impact on the database performance. A small amount increases response time and the number of I/O operations. On the other hand, allocating too much memory for the buffer pool consumes main memory, causing other systems and programs to suffer from memory shortage. Setting the buffer pool size and allocating them to tables and indexes requires extensive experience. It is recommended to monitor and evaluate the buffer pools of a database continuously.
Recommendations for Developers and DBAs
- Avoid running queries that perform extensive data scans in OLTP systems: Extensive data scans cause useful pages to be cleared or evicted, and if needed again, they must be reloaded into memory.
- Design and build appropriate indexes: Prevent reading large tables. While adding indexes can be a writing overhead, appropriate indexes improve the performance of queries containing WHERE clauses.
- Cache repetitive and base data on the client side: Storing static data such as lists of cities, branches, etc., reduces network traffic and database load, allowing more buffer pool memory to be used for critical data.
- Monitor the buffer pool continuously: Avoid running queries that decrease buffer pool performance during peak times.
- Explain queries before moving them to the production environment: Optimize if extensive scans are observed.
- Avoid features in applications that require extensive traversal or changes to data whenever possible.
- Avoid accepting large-scale reporting and complex calculation queries in OLTP systems.
- If the database allows, fix base and static tables in the buffer pool to ensure their data is never evicted from the buffer pool.
- Thoroughly study the technical information, infrastructure, and recommendations for each database type you use.
Monitoring Buffer Pool
In this section, it is better to discuss some important metrics for monitoring, as defined by specific platforms. In SQL Server, we can monitor the following metrics:
- Buffer Hit Ratio
- Page Lookups Rate
- Page Read Rate
- Page Write Rate
- Database Pages
- Page Life Expectancy
In Oracle, important metrics for buffer pools are as follows:
- IO Read Rate
- IO Write Rate
- Blocks Read Rate
- Blocks Write Rate
- Number of Active Sessions
- Number of System Sessions
- Number of Inactive Sessions
- Buffer Hit Ratio
- Data Dictionary Hit Ratio
- Library Hit Ratio
- SGA Free Memory Space
- Buffer Cache Size
- Shared Pool Size
- PGA Freed Back Memory
- Total PGA Freeable Memory
- Maximum PGA Allocated Memory
- Current PGA Allocated Memory
- Total PGA Used Memory
- PGA Cache Hit Ratio
In PostgreSQL, the following metrics are important:
- Number of Buffers Written
- Buffer Clean
- Maximum Written Clean
- Buffer Written Directly
- Allocated Buffer