What does "Database High Availability" really mean?
For many of you, high availability is a key concern. Architects spend a lot of time designing and planning for high availability of applications and databases, because it is essential for business continuity: even a short downtime can lead to lost business.
If you Google high availability, you will find many definitions. One definition from Wikipedia is given below:
High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
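To make an "agreed level" of uptime concrete, the short sketch below converts an availability percentage into the downtime it permits per year. The targets shown (99%, 99.9%, 99.99%) are common illustrative figures, not values taken from any particular agreement, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget (in minutes) implied by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


# Illustrative targets only; a real target comes from the agreed service level.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why the agreed level drives the choice of architecture.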
Key Principles of High Availability
The following are the key principles of High Availability:
- Eliminate any single point of failure: Adding redundancy, so that the failure of any one part of the system does not lead to the collapse of the entire system.
- Reliable crossover: In a redundant system, the crossover point itself becomes a single point of failure. Fault-tolerant systems must provide a reliable crossover or automatic switchover mechanism to avoid failure.
- Detection of failures: Failures must be detected as they occur. If the two principles above are in place and proactively monitored, a user may never see a system failure.
EDB Postgres has building blocks for covering all of the above key principles.
- Elimination of single points of failure - Postgres supports the following types of physical standbys:
- Cold standby - A backup server that has backups and all necessary WAL files for recovery. By definition, this system is not up and running, but it can be made available if needed. Backup servers and WAL files are mainly used to create a new PostgreSQL node as part of disaster recovery.
- Warm Standby - In Warm Standby mode, Postgres runs in recovery mode and receives updates through archived log files or log shipping replication. In this mode, Postgres does not accept connections or queries.
- Hot Standby - In Hot Standby mode, Postgres runs in recovery mode and receives updates through archived log files or log shipping replication. Unlike a Warm Standby, it accepts connections and read-only queries while in recovery mode.
Any of the above can help in eliminating single points of failure. Which one to choose depends on the agreed level of performance and uptime. The most popular standby mode since Postgres 9.0 is Hot Standby.
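As a rough illustration of the difference between a warm and a hot standby, the sketch below builds the handful of PostgreSQL settings that distinguish them. `primary_conninfo`, `restore_command`, and `hot_standby` are real PostgreSQL parameters; the host name, replication user, and archive path are placeholders, and the helper functions are hypothetical. On PostgreSQL 12 and later, a `standby.signal` file in the data directory is also needed to start the server in standby mode.

```python
def standby_settings(mode: str, primary_host: str = "primary.example.internal") -> dict:
    """Return illustrative postgresql.conf settings for a warm or hot standby."""
    if mode not in ("warm", "hot"):
        # A cold standby is not a running server, so it has no live settings.
        raise ValueError("mode must be 'warm' or 'hot'")
    return {
        # Streaming replication source; host and user are placeholders.
        "primary_conninfo": f"host={primary_host} user=replicator",
        # Fetches archived WAL files; the archive path is a placeholder.
        "restore_command": "cp /path/to/wal_archive/%f %p",
        # 'on' permits read-only queries during recovery (hot standby);
        # 'off' keeps replaying WAL without accepting queries (warm standby).
        "hot_standby": "on" if mode == "hot" else "off",
    }


def render(settings: dict) -> str:
    """Format the settings as postgresql.conf lines."""
    return "\n".join(f"{name} = '{value}'" for name, value in settings.items())


print(render(standby_settings("hot")))
```

The only setting that changes between the two modes here is `hot_standby`; the replication source is the same in both cases.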
- Reliable crossover - For a reliable crossover, i.e., switching between the master and standby nodes, EDB provides a technology called EDB Postgres Failover Manager (EFM). This technology enables automatic failover from the Postgres master node to a standby node in case of a software or hardware failure on the master. EFM uses JGroups, which provides a reliable, distributed, and redundant infrastructure without a single point of failure.
- Detection of failures - EDB Postgres Failover Manager continuously monitors the server and detects failures. It also executes the failover from the master to one of the standbys so that the system remains available for accepting database connections and executing queries. Properly configured, EFM can detect a failure and execute a failover within a few seconds.
Combining all of the above can help in achieving High Availability of EDB Postgres within a data center or across data centers. If you are a cloud user, you can have High Availability within a region (across multiple zones) or across regions (using a backplane network supported by the cloud vendors).
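The detection and crossover principles can be sketched as a toy monitor loop. This is not how EFM is implemented; it is a minimal illustration that assumes a health check returning True or False, a hypothetical threshold of three consecutive missed checks, and a naive policy of promoting the first standby in the list.

```python
from typing import Callable, List, Optional


def monitor_and_failover(check_primary: Callable[[], bool],
                         standbys: List[str],
                         max_missed: int = 3,
                         max_checks: int = 100) -> Optional[str]:
    """Return the standby to promote once the primary misses
    max_missed consecutive health checks, or None if it stays healthy."""
    missed = 0
    for _ in range(max_checks):
        if check_primary():
            missed = 0  # a successful check resets the failure counter
        else:
            missed += 1
            if missed >= max_missed:
                # Naive policy (assumption): promote the first listed standby.
                return standbys[0] if standbys else None
    return None


# Simulated health-check results: one transient blip, then a real failure.
results = iter([True, False, True, False, False, False])
promoted = monitor_and_failover(lambda: next(results), ["standby1", "standby2"])
print(promoted)  # standby1
```

Requiring several consecutive misses keeps a single transient blip from triggering an unnecessary failover, at the cost of slightly slower detection.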
PostgreSQL Database Uptime and Availability
Uptime and availability are generally used synonymously. To achieve High Availability and maintain the agreed uptime, architects work to minimize outages and downtime.
Service outages come in two main flavors:
- Planned outages
- Unplanned outages
Some people refer to them as Scheduled and Unscheduled downtime.
- Planned outage/Scheduled downtime - A planned outage is the result of maintenance activities that disrupt system operation and usually cannot be avoided. Examples include patches to system software that require a reboot or a database restart. In general, a planned outage is the result of a logical, management-initiated event.
- Unplanned outage/Unscheduled downtime - An unplanned outage is the result of an unexpected failure or event, such as a hardware or software failure or an environmental anomaly. Examples include power outages, failed CPU or RAM components (or other hardware failures), network failures, security breaches, and failures in applications, middleware, or the operating system.
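To see how the two kinds of outage add up against an uptime target, the sketch below totals a small outage log and computes the availability achieved over a period. The outage durations are invented for illustration, and the `Outage` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outage:
    minutes: float
    planned: bool  # True for scheduled maintenance, False for a failure


def downtime_by_kind(outages: List[Outage]) -> Dict[str, float]:
    """Sum downtime minutes separately for planned and unplanned outages."""
    totals = {"planned": 0.0, "unplanned": 0.0}
    for outage in outages:
        totals["planned" if outage.planned else "unplanned"] += outage.minutes
    return totals


def availability_pct(outages: List[Outage], period_minutes: float) -> float:
    """Availability over the period, counting every outage as downtime."""
    down = sum(outage.minutes for outage in outages)
    return 100 * (1 - down / period_minutes)


# Hypothetical log for a 30-day month (43,200 minutes).
log = [Outage(minutes=30.0, planned=True), Outage(minutes=13.2, planned=False)]
print(downtime_by_kind(log))
print(f"{availability_pct(log, 30 * 24 * 60):.2f}%")
```

Whether planned maintenance counts against the agreed uptime depends on the agreement itself; this sketch simply counts both kinds.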
For both kinds of outage, EDB Postgres Failover Manager can help minimize the downtime. For a planned outage, a DBA can first patch all the standbys and then use EDB Postgres Failover Manager to perform a switchover before patching the master (primary) node.
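The ordering of a planned maintenance window can be sketched as a simple plan generator. The node names are hypothetical, choosing the first standby as the switchover target is an assumption made for illustration, and the steps are plain descriptions rather than EFM commands.

```python
from typing import List


def rolling_patch_plan(primary: str, standbys: List[str]) -> List[str]:
    """Return maintenance steps that patch every node with one switchover."""
    if not standbys:
        raise ValueError("a switchover needs at least one standby")
    # 1. Patch the standbys first; the primary keeps serving traffic.
    steps = [f"patch standby {name}" for name in standbys]
    # 2. Switch over to an already patched standby (first one, by assumption).
    new_primary = standbys[0]
    steps.append(f"switchover: promote {new_primary}, demote {primary}")
    # 3. The former primary is no longer serving writes, so patch it last.
    steps.append(f"patch former primary {primary}")
    return steps


for step in rolling_patch_plan("node1", ["node2", "node3"]):
    print(step)
```

With this ordering, the only interruption applications see is the switchover itself rather than the full duration of the patching.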
For an unplanned outage, EDB Postgres Failover Manager can detect the failure and perform a failover to the appropriate standby, making it the new master, which can then accept read/write connections and provide database services to the application. EDB Postgres Failover Manager also makes sure that the old master does not come back online as a master after the failover, which avoids a split-brain situation in which two nodes accept writes at the same time.
© EDB blog article by Vibhor Kumar, Vice President - Performance Engineering at EDB