This page addresses this topic with a high-level overview, covering “Ordinary” Single Instance Databases, Data Guard, Real Application Clusters (RAC) and Extended RAC (sometimes called “Stretched Cluster”). The combination of RAC and Data Guard is advertised by Oracle Corp. under the label Maximum Availability Architecture (MAA). In addition to these Oracle HA Solutions, I will also briefly cover one Third Party HA Solution: Remote Mirroring.
I don’t intend to dive deep into the technical details of all these
solutions; I just want to differentiate them and talk briefly
about the advantages and possible drawbacks of each of them.
First, we look at what is still the most common Oracle Database Architecture: Single Instance. An Oracle RDBMS always consists of one Database (made up of Datafiles, Online Redo Logfiles and Controlfile(s)) and at least one Instance (made up of Memory Structures, like the Database Buffer Cache, and Background Processes, like the Database Writer). If we have one Database and multiple Instances accessing it, that’s a RAC. If only one Instance accesses the Database, that’s Single Instance. Small Installations store all the components inside one Server like this:
Also common these days is the placement of the Database on a Storage Area Network (SAN) like this:
From an HA perspective, this Architecture is vulnerable: Server A and
Server B are Single Points of Failure (SPOFs), and so are Database A and
Database B. Also, the Site where these Servers are placed is a SPOF.
Should one of the SPOFs fail, the whole Database is unavailable. An “ordinary” RAC addresses the Server SPOFs like this:
Should one of the two Servers fail, Database C is still available.
HA is not the only reason to use RAC, of course. Among others, a
further valid reason to use RAC is Scalability: if our requirements
increase in the future, we can add another Server (Node) to the cluster.
We also have options like Service Management and Load Balancing with
RAC. In short: RAC is not just for HA, but it is beyond the scope
of this article to address the other reasons in detail. The drawback of
the above architecture from an HA perspective is that Database C, and likewise Site C, is a SPOF.
Should, for example, a fire destroy Site C, Database C is unavailable.
Therefore, we have the option to stretch the Database across two Sites,
which is usually called Extended RAC.
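The SPOF reasoning used so far can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not anything Oracle provides: each architecture is described as a list of access paths, and a component counts as a SPOF if its failure alone breaks every path. The component names for the RAC Servers and for the two Extended RAC Sites are hypothetical labels, and giving each Site exactly one Server and one storage mirror is a deliberate simplification.

```python
from itertools import chain

# Each architecture is a list of "access paths". The database stays reachable
# as long as every component on at least one path is still working.
# Names such as "Server C1" or "Site D1" are hypothetical labels for this sketch.
ARCHITECTURES = {
    "Single Instance": [
        {"Site A", "Server A", "Database A"},
    ],
    "RAC": [
        {"Site C", "Server C1", "Database C"},
        {"Site C", "Server C2", "Database C"},
    ],
    "Extended RAC": [
        {"Site D1", "Server D1", "Storage D1"},
        {"Site D2", "Server D2", "Storage D2"},
    ],
}


def is_available(paths, failed):
    """True if at least one access path contains no failed component."""
    return any(not (path & failed) for path in paths)


def single_points_of_failure(paths):
    """Components whose failure alone makes the database unavailable."""
    components = set(chain.from_iterable(paths))
    return {c for c in components if not is_available(paths, {c})}


for name, paths in ARCHITECTURES.items():
    print(name, "->", sorted(single_points_of_failure(paths)))
```

In this model, every component of the Single Instance setup is a SPOF, the RAC removes only the Servers from that list, and the stretched variant leaves no single component whose loss takes the Database down.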
With Extended RAC, the Sites are no longer SPOFs: Database D is mirrored across
the two Sites. The drawback of this architecture is the cost of the Network
Connection between the two Sites if long distances are desired. That is
critical, because a large Data Volume has to be mirrored. In effect, this
leads to distances that usually do not exceed a couple of kilometers,
which may conflict with the goal of Disaster Protection.
Here, Data Guard kicks in: we can reach long distances for Disaster Protection more easily with
Data Guard, because we do not transmit the whole Data Volume but
just the (relatively small) redo stream. In the following
picture, the Servers hold Single Instances like Server A and Server B
above:
The redo from the Primary
Database is used to keep the Standby Database up to date. Should the Primary
fail, we can fail over to the Standby and continue to work productively.
This failover can be done automatically by an Observer (which is called
Fast-Start Failover). The distance between the two Servers can reach
thousands of kilometers, depending on the kind of redo transmission and
the protection level.
If we combine RAC and Data Guard, we get MAA. Obviously, MAA is an expensive solution, but it also combines the advantages of RAC and Data Guard.
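To make the idea of shipping redo instead of the whole Data Volume a bit more tangible, here is a minimal Python sketch. It is a toy model and not how Data Guard is implemented: the class `ToyDatabase` and the functions `ship_redo` and `failover` are hypothetical names I made up for illustration, and the "redo" is just a list of key/value changes.

```python
class ToyDatabase:
    """A dict of rows plus a list of redo records. Purely illustrative."""

    def __init__(self, role):
        self.role = role      # "PRIMARY" or "STANDBY"
        self.rows = {}
        self.redo = []        # change records generated by local writes
        self.applied = 0      # number of shipped redo records applied so far

    def write(self, key, value):
        if self.role != "PRIMARY":
            raise PermissionError("only the primary accepts changes")
        self.rows[key] = value
        self.redo.append((key, value))  # one small record per change

    def apply_redo(self, records):
        # The standby replays changes; it never receives the full data set.
        for key, value in records:
            self.rows[key] = value
            self.applied += 1


def ship_redo(primary, standby):
    """Send only the redo records the standby has not applied yet."""
    pending = primary.redo[standby.applied:]
    standby.apply_redo(pending)
    return len(pending)


def failover(standby):
    """Role change that an observer would trigger once the primary is lost."""
    standby.role = "PRIMARY"
    return standby


primary = ToyDatabase("PRIMARY")
standby = ToyDatabase("STANDBY")
primary.write("order:1", "new")
primary.write("order:1", "paid")
shipped = ship_redo(primary, standby)

new_primary = failover(standby)       # pretend the old primary just failed
new_primary.write("order:2", "new")   # work continues on the former standby
print(shipped, new_primary.rows)
```

Any change made on the Primary after the last `ship_redo` call would be missing after the failover in this sketch, which is the kind of trade-off that the redo transmission mode and protection level decide in a real configuration.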
A popular Third Party HA Solution is Remote Mirroring. At a high level, it looks like this:
Here too, the Site is no longer a SPOF, as with
Extended RAC. Drawbacks of this solution: the distance is usually very
limited, for the same reason as with Extended RAC. Also, the Secondary
Site cannot be used productively while the mirroring is in progress,
contrary to the above Oracle HA solutions. With RAC, all Servers and
Sites are in productive use. Even with Data Guard, the Standby is not
merely waiting for the Primary to fail. It can also be used for Read
Only Access, effectively reducing the load on the Primary:
The above illustrates the 11g New Feature Real-Time Query for Physical Standby Databases. The Standby is accessed Read Only while the redo apply continues. Additionally, it is possible to offload Backups to the Physical Standby (this was possible before 11g as well).
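How Read Only Access on the Standby reduces the load on the Primary can be sketched as a simple routing rule. This is an assumption-laden toy, not an Oracle feature: the function `route` and the sample workload are hypothetical, and treating every statement that starts with SELECT as read only is a simplification (a SELECT ... FOR UPDATE, for instance, would still have to go to the Primary).

```python
from collections import Counter


def route(statement, standby_available=True):
    """Return "standby" for read-only statements, otherwise "primary"."""
    is_read_only = statement.lstrip().upper().startswith("SELECT")
    return "standby" if is_read_only and standby_available else "primary"


# Hypothetical mixed workload of reports and changes.
workload = [
    "SELECT * FROM orders",
    "INSERT INTO orders VALUES (1)",
    "select count(*) from orders",
    "UPDATE orders SET status = 'paid'",
]

load = Counter(route(statement) for statement in workload)
print(load["primary"], load["standby"])
```

If the Standby is not available, the same rule sends everything back to the Primary, so the offloading is an optimization rather than a dependency.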