Monday, Feb 1, 2016
For the last couple of years, we’ve been working hard on a next generation of SaaS/BaaS offerings. Amongst these products and services, Baasic deserves special attention - we hope to see it growing to serve millions of users. It has been built in a “cloud-aware” fashion, so it can be installed on Azure/Amazon/Rackspace/you-name-it infrastructure, but it can also run in private environments. Truth be told, while Baasic is in the public beta phase, it already runs in production in several heavily customized, smaller-scale environments.
“Small-scale” in this context still means that we need to support all the best practices related to scalability and high availability. The problem is that information on these topics is scattered and technology-dependent. In a nutshell, our end goal is to put together a highly available and scalable infrastructure, including:
- an ASP.NET application that scales out to multiple servers to accommodate ever-increasing user loads,
- a database solution that can scale to satisfy the requirements of the application,
- a scalable and highly available file store solution, since binary user files are kept outside the database.
As it turns out, it is entirely possible to put together this type of infrastructure, but it requires time and experience. We decided to choose tools and technologies outside the Microsoft stack where it makes sense, both in terms of technical feasibility and licensing costs. This required a lot of research, but yielded rather interesting results. Let us discuss some of the choices we’ve made before proceeding to the installation instructions. First, a few definitions:
- Scalability is the ability of an application to efficiently use additional resources in order to handle a growing amount of work.
- Scaling up (or vertical scaling) means buying a bigger server when resources become thin.
- Scaling out (or horizontal scaling) means expanding to multiple servers rather than a single, big server.
- High availability solutions seek to minimize system downtime and data loss in the event of a single or multiple failures.
- Failover is switching to a redundant or standby system component (server, hardware component, etc.) upon the failure or abnormal termination of the previously active system component.
With that in mind, let us explain some of the choices we’ve made, starting with the toughest one.
While we’ve used SQL Server in most of our previous implementations, our preliminary tests showed PostgreSQL to be a very capable relational database. After all, it has millions of implementations over its 30-year history. It is still being kept very relevant in the big data era by adding support for new data types, such as JSON - a feature that is essential for implementing dynamic types and similar functionality in Baasic and similar solutions. The capability to handle schema-less data side by side with traditional relational data is a big plus for us. However, like most traditional RDBMSes designed to power transactional workloads, it cannot scale out horizontally (at least not without specialized add-ons). When resources become thin, you buy a bigger server instead of spreading the load across multiple smaller machines. While the scale-out approach is becoming increasingly popular these days (mostly in various NoSQL incarnations), I would still argue that scaling up is a viable solution for a large class of problems. If you need to scale on a Google scale, scaling out is the right way to go. However, we are not Google, and neither are most web sites and applications.
To quote one of the posters at Stack Exchange, the “first rule of horizontal scaling of a database is to avoid it at all costs.”
Once we decided to scale vertically, we still had the rather difficult task of ensuring high availability for the database server. High availability strategies for PostgreSQL are discussed in depth in the documentation. You will quickly learn that setting up a highly available PostgreSQL cluster is not a task for the faint of heart. Here is a tutorial documenting one possible approach.
In our case, we needed full support for failover: in a scenario with two DB servers, if the primary server fails, the standby server should begin failover procedures. (On a side note, this is probably the right time to introduce the concepts of cold, warm and hot standby nodes.) Then, when the old primary server restarts, we must have a mechanism for informing it that it is no longer the primary. This is known as STONITH (Shoot The Other Node In The Head), and it is essential to avoid situations where both systems think they are the primary, which is a sure recipe for a split-brain scenario and, ultimately, data loss. On the other hand, attempts to use PostgreSQL in multi-master shared storage configurations result in extremely severe data corruption. To make things more interesting, PostgreSQL does not, out of the box, provide the tools required to identify a failure on the primary and notify the standby database server. We did some research and tested a lot of tools for solving this problem, and while some users are perfectly happy with their choices, we were somewhat reluctant. In addition, running the Windows version of PostgreSQL in a production environment is really not the best idea, and we already had a set of Windows Server 2012 R2 servers with the Hyper-V role as the software infrastructure for our virtualized environment. Our database servers run Linux, since multiple flavors of it are supported as guest operating systems in Hyper-V virtual machines.
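To make the fencing idea more concrete, here is a minimal toy model in Python. It is only a sketch of the ordering constraint (fence first, promote second), not how any real PostgreSQL tooling works; the `Node` class, the `fail_over` function and the node names `db1`/`db2` are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_primary: bool = False
    fenced: bool = False  # True once the node has been forcibly isolated

    def accept_write(self) -> bool:
        # A fenced node must never accept writes, even if it still
        # believes it is the primary after a restart.
        return self.is_primary and not self.fenced


def fail_over(old_primary: Node, standby: Node) -> None:
    """Promote the standby, but only after the old primary is fenced."""
    old_primary.fenced = True  # STONITH step: isolate the old primary first
    standby.is_primary = True  # only now is it safe to promote the standby


def writable_nodes(nodes):
    """Names of all nodes that would currently accept a write."""
    return [node.name for node in nodes if node.accept_write()]


primary = Node("db1", is_primary=True)
standby = Node("db2")

fail_over(primary, standby)

# db1 may restart still believing it is the primary (is_primary is True),
# but because it was fenced, only db2 accepts writes.
print(writable_nodes([primary, standby]))
```

Without the fencing step, both nodes would report themselves as writable after the old primary restarts, which is exactly the split-brain situation described above.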
So how do we implement a failover infrastructure for PostgreSQL in this environment? Again, the documentation on shared storage says that PostgreSQL (as opposed to various third-party forks) does not support multi-master or master-and-read-replica shared storage operation. With shared storage, it can only be used in cold standby failover configurations, and even then it is only safe with robust fencing/STONITH. One way to make a highly available hot/cold PostgreSQL pair is to use shared storage: a single, highly redundant disk array (usually an expensive SAN, but sometimes a good quality NFS server) that is connected to two servers. That is what we are going to do, but with a little help from Windows Failover Clustering features.
A failover cluster is a group of servers that work together to maintain high availability of applications and services (these are known as roles). If one of the servers (or nodes) fails, another node in the cluster can take over its workload with minimal downtime. In addition, the clustered roles are monitored to verify that they are working properly. If they are not working, they are restarted or moved to another node. Failover clusters also provide Cluster Shared Volume (CSV, more on that later) functionality, which gives clustered roles a consistent, distributed namespace for accessing shared storage from all nodes. Using this technology, a cluster can grow to as many as 64 physical nodes and 8,000 virtual machines.
Combined with Hyper-V, failover clustering allows us to host fault-tolerant virtual machines relatively easily. The aforementioned concept of roles comes into play here: in the context of failover clustering, a role can be thought of as a function that can be made highly available. This can include a DHCP server, WINS server, message queue, file server (we’ll actually use this one), and a few other functions, and most importantly, virtual machines. It can be tempting to think of virtual machines as something separate, but Failover Cluster Manager treats them as just another role. A lesson to take away here is that you must make individual virtual machines highly available by adding them as roles in the Cluster Manager, even if they are happily running on a Hyper-V server that participates as a node within a failover cluster.
Therefore, we will configure our database server as a virtual machine role within Failover Cluster Manager. This will in turn require that its disks reside on cluster storage: the storage location has to be accessible to all the nodes in the failover cluster. When the node on which our database server VM runs goes down, the VM will be automatically moved to a live node and restarted there. The downtime will be negligible in most cases. Even more importantly, at most one instance of the highly available VM will be running at any point in time, moving all multi-master scenarios and the associated horror stories out of the picture.
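The behaviour described above can be illustrated with a small simulation. This is a deliberately simplified sketch and has nothing to do with the actual Failover Clustering API: the `Cluster` class, the node names (`hv1`, `hv2`, `hv3`) and the role names are hypothetical, and the "pick the first surviving node" placement rule is an assumption made for brevity.

```python
class Cluster:
    """Toy model of a failover cluster that hosts roles on nodes."""

    def __init__(self, nodes):
        self.alive = {name: True for name in nodes}
        # role name -> the single node currently hosting it; because each
        # role maps to exactly one node, a role can never run twice.
        self.placement = {}

    def add_role(self, role, node):
        self.placement[role] = node

    def node_failed(self, failed):
        """Mark a node as down and restart its roles on a surviving node."""
        self.alive[failed] = False
        survivors = [name for name, up in self.alive.items() if up]
        if not survivors:
            raise RuntimeError("no live nodes left in the cluster")
        for role, node in list(self.placement.items()):
            if node == failed:
                # Assumption: simply pick the first live node.
                self.placement[role] = survivors[0]


cluster = Cluster(["hv1", "hv2", "hv3"])
cluster.add_role("postgres-vm", "hv1")
cluster.add_role("file-server", "hv2")

cluster.node_failed("hv1")
print(cluster.placement)
```

After `hv1` fails, the `postgres-vm` role ends up on a live node while `file-server` stays where it was, and each role still has exactly one host.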
We will also be using a File Server role that provides a central shared location where we can store and share files with users across the application. The only significant difference from a standard (non-clustered) scenario is that shares can be made continuously available: if the node on which the file server runs becomes unavailable, the role will be transparently moved to another node, and the applications that use it will be totally unaware of the move. All file share paths will remain unchanged.
Once the database and the file storage are highly available, all that remains is to do the same with the application servers. In our scenario, these are ordinary Windows Server VMs with ASP.NET applications running in Internet Information Services (IIS). This turns out to be the simplest task: we have a set of independent VMs running as “ordinary” machines in Hyper-V Manager - they are not configured as failover clustering roles. They all access the same highly available PostgreSQL database server, and use Redis as a cache data store. Redis Sentinel is used to provide high availability.
To round things off, a brief word on the rest of the network infrastructure (firewalls, routers and load balancers) in front of these servers: everything is set up in a redundant fashion. The firewalls/routers use the Virtual Router Redundancy Protocol (VRRP), which introduces the concept of a virtual router: an abstract representation of multiple routers, master and backup, acting as a group. When the active router fails, a backup router is automatically selected to replace it. In a similar fashion, we use multiple HAProxy load balancers, along with multiple DNS A records for our service, served in a round-robin fashion. There are multiple approaches to achieving full redundancy and avoiding a single point of failure in this type of environment: for example, Stack Overflow apparently uses keepalived to ensure high availability; heartbeat is another alternative. We opted for an approach that combines DNS load balancing - which achieves a fairly rough balance of traffic between multiple load balancers and enables failover when one of the load balancers dies - with the load balancers themselves doing their job on a more granular level. “DNS Load Balancing and Using Multiple Load Balancers in the Cloud” and “How To Configure DNS Round-Robin Load-Balancing For High-Availability” describe “our” approach in more detail.
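The DNS part of this setup can be sketched in a few lines of Python. This is a toy model rather than a real resolver: the `RoundRobinDns` class and the `connect` function are hypothetical, the addresses come from the documentation-only 192.0.2.0/24 range, and the sketch assumes that a client which cannot reach the first address simply tries the next one in the answer.

```python
from collections import deque


class RoundRobinDns:
    """Toy DNS server that rotates the order of A records on each query."""

    def __init__(self, a_records):
        self._records = deque(a_records)

    def resolve(self):
        answer = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return answer


def connect(dns, healthy):
    """Try the returned addresses in order; return the first healthy one."""
    for address in dns.resolve():
        if address in healthy:
            return address
    return None  # every load balancer is down


dns = RoundRobinDns(["192.0.2.10", "192.0.2.11"])
healthy = {"192.0.2.10", "192.0.2.11"}

# With both load balancers up, traffic alternates between them.
before = [connect(dns, healthy) for _ in range(4)]

# One load balancer dies; clients fall through to the remaining one.
healthy.discard("192.0.2.10")
after = [connect(dns, healthy) for _ in range(4)]

print(before)
print(after)
```

The balance achieved this way is rough, since real clients and resolvers cache answers, which is why the finer-grained distribution is left to the load balancers themselves.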
This was a fairly long introduction to the basic concepts of Windows Clustering with Hyper-V. Next time we’ll get our hands dirty and describe how to set up each of the system components. In the meantime, I can recommend a couple of excellent learning resources on this topic:
- Hyper-V failover clusters: hands-on tutorial on installing a Hyper-V failover cluster.
- Networking and Hyper-V: discusses important concepts such as VLANs, routing, link aggregation and teaming, load balancing, etc.
- Taking a Fresh Look at Hyper-V Clusters: a step-by-step tutorial that describes the installation process.
- High Availability is Now Easier than Ever – Improvements to Failover Clustering and Hyper-V in Windows Storage Server 2012 R2: what’s new in Windows Server 2012 R2.