Drupal 8: Load Balancing with HAProxy

As your website grows, there will come a point when more people are accessing your site than a single web server can handle. This is when load balancing becomes a critical part of your Drupal setup. Load balancing spreads the load across multiple web servers and keeps your application available if one of them goes down. In this tutorial, we are going to use HAProxy as a Layer 4 load balancer for our Drupal website. We will have a proxy server, two web servers, and one database server, all running on Ubuntu 14.04 LTS (Trusty Tahr) 32-bit.

[Figure: load balancing diagram]

The public will be able to access the proxy server. This server will then use an algorithm to forward each user to a web server, which will query the database server if needed and respond with the generated web page or requested content.

In a production environment, there would be separate physical servers for each proxy, web, and database. However, for the sake of simplicity and availability, I will be using virtual machines.

Setting Up Vagrant

Vagrant is an easy tool you can use to quickly “create and configure lightweight, reproducible, and portable development environments.” Along with this, you will need to install VirtualBox to emulate these virtual machines.

Go ahead and create a folder for our Vagrant environment. For example, I will create mine at: C:\Users\{user-name}\Documents\Vagrant\drupal. Note: this directory will not contain any Drupal files.

Open up command prompt and run this command to download the Ubuntu 14.04 32-bit box:

This will download the pre-configured box for VirtualBox, which will reduce the time required to create our virtual machines later.

In this directory, let’s initialize Vagrant by running vagrant init.

[Screenshot: output of vagrant init]

This will create a configuration file called “Vagrantfile” in that directory. This file will define basic settings for our virtual machines such as box, hostname, and public/private IP. Throughout this tutorial, we will keep adding configurations for the different servers we are going to set up.

Setting Up Web Server 1

Let’s get started with this load balancing setup, beginning with our two web servers. These web servers will be the actual machines generating the web pages, while the proxy server will just forward the requests.

To start, let’s edit our Vagrantfile to look like this:

We set the box option so that all our virtual machines run Ubuntu 14.04. We give this specific virtual machine a hostname of web1. Port forwarding allows a machine that is connected only to a private network, and not a public one, to be accessed from the host computer. For example, we forward port 8884 on the host to port 80 on the guest. As a result, if we visit 127.0.0.1:8884 on our host computer, the request will be forwarded to port 80 of that virtual machine. We have set the private IP to 192.168.50.4. This is the type of IP that web1 and all the other servers will use to communicate with each other. We have set the internal network name to “intnet”, which must stay the same across all servers.
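As a rough illustration, the settings described above could be written as the Vagrantfile below. This is a sketch rather than the exact file used here: the box name ubuntu/trusty32 and the output filename Vagrantfile.web1.example are assumptions, and the file is written under a separate name so that an existing Vagrantfile is not overwritten.

```shell
# Sketch only: writes an example Vagrantfile to the current directory for review.
# Assumed: the box name "ubuntu/trusty32" and the output filename.
cat > Vagrantfile.web1.example <<'EOF'
Vagrant.configure("2") do |config|
  # Use the same Ubuntu 14.04 32-bit box for every machine
  config.vm.box = "ubuntu/trusty32"

  config.vm.define "web1" do |web1|
    web1.vm.hostname = "web1"
    # Host port 8884 -> guest port 80
    web1.vm.network "forwarded_port", guest: 80, host: 8884
    # Private IP on the shared internal network "intnet"
    web1.vm.network "private_network", ip: "192.168.50.4",
      virtualbox__intnet: "intnet"
  end
end
EOF
```

Once the contents look right, they can be copied into the real Vagrantfile.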

Once you have saved this, go ahead and start up this server by running vagrant up web1.

This will import the Ubuntu box and configure the server according to our settings in the Vagrantfile. This will take longer the first time since it has to get everything set up.

[Screenshot: output of vagrant up for web1]

Once the virtual machine has booted up, you can SSH into it by running vagrant ssh web1.

[Screenshot: vagrant ssh session on web1]

Installing Dependencies

Then let’s update the package list and install the required software:

Before we start with downloading and running Drupal, we need to first configure PHP and Nginx.

Configuring PHP

Let’s start by editing the php5-fpm configuration:

In this file, find the option cgi.fix_pathinfo. By default, this value is set to 1. This is a security risk: when the requested path does not exist, PHP will try to execute the closest matching file instead, so a crafted URL can cause an unintended file to be run as PHP. Uncomment the line and change the value to 0:

Now you can save and exit out of this file since we are done with it.
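If you would rather script this change than edit by hand, a small helper along these lines could do it. This is only a sketch: the function name set_fix_pathinfo is made up for illustration, and the php.ini location mentioned in the comments is an assumption about where php5-fpm keeps its configuration.

```shell
# Sketch: set cgi.fix_pathinfo=0 in the given php.ini, whether or not the
# line is currently commented out. The function name is hypothetical.
set_fix_pathinfo() {
  ini="$1"
  sed -E 's/^;?cgi\.fix_pathinfo[[:space:]]*=.*/cgi.fix_pathinfo=0/' \
    "$ini" > "$ini.tmp" && mv "$ini.tmp" "$ini"
}

# Example usage on a working copy (the path is an assumption):
#   cp /etc/php5/fpm/php.ini ./php.ini
#   set_fix_pathinfo ./php.ini
#   sudo cp ./php.ini /etc/php5/fpm/php.ini
```

Working on a copy keeps the original file untouched until you have checked the result.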

Next, let’s edit pool.d/www.conf:

Here, if it is not already set, change the listen value to the php5-fpm Unix domain socket:

Save and exit. Now let’s restart the php5-fpm process for these changes to take effect:

Configuring Nginx

It is time to configure our web server, Nginx. To start, let’s make a copy of the default site config in the available sites directory and then edit it:

Now change the server block so that your drupal virtual host file looks like this (note: replace only the corresponding lines with the lines below; do not simply copy and paste the code):

This will listen for all requests on port 80. We will also comment out the IPv6 line. Change the root to where you will keep your Drupal files; I have set mine to /var/www/drupal. Uncomment the 404 and 50x error lines. Comment out the line under “# With php5-cgi alone”, since we have installed php5-fpm. Add the line shown above “# With php5-cgi alone” in the example so that Nginx immediately returns a 404 if an exact match is not found.
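To make the changes above concrete, here is a sketch of what the resulting server block might look like. It is not the exact file used here: the output filename, the index file list, and the error page names are assumptions based on the stock Nginx site config, and the socket path assumes php5-fpm’s usual location.

```shell
# Sketch only: writes an example server block to the current directory.
# Assumed: the output filename and the index/error page file names.
cat > drupal.nginx.example <<'EOF'
server {
    listen 80;
    #listen [::]:80 default_server ipv6only=on;

    root /var/www/drupal;
    index index.php index.html index.htm;

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;

    location ~ \.php$ {
        # Return a 404 right away if the requested script does not exist
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        # With php5-cgi alone:
        #fastcgi_pass 127.0.0.1:9000;
        # With php5-fpm:
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }
}
EOF
```

After merging these lines into the real site file, running sudo nginx -t checks the configuration before you restart the service.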

Once you have completed this, let’s remove the default virtual host from the enabled sites and create a symbolic link from the drupal config in the available sites to the enabled sites:

Now that we have configured everything, let’s restart the nginx service:

Preparing Drupal

Let’s go into our home directory and download the latest version of Drupal. I will be downloading 8.0.0-beta4. You can find the latest Drupal releases here.

Once this is finished downloading, we can extract the contents:

After the contents are extracted, we can create the directory we defined as root in nginx. Let’s create this folder and copy all the Drupal files into it:

Setting Permissions

Let’s set the owner of all these files to our user account, in this case “vagrant”, and change the group to “www-data”, the web server user:

Now let’s also add our user to the www-data group, and give that group permission to read and write to all files:

You should now be able to access the Drupal installation page by visiting 127.0.0.1:8884 which we set up in the port forwarding setting in the Vagrantfile. However, don’t install Drupal just yet! We still need to sync up both web servers.

 

Setting Up Web Server 2

There are two ways you can go about creating web server 2. You can either repeat the process of setting up web server 1, or you can create a clone of web server 1 and have Vagrant set up the replica. I will be going with the latter.

First, let’s create a replica box. To do this, halt your first web server by running vagrant halt web1.

Then, let’s package that virtual machine by running vagrant package web1.

This command will package everything in web1 into your Vagrant directory as package.box.

[Screenshot: output of vagrant package for web1]

Now, in your Vagrantfile, go ahead and add the configuration for web2:

You can give the box we just packaged a name like “drupal-nginx-php”. Set “config.vm.box_url” to the location of package.box. We will be giving this machine a hostname of web2, a forwarded HTTP port of 8886, and a private network IP of 192.168.50.5.
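For reference, the web2 settings just described might look like the sketch below. The text names config.vm.box_url; inside a per-machine define block the equivalent setting is written on the machine’s own variable, which is what this example assumes. The relative path to package.box and the output filename are also assumptions.

```shell
# Sketch only: the web2 definition on its own, to be merged into the
# existing Vagrantfile inside the Vagrant.configure block.
# Assumed: package.box sits in the same directory as the Vagrantfile.
cat > Vagrantfile.web2.example <<'EOF'
  config.vm.define "web2" do |web2|
    # Box built from web1 with "vagrant package"
    web2.vm.box = "drupal-nginx-php"
    web2.vm.box_url = "package.box"
    web2.vm.hostname = "web2"
    # Host port 8886 -> guest port 80
    web2.vm.network "forwarded_port", guest: 80, host: 8886
    web2.vm.network "private_network", ip: "192.168.50.5",
      virtualbox__intnet: "intnet"
  end
EOF
```

If Vagrant cannot find the box, try giving box_url an absolute path instead.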

Now, let’s start both of our servers again:

Synchronizing Drupal Between Web Servers

We need to synchronize the Drupal folder between both servers so that if an image or module is uploaded to one server, it is also available to users whose requests are handled by the other server.

There are many solutions for syncing files between two servers, such as GlusterFS and NFS. When I tried GlusterFS, the core directory in Drupal always responded with an Input/output error. As a result, I will be using Unison to sync files across web1 and web2. As an alternative, you can also use a script I wrote that utilizes inotifywait and scp to sync files.

The benefit of using inotifywait with scp is that it will only sync when a file is modified, created, deleted, or moved. However, this option is unidirectional, so you will need to run it on both servers. On the other hand, Unison is bidirectional and only needs to run on one server.

Syncing with Unison

Start by installing Unison on both machines:

Once it is installed, run the command that creates the folder “~/.unison”:

Now, on web1, let’s make a copy of the default profile and edit our drupal profile:

Make your drupal profile for Unison look like this:

Save and exit. You can find more settings in the Unison User Manual.

You only need one server running Unison since it syncs bidirectionally, so you only need to create this profile on one machine. The root settings in this file tell Unison which folders to sync. Since the second folder is on another server, we give it as an absolute path over SSH.
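A profile matching that description could look roughly like the sketch below. The two root lines follow the text above; the vagrant user, the use of web2’s private IP, and the auto and batch settings are assumptions added for illustration, as is the output filename.

```shell
# Sketch only: writes an example profile for review; the real one would
# live at ~/.unison/drupal.prf on web1.
# Assumed: the vagrant user, web2's private IP, and the auto/batch settings.
cat > drupal.prf.example <<'EOF'
# Local Drupal folder on web1
root = /var/www/drupal
# Same folder on web2, as an absolute path over SSH (note the double slash)
root = ssh://vagrant@192.168.50.5//var/www/drupal

# Accept non-conflicting changes without asking
auto = true
# Run without prompting, so it can be left unattended
batch = true
EOF
```

The double slash after the host is what makes the remote path absolute; a single slash would be read as relative to the remote user’s home directory.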

To sync both folders, you can run:

The first time you run this, you might get a prompt asking whether you want to add web2’s ECDSA key fingerprint to your known hosts. Type “yes”. On the first run, Unison will also create archives of all the files, which will take some time depending on the number of files you have. For Drupal 8, it should take a minute or less.

[Screenshot: first Unison sync of the drupal profile]

However, there is a problem. This will only sync once and exit. We want Unison to keep syncing the files. For this, we can use the -repeat flag.

There is still an issue if we just run this from our SSH client. First, once we start Unison with repeat, we cannot run any other command without stopping Unison. Second, we need to keep our terminal open at all times. If we close our client, a HUP (hangup) signal will be sent to the process, and Unison will be stopped.

The solution? We can use screen. We can start Unison in screen and detach from it. Even if we exit our client, we can go back into the screen and see the output from Unison.

If you do not already have screen installed:

Next you can start a new screen by typing:

It will look like your screen was cleared. Now let’s start Unison with repeat mode set to watch, so that it syncs whenever a file is updated:

[Screenshot: Unison running with repeat set to watch]

If repeat by watch does not work, you can set a value for how many seconds it should wait to sync again:

[Screenshot: Unison running with repeat set to 1]

The above will sync the folders every second. You should keep this value low while configuring Drupal. Once you have set up Drupal and don’t need constant syncing, you can increase the time between syncs, set up a cron job, or run the command manually whenever you update one server.

You can detach from the screen by pressing Ctrl+A, then D. This will keep the process running. If you ever want to reattach to the screen, get its ID from the output of screen -ls.

Then, use the first few characters of that ID with screen -r to reattach.

Note: If you only have one screen open, running screen -r alone will automatically reattach you to it.

Syncing with drupal_sync (inotifywait + scp)

If you want to use my Bash script you will first need to install inotify-tools:

Once you have done this, transfer this file over to your home (~) directory through SFTP or another method:

Edit the variables at the top of the script to your server settings.

Before we can run this script, we need to add permissions to execute this file:

Now transfer it over to your other server with scp:

Note: you will also need to edit the variables at the top of the script on the other server, web2, so that “$REMOTE_HOST” is set to the private IP of web1.

We can run the file now, but every time it will prompt you for the other server’s SSH password. To allow automatic access, we need to make a pair of SSH keys:

Do not give a name or passphrase; just press Enter.

Once you have generated a pair of RSA keys, we need to add the public key to the authorized keys of the other server:

Now, after you complete this command by typing in the password of the vagrant user on web2 once more, the next time you SSH to it you should be connected automatically:

If you are taken to a new SSH session, congratulations, it has worked. To exit back to your other server:

Now repeat this process of generating RSA keys from web2 to web1, so web2 can access web1 without being prompted for a password.

Once both of your servers can SSH to each other automatically, you can start the drupal sync script on both servers:

Setting Up The Database Server

Now that we have successfully set up and synced both of our web servers, web1 and web2, it is time to set up the database server that is going to be accessed by both.

Start by adding the virtual machine settings in Vagrantfile:

We will give this server a host name of db and a private IP of 192.168.50.6.

Let’s get the database server up and running:

Installing MySQL

Once the virtual machine has booted, let’s SSH into the server, update the package list, and install MySQL server:

While this is installing, it will ask you to set and confirm a password for the root user. Make it complex!

[Screenshot: MySQL root password prompt]

Once this has finished installing, we need to run the actual MySQL installation:

By default, many things will be included in this install for testing. To be more secure, run this command:

[Screenshot: MySQL secure installation]

Configuring MySQL

Right now, MySQL will only accept connections from the same machine, so we need to enable remote access to let it accept connections from our web servers.

Let’s edit the MySQL configuration file:

Under the [mysqld] section, change the value of bind-address from 127.0.0.1 (localhost) to the private IP of the db server, 192.168.50.6.
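The bind-address change can also be scripted. The helper below is a sketch under a couple of assumptions: the function name set_bind_address is invented for this example, and the config file path in the usage comment is where MySQL’s configuration usually lives on Ubuntu, which may differ on your system.

```shell
# Sketch: rewrite the bind-address line in a MySQL config file.
# The function name is hypothetical; $1 is the file, $2 the new address.
set_bind_address() {
  cnf="$1"
  ip="$2"
  sed -E "s/^(bind-address[[:space:]]*=[[:space:]]*).*/\\1$ip/" \
    "$cnf" > "$cnf.tmp" && mv "$cnf.tmp" "$cnf"
}

# Example usage on a working copy (the path is an assumption):
#   cp /etc/mysql/my.cnf ./my.cnf
#   set_bind_address ./my.cnf 192.168.50.6
#   sudo cp ./my.cnf /etc/mysql/my.cnf
```

Only the value after the equals sign is replaced, so the original spacing of the line is preserved.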

When you are finished, save and close the editor. Next, let’s restart the MySQL server for the changes to take effect:

Creating MySQL Database & Users

Now that we have MySQL listening to a private IP that our web servers can access, we need to create MySQL users for our web servers.

Log in to MySQL as the root user with the password we set earlier:

First, let’s create our drupal database:

Now let’s create users for both web servers:

The IP following the name of each user should be the private IP of the corresponding web server. I have set the password to “password” for both of my users; however, you should change it to something much more secure.

The last thing we need to do is grant privileges to these users at their respective IP address. For Drupal, you need to give these privileges to your users:

Once done, flush privileges and exit:
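Put together, the database steps above might look like the following sketch. The database name and the web server IPs come from this tutorial; the user name drupal, the output filename, and the exact privilege list (modeled on what Drupal’s installation documentation commonly recommends) are assumptions, and “password” is only a placeholder.

```shell
# Sketch only: writes the statements to a file for review.
# Assumed: the user name "drupal", the filename, and the privilege list.
cat > drupal-db.sql.example <<'EOF'
CREATE DATABASE drupal;

-- One user per web server, restricted to that server's private IP
CREATE USER 'drupal'@'192.168.50.4' IDENTIFIED BY 'password';
CREATE USER 'drupal'@'192.168.50.5' IDENTIFIED BY 'password';

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER,
      CREATE TEMPORARY TABLES
  ON drupal.* TO 'drupal'@'192.168.50.4';
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER,
      CREATE TEMPORARY TABLES
  ON drupal.* TO 'drupal'@'192.168.50.5';

FLUSH PRIVILEGES;
EOF
```

The same statements can be typed at the MySQL prompt one at a time, or the reviewed file can be fed to the mysql client as root.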

Our database server, db, is now configured and ready to be used by web1 and web2.

Setting Up The Proxy / Load Balancing Server

The only server left to set up is the load balancing server, which acts as a proxy to both web servers. It may be the last to be set up, but it is certainly the most critical part of this setup.

Let’s add the server config to our Vagrantfile:

This server will have a hostname of balancer. We have also declared a private IP, which must be different from the 192.168.50.6 already used by db (192.168.50.7, for example), and forwarded port 80 on the guest to port 9090 on our local machine. One important difference between this server and the others is that its configuration also connects it to the public network. This will automatically assign a public IP to the server. We can then access the server at that IP from our host machine.

Let’s start up the balancer server:

Since we have set this server to connect to the public network, Vagrant might ask you to choose a network adapter. Try the first one; if it doesn’t work, halt and bring up balancer again, then choose one of the other options. For me, my Wi-Fi adapter worked.

Finding the Public IP

Once the server has booted up, notice the order of your network adapters, it should look something like this:

Find where the “bridged” adapter is. Mine is set as bridged for Adapter 2.

Next, let’s SSH into the balancer server.

To get our IP configuration, let’s run ifconfig.

[Screenshot: ifconfig output on the balancer]

Since my second adapter was the bridged network, I will look at the second link: eth1 for my IP address. It will be printed after “inet addr”. My public IP for the balancer is 192.168.1.110. This will be the IP we will use to connect to the balancer/proxy in our browser.

Installing HAProxy

Now, let’s update the package list and install haproxy:

Configuring HAProxy

Let’s make it so that HAProxy will start on every boot of the server:

In this file, change the value of “ENABLED” to 1.

Save this file and exit.

Now, it’s time to set up the main HAProxy configuration:

We will leave the global settings as they are. Under “defaults”, change the mode from “http” to “tcp” and the log option from “httplog” to “tcplog”:

Replacing “http” with “tcp” tells HAProxy that we are going to be using Layer 4 load balancing.

Next, we need to define a frontend with the address to listen on and the backend to point to. The frontend is what the public can access (the balancer), and the backend consists of the servers that cannot be accessed directly, like web1 and web2. Add the www frontend:

You want to bind the public IP with port 80. Whenever someone sends a request to port 80 on that public IP, HAProxy will use the default backend, “drupal-backend”. Now, let’s define “drupal-backend”:

Here, we have defined that we want to use the round robin algorithm to choose a server for each request. There are many algorithms you can choose from. Here are a few:

  • roundrobin – “Each server is used in turns, according to their weights.”
  • leastconn – “The server with the lowest number of connections receives the connection.”
  • source – “The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This ensures that the same client IP address will always reach the same server as long as no server goes down or up.”
  • uri – “This algorithm hashes either the left part of the URI (before the question mark) or the whole URI (if the “whole” parameter is present) and divides the hash value by the total weight of the running servers. … This ensures that the same URI will always be directed to the same server as long as no server goes up or down.”

You can find more algorithms in the HAProxy Configuration Manual.

Again, setting the mode to “tcp” specifies that we want Layer 4 and not Layer 7 load balancing. We also define our two servers with the names “webserver1” and “webserver2”. The check keyword at the end of each server line enables health checks on that server, so requests are only sent to servers that are up. If one server fails, the other server will be sent all the requests.
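Pulling the pieces described above together, the changed parts of the HAProxy configuration might look like this sketch. The names www, drupal-backend, webserver1, and webserver2 and the IP addresses come from this tutorial; the output filename is an assumption, and only the lines discussed here are shown, so the global section and the remaining defaults (such as timeouts) are left out.

```shell
# Sketch only: writes the discussed sections to a file for review,
# rather than touching /etc/haproxy/haproxy.cfg directly.
cat > haproxy.cfg.example <<'EOF'
defaults
    log     global
    mode    tcp
    option  tcplog

frontend www
    # Public IP of the balancer found earlier
    bind 192.168.1.110:80
    default_backend drupal-backend

backend drupal-backend
    balance roundrobin
    mode tcp
    # "check" enables health checks on each server
    server webserver1 192.168.50.4:80 check
    server webserver2 192.168.50.5:80 check
EOF
```

After merging these sections into the real file, running haproxy -c -f /etc/haproxy/haproxy.cfg validates the configuration without starting the service.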

We are done configuring HAProxy, so let’s save and exit.

HAProxy Logging

If you want to enable logging, you need to edit /etc/rsyslog.conf:

Find and uncomment the first two lines, then add the third line:

Logging will now be enabled, and HAProxy logs can be viewed at: /var/log/haproxy.log once started.

When you are done editing this file, save and exit. Now, restart both rsyslog and haproxy services:

Installing Drupal

We are finally at the last step of getting Drupal working on a Load Balanced setup. It is time to install Drupal.

You should now be able to see the Drupal installation page when you visit the public IP of your load balancing server from your host machine.

[Screenshot: Drupal installation page]

Once you click “Save and continue” on the Profile selection page, settings.php and services.yml should appear in /var/www/drupal/sites/default/.

[Screenshot: permissions of the files in sites/default]

By default, Drupal will not be able to edit these files because its user, www-data, does not have permission to write to them. Since we set the group to www-data, we need to give the group write permission on these files:

Once done with the process, it is recommended you remove write permissions to keep your site secure:
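The two permission changes described above can be sketched as a pair of small helpers. The function names are hypothetical, and the directory is passed in as an argument so the same helpers can be tried on a test folder first.

```shell
# Sketch: toggle group write access on the two files Drupal's installer
# needs to edit. Function names are hypothetical; $1 is sites/default.
allow_installer_writes() {
  chmod g+w "$1/settings.php" "$1/services.yml"
}

lock_settings() {
  chmod g-w "$1/settings.php" "$1/services.yml"
}

# Example usage:
#   allow_installer_writes /var/www/drupal/sites/default
#   ...finish the Drupal installer in the browser...
#   lock_settings /var/www/drupal/sites/default
```

Keeping the two steps side by side makes it harder to forget the second one once the installer has finished.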

You can now continue with the installation. Once the database options come up, you need to specify the private IP, database, user, and password we made during the MySQL Database server setup.

[Screenshot: Drupal database configuration]

Again, follow the configuration of Drupal, and you should soon come to your new Drupal install on a Layer 4 Load Balanced Setup using HAProxy.

[Screenshot: Drupal home page]

Conclusion

There you go: we now have a fully functional Layer 4 load balancing setup for Drupal using four servers. As your traffic increases, this setup will reduce the load on each web server by spreading your users across them. It also provides failover. If one of your web servers goes down during the night, all your traffic will be sent to the remaining server, so your users won’t experience any downtime.

To check that load balancing is working with your algorithm of choice, you can view the logs we enabled during the HAProxy setup at /var/log/haproxy.log.

This blog post was made for Google Code-In 2014. This task has by far been the most challenging, and I have truly expanded my knowledge on Linux servers and Unix commands.

Troubleshooting

If you try running the command vagrant, and you get a response that says the command could not be found, try appending the directory of where vagrant.exe exists to the “PATH” variable in your System/User Environment Variables.

If you are having trouble running vagrant ssh, it is because there is no ssh command on Windows by default. Luckily, if you have Git, you can add {git-directory}/bin to your System/User Environment Variables to get access to many commands, such as ssh and ls, on Windows. If you do not have Git installed, you can use PuTTY to SSH to your server at 127.0.0.1:2200, or whichever port was forwarded for SSH.

If you get 50x errors while visiting your HTTP forwarded ports on web1 or web2, and have this in your nginx logs: “*1 connect() to unix:/var/run/php5-fpm.sock failed (13: Permission denied)”, you need to edit /etc/php5/fpm/pool.d/www.conf and uncomment these lines:
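The lines in question are most likely the socket ownership settings, listen.owner, listen.group, and listen.mode; that is an assumption based on the error message, since this is the usual cause of a permission error on the php5-fpm socket. Under that assumption, a helper like the one below (the function name is made up for this example) would uncomment them:

```shell
# Sketch: uncomment the socket ownership lines in a php5-fpm pool file.
# Assumed: the affected lines are listen.owner, listen.group, listen.mode.
enable_socket_perms() {
  conf="$1"
  sed -E 's/^;[[:space:]]*(listen\.(owner|group|mode)[[:space:]]*=)/\1/' \
    "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example usage on a working copy:
#   cp /etc/php5/fpm/pool.d/www.conf ./www.conf
#   enable_socket_perms ./www.conf
#   sudo cp ./www.conf /etc/php5/fpm/pool.d/www.conf
#   sudo service php5-fpm restart
```

Other commented-out listen settings are left alone, so only the three ownership lines change.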

If you get PHP fatal errors such as “Unexpected [, expecting )”, it is because your installed version of php5-fpm is older than PHP 5.4. This error means your PHP version does not support the short array syntax. The solution is to find a PPA to update PHP to 5.4+ or to upgrade to Ubuntu 14.04, which ships PHP 5.4+ with php5-fpm.


Author: Akshay Kalose

A teenager, who is interested in Computer Science, Information Technology, Programming, Web Designing, Engineering and Physical Sciences.
