<div dir="ltr">Hi Andy, <div><br></div><div>So much to write about this. Where to start?<div><br></div><div>Having scaled from one server to dozens, now running in two DCs with MySQL replicated across multiple sites, I have a lot of thoughts on, and experience with, this. What I would say in the first instance is: keep it simple. I wouldn't advise trying to do ALL THE THINGS and aiming for fully redundant, automated failover straight away. A lot of this is incredibly cost- and time-sensitive, so don't go there. At least not right away.</div>


<div><br></div><div>Is this a DR solution for catastrophic failure of your primary site (DC down, unrecoverable)? Or do you just want to improve redundancy in your stack so you can continue servicing requests if a single point of failure goes down?<br>


</div><div><br></div><div style>I'm presuming the second, so I won't go into the first (though that's an interesting thread in itself). In that case, there are three main considerations:</div><div style><br></div>


<div style>1. App server redundancy</div><div style>2. Job server redundancy</div><div style>3. Database redundancy</div><div style><br></div><div style>The first two are relatively simple, assuming your app is stateless (no sessions or uploads kept on a server's local disk). You can load balance between multiple app servers using something like nginx, hardware (pricey) or even <a href="http://dyn.com/dns/dynect-managed-dns/traffic-management-load-balancing-round-robin-cdn-manager/">DNS</a>. Similarly, if you're using Delayed Job for queuing, you can just spin up multiple job servers. This involves breaking out your app into a stack of these distinct components (which is what I did at FreeAgent back in the day, and it's still effectively the same now, just with more moving parts) and deploying to all of them as part of your Cap process. This is all reasonably straightforward. We actually used hardware load balancing originally (someone else's problem at the time: we threw money at it), but in time we moved to nginx to do it in software. Anyway, you get the idea.</div>
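<div style><br></div><div style>To give a feel for what the load balancer is doing, here's a toy round-robin picker in Ruby. It's only a sketch, not how nginx is implemented: the class and host names (app1, app2, app3) are made up, and a real balancer would take a host out of rotation after failed requests or health checks rather than being told explicitly. The point is that any healthy host can take the next request, which only works because the app is stateless.</div><pre>
```ruby
# Toy round-robin balancer: roughly what nginx's default upstream
# behaviour (or round-robin DNS) does across stateless app servers.
# Class and host names are hypothetical placeholders.
class RoundRobinBalancer
  def initialize(hosts)
    @hosts = hosts.dup
    @down = []
    @cursor = 0
  end

  # Take a host out of rotation, e.g. after a failed health check.
  def mark_down(host)
    @down.push(host) unless @down.include?(host)
  end

  # Put a recovered host back into rotation.
  def mark_up(host)
    @down.delete(host)
  end

  # Return the next healthy host, cycling through the list in order
  # and skipping any host currently marked down.
  def next_host
    @hosts.size.times do
      host = @hosts[@cursor % @hosts.size]
      @cursor += 1
      return host unless @down.include?(host)
    end
    raise "no healthy app servers"
  end
end

lb = RoundRobinBalancer.new(%w[app1 app2 app3])
picks = Array.new(4) { lb.next_host }        # => ["app1", "app2", "app3", "app1"]
lb.mark_down("app2")
picks_after = Array.new(3) { lb.next_host }  # => ["app3", "app1", "app3"]
```
</pre><div style>Because every host runs the same stateless code, losing one just means the others pick up its share of requests.</div>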


<div style><br></div><div style>The database is a whole different problem. I would suggest not considering auto-failover of this layer at all, at least not right now. Keep the DB as a single point of failure (in terms of app uptime), but make sure you have data redundancy. The best option here is maintaining a live replica (MySQL replication, ideally semi-synchronous) for (almost) zero data loss. A simpler option is full and incremental backups transferred offsite, but then you may lose data if the server fails catastrophically and is unrecoverable. Depending on your risk appetite, this may be something you could live with. </div>
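<div style><br></div><div style>To put rough numbers on that trade-off, here's a back-of-an-envelope Ruby sketch of the worst-case data loss for each option. The schedule (hourly incrementals, 10 minutes to get each one offsite) and the 1-second replication lag are made-up assumptions, so plug in your own.</div><pre>
```ruby
# Back-of-an-envelope worst-case data loss, in seconds, for the two options.
# All default numbers are illustrative assumptions, not measurements.
def worst_case_loss_seconds(strategy, incremental_interval: 3600, transfer_time: 600, replica_lag: 1)
  case strategy
  when :replica
    # An asynchronous replica is missing roughly its current replication lag.
    replica_lag
  when :offsite_backups
    # Worst case: the server dies just before the next incremental finishes
    # copying offsite, so everything since the previous one was taken is gone.
    # (Assumes transfer_time is shorter than incremental_interval.)
    incremental_interval + transfer_time
  else
    raise ArgumentError, "unknown strategy: #{strategy}"
  end
end

puts worst_case_loss_seconds(:replica)          # prints 1
puts worst_case_loss_seconds(:offsite_backups)  # prints 4200
```
</pre><div style>Seconds versus the best part of an hour is essentially the risk-appetite question.</div>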


<div style><br></div><div style>Once you have a replica (slave), you can build a process for switching the slave to master and vice versa. Keep this manual, at least for now. If your master goes down, you'll have a tried and tested failover process that may take 30 minutes at worst to get the service back up and running with (almost) no data loss. I think that's pretty robust for your needs right now. There are also questions about where the replica lives: if it's in the same DC and on the same network, your life is easier, though again that depends on your risk appetite.</div>
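<div style><br></div><div style>Here's a rough sketch of what that manual runbook might look like, as a Ruby dry run that just prints the steps in order rather than touching any server. The host names (db1, db2) and the final app-side step are hypothetical, and every statement should be checked against your MySQL version before you rely on it.</div><pre>
```ruby
# Dry-run checklist for a *manual* MySQL master/slave switch.
# Nothing here connects to a server; it only builds and prints the steps.
# Host names and the app-side step are hypothetical placeholders.
Step = Struct.new(:host, :action)

def failover_steps(old_master:, new_master:, old_master_reachable: true)
  steps = []
  if old_master_reachable
    # Stop new writes landing on the old master before switching.
    steps.push(Step.new(old_master, "SET GLOBAL read_only = ON;"))
  end
  # Let the slave apply what it has received: check Seconds_Behind_Master
  # is 0 and Exec_Master_Log_Pos has caught up with Read_Master_Log_Pos.
  # If the old master is dead, anything it never shipped is lost.
  steps.push(Step.new(new_master, "SHOW SLAVE STATUS;"))
  steps.push(Step.new(new_master, "STOP SLAVE;"))
  # Drop the old replication settings (RESET SLAVE ALL needs MySQL 5.5.16+).
  steps.push(Step.new(new_master, "RESET SLAVE ALL;"))
  # Accept writes on the promoted server.
  steps.push(Step.new(new_master, "SET GLOBAL read_only = OFF;"))
  # Hypothetical app-side step: repoint the app at the new master.
  steps.push(Step.new("app servers", "point database.yml at #{new_master} and restart"))
  steps
end

failover_steps(old_master: "db1", new_master: "db2").each_with_index do |step, i|
  puts "#{i + 1}. [#{step.host}] #{step.action}"
end
```
</pre><div style>Switching back is the same process in reverse, once the old master has been rebuilt as a slave of the new one. Rehearse it on a spare pair before you need it for real.</div>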


<div style><br></div><div style>I hope this rambling helps at some level.</div><div style><br></div>

<div><br></div><div style>Cheers,</div><div style>Olly</div><div><br></div><div style><span style="font-size:13px;font-family:Helvetica">--</span><br><span style="font-size:13px;font-family:Helvetica"><b>Olly Headey :: Co-Founder and CTO</b></span><br>


<span style="font-size:13px;font-family:Helvetica">FreeAgent</span><br><span style="font-size:13px;font-family:Helvetica"><a href="http://www.freeagent.com">www.freeagent.com</a></span><br><span style="font-size:13px;font-family:Helvetica"></span><br>


<span style="font-size:13px;font-family:Helvetica">Follow <a href="http://twitter.com/freeagent">@freeagent</a> on Twitter</span><br><br></div><div style><br></div><div><br></div><div><br></div></div></div><div class="gmail_extra">


<br><br><div class="gmail_quote">On Tue, Apr 16, 2013 at 11:50 AM, Andrew Stewart <span dir="ltr"><<a href="mailto:boss@airbladesoftware.com" target="_blank">boss@airbladesoftware.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Good afternoon El Rug,<br>

<br>

What's the best way to increase from one server to two?<br>

<br>

Currently I have everything for my webapp – code, database, background jobs, etc – on one server.  Performance is fine but it's a single point of failure (see this morning's email thread).  Off the top of my head I'm thinking:<br>


<br>

- Use a different host in a different city from my current server.<br>

- Install same operating system as current server and set up identically via Chef/whatever.<br>

- Deploy all code changes to both servers with Capistrano but have second server serving Rails maintenance page (just in case anybody finds it).<br>

- Ideally set up live (mysql) replication...somehow.<br>

- If/when first server croaks, manually fail over to second server via changing DNS.<br>

<br>

I'm sure it's more complicated than that, particularly the switching from one server to the other (and back).  Does anybody have any tips?<br>

<br>

Thanks again,<br>

<br>

Andy Stewart<br>

_______________________________________________<br>

Chat mailing list<br>

<a href="mailto:Chat@lists.lrug.org">Chat@lists.lrug.org</a><br>

<a href="http://lists.lrug.org/listinfo.cgi/chat-lrug.org" target="_blank">http://lists.lrug.org/listinfo.cgi/chat-lrug.org</a><br>

</blockquote></div><br></div>