[LRUG] Puma, CLOSE_WAIT. Arg.

Simon Morley simon at polkaspots.com
Fri Feb 19 02:52:38 PST 2016


And there's this one, which I think the thoughtbot one is based on.

http://www.rrn.dk/running-ruby-process-callstack/

I've tried using GDB but the servers in question are docker containers and
seem to be missing something. I'm just left with the error:

warning: Unable to find libthread_db matching inferior's thread library,
thread debugging will not be available.

Which I couldn't solve with Googling. I tried adding many of the suggested fixes to
.gdbinit without success and put it down to Docker. I had more success with a
non-Docker VM but did not 100% understand what was happening. If you do a
talk, I'll be there for sure. Plus, the online material is really lacking.
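
For containers where gdb won't attach cleanly, an in-process alternative (not
the gdb / rb_backtrace approach from the linked posts) is to have the process
dump every thread's Ruby backtrace when it receives a signal. A minimal sketch,
assuming the handler can be installed at boot. The names
format_thread_backtraces and install_thread_dump_handler are made up for
illustration, and the signal has to be one the server doesn't already use:

```ruby
# Sketch: collect the Ruby-level backtrace of every live thread as one string.
# `format_thread_backtraces` is a hypothetical helper name.
def format_thread_backtraces
  Thread.list.map do |thread|
    header = "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # backtrace is nil for a thread that has already finished
    frames = (thread.backtrace || ["(no backtrace)"]).map { |line| "    #{line}" }
    ([header] + frames).join("\n")
  end.join("\n\n")
end

# Assumption: `signal` is not already handled by the app server.
# Trap handlers can't take locks, so the dump runs in a fresh thread.
def install_thread_dump_handler(signal)
  Signal.trap(signal) do
    Thread.new { $stderr.puts(format_thread_backtraces) }
  end
end

# Example call (signal name is a placeholder, pick a free one):
# install_thread_dump_handler("USR1")
```

This only shows Ruby frames, so a thread stuck inside a C extension will just
show the Ruby call that entered it, which may still be enough to point at the
guilty gem.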

The changes we made yesterday (adjusting OS limits and adding the Puma config)
sadly didn't work. It's as if the reconnect isn't happening when the servers
have been idle for a while...
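
One way to picture the idle problem: a connection that has sat unused past the
server's timeout is dead on the server side, but the client only finds out on
the next query. A minimal sketch of check-before-use in plain Ruby with a fake
connection. Every name here (FakeConnection, RevalidatingHandle, alive?) is
hypothetical and is not the mysql2 or ActiveRecord API:

```ruby
# Stand-in for a DB connection that the server drops after `idle_timeout`
# seconds without traffic. `clock` is any callable returning the current time.
class FakeConnection
  def initialize(idle_timeout:, clock:)
    @idle_timeout = idle_timeout
    @clock = clock
    @last_used = clock.call
  end

  def alive?
    @clock.call - @last_used <= @idle_timeout
  end

  def query(sql)
    raise IOError, "server has gone away" unless alive?
    @last_used = @clock.call
    "ok: #{sql}"
  end
end

# Wraps a connection and replaces it when it has gone stale, before querying.
class RevalidatingHandle
  attr_reader :reconnects

  def initialize(&factory)
    @factory = factory
    @conn = factory.call
    @reconnects = 0
  end

  def query(sql)
    unless @conn.alive?
      @conn = @factory.call   # throw the dead connection away
      @reconnects += 1
    end
    @conn.query(sql)
  end
end

now = 0
clock = -> { now }
handle = RevalidatingHandle.new { FakeConnection.new(idle_timeout: 5, clock: clock) }
handle.query("SELECT 1")   # fresh connection, used directly
now = 60                   # simulate a long idle period
handle.query("SELECT 1")   # stale connection is replaced before the query
```

If something like this check is skipped or never triggers after a long idle
period, the first request after the quiet spell would hit the dead connection,
which would fit what we're seeing.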

If there's anyone using multiple DBs with Rails, what versions of the
mysql2 gem are you using? I'm down to my last ideas now.
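
On the multiple-DB point, the concern raised earlier in the thread is that
anything opened before Puma forks is shared by every worker. A small plain-Ruby
sketch of that effect, using a pipe in place of a MySQL socket (an assumption,
so that it runs without a database; shared_fd_demo is a made-up name):

```ruby
# Sketch: a descriptor opened before fork is shared by parent and children.
def shared_fd_demo(workers: 2)
  reader, writer = IO.pipe            # opened once, before any fork
  pids = Array.new(workers) do |i|
    fork do
      reader.close
      # Every child writes to the very same underlying pipe.
      writer.write("worker #{i} pid=#{Process.pid}\n")
      writer.close
      exit!(0)                        # skip at_exit hooks inherited from the parent
    end
  end
  writer.close                        # parent keeps only the read end
  pids.each { |pid| Process.wait(pid) }
  lines = reader.read.lines.map(&:chomp)
  reader.close
  lines                               # one line per worker, order not guaranteed
end

puts shared_fd_demo(workers: 2)
```

With a real database socket, that kind of interleaving from several processes
is what would scramble the protocol, so presumably each pool (including the one
behind the radius classes) needs its own disconnect / reconnect around the fork.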

S


On 19 February 2016 at 10:21, Ben Lovell <benjamin.lovell at gmail.com> wrote:

>
> On Feb 19 2016, at 9:20 am, Jon Wood <jon at ninjagiraffes.co.uk> wrote:
>>
>> Is there some documentation on using rb_backtrace and gdb for this? We
>> occasionally see similar issues and it would be great to know how to
>> properly debug them rather than the gut feel approach we've been using so
>> far.
>>
>
> Some quick googling threw up both [0] and [1] which at first glance seem
> like good introductory texts. Perhaps I should give a talk on these
> things...
>
> [0]
> https://blog.newrelic.com/2013/04/29/debugging-stuck-ruby-processes-what-to-do-before-you-kill-9/
> [1]
> https://robots.thoughtbot.com/using-gdb-to-inspect-a-running-ruby-process
>
>
>> On Fri, 19 Feb 2016 08:52 Ben Lovell <benjamin.lovell at gmail.com> wrote:
>>
>> On 19 Feb 2016, at 08:33, Riccardo Tacconi <rtacconi at gmail.com> wrote:
>>
>> Yes there are, you move from blocking to non-blocking, but with JRuby you
>> have parallel processing, so why stick with MRI?
>>
>>
>> I'm one of JRuby's greatest fans and would always recommend it, but
>> you're saying this like it's a trivial change. It isn't. This is also
>> probably *the* worst time to be chucking semi-random tech into a stack for
>> some kind of trial-and-error approach.
>>
>> I'd recommend you attach gdb to one of the stuck processes and
>> rb_backtrace() your way out. Once attached you can (depending on the state
>> of the stuck process) also execute arbitrary Ruby code to help with your
>> investigations by using ruby_eval(...).
>>
>> There should be plenty written about this, I'd recommend some googling.
>> Otherwise get in touch, I'll happily help you out (for a fee)
>>
>> Good luck,
>> Ben
>>
>>
>> On 18 February 2016 at 23:36, Glenn @ Ruby Pond Ltd <glenn at rubypond.com>
>> wrote:
>>
>> There's still a lot of benefit to using Puma, even if you're on MRI.
>> Admittedly not as much benefit as using it with JRuby.
>>
>> This comment from when Heroku recommended customers switch to Puma goes
>> through a basic example:
>> https://www.reddit.com/r/ruby/comments/2vjoxe/puma_is_now_the_recommended_ruby_webserver_on/coiypgp
>>
>> On 18 February 2016 at 23:43, Riccardo Tacconi <rtacconi at gmail.com>
>> wrote:
>>
>> Ruby MRI? If yes, what's the point of using Puma? With MRI you have one
>> worker and one thread, which is very inefficient. Would it be possible to
>> split HTTP request handling from querying the DB? From Puma you could send
>> requests to a topic (MOM), and multiple workers could process requests, with
>> each worker having its own DB connection. This could work with MRI, although
>> you will need more RAM. However, I would try Rubinius or JRuby first.
>>
>> Sorry if I misunderstood, I did not follow the whole thread.
>>
>>
>> On 18 February 2016 at 12:28, Simon Morley <simon at polkaspots.com> wrote:
>>
>> Ruby 2.2.2
>> Rails 4.2.5.1
>> mysql2 0.4.2 (tried a few)
>> Puma 2.16.0
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 18 February 2016 at 12:24, Riccardo Tacconi <rtacconi at gmail.com>
>> wrote:
>>
>> Which version of Ruby are you using?
>>
>> On 18 February 2016 at 12:17, Simon Morley <simon at polkaspots.com> wrote:
>>
>> Actually puma docs suggest doing that when using preload_app and
>> ActiveRecord...
>>
>> https://github.com/puma/puma#clustered-mode
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 18 February 2016 at 12:05, Frederick Cheung <
>> frederick.cheung at gmail.com> wrote:
>>
>>
>>
>>
>> On 18 February 2016 at 11:17:34, Simon Morley (simon at polkaspots.com)
>> wrote:
>>
>>
>> class RadiusDatabase < ActiveRecord::Base
>>   self.abstract_class = true
>>   establish_connection "radius_#{Rails.env}".to_sym
>> end
>>
>> class Radacct < RadiusDatabase
>> end
>>
>> Then I decreased our database pool from 20 to 5 and added a wait_timeout
>> of 5 (since there seem to be some discrepancies with this). Things got
>> much better (but weren't fixed).
>>
>> I tried querying differently, including using
>> connection_pool.with_connection. I've tried closing the connections
>> manually and also used ActiveRecord::Base.clear_active_connections!
>> periodically. No joy.
>>
>> By this point, we were running 2-4 instances - handling very
>> little traffic in total (about 50rpm). Every few hours, they'd block, all
>> of them. At the same time, we'd see a load of rack timeouts - same DB. I've
>> checked the connections - they were each opening only a few to MySQL and
>> MySQL was looking good.
>>
>> One day, by chance, I reduced the 4 instances to 1. *And the problem is
>> solved!!! WHAT*? Obviously the problem isn't solved, we can only use a
>> single server.
>>
>>
>> Are you using Puma in the mode where it forks workers? If so, then you
>> want to reconnect post fork or multiple processes will share the same file
>> descriptor and really weird shit will happen.
>>
>> The puma readme advises to do this:
>>
>> before_fork do
>>   ActiveRecord::Base.connection_pool.disconnect!
>> end
>>
>> I don't know off the top of my head whether that will do the job for
>> classes that have established a connection to a different db - presumably
>> they have a separate connection pool.
>>
>> Fred
>>
>> I don't know what's going on here. Have I been staring at this for too
>> long (yes)?
>>
>> Our other servers are chugging along happily now, using a connection pool
>> of 20, no errors, no timeouts (different db though).
>>
>> Has anyone got any suggestions / seen this? Is there something
>> fundamentally wrong with the way we're establishing a connection to the
>> external dbs? Surely this is MySQL related
>>
>> Thanks for listening,
>>
>> S
>>
>>
>> Simon Morley
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 15 January 2016 at 13:58, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> The understanding of difficult problems/bugs and the learning that comes
>> with it cannot be rushed. Each and every one of us has his / her own pace,
>> and all "speeds" are perfectly fine. The only question that really matters
>> is whether it's worth it (a.k.a. the cost of lost opportunity). If the
>> answer is yes, plough on. If not, look for alternatives.
>>
>> Not everyone likes or wants to run their own infrastructure. The monthly
>> savings on the PaaS, IaaS advertised costs are undisputed, but few like to
>> think - never mind talk - about how many hours / days / weeks have been
>> spent debugging obscure problems which "solve themselves" on a managed
>> environment. Don't get me started on those that are building their own
>> Docker-based PaaS-es without even realising it...
>>
>> As a side-note, I've been dealing with a similar TCP-related problem for
>> a while now, so I could empathise with your struggles the second I saw
>> your post. One of us is bound to solve it first, and I hope it will be you
>> ; )
>>
>> Have a good one, Gerhard.
>>
>> On Fri, Jan 15, 2016 at 10:01 AM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> You must be more patient than I am. It's been a long month - having said
>> that, I'm excited to find the cause.
>>
>> I misunderstood you re. file descriptors. We checked the kernel limits /
>> files open on the systems before and during and there's nothing untoward.
>>
>> Since writing in, it's not happened as before - no doubt it'll take place
>> during our forthcoming office move today.
>>
>> I ran a strace (thanks for that suggestion John) on a couple of processes
>> yesterday and saw redis blocking. Restarted a few redis servers to see if
>> that helped. Can't be certain yet.
>>
>> As soon as it's on, I'll run a tcpdump. How I'd not thought about that I
>> don't know...
>>
>> Actually, this is one thing I dislike about Rails - it's so nice and easy
>> to do everything, one forgets we're dealing with the real servers /
>> components / connections. It's too abstract in ways, but that's a whole
>> other debate :)
>>
>> S
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> simon at PolkaSpots.com <simon at polkaspots.com>
>> Linkedin: I'm on it again and it still sucks
>> 020 7183 1471
>>
>> 🚀💥
>>
>> On 15 January 2016 at 06:53, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> File descriptors, for traditional reasons, include TCP connections.
>>
>> Are you logging all requests to a central location? When the problem
>> occurs, it might help taking a closer look at the type of requests you're
>> receiving.
>>
>> Depending on how long the mischief lasts, a tcpdump to pcap, then
>> Wireshark might help. Same for an strace on the Puma processes, similar to
>> what John suggested. Those are low-level tools though: verbose, complex
>> and complete. It's easy to get lost unless you know what you're looking for.
>>
>> In summary, CLOSE_WAITs piling up from haproxy (client role) to Puma
>> (server role) indicate the app is not closing connections in time (or maybe
>> ever) - why? It's a fun one to troubleshoot ; )
>>
>> On Thu, Jan 14, 2016 at 11:35 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> Right now, none of the servers have any issues. No close_waits.
>>
>> All is well. Seemingly.
>>
>> When it occurs ALL the servers end up going. Sometimes real fast. That's
>> why I thought we had a db bottleneck. It happens pretty quickly, randomly,
>> no particular times.
>>
>> We don't ever really get spikes of traffic, there's an even load inbound
>> throughout.
>>
>> I thought we had someone running a slow loris style attack on us. So I
>> added some rules to HA Proxy and Cloudflare ain't seen nofin honest guv.
>>
>> Will find a way to chart it and send a link over.
>>
>> Will see if we're not closing any files - not much of that going on.
>> There's some manual gzipping happening - we've had that in place for over a
>> year though - not sure why it'd start playing up now. Memory usage is high
>> but consistent and doesn't increase.
>>
>> S
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> simon at PolkaSpots.com <simon at polkaspots.com>
>> Linkedin: I'm on it again and it still sucks
>> 020 7183 1471
>>
>> 🚀💥
>>
>> On 14 January 2016 at 22:14, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> That sounds like a file descriptor leak. Are the CLOSE_WAITs growing over
>> time?
>>
>> You're right, New Relic is too high level, this is a layer 4-5 issue.
>>
>> The simplest thing that can plot some graphs will work. Throw the
>> dirtiest script together that curls the data out if it comes easy, it
>> doesn't matter how you get those metrics as long as you have them.
>>
>> This is a great blog post opportunity ; )
>>
>> On Thu, Jan 14, 2016 at 8:40 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> I would ordinarily agree with you about the connection however they hang
>> around for hours sometimes.
>>
>> The 500 in the haproxy config was actually left over from a previous
>> experiment. Realistically I know they won't cope with that.
>>
>> Using another server was to find any issues with puma. I'm still going to
>> try unicorn just in case.
>>
>> Will up the numbers too - thanks for that suggestion.
>>
>> I'll look at a better monitoring tool too. So far new relic hasn't helped
>> much.
>>
>> Thanks
>>
>> S
>>
>> Simon Morley
>> Big Chief | PolkaSpots Supafly Wi-Fi
>>
>>
>>
>> I'm doing it with Cucumber Tony. Are you?
>>
>> On 14 Jan 2016, at 20:30, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> Hi Simon,
>>
>> CLOSE_WAIT suggests that Puma is not closing connections fast enough. The
>> client has asked for the connection to be closed, but Puma is busy.
>>
>> Quickest win would be to increase your Puma instances. Unicorn won't help
>> - or any other Rack web server for that matter.
>>
>> Based on your numbers, start with 10 Puma instances. Anything more than
>> 100 connections for a Rails instance is not realistic. I would personally
>> go with 50, just to be safe. I think I saw 500 conns in your haproxy
>> config, which is way too optimistic.
>>
>> You want metrics for detailed CPU usage by process, connections open with
>> state by process, and memory usage, by process. Without these, you're
>> flying blind. Any suggestions anyone makes without real metrics - including
>> myself - are just guesses. You'll get there, but you're making it far too
>> difficult for yourself.
>>
>> Let me know how it goes, Gerhard.
>>
>> On Thu, Jan 14, 2016 at 3:16 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> Hello All
>>
>> We've been battling with Puma for a long while now, I'm looking for some
>> help / love / attention / advice / anything to prevent further hair loss.
>>
>> We're using it in a reasonably typical Rails 4 application behind Nginx.
>>
>> Over the last 3 months, our requests have gone from 500 rpm to a little
>> over 1000 depending on the hour. Over this period, we've been seeing weird
>> CLOSE_WAIT conns appearing in netstat, which eventually kill the servers.
>>
>> We have 3 Rails servers behind Haproxy running things. Load is generally
>> even.
>>
>> Running netstat on the servers shows a pile of connections in the
>> CLOSE_WAIT state with varying recv-q values as so:
>>
>> tcp     2784      0 localhost:58786         localhost:5100          CLOSE_WAIT
>> tcp      717      0 localhost:35794         localhost:5100          CLOSE_WAIT
>> tcp      784      0 localhost:55712         localhost:5100          CLOSE_WAIT
>> tcp        0      0 localhost:38639         localhost:5100          CLOSE_WAIT
>>
>> That's just a snippet. A wc reveals over 400 of these on each server.
>>
>> Puma is running on port 5100 btw. We've tried puma with multiple threads
>> and a single one - same result. Latest version as of today.
>>
>> I've checked haproxy and don't see much lingering around.
>>
>> Only a kill -9 can stop Puma - otherwise, it says something like 'waiting
>> for requests to finish'
>>
>> I ran GDB to see if I could debug the process however I can't claim I
>> knew what I was looking at. The things that stood out were
>> EventMachine and Mongo.
>>
>> We then ditched EM (we were using the AMQP gem) in favour of Bunny. That
>> made zero difference.
>>
>> So we upgraded Mongo and Mongoid to the latest versions, neither of which
>> helped.
>>
>> I thought we might have a bottleneck somewhere - Mongo, ES or MySQL. But,
>> none of those services seem to have any issues / latencies.
>>
>> It's also 100% random. Might happen 10 times in an hour, then not at all
>> for a week.
>>
>> The puma issues on github don't shed much light.
>>
>> I don't really know where to turn at the moment or what to do next? I was
>> going to resort back to Unicorn but I don't think the issue is that side
>> and I wanted to fix the problem, not just patch it up.
>>
>> It's starting to look like a nasty in my code somewhere but I don't want
>> to go down that route just yet...
>>
>> Sorry for the long email, thanks in advance. Stuff.
>>
>> I hope someone can help!
>>
>> S
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> simon at PolkaSpots.com <simon at polkaspots.com>
>> Linkedin: I'm on it again and it still sucks
>> 020 7183 1471
>>
>> 🚀💥
>>
>> _______________________________________________
>> Chat mailing list
>> Chat at lists.lrug.org
>> Archives: http://lists.lrug.org/pipermail/chat-lrug.org
>> Manage your subscription: http://lists.lrug.org/options.cgi/chat-lrug.org
>> List info: http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>>
>> --
>> Riccardo Tacconi
>>
>> http://github.com/rtacconi
>> http://twitter.com/rtacconi



Simon Morley

Big Chief | PolkaSpots Supafly Wi-Fi
Bigger Chief | Cucumber Tony

simon at PolkaSpots.com
Linkedin: I'm on it again and it still sucks
020 7183 1471

🚀💥

On 19 February 2016 at 10:21, Ben Lovell <benjamin.lovell at gmail.com> wrote:

>
> On Feb 19 2016, at 9:20 am, Jon Wood <jon at ninjagiraffes.co.uk> wrote:
>>
>> Is there some documentation on using rb_backtrace and gdb for this? We
>> occasionally see similar issues and it would be great to know how to
>> properly debug them rather than the gut feel approach we've been using so
>> far.
>>
>
> Some quick googling threw up both [0] and [1] which at first glance seem
> like good introductory texts. Perhaps I should give a talk on these
> things...
>
> [0]
> https://blog.newrelic.com/2013/04/29/debugging-stuck-ruby-processes-what-to-do-before-you-kill-9/
> [1]
> https://robots.thoughtbot.com/using-gdb-to-inspect-a-running-ruby-process
>
>
>> On Fri, 19 Feb 2016 08:52 Ben Lovell <benjamin.lovell at gmail.com> wrote:
>>
>>
>>
>>
>>
>> Sent from my iPhone
>> On 19 Feb 2016, at 08:33, Riccardo Tacconi <rtacconi at gmail.com> wrote:
>>
>> Yes there are, you move from blocking to non-blocking, but with JRuby you
>> have parallel processing, so why sticking with MRI?
>>
>>
>> I'm one of JRuby's greatest fans and would always recommend it, but
>> you're saying this like it's a trivial change. It isn't. This is also
>> probably *the* worst time to be chucking semi-random tech into a stack for
>> some kind of trial-and-error approach.
>>
>> I'd recommend you attach a gdb to one of the stuck processes and
>> rb_backtrace() your way out. Once attached you can (depending on the state
>> of the stuck process) also execute arbitrary Ruby code to help with you
>> investigations by using ruby_eval(...).
>>
>> There should be plenty written about this, I'd recommend some googling.
>> Otherwise get in touch, I'll happily help you out (for a fee)
>>
>> Good luck,
>> Ben
>>
>>
>> On 18 February 2016 at 23:36, Glenn @ Ruby Pond Ltd <glenn at rubypond.com>
>> wrote:
>>
>> There's still a lot of benefit to using Puma, even if you're on MRI.
>> Admittedly not as much benefit as using it with JRuby.
>>
>> This comment from when Heroku recommended customer switch to Puma goes
>> through a basic example:
>> https://www.reddit.com/r/ruby/comments/2vjoxe/puma_is_now_the_recommended_ruby_webserver_on/coiypgp
>>
>> On 18 February 2016 at 23:43, Riccardo Tacconi <rtacconi at gmail.com>
>> wrote:
>>
>> Ruby MRI? If yes what's the point of using Puma? With MRI you have one
>> worker and one thread, which is very inefficient. Would be possible to
>> split HTTP requests handling from querying the DB? From Puma you could send
>> requests to a topic (MOM), and multiple workers could process requests and
>> each worker will have a DB connection. This could work with MRI, although
>> you will need more RAM. However I would try rubinius of jruby first.
>>
>> Sorry if I misunderstood, I did not follow the whole thread.
>>
>>
>> On 18 February 2016 at 12:28, Simon Morley <simon at polkaspots.com> wrote:
>>
>> Ruby 2.2.2
>> Rails 4.2.5.1
>> mysql2 0.4.2 (tried a few)
>> Puma 2.16.0
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 18 February 2016 at 12:24, Riccardo Tacconi <rtacconi at gmail.com>
>> wrote:
>>
>> Which version of Ruby are you using?
>>
>> On 18 February 2016 at 12:17, Simon Morley <simon at polkaspots.com> wrote:
>>
>> Actually puma docs suggest doing that when using preload_app and
>> ActiveRecord...
>>
>> https://github.com/puma/puma#clustered-mode
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 18 February 2016 at 12:05, Frederick Cheung <
>> frederick.cheung at gmail.com> wrote:
>>
>>
>>
>>
>> On 18 February 2016 at 11:17:34, Simon Morley (simon at polkaspots.com)
>> wrote:
>>
>>
>> class RadiusDatabase
>>   self.abstract_class = true
>>   establish_connection "radius_#{Rails.env}".to_sym
>> end
>>
>> class Radacct < RadiusDatabase
>> end
>>
>> Then I decreased our database pool from 20 to 5 and added a wait_timeout
>> of 5 (since there seems to be some discrepancies with this). Things got
>> much better (but weren't fixed).
>>
>> I tried querying differently, including using
>> connection_pool.with_connection. I've tried closing the connections
>> manually and also used ActiveRecord::Base.clear_active_connections!
>> periodically. No joy.
>>
>> By this point, we were running 2-4 instances - handling around very
>> little traffic in total (about 50rpm). Every few hours, they'd block, all
>> of them. At the same time, we'd see a load of rack timeouts - same DB. I've
>> checked the connections - they were each opening only a few to MySQL and
>> MySQL was looking good.
>>
>> One day, by chance, I reduced the 4 instances to 1. *And the problem is
>> solved!!! WHAT*? Obviously the problem isn't solved, we can only use a
>> single server.
>>
>>
>> Are you using puma in the mode where it forks workers? if so, then you
>> want to reconnect post fork or multiple processes will share the same file
>> descriptor and really weird shit will happen.
>>
>> The puma readme advises to do this:
>>
>> before_fork do
>>   ActiveRecord::Base.connection_pool.disconnect!
>> end
>>
>> I don't know off the top of my head whether that  will do the job for
>> classes that have established a connection to a different db - presumably
>> they have a separate connection pool
>>
>> Fred
>>
>> I don't know what's going on here. Have I been staring at this for too
>> long (yes)?
>>
>> Our other servers are chugging along happily now, using a connection pool
>> of 20, no errors, no timeouts (different db though).
>>
>> Has anyone got any suggestions / seen this? Is there something
>> fundamentally wrong with the way we're establishing a connection to the
>> external dbs? Surely this is MySQL related
>>
>> Thanks for listening,
>>
>> S
>>
>>
>> Simon Morley
>>
>> Got an unlicensed Meraki? Set it free with Cucumber
>> cucumberwifi.io/meraki
>>
>>
>> On 15 January 2016 at 13:58, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> The understanding of difficult problems/bugs and the learning that comes
>> with it cannot be rushed. Each and every one of us has his / her own pace,
>> and all "speeds" are perfectly fine. The only question that really matters
>> is whether it's worth it (a.k.a. the cost of lost opportunity). If the
>> answer is yes, plough on. If not, look for alternatives.
>>
>> Not everyone likes or wants to run their own infrastructure. The monthly
>> savings on the PaaS, IaaS advertised costs are undisputed, but few like to
>> think - never mind talk - about how many hours / days / weeks have been
>> spent debugging obscure problems which "solve themselves" on a managed
>> environment. Don't get me started on those that are building their own
>> Docker-based PaaS-es without even realising it...
>>
>> As a side-note, I've been dealing with a similar TCP-related problem for
>> a while now, so I could empathise with your struggles the second I've seen
>> your post. One of us is bound to solve it first, and I hope it will be you
>> ; )
>>
>> Have a good one, Gerhard.
>>
>> On Fri, Jan 15, 2016 at 10:01 AM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> You must be more patient that I am. It's been a long month - having said
>> that, I'm excited to find the cause.
>>
>> I misunderstood you re. file descriptors. We checked the kernel limits /
>> files open on the systems before and during and there's nothing untoward.
>>
>> Since writing in, it's not happened as before - no doubt it'll take place
>> during our forthcoming office move today.
>>
>> I ran a strace (thanks for that suggestion John) on a couple of processes
>> yesterday and saw redis blocking. Restarted a few redis servers to see if
>> that helped. Can't be certain yet.
>>
>> As soon as it's on, I'll run a tcpdump. How I'd not thought about that I
>> don't know...
>>
>> Actually, this is one thing I dislike about Rails - it's so nice and easy
>> to do everything, one forgets we're dealing with the real servers /
>> components / connections. It's too abstract in ways, but that's a whole
>> other debate :)
>>
>> S
>>
>>
>>
>> Simon Morley
>>
>> Big Chief | PolkaSpots Supafly Wi-Fi
>> Bigger Chief | Cucumber Tony
>>
>> simon at PolkaSpots.com <simon at polkaspots.com>
>> Linkedin: I'm on it again and it still sucks
>> 020 7183 1471
>>
>> 🚀💥
>>
>> On 15 January 2016 at 06:53, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> File descriptors, for traditional reasons, include TCP connections.
>>
>> Are you logging all requests to a central location? When the problem
>> occurs, it might help taking a closer look at the type of requests you're
>> receiving.
>>
>> Depending on how long the mischief lasts, a tcpdump to pcap, then
>> wireshark might help. Same for an strace on the Puma processes, similar to
>> what John suggested . Those are low level tools though, verbose, complex
>> and complete, it's easy to get lost unless you know what you're looking for.
>>
>> In summary, CLOSE_WAITs piling up from haproxy (client role) to Puma
>> (server role) indicates the app not closing connections in time (or maybe
>> ever) - why? It's a fun one to troubleshoot ; )
>>
>> On Thu, Jan 14, 2016 at 11:35 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> Right now, none of the servers have any issues. No close_waits.
>>
>> All is well. Seemingly.
>>
>> When it occurs ALL the servers end up going. Sometimes real fast. That's
>> why I thought we had a db bottleneck. It happens pretty quickly, randomly,
>> no particular times.
>>
>> We don't ever really get spikes of traffic, there's an even load inbound
>> throughout.
>>
>> I thought we had someone running a slow loris style attack on us. So I
>> added some rules to HA Proxy and Cloudflare ain't seen nofin honest guv.
>>
>> Will find a way to chart it and send a link over.
>>
>> Will see if we're not closing any files - not much of that going on.
>> There's some manual gzipping happening - we've had that in place for over a
>> year though - not sure why it'd start playing up now. Memory usage is high
>> but consistent and doesn't increase.
>>
>> S
>>
>>
>> On 14 January 2016 at 22:14, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> That sounds like a file descriptor leak. Are the CLOSE_WAITs growing over
>> time?
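One rough way to check the descriptor-leak theory is to count what a process actually has open. The sketch below assumes a Linux box with /proc mounted; `fd_kinds` and `classify_fd` are made-up helper names, not part of Puma or any gem. It tallies a process's open descriptors by kind, so a socket count that only ever climbs would point at a leak.

```ruby
# Classify the symlink target of an entry in /proc/<pid>/fd.
def classify_fd(target)
  case target
  when /\Asocket:/     then :socket
  when /\Apipe:/       then :pipe
  when /\Aanon_inode:/ then :anon_inode
  else :file
  end
end

# Tally a process's open file descriptors by kind.
# Returns nil where /proc/<pid>/fd does not exist (non-Linux, or no such pid).
def fd_kinds(pid)
  dir = "/proc/#{pid}/fd"
  return nil unless File.directory?(dir)

  Dir.children(dir).each_with_object(Hash.new(0)) do |fd, kinds|
    target = begin
      File.readlink(File.join(dir, fd))
    rescue SystemCallError
      next # the descriptor was closed between listing and readlink
    end
    kinds[classify_fd(target)] += 1
  end
end

# Inspect the current process; swap in a Puma worker's pid to inspect that.
p fd_kinds(Process.pid)
```

Reading another user's process this way normally needs root, so expect a permissions error otherwise. Run it every minute or so against the Puma pids and compare the socket counts over time.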
>>
>> You're right, New Relic is too high level, this is a layer 4-5 issue.
>>
>> The simplest thing that can plot some graphs will work. Throw together
>> the dirtiest script that curls the data out, if it comes easy. It doesn't
>> matter how you get those metrics as long as you have them.
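As an example of how dirty such a script can be, here is a minimal sketch that emits one CSV row per run with the current number of CLOSE_WAIT sockets. It assumes Linux, where /proc/net/tcp lists IPv4 sockets and uses state code 08 for CLOSE_WAIT (IPv6 sockets live in /proc/net/tcp6 and are not counted here). The helper names and the CSV layout are made up for illustration.

```ruby
require "time"

CLOSE_WAIT_STATE = "08" # hex state code for CLOSE_WAIT in /proc/net/tcp

# Count CLOSE_WAIT sockets in text laid out like /proc/net/tcp.
# The first line is a header; the fourth column ("st") is the socket state.
def count_close_wait(proc_net_tcp)
  proc_net_tcp.each_line.drop(1).count do |line|
    line.split[3] == CLOSE_WAIT_STATE
  end
end

# One CSV row per sample: ISO 8601 timestamp, then the count.
def csv_row(count, now = Time.now.utc)
  "#{now.iso8601},#{count}"
end

# Only sample when the file exists, so this stays harmless off Linux.
if File.readable?("/proc/net/tcp")
  puts csv_row(count_close_wait(File.read("/proc/net/tcp")))
end
```

Run it from cron with the output appended to a file, and any spreadsheet or gnuplot can chart the result.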
>>
>> This is a great blog post opportunity ; )
>>
>> On Thu, Jan 14, 2016 at 8:40 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> I would ordinarily agree with you about the connections; however, they
>> hang around for hours sometimes.
>>
>> The 500 in the haproxy config was actually left over from a previous
>> experiment. Realistically I know they won't cope with that.
>>
>> Using another server was to find any issues with puma. I'm still going to
>> try unicorn just in case.
>>
>> Will up the numbers too - thanks for that suggestion.
>>
>> I'll look at a better monitoring tool too. So far New Relic hasn't helped
>> much.
>>
>> Thanks
>>
>> S
>>
>> Simon Morley
>> Big Chief | PolkaSpots Supafly Wi-Fi
>>
>>
>>
>> I'm doing it with Cucumber Tony. Are you?
>>
>> On 14 Jan 2016, at 20:30, Gerhard Lazu <gerhard at lazu.co.uk> wrote:
>>
>> Hi Simon,
>>
>> CLOSE_WAIT suggests that Puma is not closing connections fast enough. The
>> client has asked for the connection to be closed, but Puma is busy.
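For anyone who wants to see the state for themselves, here is a minimal sketch in plain Ruby. No Puma or haproxy is involved, and the payload is made up. A client connects, sends a few bytes and closes, while the server side accepts and reads but deliberately holds off on closing. Until the server calls close, its accepted socket sits in CLOSE_WAIT.

```ruby
require "socket"

# Port 0 asks the OS for any free port; 127.0.0.1 keeps the demo local.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# Client side (the proxy's role): connect, send, then close, which sends a FIN.
client = TCPSocket.new("127.0.0.1", port)
client.write("ping")
client.close

# Server side (the app's role): accept and read, but do not close yet.
conn = server.accept
request = conn.read             # returns at EOF, i.e. once the peer's FIN arrives
peer_closed = conn.read(1).nil? # a further read at EOF returns nil
still_open = !conn.closed?      # our end is open, so the socket is in CLOSE_WAIT

puts "request=#{request.inspect} peer_closed=#{peer_closed} still_open=#{still_open}"

# Closing our end is what moves the socket out of CLOSE_WAIT.
conn.close
server.close
```

Put a sleep before `conn.close` and run netstat in another terminal to watch the entry appear and then go away.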
>>
>> Quickest win would be to increase your Puma instances. Unicorn won't help
>> - or any other Rack web server for that matter.
>>
>> Based on your numbers, start with 10 Puma instances. Anything more than
>> 100 connections for a Rails instance is not realistic. I would personally
>> go with 50, just to be safe. I think I saw 500 conns in your haproxy
>> config, which is way too optimistic.
>>
>> You want per-process metrics: detailed CPU usage, open connections by
>> state, and memory usage. Without these, you're
>> flying blind. Any suggestions anyone makes without real metrics - including
>> myself - are just guesses. You'll get there, but you're making it far too
>> difficult for yourself.
>>
>> Let me know how it goes, Gerhard.
>>
>> On Thu, Jan 14, 2016 at 3:16 PM, Simon Morley <simon at polkaspots.com>
>> wrote:
>>
>> Hello All
>>
>> We've been battling with Puma for a long while now, and I'm looking for
>> some help / love / attention / advice / anything to prevent further hair
>> loss.
>>
>> We're using it in a reasonably typical Rails 4 application behind Nginx.
>>
>> Over the last 3 months, our requests have gone from 500 rpm to a little
>> over 1000 depending on the hour. Over this period, we've been seeing weird
>> CLOSE_WAIT conns appearing in netstat, which eventually kill the servers.
>>
>> We have 3 Rails servers behind Haproxy running things. Load is generally
>> even.
>>
>> Running netstat on the servers shows a pile of connections in the
>> CLOSE_WAIT state with varying recv-q values, like so:
>>
>> tcp     2784      0 localhost:58786   localhost:5100   CLOSE_WAIT
>> tcp      717      0 localhost:35794   localhost:5100   CLOSE_WAIT
>> tcp      784      0 localhost:55712   localhost:5100   CLOSE_WAIT
>> tcp        0      0 localhost:38639   localhost:5100   CLOSE_WAIT
>>
>> That's just a snippet. A wc reveals over 400 of these on each server.
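Rather than eyeballing the list and piping it through wc, a few lines of Ruby can summarise it. This is a minimal sketch that assumes the six-column layout shown above (proto, recv-q, send-q, local address, foreign address, state). `close_wait_summary` is a made-up helper name, and SAMPLE is simply the four lines from the snippet.

```ruby
# Summarise CLOSE_WAIT sockets per foreign port from netstat-style text.
# Assumed columns: proto recv-q send-q local-address foreign-address state
def close_wait_summary(netstat_output)
  summary = Hash.new { |hash, port| hash[port] = { count: 0, recv_q: 0 } }

  netstat_output.each_line do |line|
    proto, recv_q, _send_q, _local, foreign, state = line.split
    next unless proto&.start_with?("tcp") && state == "CLOSE_WAIT"

    port = foreign.split(":").last.to_i # "localhost:5100" -> 5100
    summary[port][:count] += 1
    summary[port][:recv_q] += recv_q.to_i
  end

  summary
end

SAMPLE = <<~NETSTAT
  tcp     2784      0 localhost:58786   localhost:5100   CLOSE_WAIT
  tcp      717      0 localhost:35794   localhost:5100   CLOSE_WAIT
  tcp      784      0 localhost:55712   localhost:5100   CLOSE_WAIT
  tcp        0      0 localhost:38639   localhost:5100   CLOSE_WAIT
NETSTAT

p close_wait_summary(SAMPLE)
```

For the four sample lines this reports four sockets against port 5100 with 4285 bytes sitting unread in recv-q. In practice you would feed it the real netstat output instead of SAMPLE.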
>>
>> Puma is running on port 5100 btw. We've tried puma with multiple threads
>> and a single one - same result. Latest version as of today.
>>
>> I've checked haproxy and don't see much lingering around.
>>
>> Only a kill -9 can stop Puma - otherwise, it says something like 'waiting
>> for requests to finish'
>>
>> I ran GDB to see if I could debug the process; however, I can't claim I
>> knew what I was looking at. The only things that stood out were
>> EventMachine and Mongo.
>>
>> We then ditched EM (we were using the AMQP gem) in favour of Bunny. That
>> made zero difference.
>>
>> So we upgraded Mongo and Mongoid to the latest versions, neither of which
>> helped.
>>
>> I thought we might have a bottleneck somewhere - Mongo, ES or MySQL. But,
>> none of those services seem to have any issues / latencies.
>>
>> It's also 100% random. Might happen 10 times in an hour, then not at all
>> for a week.
>>
>> The puma issues on github don't shed much light.
>>
>> I don't really know where to turn at the moment or what to do next. I was
>> going to resort back to Unicorn but I don't think the issue is that side
>> and I wanted to fix the problem, not just patch it up.
>>
>> It's starting to look like a nasty in my code somewhere but I don't want
>> to go down that route just yet...
>>
>> Sorry for the long email, thanks in advance. Stuff.
>>
>> I hope someone can help!
>>
>> S
>>
>>
>> _______________________________________________
>> Chat mailing list
>> Chat at lists.lrug.org
>> Archives: http://lists.lrug.org/pipermail/chat-lrug.org
>> Manage your subscription: http://lists.lrug.org/options.cgi/chat-lrug.org
>> List info: http://lists.lrug.org/listinfo.cgi/chat-lrug.org
>>
>>
>>
>> --
>> Riccardo Tacconi
>>
>> http://github.com/rtacconi
>> http://twitter.com/rtacconi
>>