[LRUG] Finding out why server was unresponsive

Andrew Stewart boss at airbladesoftware.com
Tue Apr 16 02:42:16 PDT 2013


Good morning El Rug,

Yesterday afternoon one of my servers stopped responding to HTTP and SSH.  I eventually got it back by executing a hardware reset via the host's (Hetzner) web GUI.  I have no idea what the problem was.

The server runs Ubuntu 12.04 LTS.  It's been in production for a couple of months and hadn't had any downtime before yesterday.

All the following logs were silent during the outage:

- unicorn.std{out,err}.log
- production.log
- /var/log/kern.log
- /var/log/syslog
- /var/log/auth.log
- /var/log/nginx/{access,error}.log

I was running an mtr traceroute the whole time, which showed packets making it into the host's network but failing to reach my server.

New Relic shows nothing abnormal.  Memory use and CPU load were low as usual.

I would dearly like to establish what happened so I can (try to) prevent it happening again, but I'm stumped.

Any ideas?

Many thanks in advance,

Andy Stewart
-------
http://airbladesoftware.com



