Upgrade coming, and thanks for all the donations…

Well, I have to say thanks to all of those who dipped into their pockets and donated nearly £300 to the Slugger tip jar (2/3 of which will pay off my mobile phone bill for the heated election period)… You’ll have noticed the tendency of Slugger to cowpe (ie, fall over) at the least surge (mostly at lunchtimes)… Well, now we’re substantially upgrading the server (with a second CPU and 2 extra gigs of memory) in hopes of heading further crashes off at the pass… Donations will, of course, continue to be received most gratefully…

  • kensei

    How big is the capacity problem? Doubling your CPU and memory won’t help if the traffic jump in these periods is explosive. It helps, obviously, but you will just take slightly longer to get to the crash if you have 2x capacity and a 4x traffic spike.

    You might want to consider a lean version of the site for electoral or high-news periods – the most basic version would be just text and comments.

    A lot depends on where the bottleneck is, what webserver you are using, what hardware etc, but there are probably tunings you can do. On a general note, you might be able to use the webserver or firewall to throttle connections in periods like these, and if you can lower the number of posts returned on the start page, that might also help.
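
    For what it’s worth, here is the throttling idea in rough code. This is only a Python sketch of the principle – on a typical PHP/MySQL blog you would really do this in the webserver itself (e.g. Apache’s MaxClients) or at the firewall, and the limit of 30 is a made-up number you would tune from load testing:

        import threading

        MAX_CONCURRENT = 30  # assumption: pick this from load testing, not thin air

        class ThrottleMiddleware:
            """Cap concurrent requests; turn the overflow away with a 503
            instead of letting every request queue up and stall the box."""

            def __init__(self, app, limit=MAX_CONCURRENT):
                self.app = app
                self.slots = threading.Semaphore(limit)

            def __call__(self, environ, start_response):
                if not self.slots.acquire(blocking=False):
                    # Over the limit: fail fast rather than queueing everyone.
                    start_response('503 Service Unavailable',
                                   [('Content-Type', 'text/plain'),
                                    ('Retry-After', '30')])
                    return [b'Busy, please retry shortly.']
                try:
                    # Buffer the body so the slot is held for the whole request.
                    return list(self.app(environ, start_response))
                finally:
                    self.slots.release()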

  • Dave

    Alternatively, get kensei to put this in place:

    http://www.danga.com/memcached/

    It’s a very fast alternative to loading everything out of your database. Kensei will do it for the love…
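
    To make that concrete, the usual cache-aside pattern looks something like this (a Python sketch using the pymemcache client; memcached is assumed to be running locally on its default port, and fetch_from_db stands in for whatever expensive MySQL query builds the front page):

        import json
        from pymemcache.client.base import Client

        cache = Client(('127.0.0.1', 11211))  # assumption: memcached on the same box

        def get_front_page_posts(fetch_from_db):
            # Cache-aside: try memcached first, fall back to the database,
            # and keep the result for 60 seconds so repeat hits skip MySQL.
            cached = cache.get('front_page_posts')
            if cached is not None:
                return json.loads(cached)
            posts = fetch_from_db()  # the expensive query
            cache.set('front_page_posts', json.dumps(posts), expire=60)
            return posts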

  • Dave

    By the way, if your new host doesn’t have a 100M/Gigabit connection to the internet, get one that does. It isn’t necessary to have a dedicated server since high traffic isn’t your problem (high bandwidth is), so you can often solve that problem by separating your bandwidth-heavy content onto a cheap server (lunarpages.com for $4.95 a month) and also separating your server-intensive scripts onto a separate server. You could well find that 3 x $4.95 does the business for you just as well as 1 x $199.

  • Dave

    In case it wasn’t obvious, if you go for a multi-server solution as a cheaper alternative, then go with separate hosts for each server, lest your shared hosting end up on the same server, thereby defeating the purpose of the separation.

  • Phil

    “Uprade coming, and thanks for the all donations…”

    Fork out for some decent spell checking software! /pedantry

  • Comrade Stalin

    memcached is for distributing database workloads over several machines in a local network. To use that, Mick would need to get several servers on the go at once. Large e-commerce sites require this, but a site like this one, where the common case is read access, probably doesn’t.

    I’d have thought the latency involved in doing distributed cache stuff would have made it prohibitive to use it with different hosting providers. But architecting scalable internet sites is not my thang. Not yet, anyway. 🙂

    There is no point in prescribing a solution without understanding the problem, as kensei already said.

    How big is the capacity problem? Doubling your CPU and memory won’t help if the traffic jump in these periods is explosive. It helps, obviously, but you will just take slightly longer to get to the crash if you have 2x capacity and a 4x traffic spike.

    Yeah, agreed, Mick needs someone to look at what the bottleneck actually is. It’s notable that the site does not merely go slow; it stops dead. I wouldn’t have expected that if the system was underpowered? Often when the site is doing this I see an error about a failure to connect to a MySQL database. I had figured that access quotas were being hit somewhere.

    Fork out for some decent spell checking software! /pedantry

    Phil, a decent browser does this for you. I’m using Chrome right now and it’s doing it.

  • Comrade Stalin

    Mick, what’s the actual spec of the machine post-upgrade ? And what operating system is it running ?

  • Itwas SammyMcNally whatdoneit

    Probably staring me right in the face but where are the details of how to do the donating thing?

  • Comrade Stalin

    On the front page there’s a “hit the tip jar” icon on the upper left.

  • kensei

    I have had to deal with a distributed MySQL Cluster in work. Giant databases are not fun; they have about a million tuning options and about a million ways to go wrong. And as CS points out, in any case, for distributed architectures you need more than a single server.

    CS

    It’s notable that the site does not merely go slow; it stops dead. I wouldn’t have expected that if the system was underpowered?

    There are a number of ways it can happen. *nix systems do not like going above 80% resource usage of anything. If the system is underpowered and hits 100% CPU usage for any period, it can back up so far that the machine effectively freezes. This is exceptionally annoying as it can take forever to recover, particularly if a large process then cores. Other stuff that could happen: low memory leading to a core and a temporary crash. Disk I/O blocking. Running out of disk space. Network bandwidth blocking. High mutex spinning. The database becoming inconsistent. Running out of sockets or open files. Not all of those are tractable by shoving in more memory and CPU (a quick check of a few of these is sketched at the end of this comment).

    Second, if you truly want Slugger to have serious uptime, then you need a secondary system that you fail over to in the event of a crash. You need to eliminate single points of failure. Redesigning the front page will not sort that for you; that requires rearchitecting the backend. Worth thinking about at any rate.
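
    The quick check mentioned above could start as simply as this (Python, standard library only, and purely illustrative – top, vmstat and /var/log on the actual box will tell you far more):

        import os
        import resource
        import shutil

        # Sustained load well above the core count means requests are backing up.
        load1, load5, load15 = os.getloadavg()
        cores = os.cpu_count() or 1
        print("load averages: %.2f %.2f %.2f on %d cores" % (load1, load5, load15, cores))

        # Disk space on the partition holding the site and the database.
        usage = shutil.disk_usage('/')
        print("disk: %d%% used" % (100 * usage.used // usage.total))

        # Running out of sockets or open files shows up against this limit.
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        print("file descriptor limit: soft=%d hard=%d" % (soft, hard))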

  • Comrade Stalin

    kensei, very interesting. I can see how things would go badly wrong if memory got tight due to lots of threads or processes being spawned by a glut of incoming requests. Lots of swapping, and subsequently brk() failing, would lead to highly unpredictable results. If that’s the problem, your point earlier about getting the web server to back off on the number of requests being served makes the most sense.

    /var/log would likely reveal all ..

    Obviously the point about failover assumes that the scalability issues have been addressed so that we don’t end up with ping-ponging. I’d have thought that once the site is stable and handles a peak load a little better (slowness is tolerable) this should be “good enough”. Ultimately it’s just a news source (and sometime conversation facility) rather than an e-commerce site. I say that because I guess Mick isn’t looking to spend the money that would be required to build one 🙂 And it’s not massively important that user comments are immediately visible, so surely there must be a way to cache all the pages including comments and update the comments in batch fashion during peak loads.
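
    The batched-comments idea is essentially time-based page caching, something like this Python sketch (the two-minute TTL is a guess, and render stands in for whatever builds the full page, comments included; blog-engine caching plugins do the same job off the shelf):

        import time

        PAGE_TTL = 120      # assumption: rebuild each page at most every 2 minutes at peak
        _page_cache = {}    # url -> (rendered_html, built_at)

        def get_page(url, render):
            # Serve the cached render if it is fresh enough; otherwise rebuild.
            # New comments therefore appear in batches instead of forcing a
            # database hit for every single page view.
            entry = _page_cache.get(url)
            now = time.time()
            if entry is not None and now - entry[1] < PAGE_TTL:
                return entry[0]
            html = render(url)  # expensive: queries, templates, comments
            _page_cache[url] = (html, now)
            return html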

  • Pancho’s Horse

    It’s incredible how much these boys know about computers and how little about politics. Heh heh.

  • kensei

    CS

    kensei, very interesting. I can see how things would go badly wrong if memory got tight due to lots of threads or processes being spawned by a glut of incoming requests. Lots of swapping, and subsequently brk() failing, would lead to highly unpredictable results. If that’s the problem, your point earlier about getting the web server to back off on the number of requests being served makes the most sense.

    You are much more likely to hit problems before you run out of memory, I think, though it isn’t impossible. Hitting 100% CPU won’t necessarily mean only intermittent problems, particularly if it is compounded by everyone whacking reload because the site is slow. There is no guarantee that once you are in a bad state you’ll recover in any reasonable timescale. The biggest issue is if it is some fault with the software – a memory leak, tight loop or mutex spin – that is not easily sorted without someone messing with the source code.

    Basically, the only information available is that this is related to the volume of traffic. So there are two options guaranteed to work:

    1. Make sure the load doesn’t hit that threshold – throttle at high usage times. 50% availability is better than 0%.

    2. Reduce the cost of that load – run a lightweight version of the site where each request causes fewer subrequests for images etc, and serves smaller content (a rough sketch of this toggle is at the end of this comment). Really, there should be one of these for low-powered devices anywho.

    But yeah, some thought probably needs put into it, particularly if the new site just means richer content and a more heavyweight backend.
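
    The toggle for option 2 could be as simple as this (a Python sketch; LEAN_LOAD_THRESHOLD and the template names are made up, so tune and rename for the real stack):

        import os

        LEAN_LOAD_THRESHOLD = 4.0  # assumption: roughly one unit of load per core

        def choose_template():
            # Fall back to a text-and-comments-only template when the one-minute
            # load average climbs past the threshold, so each request stops
            # fanning out into image and widget subrequests.
            load1, _, _ = os.getloadavg()
            return 'lean.html' if load1 > LEAN_LOAD_THRESHOLD else 'full.html'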

  • Mick Fealty

    I’m sending this thread directly to the hosts, so thanks lads… Sammy, that would be very kind – top left, above everything else… Paypal…

  • Dave

    CS, memcached and shared hosting is a contradiction, since if you had the high traffic that required it, you would use a dedicated server and not a VPS. As I said in the second post (or guessed at), a dedicated server isn’t required “since high traffic isn’t your problem (high bandwidth [usage] is).” If high traffic merits its use, then it’s a perfect fit for a dynamic site like Slugger that makes extensive use of a database.

    If the problem is actually slashdotting, then that can be circumvented more cheaply than a dedicated server by the method I outlined (splitting the load over various shared hosting servers). It is a traffic-related problem, but the actual problem is the lack of bandwidth and the demands on the server, not the traffic – hence separating bandwidth-heavy content (images etc) and server-intensive scripts, databases, etc, onto other servers can usually sort it out. A traffic problem is a separate problem in that it relates specifically to more requests going to the server than it can handle (in which case, the problem is not slashdotting).

    But as you said, it’s just guessing at the problem without being able to analyse the data. Most amateur webmasters (and cheap bastards) encounter these problems as their sites become more successful and usually figure the hacks out for themselves. I’m just putting out a few cheap options (when you should buy new software and a dedicated server if the budget allows), because I get the impression that the site’s owner is trying to operate the site within the funds that it generates.