www.asp.net, update part 2

Last week we updated www.asp.net and ran into some significant problems with the main site. I summarized some of this here.

We originally thought the problem, which appeared to be a memory leak, was due to a search component we were using. Under load the application would run the CPU at 100% and eat 2-3 MB of memory per second. This would continue until memory use reached about 700 MB, at which point the application would attempt to recycle and the whole cycle would start over again.

We’re now fairly confident we found the bug. It is one that only becomes obvious under significant load, and simulating the type and pattern of load on a site as highly trafficked as www.asp.net is not as simple as it sounds. A few people questioned whether we even tested the site before we put it into production, which of course we had.

The bug was in a couple of lines of code for a custom component on www.asp.net. On a positive note, this was not Community Server code or Lucene (search code), but custom code written specifically for the CMS that drives www.asp.net. I won’t go into the specifics, but the bug had to do with loading and parsing an XML document.

We’re very sorry about the downtime this caused, and we want to keep everyone up-to-date with what we’ve found out.

10 Responses to www.asp.net, update part 2

  1. http:// says:

    An infinite loop while parsing an XML document? LOL!!!

  2. Jon Galloway says:

    Thanks to you and your team. I think a lot of your users don’t appreciate the volume of data and number of moving parts you’re managing to keep the whole *.asp.net system working day after day.

    I’ve previously suggested a separate blog for community announcements. I really think that would be more effective than e-mail, and of course the combination would be best. The more “self-serve” you can make things, the better – then when bloggers don’t know about an outage, you can very politely point them at the public announcement(s) you’ve made. To be effective, the announcements would need to be included in the main feed and shown in a side box on the main ASP.NET page.

  3. Nic Wise says:

    oooh, can I guess? Did someone load a large XML document into an XmlDocument class?

    Like, maybe one with a 10meg base64 encoded block in it?

    Or is it just us who does that? (tho not anymore :) )

  4. http:// says:

    Rob, thanks for the update. By the way, I am still waiting for an answer about our last email conversation.

    What do you think?

    Cheers

    Paschal

  5. Thanks for reporting back to the community on your findings. It would be interesting to hear more on how you tracked down the memory leak.

  6. Vikram says:

    It’s great to hear that the bug was resolved. But I would like to know the technical reason or problem you faced with the XML document so that we do not make this mistake again.

    But congrats on getting the thing done.

    http://www.vikramlakhotia.com

  7. Impressive bro, thanks.

    Salute, with honors. ;-)

  8. ScottW says:

    @Vikram: This does not look like an issue with the actual XmlDocument class (or anything else in System.Xml).

    Instead there was a block of code which was constantly re-loading xml files on the same request (probably 100’s of times). At moderate load this was not an issue, but under a lot of pressure we seemed to hit a tipping point which pushed things over the edge.

    Thanks,

    Scott

  9. http:// says:

    Thanks for sharing, but mighty Rob Howard/Telligent seems fallible

  10. Mitch Wheat says:

    Hi Rob, I’d be interested in hearing how you tracked this bug down. Or was it just a D’oh! moment?