Tagged: outage

  • lars 6:59 pm on September 1, 2011
    Tags: outage   

    Socorro HBase connection issues 

    Socorro is currently experiencing an HBase connection issue of unknown origin. Processing is down. Web App functions that require fetching actual crashes are down. Web App functions that use the database appear to be working normally.

    When I learn more, I’ll say more…

    UPDATE: as of 9:30pm PDT, we’re back up and running. All systems should be back to normal. No crashes were lost during this outage.

     
  • lars 6:18 pm on August 31, 2011
    Tags: outage   

    Socorro Disrupted by Data Center Issue. 

    The Socorro Web App is currently inaccessible due to issues in the Phoenix Data Center. Corey Shields broadcast this message: “IT is working a network outage in our Phoenix data center right now. This is causing widespread site issues as well as netsplits on IRC. Will keep you posted.”

    The outage seems pretty absolute, as I can access none of the Socorro infrastructure to see how it is faring. I will keep you informed as I learn more.

     
    • lars 7:20 pm on August 31, 2011

      It appears that as of about 19:41 PDT, we’re back online. As far as I can tell, we’ve got full functionality.

      The bad news is: according to the collector logs, most crashes submitted from 18:07 PDT to 19:41 PDT have been lost. I’ll report more as I investigate more.

  • jabba 3:28 am on July 28, 2011
    Tags: outage   

    Socorro unplanned outage 

    Socorro is experiencing an unplanned outage due to network difficulty in the Phoenix data center. The user interface continues to function for aggregate queries. Calling up individual crashes, along with crash processing, is stalled. Crashes coming into the collectors are unaffected other than being delayed in processing. No priority jobs are being processed.

    I will report back as soon as I hear more news as to the potential duration of this outage.

    Update: HBase has been restarted and all functionality has been restored.

     

     
  • lars 5:12 pm on March 10, 2011
    Tags: outage   

    Socorro Service Restored 

    IT announces that they’re done mucking about with the infrastructure for now. All Socorro services should be back to normal.

    We have been notified that this intermittent outage may repeat later this evening.

     
  • lars 4:26 pm on March 10, 2011
    Tags: outage   

    Socorro Trouble Today 

    Socorro processing has been intermittently adversely affected by infrastructure issues since about 3:30 PST today. First our connection to PostgreSQL failed and, after that came back, our connection to HBase started to get finicky.

    IT is on top of the issue, and I’ll post again when we’ve got more of an idea as to when these intermittent problems will be resolved.

     