AussieHQ - Making the web work

Service Status

Advisory #2025
Australian network outage
Category: Outages and Unscheduled Maintenance
Severity: High Impact
Status: resolved
Start: 29/05/2008 08:30
End: 29/05/2008 10:55
Services:

Thu, May 29 2008 09:23 am by support@aussiehq.com.au
We are currently experiencing issues with our Australian network. Technicians are investigating, and further updates will be provided as information comes to hand.

Thu, May 29 2008 10:07 am by support@aussiehq.com.au
Our Australian network is now accessible; however, it may be slower than usual from some connections. These issues are still being worked on, and we hope to restore full service shortly.

Thu, May 29 2008 10:41 am by support@aussiehq.com.au
It appears that some ISPs are still having issues connecting to our network. We are continuing to investigate this problem and will provide more details as they become available.

We apologise for any inconvenience that this issue may be causing.

Thu, May 29 2008 10:55 am by support@aussiehq.com.au
All services have now been restored. Anybody who is still experiencing issues should contact the service desk for further assistance.

Thu, May 29 2008 03:21 pm by support@aussiehq.com.au
An explanation of the fault is as follows:

Early this morning, cor1.cbr1.aussiehq.net.au developed an obscure fault on one of its interface controllers, affecting gi0/1 and gi0/2, which connect to bdr1.cbr1 and bdr2.cbr1 respectively.

The fault caused intermittent packet loss as well as data errors across these two interfaces, which carry approximately 40% of the data flowing into our network.

As you may know, our network is designed with no single point of failure. We can lose any one of our border routers and any one of our core switches without affecting end-to-end connectivity.

Unfortunately, this fault left the switch online but degraded rather than failed outright, so our redundant switch did not take over the traffic.

When the fault was diagnosed this morning, we removed cor1.cbr1 from service, which restored connectivity to our access layer. We then disabled the faulty interfaces and returned cor1.cbr1 to service as a backup switch only.

It has been a learning experience, and it is certainly the first time I've come across such an issue. We will be carrying out further investigations and making changes to our network to prevent this kind of fault from recurring.