Friday, September 14, 2012

Testing the theory - DR Testing Takeaways

Everyone says you need to do it - hey, I even said you should do it - but how many of us "really" test our disaster recovery processes? I mean from start to finish, warts and all.  We've been testing our plans for years - but somehow never really got to putting everything together into a single full scale "get out there and do it" kind of test.  We've been able to extrapolate and make assumptions, and generally be pretty happy about our capabilities should the world of BAU come to an end, but could never say for sure that, in the end, it would all hang together.

We decided - particularly on the back of some fairly big updates this year, that we really needed to do a full on, no holds barred test, by taking our primary centre fully offline, and then seeing how it all went - here are some of the things we learned:

Be Prepared?
Yes, it's a question and not a statement - you really want to make a decision on just how "prepared" you want to be.  When we talked to our partners, we found everyone wanted to "plan" this test - and that's something you need to be a little careful of.  Having everyone and everything all in the best places possible to achieve a successful outcome might well be what you are used to doing - but in this case you risk lulling yourself into a false sense of security - in a real disaster, all the prep time you have has already gone.

What you DO need to be prepared for, however, and prepared as well as possible, is a clean and rapid rollback.  If something goes disastrously wrong with the test, or if circumstances mean that safety or the business is put at risk - you really need to make sure that however you simulate the disaster, you can "unsimulate" it as fast as possible.

Accept Failure
Whatever you think going into it - there are going to be things that don't work as you thought they should - to be honest I'd be more worried if everything DID work as it was supposed to - if so you probably missed something!  If your test is realistic (and not planned to primarily highlight the best bits of your DR plan!) then there should always be things you can learn - even if it's just an opportunity to speed something up.

When writing the test plan, it helps to have someone who didn't design the recovery procedure recommend what the scenario is - if you can resist it, try not to overthink the situation.

Record everything
Have someone who is not involved in the recovery act as referee. They will be able to avoid the hustle and bustle of trying to make things work, and they will actually have the time to write things down as the test progresses. The referee is also a great pair of eyes for spotting other opportunities to improve processes that might be missed by those who are in the middle of it all.
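If your referee would rather type than scribble, even a tiny script will do the job. What follows is only a rough sketch of the idea, not the tool we used: the `DrTestLog` class, the date and the example notes are all made up for illustration. It records each note against the time elapsed since the plug was pulled, which makes it easy to see afterwards where the minutes went.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DrTestLog:
    """Hypothetical referee's log: notes stored against time since the test began."""
    started_at: datetime
    entries: list = field(default_factory=list)

    def record(self, when: datetime, note: str) -> None:
        # Store elapsed time rather than wall-clock time so delays stand out.
        self.entries.append((when - self.started_at, note))

    def report(self) -> list:
        # One line per entry, e.g. "T+045m  first agent logged in"
        return [
            f"T+{int(elapsed.total_seconds() // 60):03d}m  {note}"
            for elapsed, note in self.entries
        ]


# Illustrative values only: the date and the second entry are invented.
start = datetime(2012, 1, 1, 23, 18)
log = DrTestLog(started_at=start)
log.record(start, "primary site taken offline")
log.record(start + timedelta(minutes=45), "first agent logged in at DR site")

for line in log.report():
    print(line)
```

In a real test the referee would pass `datetime.now()` as each thing happens; the fixed times above are just there so the example is repeatable.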

So.. How did it go?
I guess you are all wondering how it went for us then?  I suppose it would be unfair for me to preach the things above and not tell our story - so here goes...

We planned our DR event for late at night, when we only have a few staff around, and impact to customers would be minimal - maybe not as big and scary as the middle of the day - but the systems and processes are identical so it's still a valid test.

We simulated a complete loss of our contact centre and data systems, and at 11:18pm we pulled the plug (quite literally in some cases) on our internet, phone lines and external WAN connections.  Simultaneously we killed the lights and the staff had to get themselves out and into cabs to our DR site (diverting critical lines to mobiles as they went).

Once at the DR site, the fun began....

Overall we had a successful test - it was a great validation of the work we'd done over the last year - but the real value came in the things that maybe didn't go 100% to plan.  It took longer than expected for some of our recovery servers to come up - something only a realistic test would show - and we've since reorganised the startup process.  One of our backup telephony servers also decided not to fail over cleanly (even though the previous 6 tests were flawless) - we'd never seen the issue before - but now we know about it we're better placed for next time (test or real event).
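For anyone facing the same slow-startup problem, one generic way to think about it is as a dependency ordering question: nothing should be started before the things it relies on are up. The sketch below is an illustration only and not a description of our own setup. The server names and their dependencies are invented, and it simply uses Python's standard `graphlib` module (Python 3.9 or later) to work out a safe order.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery servers, each mapped to the things it depends on.
# These names are assumptions for the example, not a real environment.
depends_on = {
    "network": set(),
    "database": {"network"},
    "telephony": {"network"},
    "agent_desktop": {"database", "telephony"},
}

# static_order() yields every server after all of its dependencies.
startup_order = list(TopologicalSorter(depends_on).static_order())

for step, server in enumerate(startup_order, start=1):
    print(f"{step}. start {server}")
```

Servers with no dependency on each other (the database and telephony here) can come out in either order, which is also a hint that they could be started in parallel to save time.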

One of the most surprising things though was how engaged the staff who participated in the exercise were - even though they were "off the clock" by later in the evening (morning!), many were keen to stay on and help out even when they weren't specifically needed anymore.  (We had a second crew back at HQ who took over once the testing was complete and we'd "rolled back" - this saved additional delays whilst we got staff back to base.)

If you want to hear more about our test (particularly if you are a client of ours), drop me a line, I'll be happy to tell you in more gory detail!  But I'd like to leave you with one final, and most important learning from this exercise.  I've said it before, and I'll surely say it again...

Test it.  Test it again
No amount of talking about it, looking at diagrams, or testing parts of your DR process is anywhere near as valuable as taking the risk to do a full scale test.  If you've never done it (or only done it part way) make a resolution to yourself to prove it.  What's the worst that can happen?  If you keep your primary site/systems ready to go - not a lot - but you sure will learn where you need to focus your efforts. Once that's done, start thinking about doing it all over again.... best of luck !!

Steve Hennerley
GM IS, Telnet


Monday, September 3, 2012

5 ways to get the right people for your contact centre



Our people are the most important part of Telnet – our technology is leading edge but it’s our team that really makes the difference. For this reason I take an active part in the employment of everyone in the company. These are my guidelines:

  1. I make sure our recruiters know their brief really well – I only want to interview people I want to hire.
  2. Attitude is all important – the twinkle in the eye, the passion – more than past experience. Of course experience counts in some roles, but even when it is necessary, it is not everything.
  3. I ask myself: will the new person fit in with our team – or if not, are they going to bring a new skill or personality that we really need?
  4. Will they CARE about our business and our clients as much as we do?
  5. I try as often as possible to promote from within Telnet – recognition means more than money to most people and it’s our role as managers to ensure our people grow and develop.  Almost everyone I interview asks about opportunities for promotion and I enjoy telling the stories of people who have gone on to succeed in senior roles both here and in other organisations.


Penny Calder
Director Operations Telnet Services Ltd



