I remember going into the operations centre of a large bank - wow - it was like mission control for a moon rocket: 100+ people, lots of desks, and big screens on the walls. There was a map of the world showing where the major data centres were; London, Beijing and New York were all green. If there was a problem they would change colour. There were charts showing 20,000 transactions a second with response times under 1 second. Another transaction was shown as "amber" because its response time was over 1.0 seconds.
At the back of the room was the operations management team, responsible for the day-to-day running of the systems. Off the back of the main room was "the war room", for managing critical situations ranging from "long response time" to "system unavailable". I remember being involved in one "crit-sit" where the customer had hourly calls with IBM for nearly 24 hours, until they were up and running.
The people on the desks were the "operations staff" or "applications staff". Their job was to keep an eye on things and sort out any problems. Automation does most of the day-to-day operations, but if you are told to "move work from this system to that system", you need to know how to do it. A good day in operations is when nothing happens!
In the operations room there were operations desks or areas for each major functional area: Mainframe, Disks, Networks, z/OS, CICS, WAS, MQ, IMS, DB2, TCP/IP, Security and Automation. There were also areas for people monitoring the performance and availability of each area. There were desks at the back for "overall availability", for example deciding whether or not to fail over, or when to get the vendors on the phone.
As I worked for IBM, I was there during a major upgrade of MQ - just in case anything went wrong. They had planned the upgrade for 6 months, and had 4 hours to make the upgrade. If it was not ready after 3 hours they had to roll back the changes. If they had to roll back, the next opportunity to do the upgrade was 3 months later, so they wanted it to work. They had a spreadsheet with every command they needed to issue - and another column with the command to undo the change. They only used cut and paste, and did no typing, because typing is slower and error prone. Cutting and pasting meant they issued exactly the commands they had tested.
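The spreadsheet idea can be sketched in a few lines of Python. This is only an illustration of the approach, not what the bank used: the names `Step`, `RUNBOOK`, `rollback_plan` and `must_roll_back` are made up, the commands are placeholders rather than real MQ or z/OS commands, and undoing the steps in reverse order is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    command: str  # the command to issue, pasted exactly as tested
    undo: str     # the command that reverses it


# Placeholder commands for illustration only (not real MQ or z/OS syntax).
RUNBOOK = [
    Step("apply change A", "undo change A"),
    Step("apply change B", "undo change B"),
    Step("apply change C", "undo change C"),
]

# From the story: if not ready after 3 hours, roll back.
ROLLBACK_DEADLINE_HOURS = 3


def rollback_plan(runbook, steps_done):
    """Return the undo commands for the completed steps, most recent first.

    Undoing in reverse order is an assumption made for this sketch.
    """
    return [step.undo for step in reversed(runbook[:steps_done])]


def must_roll_back(elapsed_hours, finished):
    """True when the upgrade is still not ready once the deadline is reached."""
    return (not finished) and elapsed_hours >= ROLLBACK_DEADLINE_HOURS


if __name__ == "__main__":
    # Two steps done, three hours gone, upgrade not finished: print the undo list.
    if must_roll_back(elapsed_hours=3.0, finished=False):
        for cmd in rollback_plan(RUNBOOK, steps_done=2):
            print(cmd)
```

Keeping the undo command next to each command means the rollback list can be produced mechanically at any point, without anyone having to type or think under pressure.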
I was allowed to look at screens - but not touch a keyboard. The local team called me over to ask a question, and I had to ask them to "scroll down" to the next page - it was very hard to resist pressing the key myself!
We had worked through the night, so after the upgrade was successful we left at 0700, had a slap-up team breakfast, and I went to the hotel to sleep.
Colin Paice