Recurring incidents are, unfortunately, a familiar phenomenon. Even when system vendors do their best, problems often follow one another. If recurring incidents are not handled properly and the root cause of their recurrence is not investigated, maintenance costs will eventually rise.
But don’t worry! We know how to tackle these challenges – here are six tips to improve your problem management process and decrease costs.
Make sure that all incidents are registered and categorized as accurately as possible. This helps your team understand what is actually broken and which of the system’s features are affected. When you resolve an incident, make sure that you also record the solution. These steps help you identify recurring incidents, gather the necessary data, and use that information in your problem management process to find a solution that stops the incidents from recurring.
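As a rough illustration of why consistent categorization pays off, here is a minimal sketch in Python. It assumes each incident is recorded with a category, the affected feature and the applied solution; the `Incident` record and `find_recurring` helper are hypothetical and not part of any particular service desk tool. Once incidents are recorded this way, spotting repeats becomes a simple count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical record fields; adapt them to your own service desk tool.
    incident_id: str
    category: str       # e.g. "integration", "performance"
    feature: str        # the affected system feature
    solution: str = ""  # how the incident was resolved


def find_recurring(incidents, min_occurrences=2):
    """Return (category, feature) pairs seen at least min_occurrences times."""
    counts = Counter((i.category, i.feature) for i in incidents)
    return {key: n for key, n in counts.items() if n >= min_occurrences}


incidents = [
    Incident("INC-1", "integration", "order import", "restarted adapter"),
    Incident("INC-2", "integration", "order import", "restarted adapter"),
    Incident("INC-3", "performance", "reporting", "added index"),
]
print(find_recurring(incidents))  # {('integration', 'order import'): 2}
```

The recorded solutions matter too: if the same workaround ("restarted adapter") keeps appearing for one feature, that is a strong hint for where problem management should dig.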
Set up a monitoring system that automatically runs test processes through the system’s key features. Monitor and track the related integrations to detect whether the systems are functioning as they should and when disruptions occur. Connect the monitoring system to the incident management process so that incidents are spotted immediately and their root causes can be investigated further. In other words, these actions turn the “lights on” and give you a clear overview of your IT system’s health and condition.
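The connection between automated tests and incident management could look roughly like the following sketch. It assumes each key feature has a check function that returns `True` when the feature works; `run_synthetic_checks` and the `open_incident` callback are hypothetical placeholders for your own monitoring tool and incident management system.

```python
from typing import Callable, Dict, List


def run_synthetic_checks(
    checks: Dict[str, Callable[[], bool]],
    open_incident: Callable[[str, str], None],
) -> List[str]:
    """Run each check and open an incident for any that fails or raises.

    Returns the names of the failing checks.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
            detail = "check returned False"
        except Exception as exc:  # a crashing check is also a disruption
            ok, detail = False, f"check raised {exc!r}"
        if not ok:
            failed.append(name)
            open_incident(name, detail)  # hypothetical hook into incident management
    return failed


# Usage with stand-in checks; real ones would call the system's key features.
opened = []


def invoice_api_check():
    raise TimeoutError("no response")


checks = {
    "login": lambda: True,
    "order_import": lambda: False,
    "invoice_api": invoice_api_check,
}
failed = run_synthetic_checks(checks, lambda name, detail: opened.append((name, detail)))
print(failed)  # ['order_import', 'invoice_api']
```

Treating an exception in a check as a failure is a deliberate choice here: a test that cannot even reach the system is usually the most urgent signal of all.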
Use your log system to build automations that raise an alert in your incident management system when critical errors occur. However, filter out the messages that don’t require attention, so that the alerts stay meaningful. This raises awareness of potential system disruptions and lets you react quickly without wasting any time.
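A log-based alerting rule with a noise filter might be sketched like this. The log format (`<timestamp> <LEVEL> <message>`), the alerting levels and the ignore patterns are all assumptions for illustration; in practice they come from your own log system and from experience of which messages are harmless.

```python
import re

# Assumed log format: "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

ALERT_LEVELS = {"CRITICAL", "ERROR"}
# Illustrative patterns for known noise that does not need attention.
IGNORE_PATTERNS = [re.compile(r"retrying connection"), re.compile(r"cache miss")]


def alerts_from_log(lines):
    """Yield (level, message) pairs that should raise an alert."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["level"] not in ALERT_LEVELS:
            continue  # unparseable line or non-critical level
        msg = m["msg"]
        if any(p.search(msg) for p in IGNORE_PATTERNS):
            continue  # known noise, filtered out
        yield m["level"], msg


lines = [
    "2024-05-01T10:00:00 INFO order import started",
    "2024-05-01T10:00:05 ERROR retrying connection to ERP",
    "2024-05-01T10:00:09 CRITICAL order import failed: timeout",
]
print(list(alerts_from_log(lines)))  # [('CRITICAL', 'order import failed: timeout')]
```

Keeping the ignore list explicit makes it easy to review: each pattern is a conscious decision that a message is harmless, rather than a silent gap in monitoring.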
Talk to your system vendors and organize multivendor meetings. Make sure that your vendors don’t pass the buck when it comes to your incidents, but instead collaborate in a joint effort to resolve them as quickly as possible.
Agree with your vendors that they regularly analyze availability, logs, and incident data. Ask them to pick, for example, the three most significant recurring incidents from the data every month and to start the problem management process for them. The vendors should investigate the root causes of these three recurring incidents and work out how to eliminate them permanently. Sometimes the root cause may be too expensive to find or to resolve. In these situations, consider whether workarounds can make these incidents occur less often, or build, for example, an automatic maintenance process that handles the incidents whenever they happen. With these actions, the number of recurring incidents decreases over time.
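Picking the monthly "top three" can start as simply as the sketch below. It assumes incidents are available as `(month, incident_type)` pairs and, as a simplification, ranks recurring incidents by frequency only; a real analysis would likely also weigh business impact and severity. The `top_recurring` function is a hypothetical helper, not a feature of any specific tool.

```python
from collections import Counter


def top_recurring(incidents, month, n=3):
    """Return the n most frequent recurring incident types for the given month.

    incidents: iterable of (month, incident_type) tuples, e.g. ("2024-05", "login failure").
    Only types seen more than once in the month count as recurring.
    Ties keep the order in which the types were first encountered.
    """
    counts = Counter(kind for m, kind in incidents if m == month)
    recurring = [(kind, c) for kind, c in counts.most_common() if c > 1]
    return recurring[:n]


data = (
    [("2024-05", "order import timeout")] * 4
    + [("2024-05", "login failure")] * 2
    + [("2024-05", "report slow")] * 3
    + [("2024-05", "printer offline")]
    + [("2024-05", "invoice sync error")] * 2
    + [("2024-04", "login failure")] * 5
)
print(top_recurring(data, "2024-05"))
# [('order import timeout', 4), ('report slow', 3), ('login failure', 2)]
```

Running the same query every month also gives a simple way to check whether the problem management work is paying off: the counts for the top items should fall over time.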
Sometimes the root cause of a recurring incident may lie within your own area of responsibility. For example, the root cause might be in the way you run your business processes across several different systems. In such cases, talk to your business process experts and vendors, and evaluate the possible actions. Invest in the actions that provide the fastest return on investment.
With these tips, you can be sure that you have an efficient problem management process in place. As the number and severity of incidents decrease, you will need to invest less and less time and effort into solving recurring incidents. However, to be on the safe side, do not stop the problem management process completely, and we advise requiring the same of your vendors.
With Nortal’s Continuous Services, we guarantee our customers that we will improve their system availability and reduce the number of incidents. Our goal in incident management is to reduce customers’ maintenance costs so that they can invest those resources in new development and innovation, and thereby bring more value to their business.
Sami Merovuo, a Service Management professional at Nortal, has 15+ years of experience in designing, building and delivering software development and operations services to customers in industry, digital healthcare and government domains. Sami understands the business continuity point of view and knows what is required to keep the business-critical systems up and running. He is familiar with agile software development methodologies and frameworks such as Scrum, Kanban, Scaled Agile Framework (SAFe), DevOps, ITIL, Continuous Integration and test automation.