My research on monitoring outages

Written by Yi Ming Yung on 7th May 2016

In my previous post, I talked about the first steps I took in creating a monitoring dashboard for our telephony platform VoIPGRID. With my internship coming to an end, I’ll give you an update on the progress of my research. 

The current state

Right now VoIPGRID has a status page at voipgrid.nl/status, with an RSS feed, which one of our colleagues updates manually during outages. The page shows an outage report for each incident and is continually updated with new information while the outage lasts. Each report stays on the page for one week after the outage is resolved. When there are no problems with the platform, the page follows the principle that “no news is good news” and shows nothing. In our eyes, this solution could give partners more insight into the state of the platform.

The original idea

At the beginning of my research, there was some debate about whether to develop a custom status page or to adopt a third-party package that fits our needs. Before we could make this decision, we needed to determine the requirements. My original assumption was that we would develop a custom monitoring dashboard or status page ourselves and build it into the current VoIPGRID portal. This seemed a good place for the dashboard because it is intended for partners, who use the portal quite frequently. The portal would be a convenient central location for such an informative dashboard.

Outside the network

The downside of this approach is that the page would be inaccessible during outages of the databases, the web servers, or any other system that the VoIPGRID web portal depends on. This would defeat the purpose of the monitoring dashboard. For that reason, it was decided to host the dashboard outside the VoIPGRID network. This also means that it won’t be possible to password-protect the dashboard, which, for security reasons, limits the information we can show on it.

Choosing a third-party package

Now we could choose between adopting a third-party package and developing a custom dashboard to visualize the status of our platform. Implementing a third-party package is more efficient and saves more time than developing and maintaining our own status page. I investigated several candidate packages and concluded that statuspage.io is best suited to our needs. By implementing statuspage.io, we could save the company many hours of development, design, and maintenance work. Although statuspage.io looks fancy and has nice features, we still need to develop middleware to connect our internal monitoring to statuspage.io.
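To make the middleware idea more concrete, here is a minimal sketch of one piece of it: translating an internal monitoring state into a component status and preparing the update request. Everything in it is an assumption for illustration and not the middleware that was actually built: the Nagios-style state names (OK, WARNING, CRITICAL), the function names, and the fallback for unknown states are hypothetical, and the endpoint shape and `OAuth` header follow statuspage.io's public REST API as I understand it, so they should be checked against the current documentation. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed base URL of the statuspage.io REST API; verify against the docs.
API_BASE = "https://api.statuspage.io/v1"

# Hypothetical internal monitoring states mapped to component statuses.
STATUS_MAP = {
    "OK": "operational",
    "WARNING": "degraded_performance",
    "CRITICAL": "major_outage",
}


def to_component_status(internal_state):
    """Translate an internal state into a component status.

    Unknown states fall back to "degraded_performance" (an assumed policy)
    so that they are never silently shown as healthy.
    """
    return STATUS_MAP.get(internal_state.upper(), "degraded_performance")


def build_update_request(page_id, component_id, api_key, internal_state):
    """Build (but do not send) a PATCH request that updates one component."""
    payload = {"component": {"status": to_component_status(internal_state)}}
    url = f"{API_BASE}/pages/{page_id}/components/{component_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In a real deployment the middleware would call something like `urllib.request.urlopen(request)` whenever the internal monitoring reports a state change, with the page ID, component ID, and API key read from configuration.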

The next steps

The next steps will be to design and tailor statuspage.io to VoIPGRID and then build the middleware to automate the process. By automating the process, we can give our partners the most up-to-date information at all times. I have also given statuspage.io feedback by email on how to improve their product. Auto-refresh of the status page is somewhere further down the roadmap. I also suggested making it possible to turn certain modules on and off, and they are currently looking into this.

Devhouse Spindle