There are many approaches and methodologies for monitoring systems. The three fundamental types are technical monitoring, functional monitoring, and business process monitoring.
In this article, I will focus on high-level recommendations for functional and technical solution monitoring and explain how to set up a clear strategy for the processes around it. I will leave out business process monitoring, as it is specific to each business and is dealt with at the management level.
So why is monitoring important? Here are just a few of the most important considerations:
Observability and monitoring are related, but they have distinct functions. Monitoring involves gathering and displaying data so that it can be used for further analysis, while observability refers to the accessibility of that data.
Functional monitoring looks only at the functional aspect of the solution, evaluating a use case or a group of use cases on a system. It identifies performance and availability problems at the functional level and ensures that they are visible and recorded.
Functional monitoring is usually performed automatically by executing scripted operations on a system. Robot-based monitoring is an excellent way to ensure quality of service and a good user experience.
When your solution is actively used by customers, functional monitoring is essential to ensure quality of service. Essentially, this is about testing all core user journeys and system workflows repeatedly, and then monitoring the results for any anomalies.
By testing the workflows, we continuously get information about the availability of the system. Depending on the application, these tests are run on production on a specific schedule (e.g., hourly or daily).
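As a minimal sketch of such a scripted check, the runner below executes the steps of a user journey in order, times each one, and stops at the first failure. The names `StepResult`, `run_journey` and `journey_available` are assumptions made for illustration, and the step callables stand in for whatever browser automation or API calls your own tooling provides.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    duration_s: float
    error: Optional[str] = None


def run_journey(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run each (name, action) step in order and stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        started = time.perf_counter()
        try:
            action()  # hypothetical step: raises an exception when the step fails
            results.append(StepResult(name, True, time.perf_counter() - started))
        except Exception as exc:
            elapsed = time.perf_counter() - started
            results.append(StepResult(name, False, elapsed, str(exc)))
            break  # later steps depend on this one, so do not continue
    return results


def journey_available(results: List[StepResult], expected_steps: int) -> bool:
    """The journey counts as available only if every expected step ran and passed."""
    return len(results) == expected_steps and all(r.ok for r in results)
```

A scheduler (cron, or the monitoring platform itself) would call `run_journey` at the chosen interval and record the per-step results, so that both availability and step durations can be tracked over time.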
To illustrate functional monitoring, I will use a generic user journey of a customer placing an order.
This user journey is quite common, but it is far from easy to test functionally, since it integrates with payment and possibly shipment flows provided by external services. In addition to this, users may come from various promotions and may bypass some of the steps, which further complicates the testing process.
To address this, the best strategy is to apply multi-aspect functional testing:
The purpose of technical monitoring is to determine how well the software components underlying the system perform in real time. It focuses on specific technical functions of each component in isolation and may not report on the functionality of the system as a whole. It effectively reports on issues and allows operators to decide how to fix these.
It is important to stress that technical monitoring may not identify all problems within the system, as some issues may not show up in technical monitoring at all.
Below, I will focus on several best practices for ensuring that your systems are effectively monitored:
The first step is to understand the context of your system. This is where you will need to conceptualise your system and get to know all of its components. The most important thing here is to determine and document which of the areas identified in your initial evaluation of your environment are the most business critical. Here are some steps I would recommend:
There are many tools available on the market. The most important thing is to choose a monitoring platform that can cover and monitor most of your business-critical components in one single place. Adding further tooling can significantly increase complexity, time to resolution, and the amount of effort required to perform proactive performance assessment and improvement activities.
Analyse and decide which metrics and what kind of data are important to you and your application. The monitoring tools you choose depend on the type of system you run, and may look very different for a small e-commerce environment than for a highly distributed containerized Java application.
Make sure the potential monitoring platform candidates can deliver the necessary metrics. Some tools might be able to gather the metrics you need right out of the box, while others might need significant adjustment or even code changes in your application.
At a bare minimum your metrics should report on component availability, performance and critical errors in the application. This can be relatively simple to achieve as most of the available tools are able to hook into your components and start aggregating your errors quickly, as well as monitor available endpoints.
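To make that bare minimum concrete, here is a small sketch that reduces a series of probe results to the three figures mentioned above: availability, performance and critical errors. The `Probe` record and the `summarise` function are hypothetical. In practice your monitoring platform collects and aggregates these values for you.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Probe:
    up: bool               # did the component respond at all?
    latency_ms: float      # response time of this probe (ignored when down)
    critical_errors: int   # critical errors logged since the previous probe


def summarise(probes: List[Probe]) -> Dict[str, Optional[float]]:
    """Reduce raw probe results to availability, performance and error figures."""
    if not probes:
        raise ValueError("at least one probe result is required")
    successful = [p for p in probes if p.up]
    avg_latency = (
        sum(p.latency_ms for p in successful) / len(successful)
        if successful
        else None  # no successful probe, so there is no latency to report
    )
    return {
        "availability_pct": 100.0 * len(successful) / len(probes),
        "avg_latency_ms": avg_latency,
        "critical_errors": sum(p.critical_errors for p in probes),
    }
```

Latency is averaged over successful probes only, so an outage lowers availability without distorting the performance figure.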
While monitoring is basically collecting data, alerting is the proactive side of monitoring: notifying people via email, SMS, a ticketing system and so on. Alerting is very useful, but you need to avoid "alert fatigue", where the system sends out so many alerts requiring no action that monitoring teams may miss the important ones. Alert only on critical situations that DO require action. With alerting, less is more.
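One common way to reduce alert volume, shown here only as a sketch, is to notify when a threshold has been breached on several consecutive checks, and to stay silent while the same incident is still ongoing. The class name `SustainedAlert` and its parameters are assumptions for illustration. Most monitoring platforms offer an equivalent built-in setting.

```python
class SustainedAlert:
    """Notify once when a value stays above a threshold for several checks in a row."""

    def __init__(self, threshold: float, required_breaches: int = 3) -> None:
        self.threshold = threshold
        self.required_breaches = required_breaches
        self.breaches = 0      # consecutive checks above the threshold
        self.firing = False    # True while an incident is already notified

    def observe(self, value: float) -> bool:
        """Record one check and return True only if a new notification is due."""
        if value > self.threshold:
            self.breaches += 1
        else:
            # Back to normal: reset so a future incident notifies again.
            self.breaches = 0
            self.firing = False
        if self.breaches >= self.required_breaches and not self.firing:
            self.firing = True
            return True
        return False
```

A single spike therefore produces no alert, and a long-running incident produces exactly one.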
When setting up alerting, consider the business context of your application rather than getting too technical. In an e-commerce application, for example, alerting on and monitoring the technical metrics of a purchase workflow might prove much more valuable than collecting CPU data. Consider what is important for your end users and focus your alerting on enhancing that end-user experience.
It's important to keep in mind that tuning alert volume to reach an appropriate level is often an ongoing process. When starting out, refer back to the business purpose of your application. In an e-commerce application, for example, monitoring the response time of business-critical transactions such as 'add to cart' and 'checkout' is clearly important, but monitoring CPU usage on a given application server is probably unnecessary. Think about user experience first and focus on what is business critical.
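As an illustration of alerting on business-critical transactions rather than on server metrics, the sketch below compares the 95th-percentile response time of each transaction against a response-time budget. The budgets in `BUDGETS_MS` are invented numbers, and both function names are hypothetical. The percentile uses the simple nearest-rank method, which may differ slightly from what your monitoring tool reports.

```python
from typing import Dict, List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


# Hypothetical response-time budgets in milliseconds for the critical transactions.
BUDGETS_MS = {"add_to_cart": 800, "checkout": 1500}


def budget_breaches(samples: Dict[str, List[float]],
                    budgets: Dict[str, float]) -> Dict[str, float]:
    """Return the p95 response time of every transaction that exceeds its budget."""
    breaches: Dict[str, float] = {}
    for name, budget in budgets.items():
        durations = samples.get(name, [])
        if not durations:
            continue  # no traffic recorded for this transaction in the window
        p95 = percentile(durations, 95)
        if p95 > budget:
            breaches[name] = p95
    return breaches
```

A percentile is used instead of an average so that a slow minority of requests is not hidden by a fast majority.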
It is critically important to develop an effective monitoring strategy in order to have a truly performant and reliable application.
The best strategy is to combine functional and technical monitoring to obtain a complete view of the system. This will ensure control, impact awareness and will facilitate an adequate level of quality of service and confidence in operations.