Getting started with monitoring for developers

Published on by

Getting started with monitoring for developers image

This is a sponsored post by Bugsnag written by Karissa Peth

Until recently, monitoring was exclusive to the IT operations department who was responsible for maintaining the hardware and networking equipment to run a companies’ entire suite of software. But nowadays, changes in deployment frequency, architecture, and cloud hosting has shifted that responsibility.

Monitoring is no longer just for ops teams.

Monitoring has become essential for all developers to gain observability into their codebase. It’s important be aware of the monitoring systems your DevOps team uses, but the monitoring world is huge, and it can be hard to know where to start. This post is aimed at giving you an overview of the monitoring available and the value it provides.

Why monitor at all?

The main value of monitoring your system is to act as a smoke detector. It can let you know when things are heading in a dangerous direction, sometimes enabling intervention prior to a failure point. A properly set up monitoring system will alert to outages, and beyond that, help to pinpoint where to start investigating. Monitoring tools are an effective source of truth when an outage does happen that enable teamwork between engineering and operations.

Monitoring terminology

Monitoring originated with the concepts of black box and white box monitoring to describe where they interact with your application or infrastructure. We won’t focus on these concepts, but they are helpful to know:

Black Box monitoring: monitoring from outside the system; what the “user” experiences

White Box monitoring: monitoring from within the system; critical for attaching context to problems

Infrastructure monitoring

Network monitoring

On the surface, network monitoring might seem to be exclusive to the realm of operations, but monitoring a system’s ability to send and receive data is something all developers should care about, since most services don’t exist in a walled garden. Getting a quick handle on your team’s network monitoring platform (whether SolarWinds, Nagios, LogicMonitor, or others) can help you troubleshoot service slowdowns or outages from sources external to your application.

When building your application, you should proactively consider bandwidth implications and how this can affect the performance of your application or the user experience. On the mobile side of application development, architecting for intermittent network and data usage is critical for success.

Service availability monitoring

At its core, infrastructure monitoring is about service availability. When problems do occur, monitoring your infrastructure tells you quickly if it’s an app problem or infrastructure problem. This information will help inform who is responsible for fixing errors. If your service is a dependency to others and your instance is down, it’s critical that the development team have this information.

Outage prevention can take many forms. From the development side, architecting your work to be:

  1. functional in an availability zone independent way
  2. selecting services that inherently have higher availability
  3. engineer fault tolerance into your consuming of services by gating them as either “must have” to function or “nice to have”

Platforms for monitoring service availability include Datadog, New Relic Infrastructure, Nagios, Honeycomb, and Amazon CloudWatch.

Application monitoring

Stability

Stability monitoring measures overall error levels, and helps you understand when application errors have reached a critical level that could impact your users’ experience. This is especially important since buggy applications are frustrating and considered unreliable by end users.

In the past, logs were used to track application errors, and engineers would investigate them when they received customer complaints. This was a painful experience for both engineering teams trying to piece together what caused a bug, and also for the user who experienced the bug and was responsible for reporting it. Since then, engineering teams have standardized on error monitoring software that automatically captures and alerts on errors, but this still requires lots of engineering overhead to triage the list of errors continually.

Stability monitoring adds logic to the process of monitoring for errors. It provides a definitive metric that lets engineering teams know when errors are impacting stability to the point that they must spend time debugging. This type of monitoring is used full stack. In backend services or applications it will report on the percentage of successful requests. Similarly for client-side and mobile monitoring the stability is measured as error/crash free sessions.

Bugsnag is a platform for application stability monitoring.

Performance

Performance monitoring at the application layer provides information about how fast an application is responding. Slowdowns in request processing from both the server-side and client-side can result in a poor user experience. The challenge with performance monitoring most teams face is knowing when is fast really “fast enough“. With so many variables affecting the performance metrics of any system, it can turn into a rabbit hole of tweaks. Dynamic baselines mean improvements in performance can result in making mountains out of previous mole hills. Performance monitoring is the last bit of monitoring that should be put in place after other areas are covered. You can think of it like building a car from the ground up—when all the basics are working, you can then make performance tweaks.

Server-side performance monitoring

Latency is the primary metric used for server-side performance monitoring. This is the amount of time it takes for your system to service a request. Performance bottlenecks in your code base can produce downstream effects on multiple services, sometimes leading to a cascading failure. Using performance monitoring to tackle slow responses, especially after a new release, can help pinpoint induced performance changes in a system.

What about CPU and memory usage metrics?
Previously, many developers used CPU and memory usage as key metrics for performance, but a better way to interrogate this type of information is from an application perspective. This is a change as a result of abstraction of infrastructure. Setting up KPIs for any service where higher CPU or memory usage could result means you are notified closer to the source of any problem occurring. This is something we do internally at Bugsnag using StatsD, where we’ve set two KPIs for the processing of ingested data—duration in queue and number of events in queue. This approach allows you to have an intelligent conversation around increases in those metrics with your operations team.

Platforms for server-side performance monitoring include Skylight, New Relic, and Datadog.

Client-side performance monitoring

The first monitoring many people look at is the latency in the response of the system, especially as it relates to client-side user experience and the impact that can have on a business. There are two ways to interrogate this metric. One is monitoring actual human users of your site and their experience, referred to as RUM (Real User Monitoring). This is distinct from SUM (Synthetic User Monitoring) which uses the Selenium programming language to simulate a user on your site.

Real user monitoring is often the first monitoring implemented on client-side applications because it tracks the actual experienced loading and response times of users on your site. The downside of RUM is that the data can be noisy as each user’s internet speed and system influence the metrics collected. The data is still useful when digging into certain trends, or if you spend a bit of time pruning or filtering the data to exclude older browsers.

Synthetic user monitoring is frequently overlooked in the monitoring toolset available to engineers. The data collected from using a SUM tool is generally cleaner than real users. The variables of internet speed, browser, and system are all controlled, thus allowing the data to highlight changes in your site that have affected performance. One downside to SUM is that it requires setting up the behavioral scripts to capture the information. Although this takes just a few minutes, the need to update the scripts when changes are made to the site is a maintenance hurdle for many teams.

Platforms for client-side performance monitoring include New Relic, SolarWinds Pingdom, and AppDynamics.


Having the right monitoring in place is vital for engineering teams shipping applications to let them know of outages, slowdowns, and errors. What does your monitoring stack look like? Are there places where you’re lacking the right amount of monitoring visibility?

Eric L. Barnes photo

Eric is the creator of Laravel News and has been covering Laravel since 2012.

Filed in:
Cube

Laravel Newsletter

Join 40k+ other developers and never miss out on new tips, tutorials, and more.

image
Laravel Forge

Easily create and manage your servers and deploy your Laravel applications in seconds.

Visit Laravel Forge
Laravel Forge logo

Laravel Forge

Easily create and manage your servers and deploy your Laravel applications in seconds.

Laravel Forge
Tinkerwell logo

Tinkerwell

The must-have code runner for Laravel developers. Tinker with AI, autocompletion and instant feedback on local and production environments.

Tinkerwell
No Compromises logo

No Compromises

Joel and Aaron, the two seasoned devs from the No Compromises podcast, are now available to hire for your Laravel project. ⬧ Flat rate of $7500/mo. ⬧ No lengthy sales process. ⬧ No contracts. ⬧ 100% money back guarantee.

No Compromises
Kirschbaum logo

Kirschbaum

Providing innovation and stability to ensure your web application succeeds.

Kirschbaum
Shift logo

Shift

Running an old Laravel version? Instant, automated Laravel upgrades and code modernization to keep your applications fresh.

Shift
Bacancy logo

Bacancy

Supercharge your project with a seasoned Laravel developer with 4-6 years of experience for just $2500/month. Get 160 hours of dedicated expertise & a risk-free 15-day trial. Schedule a call now!

Bacancy
LoadForge logo

LoadForge

Easy, affordable load testing and stress tests for websites, APIs and databases.

LoadForge
Paragraph logo

Paragraph

Manage your Laravel app as if it was a CMS – edit any text on any page or in any email without touching Blade or language files.

Paragraph
Lucky Media logo

Lucky Media

Bespoke software solutions built for your business. We ♥ Laravel

Lucky Media
Lunar: Laravel E-Commerce logo

Lunar: Laravel E-Commerce

E-Commerce for Laravel. An open-source package that brings the power of modern headless e-commerce functionality to Laravel.

Lunar: Laravel E-Commerce
DocuWriter.ai logo

DocuWriter.ai

Save hours of manually writing Code Documentation, Comments & DocBlocks, Test suites and Refactoring.

DocuWriter.ai
Rector logo

Rector

Your partner for seamless Laravel upgrades, cutting costs, and accelerating innovation for successful companies

Rector

The latest

View all →
Non-backed Enums in Database Queries and a withSchedule() bootstrap method in Laravel 11.1 image

Non-backed Enums in Database Queries and a withSchedule() bootstrap method in Laravel 11.1

Read article
Laravel Pint --bail Flag image

Laravel Pint --bail Flag

Read article
Laravel Herd for Windows is now released! image

Laravel Herd for Windows is now released!

Read article
The Laravel Worldwide Meetup is Today image

The Laravel Worldwide Meetup is Today

Read article
Cache Routes with Cloudflare in Laravel image

Cache Routes with Cloudflare in Laravel

Read article
Learn how to manage timezones in your Laravel Apps image

Learn how to manage timezones in your Laravel Apps

Read article