2024 - The Year of Observability Everywhere!

Sentry's AI Quandary, the Ollys and more...

2024 - Observability Everywhere!

2023 was a seismic year for observability. Observability did not just stake its claim as a first-class citizen in the IT landscape, it also extended its reach across multiple technical and business domains. Cisco grabbed a lot of headlines with their acquisition of Splunk. However, their less publicised acquisition of eBPF ground-breakers Isovalent was also a masterstroke.

2024 promises to be even more momentous. Not only is Observability being recognised as a tool for adding value across the business, the very boundaries of observability are also extending. Innovators like Embrace and Dylibso are taking observability into territory such as mobile and WebAssembly.

In this edition of the newsletter, we will look at some predictions for the year ahead, celebrate some of the achievements of 2023 and also provide our usual coverage of news, products and innovations.

It's great to have you all with us for the journey into the year ahead!


Let us know how we are doing at:


Looking Forward - Observability in 2024

Image courtesy of The New Stack

There have been any number of great articles predicting what lies ahead for observability in 2024. In his preview of the next twelve months, Ken Hamric of Tracetest used the phrase 'Observability Everywhere' - and we believe that this really captures the momentum that is building. Tracetest itself is one of the harbingers of the new wave of observability, harnessing the outputs of trace telemetry to support the testing of distributed services.

Looking Back - The Best of 2023

In case you missed it, we rang out the old year by awarding our Ollys - our very own accolades to those we felt were some of the most innovative, interesting and valuable actors on the Observability stage in 2023. This has proved to be one of the most popular articles ever published on the Observability 360 web site. Click below to see the lucky winners who carried off the golden telescopes.

Sentry Caught Off-Guard By User Backlash

Sentry, vendors of one of the market-leading crash reporting tools, stirred up something of a hornets’ nest with a recent update to their Terms of Service. The new clause meant that customers automatically consented to their log data being ingested into a Sentry Machine Learning system - with no choice to opt out. Customers on the free tier effectively faced a 30-day deadline to consent or find an alternative vendor.

Unsurprisingly, a number of customers were vexed by the tone of the announcement and also raised concerns over PII and data governance. Rather prudently, the company decided to backtrack and, in a chastened post on their web site, they sought to reassure users by postponing the rollout of this policy.

In their defence, the company noted that other vendors have similar clauses embedded in their ToS. This obviously leaves vendors such as Sentry in a quandary - if they do not follow suit, they will find themselves at a competitive disadvantage.

Grafana Nail Costs With Karpenter

There is an almost unanimous consensus that cost management and FinOps will be major themes for 2024. Some of the solutions to this will be configuration-led - i.e. optimising policies around sampling, filtering and retention. For any operation running at scale on a K8S platform, a solution such as Karpenter also enables huge savings at the infrastructure level. Its highly performant and responsive failover logic will enable engineers to maintain service provision whilst achieving the huge economies yielded by spot-provisioning. This blog article explains how Grafana achieved lower costs and improved reliability by migrating to Karpenter from Cluster Autoscaler.
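
To make the configuration-led side of this a little more concrete, here is a minimal Python sketch of one common sampling approach: deciding deterministically, from the trace ID, whether a trace is kept. It is an illustration only and is not taken from the Grafana article or from any vendor's implementation. The function name keep_trace and the sample_rate parameter are our own hypothetical names.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Return True if this trace should be kept (hypothetical helper).

    The trace ID is hashed into a 64-bit bucket, so every service that
    sees the same ID reaches the same keep/drop decision.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # value in 0 .. 2**64 - 1
    threshold = int(sample_rate * 2**64)         # 1.0 keeps all, 0.0 keeps none
    return bucket < threshold


# Keep roughly 10% of 10,000 synthetic traces.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = [t for t in trace_ids if keep_trace(t, 0.10)]
print(f"kept {len(kept)} of {len(trace_ids)} traces")
```

Because the decision depends only on the ID, it needs no coordination between services. The trade-off is that rare but interesting traces are dropped at the same rate as routine ones.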

From the Blogosphere

Releasing The Chaos Kraken

We have previously mentioned the Azure Chaos Studio tool for managing Chaos Engineering experiments in the Azure cloud. This article on the Red Hat blog looks at using Kraken, a chaos engineering framework for K8S and OpenShift environments. Kraken uses AI to take Chaos Engineering to the next level. It can autonomously identify dependencies in your microservices graph and actually recommend appropriate chaos scenarios.

eBPF - The Silent Platform Revolution

eBPF is not just a protocol or an SDK. It has, in a short space of time, cemented its place as a foundational technology. The Cisco acquisition of Isovalent at the end of 2023 may have taken the community by surprise but, given the explosive potential of the technology, it seems like a very safe bet. This illuminating article by two of the most eminent figures in the eBPF world sketches out the history, scope and potential of the technology. Even if you are familiar with eBPF you will still find much of interest.


Catchpoint - Observability for the Outer Loop

If the theme of this edition is Observability Everywhere, then perhaps there are few better exemplars than Catchpoint. The scope for their analytics service is, literally, the whole planet. A full-stack or full-spectrum observability solution will give you visibility over the inner loop - i.e. your service endpoints and all the services and infrastructure which sit behind them. Catchpoint complements this by looking at the outer loop - monitoring connectivity issues between your external endpoints and the global internet.

HyperDX - Lean & Mean Observability

HyperDX describes itself as an Open Source Observability Platform where users can run Session Replays that pull Logs, Traces and Errors together into a single view – all without the price tag associated with some of the larger vendors. The online demo is impressive and with 5.5k stars on their GitHub repo, they clearly have developer appeal.

The platform is built on the super-scalable ClickHouse column storage database system and natively supports OpenTelemetry. It aims to be highly competitive on cost - with no charges for hosts or seats - only for data. It supports a number of runtimes including Python, GoLang and Java - but not .NET. At present it has support for AWS, Heroku and Cloudflare but not Azure or GCP.


BindPlane - Fleet Management for OTel

The beauty of OpenTelemetry architecture is that it is highly flexible and scalable. For non-trivial implementations you may well end up deploying multiple Collectors for different signals, environments or clusters. The Open Agent Management Protocol (OpAMP) is the emerging open standard to manage a fleet of telemetry agents at scale. Interestingly, an agent does not have to be a Collector - it can be any service that implements the protocol. observIQ are one of the first vendors to release a product utilising the protocol. Their BindPlane system is an open source tool that enables you to monitor, deploy and configure your agents. Setup is a breeze and once it is complete, you can add agents via a simple web UI.

Azure Cloud Pivots to OpenTelemetry

In a recent article on the Azure Observability blog, Matthew McCleary of the Azure Monitor team unveiled the company’s blueprint to “make Azure the most observable cloud”. The strategy to achieve this includes putting OpenTelemetry at the heart of its observability infrastructure. Having already rolled out an Azure Monitor OpenTelemetry Distro, they are also committing to one-click Auto-Instrumentation of AKS as well as building OpenTelemetry compatibility into Azure SDKs.


VictoriaMetrics - Cutting Observability Costs

In this presentation from the 2023 stackconf event, Roman Khavronenko looks at a range of technical strategies to reduce observability costs. Although he looks at this from the point of view of the VictoriaMetrics platform, a number of the issues have general applicability at the architectural level - for example, techniques for compressing network data or eliminating extraneous metric labels to speed up queries.
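
As a rough illustration of why extraneous labels matter, the Python sketch below counts how many distinct time series a set of samples produces before and after one label is dropped. It is our own toy model rather than anything from the presentation or the VictoriaMetrics codebase. The metric name, the label names and the count_series helper are all hypothetical.

```python
def count_series(samples, drop_labels=()):
    """Count distinct time series once the named labels are removed.

    A series is identified by its metric name plus its full label set,
    so each unique label value combination creates a new series.
    """
    series = set()
    for name, labels in samples:
        kept = tuple(sorted(
            (key, value) for key, value in labels.items()
            if key not in drop_labels
        ))
        series.add((name, kept))
    return len(series)


# 1,000 requests, each tagged with a unique (hypothetical) request_id label.
samples = [
    ("http_requests_total",
     {"service": "checkout", "status": "200", "request_id": f"req-{i}"})
    for i in range(1000)
]

before = count_series(samples)
after = count_series(samples, drop_labels={"request_id"})
print(f"series before: {before}, after dropping request_id: {after}")
```

In this toy data set a single per-request label turns one series into a thousand. Fewer series generally means less to store and less to scan at query time, which is the effect the talk is pointing at.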


Meeting Up/Skilling Up

If you want to build out your Column DB Skills, then ClickHouse are running another series of their Fundamentals training sessions. Alternatively, you can take a one-hour tour of Grafana’s Frontend observability tooling in this free webinar. Meanwhile the Cloud Native London MeetUp has announced its program for 2024 (you can also attend online) and next month sees the ScyllaDB 2024 Summit - which promises to be one of the big database events of the year. Finally, the schedule for ThanosCon has now been announced - featuring speakers from Shopify, Reddit and Cloudflare.

That’s all for this edition!

This week’s quote is from the great German polymath Johann Wolfgang von Goethe:

“There is nothing so terrible as activity without insight”.