Beyond The Single Pane of Glass

Bad Observability! | Prometheus Unbound

Welcome to Edition #16 of the Newsletter!

Observability is a very broad and diverse field. The use of the term ‘Observability’ itself, in relation to IT, is relatively recent and it can mean different things to different users. Our headline article looks at the Single Pane of Glass paradigm and asks whether it really is an observability panacea or whether there might be other approaches. This is a big subject and if you would like to get involved in the discussion then just head over to the London Observability Engineering Slack Channel (you don’t need to be based in London to join).

There seems to be a bit of a buzz about Prometheus at the moment. Last time around, we mentioned the project’s commitment to OpenTelemetry; this week we cover two major vendors upgrading the Prometheus user experience.

The term Observability Driven Development has achieved a certain amount of currency in recent years. Like its older cousin, Test Driven Development, it is a concept which has given rise to plenty of discussion and even a certain amount of confusion. In this edition of the newsletter, we look at Digma - a tool which brings actionable observability insights right into the developer’s IDE.

Feedback

We love to hear your feedback. Let us know how we are doing at:

NEWS

Prometheus Unbound

Both Microsoft and Grafana have recently made announcements to power up the user experience for their Prometheus tooling. The Microsoft enhancement comes in the form of the new Azure Monitor Query Editor. This will, for the first time, allow users to query Prometheus metrics using PromQL within their Azure Monitor workspace.

Grafana has long supported PromQL and, interestingly, their update is a response to customers who wanted to query metrics without using PromQL. Grafana has therefore released Explore Metrics - which offers a really powerful visual experience for querying Prometheus metrics. The feature is currently in preview and scheduled to go GA next month.

📣 

By the way, if you are currently using oTel and Prometheus, the OpenTelemetry End User SIG would love to hear about your experiences in this OTel/Prometheus Interoperability Survey - it is very short and painless!

Embrace’s oTel Check-In!

Mobile Observability specialists Embrace became the latest vendor to throw their weight behind the OpenTelemetry project, with the announcement last week that all of their SDKs are now fully OpenTelemetry-compatible. This means that Embrace telemetry can now be forwarded to any OpenTelemetry-compatible backend store. Combined with the recent open-sourcing of the Embrace client SDKs, this now gives mobile developers tremendous flexibility around debugging and observability capabilities.

ClickHouse Cloud’s Big Makeover

Hyper-scale database provider ClickHouse have announced a major upgrade for their cloud portal. ClickHouse is well known as the backend for platforms such as SigNoz, Groundcover and KloudMate, but users can also send telemetry to the ClickHouse cloud service. The cloud experience has now been revamped to provide a smoother and more productive UI. New features include an AI-powered SQL generator, workflows for ingesting and streaming data, and a new Settings area to consolidate configuration options into a single place.

Products

Digma - Observability for the Inner Loop

The phrase Observability Driven Development has started to gain a certain amount of currency recently. One problem with the practical application of the concept, though, is that developers spend most of their time in the inner loop, whilst most observability tooling is confined to the outer loop. Digma is a tool which seeks to overcome this disconnect by providing continuous feedback within the developer’s own IDE. When installed locally, the Digma engine runs in a Docker container, continually analysing code and displaying traces and analytics in the IDE in real time. At the moment, Digma has support for IntelliJ and Java but integrations with other languages and IDEs are in the works.

Incerto - Observability at Your Service

There are many obvious benefits to self-hosting an open source observability platform - not least those of cost and extensibility. At the same time, though, installing an end-to-end OSS observability solution can be highly complex and require a considerable level of expertise. It is easy to spend weeks getting bogged down in Helm charts and YAML configuration. Incerto are a startup who do all of the heavy lifting for you. The Incerto team have expertise in OpenTelemetry and tools such as Grafana and ClickHouse, and use these components to build custom observability solutions that run on your own infrastructure.

Flip AI - Intelligence As A Service

Flip AI are not a company that can be accused of lacking in ambition - billing themselves as the “future of observability intelligence”. Interestingly, Flip is not an observability platform itself. Instead, it is a kind of Intelligence As A Service layer which sits on top of your existing observability stack. It ingests telemetry from data feeds and combines these with artefacts such as IaC and CMDB data to build up a dynamic map of your infrastructure and services. It is underpinned by a specialist DevOps LLM which requires no additional training. This seems like a really compelling model for integrating AI capabilities into the observability stack. At the moment you will need to register on the website if you want to see the system in action.

From the Blogosphere

Don’t Do This! - Learning From Bad Observability

Miles Davis once said, "It's not the notes you play, it's the notes you don't play." In a similar fashion, keeping our observability systems in tune can also be a matter of learning what not to do. Stephen Townsend, whose Slight Reliability YouTube channel we have mentioned a few times, has produced this excellent compendium of anti-patterns to avoid in building your observability strategy. The article may be over a year old, but its insights are still very much on point.

Beyond The Single Pane of Glass

The promise of a Single Pane of Glass, a unified portal breaking down silos and providing complete visibility of your IT estate, can be seductive. This in-depth article on the Observability 360 website explores some of the technical and cultural limitations of these all-in-one systems. It asks if they can become a silo in themselves and looks at an alternative model of composable observability.

Getting a Handle on Errors in oTel

The surfacing of application errors is obviously an essential observability function. What can be problematic, though, is ensuring that developers follow a consistent standard for logging errors. This is not as simple as it might seem when some languages differentiate between ‘errors’ and ‘exceptions’, whilst C has no built-in exception mechanism at all and relies on conventions such as return codes. This is an excellent article on the OpenTelemetry blog which includes guidelines on recording errors in spans vs logs as well as a discussion of the visualisation of errors in different backends.

Videos

The Magic of Pixie

Pixie is not just a lightweight and versatile observability stack, it is also one of the applications that pioneered the use of eBPF in observability tooling. Originally developed by Pixie Labs, which New Relic acquired in 2020, it was open-sourced and contributed to the CNCF in 2021. This talk by Prerit Munjal from the recent OSS Summit in Seattle is a really excellent deep dive into Pixie’s architecture and features.

The Rise and Rise of eBPF

Bill Mulligan is a community advocate for Cilium and eBPF at Isovalent and his presentations are always worth catching. In this talk, he discusses his involvement with the eBPF project as well as giving an overview of the technology and how it is being used in more and more applications. At the end of the video you will be convinced that “eBPF Inside” will one day be as ubiquitous as the old “Intel Inside” slogan.

Events RoundUp

As the name suggests, Devoxx UK is a developer-focused event. As well as developer-specific themes, though, there are also sessions covering topics of interest across a number of disciplines. Observability-related sessions include Continuous Profiling, OpenTelemetry Hands-On and Distributed Tracing in Java - presented by Dotan Horovits of Logz.io. The event runs from 8th-10th May.

On June 25-26th the Datadog community will be gathering for the annual DASH event in New York. A DASH ticket includes exam entry so you can actually get certified whilst attending the event!

This Saturday (May 4th) Bengaluru will be the venue for a ClickHouse MeetUp. Dale McDiarmid will be flying in from Portugal to explain how ClickHouse polished off the Trillion Row Challenge - not to be missed!

The following Saturday (May 11th) Last9 and Probo will be co-hosting an SRE Meetup in Gurugram. This will include a discussion of how the Probo team work at ‘Cricket Scale’ (you’ll be bowled over) as well as a look at High Cardinality with Prometheus.

Back in London, meanwhile, the London Observability Engineering MeetUp will be gathering on May 7th to hear Practical OpenTelemetry author Daniel Gomez Blanco talk about Building An Observability Mindset at Skyscanner.

📣 Reminder!

Don’t forget - you can find a fuller listing of events on the Observability 360 calendar.

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

This week’s quote is from Carla O’Dell:

“Put knowledge where people trip over it.”