Cloudflare Swoop For Baselime

Alloy Joins the Grafana Family | Full Steam Ahead For OpenTelemetry

Welcome to Edition #15 of the Newsletter!

The observability market continues to expand and attract new entrants. The latest of these is Cloudflare, who took everybody by surprise with their acquisition of full-stack vendor Baselime. It is unlikely that they will be the last big tech company eyeing up an acquisition in the observability space.

The OpenTelemetry juggernaut rolls on! There now seems to be an unstoppable momentum behind the initiative and in the past few weeks New Relic, Elastic and Prometheus have all reaffirmed their support for the project with some major announcements. OTel may have only just applied for CNCF Graduation but there seems little doubt now that it represents the future of observability.

We end this week’s newsletter on a fun note with the (almost) legendary “Shouting in the Datacenter” video. This is a flashback to the good old days of 2008 that is bound to bring a smile to your face. If you come across any similar gems, be sure to let us know!

Feedback

We love to hear your feedback. Let us know how we are doing at:

NEWS

Cloudflare Swoop for Baselime

In a move that seemed to come completely out of the blue, Cloudflare announced to the world that they have acquired full-stack observability vendor Baselime. The move may have been unexpected but makes sense in terms of Cloudflare’s current trajectory. Although the company may be best known as a cybersecurity service provider, they have also built out an extensive CDN, web hosting and development platform. Underpinning that infrastructure with an observability function would seem to be a natural progression. A post on the Cloudflare blog has assured existing Baselime customers not only that they will be able to continue using the product, but also that paid features will now be available for free.

Alloy - Grafana’s Gold-Plated oTel Collector

GrafanaCON can normally be relied on for some big product announcements - and this month’s event in Amsterdam did not disappoint. Top billing went to the launch of Alloy - which builds a Grafana stack-friendly chassis around the OpenTelemetry Collector. Whilst remaining fully compatible with the OpenTelemetry Collector specification, it adds further layers of functionality such as embedded debugging and built-in clustering. The defining feature of Alloy, though, is Components. These are re-usable blocks of logic that, essentially, bring programmability to the oTel Collector. This is a major leap forward in envisaging the Collector as a multi-faceted pipeline rather than a passive gateway.
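
For readers who like to see an idea in code, here is a minimal Python sketch of what "re-usable blocks of logic" chained into a pipeline can look like. It is not Alloy's configuration syntax or API; every name in it (`drop_debug`, `add_attribute`, `pipeline`) is a hypothetical stand-in chosen purely for illustration.

```python
from typing import Callable, Iterable

# A telemetry record is modelled as a plain dict (an assumption of this sketch).
Record = dict
Component = Callable[[Iterable[Record]], Iterable[Record]]


def drop_debug(records: Iterable[Record]) -> Iterable[Record]:
    """Component that filters out debug-level records."""
    return (r for r in records if r.get("level") != "debug")


def add_attribute(key: str, value: str) -> Component:
    """Factory returning a component that stamps every record with key=value."""
    def component(records: Iterable[Record]) -> Iterable[Record]:
        return ({**r, key: value} for r in records)
    return component


def pipeline(*components: Component) -> Component:
    """Chain components so the output of each feeds the next."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for component in components:
            records = component(records)
        return records
    return run


# Wire two re-usable blocks together and push some records through.
process = pipeline(drop_debug, add_attribute("env", "prod"))
incoming = [
    {"level": "debug", "msg": "cache miss"},
    {"level": "error", "msg": "upstream timeout"},
]
result = list(process(incoming))
```

The point of the design is that each block knows nothing about its neighbours, so the same filter or enricher can be re-used in any number of pipelines.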

Thoughtworks Radar Reaches Vol 30

The Thoughtworks Radar is probably the closest thing that IT has to the music charts - and Edition 30 has just been published. So who are the hits and misses in the latest volume? In the Platforms section, the CNCF CloudEvents specification has now been assigned ‘Adopt’ status - which will sit well with those observability platforms that made the decision to include event signals in their telemetry streams. Meanwhile Chronosphere (recently boosted by a $115m investment round) and HyperDX (a developer-friendly open source observability stack) have both been assigned ‘Assess’ status. In the Tools category, Tetragon - the eBPF network policy tool we recently featured - has also been assigned to the ‘Assess’ pile. The Radar covers a broad sweep of the tech horizon and is a valuable tool for keeping abreast of industry trends.

New Relic Commit To Open Source Tech

New Relic’s Chief Product Officer Manav Khurana used last month’s KubeCon as the stage for announcing a major strategic shift to supporting open source tooling. Probably the biggest development is full support for OpenTelemetry - making it a first-class citizen within the New Relic platform. This means that all telemetry ingested either via an oTel SDK or the oTel Collector’s exporters will be processed and presented just as if it were emitted by a New Relic agent.

The company also announced native support for Prometheus-instrumented hosts and Kubernetes clusters in a move which was described as “meeting developers where they are”. This, of course, sounds very enlightened, but it is also a recognition of the fact that open source technologies are part of the observability landscape and commercial vendors need to adopt strategies of co-existence rather than resistance.

Products

Langtrace AI - Tracing for LLM Apps

Langtrace AI is the latest tool on the market offering observability for LLM Apps. It is very much fresh out of the traps - the initial commit to its GitHub repo was made only three weeks ago. The product is open source so you can self-host, but there is also a SaaS version. It offers full OpenTelemetry tracing support along with metrics around costs, accuracy and latency. It supports the Pinecone and ChromaDB vector databases and integrates with OpenAI and Anthropic LLMs. There is also an integration for viewing your traces in SigNoz. There is an ambitious list of new features on the project’s backlog and it is likely to evolve quickly.

Steadybit - Chaos Engineering ‘made easy’

There are already a number of Chaos Engineering tools on the market. Steadybit is a new entrant which has an emphasis on ease of use and, on the basis of our impressions, it does deliver on this promise. Setting the application up involves installing an agent which runs a scan to discover your system infrastructure and compile a catalogue of your resources. Once that process has completed you can use a very intuitive UI to build experiments on a drag and drop canvas. You do this by pulling in ready-made tasks such as ‘Stop Container’ and selecting targets from a drop-down list. You can also add new targets and attack types by rolling your own extensions. Unfortunately, all this power does not come for free. There is a two-week free trial but thereafter pricing starts at $1k per month.

vunet - Big Picture Observability

Observability is a relatively young discipline, but it has already experienced a number of evolutionary leaps. One of the forces shaping its future development is the impetus to spread out beyond the confines of the IT department and join the dots together across wider parts of the enterprise. vunet is a platform which appears to be aligning itself with this goal. It aims to enrich telemetry streams with business context to enable both technological oversight and operational intelligence. We have not yet evaluated the vunet platform so we can’t comment on whether it delivers this in practice. The vision they set out, though, may be a glimpse of the future direction of corporate observability.

OpenTelemetry

Elastic Donate eBPF Profiler to oTel

In Edition 13 of the Newsletter we featured the OpenTelemetry decision to adopt profiling as a signal. At the time, Elastic pledged to ‘donate’ their eBPF profiler to the OpenTelemetry project and they have now followed up on this in a post on their blog. The post confirms that the profiler source code has been released under an Apache 2.0 license. At the moment it still resides in the Elastic organisation repo but presumably ownership will be transferred to OpenTelemetry in due course. The profiler is a highly sophisticated tool that has full oTel backend support as well as the capability to correlate profiling data with distributed traces - a really powerful feature. Budding observability system builders can take a peek at the code in this GitHub repo.

Semantic Conventions for LLM Observability

More and more companies are integrating LLMs into their development stacks and a number of vendors such as New Relic and Datadog have already incorporated LLM observability into their portfolios. It is not surprising then to hear that there is now an OpenTelemetry Working Group whose aim is to provide standards around semantics for LLM observability. This week the group celebrated a small but significant milestone with the merging of their first PR into the oTel GitHub repo. This is a very short article but it provides useful links for anyone interested in following the progress of the Working Group.

Prometheus Declare Commitment to oTel

In a major blog post, the Prometheus team have affirmed their commitment to the OpenTelemetry project. To an extent, the announcement is about putting a positive spin on bowing to the inevitable. Obviously, there are serious practical hurdles to overcome - as the blog post notes, some of the changes involve a “fundamental departure from the original data model of Prometheus”. Prometheus is deeply embedded in countless systems so transitioning to new semantics or protocols will need to be carefully managed to avoid disruption. This is a really interesting article which summarises the thinking of the Prometheus team on how they intend to move forward.

Videos/Podcasts

Exploring the OpenTelemetry Resource Entity

Whilst Logs, Traces and Metrics may be at the core of the oTel specification, there are a number of other entities and types which need to be understood in order to build rich and robust telemetry pipelines. In this highly informative and carefully structured video Michele Mancioppi of Dash0 (the people behind oTelBin) explores the Resource entity and its value in providing context for your telemetry.
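
To make the idea concrete, here is a minimal Python sketch of a Resource as a set of attributes describing who emitted the telemetry, shared by every signal from that source. It is a simplified illustration and not the OpenTelemetry SDK: the `Resource` and `Signal` classes and the `describe` helper are hypothetical, and the rule that the merged-in value wins on a key clash is simply the choice made for this sketch. The attribute keys (`service.name`, `service.version`, `host.name`) follow oTel naming conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Illustrative stand-in for a Resource: attributes describing the emitter."""
    attributes: dict

    def merge(self, other: "Resource") -> "Resource":
        # In this sketch, values from `other` win when a key appears in both.
        return Resource({**self.attributes, **other.attributes})


@dataclass
class Signal:
    """A log, span or metric point (hypothetical, heavily simplified)."""
    kind: str
    body: str
    resource: Resource


def describe(signal: Signal) -> str:
    """Render a signal together with the context supplied by its resource."""
    attrs = signal.resource.attributes
    service = attrs.get("service.name", "unknown_service")
    host = attrs.get("host.name", "unknown_host")
    return f"[{signal.kind}] {service}@{host}: {signal.body}"


# One resource is assembled from detected and configured attributes...
detected = Resource({"host.name": "web-01"})
configured = Resource({"service.name": "checkout", "service.version": "1.4.2"})
resource = detected.merge(configured)

# ...and then shared by every signal the process emits.
log = Signal("log", "payment declined", resource)
span = Signal("span", "POST /pay", resource)
```

Because the log and the span carry the same resource, a backend can line them up by service and host without either signal repeating that context itself.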

Clinical Troubleshooting With Dan Slimmon

If you've ever been involved in a war room for a major outage, you will know that coordination and focus can be difficult to achieve. Sometimes there is no runbook to fall back on and the pressure is on because critical services are down. This is a really informative episode of the Slight Reliability podcast where experienced SRE Dan Slimmon describes a procedure he has developed for keeping incident responses on track.

Shouting In The Datacenter!

We leave you with a sensational video from 2008 that we have only just discovered. Normally the videos we recommend are tutorials or technical discussions. This one is different: it is just a joyous celebration of engineering creativity and problem-solving. Enjoy!

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

This week’s quote is Hungarian mathematician Abraham Wald’s brilliant insight into ‘survivorship bias’:

“Gentlemen, you need to put the armour plate where the bullet holes aren’t because that’s where the holes were on the planes that didn’t return.”