System Initiative - IaC Reinvented!

Voyager - Observe's Next Generation | Dynatrace Business Flow

Welcome to Edition #26 of the newsletter!

Getting Down To The Basics

Observability is not merely a technical niche or just another sector in the marketplace. It is also a sphere with a distinct intellectual life and many active debates around major themes such as the Cost Crisis, the nature of Open Source, OpenTelemetry and even the perennial “what is observability?” debate. In this edition of the newsletter though, I think that more fundamental and down-to-earth themes come to the fore. Observability is about dealing with a specific set of business problems. We look at Business Flow in Dynatrace - a plug-in for joining up the dots of business processes across distributed systems, Resolve - an AI-driven tool for incident resolution, and practical guidance on FinOps from GigaOm.

In Search of Merch

In this fortnight’s edition we introduce a new section - Swag Corner. We are in peak conference season and multitudes of delegates will be milling around exhibitor booths hoovering up the usual staple of pens, laptop stickers and T-Shirts that they will never wear. Swag Corner is an occasional (and tongue in cheek) salute to the vendors who are going the extra mile with their freebies and bling.

Open Source Observability Day

As we mentioned in the last edition, Observability 360 is proud to be a media sponsor for this month’s Open Source Observability Day. This is a free, virtual event with a host of great speakers. Check out the program here.

Feedback

We love to hear your feedback. Let us know how we are doing at:

NEWS

System Overthrow - IaC Reinvented!

If you have ever had to grapple with a 3,000 line Helm chart to deploy your observability infrastructure, you may be forgiven for thinking that there must be a better way to do this. Whilst YAML has a certain formal elegance, its syntax struggles to express the architectures and relationships embedded in highly complex systems.

Whilst Pulumi have tackled this problem by enabling the use of high level programming languages for IaC, System Initiative are taking a fundamentally more radical approach. Their goal is nothing other than completely reinventing IaC from the ground up. The blog article for the launch of the product is an incredibly ambitious statement of intent. The terms ‘game changer’ and ‘paradigm shift’ tend to be thrown around somewhat casually, but this might be a case where their usage is appropriate.

So, what are they proposing? Well, System Initiative is IaC without the code. It is a kind of digital canvas where you manipulate digital twins of your systems. Is the future here or is this the Platform Engineering equivalent of science fiction? Read the article and decide for yourself!

Grafana ObservabilityCon - The Big Reveals

Grafana’s ObservabilityCon can normally be relied upon to showcase some major new releases, and this year’s event in New York was no exception. Grafana were one of the first vendors to offer automated regulation of metrics ingestion with the rollout of last year's Adaptive Metrics feature.

This year, the concept has been extended to Log Management as the company launched their Adaptive Logs technology. As with its metrics counterpart, this feature analyses patterns of log querying and usage and then makes recommendations for which types of logs can be ‘dropped’. According to some estimates, up to 90% of logs collected by observability systems are redundant, so the potential for savings is significant. Indeed, Grafana have reported that the Adaptive Metrics feature has resulted in average savings of 35%.
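To make the idea of usage-based recommendations concrete, here is a minimal sketch in Python. It is not Grafana's algorithm, which is not described here. It simply assumes that each ingested log line and each query hit has already been tagged with a pattern label, and the function name `recommend_drops` and the `min_queries` threshold are hypothetical.

```python
from collections import Counter


def recommend_drops(log_patterns, queried_patterns, min_queries=1):
    """Suggest log patterns that are ingested but rarely or never queried.

    log_patterns: one pattern label per ingested log line (assumed input).
    queried_patterns: one pattern label per query hit (assumed input).
    Returns (pattern, share_of_total_volume) pairs, largest volume first.
    """
    volume = Counter(log_patterns)
    queries = Counter(queried_patterns)
    total = sum(volume.values())

    recommendations = []
    for pattern, count in volume.most_common():
        # A pattern is a drop candidate if it was queried fewer times
        # than the (hypothetical) threshold during the observed period.
        if queries[pattern] < min_queries:
            recommendations.append((pattern, count / total))
    return recommendations


if __name__ == "__main__":
    logs = ["debug-heartbeat"] * 6 + ["cache-miss"] * 3 + ["payment-error"]
    hits = ["payment-error", "payment-error"]
    for pattern, share in recommend_drops(logs, hits):
        print(f"{pattern}: {share:.0%} of volume, never queried")
```

A real system would also need to weigh how long the observation period is and whether a rarely queried pattern is still needed for audits or incident response.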

To complete the set on the classic “three pillars” of observability, Grafana also announced that they have acquired the TrailCtrl startup to accelerate the development of an Adaptive Traces technology.

Voyager - The Next Generation of Observe

Observe are not a company who do product launches by halves, and for the release of the latest Voyager iteration of their product they pulled out all the stops. The launch video included CEO Jeremy Burton being beamed up to the studio in a Star Trek-style teleporter. Beneath the whizz-bang, the video announced updates to the product in three main areas - APM, Snowflake Observability and AI-driven incident resolution. The APM updates include capabilities such as service discovery and mapping, RED (Rate, Errors, Duration) metrics and telemetry correlation.
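For readers new to RED metrics (commonly expanded as Rate, Errors, Duration), the sketch below shows one way they could be derived from raw request records. It is a generic Python illustration, not Observe's implementation. The `Request` record, the choice of HTTP 5xx as "error" and the nearest-rank percentile are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code (assumed error signal: 5xx)


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def red_metrics(requests, window_seconds):
    """Compute Rate, Errors and Duration for one time window."""
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_s": len(requests) / window_seconds,  # Rate
        "error_ratio": errors / len(requests),         # Errors
        "p50_ms": percentile(durations, 50),           # Duration
        "p95_ms": percentile(durations, 95),
    }


if __name__ == "__main__":
    sample = [
        Request(100, 200),
        Request(200, 200),
        Request(300, 500),
        Request(400, 200),
    ]
    print(red_metrics(sample, window_seconds=2))
```

In practice these figures are usually computed per service and per endpoint from span or access-log data rather than from an in-memory list.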

The incident resolution is driven by an Agentic AI called the O11y Investigator. Taking an Agentic approach means that O11y goes beyond merely passing an alert string to an LLM. It orchestrates interactions between the LLM and dedicated tooling which can provide context for the incident.

Overall, whilst Voyager is slick and powerful, it is probably reflective of the general state of the art rather than being light years ahead of the competition.

Victoria Metrics Unveil SaaS Product

Victoria Metrics has established itself as one of the most popular time series databases on the market. The product has clocked up an amazing 650 million downloads and built up a loyal following in the community. Up until recently, the product was only available as an on-premise install, but the company have now rolled out a cloud service.

The value proposition is that users can avoid the overhead of having to maintain their own infrastructure whilst also making considerable cost savings in comparison to other cloud vendors. They say that, depending on usage patterns, VM Cloud can be 40 times cheaper than the Amazon Managed Service for Prometheus price, and 228 times cheaper than the Google Cloud Managed Service for Prometheus.

Products

Resolve AI Breaks Out Of Stealth

Resolve AI running inside Slack

A number of recent surveys have shown that MTTRs have remained stubbornly constant in recent years. Resolve AI is a new tool aiming to push that needle, harnessing the power of LLMs to support engineers by automating diagnostics and issue resolution.

A basic premise of the product is that today’s software systems are highly complex and there is an excessive cognitive load involved in trying to gain a deep understanding of infrastructure, configuration, architecture and services. Resolve works by gaining a deep understanding of your systems and is also able to autonomously connect to tools such as GitHub and Kubernetes to run commands. The system has a two part architecture, with an agent continually running in the background, whilst the UI is dynamically generated in your preferred comms tool - e.g. Slack or Zoom.

OpenText IT Operations Cloud

OpenText is a Canadian company that describes itself as a world leader in information management. Although the name may not be familiar, they claim to count 98 out of the top 100 global companies as customers and their extensive product portfolio covers a wide range of verticals - from IT Ops to Content Management. Indeed, navigating the company web site is a dizzying experience, criss-crossing a huge array of inter-related micro-products. The company’s product listing page contains no fewer than 200 products!

The OpenText observability offering falls under the IT Operations Cloud rubric, which integrates ITSM, Observability, AIOps and FinOps solutions. The Observability solution itself comprises packages for Application Observability, Infrastructure Observability and AI Ops. The tooling also leverages AI, in the form of the IT Operations Aviator, which provides assistance with tasks such as ticket management and incident troubleshooting.

Dynatrace Business Flow

Dynatrace Business Flow is an app that is available on the Dynatrace marketplace and plugs into the Dynatrace platform. Although it is not a standalone product, it stands out as a great example of an approach which uses a key organising principle to analyse and filter observability data to provide relevant business insights.

Much observability tooling focuses on discrete data sources, which can result in an atomised view of your enterprise. Business Flow allows users to knit together a number of steps across multiple system boundaries to trace out an end-to-end view of a particular business process. Data can be streamed from numerous sources including log files, external APIs and the Dynatrace OneAgent. Each step consists of up to five Business Events and the flow can also contain branching logic. This is a really powerful tool for creating complex business analytics using a no-code GUI.

From the Blogosphere

Fermyon Take OpenTelemetry For A Spin

In Edition 11 of the newsletter we covered the implementation of OpenTelemetry support on the WasmCloud platform. Now Fermyon have followed suit with the rollout of OpenTelemetry support in their Spin framework for developing WebAssembly components. The support is provided in the form of a plugin which emits logs, metrics and traces for your service.

Fermyon have also built their own Docker image with a full set of tools for viewing your telemetry. This includes an OTel Collector as well as instances of Loki, Prometheus, Jaeger and Grafana.

In keeping with the particle physics lingo, we might say that whilst this is not a quantum leap it certainly puts Spin in a super position.

Using SLOs For Mobile Reliability

SLOs have emerged as an important tool in the SRE armoury for ensuring reliability and quality of service. If you are developing mobile apps it is critical that your SLOs monitor not only the performance of backend systems but also the user experience in the front end of your app. Whilst some of the differences between backend and mobile observability may be relatively intuitive, others may be more nuanced and intricate.

Crafting mobile SLOs entails considering not just application-specific metrics such as session load time or runtime errors, but also factors at the device level - such as network connectivity or user settings. This article by Virna Sekuj of Embrace offers a number of key insights for calibrating your mobile SLOs.
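As a rough illustration of how app-level and device-level signals might feed into a single mobile SLO, here is a small Python sketch. It is not taken from the Embrace article. The `Session` fields, the 2 second startup threshold, the 99% target and the decision to exclude offline sessions from the calculation are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Session:
    startup_ms: float
    crashed: bool
    offline: bool = False  # device-level signal: no network connectivity


def slo_report(sessions, slo_target=0.99, max_startup_ms=2000.0):
    """Summarise SLO compliance and remaining error budget for sessions."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    # Assumption: sessions started while offline are outside the app's
    # control, so they are excluded rather than counted as failures.
    eligible = [s for s in sessions if not s.offline]
    if not eligible:
        raise ValueError("no eligible sessions")

    bad = sum(
        1 for s in eligible if s.crashed or s.startup_ms > max_startup_ms
    )
    total = len(eligible)
    compliance = (total - bad) / total
    allowed_bad = total * (1 - slo_target)  # size of the error budget
    return {
        "eligible_sessions": total,
        "bad_sessions": bad,
        "compliance": compliance,
        "slo_met": compliance >= slo_target,
        "budget_remaining": 1 - bad / allowed_bad,  # negative if overspent
    }


if __name__ == "__main__":
    sessions = (
        [Session(800, False)] * 96
        + [Session(900, True)] * 2
        + [Session(3500, False)] * 2
        + [Session(5000, False, offline=True)] * 5
    )
    print(slo_report(sessions, slo_target=0.95))
```

Whether offline sessions should be excluded, or tracked under a separate objective, is exactly the kind of calibration question that mobile SLOs raise.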

Cost Management

GigaOm Radar For Cloud Resource Optimization

In Edition 20 of the newsletter we featured the GigaOm Radar for Cloud Observability and described it as the “gold standard” for industry analysis, as the rigour and clarity of the report were quite exceptional.

The company have now released their Radar for Cloud Resource Optimization and we think it is a valuable resource for any practitioner involved in Cloud Administration. The report analyses 11 “leading vendors” and assigns a rating across a range of criteria such as Integration with Existing Tooling or Resource Reconfiguration.

The standard of research in the report is, once again, first class and for us the value of the document lies not just in the vendor ratings but also in the rigorous methodology which underlies the analysis and which users can easily apply to their own research in this marketplace.

IBM Snap Up Kubecost

With cost management remaining a top priority for many enterprises, it is not surprising that we have seen a rapid expansion of the FinOps space. Kubecost is regarded as one of the leading tools for managing Kubernetes costs at almost any scope - from a single namespace right up to collections of clusters running across clouds. Last month it was announced that the company is being acquired by IBM, who will integrate the product into their FinOps stable alongside Apptio and Turbonomic, to provide a broad-based solution for optimising cloud costs as a whole.

IBM may be regarded by some as a creaking behemoth, but their share price is currently at a 10-year high and this acquisition seems like a savvy piece of business.

💎SWAG CORNER

Let's face it, the highlight of many events is not the main vendor's keynote speech; it is the chance to get your hands on some exclusive swag that will be the envy of your colleagues when you get back to the office. We kick off this occasional feature with some geek chic that most techies will be drooling over.

First up is this beautiful piece of booty from last week’s Grafana ObservabilityCON in New York. Yes, you are not mistaken! It is your very own miniature wind tunnel!

Not only is it a gorgeous piece of design, it is also fully functional and can send telemetry to a backend! It was built by Grafana CTO Tom Wilkie and given away to one lucky attendee.

The next item is a total revelation. Who knew that there was such a thing as a CERN shop selling tapes with unique recordings of data from the Large Hadron Collider?

The item above was on show at the recent KCD Days event in Porto and includes actual sample data from a Higgs Boson event. Not owning it leaves a black hole in our lives ⚫

Videos

An Expert View on eBPF in Observability

As we mentioned in our recent round-up of eBPF on the Observability 360 web site, not all eBPF implementations are the same. Whilst the standard libraries can open up the bonnet on the Linux kernel, it is still a highly complex engine and fine-tuning it requires considerable engineering skill. Odigos is one of the leading eBPF-based Observability stacks and in this video company co-founder Eden Federman addresses some of the engineering challenges involved in ensuring that eBPF-based systems are robust and performant.

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

This week’s quote is from the Dutch computer scientist Edsger Dijkstra:

“Simplicity is prerequisite for reliability”