Full House For Observability at FOSDEM

Dynatrace Supercharge Their Pipelines, Taming K8S with AI

John Hayes
February 08, 2024

Welcome to Edition #10 of the newsletter!

FOSDEM is not just a major event for the Open Source community; it is now one of the biggest tech events in the world. The 2024 edition took place in Brussels last weekend and there was massive enthusiasm for Observability talks. We have a brief summary below and more details on the Observability 360 web site.

Pipelines are a lynchpin of Observability architecture and we cover major upgrades by both Microsoft and Dynatrace to their pipelining tech.

Amongst this week’s other goodies are a must-read article on LLM data augmentation from the incident.io blog, a new product aiming to slash logging costs and loads more.

Feedback

We love to hear your feedback. Let us know how we are doing at:

[email protected]

https://twitter.com/TheObsGuy

NEWS

Observability in the Limelight at FOSDEM

FOSDEM is one of the biggest tech conferences in the world, with over 8,000 attendees. The event takes place at the Free University in Brussels and the auditorium reserved for talks on Observability was hugely popular, with long queues and standing room only at many sessions.

Some of the highlights of the day included Nikola Grcevski and Mario Macias of Grafana Labs showcasing their Beyla product, a guide to DIY Observability by Robert Hodges and a look at Strategic Sampling with Benedikt Bongartz and Julius Hinze. If you couldn’t make it, don’t worry! We have a summary of the day with links to videos and slide decks on the Observability 360 site.

K8sGPT - a Co-pilot for Kubernetes

The market seems to be awash with tools for monitoring Kubernetes instances. K8sGPT is aiming to ease the burden for K8S admins by tapping into AI backends for assistance with diagnostics. Like much AI-based tooling, it works as a co-pilot rather than an autonomous operator.

The tool is built on a set of Analysers which map to K8S resources such as pods, nodes, services etc and continually scan your cluster, looking for errors. It then sends a digest of the error context to the backend AI (it doesn’t have to be OpenAI) and presents the potential fixes to the user. At the moment, it still has an experimental look and feel to it (it is a CNCF Sandbox project) but it could be one to watch.

Dynatrace Unleash OpenPipeline

Dynatrace have really flown out of the traps in 2024. Hot on the heels of announcing new AI and Data Observability features, they have also announced the release of OpenPipeline. As telemetry volumes explode, observability vendors are having to build ever more powerful tools for ingesting data into their backends. According to Dynatrace CTO Bernd Greifeneder, OpenPipeline is architected for “petabyte-scale analytics” and will enable real-time data analytics on ingest.

Azure Data Plane API Goes GA

In a similar vein to the Dynatrace announcement, Microsoft have now announced the General Availability of the Data Plane API for Metrics. The new API will provide a massive performance boost for customers who need to query Azure Monitor metrics at scale. Enabling faster and more efficient bulk egress from Azure Monitor should benefit customers who pipeline data into third party backends such as Datadog.

Embrace Open-source Their Mobile SDK

In Issue 6 of the newsletter we mentioned Embrace and their mobile-first observability platform. The big news for companies involved in mobile development is that Embrace have now open-sourced their React Native SDK. This follows on from the recent open-sourcing of their Android, Flutter and Apple SDKs. You can visit the GitHub repos here.

The company have also committed to making their SDKs fully compliant with the OpenTelemetry specification. If you are interested in mobile observability, be sure to check out our review and walkthrough of the Embrace platform.

Products

Nimbus - Compressing The Cost of Logging

Kevin Lin is a developer and observability specialist who has created Nimbus - a log management tool designed specifically with cost management in mind. Rather than acting as a standalone logging store, it plugs into your telemetry pipeline, optimises your log data and then forwards on to your backend provider. From the Nimbus UI, users can inspect their log stream and enable transformations such as grouping and compacting. The web site claims that customers can achieve savings of up to 60% on logging costs.
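To give a feel for what "grouping and compacting" can mean in practice, here is a minimal Python sketch. It is not Nimbus's actual algorithm, which the article does not describe; the `template_of` and `compact` helpers and the sample log lines are made up for illustration. The idea is simply to mask the variable parts of each line so that similar lines collapse into one template with a count.

```python
import re
from collections import Counter

# Assumption: digits are the only variable part of a line. Real tools use
# far richer heuristics; this is only an illustration of the idea.
_NUMBER = re.compile(r"\d+")


def template_of(line: str) -> str:
    """Replace every run of digits with a placeholder so similar lines match."""
    return _NUMBER.sub("<n>", line)


def compact(lines):
    """Collapse lines sharing a template into (template, count) pairs,
    in the order each template was first seen."""
    counts = Counter(template_of(line) for line in lines)
    return list(counts.items())


# Hypothetical sample log stream.
logs = [
    "GET /users/17 took 120ms",
    "GET /users/42 took 98ms",
    "disk usage at 91%",
    "GET /users/7 took 101ms",
]

for template, count in compact(logs):
    print(f"{count}x {template}")
```

Four lines become two entries here. The trade-off is that the individual values (user IDs, timings) are lost, so a real pipeline would probably keep aggregates or samples alongside each template.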

Groundcover: eBPF-powered Observability

Groundcover describes itself as a full stack observability platform for cloud-based infrastructure and software applications. It is one of a new breed of observability systems harnessing eBPF to provide observability with zero instrumentation at the code level. It is also one of the growing list of products using a ClickHouse backend to provide low-cost performance and high scalability. Customers on a paid plan have the option of using InCloud, a solution for deploying the Groundcover backend infrastructure into their own cloud environment.

OneUptime - Open Source Observability Tools

The name OneUptime suggests a product dedicated to a single function such as synthetic monitoring. There is, however, more to OneUptime than that - it describes itself as Six Tools In One and appears to be positioning itself as a full stack platform. One of its strongest cards is Log Management - it claims to have the “fastest log ingestion on the planet”. That is obviously a bold claim and it would be interesting to see some benchmarks.

The product suite also includes a no-code workflow builder for automating processes such as sending alerts to Slack channels. The software is open source but there are also paid plans offering additional features.

From the Blogosphere

To the Vector, the Spoils

The RAG pattern has really gained traction over the past year as it allows enterprises to leverage the power of LLMs to gain insights into their own data. This is a fascinating (and occasionally technical) article which details how incident.io used vector embeddings to mine through their data and discover related incidents. The article explains the techniques involved with great clarity and provides really helpful advice on creating embeddings to find hidden patterns in your own data. Warning: after reading this you may feel the need to vectorize your data :-)
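For readers new to the idea, the core of "finding related items with embeddings" can be sketched in a few lines of Python. This is a toy illustration, not the approach taken in the incident.io article: the three-dimensional vectors, the incident names and the `most_similar` helper are all invented, and real embeddings would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, candidates, top_k=2):
    """Return the top_k (name, score) pairs, ranked by similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical incidents with made-up 3-dimensional "embeddings".
incidents = {
    "INC-1 database failover": [0.9, 0.1, 0.0],
    "INC-2 database latency": [0.8, 0.3, 0.1],
    "INC-3 expired TLS cert": [0.0, 0.2, 0.9],
}
new_incident = [0.85, 0.2, 0.05]

for name, score in most_similar(new_incident, incidents):
    print(f"{score:.3f}  {name}")
```

With this toy data the two database incidents rank well above the TLS one. At any real scale you would store the vectors in a vector index or database rather than scanning a dictionary.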

Avoiding oTel Pitfalls With Honeycomb

There was much rejoicing in the observability kingdom when the OpenTelemetry team announced that logging had achieved the 'stable' designation. Whilst this was a big step forward for stability in the oTel framework, it is still a project undergoing continual change on many fronts. This article from the Honeycomb blog looks at how to avoid being tripped up by breaking changes - such as updates to field names. There is also useful advice on adapting your configuration to make it resilient to changes in expected field values.
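One common way to cope with renamed fields is to look up an attribute under its current name and fall back to older names. The Python sketch below illustrates that idea only; it is not taken from the Honeycomb article, and the `ALIASES` table and `get_attr` helper are hypothetical. The two renames shown (`http.method` to `http.request.method`, and `http.status_code` to `http.response.status_code`) are examples from the OpenTelemetry HTTP semantic conventions.

```python
# Map each current attribute name to the names it may appear under,
# newest first. Extend this table as conventions change.
ALIASES = {
    "http.request.method": ("http.request.method", "http.method"),
    "http.response.status_code": ("http.response.status_code", "http.status_code"),
}


def get_attr(attributes, canonical, default=None):
    """Look up an attribute by its current name, falling back to older names."""
    for key in ALIASES.get(canonical, (canonical,)):
        if key in attributes:
            return attributes[key]
    return default


# A span emitted by an older SDK and one emitted by a newer SDK.
old_span = {"http.method": "GET", "http.status_code": 200}
new_span = {"http.request.method": "POST", "http.response.status_code": 500}

print(get_attr(old_span, "http.request.method"))
print(get_attr(new_span, "http.response.status_code"))
```

Queries and alert rules written against `get_attr` keep working while services are upgraded at different speeds, at the cost of maintaining the alias table.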

Scaling OpenTelemetry With Kafka

There are many possible architectural patterns for building telemetry pipelines. The design will depend on a combination of factors such as scale, system topology, signal types etc. In this article on the excellent SigNoz blog, Nočnica Mellifera looks at the use cases for Kafka and how it can be incorporated into pipelines with very high levels of throughput.

VIDEOS

Optimizing Observability with the OTel Collector

This is a presentation by Bruno Fereira at the Conf 42 DevOps conference, which took place in London last month. The talk is well structured and pitched at just the right level of technical detail. There is an interesting overview of the overall observability architecture as well as a detailed discussion of tail sampling for spans. Eagle-eyed viewers may also spot that Bruno uses the Dash0 oTelBin product which we featured in Edition 7 of the newsletter.
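If tail sampling is new to you, the idea is that the keep-or-drop decision is made after a whole trace has been seen, rather than when its first span arrives. The Python sketch below is a conceptual illustration only: it is not the OTel Collector's implementation (which is driven by configuration rather than code like this), and the `Span` class, the `tail_sample` function and the 500 ms threshold are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


def tail_sample(spans, slow_threshold_ms=500.0):
    """Keep every span of a trace that contains an error or a slow span.

    The decision is taken per trace, once all of its spans are available.
    """
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)

    kept = []
    for trace_spans in traces.values():
        has_error = any(s.error for s in trace_spans)
        is_slow = any(s.duration_ms >= slow_threshold_ms for s in trace_spans)
        if has_error or is_slow:
            kept.extend(trace_spans)
    return kept


# Hypothetical spans: trace "a" is healthy, "b" has an error, "c" is slow.
spans = [
    Span("a", "GET /", 30),
    Span("a", "db query", 20),
    Span("b", "POST /pay", 40, error=True),
    Span("b", "db query", 10),
    Span("c", "GET /report", 800),
]

for span in tail_sample(spans):
    print(span.trace_id, span.name)
```

The healthy trace is dropped in full while the error and slow traces are kept in full. A production setup would also need a time window for deciding when a trace is complete and, usually, a small random sample of healthy traces as a baseline.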

Querying InfluxDB with SQL and Grafana

InfluxDB is a powerful (and open source) backend for storing time series data. One of its advantages is that you can query your data without having to learn a proprietary query language. This video by Jay Clifford, a Developer Advocate at InfluxDB, looks at using the Flight SQL plug-in - which enables users to run SQL queries against the underlying Column Store DB. As the video is mostly concerned with querying, it assumes you already have buckets populated with data. There is, however, a GitHub repo with scripts for setting up the data pipeline.

Events

LEAP 2024: API Observability Conference

LEAP 2024 is a free, 1-day, virtual event dedicated to equipping attendees with advanced API observability skills and strategies. The event includes speakers from leading names such as Miro, Datadog, Grafana, Dynatrace, ServiceNow and Dash0. It will be of value to platform teams seeking tooling and techniques for a deep understanding of their entire platform stack. It is also billed as an opportunity to unlock the business value of observability and network with industry experts and thought leaders.

📣 Reminder!

Don’t forget - you can find a fuller listing of events on the Observability 360 calendar.

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

For this week’s quote, we leave you with a thought from Mr Bob Dylan:

“Sometimes it's not enough to know what things mean, sometimes you have to know what things don't mean.”