Observability 360

Observe Inc Turn Up The Heat

Observability for LLMs | K8S Network Visibility With Retina

John Hayes
April 04, 2024

Welcome to Edition #14 of the newsletter!

Whilst controlling costs seems to be a priority for end users, investor cash continues to pour into the observability space. In our last edition, we reported on a big funding round for Chronosphere. This time round it is Observe Inc who are receiving a major cash injection. Observe profess to be gunning for “legacy players”. However, the market is still expanding, and growth is not necessarily a zero-sum game.

The AI revolution continues apace and it is a theme which features strongly in this issue. As well as incorporating AI capabilities into their tech stacks, observability platforms are also having to adapt to provide visibility into the LLM systems which are increasingly being leveraged in application development.

KubeCon may have packed up and bid ‘adieu’ to Paris, but it has still left us with plenty to reflect upon. As well as product announcements, we also list some essential talks to catch up on.

Feedback

We love to hear your feedback. Let us know how we are doing at:

[email protected]

https://twitter.com/TheObsGuy

NEWS

Observe Inc Secure $115m Series B Funding

Major investors including Snowflake and Capital One have contributed to a $115m cash injection for full-stack observability vendor Observe Inc. The company has posted impressive figures for growth and earnings over the past couple of years and believes that its unified backend architecture (built on a Snowflake data lake) gives it a competitive advantage over vendors with more segmented approaches. In an extremely bullish statement, company CEO Jeremy Burton took a swipe at ‘legacy players’, describing them as “dead men walking” who are “shackled by outdated architectures”. This is fighting talk, but prising enterprise customers away from systems where they have already made a major technological investment can be a long haul. They will also face contenders such as Chronosphere and Coralogix, who seem to be targeting the same end of the market.

LLMs - The Need For New Golden Signals

As engineers in observability, we are familiar with the classic MELT (metrics, events, logs, traces) set of signals. The growing adoption of LLMs in enterprise development, though, means that the observability toolkit needs to be updated. Additional signals are now needed to monitor the particular behaviours and potential failures of LLM pipelines - one very obvious example being the need to monitor GPU as well as CPU usage. In this article Bijit Ghosh, a thought leader on AI applications, outlines the major issues involved in monitoring LLMs, as well as looking at specialist tools such as LangSmith and Phoenix. This is an essential read if your company is incorporating LLMs into its development processes.
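To give a flavour of what LLM-specific signals might look like in practice, here is a minimal Python sketch that rolls per-request records up into token counts and throughput. It is not taken from the article: the LLMCall record, its field names and the summarise function are all hypothetical, and a real pipeline would take these values from its own instrumentation.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    """One LLM request. All field names here are hypothetical."""
    latency_s: float          # wall-clock time for the whole request
    prompt_tokens: int        # tokens sent to the model
    completion_tokens: int    # tokens generated by the model


def summarise(calls):
    """Aggregate per-call records into a few LLM-specific signals."""
    total_time = sum(c.latency_s for c in calls)
    total_completion = sum(c.completion_tokens for c in calls)
    return {
        "calls": len(calls),
        "total_tokens": sum(c.prompt_tokens + c.completion_tokens for c in calls),
        # Generation throughput: completion tokens per second of request time.
        "completion_tokens_per_s": total_completion / total_time,
        "max_latency_s": max(c.latency_s for c in calls),
    }


if __name__ == "__main__":
    demo = [LLMCall(2.0, 100, 50), LLMCall(3.0, 200, 150)]
    print(summarise(demo))
```

Token counts matter here because they tend to drive both cost and latency, which is why they sit alongside the more familiar timing figures.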

Cleric - an AI ‘teammate’ for SRE

The dust had barely settled after the unveiling of Devin - the AI for Development - when the imminent arrival of Cleric, an AI SRE ‘teammate’, was announced. The aim of Cleric is, apparently, to relieve SREs of the burden of on-call support. Whilst this press release makes some ambitious claims for the system, the company’s home page itself is pretty short on specifics and the system is still in closed preview. In a blog article, the founders of the company seem to see the current release as the initial step along the road to a fully autonomous operator.

FluentBit Release V3.0

The FluentBit project has been a phenomenal success - having racked up an incredible 13 billion downloads from DockerHub - with 12 billion of those occurring in the past two years. At last month’s KubeCon, Version 3.0 of the product was announced, boasting some major new features. One of the biggest updates is support for running SQL queries against your logs and traces. There is also a Metrics Selector, which allows users to easily filter out certain metric types, as well as support for HTTP/2.

Products

Chaos Mesh - Open Source Chaos Engineering

In Edition 8 of the newsletter, we mentioned Red Hat’s Kraken Chaos Engineering tool. Chaos Mesh is another highly capable open source Chaos Engineering tool which combines simplicity with power and versatility. It is designed for Kubernetes and is built on Kubernetes CRDs, but can also be deployed to Kind and MiniKube. Chaos Experiments can be run either via the UI or by YAML scripts and there is also a Workflow engine for managing complex scenarios. The system also ships with a number of experiment templates tailored to specific clouds - e.g. Restart an Azure VM or Detach an AWS Volume.

Parseable - Log Analytics With Agility

Platforms such as Loki and LogStash can be great for managing enterprise logging at scale. There are, however, other use cases and Parseable is perfect as a lightweight, standalone Log Analytics solution. It has a small resource footprint but is capable of ingesting large volumes at high speed. It supports a number of agents for log ingestion, including FluentBit, Vector and LogStash. It also integrates with Kafka, Grafana and Prometheus. The product is open source and free, with a pricing plan for users who need premium support.

Retina - Network Monitoring for K8S

One of the big announcements at last month’s KubeCon event in Paris was the open sourcing of Retina by Microsoft. There is already a plethora of K8S monitoring tools - but with Retina the focus is on visibility of network traffic. Rather than monitoring resource usage or pod health, Retina is designed to monitor issues such as latency, connectivity and DNS errors. It can then forward telemetry to the backend storage of your choice for further analysis. This may sound similar to the Tetragon tool we recently featured. One major difference is that Retina is purely a monitoring tool - it does not have policy enforcement capabilities.

From the Blogosphere

Causely - A Journey Into Causal AI

There are many products on the market which describe themselves as having AI capabilities. In this article on the Observability 360 web site, we take a journey to discover the meaning of Causal AI, and the crucial distinction between causality and correlation. Along the way we take in ice cream, Vitamin D, factory sirens and even Wittgenstein. The article then goes on to look at how the Causely system can leverage your OpenTelemetry traces and Causal AI for both predictive and root cause analysis of system errors.
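The distinction between correlation and causality can be sketched in a few lines of Python. The simulation below is our own illustration rather than anything from the article or the Causely product, and it assumes the classic version of the ice cream story: hot weather drives both ice cream sales and sunburn, so the two move together even though neither causes the other. All of the numbers are invented.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Temperature is the hidden common cause (the confounder).
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn = [1.5 * t + rng.gauss(0, 5) for t in temperature]

raw = pearson(ice_cream, sunburn)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(sunburn, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"after controlling for temperature: {adjusted:.2f}")
```

The raw correlation comes out strongly positive, but it all but disappears once the effect of temperature is removed from both series. Real causal inference involves a good deal more than this single adjustment, so treat it as a toy rather than a method.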

Trace and Profile Correlation with Polar Signals

Polar Signals is an enterprise profiling tool based on the open source Parca project (and built by Parca engineers). The latest version of the tool now has the ability to map profiles to OpenTelemetry trace IDs. Whilst this is not unique (Grafana recently announced the same capability), this article goes into really fascinating low-level detail on how the OpenTelemetry trace_id is retrieved, digging into a currently executing goroutine and extracting the value from a Go map. For extra geek points you can follow links to the full source code on GitHub.
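Once profile samples carry a trace ID, the correlation itself is essentially a group-by. The Python sketch below shows the idea with a deliberately simplified, hypothetical sample format (the trace_id, function and cpu_ms keys are our invention). It is not how Polar Signals or Parca store or query profiles.

```python
from collections import defaultdict

# Hypothetical, simplified profile samples, each labelled with the trace
# that was active when the sample was taken.
SAMPLES = [
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "function": "parse_order", "cpu_ms": 10},
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "function": "render_page", "cpu_ms": 40},
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "function": "render_page", "cpu_ms": 30},
    {"trace_id": "0af7651916cd43dd8448eb211c80319c", "function": "parse_order", "cpu_ms": 20},
]


def cpu_by_trace(samples):
    """Sum sampled CPU time per trace ID and function."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        totals[sample["trace_id"]][sample["function"]] += sample["cpu_ms"]
    return {trace_id: dict(funcs) for trace_id, funcs in totals.items()}


def hottest_function(samples, trace_id):
    """Return the function with the most sampled CPU time in one trace."""
    funcs = cpu_by_trace(samples).get(trace_id, {})
    return max(funcs, key=funcs.get) if funcs else None


if __name__ == "__main__":
    print(hottest_function(SAMPLES, "4bf92f3577b34da6a3ce929d0e0e4736"))
```

The benefit is that you can start from a single slow trace and ask where its CPU time went, rather than looking only at a profile averaged over every request.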

Videos/Podcasts

Cilium/eBPF Day At KubeCon

There was massive interest in Cilium and eBPF at last month’s KubeCon in Paris. As well as packed meetings, there were even long queues as people waited for Liz Rice to sign copies of her Learning eBPF book! Over the course of the event there were no fewer than 19 different talks on Cilium and eBPF. You can now catch up on all of them thanks to this YouTube playlist.

OpenObservability Podcast - oTel Profiling

As we mentioned in our last edition, the OpenTelemetry project has now announced full support for Profiling. In this rather timely podcast, Dotan Horovits of Logz.io discusses the issue with Felix Geisendörfer of Datadog and Ryan Perry of Grafana. If you are interested in the topic of Profiling then this is an essential watch, as two of the leading engineers in the field discuss both the technical intricacies of profiling and the decision-making processes within the OpenTelemetry Profiling SIG.

Front-End Observability with OpenTelemetry

User experience is an area of critical interest for e-commerce applications. Whilst there is an OpenTelemetry SIG for Client Instrumentation, it has not yet produced a full specification for browser telemetry - which means that vendors have had to fill the gaps with their own solutions. This talk by Purvi Kanal of Honeycomb provides really valuable and detailed guidance on issues such as context propagation and instrumenting for Core Web Vitals.
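For readers new to context propagation, the core idea is that each outgoing request carries the trace identity in a header, so that the receiving service can attach its spans to the same trace. OpenTelemetry's default propagator uses the W3C Trace Context traceparent header, and the Python sketch below builds and continues one. It is our own illustration rather than code from the talk: the function names are hypothetical, and it skips details that the specification requires, such as rejecting all-zero IDs and handling the tracestate header.

```python
import re
import secrets

# Version 00 traceparent: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace and return its traceparent header value."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming):
    """Continue an incoming trace: keep the trace ID and flags, mint a new span ID."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Missing or malformed header: fall back to starting a fresh trace.
        return new_traceparent()
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    header = new_traceparent()
    print("outgoing request header:", header)
    print("downstream span context:", child_traceparent(header))
```

In a browser, the same header would be attached to fetch or XHR calls so that back-end spans join the trace started on the page.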

Events RoundUp

KubeCon Paris may be over but there are still plenty of major events for your diary.

On April 13th, KCD will be rolling into Pune, India for a day of keynotes, sessions and workshops. On April 15th, GitOpsCon North America will be taking place in Seattle. This is a full day event featuring speakers from Microsoft, IBM and Adobe.

On May 8th Devoxx UK 2024 will be kicking off in London. This is a three-day event featuring some 170+ sessions. Although the event is developer-focused it will include sessions on Profiling, OpenTelemetry and Distributed Tracing in Java.

Monitorama describes itself as an “event for Monitoring and Observability practitioners” and will be taking place in Portland, Oregon from June 10-12th. There is a slightly quirky feel to the event, with sessions such as The Haters Guide to OpenTelemetry and How We Tricked Engineers into Utilizing Distributed Tracing. Also in June, stackconf will be running in Berlin from the 18th-19th. The event includes speakers from AWS, Intel, Red Hat, Isovalent and many other A-List organisations. Featured technologies include OpenTelemetry, Prometheus and VictoriaMetrics.

📣 Reminder!

Don’t forget - you can find a fuller listing of events on the Observability 360 calendar.

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

This week’s quote is from the American computer scientist Grace Hopper:

“One accurate measurement is worth a thousand expert opinions.”