A Milestone for OpenTelemetry
plus Azure Chaos, CloudWatch AI and much more
A Logging Milestone For OpenTelemetry
Welcome to newsletter number six and, as usual, there is a lot happening in the observability space. Probably the most significant story is the announcement that logging in OpenTelemetry is now stable.
The giants of cloud technology have also been busy, and there are exciting releases from both the Azure and AWS stables. We also cover an eye-opening critique of verbosity in K8S metrics from the Open Observability team and much more besides.
Feedback
As practitioners in the field, you will know that every good observability system needs a feedback loop. Let us know how we are doing at:
NEWS
OpenTelemetry's Logging Milestone
Possibly the biggest observability story at this year's KubeCon was the announcement by Morgan McLean, an OpenTelemetry co-founder, that "OpenTelemetry Logging hit 1.0". The OTel logging architecture consists of four main components and all of these have now been designated as stable. Achieving a consensus around logging was always going to be a monumental task given its relatively loosely structured nature and the fact that there are so many varied implementations in the wild.
One of the major gains of the release is that Logging runs in the context of the Collector, meaning that logs can more easily be correlated with other telemetry such as traces and metrics. OpenTelemetry have a keen awareness that "logs are relatively computationally expensive to capture" and have attempted to tackle this issue by specifying a new logging data model, which aims to reduce both CPU consumption and storage requirements.
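To make the correlation point concrete, here is a minimal sketch in plain Python. It assumes a simplified log record that carries a trace ID and span ID alongside the usual body and severity, which is the idea behind the OTel log data model. The `LogRecord` class, the `group_by_trace` helper and the sample values are hypothetical stand-ins for illustration, not the OpenTelemetry SDK's own API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogRecord:
    # Illustrative stand-in: field names loosely follow the OpenTelemetry
    # log data model, but this is not the SDK's own class.
    timestamp_ns: int
    severity_text: str
    body: str
    trace_id: Optional[str] = None  # shared with the trace that was active
    span_id: Optional[str] = None   # identifies the span that was active
    attributes: dict = field(default_factory=dict)


def group_by_trace(records):
    """Group log records by trace_id, skipping records with no trace context."""
    grouped = defaultdict(list)
    for record in records:
        if record.trace_id is not None:
            grouped[record.trace_id].append(record)
    return dict(grouped)


# Made-up sample data: two logs emitted inside one trace, one outside any trace.
records = [
    LogRecord(1, "INFO", "checkout started", trace_id="a1", span_id="s1"),
    LogRecord(2, "ERROR", "payment declined", trace_id="a1", span_id="s2"),
    LogRecord(3, "INFO", "health check ok"),
]
by_trace = group_by_trace(records)
```

Because each record carries the same identifier as the trace it was emitted in, a backend can pull up every log line for a failing request without relying on timestamps or text matching.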
Azure Chaos Studio Goes GA
The Netflix Chaos Monkey has long been part of SRE folklore. Now Azure users can also officially start breaking things in the name of resilience testing, as Microsoft have announced the General Availability of Azure Chaos Studio. As you would expect, it offers a range of features for testing the behaviours of your distributed applications in the event of different types of failure. Chaos Studio allows users to build and run "experiments", where you can specify target resources, and then define faults to test their resilience. The experiments can then be used in DR simulations or even incorporated into CI/CD pipelines.
Boost Your Mobile Coverage With Embrace
According to this study, native mobile applications account for nearly 90% of mobile device usage. This means that native apps are not only a key component of digital strategy but also that they need to be factored into overall observability workloads. Embrace describes itself as a mobile-first observability platform. As well as providing the typical application-level instrumentation, it also captures mobile-specific diagnostics in areas such as networking, device and OS performance to provide a full picture of the user experience. The Embrace toolkit provides SDKs to integrate with all the major mobile development platforms and its API supports integrations with Grafana, DataDog, New Relic and other providers.
AWS Logs Get Intelligent
AWS re:Invent is now in full swing and one of the new features on show is natural language querying for the AWS CloudWatch service. This means that users can now query their logs by asking questions such as "Show me the 10 slowest Lambda requests". The feature also offers line-by-line query explanation as well as refinement of existing queries. The technology is currently in preview and it would be interesting to know how it works in practice. If you have tried it out, feel free to share your experience with us via email or Twitter.
DevOps Dozen Nominations Announced
The DevOps Dozen is a set of annual awards organised by DevOps.com. The nominations make for interesting reading, not least for the absence of some big names and the presence of lesser-known brands. There are some 18 products in the running for the Best Observability Solution category. Alongside established names such as New Relic, Grafana and Honeycomb, there are also nominations for newcomers such as Edge Delta and vFunction. Voting is open until 31st December and anyone can take part.
From the Blogosphere
Splunk Blog - Observability Shifts Right
Normally we hear about DevOps culture representing a shift to the left - so that practices such as testing occur early on in the development lifecycle. In this blog article, William Cappelli, a thought leader at Splunk, looks at the equally important process of shifting to the right and encompassing domains such as Service Management. His discussion of CMDBs may raise a few hackles amongst ITIL purists. In contrast to some more orthodox opinions, he argues that the IT estate of large enterprises has always been "too complex and volatile" to be captured in a CMDB and that CMDBs can at best only be loosely coupled with observability frameworks.
Measuring Service Mesh Performance
One of the choices that engineers need to make when spinning up a K8S cluster is whether to swap out the default CNI (Container Network Interface). There are a number of service mesh products to choose from, but each has different algorithms and functional priorities for features such as network traffic management. In this interesting study on the Trendyol Tech blog, Eman Aktas benchmarks network performance for different service mesh implementations. The tests compare the performance of Cilium, Calico and Flannel across a number of network traffic scenarios, both on bare metal and on a CloudStack instance.
Cost Management
Is K8S Talking Too Much?
If you are running Kubernetes clusters, you will know that they emit very considerable volumes of metrics. The chances are that these are being funnelled into your observability system and potentially resulting in ingestion and storage costs. In this episode of Open Observability Talks, Aliaksandr Valialkin, CTO of VictoriaMetrics, suggests that up to 75% of the metrics generated by Kubernetes could be superfluous for most purposes. This is a really instructive piece that calls for some standards around actionable metrics.
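One rough way to see how much of your own metric volume is actionable is to compare the metric names you ingest with the names that your dashboards and alerts actually reference. The sketch below is a hypothetical illustration of that idea: the `split_metrics` helper and both sample sets are assumptions made up for this example, and the resulting share is a product of the sample data rather than a measurement.

```python
def split_metrics(emitted, referenced):
    """Partition emitted metric names into those in use and those unused."""
    kept = sorted(m for m in emitted if m in referenced)
    dropped = sorted(m for m in emitted if m not in referenced)
    return kept, dropped


# Made-up sample: metric names a cluster might emit.
emitted = {
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "kube_pod_status_phase",
    "apiserver_request_duration_seconds_bucket",
}

# Made-up sample: names referenced by at least one dashboard or alert rule.
referenced = {"container_cpu_usage_seconds_total"}

kept, dropped = split_metrics(emitted, referenced)
unused_share = len(dropped) / len(emitted)
```

In practice the `referenced` set would be extracted from your dashboard and alert definitions, and the `dropped` list would be a starting point for review rather than something to delete blindly.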
VIDEOS
Building Resilience at Santander
In this presentation at RoachFest 2023, Thomas Boltze of Santander stresses both the technical and the cultural dimensions involved in building resilient systems. This video is only 18 minutes long but it covers a lot of ground, including the importance of "the right mindset", as well as giving an overview of the Santander platform (AWS/Kafka/Cockroach). This is a really lively and informative talk that will be of interest not only to SREs but to anyone with an interest in the topic of resilience.
Predictive Analytics With InfluxDB
Your observability system will probably amass a huge volume of time series data. Naturally, this will provide great insights into current and past system performance. This video from InfluxDB looks at using tools such as Quix and Hugging Face to harness your data to build models for predicting future trends and identifying anomalies. One of the interesting takeaways from this video is that InfluxDB is used by CERN for processing data from the LHC. Chapeau!
Events
Honeycomb's LLM Journey
If you are a customer of Honeycomb - or if you are just interested in how LLMs can be leveraged in observability systems - then this webinar may be of interest. It looks at the journey of Honeycomb engineers as they built their AI-driven Query Assistant, a tool which has been well received by Honeycomb users.
Skill up in ClickHouse
Choosing the right backend storage system is a critical architectural decision for observability infrastructure. Full-stack providers such as SigNoz have achieved significant performance advantages by adopting the ClickHouse platform. It is also the storage system of choice for a number of hyper-scale businesses such as eBay, Uber and Cisco. You can now sign up for a free ClickHouse Fundamentals training course. This consists of 6 hours of expert-led tuition spread over two sessions. It should be of value for anybody with an interest in column storage technology or in observability architecture in general.
📣 Reminder!
Don't forget - you can find a fuller listing of events on the Observability 360 calendar.
That's all for this edition!
This week, we will leave you with this rather apt and succinct quote from UptimeRobot.
"Monitoring is like seeing the tip of the iceberg, while observability dives deep into the unseen layers".