Observability 360
The $1m Line of Code!

Sumo Logic's $0 Ingestion Fees | oTel Adopts Profiling

John Hayes
March 21, 2024

Welcome to Edition #13 of the Newsletter!

Cost management continues to be one of the dominant themes in observability this year and it figures prominently in this edition. Our headline story features some amazing examples of minor errors that have resulted in huge bills.

There are also two major OpenTelemetry stories. Even though the proposal for CNCF Graduation is a procedural milestone, in practical terms, it is probably not as significant as the adoption of profiling as a telemetry signal.

The observability market may be expanding, but the competition amongst full-stack vendors is hotting up. We cover the latest moves by Chronosphere and Sumo Logic in the battle for market share.

Feedback

We love to hear your feedback. Let us know how we are doing at:

[email protected]

https://twitter.com/TheObsGuy

NEWS

The $1m Lines Of Code!

Most of us have experienced the anguish of bill shock at some point, whether it is being hit with a huge bill for mobile roaming charges on return from holiday or getting a penalty notice for an inadvertent motoring infringement that happened weeks back. Those are just small pinpricks though, compared to the 50,000 volts of financial burn felt by the companies mentioned in this transcript of a scintillating talk by Erik Peterson, CEO of CloudZero. He argues, persuasively, that engineering decisions are buying decisions. In the case mentioned in the headline, a decision to turn on one section of debug code led to vast volumes of logs being emitted, racking up over $1m in costs.

These cautionary tales emphasise the point that successful observability practice requires a culture of collaboration across engineering teams - as well as highlighting the importance of a coherent overall observability strategy.

Chronosphere Bulk Up Again

Chronosphere have secured a further $116m in funding as they shape up to take on the major players at the top end of the full-stack observability market. The latest tranche of investment takes the value of the company up to a formidable $1.6bn. Having already developed a sophisticated core product and pulled off the strategic acquisition of the Calyptia platform, it would not be surprising to see a chunk of the new funds being spent on a marketing push. Indeed, they have already teamed up with Nike and are offering a voucher for custom Air Force Ones to prospective customers signing up for a 30-minute chat with a Chronosphere expert.

Sumo Roll Out Zero Ingestion Fees

Observability vendors are increasingly focusing their marketing on pricing, and analytics platform Sumo Logic have thrown a curveball at the competition with their newest pricing plan - which dangles the carrot of zero fees for unlimited ingestion of logging data. Under the new pricing model, customers only pay when they ‘scan’ their data - e.g. by running a query on logs. This is a bold move, but it does have the drawback that costs may be harder to predict and budget for than under traditional ingest-and-store models.

According to the Sumo web site, scans will be charged at between $2.05 and $3.77 per TB, depending on region and usage profile. However, scans don’t just occur when running a raw query on your logs - they are also triggered by actions such as populating dashboards. You will probably need to consult with a Sumo Logic Solutions Expert to gain a proper understanding of how the billing methodology will work out for your own company’s usage patterns.
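To get a rough feel for how scan-based billing might add up, the sketch below simply multiplies the volume of data scanned by the per-TB rate. The two rates are the bounds quoted above; everything else (the number of dashboard panels, the refresh interval, the data scanned per refresh, and the assumption that every refresh rescans the same volume) is invented for illustration and is not a description of how Sumo Logic actually meters scans.

```python
# Rough scan-cost estimate. All workload numbers are hypothetical.
LOW_RATE_PER_TB = 2.05   # lower bound quoted on the Sumo web site (USD)
HIGH_RATE_PER_TB = 3.77  # upper bound quoted on the Sumo web site (USD)


def monthly_scan_cost(tb_per_scan: float, scans_per_day: int,
                      rate_per_tb: float, days: int = 30) -> float:
    """Cost of repeatedly scanning the same volume of data for `days` days."""
    return tb_per_scan * scans_per_day * days * rate_per_tb


if __name__ == "__main__":
    # Hypothetical workload: a dashboard with 10 panels, each scanning
    # 0.05 TB, refreshed 96 times a day (every 15 minutes).
    scans_per_day = 10 * 96
    for rate in (LOW_RATE_PER_TB, HIGH_RATE_PER_TB):
        cost = monthly_scan_cost(0.05, scans_per_day, rate)
        print(f"${rate:.2f}/TB -> ${cost:,.2f} per month")
```

Under these made-up numbers, it is the refresh frequency and the volume scanned per refresh that drive the bill rather than the volume ingested, which is why usage patterns need to be understood before committing to this kind of plan.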

Grafana Release an oTel Stack in a Container

Grafana Labs last week announced the release of grafana/otel-lgtm - which bundles up an OpenTelemetry Collector and the whole of the core Grafana stack (Prometheus/Tempo/Loki/Grafana) into a single Docker image. This means you can pretty much have a turnkey observability solution up and running with a single Docker command! The package doesn’t include a sample application, so you will need to instrument some code of your own and send telemetry to local port 4317 or 4318. The company have stressed that the solution is not designed for use in production environments. Obviously though, it is a great time-saver for setting up a local dev environment or for evaluating the stack.
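As a quick smoke test of such a setup, the sketch below posts a single hand-built span to the collector using OTLP over HTTP with a JSON body, using only the Python standard library. It assumes the container was started with port 4318 published on localhost (for example with something like `docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 grafana/otel-lgtm`); the service name, span name and scope name are made up. In a real application you would normally use an OpenTelemetry SDK rather than building payloads by hand.

```python
import json
import secrets
import time
import urllib.request

# Assumption: port 4318 (OTLP over HTTP) is published on localhost.
OTLP_TRACES_URL = "http://localhost:4318/v1/traces"


def build_span_payload(service_name: str, span_name: str) -> dict:
    """Build a one-span OTLP/JSON trace export request."""
    end = time.time_ns()
    start = end - 50_000_000  # pretend the work took 50 ms
    return {
        "resourceSpans": [{
            "resource": {"attributes": [{
                "key": "service.name",
                "value": {"stringValue": service_name},
            }]},
            "scopeSpans": [{
                "scope": {"name": "manual-demo"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                }],
            }],
        }]
    }


def send(payload: dict, url: str = OTLP_TRACES_URL) -> int:
    """POST the payload to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    try:
        print("collector replied with HTTP", send(
            build_span_payload("demo-service", "hello-lgtm")))
    except OSError as exc:  # e.g. the container is not running
        print("collector not reachable:", exc)
```

If the request succeeds, the span should be searchable in the bundled Grafana instance under the service name used above.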

Products

Odigos: Full-fidelity eBPF Tracing

In our last edition we featured this article on the OeC (OpenTelemetry/eBPF/ClickHouse) stack. Odigos is a product which represents a really interesting variation on this architecture. It leverages OpenTelemetry and eBPF to generate logs, metrics and tracing. It does not, however, have its own backend. Instead, you forward the telemetry generated by Odigos to your chosen backend for visualisation and further analytics. Odigos offers two key benefits over similar products. Firstly, it claims to be the only product in the space which can deliver full-fidelity eBPF-based tracing. Secondly, it is architected for high performance, so that it can scale up to environments running thousands of microservices.

Robusta - Supercharged Kubernetes Debugging

Robusta is a company that does not have a marketing department. Then again, it does not need one as it is growing rapidly purely via word of mouth amongst Kubernetes administrators. This is not surprising as Robusta is open source software that combines very powerful Kubernetes debugging capabilities with ease of use and remarkable economy. It feels like it is written for people who use K8s every day by people who use K8s every day. If you spend a lot of time working with Kubernetes, you will really appreciate the user experience, as it combines a rich UI with a laser-sharp focus on getting the job done quickly.

SigLens Blast Out of Stealth

If you are a regular reader of the newsletter you will be aware of ClickHouse. They most recently made the news for nonchalantly polishing off the trillion row challenge in under three minutes. Astonishingly, SigLens, a company who have recently launched out of stealth like an F-22 Raptor, are claiming benchmark speeds up to 54 times faster than ClickHouse (on certain types of query) for their full-stack observability product. According to the SigLens docs, the secret is a revolutionary technology which massively reduces index size. It goes without saying that vendor benchmarks do not necessarily translate into real-world performance - and that these performance gains are of questionable value if they are not perceptible to the end user. The good news is that SigLens is Open Source, so these benchmark scores should be open to independent verification. This is certainly a product which has made a spectacular entrance into the space and we look forward to evaluating it in greater depth.

From the Blogosphere

Instrumenting Async Processing With oTel

The use of asynchronous systems such as message queues can present a major challenge for end-to-end tracing as they can effectively break the telemetry chain. This article by Marcin Sodkiewicz looks at how the OpenTelemetry tracing context can be serialised and then passed through a messaging system as a set of attributes. The attributes can then be extracted by the recipient process to rehydrate the context. The code in this article uses the Go programming language and the AWS SQS messaging system. The principles, though, can easily be applied to other languages as well as other messaging systems. The Azure Service Bus, for example, also supports custom attributes in its messaging object model. This is a very readable article that explains OpenTelemetry tracing and Context Propagation with great clarity.
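The serialise-and-rehydrate idea can be sketched in a few lines. The example below is not the article's Go code and does not use the OpenTelemetry API; it is a hand-rolled illustration that writes the trace context into an SQS-style message attribute using the W3C `traceparent` layout (version, trace ID, parent span ID, flags) and parses it back on the consumer side. The function names `inject` and `extract` are chosen for this sketch; in practice an OpenTelemetry SDK propagator would do this work for you.

```python
import re
import secrets

# W3C Trace Context layout: version-traceid-parentid-flags (version 00 only).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def inject(trace_id: str, span_id: str, attributes: dict,
           sampled: bool = True) -> dict:
    """Producer side: serialise the context into SQS-style message attributes."""
    flags = "01" if sampled else "00"
    attributes["traceparent"] = {
        "DataType": "String",
        "StringValue": f"00-{trace_id}-{span_id}-{flags}",
    }
    return attributes


def extract(attributes: dict):
    """Consumer side: rehydrate the context, or return None if absent/invalid."""
    raw = attributes.get("traceparent", {}).get("StringValue", "")
    match = _TRACEPARENT.fullmatch(raw)
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


if __name__ == "__main__":
    # Producer: the current span's identifiers travel with the message.
    trace_id, span_id = secrets.token_hex(16), secrets.token_hex(8)
    message_attributes = inject(trace_id, span_id, {})

    # Consumer: a new span joins the same trace, parented on the producer span.
    context = extract(message_attributes)
    consumer_span_id = secrets.token_hex(8)
    print("same trace:", context["trace_id"] == trace_id)
    print("parent span:", context["parent_span_id"], "child span:", consumer_span_id)
```

Because the context travels as ordinary string attributes, the same approach carries over to any messaging system that lets you attach custom key-value metadata to a message.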

Running OpenTelemetry on wasmCloud

WASM (WebAssembly) is a technology that has the potential to revolutionise the way in which apps are built, deployed and run. Platforms such as Fermyon offer massive app density along with sub-millisecond load times. wasmCloud describes itself as a universal platform for WASM apps, and Version 1.0 of the product ships with support for OpenTelemetry logs, metrics and tracing. This is a short article but it provides a great starting point for exploring observability with WASM.

Observing Observe with Observe

It sounds like it could be a sub-plot in the film Inception, but this is a really interesting article from the Observe blog on how they use an instance of their Observe system to monitor their Observe cloud platform. Observe not only have to support fast reads for complex user queries, they also have to support ingesting one petabyte of telemetry per day. As you can see from the above diagram, Kafka and Snowflake form two of the pillars of the backend architecture. This three-part series offers a fascinating insight into Observe’s own internal observability strategy as well as being a great exemplar of the ‘eat your own dog food’ principle. This is an article which is of great value to anybody with an interest in large-scale observability architectures.

OpenTelemetry

OTel Graduation Proposal (finally) Submitted

The OpenTelemetry project reached a symbolic milestone last week with the submission of a proposal for CNCF Graduation. The proposal was posted on the CNCF GitHub repo by Austin Parker, Director of Open Source at Honeycomb. Even though oTel is the second most active CNCF project, and has industry-wide backing, it will still need to complete the Due Diligence process. Although graduation should be a formality, it will also represent a recognition of the enormous efforts of the observability community and the growing stability and maturity of the project. As you can see from the somewhat humbling image above, the proposal does not receive any special treatment. It is simply positioned as item 226 on the CNCF Project Board, nestling between an issue for a broken link and a typo fix.

OpenTelemetry Announce Support for Profiling

KubeCon is often the occasion for dropping major announcements and one of the big stories this year is the news of OpenTelemetry Support for Profiling. Profiling is a key tool for Application Performance Management and is often referred to as the fourth pillar of observability. Up until now, vendors have produced their own Profiling implementations, but without a common standard. The announcement is the culmination of two years of work by the OpenTelemetry Profiling SIG (Special Interest Group). The group has already produced a detailed Profiling Data Model and this will now be merged into the OpenTelemetry Specification. The next stage is implementation - and both Elastic and Splunk are already making major source code donations.

That’s all for this edition!

If you have friends or colleagues who may be interested in subscribing to the newsletter, then please share this link!

This week’s quote is from Anthony J. D'Angelo and I think it is one we can all relate to:

“In your thirst for knowledge, be sure not to drown in all the information.”