SQL Strikes Back!

...observing 2,800 Edge Clusters, OTel made easy, and much more

ClickHouse Blaze a Trail With SQL

Welcome to edition 7 of the Observability 360 newsletter. This will be the last newsletter of the year, as we will be taking a break over Xmas. We will be ringing out the year with some jaw-dropping stats in a ClickHouse article on SQL observability pipelines, covering a new tool for OpenTelemetry visualisation, taking in some crystal-ball gazing with Splunk and lots more.

How are we doing?

As practitioners in the field, you will know that every good observability system needs a feedback loop. Let us know how we are doing at:

NEWS

Dash0 - Visual Tooling for OpenTelemetry

Dash0 are a new startup whose mission is to simplify the implementation of observability for developers. To assist with this, they have launched a tool called OTelBin, which provides validation and visualisation of manifests for the OpenTelemetry Collector. OTelBin is a web-based tool with a clean and simple UI. On the left side of the screen there is a panel for editing the YAML definition of your Collector. The rest of the screen is used to visualise Collector pipelines. The visualisation updates itself dynamically as you edit the manifest.
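To give a flavour of the kind of validation involved, here is a minimal sketch in Python that checks one common mistake: a pipeline referencing a component that has not been defined. It is not how OTelBin itself works. The function name and the example configuration are hypothetical, the config is assumed to be already parsed from YAML into a dict, and connectors are ignored for simplicity.

```python
from typing import Any, Dict, List


def find_undefined_components(config: Dict[str, Any]) -> List[str]:
    """Return a message for every pipeline entry that has no matching definition.

    Illustrative only: checks receivers, processors and exporters, and
    ignores connectors, which can also appear in pipelines.
    """
    problems: List[str] = []
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            # An empty YAML section parses to None, so fall back to {}.
            defined = config.get(section) or {}
            for component in pipeline.get(section) or []:
                if component not in defined:
                    problems.append(
                        f"pipeline '{name}': {section[:-1]} '{component}' is not defined"
                    )
    return problems


# Hypothetical Collector config, as it might look after YAML parsing.
EXAMPLE_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": None}}},
    "processors": {"batch": None},
    "exporters": {"debug": None},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                # 'otlphttp' is used here but never defined above.
                "exporters": ["debug", "otlphttp"],
            }
        }
    },
}

if __name__ == "__main__":
    for problem in find_undefined_components(EXAMPLE_CONFIG):
        print(problem)
```

A visual tool surfaces this sort of error as you type, which is far quicker than discovering it when the Collector refuses to start.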

Infrastructure Governance with DataDog

In our last newsletter we touched on William Cappelli's discussion of Configuration Management Databases and their place in observability architectures. This article on the DataDog blog discusses their Resource Catalog tool, which is analogous to a CMDB for resources captured by the system’s agents. It is not, however, just a static resource inventory - it can also be used to provide context when troubleshooting incidents and to identify misconfigurations and security risks.

2024 - the Splunk Perspective

It's that time of year when, as well as looking back, we can also speculate on what the next 12 months might have in store for observability. Splunk have now published their predictions for Observability in 2024 and, not surprisingly, AI takes centre stage. In evaluating the impact of AI, they see it as bringing intelligence to observability tooling, but they also highlight the fact that AI is itself a set of technologies that will need to be observed. This is a brief and easy-going read that you can peruse over a cup of tea and a mince pie or two.

Kloudmate - Developer-focused Observability

The Cloud Native juggernaut seems to be unstoppable, so it is not surprising to see more and more dedicated cloud observability platforms appearing on the market. Kloudmate were formed in 2021 and their platform has a strong focus on microservices and meeting the needs of application developers. Naturally, the system covers the standard signals - Logs, Metrics, Events and Traces - and also includes features such as Incident Management and Issue Tracking. If you are interested in evaluating the platform, they have a free plan as well as big discounts for startups.

From the Blogosphere

SQL Strikes Back

The twin imperatives of scalability and agility in the internet era have led to the rise of NoSQL and a re-thinking of data storage technologies. This has led some people to predict (or even welcome) the demise of the SQL/OLAP paradigm. The people at ClickHouse are bucking that trend and, in this article on the ClickHouse blog, Ryadh Dahimene proposes an observability stack with the ClickHouse SQL database at its core. The cost savings that he claims for this stack are pretty astonishing - up to 300 times cheaper than the "leading commercial SaaS observability provider". There are downsides though: the handling of metrics is currently less mature than that of logs and traces, although that is something they are working on.

Netflix, Spotify, Google, er... Chick-fil-A???

Image courtesy of ZDNet

When you think of large corporations pushing the technology envelope, Chick-fil-A might not be the first name to come to mind. However, the highly distributed nature of their infrastructure presents massive observability challenges, which they have met with some very impressive engineering. The scale of their task is daunting - 2,800 Edge Kubernetes clusters, tens of thousands of IoT devices and billions of MQTT messages each month. This is a really fascinating article on managing IoT observability at scale.

Grafana vs ELK - Platforms Compared

Debates about the relative merits of the ELK and Grafana stacks are almost on a par with the Coke vs Pepsi or Mac vs PC wars. Obviously, there are many ways of comparing products - price, features, support, etc. TJ Podobnik, who is a Cloud Architect at Prewave, has undertaken a comprehensive two-part comparison of these two leading platforms. In the first part, he looks at features and discusses concerns such as licensing, SSO and support. In the second part, he delves into a more technical analysis, looking at performance, resource management and storage requirements. This is a balanced and nuanced study with plenty of interesting findings.

How To

Up and Running with InfluxDB and Grafana

InfluxDB and Grafana are a really powerful combination for ingesting and visualising metrics data. This article on the Observability 360 website provides a detailed walkthrough of using the free versions of these tools to import a sample time series dataset into InfluxDB and then generate charts for the data in Grafana. The article includes GitHub links for the sample data and offers pointers on dealing with potential gotchas in defining InfluxDB metadata and configuring the InfluxDB data source in Grafana (it’s not as straightforward as you might think).

Monitoring K8s with VictoriaMetrics

We have previously mentioned the growing popularity of VictoriaMetrics as a powerful, low-cost observability solution. If you are interested in taking it for a test drive, you can try setting up their Kubernetes monitoring stack, which encompasses VictoriaMetrics, Grafana and the kube-state-metrics service. This article on the RTFM blog provides a really comprehensive and impressively detailed guide to configuring the necessary Helm charts and getting the stack running in a Kubernetes environment. It is highly recommended if you are interested in evaluating the VictoriaMetrics platform.

OpenTelemetry

Getting Flexible with Metrics Collection

It is tempting to see OpenTelemetry as a highly structured and codified specification - and one that might be less accommodating for custom formats. In this article, Severin Neumann, a Cloud Architect at Cisco, highlights the essential flexibility of the platform and shows how, with some relatively minor adaptations, metrics of all shapes and sizes can be ingested. The article looks at how the Carbon receiver's plaintext protocol can be used for capturing custom metrics emitted by shell scripts.
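The Carbon plaintext protocol is about as simple as a metrics format gets: one line per data point, in the form "metric.path value timestamp", terminated by a newline. The sketch below is ours rather than taken from the article, and is written in Python instead of shell. The function names and the metric name are hypothetical, and the host and port are assumptions (2003 is the conventional Carbon plaintext port, but your Collector's Carbon receiver may be configured differently).

```python
import socket
import time
from typing import Optional


def format_carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Render one data point as a Carbon plaintext line: path, value, timestamp."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    if timestamp is None:
        # Carbon expects seconds since the Unix epoch.
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Send one line over TCP. Host and port are assumptions, not fixed values."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; requires a Carbon receiver listening locally.
    send_metric(format_carbon_line("backup.duration_seconds", 42.5))
```

Because each data point is just a line of text, almost anything that can open a TCP connection can emit metrics this way, which is what makes the approach a good fit for scripts.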

Call The (OTel) Operator

When implementing OpenTelemetry there are many design and architectural decisions to be made. There can be a bewildering number of choices - automatic or manual instrumentation, Collectors, Gateways, Exporters - and then there is the question of hosting. If you are running your Collector on Kubernetes, this article shows how you can simplify implementation by using the OpenTelemetry Operator for Kubernetes. This is an in-depth article that walks through setting up a simple service and deploying the OpenTelemetry Collector as a gateway. It is one of a number of really useful observability articles on the Aspecto blog.

That’s all for this edition and for this year!

A massive thanks to all of you for your support and readership. Wishing you all a happy Xmas and all the best for the new year!

“To acquire knowledge, one must study; but to acquire wisdom, one must observe”.

Marilyn vos Savant