Building OpenTelemetry Collection in Apache NiFi with Netty

2024-02-26 • 9 minute read • David Handermann

Background

OpenTelemetry supports the primary pillars of software observability using a common protocol with implementations in multiple programming languages. With standard specifications for logs, metrics, and traces, OpenTelemetry presents a modern alternative to historical de facto solutions and various vendor-based strategies. With an understanding of observability concepts and benefits, implementing a unified approach to application performance monitoring is essential to a stable and secure system. The OpenTelemetry Protocol specification defines the foundational building blocks for telemetry encoding and transmission, enabling integration with both OpenTelemetry components and external services. With support for gRPC or HTTP transmission, the OTLP specification enables straightforward interoperation across the landscape of languages and services.

Introduction

Apache NiFi introduced the ListenOTLP Processor in versions 2.0.0-M1 and 1.24.0. ListenOTLP combines several important characteristics of the OTLP specification in a single component with straightforward default settings. The Netty framework supports several NiFi components and also forms the basis for micro-batched processing of OpenTelemetry in the ListenOTLP Processor. With reusable components for HTTP and TLS, Netty combines high performance with a pluggable design that enables ListenOTLP to support various aspects of the OpenTelemetry Protocol without introducing excessive complexity. Coupled with Jackson for JSON processing and the HubSpot library for handling Protocol Buffers, the NiFi solution for OpenTelemetry collection combines security, performance, and ease of configuration.

OpenTelemetry Protocol Details

Reviewing the scope and features of OTLP provides a helpful background for understanding the implementation details of the ListenOTLP Processor.

OTLP defines gRPC and HTTP as the supported modes of transport for OpenTelemetry information. The gRPC protocol itself builds on HTTP/2 and defines structured communications using Protocol Buffers. OTLP defines requests and responses using Protobuf definitions that serve as interface definitions regardless of the implementing language. The Protobuf structures also define the model for encoding telemetry as JSON for transmission over HTTP.

Summarizing OTLP Specification 1.0.0, OpenTelemetry can be transmitted using any of the following strategies:

- gRPC over HTTP/2
- Protobuf over HTTP/1.1 or HTTP/2
- JSON over HTTP/1.1 or HTTP/2

Considering transmission from a layered perspective, all OpenTelemetry communication occurs over HTTP, with HTTP/2 supporting gRPC, Protobuf, or JSON, and HTTP/1.1 supporting Protobuf or JSON. The fact that gRPC consists of Protobuf messages means that although some header information is different, handling both gRPC and Protobuf over HTTP can be accomplished with minimal overhead.

Netty Framework Features

The Netty framework includes powerful capabilities such as multithreading for socket handling, Application-Layer Protocol Negotiation with TLS, and HTTP processing for both HTTP/1.1 and HTTP/2. These composable features make Netty an ideal foundation for building custom network services that support protocols such as OpenTelemetry.

Building servers or clients with Netty requires some level of familiarity with event-driven programming. The project user guide provides a helpful introduction to the basic concepts involved in writing a network server. The guide highlights the ChannelInboundHandlerAdapter as an introduction to asynchronous processing of bytes buffered from a socket channel. The Netty ServerBootstrap is another central class, supporting thread pool configuration and the pipeline responsible for handling inbound connections. Based on the core design of a channel handler pipeline, Netty provides numerous implementation classes supporting a variety of standard network protocols.

Netty HTTP and ALPN

The netty-codec-http module provides handler classes that implement client and server communication for HTTP/1.1. The HttpServerCodec class combines request decoding and response encoding for streamlined configuration, which allows custom code to implement the logic required to handle structured HTTP requests and responses. The FullHttpRequest interface provides a straightforward abstraction for reading HTTP header and body information after the server class has decoded the request from the socket channel.

The netty-codec-http2 module depends on netty-codec-http and provides additional protocol handling to support HTTP/2. Netty provides several approaches for handling HTTP/2, but when it is necessary to support both HTTP/2 and HTTP/1.1, the InboundHttp2ToHttpAdapter provides a convenient translation to HTTP/1.1 objects. This approach supports using a single handler for processing HTTP requests in the Netty pipeline, regardless of protocol version. The single pipeline for both protocol versions may not be suitable for applications that require custom handling for specific HTTP/2 semantics, but for many applications, adapting requests to a single request interface is the best solution.

ALPN is a generic extension to the TLS protocol that allows clients to request specific application protocol versions during the TLS handshake process. HTTP/2 uses ALPN to support transparent fallback to HTTP/1.1 for clients that do not support HTTP/2. Netty provides a flexible ApplicationProtocolNegotiationHandler that allows custom classes to read client application protocol names and configure the appropriate set of pipeline handlers. As a general strategy, clients that do not support HTTP/2 may not present any ALPN information, which means servers should default to HTTP/1.1 while it remains supported.
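The fallback strategy described above can be reduced to a small selection function: prefer HTTP/2 when the client offers it, otherwise default to HTTP/1.1. The following sketch is illustrative; the class and method names are not part of Netty, which performs this negotiation through its ApplicationProtocolNegotiationHandler.

```java
import java.util.List;

// Illustrative ALPN fallback: prefer HTTP/2 when offered, else HTTP/1.1.
// The protocol identifiers are the standard ALPN names for each version.
class AlpnProtocolSelector {
    static final String HTTP_2 = "h2";
    static final String HTTP_1_1 = "http/1.1";

    // clientProtocols may be empty when the client sends no ALPN extension
    static String select(final List<String> clientProtocols) {
        return clientProtocols.contains(HTTP_2) ? HTTP_2 : HTTP_1_1;
    }
}
```

Defaulting to HTTP/1.1 for an empty protocol list matches the general strategy above: clients without ALPN support still receive a working protocol.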

Netty and NiFi Integration

As part of refactoring syslog components, NiFi 1.14.0 introduced the nifi-event-transport module with reusable abstractions for building Netty clients and servers. The module includes straightforward factory classes for constructing network servers using standard properties for address, port number, and TLS negotiation. The NettyEventServerFactory removes the need for repeating common server construction steps, supporting NiFi Processors such as ListenTCP, ListenSyslog, and ListenBeats, as well as ListenOTLP. These components highlight both the flexibility of Netty and the power of reusable classes for building network services.

ListenOTLP exposes these standard settings as properties within the Processor. The Address and Port properties control the network socket on which the Processor listens for inbound requests. The SSL Context Service and Client Authentication properties control the TLS negotiation process and determine whether clients must present a certificate for mutual authentication.

The Worker Threads property determines the number of threads allocated to handle socket connection processing. This number should never exceed the number of CPU cores and should remain in the single digits for most deployment scenarios. The Queue Capacity and Batch Size properties should be considered and adjusted together based on general volume expectations. The Queue Capacity places a limit on the number of messages that can be held in memory before the NiFi framework invokes the ListenOTLP Processor to write queued messages to a FlowFile. With the OTLP Specification supporting reliable delivery and retry using standard response codes, the queue can remain limited without concern for dropping requests at peak volumes.

OpenTelemetry Content Negotiation

With Netty as the foundation, implementing support for each of the OTLP transport formats required an additional layer of protocol processing to handle gRPC, Protobuf, and JSON using a single server.

OTLP defines TCP port 4317 as the default for gRPC, and TCP port 4318 as the default for HTTP, with the official OpenTelemetry Collector requiring separate configuration for each protocol. Although ListenOTLP could have followed a similar strategy, the HTTP Content-Type header provides a standard method for determining applicable protocol handling.

The OTLP Specification defines the following Content-Type values for the respective transport formats:

- application/grpc for gRPC
- application/x-protobuf for Protobuf over HTTP
- application/json for JSON over HTTP

Selecting the applicable request handler according to the Content-Type supports not only a single network server for all transport protocols, but also enables basic request validation.
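The header-based dispatch described above can be sketched as a simple mapping from the Content-Type value to a transport format. The class and enum names here are hypothetical, not taken from the ListenOTLP source; the header values themselves come from the OTLP Specification.

```java
// Hypothetical dispatch from HTTP Content-Type header to transport format.
class TelemetryContentType {
    enum TransportFormat { GRPC, PROTOBUF, JSON, UNSUPPORTED }

    static TransportFormat fromHeader(final String contentType) {
        if (contentType == null) {
            return TransportFormat.UNSUPPORTED;
        }
        // gRPC clients may append a subtype suffix such as application/grpc+proto
        if (contentType.startsWith("application/grpc")) {
            return TransportFormat.GRPC;
        }
        if (contentType.equals("application/x-protobuf")) {
            return TransportFormat.PROTOBUF;
        }
        if (contentType.equals("application/json")) {
            return TransportFormat.JSON;
        }
        return TransportFormat.UNSUPPORTED;
    }
}
```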

HTTP Request Validation

Following the OTLP Specification provides a clear path for initial request validation using standard HTTP properties.

All OpenTelemetry requests use the POST method regardless of transport format, resulting in an HTTP 405 response code for requests using other HTTP methods.

The HTTP Content-Type header indicates the transport format selected, resulting in an HTTP 415 response code for anything other than gRPC, Protobuf, or JSON.

Each OpenTelemetry request type uses a standard URL path according to the transport format and telemetry type, providing a clear indication of whether the client is sending logs, metrics, or traces. URL paths outside the expected values for gRPC or HTTP result in an HTTP 404 response code, indicating that the requested path is not found.

With these validation checks applied, the ListenOTLP Processor can select the appropriate strategy for parsing the HTTP request body. Each of these HTTP header elements can be misrepresented, but header validation avoids common errors.
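Taken together, the validation rules above can be sketched as a single check returning the applicable status code. This class is hypothetical and, for brevity, checks only the standard OTLP HTTP export paths; gRPC requests use service method paths instead.

```java
import java.util.Set;

// Illustrative request validation following the OTLP rules described above:
// POST only (405 otherwise), a supported Content-Type (415 otherwise),
// and a recognized telemetry path (404 otherwise).
class OtlpRequestValidator {
    private static final Set<String> HTTP_PATHS = Set.of("/v1/logs", "/v1/metrics", "/v1/traces");

    // Returns 200 when the request passes validation
    static int validate(final String method, final String contentType, final String path) {
        if (!"POST".equals(method)) {
            return 405; // Method Not Allowed
        }
        if (contentType == null || !(contentType.startsWith("application/grpc")
                || contentType.equals("application/x-protobuf")
                || contentType.equals("application/json"))) {
            return 415; // Unsupported Media Type
        }
        if (!HTTP_PATHS.contains(path)) {
            return 404; // Not Found
        }
        return 200;
    }
}
```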

HTTP Content Processing

With the transport format and telemetry type determined, the next step involves decoding buffered bytes to object representations. This step provides content validation, ensuring that Protobuf or JSON payloads follow the structure defined in the OTLP Specification.

Both gRPC and HTTP transport formats support gzip compression. The HTTP Content-Encoding header indicates the presence of gzip compression for Protobuf or JSON requests, whereas gRPC indicates compressed status using an initial byte flag prior to the Protobuf message itself.
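The gRPC framing described above prefixes each message with one byte indicating compressed status and a four-byte big-endian length. The following sketch parses that frame and applies gzip decompression when the flag is set; the class name is illustrative and not taken from the ListenOTLP source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;

// Illustrative parsing of a gRPC message frame: one compressed-status byte,
// a four-byte big-endian message length, then the message bytes themselves.
class GrpcFrameReader {
    static byte[] readMessage(final byte[] frame) {
        final ByteBuffer buffer = ByteBuffer.wrap(frame);
        final boolean compressed = buffer.get() == 1;
        final int length = buffer.getInt();
        final byte[] message = new byte[length];
        buffer.get(message);
        if (!compressed) {
            return message;
        }
        // gzip is the compression scheme referenced for OTLP requests
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(message))) {
            return gzip.readAllBytes();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```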

After determining compressed status, message parsing uses either standard Protobuf or Jackson JSON components to decode objects. Following parsing, ListenOTLP places messages on an internal queue. If the internal queue reaches the limit defined in the Queue Capacity property, ListenOTLP returns an unavailable response code, which informs the client that the request should be retried. This approach avoids volume-related service failures, ensuring a level of operation even under stress.
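The back-pressure behavior described above can be sketched with a bounded queue standing in for the internal queue sized by the Queue Capacity property. The class is hypothetical; the HTTP 503 Service Unavailable code shown is one way to signal retryable failure, with gRPC using its own UNAVAILABLE status.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative back-pressure sketch: accept messages until the bounded queue
// is full, then return an unavailable status so the client retries later.
class BoundedTelemetryQueue {
    private final BlockingQueue<String> queue;

    BoundedTelemetryQueue(final int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns 200 when accepted, or 503 Service Unavailable when at capacity
    int offer(final String message) {
        return queue.offer(message) ? 200 : 503;
    }
}
```

A non-blocking offer is the key design choice: the server responds immediately under load rather than stalling socket threads.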

Message Serialization

The last portion of ListenOTLP processing consists of writing one or more messages to NiFi FlowFiles. ListenOTLP uses a batching strategy based on the configured Batch Size property, writing FlowFiles that contain up to that number of records. Running ListenOTLP on a more frequent schedule produces more FlowFiles with fewer records each, subject to the volume of telemetry received. Optimal scheduling and batch sizing depend on subsequent pipeline operations.
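The batching strategy described above amounts to draining at most a batch-sized number of messages from the internal queue on each invocation. This sketch is illustrative, not the ListenOTLP implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Illustrative batching: each invocation drains at most batchSize messages,
// mirroring the role of the Batch Size property described above.
class TelemetryBatcher {
    static List<String> nextBatch(final BlockingQueue<String> queue, final int batchSize) {
        final List<String> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        return batch;
    }
}
```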

Regardless of the input transport format, ListenOTLP writes all messages using JSON. This strategy enables subsequent processing in NiFi using any of the available JSON components. Although JSON is more verbose than Protobuf, it provides greater opportunity for selective adjustment using existing NiFi Processors and Controller Services. With JSON as one of the supported formats defined in the OTLP Specification, it also enables transfer to other systems that support OpenTelemetry. Serialized FlowFiles include standard attributes indicating the resource type and the count of messages included for generalized routing.

Each OpenTelemetry resource element also includes attributes for client.socket.address and client.socket.port indicating the socket information of the client that sent the request. Other NiFi components can parse these attributes for additional routing and processing decisions.

Conclusion

The ListenOTLP Processor is an important element for integrating NiFi into observability pipelines based on OpenTelemetry. With widespread adoption across languages, frameworks, and vendors, OpenTelemetry provides a clear path for building robust and interoperable monitoring solutions, without requiring architectural designs tied to a particular vendor. As one of numerous Processors in NiFi, ListenOTLP highlights the adaptability of the NiFi framework and its ability to build pipelines that bridge the gap between historical approaches and modern solutions.