Relaying Syslog UDP Events with Apache NiFi

2022-09-26 • 12 minute read • David Handermann

Background

Despite a number of modernized alternatives, sending and receiving logging events using the syslog protocol remains a common strategy. After several decades of differing vendor implementations, RFC 3164 described the conventional approach as the BSD syslog protocol in 2001. RFC 5424 standardized structural improvements to timestamp and message field formatting in 2009. Although RFC 5424 mandated support for Transport Layer Security with TCP, with specific details outlined in RFC 5425, transmission over User Datagram Protocol continued to be supported as described in RFC 5426. Most current syslog servers support the newer standard along with mutual TLS for authentication and encryption, but UDP transmission using RFC 3164 formatting continues to serve as a baseline option.

Introduction

Apache NiFi components cover a broad range of use cases and formats, from binary records with embedded schemas to unstructured lines of ASCII characters. To enable flexible integration patterns, Apache NiFi has supported sending and receiving syslog messages over both TCP and UDP since version 0.4.0. The PutSyslog Processor provides configurable properties for both transmission protocol and message formatting. The PutUDP Processor supports generalized UDP packet transmission, depending on other Processors to format messages in FlowFile contents. Both Processors enable functional and flexible sending of syslog UDP events, but neither is capable of handling high volumes of messages. Neither PutSyslog nor PutUDP supports record-oriented FlowFiles, limiting transmission to one event per FlowFile. This approach burdens Apache NiFi framework resources, requiring transaction handling and repository updates for each event processed.

Apache NiFi 1.17.0 introduced the UDPEventRecordSink Controller Service to support record-oriented processing and transmission over UDP with the PutRecord Processor. The UDPEventRecordSink supports sending one UDP packet per record, enabling batched processing and higher rates of transmission for aggregated events. The syslog protocol provides one example use case for the new Record Sink, but it is capable of supporting any type of record-oriented data with a configurable Record Writer.

Security and Reliability Considerations

As described in RFC 5425 Section 2, sending unencrypted syslog messages over an untrusted network raises a number of security concerns. Logs often contain sensitive information about system operation and user behavior, requiring protection against unauthorized disclosure and modification. For these reasons, encrypted communication is essential when transmitting logs over an untrusted network. Transporting syslog events with TLS provides communication encryption, and mutual TLS enables strong authentication for sending systems. Virtual Private Networking protocols delegate communication encryption without requiring configuration of syslog servers and clients. Either VPN or TLS should be part of a general network logging strategy.

In addition to communication security, communication reliability is another important factor to consider when deploying syslog clients and servers. UDP is not a connection-oriented protocol and does not provide any delivery guarantees. TCP provides a measure of reliability with socket connection handling and delivery acknowledgement at the protocol layer. The syslog protocol itself does not include a delivery acknowledgement strategy, which means that data loss is possible in the event of unexpected host or network outages. Although acknowledgement at the TCP layer provides a stronger confirmation of delivery, other solutions such as the Reliable Event Logging Protocol are necessary for environments where a high degree of reliability is required. On the other hand, for systems with repetitive logging or less critical information, syslog over UDP can provide a high level of throughput without the overhead of TCP socket connection tracking.

Relaying Syslog Messages

RFC 3164 Section 3 defines a machine receiving and forwarding syslog messages as a relay. NiFi can function as a relay, performing additional filtering and routing to support a number of alerting and archiving use cases. For example, in an environment with current syslog processing, NiFi can be used to support sending to an existing syslog collector while enabling alternative analysis strategies.

Receiving Syslog Records

NiFi provides several options for receiving syslog UDP records. The best solution depends on overall flow design requirements, such as subsequent processing, filtering, or intended destinations. The following Processors can be configured to receive syslog messages over UDP:

- ListenSyslog
- ListenUDP
- ListenUDPRecord

The Port property is required for all Processors, indicating the UDP port number on which to receive network packets. Although port 514 is the traditional default for syslog UDP, most operating systems restrict listening on ports below 1024 to the root user. NiFi should never run as the root user, so a high-numbered port such as 5514 should be configured.

For optimal processing throughput, it is essential to batch multiple log records into a single FlowFile. All of these Processors support batching, but each one uses a different processing strategy with varying levels of validation. All three Processors include a Max Batch Size property, with different default values.

All three Processors also have a Max Size of Message Queue property, which defaults to 10000 for each Processor. This property controls the maximum number of events that will be placed on a queue within the Processor for eventual retrieval when the NiFi framework triggers the Processor. Setting this property too low can result in data loss when the queue is full and the Processor is not able to keep up with the rate of incoming messages. Setting this property too high has a negative impact on memory consumption, leading to potential errors when the framework is not able to read messages fast enough to avoid queue saturation. Monitoring system memory usage is an important part of selecting an optimal queue size and avoiding unexpected heap memory issues.
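
The trade-off described above can be illustrated with a minimal Python sketch of a bounded queue that drops new events once it is full. This is a conceptual model only, not the NiFi implementation; the BoundedEventQueue class and its method names are hypothetical.

```python
from collections import deque


class BoundedEventQueue:
    """Fixed-capacity queue that drops new events when full (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def offer(self, event: bytes) -> bool:
        """Add an event if space remains; otherwise count it as dropped."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self, max_batch_size: int) -> list:
        """Remove and return up to max_batch_size events, oldest first."""
        batch = []
        while self.events and len(batch) < max_batch_size:
            batch.append(self.events.popleft())
        return batch
```

In this model, a small capacity loses events whenever the consumer falls behind, while a large capacity holds more events in memory at once.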

ListenSyslog

ListenSyslog provides a standard approach for handling one or more syslog messages. The Protocol property defaults to UDP, but also supports TCP, which can be configured together with an SSL Context Service to enable TLS.

The Parse Messages property enables or disables syslog message validation, setting FlowFile attributes on success and routing messages to the invalid relationship on parsing failures. The Parse Messages property can be set to false when sending clients are expected to produce valid messages, which reduces processing overhead for high volume flows.

The Max Batch Size property defaults to 1, which is useful for initial testing, but not optimal for production operation. The property should be set to at least 100 for record-oriented processing. Setting a value of 1000 or higher is necessary for larger deployments. With the default Message Delimiter setting, the ListenSyslog Processor will combine multiple syslog messages together with a newline character.

ListenUDP

ListenUDP presents a generic solution, capable of receiving arbitrary UDP packets. This Processor does not perform any input validation, which can be beneficial for deployments with high throughput requirements. For environments where sending syslog clients provide consistent message formatting, ListenUDP supports receiving syslog messages with minimal overhead.

The Max Batch Size property in ListenUDP defaults to 1, which will result in creating a large number of small FlowFiles. Setting Max Batch Size to at least 100 enables subsequent Processors to perform record-oriented processing. The number of sending clients and the volume of messages may require a batch size of 1000 or higher to provide optimal performance. Similar to ListenSyslog, the ListenUDP Processor has a Batching Message Delimiter property that controls how the Processor combines multiple messages, defaulting to a newline character.
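
As a rough illustration of delimiter-based batching, the following Python sketch groups received messages into chunks and joins each chunk with a newline. The batch_messages function and its parameter names are hypothetical and only mirror the property names conceptually; the Processors perform this work internally.

```python
def batch_messages(messages, max_batch_size=100, delimiter=b"\n"):
    """Group messages into chunks and join each chunk with the delimiter."""
    batches = []
    for start in range(0, len(messages), max_batch_size):
        chunk = messages[start:start + max_batch_size]
        batches.append(delimiter.join(chunk))
    return batches
```

Each element of the returned list corresponds to the content of one batched FlowFile in this simplified model.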

ListenUDPRecord

ListenUDPRecord introduces record-oriented processing at the beginning of a syslog flow. This Processor uses a configured Record Reader to parse received packets, providing basic record validation before further processing. The required Record Writer property controls the format of records written to output FlowFiles.

The Max Batch Size property in ListenUDPRecord defaults to 1000, providing a reasonable starting point for new flow deployments. The Poll Timeout property defaults to 50 ms, after which the Processor will create a FlowFile containing the number of records available from the internal queue. The Poll Timeout value should be low to avoid consuming shared resources, but could be increased to produce larger FlowFiles with more records in environments with intermittent transmission of syslog events.
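
The interaction between batch size and poll timeout can be approximated with a short Python sketch. It assumes a simple loop that stops when either the batch is full or no event arrives within the timeout; the actual Processor logic may differ, and the poll_batch function is hypothetical.

```python
import queue


def poll_batch(events: queue.Queue, max_batch_size: int = 1000,
               poll_timeout: float = 0.05) -> list:
    """Collect up to max_batch_size events, waiting at most poll_timeout for each."""
    batch = []
    while len(batch) < max_batch_size:
        try:
            batch.append(events.get(timeout=poll_timeout))
        except queue.Empty:
            break  # nothing arrived within the timeout: return what was collected
    return batch
```

A longer timeout gives slow senders more time to fill a batch, at the cost of holding the calling thread for longer.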

SyslogReader

The Record Reader property must be configured with the SyslogReader service to read incoming syslog messages. The SyslogReader parses messages using the conventional RFC 3164 structure, and also supports RFC 5424 timestamps for unambiguous dates based on the ISO 8601 standard. For relaying syslog messages, setting the Raw message property to true in the SyslogReader provides a simple strategy for maintaining the original message structure in a _raw field when passing records to a configured Record Writer.
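
To make the idea of retaining the original message concrete, the following Python sketch parses a conventional RFC 3164 line with a simplified regular expression and stores the unmodified line in a _raw field. The pattern and the field names other than _raw are illustrative assumptions, not the SyslogReader implementation or its schema.

```python
import re

# Simplified pattern: <PRI>, "Mmm dd hh:mm:ss" timestamp, hostname, remaining body
RFC3164_PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse_rfc3164(line: str, include_raw: bool = True):
    """Return a dict for a matching line, or None when the line does not match."""
    match = RFC3164_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    if include_raw:
        record["_raw"] = line  # keep the original message for relaying
    return record
```

Keeping the raw line alongside the parsed fields allows a relay to filter or route on individual fields while forwarding the message unchanged.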

Syslog5424Reader

The Syslog5424Reader provides an alternative Record Reader implementation capable of reading syslog events that contain structured data conforming to the RFC 5424 standard. The regular SyslogReader is also capable of reading RFC 5424 messages, but it will not perform further parsing of the message section of a record. For these reasons, the SyslogReader presents the best option for environments with a mixture of formats, or when parsing structured elements is not necessary. The Syslog5424Reader is a suitable solution for deployments where all sending clients use the RFC 5424 standard.

FreeFormTextRecordSetWriter

NiFi does not include a standard Record Writer implementation for syslog messages. With the variation in syslog message structure and the nature of the syslog protocol itself, creating a reusable syslog Record Writer is not necessarily straightforward. For the purpose of relaying syslog UDP messages, however, the FreeFormTextRecordSetWriter Service supports formatting messages using record fields and FlowFile attributes.

The Text property of the FreeFormTextRecordSetWriter must be configured according to the structure of incoming records. For syslog records processed through the SyslogReader, the writer should be configured with the following properties:

Text
- ${_raw}
Character Set
- UTF-8

The Text property setting uses NiFi Expression Language to reference the raw message field, which is included in each syslog record processed through the SyslogReader with the Raw message property enabled. The FreeFormTextRecordSetWriter appends a newline character at the end of each rendered message, which integrates well with subsequent syslog processing.
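
The effect of this configuration can be approximated in Python with string.Template, which happens to share the ${name} reference syntax. This is only an analogy for the simplest field reference; it does not implement NiFi Expression Language, and the render_records function is hypothetical.

```python
from string import Template


def render_records(records, text="${_raw}"):
    """Render each record with the template and end each line with a newline."""
    template = Template(text)
    return "".join(template.substitute(record) + "\n" for record in records)
```

With two records, the output contains two newline-terminated lines, each holding the original message from the _raw field.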

Queue Capacity Logging

The ListenUDP and ListenUDPRecord Processors support extended debug logging to track the capacity of the internal event queue. Debug log messages include current capacity, space remaining, and the largest size of the queue since the Processor was started. Capacity logging messages will be written at the debug level every 60 seconds.

Add the following lines to the NiFi logback.xml configuration to enable capacity logging for either ListenUDP or ListenUDPRecord:

<logger name="org.apache.nifi.processors.standard.ListenUDP" level="DEBUG" />
<logger name="org.apache.nifi.processors.standard.ListenUDPRecord" level="DEBUG" />

Debug messages will be logged as follows:

Event Queue Capacity [10000] Remaining [9750] Size [250] Largest Size [1500]

The Largest Size is a key indicator of adequate capacity. If the Largest Size equals the Event Queue Capacity, the Max Size of Message Queue property should be increased to accommodate incoming events. Having the Largest Size reach the Event Queue Capacity indicates that the NiFi framework is not processing events fast enough to retrieve them from the internal queue. Larger message queues place a greater burden on Java Virtual Machine heap memory. General system resource utilization should be evaluated when considering queue size increases.
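
The check described above could be automated with a small script. The following Python sketch parses a capacity message in the format shown earlier and flags saturation when the Largest Size reaches the Event Queue Capacity; the check_capacity function and the keys of the returned dictionary are hypothetical.

```python
import re

CAPACITY_PATTERN = re.compile(
    r"Event Queue Capacity \[(\d+)\] Remaining \[(\d+)\] "
    r"Size \[(\d+)\] Largest Size \[(\d+)\]"
)


def check_capacity(log_line: str):
    """Extract queue metrics from a log line, or return None if absent."""
    match = CAPACITY_PATTERN.search(log_line)
    if match is None:
        return None
    capacity, remaining, size, largest = (int(value) for value in match.groups())
    return {
        "capacity": capacity,
        "remaining": remaining,
        "size": size,
        "largest": largest,
        # The queue has been full at least once since the Processor started
        "saturated": largest >= capacity,
    }
```

A saturated result suggests reviewing the Max Size of Message Queue property together with heap memory usage.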

The logger command

The logger command on Linux and Unix systems enables simple testing of listening syslog Processors. The command is capable of sending one message, specified through a command argument, or multiple messages provided using a file argument. The logger command supports both the conventional RFC 3164 BSD syslog protocol and the RFC 5424 standard.

The following command can be used to send a message with the word running to a Processor listening on UDP port 5514:

logger -n 127.0.0.1 -P 5514 --rfc3164 running

The command will send a UDP syslog event containing the timestamp, local hostname, and local username along with the message specified, using the RFC 3164 format. The syslog event reads as follows:

<13>Sep 26 12:30:45 hostname username: running

The bracketed number 13 indicates the event Priority as defined in RFC 3164 Section 4.1.1. The Priority value is the Facility number multiplied by 8 plus the Severity number, so dividing 13 by 8 yields a quotient of 1 and a remainder of 5. The remainder of 5 indicates the Notice level for event Severity, and the quotient of 1 indicates user-level messages as the event Facility.
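
The Priority arithmetic can be checked with a few lines of Python. The sketch below encodes and decodes the Priority value and assembles a message in the layout shown above; the function names are hypothetical, and the formatter covers only the fields in this example.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def encode_priority(facility: int, severity: int) -> int:
    """Priority is the Facility multiplied by 8 plus the Severity."""
    return facility * 8 + severity


def decode_priority(priority: int) -> tuple:
    """Return (facility, severity) as the quotient and remainder of division by 8."""
    return divmod(priority, 8)


def format_rfc3164(facility: int, severity: int, timestamp: datetime,
                   hostname: str, tag: str, message: str) -> str:
    """Build a message with a space-padded day of month in the timestamp."""
    priority = encode_priority(facility, severity)
    stamp = f"{MONTHS[timestamp.month - 1]} {timestamp.day:2d} {timestamp:%H:%M:%S}"
    return f"<{priority}>{stamp} {hostname} {tag}: {message}"
```

For example, decode_priority(13) returns (1, 5), matching the user-level Facility and Notice Severity.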

Sending Syslog Records

Regardless of the Processor selected for receiving syslog messages, the optimal approach for sending records over UDP in NiFi 1.17.0 is the PutRecord Processor configured with the SyslogReader for the Record Reader property, and the UDPEventRecordSink for the Record Destination Service property.

Flows designed using the ListenUDPRecord Processor can use the same SyslogReader Service for the PutRecord Processor. Both the ListenSyslog and ListenUDP Processors combine multiple syslog messages using a newline character, which the SyslogReader is also capable of handling.

The Include Zero Record Results property in PutRecord defaults to false, which should be retained to avoid unnecessary processing.

UDPEventRecordSink

The UDPEventRecordSink has standard Hostname and Port properties to configure the destination for UDP records. Both of these properties are required as part of the standard component configuration.

The Record Writer property supports standard Controller Services that implement the RecordSetWriterFactory interface. As described for ListenUDPRecord, the FreeFormTextRecordSetWriter provides a straightforward option for formatting syslog records parsed using the SyslogReader. For flows using ListenUDPRecord, the same instance of FreeFormTextRecordSetWriter can be configured as the Record Writer in UDPEventRecordSink. For flows using other Processors, the same property values can be applied to a new instance of FreeFormTextRecordSetWriter.

UDPEventRecordSink also includes a Sender Threads property, defaulting to 2, which controls the maximum size of the thread pool for handling UDP packet processing. UDPEventRecordSink leverages the Netty framework for network protocol handling, and passes the Sender Threads property value to the Netty NioEventLoopGroup class, which is responsible for managing socket operations. More is not necessarily better when it comes to thread configuration, but in general, the number of Sender Threads should match the number of Concurrent Tasks configured for the PutRecord Processor. The number of Sender Threads should not exceed the number of CPU cores.

It is important to note that the UDPEventRecordSink will not necessarily report communication failures, and some network errors may be logged without causing PutRecord to route related FlowFiles to the failure relationship. As mentioned earlier, the nature of UDP means that receiving systems do not provide delivery acknowledgment. Based on host configuration or network firewall settings, the Netty framework may receive an ICMP packet indicating a destination unreachable error, which UDPEventRecordSink will log as an error. This error serves as a courtesy notification. The NiFi Bulletin Board should be monitored for this type of communication problem, but it is best to verify transmission on the receiving system to confirm delivery.
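
One way to confirm delivery during testing is a throwaway UDP listener on the receiving host. The following Python sketch is a minimal example under that assumption: it binds a socket, sends each record as a separate datagram, and reads back whatever arrives before a timeout. The function names are hypothetical, and the sketch is independent of NiFi.

```python
import socket


def open_listener(host="127.0.0.1", port=0, timeout=2.0):
    """Bind a UDP socket; port 0 asks the operating system for a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def send_records(records, host, port):
    """Send each record as its own UDP datagram; no acknowledgement is returned."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        for record in records:
            sender.sendto(record, (host, port))


def receive_datagrams(sock, expected):
    """Read up to expected datagrams, stopping early if the timeout elapses."""
    received = []
    try:
        for _ in range(expected):
            data, _address = sock.recvfrom(65535)
            received.append(data)
    except socket.timeout:
        pass  # fewer datagrams arrived than expected
    return received
```

Because the sender receives no acknowledgement, comparing the number of datagrams received against the number sent is the only confirmation available in this sketch.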

Conclusion

Record-oriented processing offers high throughput for content that follows a standard structure, including the conventional BSD syslog protocol. Building on existing capabilities, the UDPEventRecordSink extends record-oriented transmission to UDP packets. Relaying syslog UDP messages provides a simple use case, but more advanced flows can be designed using other record-oriented Processors and Controller Services. Leveraging record batching for initial processing is essential for optimal throughput, and selecting the appropriate amount of input validation is an important design decision. Taking these concepts together, it is possible to build advanced flows that support a number of system logging use cases.