Relaying Syslog UDP Events with Apache NiFi
Background
Despite a number of modernized alternatives, sending and receiving logging events using the syslog protocol remains a common strategy. After several decades of differing vendor implementations, RFC 3164 described the conventional approach as the BSD syslog protocol in 2001. RFC 5424 standardized structural improvements to timestamp and message field formatting in 2009. Although RFC 5424 mandated support for Transport Layer Security with TCP, with specific details outlined in RFC 5425, transmission over User Datagram Protocol continued to be supported as described in RFC 5426. Most current syslog servers support the newer standard along with mutual TLS for authentication and encryption, but UDP transmission using RFC 3164 formatting continues to serve as a baseline option.
Introduction
Apache NiFi components cover a broad range of use cases and formats, from binary records with embedded schemas to
unstructured lines of ASCII characters. To enable flexible integration patterns, Apache NiFi has supported sending and
receiving syslog messages over both TCP and UDP since version 0.4.0. The
PutSyslog
Processor provides configurable properties for both transmission protocol and message formatting. The
PutUDP
Processor supports generalized UDP packet transmission, depending on other Processors to format messages in FlowFile
contents. Both Processors enable functional and flexible sending of syslog UDP events, but neither is capable of
handling high volumes of messages. Neither PutSyslog nor PutUDP supports record-oriented FlowFiles, limiting transmission
to one event per FlowFile. This approach burdens Apache NiFi framework resources, requiring transaction handling and
repository updates for each event processed.
Apache NiFi 1.17.0 introduced the
UDPEventRecordSink
Controller Service to support record-oriented processing and transmission over UDP with the
PutRecord
Processor. The UDPEventRecordSink
supports sending one UDP packet per record, enabling batched processing and higher
rates of transmission for aggregated events. The syslog protocol provides one example use case for the new Record Sink,
but it is capable of supporting any type of record-oriented data with a configurable Record Writer.
Security and Reliability Considerations
As described in RFC 5425 Section 2, sending unencrypted syslog messages over an untrusted network raises a number of security concerns. Logs often contain sensitive information about system operation and user behavior, requiring protection against unauthorized disclosure and modification. For these reasons, encrypted communication is essential when transmitting logs over an untrusted network. Transporting syslog events with TLS provides communication encryption, and mutual TLS enables strong authentication for sending systems. Virtual Private Networking protocols delegate communication encryption without requiring configuration of syslog servers and clients. Either VPN or TLS should be part of a general network logging strategy.
In addition to communication security, communication reliability is another important factor to consider when deploying syslog clients and servers. UDP is not a connection-oriented protocol and does not provide any delivery guarantees. TCP provides a measure of reliability with socket connection handling and delivery acknowledgement at the protocol layer. The syslog protocol itself does not include a delivery acknowledgement strategy, which means that data loss is possible in the event of unexpected host or network outages. Although acknowledgement at the TCP layer provides a stronger confirmation of delivery, other solutions such as the Reliable Event Logging Protocol are necessary for environments where a high degree of reliability is required. On the other hand, for systems with repetitive logging or less critical information, syslog over UDP can provide a high level of throughput without the overhead of TCP socket connection tracking.
Relaying Syslog Messages
RFC 3164 Section 3 defines a machine receiving and forwarding syslog messages as a relay. NiFi can function as a relay, performing additional filtering and routing to support a number of alerting and archiving use cases. For example, in an environment with current syslog processing, NiFi can be used to support sending to an existing syslog collector while enabling alternative analysis strategies.
Receiving Syslog Records
NiFi provides several options for receiving syslog UDP records. The best solution depends on overall flow design requirements, such as subsequent processing, filtering, or intended destinations. The following Processors can be configured to receive syslog messages over UDP:
- ListenSyslog
- ListenUDP
- ListenUDPRecord
The Port
property is required for all Processors, indicating the UDP port number on which to receive network packets.
Although port 514 is the traditional default for syslog UDP, most operating systems restrict listening on ports below
1024 for users other than root. NiFi should never run as the root user, so a high numbered port such as 5514 should be
configured.
For optimal processing throughput, it is essential to batch multiple log records into a single FlowFile. All of these
Processors support batching, but each one uses a different processing strategy with varying levels of validation. All
three Processors include a Max Batch Size
property, with different default values.
All three Processors also have a Max Size of Message Queue
property, which defaults to 10000
for each Processor.
This property controls the maximum number of events that will be placed on a queue within the Processor for eventual
retrieval when the NiFi framework triggers the Processor. Setting this property too low can result in data loss when the
queue is full and the Processor is not able to keep up with the rate of incoming messages. Setting this property too high
has a negative impact on memory consumption, leading to potential errors when the framework is not able to read messages
fast enough to avoid queue saturation. Monitoring system memory usage is an important part of selecting an optimal queue
size and avoiding unexpected heap memory issues.
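The interaction between queue size and batch size can be illustrated with a minimal Python sketch. It is a simplified model written for this article, not the NiFi implementation: the `EventQueue` class and its `offer` and `drain` methods are hypothetical names, and the sketch only assumes that events arriving at a full queue are lost and that a batch is joined with a newline delimiter.

```python
from queue import Empty, Full, Queue


class EventQueue:
    """Illustrative bounded queue; a simplified model, not NiFi code."""

    def __init__(self, max_size: int) -> None:
        self._queue: Queue = Queue(maxsize=max_size)
        self.dropped = 0

    def offer(self, event: bytes) -> bool:
        """Add an event, counting it as dropped when the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except Full:
            self.dropped += 1
            return False

    def drain(self, max_batch_size: int, delimiter: bytes = b"\n") -> bytes:
        """Remove up to max_batch_size events and join them with the delimiter."""
        events = []
        while len(events) < max_batch_size:
            try:
                events.append(self._queue.get_nowait())
            except Empty:
                break
        return delimiter.join(events)
```

Offering five events to a queue with a maximum size of three drops the last two, which mirrors the data loss described for an undersized queue. A larger batch size lets a single drain produce one larger batch instead of many small ones.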
ListenSyslog
ListenSyslog provides a standard approach for handling one or more syslog messages. The Protocol property defaults
to UDP, but also supports TCP, which can be configured together with an SSL Context Service to enable TLS.
The Parse Messages
property enables or disables syslog message validation, setting FlowFile attributes on success and
routing messages to the invalid
relationship on parsing failures. The Parse Messages
property can be set to false
when sending clients are expected to produce valid messages, which reduces processing overhead for high volume flows.
The Max Batch Size property defaults to 1, which is useful for initial testing, but not optimal for production
operation. The property should be set to at least 100 for record-oriented processing. Setting a value of 1000 or
higher is necessary for larger deployments. With the default Message Delimiter setting, the ListenSyslog Processor
will combine multiple syslog messages together with a newline character.
ListenUDP
ListenUDP
presents a generic solution, capable of receiving arbitrary UDP packets. This Processor does not perform
any input validation, which can be beneficial for deployments with high throughput requirements. For environments where
sending syslog clients provide consistent message formatting, ListenUDP
supports receiving syslog messages with
minimal overhead.
The Max Batch Size property in ListenUDP defaults to 1, which will result in creating a large number of small
FlowFiles. Setting Max Batch Size to at least 100 enables subsequent Processors to perform record-oriented
processing. The number of sending clients and the volume of messages may require a batch size of 1000 or higher to
provide optimal performance. Similar to ListenSyslog, the ListenUDP Processor has a Batching Message Delimiter
property that controls how the Processor combines multiple messages, defaulting to a newline character.
ListenUDPRecord
ListenUDPRecord
introduces record-oriented processing at the beginning of a syslog flow. This Processor uses a
configured Record Reader
to parse received packets, providing basic record validation before further processing. The
required Record Writer
property controls the format of records written to output FlowFiles.
The Max Batch Size property in ListenUDPRecord defaults to 1000, providing a reasonable starting point for new flow
deployments. The Poll Timeout property defaults to 50 ms, after which the Processor will create a FlowFile
containing the number of records available from the internal queue. The Poll Timeout value should be low to avoid
consuming shared resources, but could be increased to produce larger FlowFiles with more records in environments with
intermittent transmission of syslog events.
SyslogReader
The Record Reader property must be configured with the SyslogReader service to read incoming syslog messages. The
SyslogReader parses messages using the conventional RFC 3164 structure, and also supports RFC 5424 timestamps for
unambiguous dates based on the ISO 8601 standard. For relaying syslog messages, setting the Raw message property to
true in the SyslogReader provides a simple strategy for maintaining the original message structure in a _raw field
when passing records to a configured Record Writer.
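The idea of parsing a message into fields while keeping the original text can be sketched in Python. This is an approximation for illustration only and does not reproduce the parsing rules of the SyslogReader: the regular expression covers just the basic RFC 3164 layout, and the `parse_rfc3164` function and every field name other than `_raw` are assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Priority in angle brackets, "Mmm dd hh:mm:ss" timestamp, hostname, then the message body
RFC3164_PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def parse_rfc3164(line: str) -> Optional[Dict[str, str]]:
    """Return parsed fields plus the untouched line in _raw, or None when the line does not match."""
    match = RFC3164_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Keeping the original text allows a relay to forward the message unchanged
    record["_raw"] = line
    return record
```

Because the `_raw` value is the unmodified input, a relay can forward it without reassembling the message from individual fields.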
Syslog5424Reader
The Syslog5424Reader
provides an alternative Record Reader
implementation capable of reading syslog events that contain structured data
conforming to the RFC 5424 standard. The regular SyslogReader is also capable of reading RFC 5424 messages, but it
will not perform further parsing of the message section of a record. For these reasons, the SyslogReader
presents the
best option for environments with a mixture of formats, or when parsing structured elements is not necessary. The
Syslog5424Reader
is a suitable solution for deployments where all sending clients use the RFC 5424 standard.
FreeFormTextRecordSetWriter
NiFi does not include a standard Record Writer
implementation for syslog messages. With the variation in syslog
message structure and the nature of the syslog protocol itself, creating a reusable syslog Record Writer is not necessarily
straightforward. For the purpose of relaying syslog UDP messages, however, the
FreeFormTextRecordSetWriter
Service supports formatting messages using record fields and FlowFile attributes.
The Text property of the FreeFormTextRecordSetWriter must be configured according to the structure of incoming
records. For syslog records processed through the SyslogReader, the writer should be configured with the following
properties:
- Text: ${_raw}
- Character Set: UTF-8
The Text
property setting uses NiFi
Expression Language
to reference the raw message field, which is included in each syslog record processed through the SyslogReader
with
the Raw message
property enabled. The FreeFormTextRecordSetWriter
appends a newline character at the end of each
rendered message, which integrates well with subsequent syslog processing.
Queue Capacity Logging
The ListenUDP
and ListenUDPRecord
Processors support extended debug logging to track the capacity of the internal
event queue. Debug log messages include current capacity, space remaining, and the largest size of the queue since the
Processor was started. Capacity logging messages will be written at the debug level every 60 seconds.
Add the following lines to the NiFi logback.xml configuration to enable capacity logging for either ListenUDP or
ListenUDPRecord:
<logger name="org.apache.nifi.processors.standard.ListenUDP" level="DEBUG" />
<logger name="org.apache.nifi.processors.standard.ListenUDPRecord" level="DEBUG" />
Debug messages will be logged as follows:
Event Queue Capacity [10000] Remaining [9750] Size [250] Largest Size [1500]
The Largest Size is a key indicator of adequate capacity. If the Largest Size equals the Event Queue Capacity, the
Max Size of Message Queue property should be increased to accommodate incoming events. Having the Largest Size
reach the Event Queue Capacity indicates that the NiFi framework is not processing events fast enough to retrieve them
from the internal queue. Larger message queues place a greater burden on Java Virtual Machine heap memory. General
system resource utilization should be evaluated when considering queue size increases.
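Checking the capacity messages can be automated with a short script. The following Python sketch parses a message in the format shown above and reports whether the largest observed size has reached the configured capacity. The `QueueCapacity` class and `parse_capacity` function are hypothetical helpers written for this example, and the pattern assumes the message layout matches the sample exactly.

```python
import re
from typing import NamedTuple, Optional

# Matches the bracketed numbers in the capacity message format shown above
CAPACITY_PATTERN = re.compile(
    r"Event Queue Capacity \[(\d+)\] Remaining \[(\d+)\] "
    r"Size \[(\d+)\] Largest Size \[(\d+)\]"
)


class QueueCapacity(NamedTuple):
    capacity: int
    remaining: int
    size: int
    largest_size: int

    @property
    def saturated(self) -> bool:
        """True when the queue has filled completely at some point since startup."""
        return self.largest_size >= self.capacity


def parse_capacity(line: str) -> Optional[QueueCapacity]:
    """Extract capacity numbers from a log line, or return None when the line does not match."""
    match = CAPACITY_PATTERN.search(line)
    if match is None:
        return None
    return QueueCapacity(*(int(value) for value in match.groups()))
```

For the sample message, the largest size of 1500 is well below the capacity of 10000, so the queue is not reported as saturated.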
The logger command
The logger command on Linux and Unix systems enables simple testing of listening syslog
Processors. The command is capable of sending one message, specified through a command argument, or multiple messages
provided using a file argument. The logger
command supports both the conventional RFC 3164 BSD syslog protocol and the
RFC 5424 standard.
The following command can be used to send a message with the word running
to a Processor listening on UDP port 5514:
logger -n 127.0.0.1 -P 5514 --rfc3164 running
The command will send a UDP syslog event containing the timestamp, local hostname, and local username along with the message specified, using the RFC 3164 format. The syslog event reads as follows:
<13>Sep 26 12:30:45 hostname username: running
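On systems without the logger command, a comparable test message can be produced with a few lines of Python. The sketch below is a rough equivalent for testing purposes, not a replacement for logger: the `format_rfc3164` and `send_udp` functions are names chosen for this example, and the formatting follows the sample event shown above.

```python
import socket
from datetime import datetime

# Fixed English month abbreviations avoid locale-dependent formatting
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_rfc3164(priority: int, timestamp: datetime, hostname: str, tag: str, message: str) -> str:
    """Build a BSD syslog line; single-digit days are padded with a space."""
    stamp = f"{MONTHS[timestamp.month - 1]} {timestamp.day:2d} {timestamp:%H:%M:%S}"
    return f"<{priority}>{stamp} {hostname} {tag}: {message}"


def send_udp(message: str, host: str, port: int) -> int:
    """Send one message as a single UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))
```

Calling `send_udp(format_rfc3164(13, datetime.now(), "hostname", "username", "running"), "127.0.0.1", 5514)` sends an event similar to the one produced by the logger command above.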
The bracketed number 13 indicates the event Priority as defined in RFC 3164 Section 4.1.1. The Priority is calculated
as the Facility multiplied by 8 plus the Severity, so dividing 13 by 8 yields a quotient of 1 and a remainder of 5.
The remainder 5 indicates the Notice level for event Severity, and the quotient 1 indicates user-level messages
as the event Facility.
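The Priority arithmetic can be expressed as a small Python function. This is a sketch for illustration: `decode_priority` is a name chosen here, and the Severity names follow the levels listed in RFC 3164.

```python
from typing import Tuple

# Severity names indexed by numerical code, as listed in RFC 3164
SEVERITIES = (
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
)


def decode_priority(priority: int) -> Tuple[int, int]:
    """Split a Priority value into (facility, severity) where priority = facility * 8 + severity."""
    # Facilities range from 0 to 23 and severities from 0 to 7, so the maximum is 23 * 8 + 7
    if not 0 <= priority <= 191:
        raise ValueError(f"Priority out of range: {priority}")
    return divmod(priority, 8)
```

For the example event, `decode_priority(13)` returns a Facility of 1 and a Severity of 5, and `SEVERITIES[5]` is `Notice`.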
Sending Syslog Records
Regardless of the Processor selected for receiving syslog messages, the optimal approach for sending records over UDP in
NiFi 1.17.0 is the
PutRecord
Processor configured with the SyslogReader
for the Record Reader
property, and the
UDPEventRecordSink
for the Record Destination Service
property.
Flows designed using the ListenUDPRecord
Processor can use the same SyslogReader
Service for the PutRecord
Processor. Both the ListenSyslog
and ListenUDP
Processors combine multiple syslog messages using a newline
character, which the SyslogReader
is also capable of handling.
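The relationship between a newline-delimited batch and per-record transmission can be sketched in Python. The example is a conceptual model of sending one UDP packet per record, not the behavior of PutRecord or the UDPEventRecordSink in detail. The `split_records` and `relay_batch` functions are hypothetical, and the `send` parameter is any callable that transmits one payload.

```python
from typing import Callable, List


def split_records(content: bytes, delimiter: bytes = b"\n") -> List[bytes]:
    """Split batched content into individual messages, skipping empty segments."""
    return [record for record in content.split(delimiter) if record]


def relay_batch(content: bytes, send: Callable[[bytes], object]) -> int:
    """Pass each message in the batch to the send callable and return the number sent."""
    sent = 0
    for record in split_records(content):
        send(record)
        sent += 1
    return sent


# Example usage with a UDP socket, assuming a listener on 127.0.0.1 port 5514:
#
#   import socket
#   with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
#       relay_batch(batch, lambda payload: sock.sendto(payload, ("127.0.0.1", 5514)))
```

Accepting a callable keeps the splitting logic independent of the network, which makes the sketch easy to exercise without a listening server.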
The Include Zero Record Results property in PutRecord defaults to false, which should be retained to avoid
unnecessary processing.
UDPEventRecordSink
The UDPEventRecordSink
has standard Hostname
and Port
properties to configure the destination for UDP records.
Both of these properties are required as part of the standard component configuration.
The Record Writer property supports standard Controller Services that implement the RecordSetWriterFactory
interface. As described for ListenUDPRecord, the FreeFormTextRecordSetWriter provides a straightforward option for
formatting syslog records parsed using the SyslogReader. For flows using ListenUDPRecord, the same instance of
FreeFormTextRecordSetWriter can be configured as the Record Writer in UDPEventRecordSink. For flows using other
Processors, the same property values can be applied to a new instance of FreeFormTextRecordSetWriter.
UDPEventRecordSink also includes a Sender Threads property, defaulting to 2, which controls the maximum size of
the thread pool for handling UDP packet processing. UDPEventRecordSink leverages the Netty framework for network
protocol handling, and passes the Sender Threads property value to the Netty NioEventLoopGroup class, which is
responsible for managing socket operations. More is not necessarily better when it comes to thread configuration, but
in general, the number of Sender Threads should match the number of Concurrent Tasks configured for the PutRecord
Processor.
The number of Sender Threads
should not exceed the number of CPU cores.
It is important to note that the UDPEventRecordSink
will not necessarily report communication failures, and some
network errors may be logged without causing PutRecord
to route related FlowFiles to the failure
relationship. As
mentioned earlier, the nature of UDP means that receiving systems do not provide delivery acknowledgment. Based on host
configuration or network firewall settings, the Netty framework may receive an
ICMP packet indicating a
destination unreachable
error, which UDPEventRecordSink
will log as an error. This error serves as a courtesy notification. The NiFi Bulletin
Board should be monitored for this type of communication problem, but it is best to verify transmission on the receiving
system to confirm delivery.
Conclusion
Record-oriented processing offers high throughput for content that follows a standard structure, including the
conventional BSD syslog protocol. Building on existing capabilities, the UDPEventRecordSink
extends record-oriented
transmission to UDP packets. Relaying syslog UDP messages provides a simple use case, but more advanced flows can be
designed using other record-oriented Processors and Controller Services. Leveraging record batching for initial
processing is essential for optimal throughput, and selecting the appropriate amount of input validation is an important
design decision. Taking these concepts together, it is possible to build advanced flows that support a number of system
logging use cases.