Managing Logging Libraries in Apache NiFi
Background
Logging is a perennial cross-cutting concern for most software applications. Providing behavior tracking is essential for monitoring and troubleshooting system health and status. Supporting configurable control over application logging at runtime is a basic requirement for most systems, but implementing clear and concise logging involves important design decisions.
Over the years, multiple Java libraries have attempted to solve these challenges through a combination of abstracted interfaces and selectable implementations. Reported vulnerabilities and concerns in multiple logging libraries highlight the importance of selecting an approach that provides clean separation between interface usage and runtime implementation. Maintaining a clear distinction between interface and implementation is essential to a sustainable application. As various software libraries have made different decisions for logging, selecting and excluding the right set of dependencies ensures consistent runtime behavior and avoids unnecessary library references.
Introduction
Apache NiFi has leveraged SLF4J as its logging interface and Logback as the corresponding runtime implementation since project inception. Through the use of hierarchical class loading, NiFi routes logging statements from a variety of library interfaces through Logback during runtime operation. This approach minimizes configuration problems related to transitive dependencies on different logging libraries. Although this strategy controls runtime behavior, it can also hide the inclusion of unnecessary dependencies. Evaluating Apache NiFi for exposure to the Log4Shell vulnerability uncovered several opportunities to exclude logging libraries that were not necessary for runtime operation. Excluding unused logging libraries and adding bridge dependencies not only reduces potential false positives in security scans, but also streamlines binary packaging.
Library Overview
The number of logging libraries available for Java reflects a variety of efforts to solve the logging problem over the years. The following libraries and associated release dates provide a short summary of popular solutions for the Java platform:
- Log4j released 2001
- Java Logging released 2002
- Apache Commons Logging released 2002
- SLF4J released 2006
- Logback released 2006
- Apache Log4j 2 released 2014
Log4j and Java Logging
The original version of Log4j met a critical need and provided configurable runtime logging for Java applications before the java.util.logging package added a subset of similar capabilities in Java 1.4. Both approaches had similar sets of classes and methods, but selecting one approach tightly coupled application code to that system's runtime configuration. Because the two approaches were incompatible, an application could not use a single configuration when integrating with a library that did not use the same logging system as the main application. Although Java Logging remains part of standard Java runtime distributions, Log4j stopped receiving maintenance updates in 2015.
Commons Logging
The integration problem spurred the creation of Apache Commons Logging, originally known as Jakarta Commons Logging. The purpose of Commons Logging was not to provide another runtime implementation, but rather to allow developers to integrate application logging using a standard interface, leaving runtime configuration to a selected implementation library. Coding to a simple logging interface lessened the potential challenges associated with integrating libraries that used disparate logging frameworks, but it was not a complete solution. Commons Logging used Java reflection to determine the runtime implementation, which minimized the amount of work necessary for application deployment. This approach was not without its own difficulties in certain environments, such as web application servers using multiple scoped class loaders.
SLF4J
The Simple Logging Facade for Java shared goals similar to Commons Logging in terms of providing a standard interface that avoided coupling application code to a particular logging implementation. In order to avoid runtime issues with complex class loaders, SLF4J followed a different approach to runtime selection in the form of specific implementation libraries. SLF4J locates the available logging implementation at runtime based on the presence of a specific implementation class. The bootstrap process also checks for the existence of multiple implementation libraries and writes detailed warning messages at startup. This integration pattern allowed applications and libraries to share a single Logger interface while selecting a different logging runtime as needed.
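As a minimal sketch of this pattern in Maven terms, application code compiles against slf4j-api alone while the logging runtime is declared separately with runtime scope. The sketch assumes that a parent POM manages the versions through dependencyManagement, so no versions appear below.

```xml
<dependencies>
    <!-- Code compiles only against the SLF4J Logger interface -->
    <!-- Versions are assumed to be managed by a parent POM -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
    </dependency>
    <!-- Logback supplies the SLF4J implementation and is needed only at runtime -->
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
```

Replacing logback-classic with a different SLF4J implementation library, such as slf4j-jdk14, would change the logging runtime without modifying code written against the Logger interface.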
SLF4J provided implementation libraries and adapters to support Log4j, Java Logging, and Commons Logging. This strategy supported both runtime flexibility and compatibility with a broad spectrum of other libraries. Through selective inclusion of SLF4J implementation and bridge libraries, an application can support consistent runtime logging.
Logback
The founder of Log4j started Logback to address a number of issues inherent in the first version of Log4j. Building on the foundation of SLF4J, contributors developed Logback as a direct implementation of the SLF4J interfaces. Compared with the feature gaps and performance issues in Log4j and Java Logging, Logback provided an improved option that built on past approaches. For applications not directly coupled to Log4j or Java Logging, Logback served as the natural runtime implementation for SLF4J.
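For illustration only, a minimal logback.xml might look like the following. This is a hypothetical sketch rather than the configuration NiFi ships; the appender name, pattern, and logger levels are assumptions. The scan and scanPeriod attributes ask Logback to reload the file when it changes, which is one way to adjust logging levels while an application is running.

```xml
<!-- Hypothetical logback.xml: reload the file every 30 seconds if it changes -->
<configuration scan="true" scanPeriod="30 seconds">
    <!-- Write formatted log events to standard output -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Assumed level for one package; other packages inherit from root -->
    <logger name="org.apache.nifi" level="INFO"/>

    <!-- Default level and appender for all other loggers -->
    <root level="WARN">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>
```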
Apache Log4j 2
The announcement of Apache Log4j 2 in 2012 outlined several important features that motivated development of a new version. Highlighting robust log event handling during dynamic reconfiguration, as well as a flexible plugin architecture, Log4j 2 also supported integration with SLF4J and compatibility with other logging systems. Log4j 2 maintainers released the first production version in 2014, leading to widespread adoption in a number of open source products. With support for migrating from other logging libraries, Log4j 2 presented a compelling option for projects inside and outside the Apache Software Foundation.
NiFi Logging
With development beginning prior to the release of Log4j 2, and initial open source availability in 2014, Apache NiFi has relied on SLF4J and Logback since early project versions. In order to support integration with a wide array of products and services, NiFi includes SLF4J and Logback in the core set of project libraries. The NiFi hierarchical class loader grants priority to these libraries, avoiding runtime version conflicts and potential class definition overrides from libraries bundled in extension components.
Standard Project Libraries
NiFi provides the following SLF4J library in the project library directory:
- slf4j-api
NiFi includes the following Logback libraries to support runtime logging:
- logback-classic
- logback-core
In addition to these minimum dependencies, NiFi also includes several bridge libraries to route logging from other libraries through SLF4J and on to Logback.
The following SLF4J library provides Commons Logging classes to support routing through SLF4J:
- jcl-over-slf4j
The following SLF4J library bridges Java Logging to SLF4J:
- jul-to-slf4j
The following SLF4J library implements the original Log4j interfaces for routing to SLF4J:
- log4j-over-slf4j
These bridging and routing libraries take precedence over Commons Logging and original Log4j libraries that might be included as transitive dependencies in extension components.
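Routing Java Logging through the jul-to-slf4j bridge carries a cost: unless java.util.logging levels match the Logback levels, disabled Java Logging statements can still be converted into log records and passed to the bridge before SLF4J discards them. Logback documents a LevelChangePropagator context listener that copies Logback logger levels to Java Logging to avoid this overhead. The following logback.xml fragment is a minimal sketch that assumes appenders and loggers are configured elsewhere in the same file. The bridge itself also requires an SLF4JBridgeHandler installed as a java.util.logging handler, which this fragment does not cover.

```xml
<configuration>
    <!-- Copies Logback logger levels to java.util.logging loggers so that
         disabled Java Logging statements are filtered before reaching the bridge -->
    <contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
        <!-- Reset existing java.util.logging logger levels when the listener starts -->
        <resetJUL>true</resetJUL>
    </contextListener>

    <!-- Appenders, loggers, and the root logger are omitted from this sketch -->
</configuration>
```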
SLF4J does not provide a library for routing Apache Log4j 2 events, but the Log4j 2 project includes the log4j-to-slf4j adapter library for routing Log4j 2 interface calls to SLF4J. NiFi does not include this library in the core set of project dependencies in 1.15.2 or prior versions.
Banned Logging Libraries
Although hierarchical class loading effectively eliminates runtime references to unnecessary logging libraries, NiFi 1.15.0 and later versions included several updates to remove all copies of these unused dependencies. NiFi uses the Maven Enforcer Plugin to apply a set of rules that standardize project dependencies and versions.
NiFi 1.15.0 excluded and banned references to the original Log4j library.
NiFi 1.15.2 used the same approach to eliminate both the Commons Logging library and the Log4j 2 log4j-core library.
The Enforcer Plugin identifies banned dependencies using a combination of group and artifact identifiers, with optional version and classifier attributes. The following configuration illustrates the Enforcer Plugin rule for banned logging dependencies:
```xml
<bannedDependencies>
    <excludes>
        <exclude>log4j:log4j</exclude>
        <exclude>org.apache.logging.log4j:log4j-core</exclude>
        <exclude>commons-logging:commons-logging</exclude>
    </excludes>
</bannedDependencies>
```
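The bannedDependencies rule runs as part of an Enforcer Plugin execution. A minimal sketch of the surrounding declaration follows; it assumes the plugin version is managed elsewhere (for example through pluginManagement), and the execution id is hypothetical.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-enforcer-plugin</artifactId>
            <!-- Plugin version assumed to be managed through pluginManagement -->
            <executions>
                <execution>
                    <!-- Hypothetical execution id -->
                    <id>enforce-banned-dependencies</id>
                    <goals>
                        <goal>enforce</goal>
                    </goals>
                    <configuration>
                        <rules>
                            <bannedDependencies>
                                <!-- Check transitive dependencies too (the default, stated explicitly) -->
                                <searchTransitive>true</searchTransitive>
                                <excludes>
                                    <exclude>log4j:log4j</exclude>
                                    <exclude>org.apache.logging.log4j:log4j-core</exclude>
                                    <exclude>commons-logging:commons-logging</exclude>
                                </excludes>
                            </bannedDependencies>
                        </rules>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

The enforce goal binds to the validate phase by default, so a banned transitive dependency fails the build before compilation begins.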
Managing Logging Libraries
Updating the NiFi build configuration to meet the Enforcer Plugin rules required various adjustments based on the chain of dependencies specific to each referencing library. The general approach involved analyzing the dependency tree of referencing libraries, excluding banned dependencies, and declaring a corresponding dependency on the appropriate SLF4J adapter.
Excluding Log4j and Commons Logging
A number of extension components included transitive dependencies on both the original version of Log4j and Commons Logging. For example, the nifi-hadoop-libraries-nar module includes several direct dependencies on Apache Hadoop libraries. Eliminating both Log4j and Commons Logging involved adding the following elements to each Hadoop dependency:
```xml
<exclusions>
    <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
    </exclusion>
</exclusions>
```
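Maven applies exclusions only to the transitive dependencies of the dependency that declares them, which is why each Hadoop dependency needs its own exclusions element. As an illustrative sketch, using hadoop-client as an example artifact and a hypothetical hadoop.version property, a complete declaration might look like this:

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Hypothetical property for illustration -->
    <version>${hadoop.version}</version>
    <!-- Applies only to libraries pulled in through hadoop-client -->
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```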
With Log4j and Commons Logging excluded, declaring the following dependencies with the provided scope enables runtime log routing to SLF4J and Logback:
```xml
<dependencies>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>log4j-over-slf4j</artifactId>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
    </dependency>
</dependencies>
```
The NiFi parent Maven configuration defines both the provided scope and the version for these libraries.
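A minimal sketch of corresponding parent entries could use dependencyManagement so that child modules inherit both the version and the scope. The slf4j.version property is hypothetical. The provided scope marks the bridge classes as available at runtime without packaging another copy inside the extension bundle, relying instead on the libraries NiFi already places in its project library directory.

```xml
<dependencyManagement>
    <dependencies>
        <!-- Child modules declaring this artifact inherit version and scope -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>${slf4j.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>${slf4j.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```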
Excluding Apache Log4j 2
Several NiFi extension components included references to multiple Apache Log4j 2 libraries. Excluding unnecessary Log4j 2 dependencies involved a different approach based on component dependencies.
The nifi-atlas-reporting-task module included transitive dependencies on log4j-api and log4j-core through the atlas-notification library. Apache Atlas application logging uses SLF4J, with Log4j 2 references limited to a custom log appender for failed messages. Previous versions of NiFi routed Apache Atlas logging to Logback based on standard SLF4J behavior, but NiFi 1.15.2 removed both Log4j 2 dependencies using the following exclusions:
```xml
<exclusions>
    <exclusion>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
    </exclusion>
    <exclusion>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
    </exclusion>
</exclusions>
```
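Exclusions belong on the dependency that pulls in the unwanted libraries. A sketch of that placement follows, assuming atlas-notification is declared directly under its org.apache.atlas coordinates and using a hypothetical atlas.version property. Because Apache Atlas logs through SLF4J, no additional bridge dependency is needed in this case.

```xml
<dependency>
    <groupId>org.apache.atlas</groupId>
    <artifactId>atlas-notification</artifactId>
    <!-- Hypothetical property for illustration -->
    <version>${atlas.version}</version>
    <!-- Atlas logs through SLF4J, so both Log4j 2 libraries can be dropped -->
    <exclusions>
        <exclusion>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```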
After applying a similar pattern to several other components, the nifi-elasticsearch-5-processors module remained the only component with direct usage of Log4j 2. Although newer versions of Elasticsearch client libraries minimized external dependencies, initial versions of Elasticsearch 5 required a direct dependency on Log4j 2. Declaring the Log4j 2 API together with the log4j-to-slf4j adapter library eliminated references to the Log4j 2 core runtime implementation library:
```xml
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-to-slf4j</artifactId>
    </dependency>
</dependencies>
```
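Each bridge or adapter that routes another logging API into SLF4J has a counterpart library that routes SLF4J back into that same API, and the SLF4J and Log4j 2 documentation warn that combining such a pair sends events around in an endless loop. As a sketch only, not a rule NiFi is known to apply, an additional Enforcer bannedDependencies entry could guard against these combinations. Because Logback already serves as the SLF4J implementation, any of these artifacts would also trigger the multiple-implementation warning at startup.

```xml
<bannedDependencies>
    <excludes>
        <!-- Routes SLF4J to Log4j; loops with log4j-over-slf4j -->
        <exclude>org.slf4j:slf4j-log4j12</exclude>
        <!-- Routes SLF4J to Commons Logging; loops with jcl-over-slf4j -->
        <exclude>org.slf4j:slf4j-jcl</exclude>
        <!-- Routes SLF4J to Java Logging; loops with jul-to-slf4j -->
        <exclude>org.slf4j:slf4j-jdk14</exclude>
        <!-- Routes SLF4J to Log4j 2; loops with log4j-to-slf4j -->
        <exclude>org.apache.logging.log4j:log4j-slf4j-impl</exclude>
    </excludes>
</bannedDependencies>
```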
Conclusion
Recent vulnerabilities in Log4j 2, as well as improvements to Logback, reflect the ongoing challenge of developing a logging framework that provides both flexibility and security. No software library is free from potential problems. Avoiding tight coupling to particular implementations not only simplifies runtime configuration, but also eases the burden on integrating systems.
The effort involved in selective dependency exclusion highlights both the challenge of integrating with diverse products and the importance of careful dependency declaration. Dependency definition requires the same level of attention as other aspects of code quality. A thorough evaluation of project dependency trees combined with the right set of inclusions and exclusions increases both maintainability and security.