ExceptionFactory

Producing content that a reasonable developer might want to read

Managing Logging Libraries in Apache NiFi

NiFi Logging Dependencies

2021-12-29 • 8 minute read • David Handermann

Background

Logging is a perennial cross-cutting concern for most software applications. Providing behavior tracking is essential for monitoring and troubleshooting system health and status. Supporting configurable control over application logging at runtime is a basic requirement for most systems, but implementing clear and concise logging involves important design decisions.

Over the years, multiple Java libraries have attempted to solve these challenges through a combination of abstracted interfaces and selectable implementations. Reported vulnerabilities and concerns in multiple logging libraries highlight the importance of selecting an approach that provides clean separation between interface usage and runtime implementation. Maintaining a clear distinction between interface and implementation is essential to a sustainable application. As various software libraries have made different decisions for logging, selecting and excluding the right set of dependencies ensures consistent runtime behavior and avoids unnecessary library references.

Introduction

Apache NiFi has leveraged SLF4J as its logging interface and Logback as the corresponding runtime implementation since project inception. Through the use of hierarchical class loading, NiFi routes logging statements from a variety of library interfaces through Logback during runtime operation. This approach minimizes configuration problems related to transitive dependencies from different logging libraries. Although this strategy controls runtime behavior, it can also hide the inclusion of unnecessary dependencies. The process of evaluating Log4Shell and Apache NiFi uncovered several opportunities to exclude multiple logging libraries that were not necessary for runtime operation. Excluding unused logging libraries and adding bridge dependencies not only reduces potential false positives in security scans, but also streamlines binary packaging.

Library Overview

The number of logging libraries available for Java reflects a variety of efforts to solve the logging problem over the years. The following sections provide a short summary of popular solutions for the Java platform.

Log4j and Java Logging

The original version of Log4j met a critical need and provided configurable runtime logging for Java applications before the java.util.logging package added a subset of similar capabilities in Java 1.4. Both approaches had similar sets of classes and methods, but selecting one approach involved tight coupling between implementation code and runtime configuration. This incompatibility also precluded a single configuration approach when attempting to integrate with a library that did not use the same logging system as the main application. Although Java Logging remains part of standard Java runtime distributions, Log4j stopped receiving maintenance updates in 2015.

Commons Logging

The integration problem spurred the creation of Apache Commons Logging, originally known as Jakarta Commons Logging. The purpose of Commons Logging was not to provide another runtime implementation, but rather to allow developers to integrate application logging using a standard interface, leaving runtime configuration to a selected implementation library. Coding to a simple logging interface lessened the potential challenges associated with integrating libraries that used disparate logging frameworks, but it was not a complete solution. Commons Logging used Java reflection to determine the runtime implementation, which minimized the amount of work necessary for application deployment. This approach was not without its own difficulties in certain environments, such as web application servers using multiple scoped class loaders.

SLF4J

The Simple Logging Facade for Java shared goals similar to Commons Logging, providing a standard interface that avoids coupling application code to a particular logging implementation. In order to avoid runtime issues with complex class loaders, SLF4J followed a different approach to runtime selection in the form of specific implementation libraries. SLF4J finds available logging implementations at runtime based on the presence of a specific implementation class. The bootstrap process also checks for the existence of multiple implementation libraries, writing detailed warning messages at startup. This integration pattern allowed applications and libraries to share a single Logger interface while selecting a different logging runtime as needed.

SLF4J provided implementation libraries and adapters to support Log4j, Java Logging, and Commons Logging. This strategy supported both runtime flexibility and compatibility with a broad spectrum of other libraries. Through selective inclusion of SLF4J implementation and bridge libraries, an application can support consistent runtime logging.

Logback

The founder of Log4j started Logback to address a number of issues inherent in the first version of Log4j. With the foundation of SLF4J, contributors developed Logback as a direct implementation of the standard logging interfaces. In comparison to feature gaps and performance issues in Log4j and Java Logging, Logback provided an improved option that built on past approaches. For applications not directly coupled to Log4j or Java Logging, Logback served as the natural runtime implementation for SLF4J.

Apache Log4j 2

The announcement of Apache Log4j 2 in 2012 outlined several important features that motivated development of a new version. The announcement highlighted robust log event handling during dynamic reconfiguration and a flexible plugin architecture. Log4j 2 also supported integration with SLF4J and compatibility with other logging systems. Log4j 2 maintainers released the first production version in 2014, leading to widespread adoption in a number of open source products. With support for migrating from other logging libraries, Log4j 2 presented a compelling option for projects inside and outside the Apache Software Foundation.

NiFi Logging

With development beginning prior to the release of Log4j 2, and initial open source availability in 2014, Apache NiFi has relied on SLF4J and Logback from early project versions. In order to support integration with a wide array of products and services, NiFi includes SLF4J and Logback in the core set of project libraries. The NiFi hierarchical class loader grants priority to these libraries, avoiding runtime version conflicts and potential class definition overrides from libraries bundled in extension components.

Standard Project Libraries

NiFi provides the SLF4J API library, slf4j-api, in the project library directory.

NiFi includes the Logback libraries logback-classic and logback-core to support runtime logging.

In addition to these minimum dependencies, NiFi also includes several bridge libraries to route logging from other libraries through SLF4J and on to Logback.

The SLF4J jcl-over-slf4j library provides Commons Logging classes to support routing through SLF4J.

The SLF4J jul-to-slf4j library bridges Java Logging to SLF4J.

The SLF4J log4j-over-slf4j library implements the original Log4j interfaces for routing to SLF4J.

These bridging and routing libraries take precedence over Commons Logging and original Log4j libraries that might be included as transitive dependencies in extension components.
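Because every bridged library ends up in Logback, a single Logback configuration file can control log levels for code written against any of these interfaces. The following is a minimal sketch of such a file, not the configuration shipped with NiFi: the appender name, the pattern, and the selected logger names are assumptions chosen for illustration.

```xml
<configuration>
  <!-- Illustrative console appender; a real deployment would more likely write to rolling files -->
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Example logger for a library that logs through a bridged interface; the name is an assumption -->
  <logger name="org.apache.hadoop" level="WARN"/>

  <!-- Example logger for application classes that call SLF4J directly -->
  <logger name="org.apache.nifi" level="INFO"/>

  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>
```

The logger elements do not need to know which logging interface the library used; they only match on the logger name that arrives in Logback after bridging.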

SLF4J does not provide a library for routing Apache Log4j 2 events, but the Log4j 2 project includes the log4j-to-slf4j adapter library for routing Log4j 2 interface calls to SLF4J. NiFi does not include this library in the core set of project dependencies in 1.15.2 or prior versions.

Banned Logging Libraries

Although hierarchical loading effectively eliminates runtime references to unnecessary logging libraries, NiFi 1.15.0 and later versions included several updates to remove all copies of these unused dependencies. NiFi uses the Maven Enforcer Plugin to provide a set of rules for standardizing project dependencies and versions.

NiFi 1.15.0 excluded and banned references to the original Log4j library.

NiFi 1.15.2 used the same approach to eliminate both the Commons Logging library and the Log4j 2 log4j-core library.

The Enforcer Plugin identifies banned dependencies using a combination of group and artifact identifiers with optional version and classifier attributes. The following section illustrates the Enforcer Plugin configuration for banned logging dependencies:

<bannedDependencies>
  <excludes>
    <exclude>log4j:log4j</exclude>
    <exclude>org.apache.logging.log4j:log4j-core</exclude>
    <exclude>commons-logging:commons-logging</exclude>
  </excludes>
</bannedDependencies>
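For context, a bannedDependencies rule is declared inside an execution of the Enforcer Plugin. The following is a minimal sketch of that surrounding structure rather than the NiFi project configuration: the execution identifier is hypothetical, and the plugin version is assumed to be managed elsewhere in the build.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <!-- Hypothetical identifier chosen for this sketch -->
      <id>enforce-banned-logging-dependencies</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <bannedDependencies>
            <excludes>
              <exclude>log4j:log4j</exclude>
              <exclude>org.apache.logging.log4j:log4j-core</exclude>
              <exclude>commons-logging:commons-logging</exclude>
            </excludes>
          </bannedDependencies>
        </rules>
        <!-- Fail the build when a banned dependency is found -->
        <fail>true</fail>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a rule of this kind in place, a build fails when any module resolves one of the listed artifacts, including through transitive dependencies, which is what makes the exclusions described in the following sections necessary.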

Managing Logging Libraries

Updating the NiFi build configuration to meet the Enforcer Plugin rules required various adjustments based on the chain of dependencies specific to each referencing library. The general approach involved analyzing the dependency tree of referencing libraries, excluding banned dependencies, and declaring a corresponding dependency on the appropriate SLF4J adapter.

Excluding Log4j and Commons Logging

A number of extension components included transitive dependencies on both the original version of Log4j and Commons Logging. For example, the nifi-hadoop-libraries-nar module includes several direct dependencies on Apache Hadoop libraries. Eliminating both Log4j and Commons Logging involved adding the following elements to each Hadoop dependency:

<exclusions>
  <exclusion>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
  </exclusion>
  <exclusion>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
  </exclusion>
</exclusions>
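The exclusions element is nested inside each dependency declaration that brings in the banned libraries. The following sketch shows the placement using a single Hadoop library; the hadoop-common artifact and the hadoop.version property are assumptions for illustration, and the actual module declares several Hadoop dependencies.

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <!-- Assumed artifact for illustration; the same exclusions apply to each Hadoop dependency -->
  <artifactId>hadoop-common</artifactId>
  <!-- Assumed property name; the version may instead be managed in a parent configuration -->
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Maven applies exclusions per declared dependency, so the same block must be repeated on every dependency that pulls in the excluded artifacts.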

With Log4j and Commons Logging excluded, declaring the following dependencies with the provided scope enables runtime log routing to SLF4J and Logback:

<dependencies>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jcl-over-slf4j</artifactId>
  </dependency>
</dependencies>

The NiFi parent Maven configuration defines both the provided scope and the version for these libraries.
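A parent configuration can supply the scope and version through a dependencyManagement section, which is why the module declarations can omit both. The following is a sketch of that pattern, not a copy of the NiFi parent configuration: the slf4j.version property name is hypothetical.

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
      <!-- Hypothetical property name for the shared SLF4J version -->
      <version>${slf4j.version}</version>
      <!-- Provided scope: available at compile time, supplied by the runtime library directory -->
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>${slf4j.version}</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Using the provided scope keeps the bridge libraries out of each extension bundle, relying on the copies that the application already places in its core library directory.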

Excluding Apache Log4j 2

Several NiFi extension components included references to multiple Apache Log4j 2 libraries. Excluding unnecessary Log4j 2 dependencies involved a different approach based on component dependencies.

The nifi-atlas-reporting-task module included transitive dependencies on log4j-api and log4j-core through the atlas-notification library. Apache Atlas application logging uses SLF4J, with Log4j 2 references limited to a custom log appender for failed messages. Previous versions of NiFi routed Apache Atlas logging to Logback based on standard SLF4J behavior, but NiFi 1.15.2 removed both Log4j 2 dependencies using the following exclusions:

<exclusions>
  <exclusion>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
  </exclusion>
  <exclusion>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
  </exclusion>
</exclusions>

After following a similar pattern with several other components, the nifi-elasticsearch-5-processors module was the only component with direct usage of Log4j 2. Although newer versions of Elasticsearch client libraries minimized external dependencies, initial versions of Elasticsearch 5 required a direct dependency on Log4j 2. Leveraging the Log4j 2 SLF4J adapter library eliminated references to the Log4j 2 core runtime implementation library.

<dependencies>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-to-slf4j</artifactId>
  </dependency>
</dependencies>

Conclusion

Recent vulnerabilities in Log4j 2, as well as improvements to Logback, reflect the ongoing challenge of developing a logging framework that provides both flexibility and security. No software library is free from potential problems. Avoiding tight coupling to particular implementations not only simplifies runtime configuration, but also eases the burden on integrating systems.

The effort involved in selective dependency exclusion highlights both the challenge of integrating with diverse products, and the importance of careful dependency declaration. Dependency definition requires the same level of attention as other aspects of code quality. A thorough evaluation of project dependency trees combined with the right set of inclusions and exclusions increases both maintainability and security.