10 Structural Improvements in Apache NiFi 2
Introduction
Released almost ten years after open source inception under the Apache Incubator, Apache NiFi 2.0.0 represents the work of many contributors, over multiple years, across four milestone releases, described in over 2000 Jira issues. NiFi 2 brings such notable features as native Python Processors, streamlined Kubernetes clustering, a rebuilt user interface, flow versioning with GitHub, and a modern baseline on Java 21. In addition to more prominent changes, the next major version of NiFi includes structural improvements that provide better performance, increased maintainability, and stronger security. Reviewing several of these structural improvements highlights the primary goal of NiFi 2: providing a solid foundation for future development through technical debt reduction.
1. Decoupled Public API Repository
NiFi 2.0.0 is the first release to incorporate the Apache NiFi API as a separate standalone library. Prior to NiFi 2, the org.apache.nifi:nifi-api library was a root module within the primary project repository. Although there are benefits to sharing code in a single repository, decoupling the public API creates a stronger distinction between modifying extensions and making fundamental changes. Build strategies exist for releasing selected modules and managing large repositories, but following a distinct release process for a dedicated repository provides a straightforward approach without additional tooling. Discussion of a separate repository for the NiFi API included introducing a formalized set of steps for fundamental NiFi changes, known as NiFi Improvement Proposals. Combining a formal improvement process with a separate repository for the public NiFi API provides a foundation for stable integration and thoughtful changes.
2. Upgraded Application Framework Libraries
Maintaining application dependencies involves a constant struggle between stability and security. Incremental upgrades can be deceptively simple, but major version changes often require substantive adjustments. Among the most significant upgrades in NiFi 2, Spring Framework 6, Jetty 12, and Angular each required extensive changes. Moving from the historical AngularJS framework to modern versions of Angular required a complete frontend rewrite. Upgrading to Spring Framework 6 and Jetty 12 involved changing hundreds of components to use the jakarta packages from Jakarta EE 10 in place of the previous javax packages. These upgrades were vital to continued project maintenance, as previous versions no longer receive community support. Aligning core dependencies with current versions enables the project to incorporate incremental security updates down the road. These framework upgrades represent just a few of the hundreds of library updates incorporated in NiFi 2.
3. Rebuilt Bootstrap Launch Process
NiFi 0.0.1 introduced the bootstrap module as a strategy for configuring Java system properties and managing the lifecycle of the application process. The bootstrap process supports both initialization and application monitoring, providing resiliency in traditional deployment environments. The historical implementation used a custom socket protocol to communicate with the application process, and required a second Java process to perform lifecycle operations. With the popularity of containerized deployments, however, the additional monitoring process added unnecessary resource consumption. For better integration with deployments in Kubernetes, NiFi 2 refactored the nifi.sh run command to launch the primary application process without continuing to run the secondary bootstrap monitor process. The redesigned bootstrap implementation also makes use of the Java ProcessHandle interface for application process control, removing the need for a custom socket protocol. This redesign continues to support standard installations on bare metal or virtual machines, but also provides a much better approach for containerized deployments.
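The ProcessHandle interface, available since Java 9, lets one JVM look up another process by PID, check whether it is alive, request termination, and wait for exit without any custom protocol. The following is a minimal sketch of that kind of lifecycle control, not the NiFi bootstrap code itself: the ProcessStatus class and its methods are hypothetical, and the sketch assumes the caller already knows the application PID (for example, from a PID file).

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper showing PID-based lifecycle control with java.lang.ProcessHandle. */
final class ProcessStatus {

    private ProcessStatus() {
    }

    /** Returns true when a process with the given PID exists and is still alive. */
    static boolean isRunning(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    /**
     * Requests normal termination, waits up to the timeout, then escalates to forcible
     * termination. Returns true when the process is no longer alive.
     */
    static boolean stop(long pid, long timeoutSeconds) {
        Optional<ProcessHandle> found = ProcessHandle.of(pid);
        if (found.isEmpty()) {
            return true; // no such process, so there is nothing to stop
        }
        ProcessHandle handle = found.get();
        CompletableFuture<ProcessHandle> exited = handle.onExit();
        handle.destroy(); // normal termination request (SIGTERM on Unix-like systems)
        if (!awaitExit(exited, timeoutSeconds)) {
            handle.destroyForcibly(); // escalate when graceful shutdown does not finish in time
            awaitExit(exited, timeoutSeconds);
        }
        return !handle.isAlive();
    }

    private static boolean awaitExit(CompletableFuture<ProcessHandle> exited, long timeoutSeconds) {
        try {
            exited.get(timeoutSeconds, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Waiting on onExit() before escalating to destroyForcibly() follows the usual graceful-then-forced shutdown pattern; the exact sequence NiFi uses for its own stop command may differ.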
4. Redesigned Runtime Management Server
In conjunction with rebuilding the bootstrap launch process, NiFi 2 includes a redesigned runtime module with an HTTP management server. The org.apache.nifi.NiFi class no longer combines configuration and initialization. The refactored class performs minimal logging initialization before delegating startup handling to the Application class. The new approach builds on the HttpServer included with Java to provide process and cluster status information previously implemented using a custom socket protocol. The HTTP server implementation aligns with the bootstrap process to support existing status commands. For deployments in orchestration platforms like Kubernetes, the new runtime management server integrates with HTTP probes for liveness and readiness, providing status without requiring invocation of a separate command. The simple Java HTTP server enables runtime monitoring using a standard protocol without introducing additional dependencies.
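The JDK ships a lightweight HTTP server in the com.sun.net.httpserver package (jdk.httpserver module). The sketch below shows how such a server could expose separate liveness and readiness endpoints for Kubernetes HTTP probes. The ManagementServer class and the /health and /ready paths are hypothetical illustrations, not the endpoints or port that NiFi actually exposes.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical management server exposing process status with the JDK HttpServer. */
final class ManagementServer {
    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final HttpServer server;

    ManagementServer(InetSocketAddress address) throws IOException {
        server = HttpServer.create(address, 0); // 0 selects the system default backlog
        // Liveness: answer whenever the process is able to serve HTTP at all
        server.createContext("/health", exchange -> respond(exchange, 200, "UP"));
        // Readiness: answer 503 until the application reports that startup completed
        server.createContext("/ready", exchange -> {
            if (ready.get()) {
                respond(exchange, 200, "READY");
            } else {
                respond(exchange, 503, "STARTING");
            }
        });
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
    }

    void markReady() {
        ready.set(true);
    }

    int port() {
        return server.getAddress().getPort();
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes); // closing the body stream also completes the exchange
        }
    }
}
```

A Kubernetes livenessProbe and readinessProbe using httpGet could then target these paths; returning 503 until startup completes keeps traffic away from a node that is still initializing.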
5. Streamlined Sensitive Property Algorithms
With continuous improvements in modern cryptography, one of the primary lessons learned is that less is more when it comes to algorithm agility. Examples such as TLS 1.3 and age encryption emphasize that protocol versioning rather than runtime algorithm selection avoids a certain class of security issues. Historical versions of NiFi supported dozens of strategies with various cryptographic hashing and encryption options for protecting sensitive properties. NiFi 1.12.0 introduced algorithm options for deriving the sensitive properties key using Argon2 and PBKDF2 together with AES-GCM for encryption. NiFi 1.14.0 changed the default algorithm to PBKDF2 with AES-GCM, providing better security than the previous default based on MD5 and AES-CBC. NiFi 2 removed historical insecure strategies and narrowed the options to two: one based on PBKDF2 and one based on Argon2. This change is transparent to deployments using the default algorithm, while improving maintainability and narrowing the security surface of the core framework.
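The JDK provides PBKDF2 (PBKDF2WithHmacSHA512) and AES-GCM (AES/GCM/NoPadding) through standard JCA algorithm names, while Argon2 requires a third-party library. This minimal sketch derives a 256-bit AES key from a password with PBKDF2 and uses it for authenticated encryption. The class name, iteration count, salt handling, and IV-plus-ciphertext layout are assumptions for illustration and do not reflect NiFi's actual parameters or serialized format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical illustration: derive an AES key with PBKDF2, then encrypt with AES-GCM. */
final class SensitiveValueCipher {
    private static final int ITERATIONS = 160_000; // assumed work factor for illustration
    private static final int KEY_BITS = 256;
    private static final int IV_BYTES = 12;        // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    SensitiveValueCipher(char[] password, byte[] salt) throws GeneralSecurityException {
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512");
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            byte[] derived = factory.generateSecret(spec).getEncoded();
            key = new SecretKeySpec(derived, "AES");
        } finally {
            spec.clearPassword(); // drop the copy of the password held by the spec
        }
    }

    /** Returns a random IV followed by the ciphertext and authentication tag. */
    byte[] encrypt(String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV per encryption; GCM must never reuse an IV with one key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    /** Splits off the IV, then decrypts; throws AEADBadTagException if the data was modified. */
    String decrypt(byte[] encrypted) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, encrypted, 0, IV_BYTES));
        byte[] plaintext = cipher.doFinal(encrypted, IV_BYTES, encrypted.length - IV_BYTES);
        return new String(plaintext, StandardCharsets.UTF_8);
    }
}
```

Because GCM is authenticated, any modification of the stored value causes decryption to fail instead of returning corrupted plaintext, an advantage over unauthenticated modes such as AES-CBC.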
6. Unified Framework Key and Certificate Loading
The NiFi framework uses TLS for encrypted communication in several different contexts, including cluster protocol and REST API services. Prior to NiFi 2.0.0, each framework component handled loading server key and certificate files directly, involving duplicative code and repetitive file processing. This approach also prevented consistent behavior for automatic reloading as described in NIFI-12125. NiFi 2 refactored SSLContext loading to a shared configuration and standardized components. The updated implementation loads the configured key store and trust store once on startup, also ensuring consistent behavior with automatic reloading enabled. The redesigned loading strategy provided the foundation for supporting PEM keys and certificates in NiFi 2.1.0.
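In JSSE, a single SSLContext built from a KeyManagerFactory and a TrustManagerFactory can be shared by every server and client component that needs TLS. The sketch below loads the key store and trust store once and hands the same SSLContext to all consumers. SharedTlsContext is a hypothetical class, PKCS12 files are assumed, and automatic reloading is omitted; it illustrates the general pattern rather than NiFi's implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical holder that reads TLS material once and shares one SSLContext. */
final class SharedTlsContext {
    private final SSLContext sslContext;

    SharedTlsContext(Path keyStorePath, char[] keyStorePassword, Path trustStorePath, char[] trustStorePassword)
            throws IOException, GeneralSecurityException {
        // Each file is read exactly once, at construction time
        KeyStore keyStore = load(keyStorePath, keyStorePassword);
        KeyStore trustStore = load(trustStorePath, trustStorePassword);

        KeyManagerFactory keyManagers = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        keyManagers.init(keyStore, keyStorePassword);
        TrustManagerFactory trustManagers = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        trustManagers.init(trustStore);

        sslContext = SSLContext.getInstance("TLS");
        sslContext.init(keyManagers.getKeyManagers(), trustManagers.getTrustManagers(), null);
    }

    /** Every TLS consumer (cluster protocol, REST API, and so on) receives the same instance. */
    SSLContext get() {
        return sslContext;
    }

    private static KeyStore load(Path path, char[] password) throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance("PKCS12"); // assumption: PKCS12 key store files
        try (InputStream in = Files.newInputStream(path)) {
            keyStore.load(in, password);
        }
        return keyStore;
    }
}
```

Centralizing construction in one place also gives a single point where a reload could swap in a new SSLContext, instead of each component re-reading files on its own schedule.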
7. Added Client Credentials Flow for OpenID Connect
OpenID Connect is a protocol that provides standard solutions for authentication. OIDC enables single sign-on through an external identity provider, enforcing a consistent security policy across multiple applications. NiFi has supported OIDC for browser-based access since version 1.4.0 using the Authorization Code Flow, but required mutual TLS for programmatic access. NiFi 2 introduced support for the Client Credentials Flow with OIDC, allowing access to NiFi using tokens obtained from the configured identity provider. NiFi 2 implements support for identity provider Access Tokens using standard JSON Web Keys that the framework obtains from the OpenID Connect Discovery configuration. Based on the token issuer, the NiFi framework verifies the signature using either the keys published by the identity provider or the public key that NiFi generated for its own application tokens. Support for the Client Credentials Flow with OIDC brings managed authentication to both machine-based and user-based access.
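In the OAuth 2.0 Client Credentials grant (RFC 6749, section 4.4), a client sends grant_type=client_credentials to the identity provider token endpoint, authenticating with its own client ID and secret, and then presents the returned access token as a Bearer token. This sketch only builds the two requests with java.net.http so their structure is visible; the ClientCredentials class, the example URLs, and the client values are hypothetical. In practice the token endpoint would come from the token_endpoint field of the provider's OpenID Connect Discovery document.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helpers illustrating the OAuth 2.0 Client Credentials grant. */
final class ClientCredentials {

    private ClientCredentials() {
    }

    /** Form body for the token request; scope is optional and provider-specific. */
    static String tokenRequestBody(String scope) {
        String body = "grant_type=client_credentials";
        if (scope != null && !scope.isEmpty()) {
            body += "&scope=" + URLEncoder.encode(scope, StandardCharsets.UTF_8);
        }
        return body;
    }

    /** Token request using HTTP Basic client authentication (client_secret_basic). */
    static HttpRequest tokenRequest(URI tokenEndpoint, String clientId, String clientSecret, String scope) {
        // RFC 6749 section 2.3.1: form-encode the ID and secret before Base64 encoding
        String credentials = URLEncoder.encode(clientId, StandardCharsets.UTF_8) + ":"
                + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
        String basic = Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(tokenEndpoint)
                .header("Authorization", "Basic " + basic)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(tokenRequestBody(scope)))
                .build();
    }

    /** Subsequent API call presenting the access token returned by the identity provider. */
    static HttpRequest apiRequest(URI resource, String accessToken) {
        return HttpRequest.newBuilder(resource)
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();
    }
}
```

The access_token field of the JSON token response would then be passed to apiRequest; JSON parsing is left out to keep the example free of dependencies.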
8. Updated Date and Time Parsing and Formatting
NiFi supports a wide variety of data transformation operations, including record-oriented processing for structured filtering, projection, and storage at scale. NiFi RecordPath includes a number of functions for scalar field operations, including date parsing and formatting. These parsing and formatting operations previously relied on java.text.SimpleDateFormat to create instances of java.util.Date with custom string patterns. The lack of thread safety in java.text.SimpleDateFormat and the ambiguity of java.util.Date presented challenges for implementation when processing various types of timestamp patterns. With the availability of java.time.format.DateTimeFormatter together with more expressive classes in the java.time package, NiFi 2 replaced historical approaches with updated object conversion strategies. Although DateTimeFormatter is largely similar to SimpleDateFormat for pattern definition, subtle differences, such as the u pattern character handling, motivated waiting for a major version release to introduce potential breaking changes. The new implementation provides more efficient field conversion with more descriptive field types.
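The pattern letter u illustrates the compatibility risk: in SimpleDateFormat it means the day number of week (1 = Monday), while in DateTimeFormatter it means the year. This small comparison formats the same instant with both APIs. The PatternLetterDemo class is hypothetical, and UTC with Locale.ROOT is assumed so the results do not depend on the default time zone or locale.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical comparison of how legacy and java.time formatters read the same pattern. */
final class PatternLetterDemo {

    private PatternLetterDemo() {
    }

    /** SimpleDateFormat: 'u' is the day number of week (1 = Monday ... 7 = Sunday). */
    static String legacy(Instant instant, String pattern) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(Date.from(instant));
    }

    /** DateTimeFormatter: 'u' is the year; formatters are immutable and thread-safe. */
    static String modern(Instant instant, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).withZone(ZoneOffset.UTC).format(instant);
    }
}
```

For 2024-01-15T00:00:00Z, a Monday, the legacy formatter renders the pattern u as 1 while DateTimeFormatter renders 2024, even though a pattern such as yyyy-MM-dd produces 2024-01-15 with both.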
9. Modernized Application Bearer Token Signing
NiFi has used JSON Web Tokens for stateless application session tracking since early versions of the framework. NiFi 1.15.0 introduced significant improvements to JWT handling, replacing symmetric key hashing with rotated asymmetric key signing using RSA. With JEP 339 introducing support for the Edwards-Curve Digital Signature Algorithm in Java 15 and NiFi 2 requiring Java 21, application bearer tokens now default to EdDSA with Ed25519 for signing and verification. Moving from RSA to Ed25519 provides equivalent or better security with smaller keys and more compact signatures. This change reduces both processing cycles and bandwidth consumption, as it applies to all authenticated interactions with the NiFi REST API. NIST added Ed25519 to the list of approved digital signature algorithms in FIPS 186-5 in February 2023, although Java security providers are still in the process of requesting and receiving approval for updated implementations.
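Since Java 15, the JDK supports Ed25519 directly through KeyPairGenerator and Signature. The following minimal sketch signs and verifies a compact JSON Web Signature using the EdDSA algorithm identifier from RFC 8037. The Ed25519Jws class and the sample claims are hypothetical, and claim validation such as expiration checks is deliberately omitted; this is not NiFi's token service.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical sketch of compact JWS signing and verification with Ed25519 (alg "EdDSA"). */
final class Ed25519Jws {
    private static final Base64.Encoder ENCODER = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DECODER = Base64.getUrlDecoder();

    private Ed25519Jws() {
    }

    static KeyPair generateKeyPair() throws GeneralSecurityException {
        return KeyPairGenerator.getInstance("Ed25519").generateKeyPair();
    }

    /** Returns header.payload.signature, each part base64url-encoded without padding. */
    static String sign(String payloadJson, PrivateKey privateKey) throws GeneralSecurityException {
        String header = ENCODER.encodeToString("{\"alg\":\"EdDSA\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = ENCODER.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        Signature signer = Signature.getInstance("Ed25519");
        signer.initSign(privateKey);
        signer.update(signingInput.getBytes(StandardCharsets.US_ASCII));
        return signingInput + "." + ENCODER.encodeToString(signer.sign()); // Ed25519 signatures are 64 bytes
    }

    /** Verifies the signature over header.payload; the header algorithm is assumed, not checked. */
    static boolean verify(String token, PublicKey publicKey) throws GeneralSecurityException {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        Signature verifier = Signature.getInstance("Ed25519");
        verifier.initVerify(publicKey);
        verifier.update(token.substring(0, lastDot).getBytes(StandardCharsets.US_ASCII));
        return verifier.verify(DECODER.decode(token.substring(lastDot + 1)));
    }
}
```

Each Ed25519 signature is 64 bytes, compared with 256 bytes for an RSA signature using a 2048-bit key, which is where the more compact tokens come from.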
10. Added Range Header Support for Content Downloads
As part of work to redesign data content viewing, NiFi 2 added support for the HTTP Range header. The Range request header allows HTTP clients to request partial selections of bytes for the server to return. When examining queued FlowFiles or reviewing processing lineage, selecting a partial range of bytes enables the client to view portions of large files. This enables features such as the hexadecimal viewer to present initial file sections without handling the entire payload. The NiFi REST API returns the Content-Range header indicating the set of bytes returned, which can be less than the requested range depending on the actual file size. Range requests can also specify bytes at the middle or end of a file, supporting efficient spot checking.
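A byte range request names inclusive first and last positions, an open-ended start, or a suffix length, and a 206 Partial Content response reports the served range in Content-Range along with the complete length. The sketch below resolves a single range against a known content length following RFC 9110 semantics. ByteRange is a hypothetical helper that ignores multiple ranges and does not distinguish invalid ranges (normally ignored) from unsatisfiable ones (normally answered with 416); it is not the NiFi REST API code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resolver for a single HTTP byte range, following RFC 9110 semantics. */
final class ByteRange {
    private static final Pattern RANGE = Pattern.compile("bytes=(\\d*)-(\\d*)");

    final long first; // inclusive
    final long last;  // inclusive

    private ByteRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    /**
     * Resolves a header such as "bytes=0-1023", "bytes=1024-", or "bytes=-512" against the
     * content length. Returns empty when the range is malformed or cannot be satisfied.
     */
    static Optional<ByteRange> resolve(String header, long contentLength) {
        Matcher matcher = RANGE.matcher(header.trim());
        if (!matcher.matches() || contentLength <= 0) {
            return Optional.empty();
        }
        try {
            return resolve(matcher.group(1), matcher.group(2), contentLength);
        } catch (NumberFormatException e) {
            return Optional.empty(); // positions too large to fit in a long
        }
    }

    private static Optional<ByteRange> resolve(String start, String end, long contentLength) {
        if (start.isEmpty()) {
            if (end.isEmpty()) {
                return Optional.empty();
            }
            // Suffix range: the final N bytes of the content
            long suffix = Long.parseLong(end);
            if (suffix == 0) {
                return Optional.empty();
            }
            return Optional.of(new ByteRange(Math.max(0, contentLength - suffix), contentLength - 1));
        }
        long first = Long.parseLong(start);
        if (first >= contentLength) {
            return Optional.empty();
        }
        // A missing or oversized last position is clamped to the final byte
        long last = end.isEmpty() ? contentLength - 1 : Math.min(Long.parseLong(end), contentLength - 1);
        return last < first ? Optional.empty() : Optional.of(new ByteRange(first, last));
    }

    long length() {
        return last - first + 1;
    }

    /** Content-Range value for a 206 Partial Content response. */
    String contentRange(long contentLength) {
        return "bytes " + first + "-" + last + "/" + contentLength;
    }
}
```

Clamping the last position to the final byte is what allows the returned range to be smaller than the requested range when the file is shorter than the request assumed.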
Conclusion
The release of version 2.0.0 is a major milestone for the NiFi ecosystem. Combining both significant changes and subtle differences, NiFi 2 retains compatibility with the majority of framework features while incorporating substantial technical debt reduction through removing, refactoring, and upgrading vital components. These changes represent a commitment to the health of the project over the long term, positioning NiFi for future development and continuous improvement in the coming years.