Modernizing Streaming Encryption with age in Apache NiFi
Background
Apache NiFi has supported several data flow encryption strategies since its initial incubator
releases. The
EncryptContent
Processor evolved over multiple versions to incorporate support for various algorithms and encoding options, including
OpenPGP as well as custom serialization strategies. Following the development and
standardization of newer key derivation functions and cipher algorithms, many of the options in the EncryptContent
Processor no longer provide sufficient security or interoperability guarantees. More recent releases of NiFi have added
specialized Processors for decrypting historical information and interoperating with OpenPGP. With these use cases
addressed, the requirement remained for encrypting new information using an interoperable specification and modern
algorithms.
Introduction
Apache NiFi 2.0.0 Milestone 1 and version 1.24.0 introduced new Processors supporting the
age encryption specification. The age standard is an open source specification with a
reference implementation written in Go. Building on the
Jagged implementation for Java, the NiFi
EncryptContentAge
and
DecryptContentAge
Processors support creating data flows using the age specification for streaming encryption and decryption. These
Processors follow best practices for stream handling, enabling encryption or decryption of files with sizes ranging from
kilobytes to gigabytes, without excessive memory consumption. The EncryptContentAge
and DecryptContentAge
Processors
support the standard elliptic curve
X25519 Recipient Type, which provides
asymmetric key agreement for one or more recipients. Following the age specification, these processors use the
ChaCha20-Poly1305 cipher algorithm for
payload encryption, which incorporates integrity checking as a
standard feature of the algorithm. Based on modern algorithms and open standards, EncryptContentAge
and DecryptContentAge
combine content security, scalable streaming, and simplified configuration, providing a well-rounded
solution for flow-based encryption with Apache NiFi.
Specification Summary
Encryption terminology includes a number of acronyms and phrases outside the common software development lexicon. Evaluating the relative strengths and weaknesses of particular security solutions is often the domain of specialized practitioners. With that background, the present goal is not to cover the mathematical complexities behind the age encryption specification, but rather to provide a summary of key concepts and analogous capabilities.
Selected Algorithms
One primary goal of the age-encryption.org specification is a limited number of
configurable options. The first example of this goal in practice is the selection of ChaCha20-Poly1305
as the only
supported cipher algorithm for payload encryption and decryption. Although this algorithm is somewhat less common than
the ubiquitous Advanced Encryption Standard,
ChaCha20-Poly1305
is the only other cipher algorithm besides AES-GCM
supported for
Transport Layer Security version 1.3. Where TLS 1.2 and
earlier versions supported dozens of cipher suite combinations, TLS 1.3 follows a similar principle of simplicity,
supporting fewer than ten options, with only two cipher algorithms.
Both AES-GCM
and ChaCha20-Poly1305
support
authenticated encryption which incorporates cryptographic
integrity checking in the algorithm itself, without requiring additional processing. This feature provides a layer of
protection against tampering that might otherwise allow some amount of data modification prior to decryption.
The ChaCha20-Poly1305
algorithm uses a standard key length of 256 bits, avoiding the need to decide between different
key sizes.
The native algorithm for asymmetric key agreement in the age specification is X25519, which provides strong security
with short key pairs using elliptic curve cryptography. Although the age specification allows for extensible types of
recipients, X25519
enables common public and private key encryption use cases without the historical concerns
surrounding other algorithms.
As opposed to RSA keys, which can be up to several kilobytes in
size, an encoded X25519
private key is only 74 characters long, while still providing strong security guarantees. The
age specification uses a specialized format named Bech32 for encoding public and private keys, described in
Bitcoin Improvement Proposal 173.
Although Bech32 differs from the popular Base64 encoding, Bech32 avoids the ambiguities present in Base64 strategies,
and also incorporates a checksum that is useful for detecting malformed keys before performing cipher operations.
The difference between an age public key and private key is readily apparent from a visual inspection.
- Public Key: age1lvyvwawkr0mcnnnncaghunadrqkmuf9e6507x9y920xxpp866cnql7dp2z
- Private Key: AGE-SECRET-KEY-1N9JEPW6DWJ0ZQUDX63F5A03GX8QUW7PXDE39N8UYF82VZ9PC8UFS3M7XA9
All age X25519
public keys begin with age1
and consist of lowercase characters. All age X25519
private keys begin
with AGE-SECRET-KEY-1
and consist of uppercase characters. The Bech32 format restricts the set of valid characters.
Taken together, these qualities provide unambiguous definition and straightforward processing.
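As an illustration of these rules, the following Python sketch classifies a key string by prefix, case, character set, and length. The function names `encoded_length` and `classify_age_key` are hypothetical and do not come from NiFi or any age library. The sketch assumes a 32-byte key encoded as 5-bit Bech32 characters followed by a six-character checksum, and it does not verify the checksum itself.

```python
import math

# Bech32 data characters from BIP 173; the set excludes "1", "b", "i", and "o"
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

PUBLIC_PREFIX = "age1"
PRIVATE_PREFIX = "AGE-SECRET-KEY-1"

KEY_BITS = 256      # X25519 keys are 32 bytes
CHECKSUM_CHARS = 6  # Bech32 appends a six-character checksum


def encoded_length(prefix: str) -> int:
    """Return the expected length: prefix, 5-bit data characters, and checksum."""
    return len(prefix) + math.ceil(KEY_BITS / 5) + CHECKSUM_CHARS


def classify_age_key(encoded: str):
    """Return "public", "private", or None based on prefix, case, length, and charset."""
    if encoded.startswith(PUBLIC_PREFIX) and encoded == encoded.lower():
        prefix, kind = PUBLIC_PREFIX, "public"
    elif encoded.startswith(PRIVATE_PREFIX) and encoded == encoded.upper():
        prefix, kind = PRIVATE_PREFIX, "private"
    else:
        return None
    if len(encoded) != encoded_length(prefix):
        return None
    data = encoded[len(prefix):].lower()
    if any(character not in BECH32_CHARSET for character in data):
        return None
    return kind
```

Under these assumptions, the private key length works out to 16 + 52 + 6 = 74 characters, which agrees with the length stated earlier in this section.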
Strict Encoding
In addition to mandating specific cipher and key agreement algorithms, the age specification defines strict rules for encoding header and payload information. The standard encoding requires an ASCII header with recipient information and a binary payload consisting of segments containing no more than 64 kilobytes each. The age specification also supports strict PEM encoding with standard header and footer lines bounding a Base64 content section.
The header portion of a file encrypted using age contains a message authentication code that implementations must validate prior to payload processing. These strict encoding rules enable robust parsing logic that is capable of detecting tampering prior to decryption processing. The segmentation strategy for encrypted payloads enables streamed processing of large messages without unnecessary buffering.
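The segmentation strategy can be sketched in Python as follows. This is an illustrative reading of the format rather than the Jagged or NiFi implementation: the function names are hypothetical, and the 16-byte payload nonce and 16-byte authentication tag per segment are assumptions taken from the age version 1 format, not from the text above. The size estimate covers the payload only and excludes the header.

```python
import io

CHUNK_SIZE = 64 * 1024  # maximum plaintext bytes per segment
TAG_SIZE = 16           # assumed Poly1305 tag appended to each segment
NONCE_SIZE = 16         # assumed payload nonce written before the first segment


def read_segments(stream, chunk_size=CHUNK_SIZE):
    """Yield successive segments so that only one is held in memory at a time."""
    while True:
        segment = stream.read(chunk_size)
        if not segment:
            return
        yield segment


def encrypted_payload_size(plaintext_size: int) -> int:
    """Estimate payload bytes: nonce, plaintext, and one tag per segment."""
    # Ceiling division; an empty payload is assumed to carry one empty segment
    segments = max(1, -(-plaintext_size // CHUNK_SIZE))
    return NONCE_SIZE + plaintext_size + segments * TAG_SIZE
```

Because each segment is processed independently, memory use stays bounded by the segment size regardless of the total file size.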
These encoding properties avoid the need for complex parsing that can be difficult to implement and maintain. The age specification maintainers also created an extensive set of community cryptographic test vectors that implementations can use to verify correctness in various positive and negative processing scenarios.
Processor Overview
Building on the age encryption specification, the EncryptContentAge
and DecryptContentAge
Processors provide modern
streaming encryption with straightforward configuration.
EncryptContentAge Processor
The EncryptContentAge Processor is capable of encrypting input FlowFiles to one or more recipients using standard properties. The Processor also supports externalizing recipient configuration so that public keys can be provided using files or HTTP resources. Based on the default configuration, the Processor is ready to use after providing a public key.
The EncryptContentAge
Processor supports the following configuration properties:
- File Encoding
- Public Key Source
- Public Key Recipients
- Public Key Recipient Resources
The Processor requires either Public Key Recipients
or Public Key Recipient Resources
depending on the value
selected for the Public Key Source
property.
File Encoding
The File Encoding
property supports either BINARY
or ASCII
for the selected value. The default BINARY
setting
provides the best performance without the overhead of Base64 output encoding, but the ASCII
option is available for
specialized use cases where binary representation is not suitable. Base64 encoding for ASCII
increases the size of
encrypted files by roughly one third, so it is not ideal for large files. The ASCII
option writes encrypted files using strict PEM encoding
with header and footer lines.
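The size overhead of the ASCII option follows from Base64 itself, which emits four characters for every three input bytes. The Python sketch below wraps binary content in PEM-style armor to make that overhead concrete. The `armor` function is hypothetical, and the header text, footer text, and 64-column line length are assumptions based on the age armor format rather than details confirmed by the Processor documentation.

```python
import base64

HEADER = "-----BEGIN AGE ENCRYPTED FILE-----"  # assumed age armor header
FOOTER = "-----END AGE ENCRYPTED FILE-----"    # assumed age armor footer
LINE_LENGTH = 64                               # assumed column limit for Base64 lines


def armor(binary: bytes) -> str:
    """Wrap binary content in header and footer lines around Base64 text."""
    encoded = base64.b64encode(binary).decode("ascii")
    lines = [encoded[i:i + LINE_LENGTH] for i in range(0, len(encoded), LINE_LENGTH)]
    return "\n".join([HEADER, *lines, FOOTER]) + "\n"
```

For example, 3000 bytes of binary content become 4000 Base64 characters before counting line breaks and the header and footer lines.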
Public Key Source
The Public Key Source
property supports either PROPERTIES
or RESOURCES
for providing public key recipient
information. With PROPERTIES
set as the default value, the Public Key Recipients
property supports configuring and
storing recipients within the flow configuration. The RESOURCES
setting enables the Public Key Recipient Resources
property, which supports one or more paths or URLs for external storage and retrieval of age public keys.
Public Key Recipients
The Public Key Recipients
property, required in the default configuration, expects one or more X25519 age public keys
to be configured. According to the standard encoding, each public key must begin with age1
and must be limited to
Bech32 encoded characters. Multiple public keys can be specified with newline separators. Although the name itself
suggests that public keys do not need additional protection, the Public Key Recipients
property is sensitive so that
the application will encrypt values for storage within the flow configuration.
Protecting access to public keys can provide an additional layer of data assurance as described in age and Authenticated Encryption from Filippo Valsorda, author of the age specification and Go reference implementation. Limiting access to age public keys is not necessary for general use cases, but the Processor handles the property as a sensitive value to support various flow designs.
Public Key Recipient Resources
The Public Key Recipient Resources
property is required when the Public Key Source
is configured with the
RESOURCES
value. The property supports one or more file paths or HTTP URLs, which avoids persisting public keys in the
flow configuration. This approach provides a layer of flexibility for use cases where public key recipients must be
managed outside the flow configuration. Multiple file paths or URLs can be specified with comma separators. Following
the same approach as the Public Key Recipients
property value, files referenced as resources must use newlines to
separate multiple public keys. Following the pattern of the --recipients
argument for the
age command, files can contain commented lines that will be ignored when parsing.
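A minimal Python sketch of this file format follows. The `parse_recipients` function is hypothetical and is not the parsing code used by the Processor. It assumes that comment lines begin with `#` and that blank lines are also skipped, following the behavior of the age command.

```python
def parse_recipients(text: str) -> list:
    """Return public keys from newline-separated text, skipping comments and blank lines."""
    recipients = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        recipients.append(line)
    return recipients


example = """# recipient for the archive team
age1lvyvwawkr0mcnnnncaghunadrqkmuf9e6507x9y920xxpp866cnql7dp2z

# retired recipients can be commented out
"""
print(parse_recipients(example))
```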
DecryptContentAge Processor
The
DecryptContentAge
Processor supports decrypting input FlowFiles with one or more configured private key identities. Following the pattern
of the EncryptContentAge
Processor, DecryptContentAge
supports reading private key identities from external
locations. The Processor can decrypt files encoded with either binary or ASCII formatting, performing content detection
based on the initial bytes of each stream.
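Content detection of this kind can be sketched in Python as shown below. The `detect_encoding` function is hypothetical, and the two marker values are assumptions based on the age version line and the age armor header, not a description of the actual Processor code.

```python
BINARY_INTRO = b"age-encryption.org/v1\n"              # assumed first line of a binary age file
ARMOR_INTRO = b"-----BEGIN AGE ENCRYPTED FILE-----"    # assumed first line of an armored age file


def detect_encoding(initial_bytes: bytes):
    """Return "BINARY", "ASCII", or None based on the leading bytes of a stream."""
    if initial_bytes.startswith(BINARY_INTRO):
        return "BINARY"
    if initial_bytes.startswith(ARMOR_INTRO):
        return "ASCII"
    return None
```

Checking only the leading bytes means the decision can be made before reading the remainder of the stream.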
The DecryptContentAge
Processor supports the following configuration properties:
- Private Key Source
- Private Key Identities
- Private Key Identity Resources
The Private Key Source
property determines whether the Private Key Identities
or Private Key Identity Resources
property is required.
Private Key Source
The Private Key Source
property supports either PROPERTIES
or RESOURCES
as the source of private key identities.
The PROPERTIES
setting is the default value, requiring one or more private keys to be specified using the
Private Key Identities
property. The RESOURCES
property value enables the Private Key Identity Resources
property, supporting one or more file paths or HTTP URLs for external resources containing age private keys.
Private Key Identities
The Private Key Identities
property expects one or more X25519
age private keys, with multiple keys separated using
newlines. Each private key line must begin with AGE-SECRET-KEY-1
and contain valid Bech32 encoding. The property is
required when the Private Key Source value is set to PROPERTIES, which is the default setting.
The DecryptContentAge
Processor attempts to read encrypted file keys using each private key identity until it finds a
successful match. As a result of this processing strategy, each additional private key adds a small cost for every
failed attempt, which is negligible for small numbers of keys. Supporting multiple private keys enables key rotation in
coordination with senders responsible for encryption.
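The matching strategy can be illustrated with a short Python sketch. The `unwrap_file_key` function and the `try_unwrap` callable are hypothetical stand-ins: a real implementation would perform X25519 key agreement against the recipient stanzas in the header, which is outside the scope of this sketch.

```python
from typing import Callable, Iterable, Optional


def unwrap_file_key(
    identities: Iterable[str],
    try_unwrap: Callable[[str], Optional[bytes]],
) -> bytes:
    """Try each identity in order and return the first file key recovered."""
    for identity in identities:
        file_key = try_unwrap(identity)  # hypothetical: returns None when the identity does not match
        if file_key is not None:
            return file_key
    raise ValueError("no configured identity matched the encrypted file")
```

Listing the current key first keeps the number of failed attempts low while older keys remain available during a rotation window.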
Private Key Identity Resources
The Private Key Identity Resources
property depends on selecting RESOURCES
as the value of the Private Key Source
property. Similar to the Public Key Recipient Resources
property on the EncryptContentAge
Processor, this property
supports one or more file paths or HTTP URLs.
Using the Private Key Identity Resources
property enables use cases such as external secrets mounted as files in
containerized deployments. Retrieving private keys over HTTP presents potential security concerns, some of which can be
mitigated through the use of transport encryption with HTTPS or limiting access using a localhost connection.
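As a rough aid for reviewing such settings, the Python sketch below splits a resource list and labels each entry by how its transfer is protected. The function names and the category labels are hypothetical, and the comma separator is assumed to match the one described for the Public Key Recipient Resources property.

```python
import ipaddress
from urllib.parse import urlsplit


def split_resources(value: str) -> list:
    """Split a comma-separated property value into trimmed entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_loopback(host) -> bool:
    """Return True for localhost or a loopback IP address."""
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False


def review_resource(resource: str) -> str:
    """Label a resource as file, https, http-loopback, or http-unprotected."""
    parts = urlsplit(resource)
    if parts.scheme == "https":
        return "https"
    if parts.scheme == "http":
        return "http-loopback" if is_loopback(parts.hostname) else "http-unprotected"
    return "file"
```

Entries labeled `http-unprotected` are the ones that deserve attention, since private keys would cross the network without transport encryption.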
Flow Design
Implementing a secure flow design requires more than configuring encryption and decryption processors. It is essential to consider the source and protection of private keys, as well as potential attack vectors based on inputs and outputs. Encrypting information for external recipients presents less of a concern based on the nature of public keys, but such flow designs must consider repository configuration and retention policies for data provenance.
Data flows designed to receive and decrypt files should consider incorporating message verification before and after decryption to avoid processing untrusted information. This is less of a concern for environments with limited access to public keys, but it highlights the fact that the age specification is focused on file encryption as opposed to digital signature verification.
Key Pair Generation
Encryption and decryption processing requires an X25519 key pair. The age-keygen command provides the standard method for generating encoded X25519 public and private keys. The command can be installed on most modern operating systems using common package managers as described in the installation section of the age reference implementation.
For demonstration purposes, the Age Tool is a browser-based interface that uses WebAssembly to run age operations without executing code on the hosting server. The interface supports generating age key pairs, as well as encryption and decryption operations.
Key Sources
Protected access to private key information is essential to maintaining a secure configuration. When configuring keys as property values, the NiFi framework encrypts sensitive property values and stores the encrypted bytes in the local flow configuration. The security of sensitive property values depends on protecting the NiFi sensitive properties key stored in application properties.
For abstracted and reusable configuration values, NiFi Parameter Contexts support defining sensitive values and referencing those values in component properties. The framework uses the same local encryption approach for handling sensitive parameters and sensitive property values. The framework also supports integrating with external configuration services through Parameter Providers. Standard Parameter Providers include support for various secrets management services. Integration with an external secrets management solution simplifies the steps required for rotating keys.
Property Verification
Both EncryptContentAge
and DecryptContentAge
Processors support verification for configured properties. The standard
property validation ensures that public keys start with age1 and private keys start with AGE-SECRET-KEY-1. Property
validation also requires that each key contains the expected number of Bech32 characters, lowercase for public
keys and uppercase for private keys.
In addition to property validation, the NiFi user interface supports interactive property verification. This feature
allows flow designers to run additional manual checks to confirm configuration settings. For EncryptContentAge and
DecryptContentAge, manual verification ensures that the system can read the supplied key using Bech32 decoding and
create required X25519 key objects. Bech32 decoding evaluates the trailing checksum for each configured key. Loading
X25519 key objects ensures that the configured Java Security Provider supports the required algorithms.
Manual verification is not necessary for successful configuration, but it provides helpful diagnostic information prior
to attempting encryption or decryption operations. The verification process also reports the number of public keys for
EncryptContentAge and the number of private keys for DecryptContentAge, providing confirmation of expected settings
when using multiple keys.
Transfer Considerations
Although the age encryption specification uses modern algorithms, flow designs performing encryption and transfer should constrain additional information that could be sent along with encrypted files. For example, sending an encrypted file over HTTP involves some number of request headers. Basic information such as content length is not a concern, but sending other headers that indicate the original filename, content type, or other identifying information may weaken the overall security of the system. The same considerations apply to event messaging or object storage solutions. Some amount of identification is often necessary for tracking, but it is important to consider the full scope of information when designing flows with file encryption.
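One way to apply this guidance is an allowlist that drops every header not explicitly approved for transfer. The Python sketch below is a generic illustration: the `outbound_headers` function and the default allowlist are hypothetical and do not correspond to a specific NiFi Processor property.

```python
# Hypothetical allowlist: content length is treated as safe to send
DEFAULT_ALLOWED = frozenset({"content-length"})


def outbound_headers(attributes: dict, allowed=DEFAULT_ALLOWED) -> dict:
    """Keep only allowlisted headers, comparing names without regard to case."""
    return {name: value for name, value in attributes.items() if name.lower() in allowed}
```

An allowlist fails closed: an attribute added to the flow later is not sent unless someone deliberately approves it.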
Conclusion
The EncryptContentAge
and DecryptContentAge
Processors in Apache NiFi present notable improvements over previous
solutions. Building on an open standard avoids tight coupling to custom solutions. Supporting selected modern algorithms
reduces the potential for misconfiguration or content manipulation. Providing scalable streaming enables data flows to
handle files from kilobytes to gigabytes without complicated segmentation strategies. With NiFi Processors supporting
the age encryption standard, data flow engineers can build scalable, reliable, and secure processing solutions.