Modernizing Streaming Encryption with age in Apache NiFi
Background
Apache NiFi has supported several data flow encryption strategies since its initial incubator releases. The
EncryptContent Processor evolved over multiple versions to incorporate support for various algorithms and encoding
options, including OpenPGP as well as custom serialization strategies. Following the development and standardization of
newer key derivation functions and cipher algorithms, many of the options in the EncryptContent Processor no longer
provide sufficient security or interoperability guarantees. More recent releases of NiFi have added specialized
Processors for decrypting historical information and interoperating with OpenPGP. With these use cases addressed, the
requirement remained for encrypting new information using an interoperable specification and modern algorithms.
Introduction
Apache NiFi 2.0.0 Milestone 1 and version 1.24.0 introduced new Processors supporting the age encryption
specification. The age standard is an open source specification with a reference implementation written in Go.
Building on the Jagged implementation for Java, the NiFi EncryptContentAge and DecryptContentAge Processors support
creating data flows that use the age specification for streaming encryption and decryption. These Processors follow
best practices for stream handling, enabling encryption or decryption of files ranging from kilobytes to gigabytes
without excessive memory consumption. The EncryptContentAge and DecryptContentAge Processors support the standard
elliptic curve X25519 Recipient Type, which provides asymmetric key agreement for one or more recipients. Following the
age specification, these Processors use the ChaCha20-Poly1305 cipher algorithm for payload encryption, which
incorporates integrity checking as a standard feature of the algorithm. Based on modern algorithms and open standards,
EncryptContentAge and DecryptContentAge combine content security, scalable streaming, and simplified configuration,
providing a well-suited solution for flow-based encryption with Apache NiFi.
Specification Summary
Encryption terminology includes a number of acronyms and phrases outside the common software development lexicon. Evaluating the relative strengths and weaknesses of particular security solutions is often the domain of specialized practitioners. With that background, the present goal is not to cover the mathematical complexities behind the age encryption specification, but rather to summarize key concepts and analogous capabilities.
Selected Algorithms
One primary goal of the age-encryption.org specification is a limited number of configurable options. The first
example of this goal in practice is the selection of ChaCha20-Poly1305 as the only supported cipher algorithm for
payload encryption and decryption. Although this algorithm is somewhat less common than the ubiquitous Advanced
Encryption Standard, ChaCha20-Poly1305 is the only cipher algorithm other than AES defined for the cipher suites of
Transport Layer Security version 1.3. Whereas TLS 1.2 and earlier versions supported dozens of cipher suite
combinations, TLS 1.3 follows a similar principle of simplicity, defining only five cipher suites built on two cipher
algorithms.
Both AES-GCM and ChaCha20-Poly1305 support authenticated encryption, which incorporates cryptographic integrity checking
in the algorithm itself, without requiring additional processing. This feature protects against tampering with
encrypted data: decryption fails when the ciphertext has been modified, instead of silently producing altered plaintext.
The ChaCha20-Poly1305 algorithm uses a fixed key length of 256 bits, avoiding the need to decide between different key
sizes.
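The following Java sketch illustrates authenticated encryption using the ChaCha20-Poly1305 Cipher that the JDK has included since Java 11. The AeadDemo class and its methods are hypothetical names used only for this example. They are not part of NiFi or Jagged. A real age implementation also derives keys and nonces as the specification requires, rather than generating them ad hoc. The sketch shows the 16-byte Poly1305 tag appended to the ciphertext, and it shows that input with a single flipped bit is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Optional;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demonstration class, not part of NiFi or Jagged
final class AeadDemo {
    static final int KEY_LENGTH = 32;   // fixed 256-bit key
    static final int NONCE_LENGTH = 12; // 96-bit nonce

    private static Cipher newCipher(int mode, byte[] key, byte[] nonce) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("ChaCha20-Poly1305");
        cipher.init(mode, new SecretKeySpec(key, "ChaCha20"), new IvParameterSpec(nonce));
        return cipher;
    }

    /** Returns the ciphertext followed by a 16-byte Poly1305 authentication tag. */
    static byte[] encrypt(byte[] key, byte[] nonce, byte[] plaintext) {
        try {
            return newCipher(Cipher.ENCRYPT_MODE, key, nonce).doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Returns the plaintext, or empty when the integrity check fails. */
    static Optional<byte[]> decrypt(byte[] key, byte[] nonce, byte[] ciphertext) {
        try {
            return Optional.of(newCipher(Cipher.DECRYPT_MODE, key, nonce).doFinal(ciphertext));
        } catch (AEADBadTagException e) {
            return Optional.empty(); // tampered or corrupted input: no plaintext is released
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[KEY_LENGTH];
        byte[] nonce = new byte[NONCE_LENGTH];
        random.nextBytes(key);
        random.nextBytes(nonce);

        byte[] plaintext = "flow content".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = encrypt(key, nonce, plaintext);
        System.out.println("Tag overhead: " + (ciphertext.length - plaintext.length) + " bytes");

        byte[] tampered = Arrays.copyOf(ciphertext, ciphertext.length);
        tampered[0] ^= 0x01; // flip a single bit
        System.out.println("Original accepted: " + decrypt(key, nonce, ciphertext).isPresent());
        System.out.println("Tampered accepted: " + decrypt(key, nonce, tampered).isPresent());
    }
}
```

Running the main method should report 16 bytes of tag overhead. It should then show the original ciphertext being accepted and the tampered copy being rejected.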
The native algorithm for asymmetric key agreement in the age specification is X25519, which provides strong security
with short key pairs using elliptic curve cryptography. Although the age specification allows for extensible types of
recipients, X25519 enables common public and private key encryption use cases without the historical concerns
surrounding other algorithms.
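The JDK KeyAgreement API, available since Java 11, can sketch the X25519 key agreement behind the age recipient type. In age, the sender generates an ephemeral key pair for each recipient. The sender then passes the resulting shared secret through HKDF-SHA-256 to derive a key that wraps the per-file key. This sketch stops at the agreement step, and the X25519Demo name is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.KeyAgreement;

// Hypothetical demonstration class, not part of NiFi or Jagged
final class X25519Demo {
    /** Generates an X25519 key pair using the default JDK provider. */
    static KeyPair generateKeyPair() {
        try {
            return KeyPairGenerator.getInstance("X25519").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("X25519 not supported", e);
        }
    }

    /** Computes the 32-byte shared secret from a private key and the peer public key. */
    static byte[] sharedSecret(PrivateKey privateKey, PublicKey peerPublicKey) {
        try {
            KeyAgreement agreement = KeyAgreement.getInstance("X25519");
            agreement.init(privateKey);
            agreement.doPhase(peerPublicKey, true);
            return agreement.generateSecret();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("X25519 key agreement failed", e);
        }
    }

    public static void main(String[] args) {
        KeyPair ephemeral = generateKeyPair(); // sender side, one per recipient in age
        KeyPair recipient = generateKeyPair(); // long-lived recipient key pair
        byte[] senderSecret = sharedSecret(ephemeral.getPrivate(), recipient.getPublic());
        byte[] recipientSecret = sharedSecret(recipient.getPrivate(), ephemeral.getPublic());
        System.out.println("Secrets match: " + Arrays.equals(senderSecret, recipientSecret));
    }
}
```

Both parties arrive at the same secret without transmitting it. This allows a public key to serve as the only configuration an encrypting flow needs.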
As opposed to RSA keys, which can be up to several kilobytes in
size, an encoded X25519 private key is only 74 characters long, while still providing strong security guarantees. The
age specification uses a specialized format named Bech32 for encoding public and private keys, described in
Bitcoin Enhancement Proposal 173.
Although Bech32 differs from the popular Base64 encoding, Bech32 avoids ambiguities present in Base64 strategies, such
as alternative alphabets, optional padding, and visually similar characters. Bech32 also incorporates a checksum that
is useful for detecting malformed keys before performing cipher operations.
The difference between an age public key and private key is readily apparent from a visual inspection.
- Public Key: age1lvyvwawkr0mcnnnncaghunadrqkmuf9e6507x9y920xxpp866cnql7dp2z
- Private Key: AGE-SECRET-KEY-1N9JEPW6DWJ0ZQUDX63F5A03GX8QUW7PXDE39N8UYF82VZ9PC8UFS3M7XA9
All age X25519 public keys begin with age1 and consist of lowercase characters. All age X25519 private keys begin
with AGE-SECRET-KEY-1 and consist of uppercase characters. The Bech32 format restricts the set of valid characters.
Taken together, these qualities provide unambiguous definition and straightforward processing.
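The Bech32 checksum is compact enough to sketch in full. The following Java example follows the polymod algorithm published in BIP 173 and rejects mixed-case strings, as the BIP requires. The Bech32 class name is hypothetical. The sketch verifies only the checksum. It does not enforce the 90-character limit from BIP 173, and it does not decode the key bytes.

```java
import java.util.Locale;

// Hypothetical helper following the BIP 173 reference algorithm
final class Bech32 {
    private static final String CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
    private static final int[] GENERATOR = {0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3};

    /** BCH checksum polynomial from BIP 173; a valid string yields 1. */
    private static int polymod(int[] values) {
        int checksum = 1;
        for (int value : values) {
            int top = checksum >>> 25;
            checksum = ((checksum & 0x1ffffff) << 5) ^ value;
            for (int i = 0; i < 5; i++) {
                if (((top >>> i) & 1) == 1) {
                    checksum ^= GENERATOR[i];
                }
            }
        }
        return checksum;
    }

    /** Returns true when the string has a valid prefix, separator, alphabet, and checksum. */
    static boolean verify(String encoded) {
        String lower = encoded.toLowerCase(Locale.ROOT);
        if (!encoded.equals(lower) && !encoded.equals(encoded.toUpperCase(Locale.ROOT))) {
            return false; // mixed case is invalid
        }
        int separator = lower.lastIndexOf('1');
        if (separator < 1 || separator + 7 > lower.length()) {
            return false; // requires a non-empty prefix and at least 6 checksum characters
        }
        int prefixLength = separator;
        String data = lower.substring(separator + 1);
        // Expanded prefix: high bits of each character, a zero, then low bits of each character
        int[] values = new int[prefixLength * 2 + 1 + data.length()];
        for (int i = 0; i < prefixLength; i++) {
            char c = lower.charAt(i);
            if (c < 33 || c > 126) {
                return false;
            }
            values[i] = c >>> 5;
            values[prefixLength + 1 + i] = c & 31;
        }
        for (int i = 0; i < data.length(); i++) {
            int digit = CHARSET.indexOf(data.charAt(i));
            if (digit < 0) {
                return false; // character outside the Bech32 alphabet
            }
            values[prefixLength * 2 + 1 + i] = digit;
        }
        return polymod(values) == 1;
    }

    public static void main(String[] args) {
        System.out.println(verify("a12uel5l"));
        System.out.println(verify("a12uel5m"));
    }
}
```

The string a12uel5l is one of the valid test vectors from BIP 173. Replacing its final character causes verification to fail, because Bech32 is designed to detect any error affecting up to four characters.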
Strict Encoding
In addition to mandating specific cipher and key agreement algorithms, the age specification defines strict rules for encoding header and payload information. The standard encoding requires an ASCII header with recipient information and a binary payload consisting of segments containing no more than 64 kilobytes each. The age specification also supports strict PEM encoding with standard header and footer lines bounding a Base64 content section.
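The segmentation rules translate into simple size arithmetic. According to the age specification, the binary payload starts with a 16-byte random nonce. Each chunk of up to 64 KiB of plaintext is encrypted with ChaCha20-Poly1305, which adds a 16-byte tag. Each chunk nonce is an 11-byte big-endian counter followed by a flag byte that is 0x01 for the final chunk and 0x00 otherwise. An empty payload is represented by a single empty final chunk. The AgePayloadLayout class below is a hypothetical sketch of that arithmetic, not code from NiFi or Jagged.

```java
// Hypothetical sketch of age payload size arithmetic, not NiFi or Jagged code
final class AgePayloadLayout {
    static final int CHUNK_SIZE = 64 * 1024;  // maximum plaintext bytes per chunk
    static final int TAG_SIZE = 16;           // Poly1305 tag appended to each chunk
    static final int PAYLOAD_NONCE_SIZE = 16; // random nonce at the start of the payload

    /** Number of chunks; an empty payload still has one empty final chunk. */
    static long chunkCount(long plaintextLength) {
        if (plaintextLength == 0) {
            return 1;
        }
        return (plaintextLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Encrypted payload length: nonce, plaintext bytes, and one tag per chunk. */
    static long encryptedPayloadLength(long plaintextLength) {
        return PAYLOAD_NONCE_SIZE + plaintextLength + chunkCount(plaintextLength) * TAG_SIZE;
    }

    /** 12-byte chunk nonce: 11-byte big-endian counter followed by the final-chunk flag. */
    static byte[] chunkNonce(long counter, boolean last) {
        byte[] nonce = new byte[12];
        // A long fills bytes 3 through 10; bytes 0 through 2 of the counter remain zero
        for (int i = 0; i < 8; i++) {
            nonce[10 - i] = (byte) (counter >>> (8 * i));
        }
        nonce[11] = (byte) (last ? 0x01 : 0x00);
        return nonce;
    }

    public static void main(String[] args) {
        long oneGibibyte = 1L << 30;
        long chunks = chunkCount(oneGibibyte);
        long overhead = encryptedPayloadLength(oneGibibyte) - oneGibibyte;
        System.out.println(chunks + " chunks, " + overhead + " bytes of overhead");
    }
}
```

For a 1 GiB input, the sketch computes 16,384 chunks and 262,160 bytes of overhead. That overhead is the payload nonce plus one tag per chunk. Since each chunk is authenticated independently, a streaming implementation needs to hold only about one chunk in memory at a time.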
The header portion of a file encrypted using age contains a message authentication code that implementations must validate prior to payload processing. These strict encoding rules enable robust parsing logic that is capable of detecting tampering prior to decryption processing. The segmentation strategy for encrypted payloads enables streamed processing of large messages without unnecessary buffering.
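Based on the age specification, the header MAC is HMAC-SHA-256 computed over the header up to and including the --- marker. Its key is HKDF-SHA-256 with the file key as input keying material, an empty salt, and the info string "header". The following Java sketch implements HKDF from RFC 5869 directly on top of the JDK HmacSHA256 Mac. The HeaderMac class is hypothetical. It does not parse stanzas or decode the Base64 MAC that follows the marker.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the age header MAC, not NiFi or Jagged code
final class HeaderMac {
    private static final int HASH_LENGTH = 32;

    private static byte[] hmacSha256(byte[] key, byte[]... parts) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            for (byte[] part : parts) {
                mac.update(part);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA-256 not available", e);
        }
    }

    /** HKDF-SHA-256 extract and expand as defined in RFC 5869. */
    static byte[] hkdfSha256(byte[] ikm, byte[] salt, byte[] info, int length) {
        // An absent salt means HashLen zero bytes; SecretKeySpec rejects empty keys
        byte[] prk = hmacSha256(salt.length == 0 ? new byte[HASH_LENGTH] : salt, ikm);
        byte[] output = new byte[length];
        byte[] block = new byte[0];
        int offset = 0;
        for (int counter = 1; offset < length; counter++) {
            // T(n) = HMAC(PRK, T(n-1) || info || n)
            block = hmacSha256(prk, block, info, new byte[] {(byte) counter});
            int count = Math.min(block.length, length - offset);
            System.arraycopy(block, 0, output, offset, count);
            offset += count;
        }
        return output;
    }

    /** MAC over the header bytes up to and including the "---" marker. */
    static byte[] computeMac(byte[] fileKey, byte[] headerThroughMarker) {
        byte[] info = "header".getBytes(StandardCharsets.US_ASCII);
        byte[] macKey = hkdfSha256(fileKey, new byte[0], info, HASH_LENGTH);
        return hmacSha256(macKey, headerThroughMarker);
    }

    /** Comparison time depends only on the computed MAC length, not on where bytes differ. */
    static boolean verifyMac(byte[] fileKey, byte[] headerThroughMarker, byte[] expectedMac) {
        return MessageDigest.isEqual(computeMac(fileKey, headerThroughMarker), expectedMac);
    }

    public static void main(String[] args) {
        byte[] fileKey = new byte[16]; // age file keys are 16 bytes
        new SecureRandom().nextBytes(fileKey);
        // Placeholder header text for illustration; not a complete, valid age header
        byte[] header = "age-encryption.org/v1\n-> example\n---".getBytes(StandardCharsets.US_ASCII);
        byte[] mac = computeMac(fileKey, header);
        header[0] ^= 0x01; // modify one header byte after computing the MAC
        System.out.println("Modified header accepted: " + verifyMac(fileKey, header, mac));
    }
}
```

Because the MAC key derives from the file key, a reader can check the header only after unwrapping the file key. Any change to the recipient stanzas is still rejected before a single payload chunk is decrypted.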
These encoding properties avoid the need for complex parsing that can be difficult to implement and maintain. The age specification maintainers also created an extensive set of community cryptographic test vectors that implementations can use to verify correctness in various positive and negative processing scenarios.
Processor Overview
Building on the age encryption specification, the EncryptContentAge and DecryptContentAge Processors provide modern
streaming encryption with straightforward configuration.
EncryptContentAge Processor
The EncryptContentAge Processor is capable of encrypting input FlowFiles to one or more recipients using standard properties. The Processor also supports externalizing recipient configuration so that public keys can be provided using files or HTTP resources. Based on the default configuration, the Processor is ready to use after providing a public key.
The EncryptContentAge Processor supports the following configuration properties:
- File Encoding
- Public Key Source
- Public Key Recipients
- Public Key Recipient Resources
The Processor requires either Public Key Recipients or Public Key Recipient Resources depending on the value
selected for the Public Key Source property.
File Encoding
The File Encoding property supports either BINARY or ASCII for the selected value. The default BINARY setting
provides the best performance without the overhead of Base64 output encoding, but the ASCII option is available for
specialized use cases where binary representation is not suitable. Base64 encoding for ASCII increases the size of
encrypted files by roughly one third, so it is not ideal for large files. The ASCII option writes encrypted files using
strict PEM encoding with header and footer lines.
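The strict PEM encoding can be sketched with the JDK MIME Base64 encoder. Based on the age specification, the armor surrounds padded standard Base64 with the BEGIN AGE ENCRYPTED FILE and END AGE ENCRYPTED FILE labels, and the Base64 is wrapped at 64 columns. The AgeArmor class is hypothetical. Its dearmor method is deliberately lenient and omits checks, such as line length validation, that a strict parser would enforce.

```java
import java.util.Base64;

// Hypothetical helper illustrating age ASCII armor, not the NiFi implementation
final class AgeArmor {
    static final String BEGIN = "-----BEGIN AGE ENCRYPTED FILE-----";
    static final String END = "-----END AGE ENCRYPTED FILE-----";
    private static final Base64.Encoder ENCODER = Base64.getMimeEncoder(64, new byte[] {'\n'});

    /** Wraps binary age output in PEM-style armor with 64-column padded Base64 lines. */
    static String armor(byte[] binary) {
        // The MIME encoder inserts LF every 64 characters but not after the final line
        String body = ENCODER.encodeToString(binary);
        return BEGIN + "\n" + body + "\n" + END + "\n";
    }

    /** Lenient reverse operation: checks labels, then decodes the joined body lines. */
    static byte[] dearmor(String armored) {
        String[] lines = armored.split("\n");
        if (lines.length < 3 || !lines[0].equals(BEGIN) || !lines[lines.length - 1].equals(END)) {
            throw new IllegalArgumentException("Missing armor header or footer");
        }
        StringBuilder body = new StringBuilder();
        for (int i = 1; i < lines.length - 1; i++) {
            body.append(lines[i]);
        }
        return Base64.getDecoder().decode(body.toString());
    }

    public static void main(String[] args) {
        byte[] binary = new byte[3000];
        String armored = armor(binary);
        System.out.println(binary.length + " bytes -> " + armored.length() + " characters");
    }
}
```

Encoding 3,000 bytes produces 4,000 Base64 characters before line breaks and labels. This size increase is why BINARY is the better choice for large files.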
Public Key Source
The Public Key Source property supports either PROPERTIES or RESOURCES for providing public key recipient
information. With PROPERTIES set as the default value, the Public Key Recipients property supports configuring and
storing recipients within the flow configuration. The RESOURCES setting enables the Public Key Recipient Resources
property, which supports one or more paths or URLs for external storage and retrieval of age public keys.
Public Key Recipients
The Public Key Recipients property, required in the default configuration, expects one or more X25519 age public keys
to be configured. According to the standard encoding, each public key must begin with age1 and must be limited to
Bech32 encoded characters. Multiple public keys can be specified with newline separators. Although the name itself
suggests that public keys do not need additional protection, the Public Key Recipients property is marked as sensitive,
so the application encrypts its value for storage within the flow configuration.
Protecting access to public keys can provide an additional layer of data assurance, as described in the article age and Authenticated Encryption by Filippo Valsorda, author of the age specification and the Go reference implementation. Limiting access to age public keys is not necessary for general use cases, but the Processor handles the property as a sensitive value to support various flow designs.
Public Key Recipient Resources
The Public Key Recipient Resources property is required when the Public Key Source is configured with the
RESOURCES value. The property supports one or more file paths or HTTP URLs, which avoids persisting public keys in the
flow configuration. This approach provides a layer of flexibility for use cases where public key recipients must be
managed outside the flow configuration. Multiple file paths or URLs can be specified with comma separators. Following
the same approach as the Public Key Recipients property value, files referenced as resources must use newlines to
separate multiple public keys. Following the pattern of the --recipients-file argument for the age command, files can
contain comment lines beginning with # that are ignored when parsing.
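A few lines of Java can parse recipient files that follow these conventions. The RecipientParser class below is a hypothetical sketch, not the NiFi parser. It skips blank lines and lines starting with #, and it rejects any remaining line that does not begin with age1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser illustrating recipient file conventions, not the NiFi implementation
final class RecipientParser {
    /** Returns age1 public keys, skipping blank lines and lines starting with '#'. */
    static List<String> parse(String content) {
        List<String> recipients = new ArrayList<>();
        for (String line : content.split("\\R")) { // \R matches LF, CRLF, and other line breaks
            String trimmed = line.strip();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            if (!trimmed.startsWith("age1")) {
                throw new IllegalArgumentException("Unsupported recipient: " + trimmed);
            }
            recipients.add(trimmed);
        }
        return recipients;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "# Recipients for partner deliveries",
                "age1lvyvwawkr0mcnnnncaghunadrqkmuf9e6507x9y920xxpp866cnql7dp2z",
                "",
                "# Additional recipients follow the same format");
        System.out.println(parse(content).size() + " recipient(s)");
    }
}
```

Rejecting unexpected lines early surfaces configuration mistakes before any content is encrypted, instead of silently dropping a recipient.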
DecryptContentAge Processor
The DecryptContentAge Processor supports decrypting input FlowFiles with one or more configured private key identities.
Following the pattern of the EncryptContentAge Processor, DecryptContentAge supports reading private key identities
from external locations. The Processor can decrypt files encoded with either binary or ASCII formatting, performing
content detection based on the initial bytes of each stream.
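Content detection of this kind can be sketched by peeking at the initial bytes of a stream and then resetting it. Binary age files begin with the age-encryption.org/v1 version line. ASCII files begin with the BEGIN AGE ENCRYPTED FILE armor line. The AgeFormatDetector class and its Format enum are hypothetical names, and the NiFi Processor may implement detection differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical detector illustrating prefix-based format detection
final class AgeFormatDetector {
    enum Format { BINARY, ASCII, UNKNOWN }

    private static final byte[] BINARY_PREFIX = "age-encryption.org/v1\n".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ASCII_PREFIX = "-----BEGIN AGE ENCRYPTED FILE-----".getBytes(StandardCharsets.US_ASCII);
    private static final int PEEK_LENGTH = Math.max(BINARY_PREFIX.length, ASCII_PREFIX.length);

    /** Peeks at the initial bytes and resets the stream so decryption can read from the start. */
    static Format detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark and reset");
        }
        try {
            in.mark(PEEK_LENGTH);
            byte[] head = in.readNBytes(PEEK_LENGTH);
            in.reset();
            if (startsWith(head, BINARY_PREFIX)) {
                return Format.BINARY;
            }
            if (startsWith(head, ASCII_PREFIX)) {
                return Format.ASCII;
            }
            return Format.UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) {
        byte[] armored = "-----BEGIN AGE ENCRYPTED FILE-----\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(armored)));
    }
}
```

Only a few dozen bytes are buffered for detection, so the rest of the stream can still be processed without loading the full file.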
The DecryptContentAge Processor supports the following configuration properties:
- Private Key Source
- Private Key Identities
- Private Key Identity Resources
The Private Key Source property determines whether the Private Key Identities or Private Key Identity Resources
property is required.
Private Key Source
The Private Key Source property supports either PROPERTIES or RESOURCES as the source of private key identities.
The PROPERTIES setting is the default value, requiring one or more private keys to be specified using the
Private Key Identities property. The RESOURCES property value enables the Private Key Identity Resources
property, supporting one or more paths to external resources containing age private keys.
Private Key Identities
The Private Key Identities property expects one or more X25519 age private keys, with multiple keys separated using
newlines. Each private key line must begin with AGE-SECRET-KEY-1 and contain valid Bech32 encoding. The property is
required when the Private Key Source value is set to PROPERTIES, which is the default setting.
The DecryptContentAge Processor attempts to unwrap the encrypted file key using each private key identity until it
finds a successful match. As a result of this processing strategy, configuring multiple private keys incurs a marginal cost. The
processing cost is minimal for small numbers of keys. Supporting multiple private keys enables key rotation in
coordination with senders responsible for encryption.
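A short Java sketch can summarize the trial strategy. The Identity interface and IdentityMatcher class are hypothetical abstractions, not the NiFi or Jagged API, and stanzas are simplified to strings. Each identity attempts to unwrap the file key from each recipient stanza in the header, and the search stops at the first success. In the worst case, the number of attempts equals the number of identities multiplied by the number of stanzas, which is the marginal cost described above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical abstractions illustrating trial decryption, not the NiFi or Jagged API
final class IdentityMatcher {
    /** An identity returns the file key when it can unwrap the given recipient stanza. */
    interface Identity {
        Optional<byte[]> unwrap(String stanza);
    }

    /** Tries each identity against each stanza and stops at the first success. */
    static Optional<byte[]> findFileKey(List<Identity> identities, List<String> stanzas) {
        for (Identity identity : identities) {
            for (String stanza : stanzas) {
                Optional<byte[]> fileKey = identity.unwrap(stanza);
                if (fileKey.isPresent()) {
                    return fileKey;
                }
            }
        }
        return Optional.empty(); // no configured identity matched any stanza
    }

    public static void main(String[] args) {
        Identity retired = stanza -> Optional.empty();
        Identity current = stanza -> stanza.equals("stanza-for-current")
                ? Optional.of(new byte[16])
                : Optional.empty();
        Optional<byte[]> fileKey = findFileKey(List.of(retired, current), List.of("stanza-for-current"));
        System.out.println("File key found: " + fileKey.isPresent());
    }
}
```

During a rotation, listing both the retired and current keys lets the flow decrypt files from senders that have not yet switched. The retired key can be removed once all senders use the new public key.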
Private Key Identity Resources
The Private Key Identity Resources property depends on selecting RESOURCES as the value of the Private Key Source
property. Similar to the Public Key Recipient Resources property on the EncryptContentAge Processor, this property
supports one or more file paths or HTTP URLs.
Using the Private Key Identity Resources property enables use cases such as external secrets mounted as files in
containerized deployments. Retrieving private keys over HTTP presents potential security concerns, some of which can be
mitigated through the use of transport encryption with HTTPS or limiting access using a localhost connection.
Flow Design
Implementing a secure flow design requires more than configuring encryption and decryption Processors. It is essential to consider the source and protection of private keys, as well as potential attack vectors based on inputs and outputs. Encrypting information for external recipients presents less of a concern based on the nature of public keys, but such flow designs must still consider content repository configuration and data provenance retention policies, since NiFi repositories can retain unencrypted content from steps that run before encryption.
Data flows designed to receive and decrypt files should consider incorporating message verification before and after decryption to avoid processing untrusted information. Anyone holding a recipient public key can produce a file that decrypts successfully, so successful decryption alone does not establish the identity of the sender. This is less of a concern for environments with limited access to public keys, but it highlights the fact that the age specification focuses on file encryption rather than digital signature verification.
Key Pair Generation
Encryption and decryption processing requires an X25519 key pair. The age-keygen command provides the standard method for generating encoded X25519 public and private keys. The command can be installed on most modern operating systems using common package managers as described in the installation section of the age reference implementation.
For demonstration purposes, the Age Tool is a browser-based interface that uses WebAssembly to run age operations without executing code on the hosting server. The interface supports generating age key pairs, as well as encryption and decryption operations.
Key Sources
Protected access to private key information is essential to maintaining a secure configuration. When configuring keys as property values, the NiFi framework encrypts sensitive property values and stores the encrypted bytes in the local flow configuration. The security of sensitive property values depends on protecting the NiFi sensitive properties key stored in application properties.
For abstracted and reusable configuration values, NiFi Parameter Contexts support defining sensitive values and referencing those values in component properties. The framework uses the same local encryption approach for handling sensitive parameters and sensitive property values. The framework also supports integrating with external configuration services through Parameter Providers. Standard Parameter Providers include support for various secrets management services. Integration with an external secrets management solution simplifies the steps required for rotating keys.
Property Verification
Both EncryptContentAge and DecryptContentAge Processors support verification for configured properties. The standard
property validation ensures that public keys start with age1 and private keys start with AGE-SECRET-KEY-1. Property
validation also requires that both keys contain the expected number of Bech32 characters, lowercase for public
keys, and uppercase for private keys.
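The shape checks described for property validation can be expressed as regular expressions. A 32-byte X25519 key encodes to 52 Bech32 characters, and the checksum adds 6 more. Both key types therefore carry 58 characters after the prefix, which matches the 74-character private key length noted earlier. The AgeKeyValidator class is a hypothetical sketch that checks only prefix, case, alphabet, and length. Checksum validation corresponds to the manual verification step.

```java
import java.util.regex.Pattern;

// Hypothetical validator illustrating the shape checks, not the NiFi validator
final class AgeKeyValidator {
    // Bech32 alphabet from BIP 173, which excludes 1, b, i, and o
    private static final String LOWER = "[qpzry9x8gf2tvdw0s3jn54khce6mua7l]";
    private static final String UPPER = "[QPZRY9X8GF2TVDW0S3JN54KHCE6MUA7L]";

    // 32 key bytes encode to 52 characters, plus a 6-character checksum
    private static final Pattern PUBLIC_KEY = Pattern.compile("age1" + LOWER + "{58}");
    private static final Pattern PRIVATE_KEY = Pattern.compile("AGE-SECRET-KEY-1" + UPPER + "{58}");

    static boolean isPublicKey(String value) {
        return PUBLIC_KEY.matcher(value).matches();
    }

    static boolean isPrivateKey(String value) {
        return PRIVATE_KEY.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPublicKey("age1lvyvwawkr0mcnnnncaghunadrqkmuf9e6507x9y920xxpp866cnql7dp2z"));
        System.out.println(isPrivateKey("AGE-SECRET-KEY-1N9JEPW6DWJ0ZQUDX63F5A03GX8QUW7PXDE39N8UYF82VZ9PC8UFS3M7XA9"));
    }
}
```

Pattern checks like these are cheap enough to run on every configuration change. The checksum and key object creation remain better suited to on-demand verification.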
In addition to property validation, the NiFi user interface supports interactive property verification. This feature
allows flow designers to run additional manual checks to confirm configuration settings. For EncryptContentAge and
DecryptContentAge, manual verification ensures that the system can read the supplied key using Bech32 decoding and
create required X25519 key objects. Bech32 decoding evaluates the trailing checksum for each configured key. Loading
X25519 key objects ensures that the configured Java Security Provider supports the required algorithms.
Manual verification is not necessary for successful configuration, but it provides helpful diagnostic information prior
to attempting encryption or decryption operations. The verification process also reports the number of public keys for
EncryptContentAge and the number of private keys for DecryptContentAge, providing confirmation of expected settings
when using multiple keys.
Transfer Considerations
Although the age encryption specification uses modern algorithms, flow designs performing encryption and transfer should constrain additional information that could be sent along with encrypted files. For example, sending an encrypted file over HTTP involves some number of request headers. Basic information such as content length is not a concern, but sending other headers that indicate the original filename, content type, or other identifying information may weaken the overall security of the system. The same considerations apply to event messaging or object storage solutions. Some amount of identification is often necessary for tracking, but it is important to consider the full scope of information when designing flows with file encryption.
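One way to constrain transfer metadata is to apply an allowlist before sending encrypted content. The following Java sketch keeps only approved header names, compared case-insensitively. The TransferHeaders class and the x-request-id header are hypothetical examples, not NiFi components or properties.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical allowlist filter, not a NiFi component or property
final class TransferHeaders {
    // Only headers needed for transport and tracking; names are illustrative
    private static final Set<String> ALLOWED = Set.of("content-length", "x-request-id");

    /** Returns a copy containing only allowlisted headers, compared case-insensitively. */
    static Map<String, String> restrict(Map<String, String> headers) {
        Map<String, String> allowed = new TreeMap<>();
        headers.forEach((name, value) -> {
            if (ALLOWED.contains(name.toLowerCase(Locale.ROOT))) {
                allowed.put(name, value);
            }
        });
        return allowed;
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of(
                "Content-Length", "1048592",
                "Content-Type", "text/csv",
                "Content-Disposition", "attachment; filename=\"payroll.csv\"");
        System.out.println(restrict(headers));
    }
}
```

An allowlist fails closed: any new attribute added upstream stays internal until someone decides it is safe to send, whereas a blocklist would leak it by default.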
Conclusion
The EncryptContentAge and DecryptContentAge Processors in Apache NiFi present notable improvements over previous
solutions. Building on an open standard avoids tight coupling to custom solutions. Supporting selected modern algorithms
reduces the potential for misconfiguration or content manipulation. Providing scalable streaming enables data flows to
handle files from kilobytes to gigabytes without complicated segmentation strategies. With NiFi Processors supporting
the age encryption standard, data flow engineers can build scalable, reliable, and secure processing solutions.