ExceptionFactory

Producing content that a reasonable developer might want to read

Modernizing Streaming Encryption with age in Apache NiFi

age Encryption NiFi Security

2023-12-04 • 12 minute read • David Handermann

Background

Apache NiFi has supported several data flow encryption strategies since its initial incubator releases. The EncryptContent Processor evolved over multiple versions to incorporate support for various algorithms and encoding options, including OpenPGP as well as custom serialization strategies. Following the development and standardization of newer key derivation functions and cipher algorithms, many of the options in the EncryptContent Processor no longer provide sufficient security or interoperability guarantees. More recent releases of NiFi have added specialized Processors for decrypting historical information and interoperating with OpenPGP. With these use cases addressed, the requirement remained for encrypting new information using an interoperable specification and modern algorithms.

Introduction

Apache NiFi 2.0.0 Milestone 1 and version 1.24.0 introduced new Processors supporting the age encryption specification. The age standard is an open source specification with a reference implementation written in Go. Building on the Jagged implementation for Java, the NiFi EncryptContentAge and DecryptContentAge Processors support creating data flows using the age specification for streaming encryption and decryption. These Processors follow best practices for stream handling, enabling encryption or decryption of files with sizes ranging from kilobytes to gigabytes, without excessive memory consumption. The EncryptContentAge and DecryptContentAge Processors support the standard elliptic curve X25519 Recipient Type, which provides asymmetric key agreement for one or more recipients. Following the age specification, these Processors use the ChaCha20-Poly1305 cipher algorithm for payload encryption, which incorporates integrity checking as a standard feature of the algorithm. Based on modern algorithms and open standards, EncryptContentAge and DecryptContentAge combine content security, scalable streaming, and simplified configuration, providing an optimal solution for flow-based encryption with Apache NiFi.

Specification Summary

Encryption terminology includes a number of acronyms and phrases outside the common software development lexicon. Evaluating the relative strengths and weaknesses of particular security solutions is often the domain of specialized practitioners. With that background, the present goal is not to cover the mathematical complexities behind the age encryption specification, but rather to provide a summary of key concepts and analogous capabilities.

Selected Algorithms

One primary goal of the age-encryption.org specification is a limited number of configurable options. The first example of this goal in practice is the selection of ChaCha20-Poly1305 as the only supported cipher algorithm for payload encryption and decryption. Although this algorithm is somewhat less common than the ubiquitous Advanced Encryption Standard, ChaCha20-Poly1305 is the only cipher algorithm not based on AES that is supported for Transport Layer Security version 1.3. Where TLS 1.2 and earlier versions supported dozens of cipher suite combinations, TLS 1.3 follows a similar principle of simplicity, supporting fewer than ten options, with only two cipher algorithms.

Both AES-GCM and ChaCha20-Poly1305 support authenticated encryption which incorporates cryptographic integrity checking in the algorithm itself, without requiring additional processing. This feature provides a layer of protection against tampering that might otherwise allow some amount of data modification prior to decryption. The ChaCha20-Poly1305 algorithm uses a standard key length of 256 bits, avoiding the need to decide between different key sizes.

The native algorithm for asymmetric key agreement in the age specification is X25519, which provides strong security with short key pairs using elliptic curve cryptography. Although the age specification allows for extensible types of recipients, X25519 enables common public and private key encryption use cases without the historical concerns surrounding other algorithms.

As opposed to RSA keys, which can be up to several kilobytes in size, an encoded X25519 private key is only 74 characters long, while still providing strong security guarantees. The age specification uses a specialized format named Bech32 for encoding public and private keys, described in Bitcoin Improvement Proposal 173. Although Bech32 differs from the popular Base64 encoding, Bech32 avoids the ambiguities present in Base64 strategies, and also incorporates a checksum that is useful for detecting malformed keys before performing cipher operations.
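The checksum behavior can be illustrated with a minimal Python sketch adapted from the reference code published with BIP 173. It is an illustration only, not the code that NiFi or Jagged runs; the function names are chosen for this example, and the sample values are a BIP 173 test vector and made-up data rather than real age keys.

```python
# Sketch of Bech32 checksum handling adapted from the BIP 173 reference code.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)


def polymod(values):
    """Compute the Bech32 checksum polynomial over a list of 5-bit values."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i, gen in enumerate(GENERATOR):
            if (top >> i) & 1:
                chk ^= gen
    return chk


def hrp_expand(hrp):
    """Expand the human-readable part so that it is covered by the checksum."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def encode(hrp, data):
    """Encode 5-bit data values and append the six-character checksum."""
    values = hrp_expand(hrp) + list(data)
    mod = polymod(values + [0] * 6) ^ 1
    checksum = [(mod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in list(data) + checksum)


def has_valid_checksum(text):
    """Return True when the string is well formed and its checksum matches."""
    if text.lower() != text and text.upper() != text:
        return False  # mixed case is rejected
    text = text.lower()
    pos = text.rfind("1")  # the separator is the last "1" in the string
    if pos < 1 or pos + 7 > len(text):
        return False
    hrp, encoded = text[:pos], text[pos + 1:]
    if any(c not in CHARSET for c in encoded):
        return False
    data = [CHARSET.index(c) for c in encoded]
    return polymod(hrp_expand(hrp) + data) == 1
```

Changing any single character of an encoded string causes has_valid_checksum to return False, which is the property that allows malformed keys to be rejected before any cipher operation starts.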

The difference between an age public key and private key is readily apparent from a visual inspection.

All age X25519 public keys begin with age1 and consist of lowercase characters. All age X25519 private keys begin with AGE-SECRET-KEY-1 and consist of uppercase characters. The Bech32 format restricts the set of valid characters. Taken together, these qualities provide unambiguous definition and straightforward processing.

Strict Encoding

In addition to mandating specific cipher and key agreement algorithms, the age specification defines strict rules for encoding header and payload information. The standard encoding requires an ASCII header with recipient information and a binary payload consisting of segments containing no more than 64 kilobytes each. The age specification also supports strict PEM encoding with standard header and footer lines bounding a Base64 content section.

The header portion of a file encrypted using age contains a message authentication code that implementations must validate prior to payload processing. These strict encoding rules enable robust parsing logic that is capable of detecting tampering prior to decryption processing. The segmentation strategy for encrypted payloads enables streamed processing of large messages without unnecessary buffering.
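The segmentation strategy can be sketched in Python as follows. This is a simplified illustration based on a reading of the age specification, which describes a 12-byte nonce for each segment built from an 11-byte big-endian counter and a final-segment flag. The function names are hypothetical, and the ChaCha20-Poly1305 encryption step itself is omitted.

```python
import io

CHUNK_SIZE = 64 * 1024  # plaintext bytes per segment


def segment_nonce(counter, final):
    """Build the nonce: 11-byte big-endian counter plus a final-segment flag."""
    return counter.to_bytes(11, "big") + (b"\x01" if final else b"\x00")


def iter_segments(stream, size=CHUNK_SIZE):
    """Yield (nonce, chunk) pairs while holding at most two chunks in memory.

    Assumes stream.read(size) returns fewer than size bytes only at the end
    of the stream, as io.BytesIO and buffered file objects do.
    """
    counter = 0
    current = stream.read(size)
    while True:
        # Read ahead by one chunk to learn whether the current one is the last
        following = stream.read(size)
        final = not following
        yield segment_nonce(counter, final), current
        if final:
            return
        current = following
        counter += 1


# Example: a 150,000 byte payload is split into three segments
for nonce, chunk in iter_segments(io.BytesIO(b"x" * 150_000)):
    print(nonce.hex(), len(chunk))
```

Because each segment is processed independently, memory use stays bounded by the segment size regardless of the total file size.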

These encoding properties avoid the need for complex parsing that can be difficult to implement and maintain. The age specification maintainers also created an extensive set of community cryptographic test vectors that implementations can use to verify correctness in various positive and negative processing scenarios.

Processor Overview

Building on the age encryption specification, the EncryptContentAge and DecryptContentAge Processors provide modern streaming encryption with straightforward configuration.

EncryptContentAge Processor

The EncryptContentAge Processor is capable of encrypting input FlowFiles to one or more recipients using standard properties. The Processor also supports externalizing recipient configuration so that public keys can be provided using files or HTTP resources. Based on the default configuration, the Processor is ready to use after providing a public key.

The EncryptContentAge Processor supports the following configuration properties: File Encoding, Public Key Source, Public Key Recipients, and Public Key Recipient Resources.

The Processor requires either Public Key Recipients or Public Key Recipient Resources depending on the value selected for the Public Key Source property.

File Encoding

The File Encoding property supports either BINARY or ASCII for the selected value. The default BINARY setting provides the best performance without the overhead of Base64 output encoding, but the ASCII option is available for specialized use cases where binary representation is not suitable. Base64 encoding for ASCII increases the size of encrypted files by roughly one third, so it is not ideal for large files. The ASCII option writes encrypted files using strict PEM encoding with header and footer lines.
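The size difference between the two settings can be estimated with a short Python sketch. It assumes the armor format described in the age specification, with Base64 lines wrapped at 64 columns between the BEGIN and END lines; the function names are hypothetical and the input bytes stand in for an already encrypted binary file.

```python
import base64

ARMOR_HEADER = "-----BEGIN AGE ENCRYPTED FILE-----"
ARMOR_FOOTER = "-----END AGE ENCRYPTED FILE-----"
LINE_LENGTH = 64  # Base64 characters per line


def armored_size(binary_size):
    """Estimate the ASCII output size in bytes for a given binary size."""
    encoded = 4 * ((binary_size + 2) // 3)
    lines = (encoded + LINE_LENGTH - 1) // LINE_LENGTH
    # Every body line, the header, and the footer end with a newline
    return encoded + lines + len(ARMOR_HEADER) + 1 + len(ARMOR_FOOTER) + 1


def armor(binary):
    """Wrap binary content in PEM-style header, body, and footer lines."""
    encoded = base64.b64encode(binary).decode("ascii")
    lines = [encoded[i:i + LINE_LENGTH] for i in range(0, len(encoded), LINE_LENGTH)]
    return "\n".join([ARMOR_HEADER, *lines, ARMOR_FOOTER]) + "\n"


# Ratio of ASCII size to binary size for a file of about 3 MB
print(armored_size(3_000_000) / 3_000_000)
```

For files of any significant size the fixed header and footer are negligible, and the ratio settles slightly above 4/3 because of the added line breaks.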

Public Key Source

The Public Key Source property supports either PROPERTIES or RESOURCES for providing public key recipient information. With PROPERTIES set as the default value, the Public Key Recipients property supports configuring and storing recipients within the flow configuration. The RESOURCES setting enables the Public Key Recipient Resources property, which supports one or more paths or URLs for external storage and retrieval of age public keys.

Public Key Recipients

The Public Key Recipients property, required in the default configuration, expects one or more X25519 age public keys to be configured. According to the standard encoding, each public key must begin with age1 and must be limited to Bech32 encoded characters. Multiple public keys can be specified with newline separators. Although the name itself suggests that public keys do not need additional protection, the Public Key Recipients property is sensitive so that the application will encrypt values for storage within the flow configuration.

Protecting access to public keys can provide an additional layer of data assurance as described in age and Authenticated Encryption from Filippo Valsorda, author of the age specification and Go reference implementation. Limiting access to age public keys is not necessary for general use cases, but the Processor handles the property as a sensitive value to support various flow designs.

Public Key Recipient Resources

The Public Key Recipient Resources property is required when the Public Key Source is configured with the RESOURCES value. The property supports one or more file paths or HTTP URLs, which avoids persisting public keys in the flow configuration. This approach provides a layer of flexibility for use cases where public key recipients must be managed outside the flow configuration. Multiple file paths or URLs can be specified with comma separators. Following the same approach as the Public Key Recipients property value, files referenced as resources must use newlines to separate multiple public keys. Following the pattern of the --recipients argument for the age command, files can contain commented lines that will be ignored when parsing.
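The expected layout of a recipients resource can be illustrated with a small Python sketch. The parsing function is hypothetical and only mirrors the rules described above: one public key per line, with blank lines and lines starting with # ignored. The keys in the sample are placeholders, not valid age public keys.

```python
SAMPLE_RECIPIENTS = """\
# Operations team (placeholder value, not a real key)
age1<first-recipient>

# Archive service (placeholder value, not a real key)
age1<second-recipient>
"""


def parse_recipients(text):
    """Return public key lines, skipping blank lines and # comments."""
    recipients = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        recipients.append(line)
    return recipients


print(parse_recipients(SAMPLE_RECIPIENTS))
```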

DecryptContentAge Processor

The DecryptContentAge Processor supports decrypting input FlowFiles with one or more configured private key identities. Following the pattern of the EncryptContentAge Processor, DecryptContentAge supports reading private key identities from external locations. The Processor can decrypt files encoded with either binary or ASCII formatting, performing content detection based on the initial bytes of each stream.
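Content detection of this kind can be sketched in Python by comparing the first bytes of a stream against the two possible openings defined in the age specification: the version line for binary files and the PEM header line for ASCII files. The article does not show how the Processor implements detection, so the function below is an assumption for illustration only.

```python
BINARY_PREFIX = b"age-encryption.org/v1\n"
ARMOR_PREFIX = b"-----BEGIN AGE ENCRYPTED FILE-----"


def detect_encoding(initial_bytes):
    """Classify a stream as BINARY or ASCII from its first bytes, else None."""
    if initial_bytes.startswith(BINARY_PREFIX):
        return "BINARY"
    if initial_bytes.startswith(ARMOR_PREFIX):
        return "ASCII"
    return None


print(detect_encoding(b"age-encryption.org/v1\n-> X25519 "))
```

Only a few dozen bytes need to be inspected, so detection does not require buffering the encrypted payload.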

The DecryptContentAge Processor supports the following configuration properties: Private Key Source, Private Key Identities, and Private Key Identity Resources.

The Private Key Source property determines whether the Private Key Identities or Private Key Identity Resources property is required.

Private Key Source

The Private Key Source property supports either PROPERTIES or RESOURCES as the source of private key identities. The PROPERTIES setting is the default value, requiring one or more private keys to be specified using the Private Key Identities property. The RESOURCES property value enables the Private Key Identity Resources property, supporting one or more paths to external resources containing age private keys.

Private Key Identities

The Private Key Identities property expects one or more X25519 age private keys, with multiple keys separated using newlines. Each private key line must begin with AGE-SECRET-KEY-1 and contain valid Bech32 encoding. The property is required when the Private Key Source value is set to PROPERTIES, which is the default setting.

The DecryptContentAge Processor attempts to read encrypted file keys using each private key identity until it finds a successful match. As a result of this processing strategy, each additional private key adds a marginal processing cost, which remains minimal for small numbers of keys. Supporting multiple private keys enables key rotation in coordination with senders responsible for encryption.
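The matching strategy can be outlined with a short Python sketch. The try_unwrap argument is a hypothetical callable standing in for the X25519 key agreement and file key decryption that a real implementation performs; it is assumed to return the file key on success and None otherwise.

```python
def unwrap_file_key(identities, stanzas, try_unwrap):
    """Return the first file key that any identity can unwrap, else None.

    identities: configured private keys, in configured order
    stanzas: recipient entries read from the file header
    try_unwrap: hypothetical callable (identity, stanza) -> file key or None
    """
    for identity in identities:
        for stanza in stanzas:
            file_key = try_unwrap(identity, stanza)
            if file_key is not None:
                return file_key
    return None
```

In the worst case the number of attempts is the number of identities multiplied by the number of recipient entries in the header, which is why a small number of configured keys has little practical impact.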

Private Key Identity Resources

The Private Key Identity Resources property depends on selecting RESOURCES as the value of the Private Key Source property. Similar to the Public Key Recipient Resources property on the EncryptContentAge Processor, this property supports one or more file paths or HTTP URLs.

Using the Private Key Identity Resources property enables use cases such as external secrets mounted as files in containerized deployments. Retrieving private keys over HTTP presents potential security concerns, some of which can be mitigated through the use of transport encryption with HTTPS or limiting access using a localhost connection.

Flow Design

Implementing a secure flow design requires more than configuring encryption and decryption processors. It is essential to consider the source and protection of private keys, as well as potential attack vectors based on inputs and outputs. Encrypting information for external recipients presents less of a concern based on the nature of public keys, but such flow designs must consider repository configuration and retention policies for data provenance.

Data flows designed to receive and decrypt files should consider incorporating message verification before and after decryption to avoid processing untrusted information. This is less of a concern for environments with limited access to public keys, but it highlights the fact that the age specification is focused on file encryption as opposed to digital signature verification.

Key Pair Generation

Encryption and decryption processing requires an X25519 key pair. The age-keygen command provides the standard method for generating encoded X25519 public and private keys. The command can be installed on most modern operating systems using common package managers as described in the installation section of the age reference implementation.
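The key file written by age-keygen contains comment lines, including one that records the public key, followed by the private key line. The following Python sketch separates the two values so that each can be supplied to the corresponding Processor. The function name is hypothetical, and the sample content uses placeholders instead of real key material.

```python
SAMPLE_KEY_FILE = """\
# created: 2023-12-04T00:00:00Z
# public key: age1<placeholder-public-key>
AGE-SECRET-KEY-1<PLACEHOLDER-PRIVATE-KEY>
"""


def split_key_file(text):
    """Return (public_key, identities) parsed from age-keygen style output."""
    public_key = None
    identities = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# public key:"):
            # The public key is recorded in a comment for convenience
            public_key = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            identities.append(line)
    return public_key, identities


public_key, identities = split_key_file(SAMPLE_KEY_FILE)
print(public_key)       # value for EncryptContentAge
print(len(identities))  # number of private keys for DecryptContentAge
```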

For demonstration purposes, the Age Tool is a browser-based interface that uses WebAssembly to run age operations without executing code on the hosting server. The interface supports generating age key pairs, as well as encryption and decryption operations.

Key Sources

Protected access to private key information is essential to maintaining a secure configuration. When configuring keys as property values, the NiFi framework encrypts sensitive property values and stores the encrypted bytes in the local flow configuration. The security of sensitive property values depends on protecting the NiFi sensitive properties key stored in application properties.

For abstracted and reusable configuration values, NiFi Parameter Contexts support defining sensitive values and referencing those values in component properties. The framework uses the same local encryption approach for handling sensitive parameters and sensitive property values. The framework also supports integrating with external configuration services through Parameter Providers. Standard Parameter Providers include support for various secrets management services. Integration with an external secrets management solution simplifies the steps required for rotating keys.

Property Verification

Both EncryptContentAge and DecryptContentAge Processors support verification for configured properties. The standard property validation ensures that public keys start with age1 and private keys start with AGE-SECRET-KEY-1. Property validation also requires that both keys contain the expected number of Bech32 characters, lowercase for public keys, and uppercase for private keys.
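Format checks of this kind can be approximated with regular expressions, as in the following Python sketch. The patterns are an assumption for illustration and are not taken from the NiFi source code. The length of 58 characters after the prefix follows from a 32-byte key encoded as 52 Bech32 characters plus a six-character checksum, which also matches the 74-character total for private keys.

```python
import re

BECH32 = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Prefix followed by 52 data characters and 6 checksum characters
PUBLIC_KEY = re.compile(rf"age1[{BECH32}]{{58}}")
PRIVATE_KEY = re.compile(rf"AGE-SECRET-KEY-1[{BECH32.upper()}]{{58}}")


def is_public_key_format(value):
    """Check prefix, character set, and length of a public key."""
    return PUBLIC_KEY.fullmatch(value) is not None


def is_private_key_format(value):
    """Check prefix, character set, and length of a private key."""
    return PRIVATE_KEY.fullmatch(value) is not None
```

A check like this confirms only the shape of a key. It does not evaluate the Bech32 checksum, so a key can pass format validation and still fail when decoded.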

In addition to property validation, the NiFi user interface supports interactive property verification. This feature allows flow designers to run additional manual checks to confirm configuration settings. For EncryptContentAge and DecryptContentAge, manual verification ensures that the system can read the supplied key using Bech32 decoding and create required X25519 key objects. Bech32 decoding evaluates the trailing checksum for each configured key. Loading X25519 key objects ensures that the configured Java Security Provider supports the required algorithms.

Manual verification is not necessary for successful configuration, but it provides helpful diagnostic information prior to attempting encryption or decryption operations. The verification process also reports the number of public keys for EncryptContentAge and the number of private keys for DecryptContentAge, providing confirmation of expected settings when using multiple keys.

Transfer Considerations

Although the age encryption specification uses modern algorithms, flow designs performing encryption and transfer should constrain additional information that could be sent along with encrypted files. For example, sending an encrypted file over HTTP involves some number of request headers. Basic information such as content length is not a concern, but sending other headers that indicate the original filename, content type, or other identifying information may weaken the overall security of the system. The same considerations apply to event messaging or object storage solutions. Some amount of identification is often necessary for tracking, but it is important to consider the full scope of information when designing flows with file encryption.
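One way to apply this principle is an explicit allow-list for outbound metadata, sketched below in Python. The header names and the allow-list contents are hypothetical examples; an actual flow would make the equivalent choice through Processor properties and attribute handling rather than custom code.

```python
# Hypothetical allow-list: only metadata that reveals nothing about the content
ALLOWED_HEADERS = frozenset({"content-length"})


def filter_headers(headers, allowed=ALLOWED_HEADERS):
    """Keep only allow-listed headers, comparing names case-insensitively."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in allowed
    }


candidate = {
    "Content-Length": "1048576",
    "Content-Type": "application/pdf",
    "X-Original-Filename": "payroll-2023.pdf",
}
print(filter_headers(candidate))
```

An allow-list fails closed: a newly introduced attribute is withheld until someone decides that it is safe to send.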

Conclusion

The EncryptContentAge and DecryptContentAge Processors in Apache NiFi present notable improvements over previous solutions. Building on an open standard avoids tight coupling to custom solutions. Supporting selected modern algorithms reduces the potential for misconfiguration or content manipulation. Providing scalable streaming enables data flows to handle files from kilobytes to gigabytes without complicated segmentation strategies. With NiFi Processors supporting the age encryption standard, data flow engineers can build scalable, reliable, and secure processing solutions.