ExceptionFactory

Producing content that a reasonable developer might want to read

Backward Compatible Content Decryption in Apache NiFi

NiFi Security Encryption

2023-02-20 • 11 minute read • David Handermann

Background

RSA Laboratories published version 1.5 of Public Key Cryptography Standard number 5 in 1993, outlining a set of strategies for password-based encryption. RFC 2898 codified PKCS 5 version 2.0, incorporating new key derivation functions in the year 2000. Seventeen years later, RFC 8018 obsoleted RFC 2898 and added new hash functions for PKCS 5 version 2.1. After three decades, the PKCS 5 standard includes a combination of modern algorithms and historical encryption schemes. Password-Based Encryption Scheme 1, described in RFC 8018 Section 6.1, should not be used for new encryption operations, but supporting PBES1 for decryption enables access to information protected using legacy strategies.

Introduction

Apache NiFi 0.5.0 and following have supported a number of algorithms and encryption strategies using the EncryptContent Processor. With an extensive set of configurable properties, EncryptContent supports either encryption or decryption using password-based encryption or hexadecimal-encoded keys. Supported options include weak cipher algorithms and unsafe approaches to key derivation. The Processor is also capable of modern authenticated encryption with strong key derivation functions such as Argon2. One of the drawbacks, however, is that EncryptContent uses binary formatting that is specific to the Processor. In order to maintain backward compatibility with historical algorithms and specialized formatting, NiFi 1.20.0 introduced the DecryptContentCompatibility and DecryptContent Processors. These Processors enable streamlined and selective decryption strategies for existing data flows and historical data sources.

Component Capabilities

Both DecryptContent and DecryptContentCompatibility support a smaller set of properties than the EncryptContent Processor. Selecting the appropriate Processor depends on the format of incoming files. The new Processors implement different file processing strategies and different algorithms to provide a clearer distinction between legacy encryption schemes and custom binary formatting.

At a basic level, DecryptContentCompatibility supports legacy password-based encryption schemes with algorithms and key derivation functions that do not follow current cryptographic best practices. The DecryptContentCompatibility Processor enables interoperation with historical formats, but should not be used for new data flow solutions.

The DecryptContent Processor uses the Advanced Encryption Standard algorithm with configurable cipher modes as well as modern key derivation functions. DecryptContent is suitable for integrating with other Apache NiFi systems using the EncryptContent Processor, but DecryptContent is not capable of reading information encrypted using other services. The EncryptContentPGP and DecryptContentPGP Processors can support interoperable exchanges using the OpenPGP standard.

Both DecryptContent and DecryptContentCompatibility Processors include documentation with additional details that provides a mapping from EncryptContent property values to new property values.

DecryptContentCompatibility Processor

As the name implies, the DecryptContentCompatibility Processor is designed for the purpose of compatibility with legacy algorithms and historical formats. The Processor supports Password-Based Encryption Scheme 1 standards defined in RFC 8018 Section 6.1 using a configurable strategy that specifies how to read a cryptographic salt from binary input files.
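As a rough illustration of what PBES1 involves, the following Python sketch implements PBKDF1 as described in RFC 8018 Section 5.1 and splits the 16 derived bytes into a key and an initialization vector as described in Section 6.1. The function names are hypothetical and this is not the NiFi implementation, which is written in Java; it only shows how little work the legacy derivation performs.

```python
import hashlib


def pbkdf1(password: bytes, salt: bytes, iterations: int, dk_len: int,
           hash_name: str = "md5") -> bytes:
    """PBKDF1 (RFC 8018 Section 5.1): hash password || salt, then re-hash the digest."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.new(hash_name, password + salt).digest()
    if dk_len > len(digest):
        raise ValueError("derived key too long for the selected hash")
    for _ in range(iterations - 1):
        digest = hashlib.new(hash_name, digest).digest()
    return digest[:dk_len]


def pbes1_key_and_iv(password: bytes, salt: bytes, iterations: int):
    """PBES1 (RFC 8018 Section 6.1): first 8 derived bytes are the key, next 8 the IV."""
    derived = pbkdf1(password, salt, iterations, 16)
    return derived[:8], derived[8:]


# Illustrative values only: an 8-byte salt and 1000 iterations
key, iv = pbes1_key_and_iv(b"password", b"12345678", 1000)
```

The derived key is limited to the digest size of the hash, which is one reason PBES1 is restricted to short keys and older ciphers.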

The Processor supports the following properties: Encryption Scheme, Key Derivation Strategy, and Password.

The Processor requires values for each property.

Versions of Java prior to 1.8.0-161 required the export-controlled Unlimited Strength Java Cryptography Extension Policy to use longer passwords with certain algorithms. Java 1.8.0-161 and following do not have this limitation.

Encryption Schemes

The Encryption Scheme property supports several options from PBES1 as well as other options that EncryptContent supported. The property values follow a standard convention consisting of a standard prefix indicating Password-Based Encryption. The next element in the scheme defines the hash algorithm for key derivation. The final elements define the cipher algorithm and key size in bits. For example, PBE_WITH_MD5_AND_DES describes password-based encryption with the MD5 hash algorithm for key derivation and the deprecated Data Encryption Standard cipher algorithm for decryption.
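The naming convention can be made concrete with a small parser. This is a sketch in Python, assuming only the pattern described above (prefix, hash algorithm, cipher, optional key size); the regular expression and the EncryptionScheme type are hypothetical and are not part of NiFi.

```python
import re
from typing import NamedTuple, Optional


class EncryptionScheme(NamedTuple):
    hash_algorithm: str
    cipher: str
    key_bits: Optional[int]


# PBE_WITH_<hash>_AND_<cipher>[_<key size in bits>]
SCHEME_PATTERN = re.compile(
    r"^PBE_WITH_(?P<hash>[A-Z0-9]+)_AND_(?P<cipher>[A-Z0-9_]+?)(?:_(?P<bits>\d+))?$"
)


def parse_encryption_scheme(name: str) -> EncryptionScheme:
    """Split an Encryption Scheme value into its hash, cipher, and key size parts."""
    match = SCHEME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized encryption scheme: {name}")
    bits = match.group("bits")
    return EncryptionScheme(
        match.group("hash"),
        match.group("cipher"),
        int(bits) if bits is not None else None,
    )


print(parse_encryption_scheme("PBE_WITH_MD5_AND_DES"))
print(parse_encryption_scheme("PBE_WITH_MD5_AND_AES_CBC_256"))
```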

Although supported encryption schemes include options with AES_CBC, the Cipher-Block-Chaining mode of operation can be subject to timing attacks. AES-CBC does not provide integrity protection, which is one of the reasons service providers such as Cloudflare have highlighted the decline of AES-CBC for Transport Layer Security communication.

Aside from potential attacks, the primary problem with supported AES-CBC encryption schemes is the use of weak key derivation methods. Although SHA-256 is considered a strong hashing algorithm, it does not provide sufficient security for encryption key derivation. The relative speed of computing an SHA-256 hash increases the possibility of brute force attacks, which is a more serious concern as processing power continues to grow.

With backward compatibility being the primary focus of the DecryptContentCompatibility Processor, the additional details documentation includes a table that maps new Encryption Scheme property values to corresponding Encryption Algorithm settings from the EncryptContent Processor. For example, MD5_AES256 in EncryptContent corresponds to PBE_WITH_MD5_AND_AES_CBC_256 in DecryptContentCompatibility. The new property values provide closer association to the internal algorithm names and also incorporate additional cipher mode information.

Key Derivation Strategies

The Key Derivation Strategy property controls how the Processor generates the symmetric key for decryption. The property also influences how the Processor parses the cryptographic salt from incoming binary files. The available strategies align with historical capabilities in EncryptContent and other sources.

The OPENSSL_EVP_BYTES_TO_KEY option is a reference to the OpenSSL Envelope BytesToKey method of key derivation. As noted in the documentation, the function is compatible with PKCS 5 version 1.5 when using the MD5 hash algorithm. The function expects to read cryptographic salt bytes after a conventional header consisting of eight ASCII character bytes. When the Processor does not find Salted__ in the first eight bytes, the OPENSSL_EVP_BYTES_TO_KEY option derives the decryption key without a salt. The OpenSSL method generates a symmetric key using a single iteration of the configured hash algorithm.
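The OpenSSL approach can be sketched in a few lines of Python. The example assumes the conventional OpenSSL layout of an eight-byte Salted__ marker followed by eight salt bytes, and a single hash iteration per block; the function names are hypothetical and the code is an approximation of the technique rather than the Processor's Java implementation.

```python
import hashlib

OPENSSL_MAGIC = b"Salted__"
OPENSSL_SALT_LENGTH = 8  # assumption: conventional OpenSSL salt size


def read_openssl_salt(data: bytes):
    """Return (salt, ciphertext); the salt is empty when the header is absent."""
    if data[:len(OPENSSL_MAGIC)] == OPENSSL_MAGIC:
        start = len(OPENSSL_MAGIC)
        end = start + OPENSSL_SALT_LENGTH
        return data[start:end], data[end:]
    return b"", data


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int, iv_len: int,
                     hash_name: str = "md5"):
    """Derive key and IV by chaining single-iteration digests of block || password || salt."""
    material = b""
    block = b""
    while len(material) < key_len + iv_len:
        block = hashlib.new(hash_name, block + password + salt).digest()
        material += block
    return material[:key_len], material[key_len:key_len + iv_len]


salt, ciphertext = read_openssl_salt(b"Salted__" + b"12345678" + b"payload")
key, iv = evp_bytes_to_key(b"password", salt, 16, 16)
```

With MD5 and a 16-byte output, the first block is a single MD5 digest of the password and salt, which matches PBKDF1 with one iteration.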

The JASYPT_STANDARD option is a reference to the Jasypt library for Java Simplified Encryption. This JASYPT_STANDARD option is equivalent to the NiFi Legacy KDF setting in the EncryptContent Processor. The Jasypt method derives from the default implementation in org.jasypt.encryption.pbe.StandardPBEByteEncryptor. The Jasypt encryptor class determined the cryptographic salt length based on the block size of the selected cipher algorithm, and used 1000 iterations of the specified hash algorithm. The JASYPT_STANDARD option implements the same approach. When configured with an Encryption Scheme that uses the AES cipher algorithm, the Jasypt strategy uses a salt length of 16 bytes. The DecryptContentCompatibility Processor reads the first 16 bytes of input to derive the decryption key and then transforms the remaining bytes using the configured cipher algorithm.
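The input handling for the Jasypt strategy amounts to splitting a fixed-length salt from the front of the file. A minimal Python sketch follows, assuming the salt length equals the cipher block size as described above; the function name and constant are hypothetical, and the key derivation itself is left out.

```python
JASYPT_ITERATIONS = 1000  # iteration count described for the Jasypt strategy


def split_jasypt_input(data: bytes, cipher_block_size: int = 16):
    """Return (salt, ciphertext), where the salt length equals the cipher block size."""
    if len(data) < cipher_block_size:
        raise ValueError("input is shorter than the expected salt length")
    return data[:cipher_block_size], data[cipher_block_size:]


# AES has a 16-byte block size, so the salt occupies the first 16 bytes
salt, ciphertext = split_jasypt_input(bytes(16) + b"encrypted-bytes")
```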

DecryptContent Processor

The DecryptContent Processor supports decryption using either a raw key or a password-derived key. Unlike the DecryptContentCompatibility Processor, DecryptContent supports modern key derivation and cipher algorithms, including AES-GCM, which incorporates integrity checking. The DecryptContent Processor provides a streamlined set of configuration properties based on the ability to select the required key derivation function according to the binary header information from incoming files.

The Processor supports the following properties: Cipher Algorithm Mode, Cipher Algorithm Padding, Key Specification Format, and Key Specification.

Cipher Algorithm Mode and Padding

The Cipher Algorithm Mode and Cipher Algorithm Padding properties control the behavior of the AES cipher transformation.

The DecryptContent Processor includes documentation with additional details describing the mapping between the EncryptContent properties and DecryptContent properties. For example, the AES_GCM setting for Encryption Algorithm in EncryptContent corresponds to GCM for the Cipher Algorithm Mode and NoPadding for Cipher Algorithm Padding in the DecryptContent Processor.

In practice, the PKCS5Padding option for Cipher Algorithm Padding is applicable only to the CBC mode, which corresponds to AES_CBC for Encryption Algorithm in the EncryptContent Processor.

It is important to note that selecting the wrong mode and padding will result in failures and unexpected content when attempting to decrypt files with settings that do not match the original encryption.

Key Specification and Format Properties

The Key Specification Format instructs the Processor how to handle the Key Specification property value.

The default PASSWORD value for Key Specification Format indicates that the associated Key Specification value should be handled as a password and used to derive the symmetric key for decryption. The binary header of each incoming file determines the key derivation function and parameters for generating the symmetric key. Following the implementation from the EncryptContent Processor, the derived key will have a length of 128 bits.

The RAW value for Key Specification Format instructs the Processor to decode the Key Specification value as a hexadecimal string. DecryptContent will use the decoded bytes as the symmetric key and will not attempt to derive a key. This corresponds to the Raw Key property on the EncryptContent Processor.
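Handling a RAW key specification can be pictured as a hexadecimal decode followed by a length check. The sketch below is in Python with a hypothetical function name; the accepted lengths are the standard AES key sizes, which is an assumption about validation rather than a description of the Processor's exact behavior.

```python
AES_KEY_LENGTHS = (16, 24, 32)  # bytes, for AES-128, AES-192, and AES-256


def decode_raw_key(key_specification: str) -> bytes:
    """Decode a hexadecimal Key Specification into symmetric key bytes."""
    key = bytes.fromhex(key_specification)
    if len(key) not in AES_KEY_LENGTHS:
        raise ValueError(f"unsupported AES key length: {len(key)} bytes")
    return key


key = decode_raw_key("000102030405060708090a0b0c0d0e0f")
```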

Component Implementation

The DecryptContentCompatibility Processor uses standard Java cryptography interfaces together with the Bouncy Castle Security Provider to implement password-based encryption. The Processor also includes basic binary processing for each of the key derivation strategies.

The DecryptContent Processor also uses standard Java cryptography interfaces for cipher operations, but includes an additional set of components to process structured header information.

Structured Binary Header

The DecryptContent Processor requires incoming FlowFiles to be structured according to project specifications implemented in Apache NiFi 0.5.0. The internal structure uses standard delimiters to indicate a required cryptographic initialization vector and a cryptographic salt, which is required for password-based key derivation. The initialization vector delimiter consists of the ASCII character bytes for NiFiIV and the salt delimiter consists of the ASCII character bytes for NiFiSALT within the first 256 bytes of incoming FlowFiles.

For content encrypted using a raw key, the binary header does not include a salt. In this format, the first 16 bytes of the incoming file contain the IV, followed by the NiFiIV delimiter and the encrypted binary payload.

For content encrypted using a password, the binary header begins with a variable length salt header, followed by the NiFiSALT delimiter. After the salt delimiter, the file follows the standard structure consisting of an IV, NiFiIV delimiter, and encrypted binary payload.
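The two layouts can be read with a single routine that looks for the delimiters near the start of the content. The following Python sketch assumes the structure described above and searches only the first 256 bytes; the names are hypothetical and the code is a simplified illustration, not the parsing logic from NiFi.

```python
from typing import NamedTuple, Optional

SALT_DELIMITER = b"NiFiSALT"
IV_DELIMITER = b"NiFiIV"
HEADER_SEARCH_LIMIT = 256


class EncryptedContent(NamedTuple):
    salt_header: Optional[bytes]  # None when the content was encrypted with a raw key
    iv: bytes
    payload: bytes


def parse_structured_header(data: bytes) -> EncryptedContent:
    """Split content into optional salt header, IV, and encrypted payload."""
    head = data[:HEADER_SEARCH_LIMIT]
    salt_header = None
    offset = 0
    salt_end = head.find(SALT_DELIMITER)
    if salt_end != -1:
        salt_header = data[:salt_end]
        offset = salt_end + len(SALT_DELIMITER)
    iv_end = head.find(IV_DELIMITER, offset)
    if iv_end == -1:
        raise ValueError("NiFiIV delimiter not found in the first 256 bytes")
    iv = data[offset:iv_end]
    payload = data[iv_end + len(IV_DELIMITER):]
    return EncryptedContent(salt_header, iv, payload)


iv = bytes(range(16))
raw_key_content = parse_structured_header(iv + b"NiFiIV" + b"ciphertext")
```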

The length and structure of the salt header depend on the key derivation function that EncryptContent used during the original encryption process. The use of distinct formatting for each key derivation function enables DecryptContent to determine the necessary function and parameters.

Supported Key Derivation Functions

The DecryptContent Processor supports the following key derivation functions as implemented in Apache NiFi 0.5.0 and subsequent versions: Argon2, bcrypt, PBKDF2, and scrypt.

Each function algorithm supports different types of configurable parameters that influence the amount of resources necessary to derive a symmetric key. The supported functions correspond to the allowable values for the Key Derivation Function in the EncryptContent Processor.

Argon2 Header

The Argon2 implementation follows the recommendation of RFC 9106 Section 7.4 and uses the Argon2id hybrid variant. The implementation follows RFC 9106 Section 1 and expects Argon2 version 1.3 to be specified in the salt header. As implemented for the EncryptContent Processor, the Argon2 salt header uses the Password Hashing Competition string format to indicate the required time cost, memory cost, and parallelism cost parameter values, along with the random cryptographic salt bytes encoded in Base64.

The Argon2 header consists of ASCII character bytes at the beginning of encrypted FlowFiles.

$argon2id$v=19$m=65536,t=3,p=1$QXJnb24yU2FsdFN0cmluZw

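The header contains the following parameter values: the argon2id variant, version 19 (hexadecimal 13 for Argon2 version 1.3), a memory cost of 65536, a time cost of 3, a parallelism cost of 1, and the salt QXJnb24yU2FsdFN0cmluZw encoded in Base64 without padding.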
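The fields of the example Argon2 header can be separated with a short parser. This Python sketch assumes the dollar-delimited layout shown in the example and restores the Base64 padding that the format omits; the function name and dictionary keys are hypothetical.

```python
import base64


def parse_argon2_header(header: str) -> dict:
    """Split an Argon2 salt header into variant, version, cost parameters, and salt."""
    _, variant, version, costs, salt_b64 = header.split("$")
    parameters = dict(item.split("=") for item in costs.split(","))
    padded = salt_b64 + "=" * (-len(salt_b64) % 4)  # restore omitted padding
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),  # 19 is 0x13, Argon2 version 1.3
        "memory_kib": int(parameters["m"]),
        "iterations": int(parameters["t"]),
        "parallelism": int(parameters["p"]),
        "salt": base64.b64decode(padded),
    }


fields = parse_argon2_header("$argon2id$v=19$m=65536,t=3,p=1$QXJnb24yU2FsdFN0cmluZw")
print(fields)
```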

bcrypt Header

The bcrypt implementation builds on the password hashing algorithm designed for OpenBSD. The key derivation process applies an SHA-512 digest to the bcrypt hash and selects the first 16 bytes as the derived symmetric key.

The bcrypt header consists of ASCII character bytes that include a specified work factor.

$2a$12$R9h/cIPz0gi.URNNX3kh2O

The header contains the following parameter values: the version identifier 2a, a work factor of 12, and the encoded salt R9h/cIPz0gi.URNNX3kh2O.

The OpenBSD bcrypt hash uses Base64 for the salt bytes, but uses an alphabet that does not follow RFC 4648, requiring an alternative decoder to read the bytes.
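One way to read the bcrypt salt is to translate the OpenBSD alphabet to the RFC 4648 alphabet before using a standard decoder. The Python sketch below takes that approach with the example header; the function name is hypothetical, and the translation table is an illustration of the idea rather than the decoder used in NiFi.

```python
import base64

BCRYPT_ALPHABET = b"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
STANDARD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
TO_STANDARD = bytes.maketrans(BCRYPT_ALPHABET, STANDARD_ALPHABET)


def parse_bcrypt_header(header: str) -> dict:
    """Split a bcrypt salt header into version, work factor, and decoded salt bytes."""
    _, version, cost, salt_text = header.split("$")
    translated = salt_text.encode("ascii").translate(TO_STANDARD)
    padded = translated + b"=" * (-len(translated) % 4)  # bcrypt omits padding
    return {
        "version": version,
        "cost": int(cost),
        "salt": base64.b64decode(padded),
    }


fields = parse_bcrypt_header("$2a$12$R9h/cIPz0gi.URNNX3kh2O")
```

The 22 encoded characters decode to a 16-byte salt.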

PBKDF2 Header

The PBKDF2 implementation follows the standard outlined in RFC 8018 Section 5.2 using HMAC-SHA-512 as the pseudo-random function. The binary header does not include the number of iterations required, so the DecryptContent Processor relies on the historical default value of 160,000. The PBKDF2 header does not include any parameter information and contains an array of raw bytes.
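For comparison, the PBKDF2 derivation described here can be reproduced with the Python standard library. This is a sketch under the parameters stated above (HMAC-SHA-512, 160,000 iterations, a 128-bit key); the function name is hypothetical and the UTF-8 password encoding is an assumption.

```python
import hashlib

PBKDF2_ITERATIONS = 160_000
DERIVED_KEY_LENGTH = 16  # 128-bit key


def derive_pbkdf2_key(password: str, salt: bytes) -> bytes:
    """Derive a 128-bit key using PBKDF2 with HMAC-SHA-512."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        password.encode("utf-8"),  # assumption: password encoded as UTF-8
        salt,
        PBKDF2_ITERATIONS,
        dklen=DERIVED_KEY_LENGTH,
    )


key = derive_pbkdf2_key("password", bytes(range(16)))
```

Because the iteration count is not stored in the header, content encrypted with a different count cannot be detected or decrypted with this approach.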

scrypt Header

The scrypt implementation is based on RFC 7914, which defines three configurable parameters. A Java implementation of scrypt from Will Glozer defined a modified version of the modular crypt format that encodes configurable parameters as a 32-bit hexadecimal integer. The DecryptContent Processor uses bit shifting to decode the cost, block size, and parallelization parameters.

The scrypt header consists of ASCII character bytes with the parameters and salt bytes.

$s0$e0801$epIxT/h6HbbwHaehFnh/bw

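The header contains the following parameter values: the format identifier s0, the hexadecimal parameters e0801, which encode a cost of 16384 (2 to the power of 14), a block size of 8, and a parallelization of 1, and the salt epIxT/h6HbbwHaehFnh/bw encoded in Base64 without padding.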
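The bit shifting for the scrypt parameters can be shown with the example header. This Python sketch assumes the layout from the Will Glozer format, where the upper 16 bits hold the base-2 logarithm of the cost and the lower two bytes hold the block size and parallelization; the function name and dictionary keys are hypothetical.

```python
import base64


def parse_scrypt_header(header: str) -> dict:
    """Decode the packed scrypt parameters and salt from a $s0$ header."""
    _, version, params_hex, salt_b64 = header.split("$")
    params = int(params_hex, 16)
    padded = salt_b64 + "=" * (-len(salt_b64) % 4)  # restore omitted padding
    return {
        "version": version,
        "cost": 1 << ((params >> 16) & 0xFFFF),  # N = 2 ** log2(N)
        "block_size": (params >> 8) & 0xFF,      # r
        "parallelization": params & 0xFF,        # p
        "salt": base64.b64decode(padded),
    }


fields = parse_scrypt_header("$s0$e0801$epIxT/h6HbbwHaehFnh/bw")
print(fields["cost"], fields["block_size"], fields["parallelization"])
```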

Key Derivation Implementation

The key derivation components in DecryptContent use the Bouncy Castle library to perform specific algorithm operations. This approach provides a solid foundation for supporting advanced algorithms while maintaining a minimal layer of header parsing to determine the required parameters. The nifi-security-crypto-key module contains the interfaces and classes that support header parsing and key derivation.

Flow Definitions

Apache NiFi Jira issue NIFI-11022 covered the initial implementation of both new decryption Processors. GitHub Pull Request 6821 included a number of unit tests, and NIFI-11022 also included a Flow Definition using EncryptContent together with DecryptContent or DecryptContentCompatibility to illustrate different use cases. The included Process Groups have different algorithms configured to exercise a number of possible scenarios. This Flow Definition also provides useful examples when considering potential migration paths from historical algorithms to better alternatives.

Conclusion

The new DecryptContent and DecryptContentCompatibility Processors do not introduce new features as compared to what the EncryptContent Processor supports. However, these Processors provide a transition plan for flows that depend on historical algorithms or formats specific to Apache NiFi. Data with a short lifespan is easier to transition to new protocols, but the nature of technology often requires backward compatibility for years. The new decryption Processors provide some concrete examples of backward compatible solutions.