Backward Compatible Content Decryption in Apache NiFi
Background
RSA Laboratories published version 1.5 of Public Key Cryptography Standard number 5 in 1993, outlining a set of strategies for password-based encryption. RFC 2898 codified PKCS 5 version 2.0 in the year 2000, incorporating new key derivation functions. Seventeen years later, RFC 8018 obsoleted RFC 2898 and added new hash functions for PKCS 5 version 2.1. After three decades, the PKCS 5 standard includes a combination of modern algorithms and historical encryption schemes. Password-Based Encryption Scheme 1 (PBES1), described in RFC 8018 Section 6.1, should not be used for new encryption operations, but supporting PBES1 for decryption enables access to information protected using legacy strategies.
Introduction
Apache NiFi 0.5.0 and later versions have supported a number of algorithms and encryption strategies using the
EncryptContent
Processor. With an extensive set of configurable properties, EncryptContent
supports either encryption or decryption
using password-based encryption or hexadecimal-encoded keys. Supported options include weak cipher algorithms and
unsafe approaches to key derivation. The Processor is also capable of modern authenticated encryption with strong
key derivation functions such as Argon2. One of the drawbacks, however, is that
EncryptContent
uses binary formatting that is specific to the Processor. In order to maintain backward compatibility
with historical algorithms and specialized formatting, NiFi 1.20.0 introduced the
DecryptContentCompatibility
and
DecryptContent
Processors. These Processors enable streamlined and selective decryption strategies for existing data flows and
historical data sources.
Component Capabilities
Both DecryptContent
and DecryptContentCompatibility
support a smaller set of properties than the
EncryptContent
Processor. Selecting the appropriate Processor depends on the format of incoming files. The new
Processors implement different file processing strategies and different algorithms to provide a clearer distinction
between legacy encryption schemes and custom binary formatting.
At a basic level, DecryptContentCompatibility
supports legacy password-based encryption schemes with algorithms and
key derivation functions that do not follow current cryptographic best practices. The DecryptContentCompatibility
Processor enables interoperation with historical formats, but should not be used for new data flow solutions.
The DecryptContent
Processor uses the
Advanced Encryption Standard algorithm with configurable
cipher modes as well as modern key derivation functions. DecryptContent
is suitable for integrating with other Apache
NiFi systems using the EncryptContent
Processor, but DecryptContent
is not capable of reading information encrypted
using other services. The
EncryptContentPGP
and
DecryptContentPGP
Processors can support interoperable exchanges using the OpenPGP standard.
Both DecryptContent
and DecryptContentCompatibility
Processors include documentation with additional details that
provides a mapping from EncryptContent
property values to new property values.
DecryptContentCompatibility Processor
As the name implies, the DecryptContentCompatibility
Processor is designed for compatibility with
legacy algorithms and historical formats. The Processor supports Password-Based Encryption Scheme 1 standards defined
in RFC 8018 Section 6.1 using a configurable strategy that
specifies how to read a cryptographic salt from binary input files.
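For orientation, the key derivation behind PBES1 can be sketched in a few lines. The following is a minimal illustration of PBKDF1 as described in RFC 8018 Section 5.1, together with the PBES1 convention of splitting a 16-byte result into an 8-byte key and an 8-byte initialization vector. The function names are hypothetical and are not taken from the NiFi source.

```python
import hashlib


def pbkdf1(password: bytes, salt: bytes, iterations: int, dk_len: int,
           hash_name: str = "md5") -> bytes:
    """PBKDF1 (RFC 8018 Section 5.1): hash password + salt, then re-hash the digest."""
    if dk_len > hashlib.new(hash_name).digest_size:
        raise ValueError("derived key length exceeds digest size")
    derived = hashlib.new(hash_name, password + salt).digest()
    for _ in range(iterations - 1):
        derived = hashlib.new(hash_name, derived).digest()
    return derived[:dk_len]


def pbes1_key_and_iv(password: bytes, salt: bytes, iterations: int):
    """PBES1 convention: first 8 bytes are the key, next 8 bytes are the IV."""
    derived = pbkdf1(password, salt, iterations, 16)
    return derived[:8], derived[8:]
```

The short output length and the fast hash function are the main reasons this scheme is limited to legacy decryption.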
The Processor supports the following properties:
- Encryption Scheme
- Key Derivation Strategy
- Password
The Processor requires values for each property.
Versions of Java prior to 1.8.0-161 required the export-controlled Unlimited Strength Java Cryptography Extension Policy to use longer passwords with certain algorithms. Java 1.8.0-161 and following do not have this limitation.
Encryption Schemes
The Encryption Scheme
property supports several options from PBES1 as well as other options that EncryptContent
supported. The property values follow a naming convention that begins with a standard prefix indicating Password-Based
Encryption. The next element in the scheme defines the hash algorithm for key derivation. The final elements define
the cipher algorithm and key size in bits. For example, PBE_WITH_MD5_AND_DES
describes password-based encryption with
the MD5
hash algorithm for key derivation and the deprecated
Data Encryption Standard cipher algorithm for decryption.
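The naming convention can be illustrated with a small parser. This is a sketch only: the `parse_scheme` helper and the `EncryptionScheme` tuple are hypothetical names, and the parsing rules are inferred from the two scheme names quoted in this article rather than from the Processor implementation.

```python
from typing import NamedTuple, Optional


class EncryptionScheme(NamedTuple):
    hash_algorithm: str
    cipher: str
    key_bits: Optional[int]


def parse_scheme(name: str) -> EncryptionScheme:
    """Split PBE_WITH_<HASH>_AND_<CIPHER>[_<BITS>] into its elements."""
    prefix = "PBE_WITH_"
    if not name.startswith(prefix) or "_AND_" not in name:
        raise ValueError(f"unexpected scheme name: {name}")
    hash_algorithm, cipher_part = name[len(prefix):].split("_AND_", 1)
    tokens = cipher_part.split("_")
    key_bits = None
    if tokens[-1].isdigit():
        # A trailing numeric element is read as the key size in bits
        key_bits = int(tokens.pop())
    return EncryptionScheme(hash_algorithm, "_".join(tokens), key_bits)
```

For example, `parse_scheme("PBE_WITH_MD5_AND_AES_CBC_256")` yields the hash `MD5`, the cipher `AES_CBC`, and a key size of 256 bits.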
Although supported encryption schemes include options with AES_CBC, the Cipher-Block-Chaining mode of operation
can be subject to timing attacks. AES-CBC does not provide integrity protection, which is one of the reasons service
providers such as Cloudflare have
highlighted the decline of AES-CBC
for Transport Layer Security communication.
Aside from potential attacks, the primary problem with supported AES-CBC encryption schemes is the use of weak key derivation methods. Although SHA-256 is considered a strong hashing algorithm, it does not provide sufficient security for encryption key derivation. The relative speed of computing an SHA-256 hash increases the possibility of brute force attacks, which is a more serious concern as processing power continues to grow.
With backward compatibility being the primary focus of the DecryptContentCompatibility
Processor, the
additional details
documentation includes a table that maps new Encryption Scheme
property values to corresponding Encryption Algorithm
settings from the EncryptContent
Processor. For example, MD5_AES256
in EncryptContent
corresponds to
PBE_WITH_MD5_AND_AES_CBC_256
in DecryptContentCompatibility. The new property values are more closely associated
with the internal algorithm names and also incorporate additional cipher mode information.
Key Derivation Strategies
The Key Derivation Strategy
property controls how the Processor generates the symmetric key for decryption. The
property also influences how the Processor parses the cryptographic salt from incoming binary files. The available
strategies align with historical capabilities in EncryptContent
and other sources.
The OPENSSL_EVP_BYTES_TO_KEY
option is a reference to the OpenSSL
Envelope BytesToKey method of key derivation. As noted
in the documentation, the function is compatible with PKCS 5 version 1.5 when using the MD5
hash algorithm. The
function expects to read cryptographic salt bytes after a conventional header consisting of eight ASCII character
bytes. When the Processor does not find Salted__
in the first eight bytes, the OPENSSL_EVP_BYTES_TO_KEY
option
derives the decryption key without a salt. The OpenSSL method generates a symmetric key using a single iteration of the
configured hash algorithm.
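The OpenSSL approach can be sketched as follows. This is a minimal illustration of the single-iteration EVP_BytesToKey construction and the Salted__ header check, not the NiFi implementation. The function names are hypothetical, and the 8-byte salt length reflects the OpenSSL command-line format rather than anything stated in the Processor documentation.

```python
import hashlib

OPENSSL_HEADER = b"Salted__"  # eight ASCII character bytes
OPENSSL_SALT_LENGTH = 8       # assumption: salt length used by the openssl enc format


def split_openssl_input(data: bytes):
    """Return (salt, ciphertext); the salt is empty when the header is absent."""
    if data[:len(OPENSSL_HEADER)] == OPENSSL_HEADER:
        salt_end = len(OPENSSL_HEADER) + OPENSSL_SALT_LENGTH
        return data[len(OPENSSL_HEADER):salt_end], data[salt_end:]
    return b"", data


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int, iv_len: int,
                     hash_name: str = "md5"):
    """Single-iteration EVP_BytesToKey: chain digests until enough bytes exist."""
    material = b""
    block = b""
    while len(material) < key_len + iv_len:
        # Each block hashes the previous block followed by password and salt
        block = hashlib.new(hash_name, block + password + salt).digest()
        material += block
    return material[:key_len], material[key_len:key_len + iv_len]
```

With a single hash iteration, the cost of testing each candidate password is very low, which is why this strategy is suitable only for reading legacy content.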
The JASYPT_STANDARD
option is a reference to the Jasypt library for Java Simplified
Encryption. This JASYPT_STANDARD
option is equivalent to the NiFi Legacy KDF
setting in the EncryptContent
Processor. The Jasypt method derives from the default implementation in
org.jasypt.encryption.pbe.StandardPBEByteEncryptor.
The Jasypt encryptor class determined the cryptographic salt length based on the block size of the selected cipher
algorithm, and used 1000 iterations of the specified hash algorithm. The JASYPT_STANDARD
option implements the same
approach. When configured with an Encryption Scheme that uses the AES cipher algorithm, the Jasypt strategy uses a salt
length of 16 bytes. The DecryptContentCompatibility
Processor reads the first 16 bytes of input to derive the
decryption key and then transforms the remaining bytes using the configured cipher algorithm.
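The salt handling described above amounts to splitting the input at the cipher block size. The sketch below assumes a hypothetical `split_jasypt_input` helper. The 16-byte AES value comes from the description above, while the 8-byte DES entry is an added assumption based on the DES block size.

```python
# Cipher block sizes in bytes; the DES entry is an assumption for illustration
BLOCK_SIZES = {"AES": 16, "DES": 8}

# Iteration count of the Jasypt strategy, shown here for reference only
JASYPT_ITERATIONS = 1000


def split_jasypt_input(data: bytes, cipher: str):
    """Return (salt, ciphertext) using the cipher block size as the salt length."""
    salt_length = BLOCK_SIZES[cipher]
    if len(data) < salt_length:
        raise ValueError("input is shorter than the expected salt length")
    return data[:salt_length], data[salt_length:]
```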
DecryptContent Processor
The DecryptContent
Processor supports decryption using either a raw key or a password-derived key. Unlike the
DecryptContentCompatibility
Processor, DecryptContent
supports modern key derivation and cipher algorithms,
including AES-GCM, which incorporates integrity checking. The DecryptContent
Processor provides a streamlined set of
configuration properties based on the ability to select the required key derivation function according to the binary
header information from incoming files.
The Processor supports the following properties:
- Cipher Algorithm Mode
- Cipher Algorithm Padding
- Key Specification Format
- Key Specification
Cipher Algorithm Mode and Padding
The Cipher Algorithm Mode
and Cipher Algorithm Padding
properties control the behavior of the AES cipher
transformation.
The DecryptContent
Processor includes documentation with
additional details
describing the mapping between the EncryptContent
properties and DecryptContent
properties. For example, the
AES_GCM
setting for Encryption Algorithm
in EncryptContent
corresponds to GCM
for the
Cipher Algorithm Mode
and NoPadding
for Cipher Algorithm Padding
in the DecryptContent
Processor.
In practice, the PKCS5Padding
option for Cipher Algorithm Padding
is applicable only to the CBC
mode, which
corresponds to AES_CBC
for Encryption Algorithm
in the EncryptContent
Processor.
Selecting a mode or padding that does not match the original encryption settings will result in decryption failures or unexpected content.
Key Specification and Format Properties
The Key Specification Format
instructs the Processor how to handle the Key Specification
property value.
The default PASSWORD
value for Key Specification Format
indicates that the associated Key Specification
value
should be handled as a password and used to derive the symmetric key for decryption. The binary header of each incoming
file determines the key derivation function and parameters for generating the symmetric key. Following the
implementation from the EncryptContent
Processor, the derived key will have a length of 128 bits.
The RAW
value for Key Specification Format
instructs the Processor to decode the Key Specification
value as a
hexadecimal string. DecryptContent
will use the decoded bytes as the symmetric key and will not attempt to derive a
key. This corresponds to the Raw Key
property on the EncryptContent
Processor.
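Handling of the RAW format can be sketched as a hexadecimal decode followed by a length check. The `decode_raw_key` name is hypothetical, and the accepted lengths are the standard AES key sizes rather than values confirmed from the Processor source.

```python
AES_KEY_LENGTHS = (16, 24, 32)  # bytes, for AES-128, AES-192, and AES-256


def decode_raw_key(key_specification: str) -> bytes:
    """Decode a hexadecimal Key Specification value into raw key bytes."""
    key = bytes.fromhex(key_specification.strip())  # raises ValueError on invalid hex
    if len(key) not in AES_KEY_LENGTHS:
        raise ValueError(f"unsupported AES key length: {len(key)} bytes")
    return key
```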
Component Implementation
The DecryptContentCompatibility
Processor uses standard Java cryptography interfaces together with the
Bouncy Castle Security Provider to implement password-based encryption. The Processor also
includes basic binary processing for each of the key derivation strategies.
The DecryptContent
Processor also uses standard Java cryptography interfaces for cipher operations, but includes an
additional set of components to process structured header information.
Structured Binary Header
The DecryptContent
Processor requires incoming FlowFiles to be structured according to project specifications
implemented in Apache NiFi 0.5.0. The internal structure uses standard delimiters to indicate a required cryptographic
initialization vector and a
cryptographic salt, which is required for password-based key
derivation. The initialization vector delimiter consists of the ASCII character bytes for NiFiIV
and the salt
delimiter consists of the ASCII character bytes for NiFiSALT
within the first 256 bytes of incoming FlowFiles.
For content encrypted using a raw key, the binary header does not include a salt. In this format, the first 16 bytes
of the incoming file contain the IV, followed by the NiFiIV
delimiter and the encrypted binary payload.
For content encrypted using a password, the binary header begins with a variable length salt header, followed by the
NiFiSALT
delimiter. After the salt delimiter, the file follows the standard structure consisting of an IV, NiFiIV
delimiter, and encrypted binary payload.
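The two layouts can be read with a small amount of byte searching. The following sketch uses a hypothetical `parse_content` function and assumes that both delimiters are searched for within the first 256 bytes and that the IV is always 16 bytes. It does not handle the case where random salt or IV bytes happen to contain a delimiter sequence.

```python
from typing import NamedTuple, Optional

SALT_DELIMITER = b"NiFiSALT"
IV_DELIMITER = b"NiFiIV"
HEADER_SEARCH_LIMIT = 256  # assumption: delimiters appear within this many bytes
IV_LENGTH = 16


class ParsedContent(NamedTuple):
    salt_header: Optional[bytes]  # None for content encrypted with a raw key
    iv: bytes
    payload: bytes


def parse_content(data: bytes) -> ParsedContent:
    head = data[:HEADER_SEARCH_LIMIT]
    salt_end = head.find(SALT_DELIMITER)
    if salt_end == -1:
        # Raw key format: the file starts directly with the IV
        salt_header, iv_start = None, 0
    else:
        # Password format: variable length salt header before the delimiter
        salt_header = data[:salt_end]
        iv_start = salt_end + len(SALT_DELIMITER)
    iv_end = head.find(IV_DELIMITER, iv_start)
    if iv_end == -1:
        raise ValueError("NiFiIV delimiter not found")
    iv = data[iv_start:iv_end]
    if len(iv) != IV_LENGTH:
        raise ValueError("unexpected initialization vector length")
    return ParsedContent(salt_header, iv, data[iv_end + len(IV_DELIMITER):])
```

The returned salt header is left undecoded, because its structure depends on the key derivation function.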
The length and structure of the salt header depend on the key derivation function that EncryptContent
used during the
original encryption process. The use of distinct formatting for each key derivation function enables DecryptContent
to
determine the necessary function and parameters.
Supported Key Derivation Functions
The DecryptContent
Processor supports the following key derivation functions as implemented in Apache NiFi 0.5.0 and
subsequent versions:
- Argon2
- bcrypt
- PBKDF2
- scrypt
Each function supports different types of configurable parameters that influence the amount of resources
necessary to derive a symmetric key. The supported functions correspond to the allowable values for the
Key Derivation Function
in the EncryptContent
Processor.
Argon2 Header
The Argon2 implementation follows the recommendation of
RFC 9106 Section 7.4 and uses the Argon2id
hybrid variant. The
implementation follows RFC 9106 Section 1 and expects Argon2 version
1.3 to be specified in the salt header. As implemented for the EncryptContent
Processor, the Argon2 salt header uses
the Password Hashing Competition string format
to indicate the required time cost, memory cost, and parallelism cost parameter values, along with the random
cryptographic salt bytes encoded in Base64.
The Argon2 header consists of ASCII character bytes at the beginning of encrypted FlowFiles.
$argon2id$v=19$m=65536,t=3,p=1$QXJnb24yU2FsdFN0cmluZw
The header contains the following parameter values:
- Variant: argon2id
- Version: 19
- Memory: 65536
- Time: 3
- Parallelism: 1
- Salt: QXJnb24yU2FsdFN0cmluZw
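Reading these values from the example header can be sketched as follows. The `parse_argon2_header` function and `Argon2Parameters` tuple are hypothetical names for illustration, and the sketch assumes the salt is standard Base64 without padding. The decimal version 19 is hexadecimal 0x13, which corresponds to Argon2 version 1.3.

```python
import base64
from typing import NamedTuple


class Argon2Parameters(NamedTuple):
    variant: str
    version: int
    memory: int
    time: int
    parallelism: int
    salt: bytes


def decode_unpadded_base64(value: str) -> bytes:
    # Restore the padding characters that the header format omits
    return base64.b64decode(value + "=" * (-len(value) % 4))


def parse_argon2_header(header: str) -> Argon2Parameters:
    _, variant, version_field, cost_field, salt_field = header.split("$")
    version = int(version_field.split("=")[1])
    costs = dict(item.split("=") for item in cost_field.split(","))
    return Argon2Parameters(
        variant, version, int(costs["m"]), int(costs["t"]), int(costs["p"]),
        decode_unpadded_base64(salt_field),
    )
```

For the example header, the decoded salt is the 16 ASCII bytes `Argon2SaltString`.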
bcrypt Header
The bcrypt implementation builds on the password hashing algorithm designed for OpenBSD. The key derivation process applies an SHA-512 digest to the bcrypt hash and selects the first 16 bytes as the derived symmetric key.
The bcrypt header consists of ASCII character bytes that include a specified work factor.
$2a$12$R9h/cIPz0gi.URNNX3kh2O
The header contains the following parameter values:
- Version: 2a
- Cost: 12
- Salt: R9h/cIPz0gi.URNNX3kh2O
The OpenBSD bcrypt hash uses Base64 for the salt bytes, but uses an alphabet that does not follow RFC 4648, requiring an alternative decoder to read the bytes.
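One way to read the salt is to translate the bcrypt alphabet to the RFC 4648 alphabet before decoding. The sketch below takes that approach with hypothetical function names. It illustrates the alphabet difference and is not the decoder used in NiFi.

```python
import base64

BCRYPT_ALPHABET = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
TO_STANDARD = str.maketrans(BCRYPT_ALPHABET, STANDARD_ALPHABET)


def decode_bcrypt_base64(value: str) -> bytes:
    """Map bcrypt characters to the standard alphabet, pad, and decode."""
    translated = value.translate(TO_STANDARD)
    return base64.b64decode(translated + "=" * (-len(translated) % 4))


def parse_bcrypt_header(header: str):
    """Return (version, cost, salt bytes) from a header such as $2a$12$<salt>."""
    _, version, cost, salt = header.split("$")
    return version, int(cost), decode_bcrypt_base64(salt)
```

The 22 salt characters in the example header decode to 16 bytes.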
PBKDF2 Header
The PBKDF2 implementation follows the standard outlined in
RFC 8018 Section 5.2 using HMAC-SHA-512 as the pseudo-random
function. The binary header does not include the number of iterations required, so the DecryptContent
Processor relies
on the historical default value of 160,000. The PBKDF2 header does not include any parameter information and
contains an array of raw bytes.
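With the parameters described above, the derivation can be sketched using the standard library. The `derive_pbkdf2_key` name is hypothetical, the UTF-8 password encoding is an assumption, and the 16-byte output reflects the 128-bit derived key length noted for the PASSWORD format.

```python
import hashlib

PBKDF2_ITERATIONS = 160_000  # historical default, not stored in the header
DERIVED_KEY_LENGTH = 16      # 128 bits


def derive_pbkdf2_key(password: str, salt: bytes) -> bytes:
    """PBKDF2 with HMAC-SHA-512 using the fixed iteration count."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        password.encode("utf-8"),  # assumption: password bytes are UTF-8
        salt,
        PBKDF2_ITERATIONS,
        dklen=DERIVED_KEY_LENGTH,
    )
```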
scrypt Header
The scrypt implementation is based on RFC 7914, which defines three
configurable parameters. A Java implementation of scrypt from Will Glozer defined a
modified version of the modular crypt format that encodes configurable
parameters as a 32-bit hexadecimal integer. The DecryptContent
Processor uses bit shifting to decode the cost,
block size, and parallelization parameters.
The scrypt header consists of ASCII character bytes with the parameters and salt bytes.
$s0$e0801$epIxT/h6HbbwHaehFnh/bw
The header contains the following parameter values:
- Version: s0
- Encoded Parameters: e0801
- Decoded Cost: 16384
- Decoded Block Size: 8
- Decoded Parallelization: 1
- Salt: epIxT/h6HbbwHaehFnh/bw
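The bit shifting for the example header can be sketched as follows. The `parse_scrypt_header` name is hypothetical, and the sketch assumes the salt uses standard Base64 without padding. The cost is stored as a base 2 exponent in the upper bits, followed by one byte each for block size and parallelization.

```python
import base64


def parse_scrypt_header(header: str):
    """Return (version, cost, block size, parallelization, salt bytes)."""
    _, version, encoded, salt_field = header.split("$")
    params = int(encoded, 16)
    cost = 1 << ((params >> 16) & 0xFFFF)  # 0xe -> 2 ** 14
    block_size = (params >> 8) & 0xFF
    parallelization = params & 0xFF
    salt = base64.b64decode(salt_field + "=" * (-len(salt_field) % 4))
    return version, cost, block_size, parallelization, salt
```

For `e0801`, the shifts produce a cost of 16384, a block size of 8, and a parallelization of 1, matching the decoded values listed above.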
Key Derivation Implementation
The key derivation components in DecryptContent
use the Bouncy Castle library to perform
specific algorithm operations. This approach provides a solid foundation for supporting advanced algorithms while
maintaining a minimal layer of header parsing to determine the required parameters. The
nifi-security-crypto-key
module contains the interfaces and classes that support header parsing and key derivation.
Flow Definitions
Apache NiFi Jira issue NIFI-11022 covered the initial implementation
of both new decryption Processors. GitHub Pull Request 6821 included a
number of unit tests, and NIFI-11022 also included a
Flow Definition using
EncryptContent
together with DecryptContent
or DecryptContentCompatibility
to illustrate different use cases.
The included Process Groups have different algorithms configured to exercise a number of possible scenarios. This
Flow Definition also provides useful examples when considering potential migration paths from historical algorithms to
better alternatives.
Conclusion
The new DecryptContent
and DecryptContentCompatibility
Processors do not introduce new features as compared to what
the EncryptContent
Processor supports. However, these Processors provide a transition plan for flows that depend on
historical algorithms or formats specific to Apache NiFi. Data with a short lifespan is easier to transition to new
protocols, but the nature of technology often requires backward compatibility for years. The new decryption Processors
provide some concrete examples of backward compatible solutions.