Restructuring Apache NiFi Support for OpenPGP
Background
The OpenPGP specification has its roots in the Pretty Good Privacy software program released in 1991. As a public standard for encryption and digital signatures, OpenPGP supports secure communication across multiple platforms. As of this writing, RFC 4880 represents the current official specification for message formatting and supported algorithms. With its technological history, OpenPGP has accumulated both capabilities and criticisms, but ongoing support continues in a number of programming languages and applications. See Surveying Pretty Good Privacy After Three Decades for additional background on both the technical specification and various implementations.
Original Implementation
Support for OpenPGP in Apache NiFi goes back to version 0.1.0, which added encryption and decryption capabilities to the EncryptContent Processor. NiFi leverages the Bouncy Castle cryptographic library to handle all OpenPGP processing functions. Aside from the introduction of a configurable symmetric cipher property in NiFi 1.10.0, OpenPGP support has remained largely unchanged since the original implementation.
Design Considerations
Although the EncryptContent Processor supports a number of OpenPGP encryption and decryption features, it also includes a handful of latent issues. Aside from overloading the Encryption Algorithm property to include PGP and PGP_ASCII_ARMOR as supported values, configuring the Processor requires selecting a different combination of properties depending on whether it is intended to perform encryption or decryption. As a result of the initial design, the Processor also lacks support for alternative public key algorithms. Although NiFi 1.10.0 included support for configurable symmetric cipher algorithms, the internal encryption process uses Zip compression as the default setting.
Performance and Error Handling
The original implementation suffered from poor performance in some scenarios due to loading and searching keyring files for every invocation of the Processor. The flexibility of the OpenPGP format together with the Processor design complicated failure handling, resulting in a large variety of potential error conditions. Lack of logging and discarding of some exceptions also made troubleshooting more difficult than necessary. These concerns presented serious maintenance challenges and slowed progress on potential enhancements.
Redesigned Components
Issues with the original implementation and the scope of requested features presented an opportunity for a new approach to OpenPGP processing. At a fundamental level, public-key cryptography requires different inputs for different operations: a public key for encryption and a private key for decryption. Although OpenPGP also supports password-based encryption, requiring a shared secret for encryption and decryption, splitting the cipher operations into separate components provides a much clearer indication of the necessary properties. NiFi provides a high degree of flexibility when it comes to both flow design and component implementation, so developing with the right level of abstraction is important when it comes to supporting composable processing capabilities.
In addition to splitting encryption and decryption operations, abstracting the retrieval of public and private keys also provides a helpful separation between loading required resources and processing files. Through the use of Controller Services, loading keys for encryption or decryption becomes both a generalized concern and an opportunity for reusing common resources. A combination of these concepts provided the basis for a new solution with new components.
New Features
Before walking through the implementation details, it is worth highlighting a number of new and improved features. In comparison to the original implementation, the new Processors and Controller Services incorporate the following capabilities:
- Configure public or private keyrings as component properties using ASCII Armor formatting
- Configure ElGamal public or private keys for encryption and decryption
- Configure individual public or private keys or keyrings with multiple entries
- Configure hexadecimal key identifier search for public key encryption
- Configure compression algorithm applied prior to encryption
- Write algorithm names and identifiers as FlowFile attributes following encryption
- Write algorithm identifiers and file metadata as FlowFile attributes following decryption
New Processors
NiFi 1.14.0 includes two new Processors supporting OpenPGP messages: EncryptContentPGP and DecryptContentPGP. The EncryptContentPGP Processor includes and requires several properties while the DecryptContentPGP Processor is capable of parsing OpenPGP messages to determine format details. Each Processor depends on a corresponding Controller Service to handle messages using public-key cryptography. The Processors do not require Controller Services to perform password-based encryption and decryption.
New Controller Services
As part of the refactored implementation, NiFi 1.14.0 defines two new Controller Service interfaces: PGPPublicKeyService and PGPPrivateKeyService. Implementing these interfaces, StandardPGPPublicKeyService provides support for EncryptContentPGP and StandardPGPPrivateKeyService provides support for DecryptContentPGP. Abstracting configuration and retrieval of public and private keys streamlined the testing process for both Processors and allowed greater configuration flexibility for the Controller Services.
Encryption Properties
EncryptContentPGP includes several configurable properties with default values to control encryption processing. The Processor requires the following properties and specifies the corresponding default values:
- Symmetric-Key Algorithm: AES_256
- Compression Algorithm: ZIP
- File Encoding: BINARY
These default properties are equivalent to the following properties from the EncryptContent Processor:
- Encryption Method: PGP
- PGP Symmetric Cipher: AES_256
Symmetric-Key Algorithm Configuration
The Symmetric-Key Algorithm property in EncryptContentPGP supports a subset of available OpenPGP encryption algorithms and does not include the following options due to proven or potential cipher weaknesses:
Supported symmetric-key algorithms include AES and Camellia with key sizes of 128, 192, or 256 bits. The OpenPGP implementation of AES operates in Cipher Feedback mode, which does not incorporate integrity checking associated with other modes such as GCM or CCM.
The default property value of AES_256 uses the largest available key size together with the most widely supported symmetric-key algorithm. The EncryptContentPGP Processor also includes the Modification Detection Code Packet on all encrypted messages to support basic integrity checking.
Compression Algorithm Configuration
The Compression Algorithm property in EncryptContentPGP defaults to the common ZIP algorithm, and also supports new options not previously available in the EncryptContent Processor. These additional compression algorithm options include:
Although the OpenPGP specification indicates that implementations should compress messages prior to encryption, there are different perspectives regarding the relative security of using compression with encryption. The default ZIP option provides a high level of compatibility, while using UNCOMPRESSED may be a better option for smaller automated messages. For larger messages, BZIP2 can provide more effective compression at the expense of greater CPU usage during compression.
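The trade-off between the compression options can be explored outside of NiFi. The following Python sketch is not part of the NiFi implementation; it uses only the standard library and assumes that raw DEFLATE (RFC 1951) stands in for the OpenPGP ZIP option, as RFC 4880 describes. The function names and sample messages are illustrative.

```python
import bz2
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress with raw DEFLATE (RFC 1951), the format behind the OpenPGP ZIP option."""
    compressor = zlib.compressobj(level=6, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def compressed_sizes(data: bytes) -> dict:
    """Return the payload size in bytes for each compression choice."""
    return {
        "UNCOMPRESSED": len(data),
        "ZIP": len(deflate_raw(data)),
        "BZIP2": len(bz2.compress(data)),
    }


if __name__ == "__main__":
    # A short automated message compared with a large repetitive one
    small = b"6f1c2e0a-3b7d-4c1e-9a55-0d2f8e7b1c44\n"
    large = b"Lorem ipsum dolor sit amet\n" * 10_000
    print("small:", compressed_sizes(small))
    print("large:", compressed_sizes(large))
```

Running the script against representative payloads gives a rough indication of whether compression is worth the processing cost for a particular flow.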
File Encoding Configuration
The File Encoding property in EncryptContentPGP defaults to BINARY and also supports Base64-encoded ASCII, also known as ASCII Armor. Binary encoding is much more efficient in terms of message size, but ASCII encoding supports transfer over textual protocols such as SMTP.
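To make the size difference concrete, the following Python sketch wraps arbitrary binary data in an ASCII Armor block following RFC 4880, section 6. It is an illustration rather than the Bouncy Castle implementation: the function names are hypothetical, the 64-character line width is one choice within the 76-character limit of the specification, and optional armor headers are omitted.

```python
import base64


def crc24(data: bytes) -> int:
    """CRC-24 checksum as defined for ASCII Armor in RFC 4880, section 6.1."""
    crc = 0xB704CE
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def armor_message(binary: bytes) -> str:
    """Wrap binary OpenPGP data in a PGP MESSAGE armor block."""
    encoded = base64.b64encode(binary).decode("ascii")
    # Split the Base64 text into fixed-width lines
    lines = [encoded[i:i + 64] for i in range(0, len(encoded), 64)]
    checksum = base64.b64encode(crc24(binary).to_bytes(3, "big")).decode("ascii")
    return (
        "-----BEGIN PGP MESSAGE-----\n\n"
        + "\n".join(lines)
        + f"\n={checksum}\n"
        + "-----END PGP MESSAGE-----\n"
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a binary OpenPGP message
    binary = bytes(range(256)) * 4
    armored = armor_message(binary)
    print(len(binary), "binary bytes,", len(armored), "armored characters")
```

Base64 represents every three bytes as four characters, so armored output is roughly one third larger than the binary form before counting the header, footer, and checksum lines.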
Password-Based Configuration
The Passphrase property in EncryptContentPGP specifies the string source for protecting messages using password-based encryption. The Bouncy Castle implementation of OpenPGP uses the Iterated and Salted String-to-Key function for deriving an encryption key from the configured passphrase. EncryptContentPGP uses the SHA-1 hash function and leverages the Bouncy Castle default setting of 65536 iterations with a random salt of eight bytes. As with any password-based algorithm, the strength of the encryption rests ultimately on the length and complexity of the passphrase.
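The key derivation can be sketched in a few lines of Python. This is a reading of the Iterated and Salted S2K algorithm in RFC 4880, section 3.7.1.3, not code taken from NiFi or Bouncy Castle, and the function names are illustrative. Note that the specification defines the count as the number of octets fed to the hash rather than the number of hash invocations; a coded count of 0x60 decodes to 65536.

```python
import hashlib
import os


def decode_count(coded: int) -> int:
    """Expand the one-octet coded count from RFC 4880, section 3.7.1.3."""
    return (16 + (coded & 15)) << ((coded >> 4) + 6)


def iterated_salted_s2k(passphrase: bytes, salt: bytes, count: int, key_length: int) -> bytes:
    """Derive key_length bytes using SHA-1 Iterated and Salted S2K."""
    if len(salt) != 8:
        raise ValueError("salt must be eight bytes")
    unit = salt + passphrase
    # The salt and passphrase are always hashed at least once in full
    total = max(count, len(unit))
    repeats, remainder = divmod(total, len(unit))
    key = b""
    preload = 0
    while len(key) < key_length:
        digest = hashlib.sha1()
        # Additional hash contexts are preloaded with zero octets
        digest.update(b"\x00" * preload)
        digest.update(unit * repeats)
        digest.update(unit[:remainder])
        key += digest.digest()
        preload += 1
    return key[:key_length]


if __name__ == "__main__":
    salt = os.urandom(8)
    # AES-256 needs 32 bytes, so two SHA-1 contexts contribute to the key
    key = iterated_salted_s2k(b"example passphrase", salt, decode_count(0x60), 32)
    print(key.hex())
```

Because SHA-1 produces 20 bytes, deriving a 256-bit key requires a second hash context, which is why the loop preloads zero octets before hashing the repeated salt and passphrase.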
Public Key Configuration
In contrast to the original EncryptContent implementation, EncryptContentPGP delegates public key retrieval to a configurable Controller Service. When configured with a value for Public Key Service, the EncryptContentPGP Processor also requires a value for the Public Key Search property. With this approach, the configured service can reference a keyring containing multiple public keys, and the Processor must be configured to select a specific public key for encryption operations.
The StandardPGPPublicKeyService supports configuring a file path reference using the Keyring File property, or providing the contents of a public key encoded using ASCII Armor in the Keyring property. Public keys encoded using ASCII Armor contain multiple Base64 lines with a standard header:
-----BEGIN PGP PUBLIC KEY BLOCK-----
Public Key Search
Interpretation of the Public Key Search property depends on the configured Public Key Service. The StandardPGPPublicKeyService supports searching the User ID packet of each public key, matching on name, email address, or the combined user identifier string. The standard service implementation also supports matching against the numeric key identifier when the Public Key Search is configured with 16 hexadecimal characters.
The OpenPGP public key for the NiFi Security email address provides an example of potential configuration options for the Public Key Search property. The public key is available for download from keys.openpgp.org. For the purposes of the following examples, the public key should be saved to a file named public.key.asc.
The following GNU Privacy Guard command can be used to print the packet information contained in a file named public.key.asc:
gpg --list-packets public.key.asc
The command output displays the contents of each packet in a separate section. The first packet contains the public key identifier and the second packet contains the user identifier:
# off=0 ctb=c6 tag=6 hlen=3 plen=525 new-ctb
:public key packet:
version 4, algo 1, created 1490292491, expires 0
pkey[0]: [4096 bits]
pkey[1]: [17 bits]
keyid: AFF2B36823B944E9
# off=528 ctb=cd tag=13 hlen=2 plen=47 new-ctb
:user ID packet: "Apache NiFi Security <security@nifi.apache.org>"
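The ctb, tag, hlen, and plen values in the output come from the packet header that precedes each packet body. As a rough illustration, the following Python sketch decodes a new-format packet header according to RFC 4880, section 4.2. The function name is hypothetical, and the input bytes are those implied by the values printed above rather than bytes read from the key file.

```python
def parse_new_format_header(data: bytes) -> tuple:
    """Return (tag, header_length, body_length) for a new-format packet header."""
    ctb = data[0]
    # Bits 7 and 6 must both be set for a new-format packet
    if ctb & 0xC0 != 0xC0:
        raise ValueError("not a new-format OpenPGP packet header")
    tag = ctb & 0x3F
    first = data[1]
    if first < 192:
        # One-octet length
        return tag, 2, first
    if first < 224:
        # Two-octet length
        return tag, 3, ((first - 192) << 8) + data[2] + 192
    if first == 255:
        # Five-octet length
        return tag, 6, int.from_bytes(data[2:6], "big")
    raise ValueError("partial body lengths are not handled in this sketch")


if __name__ == "__main__":
    # ctb=c6 with a two-octet length encoding 525
    print(parse_new_format_header(bytes([0xC6, 0xC1, 0x4D])))  # (6, 3, 525)
    # ctb=cd with a one-octet length of 47
    print(parse_new_format_header(bytes([0xCD, 0x2F])))  # (13, 2, 47)
```

Tag 6 identifies a Public-Key Packet and tag 13 identifies a User ID Packet, and the second packet offset of 528 is the sum of the first header length and body length.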
The keyid field of the public key packet provides the hexadecimal representation that can be specified in the Public Key Search property of EncryptContentPGP. Configuring the Processor with AFF2B36823B944E9 in Public Key Search requires an exact match against the key identifier, avoiding any potential ambiguity related to the user identifier.
The user ID packet contains both the name and email address that can also be specified in the Public Key Search property. Using the full user identifier of Apache NiFi Security <security@nifi.apache.org> in Public Key Search provides the most precise approach to matching against the user identifier. The service also supports partial matching using the name or email address. Configuring the full user identifier or email address avoids potential unexpected matches when the supplied keyring contains multiple entries.
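The search behavior can be approximated with a short Python sketch. This is not the StandardPGPPublicKeyService source: the PublicKeyEntry type and find_public_key function are hypothetical, and details such as case handling, substring matching, and the order of evaluation are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublicKeyEntry:
    key_id: str   # 16 hexadecimal characters
    user_id: str  # for example "Name <address@example.org>"


KEY_ID_PATTERN = re.compile(r"^[0-9A-Fa-f]{16}$")


def find_public_key(entries: list, search: str) -> Optional[PublicKeyEntry]:
    """Return the first entry matching a key identifier or user identifier."""
    if KEY_ID_PATTERN.match(search):
        # Sixteen hexadecimal characters are treated as an exact key identifier
        for entry in entries:
            if entry.key_id.upper() == search.upper():
                return entry
    # Otherwise fall back to partial matching on the user identifier
    for entry in entries:
        if search in entry.user_id:
            return entry
    return None


if __name__ == "__main__":
    keyring = [
        PublicKeyEntry("AFF2B36823B944E9", "Apache NiFi Security <security@nifi.apache.org>"),
        PublicKeyEntry("19FE266F132D8430", "nifi-flow"),
    ]
    print(find_public_key(keyring, "AFF2B36823B944E9"))
    print(find_public_key(keyring, "security@nifi.apache.org"))
```

A short search string such as nifi would match both entries in this keyring and return whichever appears first, which illustrates why the key identifier or full user identifier is the safer choice.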
Decryption Properties
Configuring content decryption is much simpler than encryption since OpenPGP messages contain all the necessary algorithm information. DecryptContentPGP provides two configurable properties: Passphrase and Private Key Service. The Passphrase property supports password-based encryption and the Private Key Service property supports public key encryption.
OpenPGP messages indicate the type of encryption strategy, and messages encrypted using a public key include the associated key identifier. With this information, DecryptContentPGP attempts to read messages using configured properties. When processing public-key encrypted messages, DecryptContentPGP searches for matching private keys based on the key identifier listed in the message itself.
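The information available in the message can be illustrated with a Python sketch based on RFC 4880, sections 5.1 and 5.3. Tag 1 marks a Public-Key Encrypted Session Key packet and tag 3 marks a Symmetric-Key Encrypted Session Key packet. The function names and strategy labels are hypothetical, and the sample packet body is constructed by hand rather than taken from a real message.

```python
# Packet tags that indicate how the session key is protected
STRATEGIES = {
    1: "public-key",      # Public-Key Encrypted Session Key packet
    3: "password-based",  # Symmetric-Key Encrypted Session Key packet
}


def encryption_strategy(tag: int) -> str:
    """Map a session key packet tag to a decryption strategy label."""
    try:
        return STRATEGIES[tag]
    except KeyError:
        raise ValueError(f"packet tag {tag} does not carry a session key") from None


def read_pkesk_key_id(body: bytes) -> tuple:
    """Return (key_id_hex, public_key_algorithm) from a version 3 PKESK packet body."""
    if len(body) < 10:
        raise ValueError("packet body too short")
    if body[0] != 3:
        raise ValueError(f"unsupported packet version: {body[0]}")
    # Eight octets of key identifier follow the version octet
    key_id = body[1:9].hex().upper()
    algorithm = body[9]
    return key_id, algorithm


if __name__ == "__main__":
    # Hand-built body: version 3, illustrative key identifier, algorithm 1 (RSA)
    body = bytes([3]) + bytes.fromhex("19FE266F132D8430") + bytes([1]) + bytes(4)
    print(encryption_strategy(1), read_pkesk_key_id(body))
```

A key identifier extracted this way is what a private key lookup would compare against the keys available from the configured service.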
Private Key Configuration
Similar to the public key service implementation, the StandardPGPPrivateKeyService supports configuring a file path using Keyring File, or providing the ASCII-encoded contents of a private key in the Keyring property. The service requires the Key Password property to read private keys. The service is capable of reading multiple private keys from configured properties as long as all private keys have the same password.
Flow Configuration
A full consideration of NiFi flow design with OpenPGP components is beyond the scope of the current discussion, but describing some basic examples provides a starting point for integration. Both EncryptContentPGP and DecryptContentPGP include the de facto standard success and failure routing relationships.
When configuring DecryptContentPGP, it is important to note that it does not incorporate digital signature verification. For this reason, content entering the Processor should not be considered trusted without some other means of authenticating the data source. Although EncryptContentPGP includes a modification detection code, it does not sign messages. With these caveats, the new Processors can support a number of use cases, and the new Controller Services can be leveraged for additional development efforts related to signing and verification.
Interoperation with GNU Privacy Guard
GNU Privacy Guard provides several capabilities necessary for building a functional flow using public key encryption. Leveraging GPG for key generation and initial processing demonstrates the capabilities of the new NiFi components as well as other potential integration options.
Key Pair Generation
For the purpose of demonstrating public key encryption, the first step is obtaining a public and private key. Generating a key pair requires a user identifier. Most user identifiers consist of a name and an email address. The following command generates a key pair with nifi-flow as the user identifier using the default GPG algorithm:
gpg --quick-generate-key nifi-flow
The command prompts for a passphrase to protect the private key and also sets an expiration based on the default GPG configuration. The command output includes the hexadecimal key identifier as well as the key fingerprint, expiration, and algorithm:
gpg: key 19FE266F132D8430 marked as ultimately trusted
public and secret key created and signed.
pub rsa3072 2021-09-14 [SC] [expires: 2023-09-14]
2CCAE4781C90BBFDCB830EB719FE266F132D8430
uid nifi-flow
sub rsa3072 2021-09-14 [E]
As indicated in the output, the key identifier is 19FE266F132D8430, the user identifier is nifi-flow, and the key algorithm is RSA with a size of 3072 bits. The command stores the generated key pair in the default GPG keyring of the user running the command.
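The relationship between the fingerprint and the key identifier in the output is worth noting: for version 4 keys, RFC 4880 defines the key identifier as the low-order 64 bits of the 160-bit fingerprint. The following Python sketch, using a hypothetical function name, derives one from the other.

```python
def key_id_from_fingerprint(fingerprint: str) -> str:
    """Return the 16-character key identifier for a version 4 key fingerprint."""
    cleaned = fingerprint.replace(" ", "").upper()
    if len(cleaned) != 40 or any(c not in "0123456789ABCDEF" for c in cleaned):
        raise ValueError("expected a 40-character hexadecimal fingerprint")
    # The key identifier is the last eight octets of the fingerprint
    return cleaned[-16:]


if __name__ == "__main__":
    print(key_id_from_fingerprint("2CCAE4781C90BBFDCB830EB719FE266F132D8430"))
    # 19FE266F132D8430
```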
Exporting Public Keys
In order for NiFi to perform encryption operations, it is necessary to export the public key from the GPG keyring. The following command exports the public key to a file named nifi-flow.public.key encoded using ASCII Armor:
gpg --export --armor nifi-flow > /tmp/nifi-flow.public.key
The file contains Base64-encoded lines with standard header and footer lines indicating the PGP public key contents.
Exporting Private Keys
NiFi requires a private key to decrypt OpenPGP messages. The following command exports the private key to a file named nifi-flow.private.key encoded using ASCII Armor and protected using the provided passphrase:
gpg --export-secret-keys --armor nifi-flow > /tmp/nifi-flow.private.key
Although the file is protected with a passphrase, read permissions on the file should be restricted to the NiFi user.
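One way to apply and verify restricted permissions on a POSIX system is sketched below in Python. The function names are illustrative, and the sketch assumes that the exported file is already owned by the account running NiFi; changing ownership is a separate step that is not covered here.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> None:
    """Allow only the file owner to read and write the file (mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path: str) -> bool:
    """Return True when group and other have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # A temporary file stands in for the exported private key
    handle, path = tempfile.mkstemp()
    os.close(handle)
    try:
        restrict_to_owner(path)
        print(path, "owner only:", is_owner_only(path))
    finally:
        os.remove(path)
```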
Encrypting Files using GPG
To provide an encrypted input file for testing the DecryptContentPGP Processor, run the following command to generate a file containing a random string:
uuidgen > /tmp/generated
Run the following GPG command to encrypt the file using the nifi-flow public key with ASCII Armor encoding:
gpg --encrypt --armor --recipient nifi-flow /tmp/generated
The GPG command leaves the input file unchanged and creates a new encrypted file with the following name:
/tmp/generated.asc
Decrypting Files using DecryptContentPGP
The DecryptContentPGP Processor requires an input relationship for processing, and the GetFile Processor provides a simple method for reading input files.
Configure a GetFile Processor with the following properties and values:
- Input Directory: /tmp
- Path Filter: generated.asc
- Recurse Subdirectories: false
Configure a StandardPGPPrivateKeyService Controller Service with the following properties and values, substituting KEY PASSWORD with the passphrase entered during key pair generation:
- Keyring File: /tmp/nifi-flow.private.key
- Key Password: KEY PASSWORD
Enable the Controller Service after entering the required properties.
Configure a DecryptContentPGP Processor with the following properties:
- Private Key Service: StandardPGPPrivateKeyService
To log attributes after decryption processing, configure a LogAttribute Processor with the success relationship selected for automatic termination. The Bulletin Level option should be set to INFO. This LogAttribute configuration is useful for the purposes of demonstration, but should not be used for production flows.
After configuring each Processor, connect the GetFile relationship named success to DecryptContentPGP, and connect both the success and failure relationships from DecryptContentPGP to LogAttribute.
Start the DecryptContentPGP and LogAttribute Processors, and use the Run Once option on GetFile to trigger processing. The LogAttribute Processor should generate a Bulletin Board entry that includes the standard FlowFile attributes as well as the following OpenPGP attributes after successful decryption:
- pgp.literal.data.filename
- pgp.literal.data.modified
- pgp.symmetric.key.algorithm.id
The pgp.literal.data.filename attribute contains the name of the file prior to encryption. The pgp.literal.data.modified attribute contains the timestamp in milliseconds when the file was encrypted. The pgp.symmetric.key.algorithm.id attribute contains the numeric identifier of the Symmetric-Key Algorithm that the originator used for encryption.
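Downstream tooling can translate these attribute values into more readable forms. The following Python sketch is an illustration, not part of NiFi: the helper names are hypothetical, the identifiers come from the algorithm registries in RFC 4880 and RFC 5581, and only the AES and Camellia algorithms mentioned earlier are included.

```python
from datetime import datetime, timezone

# Numeric identifiers from RFC 4880 (AES) and RFC 5581 (Camellia)
SYMMETRIC_KEY_ALGORITHMS = {
    7: ("AES", 128),
    8: ("AES", 192),
    9: ("AES", 256),
    11: ("Camellia", 128),
    12: ("Camellia", 192),
    13: ("Camellia", 256),
}


def describe_symmetric_algorithm(attributes: dict) -> str:
    """Describe the algorithm named by pgp.symmetric.key.algorithm.id."""
    identifier = int(attributes["pgp.symmetric.key.algorithm.id"])
    if identifier not in SYMMETRIC_KEY_ALGORITHMS:
        return f"unrecognized algorithm {identifier}"
    name, bits = SYMMETRIC_KEY_ALGORITHMS[identifier]
    return f"{name} with {bits}-bit key"


def modified_time(attributes: dict) -> datetime:
    """Convert pgp.literal.data.modified from milliseconds to a UTC datetime."""
    millis = int(attributes["pgp.literal.data.modified"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # FlowFile attribute values are strings; these values are illustrative
    attributes = {
        "pgp.literal.data.filename": "generated",
        "pgp.literal.data.modified": "1631577600000",
        "pgp.symmetric.key.algorithm.id": "9",
    }
    print(describe_symmetric_algorithm(attributes))
    print(modified_time(attributes).isoformat())
```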
Encrypting Files using EncryptContentPGP
The EncryptContentPGP Processor requires an input relationship, and the GenerateFlowFile Processor is a convenient option for producing files.
Configure a GenerateFlowFile Processor with the following properties and values, including a custom filename property:
- Custom Text: Lorem ipsum dolor sit amet
- filename: generate-flow-file
Configure a StandardPGPPublicKeyService Controller Service with the following properties:
- Keyring File: /tmp/nifi-flow.public.key
Enable the Controller Service after entering the required properties.
Configure an EncryptContentPGP Processor with the following properties:
- Public Key Service: StandardPGPPublicKeyService
- Public Key Search: nifi-flow
Configure a LogAttribute Processor with the Bulletin Level option set to INFO.
Configure a PutFile Processor with the following properties, and select both the success and failure relationships for automatic termination:
- Directory: /tmp
Connect the GenerateFlowFile relationship named success to EncryptContentPGP, and connect both the success and failure relationships from EncryptContentPGP to LogAttribute. Connect the success relationship from LogAttribute to PutFile.
Start the EncryptContentPGP and LogAttribute Processors along with PutFile, then use the Run Once option on GenerateFlowFile to trigger processing. The LogAttribute Processor should generate a Bulletin Board entry that includes the following OpenPGP attributes after successful encryption:
- pgp.compression.algorithm
- pgp.compression.algorithm.id
- pgp.file.encoding
- pgp.symmetric.key.algorithm
- pgp.symmetric.key.algorithm.block.cipher
- pgp.symmetric.key.algorithm.id
- pgp.symmetric.key.algorithm.key.size
The PutFile Processor should write an encrypted file to the following location:
/tmp/generate-flow-file
Decrypting Files using GPG
Run the following command to decrypt and display the contents of the file that NiFi encrypted using EncryptContentPGP:
gpg --decrypt /tmp/generate-flow-file
The GPG command should prompt for the passphrase entered when generating the key pair, and then print the following information along with the content entered in GenerateFlowFile:
gpg: encrypted with 3072-bit RSA key, ID 19FE266F132D8430, created 2021-09-14
"nifi-flow"
Lorem ipsum dolor sit amet
Removing Generated Keys
Removing keys from the internal GPG keyring requires running several commands. After completing the interoperation steps described, the following command can be used to remove the generated private key:
gpg --delete-secret-keys nifi-flow
Press y when prompted to confirm deletion of the selected key. After removing the private key, the following command can be used to delete the corresponding public key:
gpg --delete-keys nifi-flow
Press y when prompted to confirm deletion. Run the following command to confirm removal of the generated private key:
gpg --list-secret-keys
Conclusion
The new OpenPGP components released in NiFi 1.14.0 bring improvements to both flow configuration and processing capabilities. The EncryptContent Processor remains functional, but existing flows should be migrated to use EncryptContentPGP and DecryptContentPGP for all OpenPGP message handling. The new Controller Services supporting these Processors provide optimized access to public and private keys, avoiding configuration issues inherent in the EncryptContent design. More development effort is necessary to implement message signing and verification, but the new Controller Services provide a starting point for future work. Building on the foundation of the Bouncy Castle library, NiFi can support interoperation with a variety of OpenPGP applications.