Feature Evolution Strategies for Apache NiFi Processors

2025-06-25 • 14 minute read • David Handermann

Introduction

Processors are the foundation for building data pipelines in Apache NiFi. As interfaces define contracts between collaborating classes, Processors define contracts for FlowFile handling. In Apache NiFi, FlowFiles are the basic unit of exchange between Processors, containing metadata attributes and providing a reference to content for processing. In addition to contractual requirements for metadata and content, Processors define configuration contracts through Property Descriptors and Relationships supporting multiple use cases based on specified settings and FlowFile routing decisions. Processors can also use the framework State Manager to implement specific behavior using persistent information.

Understanding these contract surfaces is essential during both initial implementation and subsequent development efforts. Considering the implications of particular changes, together with strategies for evolving Processor implementations, is critical to building and maintaining robust processing solutions using the NiFi framework.

Processor Interface Surfaces

The Apache NiFi API defines several methods on the Processor interface for configuration and FlowFile handling purposes. Through the implementation of these interface methods, each Processor class defines its own interface surface based on supported FlowFile structure, configurable properties, routable relationships, and in some cases, persistent state information. Behavior annotations can be used to define additional contractual expectations for Processors. Reviewing each of these interface surfaces, as related to Processor interface methods, provides the background necessary to evaluate strategies for introducing different types of features.

FlowFile Handling

The onTrigger method is the entry point for metadata and content handling in Processor classes. With access to the ProcessSession interface, Processors perform various FlowFile operations including creation, deletion, transformation, and routing. Scoping Processors to specific operations supports composable flow design, enabling multiple use cases.

FlowFile attributes serve as an extensible map of name-value pairs. Similar to HTTP headers, there are several reserved attribute names for core FlowFile properties, but names and values are otherwise open to definition within the context of one or more Processors. The NiFi framework provides the ReadsAttribute and WritesAttribute annotations as a means for Processors to declare expected behavior. These annotations do not influence framework handling, but they drive generated documentation, which is useful when designing NiFi flows. Reading or writing core attributes such as mime.type enables a standard method of content identification, analogous to the Content-Type header for HTTP requests and responses. Custom attributes can influence Processor behavior when providing information such as directories, record offsets, or timestamps.

The NiFi framework provides access to FlowFile content using java.io.InputStream and java.io.OutputStream abstractions. Most Processors use libraries to transform a stream of bytes into concrete structures, as the framework delegates content handling decisions to individual components. The stream abstractions are fundamental to NiFi processing scalability, generally requiring an iterative approach for processing structured information to avoid memory exhaustion. Processors that read the entire FlowFile content into memory should include maximum size restrictions to avoid exceeding available memory.
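
The size restriction can be sketched independently of the framework. The following is a minimal example, assuming a plain java.io.InputStream in place of the stream returned from the ProcessSession; the BoundedContentReader class and the MAX_CONTENT_BYTES limit are hypothetical names for illustration, not part of the NiFi API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BoundedContentReader {
    // Hypothetical limit; a real Processor might expose this as a configurable property
    static final int MAX_CONTENT_BYTES = 1024;

    // Reads the stream into memory and fails when the content is larger than maxBytes
    static byte[] readBounded(final InputStream inputStream, final int maxBytes) throws IOException {
        // Request one byte beyond the limit to detect oversized content
        final byte[] bytes = inputStream.readNBytes(maxBytes + 1);
        if (bytes.length > maxBytes) {
            throw new IOException(String.format("Content exceeds maximum size of %d bytes", maxBytes));
        }
        return bytes;
    }

    public static void main(final String[] args) throws IOException {
        final InputStream content = new ByteArrayInputStream("small".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBounded(content, MAX_CONTENT_BYTES).length);
    }
}
```

Reading one byte past the limit avoids relying on a reported content size, and the heap allocation stays bounded even when the stream is much larger than expected.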

Property Descriptors

The PropertyDescriptor is one of the most prominent public-facing Processor interface surfaces. Declared using the getPropertyDescriptors method, or the more common getSupportedPropertyDescriptors method on abstract Processor classes, each item describes the shape of a configurable property. With the ability to describe required status, default values, enumerated options, and extensible validation, Processor implementations define the scope of supported configuration through standard and dynamic Property Descriptors. Property names express a public configuration contract that applies to both REST API and browser-based operations. Property validation enforces the boundaries of acceptable input values. Other Property Descriptor fields inform validation and expected property value handling.

Processor Relationships

The Relationship class defines the potential outputs for FlowFiles handled in a Processor. Relationships such as success and failure are a common convention for numerous NiFi Processors. With a configurable name and associated description, Relationships in a Processor can implement coarse-grained or fine-grained routing. Relationships represent an essential interface surface, enabling the construction of data flows with understandable processing semantics. The getRelationships method defines supported Relationships, as reflected in both configurable options and generated documentation.

Persistent State Information

Processors can declare integration with NiFi framework persistent state using the Stateful annotation. The Scope defined in the Stateful annotation instructs the framework how to read and write persistent information for an instance of the Processor. Selecting LOCAL indicates that each node in a NiFi cluster maintains its own version of state information. Specifying CLUSTER instructs the framework to provide persistent storage from a shared location to every node in a NiFi cluster. Many Processors do not need persistent state when configuration properties and FlowFile information are sufficient to inform behavior. In some cases, however, Processors require persistent state to avoid repetitive operations. State information itself consists of Processor-defined names and values. Although persistent state tracking is a powerful feature of Apache NiFi, it also raises implementation complexity because it requires conditional execution according to external information.

Compatible Interface Changes

Regular upgrades are essential to the maintenance of secure and stable software. With composability as a primary feature of Apache NiFi, compatibility across versions is a vital characteristic of individual components. The Apache NiFi project follows the Semantic Versioning Specification for public extension interfaces, and also applies the same principles to Processors and Controller Services.

The release of NiFi 2 demonstrated this commitment to compatibility through successive deprecation across milestone releases, introducing targeted breaking changes for selected components and features. Although the framework does not require extensions outside the project to follow the same conventions, applying best practices for compatible changes simplifies the upgrade process. New features can be introduced across multiple Processors, but each Processor class stands alone in terms of interface surface definition.

Adapting FlowFile Attribute Processing

Implementing compatible FlowFile attribute processing starts with expecting nullable attribute values. Except for core framework attributes such as uuid and entryDate, the NiFi framework does not require the presence of attribute values. This means that method calls such as FlowFile.getAttribute and NiFi Expression Language references must handle null results for any custom FlowFile attribute values.

After accounting for null values, FlowFile attribute processing must perform some amount of format validation. With all FlowFile attributes being stored and retrieved as strings, regular expression patterns provide a flexible and robust approach to validating the shape of expected values. Standard patterns should be defined as static final members of a Processor class, or inside collaborating classes responsible for handling groups of FlowFile attributes.

private static final Pattern HASH_PATTERN = Pattern.compile("^[a-f0-9]{64}$");
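
Combining the null check with pattern validation might look like the following sketch. It assumes a java.util.Map standing in for FlowFile attributes, and the content.sha256 attribute name and HashAttributeReader class are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class HashAttributeReader {
    // Hypothetical attribute name used for illustration
    static final String HASH_ATTRIBUTE = "content.sha256";

    private static final Pattern HASH_PATTERN = Pattern.compile("^[a-f0-9]{64}$");

    // Returns the hash only when the attribute is present and matches the expected shape
    static Optional<String> findHash(final Map<String, String> attributes) {
        return Optional.ofNullable(attributes.get(HASH_ATTRIBUTE))
                .filter(value -> HASH_PATTERN.matcher(value).matches());
    }

    public static void main(final String[] args) {
        // A missing attribute produces an empty result instead of an exception
        System.out.println(findHash(Map.of()).isPresent());
    }
}
```

Returning an Optional pushes the decision about missing or malformed values to the caller, which can then route the FlowFile or fall back to previous behavior.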

Although structured values can be serialized to JSON and passed as FlowFile attributes, larger strings put pressure on Java heap memory. Reading and writing structured values also requires additional processing cycles for each Processor. Changing the shape of structured values in FlowFile attributes requires additional consideration when upgrading. Following the same approach as scalar FlowFile attributes, new fields in structured values must be nullable for compatible handling when upgrading. For JSON processing libraries such as Jackson, features such as ignoring unknown fields must be enabled for reading objects.

Each Processor that includes FlowFile attribute handling must employ one or more of these strategies to avoid breaking deployed pipelines when upgrading to a new version. Standard NiFi Processors such as RouteOnAttribute can provide some amount of input validation in front of other Processors, but each Processor remains responsible for its own set of expected FlowFile attribute values.

Evolving FlowFile Content Handling

Efficient FlowFile content handling with stream abstractions can make it difficult to implement support for new content types in an existing Processor. If the shape of new FlowFile content has significant differences from the original input expectations, there are several implementation paths, including type detection and attribute-based processing.

For content that is closer to initial design expectations, reading and evaluating content header information is an option that avoids introducing new configuration requirements. The java.io.PushbackInputStream can be used to wrap a FlowFile InputStream, read a number of initial bytes, and return the data for subsequent processing using one of several unread methods. This approach works for content that includes unique identification details within a certain number of initial bytes, such as compressed archives or binary content with standard headers.

try (
    InputStream flowFileStream = session.read(flowFile);
    // Size the pushback buffer to hold the header, as the default buffer holds a single byte
    PushbackInputStream stream = new PushbackInputStream(flowFileStream, 256)
) {
    final byte[] header = stream.readNBytes(256);
    processHeader(header);
    stream.unread(header);
}
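
As one concrete case of header evaluation, the following sketch checks for the two-byte GZIP magic number. It assumes a ByteArrayInputStream in place of FlowFile content, and the HeaderDetector class is a hypothetical name. The pushback buffer is sized to the header length because the default PushbackInputStream buffer holds a single byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class HeaderDetector {
    // GZIP streams begin with the magic bytes 0x1f 0x8b
    static final int HEADER_LENGTH = 2;

    // Reports whether the stream starts with the GZIP magic number and returns the bytes to the stream
    static boolean isGzip(final PushbackInputStream stream) throws IOException {
        final byte[] header = stream.readNBytes(HEADER_LENGTH);
        // Push the bytes back so that subsequent readers see the complete content
        stream.unread(header);
        return header.length == HEADER_LENGTH
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    public static void main(final String[] args) throws IOException {
        final byte[] content = {0x1f, (byte) 0x8b, 0x08};
        final PushbackInputStream stream = new PushbackInputStream(new ByteArrayInputStream(content), HEADER_LENGTH);
        System.out.println(isGzip(stream));
    }
}
```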

For content that has different structural characteristics, FlowFile attributes provide another strategy for alternative processing. The mime.type FlowFile attribute serves as the conventional standard for content identification, and defining custom values using subtypes is an extensible approach for introducing new content handling. Following the strategies described for adaptive FlowFile attribute processing enables an existing Processor to begin handling new content types without disrupting deployed pipelines when upgrading.
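
A sketch of attribute-based selection follows, again assuming a Map in place of FlowFile attributes. The ContentStrategySelector class, the ContentStrategy values, and the application/vnd.example.records+json subtype are hypothetical names chosen for illustration.

```java
import java.util.Map;

class ContentStrategySelector {
    enum ContentStrategy { ORIGINAL, RECORD_JSON }

    // Hypothetical custom subtype introduced in a later Processor version
    static final String RECORD_JSON_TYPE = "application/vnd.example.records+json";

    // Selects new handling only for the new type, keeping original handling for everything else
    static ContentStrategy select(final Map<String, String> attributes) {
        final String mimeType = attributes.get("mime.type");
        // Calling equals on the constant handles a null attribute value
        if (RECORD_JSON_TYPE.equals(mimeType)) {
            return ContentStrategy.RECORD_JSON;
        }
        return ContentStrategy.ORIGINAL;
    }

    public static void main(final String[] args) {
        System.out.println(select(Map.of("mime.type", RECORD_JSON_TYPE)));
    }
}
```

Treating missing and unrecognized values as the original content type means that FlowFiles from flows designed before the upgrade continue through the same path.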

More complex use cases involve introducing new properties or collaborating Controller Services for delegated processing. Adding new properties requires manual configuration, but starting with defaults that maintain existing behavior enables upgrading without requiring immediate flow design changes.

Compatible Introduction for Property Descriptors

The concept of compatible introduction for Property Descriptors requires that new configuration surfaces maintain existing behavior without manual intervention. This principle is foundational to a release strategy built on frequent upgrades. For new Property Descriptors, this means assigning a default value that describes and retains original behavior. For changes to existing Property Descriptors, this means implementing lifecycle methods for renaming or rewriting the configuration to upconvert prior settings in an idempotent manner.

Implementing compatible property changes starts with scoping supported values. In some cases, a Boolean property with a value of true or false might appear sufficient, but more often than not, a third value may become necessary. In such situations, starting with a list of Allowable Values provides a more flexible path for future iteration, even if the initial options present a binary choice such as Enabled and Disabled.

Defining an enum that implements the DescribedValue interface provides strong typing for allowable property values using the name() method. The getDisplayName() method can be used to present the value using title case.

enum OutputStrategy implements DescribedValue {
    FLOW_FILE("FlowFile", "Write one FlowFile for each Record"),
    
    RECORD("Record", "Write one FlowFile containing multiple Records");
    
    private final String displayName;
    
    private final String description;
    
    OutputStrategy(final String displayName, final String description) {
        this.displayName = displayName;
        this.description = description;
    }
    
    @Override
    public String getValue() {
        return name();
    }
    
    @Override
    public String getDisplayName() {
        return displayName;
    }
    
    @Override
    public String getDescription() {
        return description;
    }
}

In situations where a Boolean property is defined in the initial version of a Processor, defining a new Property Descriptor and adding a dependsOn relationship to the existing Boolean property provides a way to expand configuration options.

The dependsOn method of the PropertyDescriptor.Builder is a powerful feature for extending the configuration surface of a Processor without breaking deployed pipelines. New Property Descriptors that depend on specific values from other properties can be marked as required, making it clear that the new property is only required when the configuration satisfies the indicated dependencies. The InvokeHTTP Processor is one of the more flexible and complicated components in the standard NiFi distribution, but the use of dependsOn in several properties narrows the configuration scope for some use cases.

Migrating Property Names and Values

Prior to the release of NiFi 2, the displayName field served as a method for adjusting the presentation of property names without changing the configuration contract. To provide better alignment between presentation and implementation, NiFi 2 added the migrateProperties method to the Processor interface. Using the provided PropertyConfiguration interface, implementing Processors can read, rename, or remove properties when the NiFi framework loads a flow configuration.

The renameProperty method takes the existing property name and renames it to the supplied new property name. Processors using this method should define the existing property name in a local or static string, and should use the getName() method of the new Property Descriptor to ensure alignment with supported Property Descriptors.

@Override
public void migrateProperties(final PropertyConfiguration config) {
    config.renameProperty("disable-http2", HTTP2_DISABLED.getName());
}

The removeProperty method takes an existing property name, returning true or false depending on whether the flow configuration contained the named property. Removing existing properties is useful when new Property Descriptors supersede current configuration. The removeProperty method can be used in conjunction with setProperty to translate equivalent property values when migration involves more than simple renaming.

@Override
public void migrateProperties(final PropertyConfiguration config) {
    config.removeProperty("proxy-type");
}
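
The translation logic for combining removal with a new property value can be sketched apart from the framework. In the following example, a Map stands in for the PropertyConfiguration interface, and the property names, the GZIP and NONE values, and the PropertyMigrationSketch class are all hypothetical. The sketch shows only the idempotent shape of the translation, not the NiFi API itself.

```java
import java.util.HashMap;
import java.util.Map;

class PropertyMigrationSketch {
    // Hypothetical property names for illustration
    static final String LEGACY_PROPERTY = "compress-output";
    static final String STRATEGY_PROPERTY = "Compression Strategy";

    // Translates a legacy Boolean value to an enumerated value and can run repeatedly with the same result
    static void migrate(final Map<String, String> properties) {
        // Removal returns null when the legacy property is absent, as it is after a previous migration
        final String legacyValue = properties.remove(LEGACY_PROPERTY);
        if (legacyValue != null) {
            final String strategy = Boolean.parseBoolean(legacyValue) ? "GZIP" : "NONE";
            // Keep a value that is already configured for the new property
            properties.putIfAbsent(STRATEGY_PROPERTY, strategy);
        }
    }

    public static void main(final String[] args) {
        final Map<String, String> properties = new HashMap<>();
        properties.put(LEGACY_PROPERTY, "true");
        migrate(properties);
        System.out.println(properties);
    }
}
```

Because the legacy property is removed on the first pass, running the same migration on each subsequent load leaves the configuration unchanged.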

The createControllerService method supports more advanced use cases for moving features from a Processor to an external Controller Service class. Features such as authentication or connection management are examples where moving from Processor properties to external Controller Services provides better decoupling for extensible feature development. The createControllerService method requires the class name of a specific Controller Service implementation. For this reason, the NiFi deployment must include the Controller Service implementation class for a successful migration.

Adding Processor Relationships

Changes to Processor Relationships are limited to additions, because removing a Relationship can result in data loss. For this reason, Processors should define a minimum number of initial Relationships and only add a Relationship after exhausting other alternatives.

Before adding a new relationship, it is important to consider intended use cases. Although Relationships can support discrete failure handling, in many cases, FlowFile routing should follow the same path. In this situation, supplemental information such as error codes can be added as FlowFile attributes, without requiring a new Relationship. Documented error codes can support optional retry scenarios or selective alerting. FlowFile attributes should avoid exposing Java class information and should instead provide status independent of the programming language itself.
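
One way to keep Java class information out of FlowFile attributes is to translate exceptions to a small set of documented codes. The following sketch assumes a hypothetical error.code attribute name and a hypothetical ErrorCode enum; the mapping from exception types to codes is illustrative only.

```java
import java.net.SocketTimeoutException;
import java.util.Map;

class ErrorAttributes {
    // Hypothetical attribute name for illustration
    static final String ERROR_CODE_ATTRIBUTE = "error.code";

    // Documented codes that remain stable regardless of the underlying exception classes
    enum ErrorCode { TIMEOUT, INVALID_INPUT, UNKNOWN }

    static ErrorCode toErrorCode(final Exception e) {
        if (e instanceof SocketTimeoutException) {
            return ErrorCode.TIMEOUT;
        }
        if (e instanceof IllegalArgumentException) {
            return ErrorCode.INVALID_INPUT;
        }
        return ErrorCode.UNKNOWN;
    }

    // Builds the attributes to add to a FlowFile before routing to the existing failure path
    static Map<String, String> toAttributes(final Exception e) {
        return Map.of(ERROR_CODE_ATTRIBUTE, toErrorCode(e).name());
    }

    public static void main(final String[] args) {
        System.out.println(toAttributes(new SocketTimeoutException("read timed out")));
    }
}
```

A downstream RouteOnAttribute Processor could then select FlowFiles with a specific code for retry or alerting, without any dependency on exception class names.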

Migrating Relationship Configuration

The NiFi framework provides configurable retry with backoff for Relationships, without additional Processors, which is one advantage of specific Relationships for certain error conditions. For Processors that start out with conventional success and failure Relationships, introducing new Relationships requires implementing the migrateRelationships method and invoking the splitRelationship method on the provided RelationshipConfiguration interface. The splitRelationship method takes the current Relationship name as the first argument. Subsequent arguments can include the same Relationship and one or more new Relationships. Following an upgrade, the flow configuration includes the new Relationships, and the Processor can route to the new Relationships. This preserves existing FlowFile routing, and allows for alternative flow configuration using the new Relationships after upgrading.

@Override
public void migrateRelationships(final RelationshipConfiguration config) {
    config.splitRelationship(FAILURE.getName(), FAILURE.getName(), RETRYABLE.getName());
}

In cases where new Relationships support optional behavior, the Relationship.Builder provides the autoTerminateDefault method to indicate that the Relationship is terminated in the default configuration. Automatically terminating a Relationship instructs the NiFi framework to drop FlowFiles transferred to the Relationship. Setting this flag on a Relationship avoids manual flow configuration changes when upgrading, but it could cause data loss in Processors responsible for transformation and routing.

static final Relationship RETRYABLE = new Relationship.Builder()
    .name("retryable")
    .description("FlowFiles that encounter temporary processing failures")
    .autoTerminateDefault(true)
    .build();

Processors that set autoTerminateDefault on a new Relationship should also check isAutoTerminated on Relationships prior to making transfer decisions.

Adjusting Persistent State Information

The StateMap interface provides the primary abstraction for reading persistent information in stateful Processors. The StateMap functions as a schemaless storage facade within a NiFi node or cluster of nodes. State values must be stored as a string, requiring binary information to be encoded using strategies such as Base64. With state storage limited to 1 MiB per Processor in common providers, design decisions related to persistent information require careful consideration. Introducing changes to the structure of persistent information also involves thoughtful implementation.
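
Encoding binary values for string-based state storage can be sketched with java.util.Base64. The example assumes a Map in place of the state read from the framework, and the BinaryStateCodec class and last.digest key are hypothetical.

```java
import java.util.Base64;
import java.util.Map;
import java.util.Optional;

class BinaryStateCodec {
    // Hypothetical state key for illustration
    static final String DIGEST_KEY = "last.digest";

    // Encodes binary information as a string suitable for state storage
    static String encode(final byte[] value) {
        return Base64.getEncoder().encodeToString(value);
    }

    // Decodes the stored value, returning empty when the key is absent from state
    static Optional<byte[]> decode(final Map<String, String> state, final String key) {
        return Optional.ofNullable(state.get(key))
                .map(encoded -> Base64.getDecoder().decode(encoded));
    }

    public static void main(final String[] args) {
        final Map<String, String> state = Map.of(DIGEST_KEY, encode(new byte[] {1, 2, 3}));
        System.out.println(decode(state, DIGEST_KEY).isPresent());
    }
}
```

Base64 encoding increases the stored size by roughly one third, which counts against storage limits. The decoder throws an IllegalArgumentException for malformed input, so callers that cannot trust stored values should handle that case as well.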

Similar to FlowFile attribute handling, StateMap values must be considered nullable. The NiFi framework allows authorized users to clear state information for a stopped Processor, requiring Processors to handle null values on each invocation. Processors can introduce new named keys in subsequent versions, but it is essential to expect and handle null values. The getStateVersion method of the StateMap interface allows state providers to implement optimistic updates.

Serializing state values as JSON is a common solution for associating multiple properties with a single key. Just as with FlowFile attributes, however, reading a JSON object requires ignoring unknown properties and expecting new fields to be null when deserializing previous values. Following the convention of other schemaless services, storing persistent information with a version field is one strategy for tracking changes to structured object representations.
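
The version field strategy applies to flat keys as well as serialized objects. The following sketch uses a Map in place of state values and avoids a JSON library to stay self-contained. The key names, the version numbers, and the VersionedStateSketch class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedStateSketch {
    // Hypothetical state keys for illustration
    static final String VERSION_KEY = "state.version";
    static final String OFFSET_KEY = "offset";
    // Key introduced in hypothetical version 2 of the state structure
    static final String PARTITION_KEY = "partition";
    static final int CURRENT_VERSION = 2;

    // Returns a copy of the stored state converted to the current version
    static Map<String, String> upconvert(final Map<String, String> stored) {
        final Map<String, String> current = new HashMap<>(stored);
        // State written before the version field existed, or cleared state, is treated as version 1
        final int version = Integer.parseInt(stored.getOrDefault(VERSION_KEY, "1"));
        if (version < 2) {
            // Supply the value that matches behavior from before the key existed
            current.putIfAbsent(PARTITION_KEY, "0");
        }
        current.put(VERSION_KEY, Integer.toString(CURRENT_VERSION));
        return current;
    }

    public static void main(final String[] args) {
        System.out.println(upconvert(Map.of(OFFSET_KEY, "42")));
    }
}
```

Keys such as the offset can still be absent after conversion, for example following a manual clear, so callers continue to treat every value as nullable.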

Conclusion

Introducing progressive and compatible changes is a common challenge and frequent requirement in software engineering. Apache NiFi Processors not only implement framework interfaces but also define contractual interface surfaces using Property Descriptors, Relationships, and behavioral annotations. With an understanding of both common interface design principles and Processor interface surfaces, it is possible to extend initial capabilities without compromising deployed pipelines.

Adding new Processor features is often possible without increasing configuration complexity or requiring manual migration. In situations where current requirements have pushed the boundaries of an initial approach, awareness of available framework migration methods provides a path to seamless upgrades. Knowing and applying these strategies enables Processors to provide the level of adaptability necessary for deploying scalable data pipelines.