ExceptionFactory

Producing content that a reasonable developer might want to read

Investigation Questions for Evaluating Java Libraries

Dependencies Security Programming

2025-08-25 • 9 minute read • David Handermann

Introduction

Writing software is rarely a de novo activity. The final product may be novel, but it is often a composite of existing code. The Java platform provides a strong foundation for numerous use cases, and the ecosystem of open source software available from Maven Central makes it easy to create solutions from component parts. The relative simplicity of declaring dependencies using Apache Maven or Gradle can hide inherent complexity, making it easy to gloss over essential questions when selecting Java libraries. Considering several investigation questions before building with a new library promotes both security and maintainability throughout the software development lifecycle. Asking and answering five basic questions during the development process can avoid more challenging questions during the postmortem review.

Who are the maintainers?

Software development is both an art and a science, placing a great responsibility on the author for successful implementation. The prevalence and ease of access to development and distribution platforms makes it easy for anyone to share their library with the world, providing the means for both established and aspiring engineers to make notable contributions.

This ease of access can also mask the parties responsible for certain libraries. Determining whether a given piece of software is the property of an individual, the product of a company, or the project of a foundation is an important question. Answering the maintainer question bears on both current stability and future reliability. The provenance of a project should not be the sole criterion for adoption or rejection, but the people behind the code can make or break a software library in several ways.

CVE-2024-3094 is a recent example of a vulnerability in the XZ Utils project where lack of available maintainer cycles opened the door for a threat actor to gain maintainer privileges, leading to the publication of malicious binaries. Aside from direct manipulation, the maintainers of a project shape the direction and control the boundaries of acceptable changes. Understanding the general maintenance strategy for a project is an important criterion for evaluating the relative stability and security of any library.

The Apache Software Foundation and the Eclipse Foundation are prominent examples of open source organizations with standard policies and procedures governing associated projects. Companies large and small release applications and frameworks for open source consumption. GitHub and GitLab enable individuals to build robust development and distribution pipelines. Popular libraries such as Jackson and SnakeYAML highlight the success of individual contributors in building de facto standards for common features. Foundation backing does not guarantee success, and single authorship does not lead to failure, but differentiating core maintainers from occasional contributors provides a window into the health of a project.

For Java libraries published to Maven Central, the developers section of the Project Object Model provides a standard point of reference for maintainer attribution. Linked project pages often provide additional background. Reviewing project maintainer information is an important step in selecting stable dependencies.
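As a minimal sketch of reviewing maintainer attribution, the following Java class lists the developer entries declared in a POM using the XML parser included in the Java platform. The class name PomDevelopers and the "unspecified" placeholder are illustrative assumptions, and the sample POM content is invented for demonstration. The sketch reads only the developers declared directly in the supplied document and does not resolve values inherited from a parent POM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists developers declared in a Maven POM document
final class PomDevelopers {

    /** Returns "name (organization)" for each developer element in the POM. */
    static List<String> read(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to avoid XML External Entity processing
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));

            NodeList developers = document.getElementsByTagName("developer");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < developers.getLength(); i++) {
                Element developer = (Element) developers.item(i);
                result.add(text(developer, "name") + " (" + text(developer, "organization") + ")");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unable to read POM", e);
        }
    }

    // Returns the trimmed text of the first matching child element, or a placeholder
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "unspecified" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Invented sample content, not taken from a real project
        String pom = "<project><developers>"
                + "<developer><name>Jane Example</name><organization>Example Org</organization></developer>"
                + "<developer><name>Solo Developer</name></developer>"
                + "</developers></project>";
        read(pom).forEach(System.out::println);
    }
}
```

A developer entry without an organization is not a warning sign in itself, but the distinction helps separate foundation or company projects from individual efforts.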

What is the license?

Aside from the code itself, the license associated with a software project is one of the most important factors in library selection. Although a library may provide desired functionality, licensing may prove to be the deciding factor for inclusion or exclusion. There are many licensing flavors, and particular licenses may have multiple versions, so precise identification is essential.

Most software libraries include a file named LICENSE as part of binary and source distributions. The Maven Project Object Model includes a licenses section that declares the name and URL of one or more licenses for the associated project. The Software Package Data Exchange maintains a list of licenses with standard short identifiers that provide clear linking to associated licenses. The SPDX ID provides a concise and unambiguous reference for license identification.

The SPDX list includes several status indicators, including whether the Open Source Initiative has approved the license according to its review process for validating community expectations. Starting with a license from the list of OSI Approved Licenses provides a stable basis for further evaluation.

After license identification, it is necessary to understand the associated terms and conditions before building with a particular library. This is an important area where the intended use of the library drives the decision process. Corporate legal policy often sets the boundaries of acceptable licenses. The Apache Software Foundation maintains a 3rd Party License Policy that specifies license categories that are approved or acceptable under certain conditions for use in Apache projects.

At a general level, copyleft licenses such as the GNU General Public License mandate certain publication requirements when building on licensed libraries, whereas other more permissive licenses allow for redistribution or monetization. Obtaining professional legal guidance may be necessary depending on the library license and the intended distribution strategy for new capabilities. Identifying and understanding the license associated with each direct and transitive dependency of a project is an essential part of the library selection process.
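The identification and policy steps described in this section can be sketched as a lookup keyed by SPDX short identifier. The LicensePolicy class and its decision table are hypothetical and do not represent the Apache Software Foundation policy or any legal guidance. The identifiers themselves are taken from the SPDX License List. Any identifier missing from the table is reported as unknown rather than allowed.

```java
import java.util.Map;

// Hypothetical policy lookup keyed by SPDX short identifier
final class LicensePolicy {

    enum Decision { ALLOWED, REVIEW_REQUIRED, UNKNOWN }

    // Illustrative policy only: a real table should come from legal guidance
    private static final Map<String, Decision> POLICY = Map.of(
            "Apache-2.0", Decision.ALLOWED,
            "MIT", Decision.ALLOWED,
            "BSD-3-Clause", Decision.ALLOWED,
            "EPL-2.0", Decision.REVIEW_REQUIRED,
            "GPL-3.0-only", Decision.REVIEW_REQUIRED
    );

    /** Returns the policy decision for an SPDX identifier, or UNKNOWN when not listed. */
    static Decision evaluate(String spdxId) {
        if (spdxId == null) {
            return Decision.UNKNOWN;
        }
        // Exact matching is intentional: "Apache 2" is not a precise identifier
        return POLICY.getOrDefault(spdxId, Decision.UNKNOWN);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("Apache-2.0"));
        System.out.println(evaluate("GPL-3.0-only"));
        System.out.println(evaluate("Apache 2"));
    }
}
```

Exact matching highlights the value of precise identification: a free-form license name from a POM needs to be mapped to an SPDX ID before any policy decision is meaningful.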

When was the latest version released?

Release cadence is an important measure of relative momentum, reflecting the level of maintenance energy in a project. Many factors impact software release cycles, from proactive planning to reactive vulnerability resolution. The development phase of a project can also influence how often maintainers release new versions, with feature additions prompting frequent releases, and minor adjustments requiring infrequent updates. The latest release of a project serves as a useful indicator of potential future changes. Starting with the latest version and reviewing the release history, along with versioning strategy, can provide some insights into library stability.

The Maven Central Repository provides a Versions section for every library, indicating the date published for each version available in the repository. Although the project.build.outputTimestamp is a standard property in the Maven Project Object Model, it is not required. Projects in source repositories such as GitHub have timestamps associated with every commit and tag, providing a definitive date for version releases.

As a relative indicator, evaluating the date published for the latest version should be considered on a sliding scale. A date within the last week may be coincidental, or it may indicate a pattern of frequent releases. A date within a number of months is a positive sign of recent maintenance, whereas a date within a number of years could be reason for concern. Every project requires some amount of regular maintenance to keep up with new versions of Java, new versions of build tools, or new versions of dependencies. Not all of these items require releasing a new version, and some small libraries implementing specific algorithms may not need direct updates.
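The sliding scale can be sketched with the java.time classes. The ReleaseAge class, the category names, and the thresholds of 7 and 365 days are assumptions chosen for illustration, not fixed rules. The resulting category is a prompt for further review and not a verdict on the library.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical classification of the latest release date on a sliding scale
final class ReleaseAge {

    enum Signal { VERY_RECENT, RECENT, DATED }

    // Assumed thresholds for illustration
    private static final long WEEK_DAYS = 7;
    private static final long YEAR_DAYS = 365;

    /** Classifies the number of days between publication and the evaluation date. */
    static Signal classify(LocalDate published, LocalDate today) {
        long days = ChronoUnit.DAYS.between(published, today);
        if (days < 0) {
            throw new IllegalArgumentException("Published date is after evaluation date");
        }
        if (days <= WEEK_DAYS) {
            return Signal.VERY_RECENT; // coincidence or a pattern of frequent releases
        }
        if (days <= YEAR_DAYS) {
            return Signal.RECENT; // positive sign of recent maintenance
        }
        return Signal.DATED; // possible reason for concern, depending on library scope
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2025, 8, 25);
        System.out.println(classify(LocalDate.of(2025, 8, 20), today));
        System.out.println(classify(LocalDate.of(2025, 2, 25), today));
        System.out.println(classify(LocalDate.of(2022, 8, 25), today));
    }
}
```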

Reviewing version history provides a window into project versioning strategy. The Semantic Versioning specification is a common pattern, providing a basic approach to incrementing major and minor versions, based on breaking changes or incremental improvements. Although many projects follow a major and minor version convention, adherence to Semantic Versioning should not be assumed. Other projects reach production stability with 0 as the major version. It is less common for Java libraries, but Calendar Versioning provides a different approach that uses dates as versions, avoiding the potential for confusion related to major version changes.
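A short sketch shows why the versioning strategy matters when reading a release history. The hypothetical VersionChange class handles only the numeric MAJOR.MINOR.PATCH core described in the Semantic Versioning specification, ignoring pre-release and build metadata, and it rejects anything else instead of guessing. A date-based version with two components is one example of input that does not fit the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of two versions using MAJOR.MINOR.PATCH numbers only
final class VersionChange {

    enum Kind { MAJOR, MINOR, PATCH, NONE }

    // Pre-release and build metadata suffixes are deliberately not supported
    private static final Pattern CORE = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    /** Parses a version into major, minor, and patch numbers or throws when it does not fit. */
    static int[] parse(String version) {
        Matcher matcher = CORE.matcher(version);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not MAJOR.MINOR.PATCH: " + version);
        }
        return new int[] {
                Integer.parseInt(matcher.group(1)),
                Integer.parseInt(matcher.group(2)),
                Integer.parseInt(matcher.group(3))
        };
    }

    /** Reports the most significant component that differs between two versions. */
    static Kind compare(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) {
            return Kind.MAJOR; // breaking changes expected under Semantic Versioning
        }
        if (a[1] != b[1]) {
            return Kind.MINOR;
        }
        if (a[2] != b[2]) {
            return Kind.PATCH;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(compare("2.17.2", "3.0.0"));
        System.out.println(compare("2.17.2", "2.18.0"));
        System.out.println(compare("1.0.0", "1.0.1"));
    }
}
```

The reported kind describes what the numbers imply under Semantic Versioning. Whether a project honors that implication still requires reading release notes.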

Regardless of the versioning strategy, determining the date of last release and the general release cadence provides important background for shaping expectations around future releases.

Where is the source code?

Source code availability is the sine qua non of open source software. Reliable access to the source for a Java library is essential for development, troubleshooting, and maintenance.

The Software Configuration Management section of the Maven Project Object Model provides the canonical location for library source code. The url field provides a link for browser-based access, while the connection and developerConnection fields define both the remote location and the version control system.
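The connection and developerConnection values follow the Maven SCM URL convention of an scm prefix, a provider name, and a provider-specific location. As a rough sketch, the hypothetical ScmConnection class below splits a colon-delimited value into the version control system and the remote location. The sample URL is invented, and the alternative pipe delimiter allowed by Maven SCM is not handled.

```java
// Hypothetical parser for colon-delimited Maven SCM connection values
final class ScmConnection {

    /** Returns the provider and location from a value such as scm:git:https://host/path.git */
    static String[] split(String connection) {
        // Limit of 3 keeps colons inside the location, such as the one after https
        String[] parts = connection.split(":", 3);
        if (parts.length != 3 || !"scm".equals(parts[0])) {
            throw new IllegalArgumentException("Not an SCM connection: " + connection);
        }
        return new String[] { parts[1], parts[2] };
    }

    public static void main(String[] args) {
        // Invented repository location for demonstration
        String[] parsed = split("scm:git:https://example.org/project.git");
        System.out.println("Version control system: " + parsed[0]);
        System.out.println("Remote location: " + parsed[1]);
    }
}
```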

GitHub and GitLab are among the most prominent hosting platforms for public source code, but others such as Bitbucket from Atlassian, Azure DevOps from Microsoft, and SourceForge are also common options. Open source software foundations such as Apache also host source code repositories. Libraries published to Maven Central can include source packages for download.

Regardless of the hosting platform, direct access to source code is an important resource for software design. API documentation is useful, but does not always contain sufficient details. Whether confirming expected behavior or tracking down specific error conditions, searchable source code is a powerful tool. Although some bugs may be hiding in plain sight, public source code enables researchers to find and report potential vulnerabilities. Security through obscurity presents a surface appeal when it comes to software behavior, but security through availability provides a much stronger foundation.

Why is the library necessary?

Perhaps the most important question to answer during evaluation is: why is the library necessary? This includes considering why the library exists and why the library is needed in the context of a particular project. The question of necessity cuts in multiple directions, requiring careful consideration from different angles.

Many libraries exist to provide clear solutions to common problems. Handling the nuances of socket communication or structured data parsing can be challenging and tedious, so popular libraries support standards such as HTTP and JSON, enabling software engineers to focus on other concerns. At the same time, other libraries take shape as prototypical experiments, not suitable for extension or integration.

In the world of Java, it is often necessary to choose between multiple libraries providing the same basic set of features. Popular application frameworks such as Spring and Quarkus often support multiple libraries for common purposes, leaving the decision to the integrating engineer. In this situation, comparing available implementations based on the current set of investigation questions should help narrow down the options. At times, frameworks have more opinionated recommendations, requiring certain fundamental choices for features such as abstracted logging.

In some areas, open source options are more limited. Bouncy Castle and Google Tink are examples of libraries providing specialized cryptographic features that either do not exist, or do not have solid implementations elsewhere. Libraries with direct security implications require more detailed analysis to avoid more subtle vulnerabilities.

The most basic question of necessity relates to capabilities available in the Java platform itself. As new Java versions introduce additional features, some convenience libraries become less necessary. Libraries such as Apache Commons Lang and Google Guava provide numerous useful utilities, but including a dependency for a few utility methods may be unnecessary. Components such as the Java HttpClient do not have every single feature of advanced HTTP libraries, but may be sufficient for common use cases. Considering library features in light of current platform capabilities is key.
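As a brief illustration, the following sketch uses only classes from the Java platform for tasks that often motivate a utility dependency. The PlatformFeatures class and its method names are hypothetical, and the request URL is invented. String.isBlank and java.net.http.HttpClient require Java 11 or later, while Objects.requireNonNullElse and List.of require Java 9 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.List;
import java.util.Objects;

// Hypothetical examples of platform features that may replace small utility dependencies
final class PlatformFeatures {

    /** Blank check using String.isBlank, with an explicit null check. */
    static boolean isNullOrBlank(String value) {
        return value == null || value.isBlank();
    }

    /** Default value selection using Objects.requireNonNullElse. */
    static String orDefault(String value, String fallback) {
        return Objects.requireNonNullElse(value, fallback);
    }

    /** Unmodifiable list creation using List.of. */
    static List<String> formats() {
        return List.of("json", "xml");
    }

    /** Request construction for use with java.net.http.HttpClient. */
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrBlank("   "));
        System.out.println(orDefault(null, "fallback"));
        System.out.println(formats());
        // Building the request does not open a network connection
        System.out.println(buildRequest(URI.create("https://example.org/status")).method());
    }
}
```

Sending the request requires an HttpClient instance and a reachable server, which is outside the scope of this sketch.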

Conclusion

Although not exhaustive, answering the preceding questions of who, what, when, where, and why as part of evaluating a Java library is an important part of the software engineering process. Rather than chasing new libraries for project inclusion, careful consideration is vital to project health and security. Scanning software abounds for reviewing direct and transitive dependencies, and answering factual questions about particular libraries is straightforward. Articulating the motivation behind selecting a library for inclusion is more difficult for machines, as it connects directly to the reasons for software engineering work. Regardless of available tooling, asking and answering common questions for each library considered for project inclusion is foundational to engineering lasting solutions.