John Mikael Lindbakk

The joy of self-updating dependencies

As developers, we constantly face the choice between reusing existing solutions or writing them ourselves. Most of the time, we end up reusing something that already exists, be it an ORM library, a web framework or something else. Most developers work on projects with a whole bunch of dependencies.

Where I currently work, we have an initialiser inspired by the Spring Initializr - a project generator. It is designed to work within our solution, to have sensible defaults for the things we most commonly need, and not to contain anything more than that. By default, it comes with:

  • Test libraries to run unit, system and integration tests.
  • Web framework (Spring Boot, which also comes with DI and a bunch of other things).
  • An ORM framework.
  • Logging.
  • Libraries to support generating clients and interfaces based on OpenAPI.
  • Database schema versioning (Flyway).

The above should not sound like too much, but just that results in 177(!) dependencies, most of which are transitive. That is a hell of a lot of "other people's code" we're running! Furthermore, that is just the "by default" setup - the number will likely only go up as developers build out their solutions.

The cost of reusing code

Reusing code is a great thing. For one, it is less code that a developer needs to write. Why reinvent the wheel, right? Reusing code allows us to focus on delivering value to our users and give businesses a better shot at staying profitable. Throw in an open-source license, and it's a match made in heaven!

Most seasoned devs reading the previous paragraph will likely think, "Well... sure... in theory, that is correct..." - and I agree. The above is only a perfect theoretical scenario. The reality is much more complicated and messy than that.

The truth is that we need to keep our dependencies up to date. If we do not, we will run into problems later on. For example, an old dependency version might not work with a new language version or cause conflicts with other libraries (often indirectly through transitive dependencies). I have encountered these issues countless times, and it has been a joyless experience every single time.

In fact, it becomes harder to update dependencies the more outdated they are. Skipping a bunch of versions in one go means we, indirectly, introduce a bunch of code changes - most of which are invisible to us unless we actually read and understand each line of code that has changed in the dependency we're updating.

The only reasonable conclusion I can think of is that updating dependencies should be continuous, and outdated dependencies are technical debt.

The cost doesn't just come from dependencies being outdated but also from dependencies being abandoned by their original creator. I've seen systems stuck on ancient language versions and on outdated (and insecure) frameworks/libraries because some core library was abandoned. The team was busy working on their own stuff, so they either didn't notice or noticed but did nothing (for whatever reason). For at least one of these projects, the result was that greenfielding a replacement became cheaper than updating the project itself.

Auto-updating dependencies

All systems I've developed over the last few years have shared one feature: They've all patched themselves with new versions of dependencies.

Around 99% of all dependencies in these systems have been automatically updated and put into production without developer intervention. Some developers look at me in horror when I mention this, so let's go through how this works in practice.

Do note that I'm not implying this is the only viable approach. It's just an approach I've discovered and one that seems to check most of my boxes.

It all starts with a new version of some dependency becoming available. I've mostly been working with Gradle, but this approach would work with any dependency management system, like pip, Maven, NPM, Cargo, or whatever.

Once this new version exists somewhere, Dependabot or Renovate picks up the change and issues a pull request.
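For illustration, this only takes a small configuration file in the repository. The snippet below is a minimal sketch assuming a Gradle project and Dependabot; the ecosystem, directory and schedule are assumptions that would differ per project:

```yaml
# .github/dependabot.yml - minimal sketch, assuming a Gradle project
version: 2
updates:
  - package-ecosystem: "gradle"   # swap for "npm", "pip", "cargo", etc.
    directory: "/"                # where the build files live
    schedule:
      interval: "daily"           # check for new versions every day
    open-pull-requests-limit: 10  # cap how many update PRs can be open at once
```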

Once there's a pull request, a GitHub Actions workflow triggers - one that only runs when, for example, Dependabot issues a PR. The workflow looks like this (a sketch follows the list):

  1. Checks out the branch
  2. Builds the code
  3. Runs unit, system and integration tests
    • If tests pass: Merge PR automatically
    • If tests fail: Add relevant developers to the PR for manual intervention

It is important to mention that to do this safely, we need a test suite we can really trust. That means we need good system and integration tests that cover a lot of the codebase (and, indirectly, our dependencies). We won't write tests that target our dependencies, but our tests will rely on the dependencies doing their job.

Once a PR is merged, the change is deployed to a test environment and eventually pushed to production automatically, where it first runs as a canary release and is monitored that way. If the canary shows signs of a defect, developers are notified and the new version won't replace the old version of our service.

The trick here is protective infrastructure combined with comprehensive tests.
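What that protective infrastructure looks like will vary; the post doesn't prescribe specific tooling. Purely as a hypothetical illustration, a team running on Kubernetes could express the canary step with something like Argo Rollouts:

```yaml
# Hypothetical canary strategy using Argo Rollouts - one way to get
# "canary release + automatic monitoring", not necessarily the author's setup.
# (Workload template, selector and analysis templates omitted for brevity.)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service              # placeholder service name
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10              # send 10% of traffic to the new version
        - pause: { duration: 15m }   # give monitoring time to spot regressions
        - setWeight: 50
        - pause: { duration: 15m }
      # an analysis template hooked in here can automatically abort the rollout
      # (and alert developers) if error rates or latency regress
```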

The joy

I've talked a lot about the process so far, so let's dig into why this actually matters.

The biggest benefit is that we're always up to date. For example, Spring is a framework we use a lot. A while back, it had a vulnerability - as most large frameworks do from time to time. So what's the cool thing? All the systems with automated dependency updates were already patched - and they had been patched before we even knew about the vulnerability!

Security is a great argument, but so is the fact that developers are only bothered with dependency upgrades when they actually need to make a change. Most dependencies can be upgraded without issues - maintainers are really good at keeping things working and deserve a lot of praise. Occasionally a dependency deprecates some feature we're using, fixes a bug we accidentally relied on, or makes some other change that causes incompatibilities. That is completely normal, and it will happen eventually.

With self-updating dependencies, we don't need to worry about all the versions that can just be upgraded without further thought. They just happen. Developers are only notified when there's something that doesn't work.

As an added benefit: if a larger portion of the solution operates with auto-upgrading dependencies, the teams will most likely encounter the same problem at the same time. That means we don't have to "rediscover" solutions as every team hits the same issue at different times, as we normally would. Instead, we can identify a solution once and share it with all the teams/developers.

Self-updating dependencies in highly secure codebases

One counterargument to this entire approach concerns codebases that must be highly secure. We are talking about industries such as banking, healthcare and the military. These are industries where you can't just pop in a new dependency without knowing what it does. For all we know, a foreign entity might have bought up some obscure transitive dependency and changed it to suit their own agenda.

A dependency going rogue is not some far-fetched dystopian idea either: we have [NPM packages that come with crypto miners](https://www.bleepingcomputer.com/news/security/npm-packages-posing-as-speed-testers-install-crypto-miners-instead/). We have examples of developers breaking things by simply removing their packages. It is not unlikely that this will become even more common going forward.

The big question, therefore, becomes: Can we have automated dependency upgrades in such critical applications? I believe so, yes - but we must change our process.

It is not uncommon for critical industries to store their dependencies in their own package repository - heck, it is not uncommon for businesses, in general, to do so. We can utilise that to our benefit.

Before we make a new dependency version available in our repositories, we first need to evaluate it (as we should, right?). We run it through whatever security and regulatory checks we need to ensure it is safe. Once we're confident that the dependency is okay, we put it into our internal package repository - and once that happens, the systems using that dependency can automatically pick it up and start updating themselves.
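As a sketch of how that gate can be wired up with Dependabot, the updater can be pointed exclusively at the internal repository, so version bumps are only ever proposed once a vetted artefact has been published there (the registry URL and secret names below are placeholders):

```yaml
# .github/dependabot.yml - sketch of restricting updates to an internal repository
version: 2
registries:
  internal-maven:
    type: maven-repository
    url: https://artifacts.example-corp.internal/maven   # placeholder URL
    username: ${{ secrets.INTERNAL_REPO_USER }}
    password: ${{ secrets.INTERNAL_REPO_PASSWORD }}
    replaces-base: true               # use this instead of Maven Central
updates:
  - package-ecosystem: "gradle"
    directory: "/"
    registries:
      - internal-maven                # only versions published here get proposed
    schedule:
      interval: "daily"
```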

By using this approach, we can ensure that we meet industry regulations and whatever security standards we set for ourselves while also leveraging automation.

This approach can also enable such organisations to move more quickly. Let's say a security vulnerability is discovered, but the fix still needs to go through the regulatory process. At least then you have some guarantee that systems will be updated once the new version clears that process and lands in the internal repository - and it will happen automatically. By automating dependency upgrades we're enabling a more effective form of distribution.

Risks

Of course, nothing is perfect, and there's a risk to any form of automation. Security vulnerabilities introduced in newer versions of dependencies will be patched in, and so will whatever code malicious actors inject into the dependencies we use.

We have to be much more critical about the dependencies we introduce once we use automated dependency updates. We need to be confident that they will respond to discovered vulnerabilities in a timely fashion, and we need to trust that they're not bad actors.

I'd argue that the above should be true regardless of automation.

The above are the most common counterarguments I've heard. They're valid, but not when compared to how we do things today: most developers bump the version number, check that their application works, and call it a day. There's actually very little verification that would discover bad actors or security vulnerabilities.

Another point is that a dependency might accidentally break the system in some way, but that should not happen if one has adequate testing and protective infrastructure in place. If something does break, it should be things like logging - stuff that doesn't affect the users and won't result in a phone call at night after a deployment.

Wrapping it up

I believe that automated dependency updates should be standard practice. Getting to that point requires investment in automated testing and infrastructure, but it is definitely worth it. So even if one can't use automated dependency updates for regulatory, company-policy or whatever other reasons, the system should be solid enough that one would have been comfortable turning on such automation if one could.

The goal of this post isn't to downplay the importance of the adopt-vs-build decision. On the contrary, self-updating dependencies should come with more vetting of the code we adopt and the people/organisations behind it. We're reducing the work required to maintain external code, but the tradeoff is that we need to focus on whose code we're putting into our codebases and whether we can trust it. I'd argue that puts the focus where it should be.

Whether you actually end up adopting self-updating dependencies isn't really the goal of this post either. The goal is to communicate that the quality of our codebases and processes should be good enough that we could use practices such as auto-updating dependencies.