Definitions are hard. It would be so much easier to write something if I could rely on the reader already knowing certain things, such as what an integration test is. The issue is that I’ve seen many developers confuse integration tests with unit tests (because they were written in the form of a unit test), and I’ve seen high-level system tests mistaken for integration tests when they weren’t. Therefore, before we do anything else, we must first define what an integration test is.
Wikipedia states the following:
“Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase in software testing in which individual software modules are combined and tested as a group.”
I’m not too fond of this definition, because it implies that tests which mock dependencies on other systems are not integration tests, since mocking does not combine the systems as a group.
Guru99 has a similar issue with its definition:
“INTEGRATION TESTING is defined as a type of testing where software modules are integrated logically and tested as a group.”
The third result on Google is an article by Martin Fowler in which he raises the problem of defining integration tests. He arrives at a conclusion which I agree with: the term “integration test” is polysemous, meaning different things to different people.
The essence of integration testing is to figure out that things are talking to each other correctly. We want to verify that the service contract is adhered to, but also interpreted correctly.
I recommend reading Martin Fowler's post, as it goes into more detail about the differences between the two types of integration testing than I will here. In summary, there are two levels of integration testing: narrow and broad.
Commonly, broad integration tests are written at a very high level as black-box tests. They trace the effects of a call throughout the overall solution to verify that everything is talking together correctly. In the broad category, we are running entire databases with real services which transfer data to each other.
Narrow integration tests can seem a bit superfluous compared to broad integration tests. After all, don’t broad integration tests already test the communication between the various systems in the solution? The answer is yes. If the goal is to verify that something is communicating correctly, it is better to test the actual systems running live than to mock the services. Note that this does not take away from the value of narrow integration tests.
To determine the value of narrow integration tests, we must see where their broad counterparts fall short.
In comparison, narrow integration tests make up for the shortcomings of broad integration tests:
Like many things in software development, a practice cannot be judged in a vacuum, as techniques often make up for the downsides of other techniques. We see this happening with companies which only want to adopt parts of agile. They don’t get the full benefit because they implement just the parts they deem necessary without realizing that everything builds on something else. It is the same with narrow and broad integration tests.
To have a healthy test portfolio, we should have both broad and narrow integration tests, but their goals are slightly different:
To distil the differences in the goal of narrow and broad integration tests even further, we can say that:
Do broad integration tests verify the contract by having systems interact with it? Definitely, but not in as much detail as narrow integration tests. The question is not necessarily whether or not the systems did the right thing; rather, it is whether or not the information was transferred and interpreted correctly.
Now that we have successfully managed to define what an integration test is, we can start talking about how to actually write tests for our application. As the title of this post indicates, we will only focus on narrow integration tests and leave broad integration tests for another day.
Narrow integration tests can be viewed as I/O tests for our application, often written in the style of unit tests. So let’s define the most common I/O our systems tend to have:
NOTE
When we write narrow integration tests, we don’t usually want to test the input of our application. Our system’s input will be covered and defined by automated high-level functional tests and automated acceptance tests. Broad integration tests will also touch it. We are potentially looking at a bunch of duplicated tests, so we usually don’t want to write narrow integration tests for the input of our system and instead focus on data which our system requests.
Now that we know what we are supposed to test, we should then look at the rules our narrow integration tests should adhere to. A narrow integration test should:
A narrow integration test has a shape and form similar to a unit test, but it also has to deal with the service’s I/O in some way. Narrow integration tests use the standard unit test framework and are indistinguishable from regular unit tests, except that they often require a bit more setup to deal with whatever dependency is used.
As with unit tests, we often have one test class per class in our production code, each with a bunch of tests, and the same goes for narrow integration tests. Consider this class from the Spring Boot guides:
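import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.web.client.RestTemplate;

@SpringBootApplication
public class ConsumingRestApplication {

    private static final Logger log = LoggerFactory.getLogger(ConsumingRestApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(ConsumingRestApplication.class, args);
    }

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }

    // Runs once at startup: fetches a quote over HTTP and logs it.
    @Bean
    public CommandLineRunner run(RestTemplate restTemplate) throws Exception {
        return args -> {
            // This outgoing call is the I/O a narrow integration test would target.
            Quote quote = restTemplate.getForObject(
                    "URL_TO_RANDOM_QUOTE_API", Quote.class);
            log.info(quote.toString());
        };
    }
}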
This might be a very simplistic example, but a narrow integration test for this code would verify the call to the random quote API.
In short, narrow integration tests are unit tests for the I/O, but they are distinguishable from unit tests because they deal with I/O. We should not make the mistake of calling narrow integration tests unit tests, even if they are written as such.
If the definition of a unit test is a test which tests a single aspect of the code, then the definition of a narrow integration test is a test which tests a single aspect of the code that interfaces with I/O in some way.
The most common component we must figure out how to integration test is probably the database. The good thing about databases is that they are very self-contained: they don’t rely on anything else. There are two ways of reliably doing this:
This is my least favourite way of dealing with database dependencies, but it is a valid one considering that the wide integration tests will cover the actual database interaction. In this scenario, we pretend that the database exists by using a standard mocking framework (Mockito, etc.) and replace the DB/ORM layer entirely.
The main benefit of this is that we don’t need a database running and our tests will be blazing fast.
Database migrations should be easy to replicate, as I have previously stated in my post about database migrations. That means that we should be able to automate this process, even on a local database.
By running a local database, it might seem that we are breaking our “no external dependencies” rule. I view it as bending the rules a little. Some might argue that we are trespassing on the territory of wide integration tests, but I disagree. We are limited in how many wide integration tests we can have without feeling pain, so we should limit them wherever possible, and doing database tests as narrow integration tests makes much more sense. There are two main reasons why I think that:
This approach is, to me, a last resort for when there’s absolutely no other way of running a database. The reason is that this way of testing doesn’t make sure that the database understands our objects/data structure and vice versa. We never get to test that conversion. Even if we found some mocking system to wrap around the ORM/database framework, we still could not be sure that the actual database would understand.
When we’re running tests towards an actual database, we must do a few things as part of the setup for our test:
There’s no lack of in-memory databases on the market, and the most well known might be SQLite. Using an in-memory database allows for quick startup and teardown, which is excellent for our tests. Spring Boot, for example, will spin up an embedded H2 database by default when H2 is on the classpath.
If we use a database framework which allows us to switch out our database provider at will, and our schema is so generic that it can be transferred from one SQL-based database to another, then this can be a practical approach. However, we might be using other types of databases, such as NoSQL or distributed databases. It might not be possible to switch to a different database provider than the one used in production, or there might not be an easy way to spin up your database locally.
What is so great about these in-memory databases is that we can spin one up without requiring any external dependencies (like docker) to make things work.
Most database servers have an image on Docker Hub: MySQL, MongoDB, Postgres and so on are all there. If that is not what is required, it is pretty easy to make your own Docker image, which can be uploaded to whatever image repository you prefer. Therefore, the first step will be to either find an official (or official enough) Docker image or make your own.
The second step is to include this in your build tool. Whether it is Maven or MSBuild, most have plugins or dependencies which can interface with Docker and automatically start various containers. Or you can have a docker-compose file which gets executed before the build takes place. The point is that we’re spoiled for choice, and there are multiple ways to spin up a new database instance on the fly.
While Docker makes it easy to spin up databases of all kinds, it also comes with some quirks. For example, whenever I start a container with Oracle SQL, Docker returns successfully before the Oracle SQL server is ready. This means that if I continue with the build, it may fail because the SQL server is still setting itself up. The workaround so far has been to add a hardcoded timeout to the setup, but that is not very elegant. If there isn’t much proprietary trickery in the schema, it might be well worth switching to a lighter database system for these tests, like SQLite. It might not be possible, but it is something to consider. We want our integration tests to be as fast as possible, and the longer we have to wait for the tests to complete, the less value they bring. The actual integration will be indirectly verified by the wide integration tests anyway, so it isn’t vital that we use the same database for the narrow integration tests.
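One possible alternative to a hardcoded timeout is to poll a readiness probe until the database answers or a deadline passes. The sketch below is only an illustration: `ReadinessWaiter` and `waitUntilReady` are hypothetical names, and the probe itself is left to the caller. In a real setup it would typically try to open a connection and run a trivial query.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a readiness probe instead of sleeping for a fixed time. */
final class ReadinessWaiter {

    private ReadinessWaiter() {}

    /**
     * Calls the probe repeatedly until it returns true or the timeout elapses.
     *
     * @return true if the probe succeeded in time, false on timeout or interruption
     */
    static boolean waitUntilReady(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The test setup would then call something like `ReadinessWaiter.waitUntilReady(probe, Duration.ofMinutes(2), Duration.ofSeconds(1))` and fail fast if it returns false. The build only waits as long as the database actually needs, which a fixed sleep cannot guarantee.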
Some view batch systems as an outdated practice which has been replaced by message brokers, such as RabbitMQ and Kafka. While I can see the reasoning for that, there is no denying that batch jobs exist and file generation is still very much a thing. Another thing to consider is that the file system is a dependency our application has. Depending on the language and framework used, it might almost be abstracted away at this point, but it is still there. This means that the structure of our files, both read and written, is a contract which we must verify.
Some jobs are, inherently, batch-based. Banks might want a report generated once a day in the form of an XML file (or CSV, ugh). Some governmental systems require a yearly report generated in a specific format and so forth. While the basis of this data might come from a broker of some sort, there are still business requirements which dictate that we generate specific types of files at particular times, so batch applications and file transfers will continue to be a thing for the foreseeable future.
Writing these tests is pretty straightforward:
When generating a file, I’d like to suggest that we don’t write it to disk. I am much more in favour of wrapping the actual function call, which writes a file to the disk, in a class and an interface:
In the model above, we see that we have wrapped the actual I/O call in a class. It is the last chain of our call, but we can intercept it as a mocked object. That means we can get direct access to the content of the file and all other parameters without writing anything to disk.
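The wrapping described here could look roughly like the sketch below. All names (`FileSink`, `DiskFileSink`, `CapturingFileSink`, `ReportGenerator`) and the CSV layout are made up for illustration, and the capturing class is a hand-written stand-in for what a mocking framework would normally provide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface around the single call that touches the disk. */
interface FileSink {
    void write(Path target, String content);
}

/** Production implementation: the only class that performs real file I/O. */
final class DiskFileSink implements FileSink {
    @Override
    public void write(Path target, String content) {
        try {
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Test double: records what would have been written, keyed by target path. */
final class CapturingFileSink implements FileSink {
    final Map<Path, String> written = new LinkedHashMap<>();

    @Override
    public void write(Path target, String content) {
        written.put(target, content);
    }
}

/** Hypothetical code under test: builds a CSV report and hands it to the sink. */
final class ReportGenerator {
    private final FileSink sink;

    ReportGenerator(FileSink sink) {
        this.sink = sink;
    }

    void generate(Path target, List<String[]> rows) {
        StringBuilder csv = new StringBuilder("id,amount\n");
        for (String[] row : rows) {
            csv.append(String.join(",", row)).append('\n');
        }
        sink.write(target, csv.toString());
    }
}
```

In a test we would construct `ReportGenerator` with a `CapturingFileSink` and then inspect `written` to get the file name and the full content, without anything ever reaching the disk.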
There are multiple reasons why we might avoid writing a file to disk:
No matter whether the file is written to disk or caught by a mock, the next step in the process remains the same: read the content of the file. It might be tempting to parse the content into some object structure, but I’d recommend against it: the more logic we put into our tests, the more problems we create for ourselves. Instead, we should have a hard-coded file which our generated file should be equal to. We can choose to compare the content directly or hash the files and compare the hashes.
Making that initial hard-coded test file can be a bit of a pain. Some might see me as a heathen for saying it, but I’d recommend generating a file based on your desired test data and using that as a starting point. We cannot get away from having to check the correctness of the hard-coded file manually, but when that is done, we only need to verify the correctness of any changes made to it.
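Both comparison styles can be sketched in a few lines. The helper below is hypothetical (`FileContentAssertions` is not from any library), and it takes the expected and actual content as strings; in a real test the expected content would be loaded from the hard-coded file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical test helper for comparing generated file content to a hard-coded expectation. */
final class FileContentAssertions {

    private FileContentAssertions() {}

    /** Returns the SHA-256 hash of the content as a lowercase hex string. */
    static String sha256Hex(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    /** Direct comparison: a failure can show both versions of the content. */
    static void assertSameContent(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("Content differs.\nExpected:\n" + expected + "\nActual:\n" + actual);
        }
    }

    /** Hash comparison: cheap for large files, but a failure only says that something differs. */
    static void assertSameHash(String expected, String actual) {
        String expectedHash = sha256Hex(expected);
        String actualHash = sha256Hex(actual);
        if (!expectedHash.equals(actualHash)) {
            throw new AssertionError("Hash differs: expected " + expectedHash + " but was " + actualHash);
        }
    }
}
```

The trade-off is in the failure message: comparing content tells us what changed, while comparing hashes only tells us that something did.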
Reading files is much easier than writing files. It doesn’t cause any external artifacts, and we don’t need to put in much effort to fake the environment the code runs in. To test an incoming file, we only need to:
Message brokers are popular these days and have become a vital cog in microservice architecture, and therefore they should be a part of our integration tests. Message brokers are still an external dependency which requires our app to either send data in a particular structure over a specific protocol or fetch data in a particular format. As with the other types of tests, the actual correctness of the contract can only be verified by the wide integration tests, but sometimes it is warranted to write narrow integration tests as well.
NOTE
You might get away with not having narrow integration tests for message brokers. It depends on how much logic you might (or might not) have in the layer/class which deals with the message broker. If it is separate enough, then wide integration tests might be enough. If there is a bunch of logic which is hard to write regular unit tests for, I do recommend writing narrow integration tests, but it might not always be required.
Luckily for us, it is pretty easy to write tests for RabbitMQ, Kafka or whatever other message broker you might be using. For RabbitMQ we can use RabbitMQ-mock, and Kafka has mocking built in. The approach is simple: we mock our message broker in the setup of our test and simply run the test.
The other, maybe most common, external dependency is other web APIs, which come in all shapes and forms. We may be dealing with SOAP, REST, GraphQL or some other technology, all of which bring their own challenges.
When I say web APIs, I am primarily talking about protocols running on HTTP. We will explore other ways of transferring data later, but for now, let’s keep the conversation to HTTP based protocols only.
The solution to all is mostly the same, however. We know that we cannot run all of these services locally as all might have other dependencies and databases which they rely on. The goal is to run something locally, which our service can connect to and return some predictable value, which we can use in our test. Currently, I’ve seen this be solved in two ways:
Rather than trying to run the external services locally, we can instead make a complete master service which only contains the test data. It duplicates all the contracts from the other services but returns the same data every time. This means that we can use the test service as a stand-in for whatever other dependencies we might need. This test service can be reused across a whole range of services and serve as the one-stop shop for all narrow integration tests that go towards an API. I do not recommend having a separate test service for each real service, as that quickly becomes messy.
If we also package this test service into a docker container, we can simply add it to docker-compose and have it automatically start up during testing so that the developer doesn’t have to worry too much about it while developing.
While this approach works, it is not the one I would default to. Making a test service might be a good idea if you’re working with some proprietary technology, or any technology which doesn’t have a mock readily available. However, one should avoid this if possible, as it comes with some downsides:
The easiest way of dealing with external web services in narrow integration testing is simply to mock it. After all, the actual contract will be tested by our wide integration tests; thus, we don’t need to worry about whether or not the contract is correct at this stage.
The easiest way to mock REST services is to use something like MockServer. These HTTP mock servers make it easy to capture arguments as well as responding over HTTP connections. MockServer specifically can even be used as a container, basically achieving the same capabilities as your test service, but configurable on a per-test basis.
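To illustrate the idea without pulling in any dependency, the sketch below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for a tool like MockServer. It is not MockServer's API. `StubQuoteApi`, `QuoteClient` and the `/api/random` path are hypothetical names chosen for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stub of an external quote API, bound to a free local port. */
final class StubQuoteApi implements AutoCloseable {
    private final HttpServer server;
    /** Paths the code under test actually requested, for later assertions. */
    final List<String> requestedPaths = new CopyOnWriteArrayList<>();

    private StubQuoteApi(HttpServer server, String cannedJson) {
        this.server = server;
        server.createContext("/", exchange -> {
            requestedPaths.add(exchange.getRequestURI().getPath());
            byte[] body = cannedJson.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    /** Starts the stub; port 0 lets the operating system pick a free port. */
    static StubQuoteApi start(String cannedJson) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            return new StubQuoteApi(server, cannedJson);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String baseUrl() {
        return "http://127.0.0.1:" + server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
    }
}

/** Hypothetical client under test: the base URL is injected so a test can point it at the stub. */
final class QuoteClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;

    QuoteClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    String fetchRandomQuoteJson() {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/random")).GET().build();
        try {
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Unexpected status " + response.statusCode());
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while calling the quote API", e);
        }
    }
}
```

A test would start the stub in a try-with-resources block, hand `baseUrl()` to the client, and then assert on both the returned body and `requestedPaths`. That covers the two things we care about in a narrow integration test: what our code sends out and how it handles what comes back.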
Since GraphQL has a standardized way of dealing with requests, returning in standardized formats like JSON, we might also want to use MockServer for GraphQL, but MockServer is very basic in how it works and might quickly be too simple. A better solution might be to use the Apollo server instead. Either way, the answer is mocking.
SOAP is a fickle beast. I don’t have much against the technology itself except for its verbosity, but I despise the general environment which SOAP lives in. Pretty much all tools involved with SOAP either cost money or feel clunky to work with; it all just feels outdated. So does the default option for SOAP, which is SoapUI. While it has historically been a desktop application, SoapUI can also be integrated with the application and run as part of your tests, and it can mock. What people seem to have done is make Docker images with SoapUI installed and start the mock server that way, which is a very technology-agnostic way of doing things.
When working with SOAP services, I have resorted to building my own master test service, but there are projects like this which seem promising and which I should take a closer look at (a post for the future?).
Writing narrow integration tests isn’t magic, and it is all about sensing outgoing and incoming data, as well as being able to send and respond to requests. If we can do these things, we can write narrow integration tests. The whole idea is to be able to contain the application in a test harness. That way, we can write these narrow integration tests, but we can also run our system locally without any dependencies to any external environment. That means that manual testing will also be more manageable.
The points above don’t capture all protocols, applications or ways of transferring data. So this section will be a general catch-all for whatever I haven’t concretely covered in the rest of this post, but we will see a general pattern for how to approach a given dependency.
Note that I am not saying that one solution is better than the other. It all depends on the context. For some dependencies, like the database, it is easier to just run the dependency rather than trying to fake it. For other dependencies, like SOAP, it might be easier to do a complete fake service to write tests for it meaningfully.
This is usually the most preferred way, as it makes life so much easier when writing the tests. It allows each test to specify the behaviour of the mock, as well as the return data. It also keeps all the test related data within the test itself. This approach tends to make the tests a bit bloated, but that is hard to avoid.
While there might not be a framework or server which is made for whatever programming language that is used, there might be a docker image which can be reused for this, so it is often worth looking around.
This approach is described in the “make a test service” section of this post, but it is not limited to web services only. Whatever cannot be mocked easily, we can replicate with what I like to call a manual mock.
Sometimes, like in the case of SOAP, there might not be any readily available mocking tools for whatever platform you use. Or the ones that exist are so clunky that you don’t want to use them. In these situations, we might want to consider making a mock system ourselves.
Writing your own manual mocks might not be worth it, and should be judged on a case-by-case basis. It is generally worth it if most of the solution relies on the technology which we are trying to mock. If it is just one dependency in the overall solution that has this problem, then it may not be worth it, and one of the other approaches might serve you better.
If there is no easy way to mock the dependency itself, it might be easier to simply mock the part of the code that interfaces with the dependency. In this approach, we simply extract out all the logic and hide the direct calls to the external dependency behind an interface which we can pretty quickly mock with a standard mocking framework.
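As a minimal sketch of this extraction, assume a hypothetical `QuoteSource` interface that hides the external call, and a `QuoteLogger` class that holds the logic we want to test. Neither name comes from a real library, and the lambda in the usage note plays the role a mocking framework would otherwise fill.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical seam: the only thing the rest of the code knows about the external dependency. */
interface QuoteSource {
    String randomQuote();
}

/** Hypothetical logic that used to call the dependency directly and now goes through the seam. */
final class QuoteLogger {
    private final QuoteSource source;
    private final List<String> entries = new ArrayList<>();

    QuoteLogger(QuoteSource source) {
        this.source = source;
    }

    /** Fetches a quote through the interface and records a cleaned-up log entry. */
    void logRandomQuote() {
        String quote = source.randomQuote();
        if (quote == null || quote.trim().isEmpty()) {
            entries.add("no quote available");
        } else {
            entries.add("quote: " + quote.trim());
        }
    }

    List<String> entries() {
        return entries;
    }
}
```

Because `QuoteSource` has a single method, a test can pass a lambda such as `() -> "some quote"` instead of a real implementation. The production implementation of the interface, which does the actual I/O, stays untested at this level.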
The downside is that this might require rewriting parts of the service, which we usually want to avoid, but it is guaranteed to work with pretty much any modern programming language and any protocol, by virtue of the fact that we are merely ignoring the external dependency. Another downside is that we are not testing whether or not our application can communicate with that technology, but that will be caught by the wide integration tests anyway, so we shouldn’t worry too much about that.
Another thing to consider when going with this approach is whether or not we are still writing integration tests at this point. I would agree that we are not: we are writing unit tests, as we have abstracted away every I/O dependency. This is also why we should be careful going this route, as we are not testing the integration in any way. That said, it is all about confidence in our application working, and if we trust our wide integration tests, this might be an appropriate solution.
If possible, we should require as little setup as possible when writing our narrow integration tests, but sometimes it is easier to run the dependency than it is to fake it in some way. Technologies like Docker have made setting up reusable containers much more straightforward than it has ever been.
One thing to consider with this approach is whether the desired test belongs as a wide integration test rather than a narrow one. We made an exception with the database previously, but that is because databases tend not to need any other dependencies and our systems tend to have a bunch of different queries which all need to be tested. There might be similar situations for other scenarios where this approach does make sense.
One thing that keeps disappointing me in the world of programming is people’s lack of understanding that things almost always build on each other. Integration tests are one of those things. You can have wide integration tests or big E2E tests, but they will not be able to capture the detail which narrow integration tests can. At the same time, we are limited in what we can test in our narrow integration tests, so we also need wide integration tests. Having just one category of integration testing is better than nothing, but it is nothing compared to the benefit of having both.
Writing tests is hard. Not only because it is difficult to get right, but also because it is something which many don’t prioritize all that much. When we add the additional dimension of external dependencies, we make it even more challenging to manage, so we often see bloated test environments which take hours upon hours to execute tests. Or we see such tests disabled in the pipeline entirely because people can’t be bothered with fixing them. I do think this is the wrong approach; instead, we should work on making our tests more reliable and portable. I hope that my post has given some food for thought in that sense.
Another goal of this post was to serve as a starting point on how we write our narrow integration tests as well as documenting some of the patterns I tend to use. Hopefully, I discover newer and better patterns as I progress as a developer.