Avoiding date overlaps

Balázs Németh
5 min read · Apr 5, 2019

In this article I’m going to talk about a simple problem: how to avoid inserting overlapping records in a database.

Cafú is regarded as one of the best full backs of all time. A king of overlapping runs.

At first glance the problem seems very easy to solve. Is it really?

Problem

Let’s say we have a business entity whose value is only valid between a certain pair of dates. For example, the price of a ticket could be defined as follows:

TicketPrice (id, origin, dest, valid_from, valid_until, price)

CREATE TABLE ticket_price
(
  id          VARCHAR(255) NOT NULL,
  valid_from  datetime,
  valid_until datetime,
  origin      VARCHAR(255) NOT NULL,
  dest        VARCHAR(255) NOT NULL,
  price       int,
  PRIMARY KEY (id)
);

When you’d like to get from FRA to JFK, you would simply query for a price on that route where your departure date falls between valid_from and valid_until:

SELECT price FROM ticket_price WHERE origin='FRA' AND dest='JFK' AND '2019-03-30' BETWEEN valid_from AND valid_until;

It’s easy to see why we expect to get back a single price: there cannot be multiple prices for the same route and date.

So imagine we have the following records:

(1, 'BUD', 'TXL', 2019-01-01, 2019-12-31, 100)
(2, 'BUD', 'TXL', 2020-01-01, 2020-12-31, 200)
(3, 'BUD', 'TXL', 2020-02-01, 2020-03-31, 222)

But here id=3 overlaps with id=2, which is invalid.

:’(

So at first you might think you could validate this simply in the application code by calling a DB repository (e.g. Spring Data):

boolean isOverlapping = !ticketPriceRepository.findOverlapsForRoute(origin, dest, validFrom, validUntil).isEmpty();
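The findOverlapsForRoute method isn’t spelled out here; with Spring Data JPA it could be sketched roughly like this (the entity and its field names are assumptions based on the table above):

import java.time.LocalDate;
import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface TicketPriceRepository extends JpaRepository<TicketPrice, String> {

    // Two periods overlap exactly when each one starts before the other ends.
    @Query("SELECT t FROM TicketPrice t " +
           "WHERE t.origin = :origin AND t.dest = :dest " +
           "AND t.validFrom <= :validUntil AND t.validUntil >= :validFrom")
    List<TicketPrice> findOverlapsForRoute(@Param("origin") String origin,
                                           @Param("dest") String dest,
                                           @Param("validFrom") LocalDate validFrom,
                                           @Param("validUntil") LocalDate validUntil);
}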

But wait!

Parallel requests containing records 2 and 3 could both go through, as each of them would see isOverlapping == false, and then we would have an inconsistent state written to the DB.

Option A — Prevent race condition

So the first idea would be to prevent the scenario above from happening.

We would disable the scaling of write services and allow only a single application instance to process incoming requests. Sadly, that’s not enough: we would also have to serialize the requests, in Java perhaps with some locking or synchronization mechanism.
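Just to make that concrete, “serializing the requests” could be as blunt as one global lock around the create path. A sketch only, assuming a single JVM; all names are made up:

import java.time.LocalDate;
import java.util.concurrent.locks.ReentrantLock;

public class TicketPriceWriteService {

    // One global lock: only a single create request runs at a time in this JVM.
    private final ReentrantLock createLock = new ReentrantLock();

    public void createTicketPrice(String origin, String dest,
                                  LocalDate validFrom, LocalDate validUntil, int price) {
        createLock.lock();
        try {
            // overlap check + insert go here; nothing can interleave with them,
            // but only as long as exactly one instance of this service exists
        } finally {
            createLock.unlock();
        }
    }
}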

Verdict: :-(

Too strict, not flexible. I somehow feel like I even have to apologize for bringing this up. But this is a natural idea which simply doesn’t work well in practice.

Option B — DB level

Since the database is the point where all microservice instances meet, it seems feasible to put this uniqueness check there. It’s also the classic solution; it has always been the option chosen by many in the past.

But different DBs support this very differently, and there’s no solution that works the same way everywhere.

MySQL: a trigger.

Postgres: an exclusion constraint over a date range (the “special unique index” Postgres offers for exactly this; see the sketch after this list).

Mongo: ?
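For completeness, here is roughly what the Postgres variant could look like. It’s a sketch, not production DDL: it assumes the same ticket_price table with timestamp columns and that the btree_gist extension can be installed.

-- Sketch only: a Postgres exclusion constraint that rejects any row whose
-- validity range overlaps an existing row for the same origin/dest pair.
-- btree_gist lets the plain equality columns take part in the GiST index.
CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE ticket_price
  ADD CONSTRAINT ticket_price_no_overlap
  EXCLUDE USING gist (
    origin WITH =,
    dest   WITH =,
    (tsrange(valid_from, valid_until, '[]')) WITH &&
  );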

We’d like to avoid putting logic in the DB. At first it looks simple, but logic in the DB can blow up and grow at any time. And it’s difficult to test. Actually, my biggest concern is testing. Sure, with Docker and Testcontainers you can do wonders, but these sorts of integration tests are slow and therefore expensive.

But still, let’s have a look at the before insert trigger in MySQL:

delimiter //
create trigger ticket_price_def_unique_trigger before insert on ticket_price
for each row
begin
  declare msg varchar(128);
  if EXISTS (SELECT 1 FROM ticket_price WHERE origin=NEW.origin AND dest=NEW.dest
    AND (
      ((NEW.valid_from BETWEEN valid_from AND valid_until) OR (NEW.valid_until BETWEEN valid_from AND valid_until))
      OR
      ((valid_from BETWEEN NEW.valid_from AND NEW.valid_until) OR (valid_until BETWEEN NEW.valid_from AND NEW.valid_until))
    )) then
    set msg = concat('Trying to insert overlapping rule: ', cast(NEW.origin as char), '-', cast(NEW.dest as char));
    signal sqlstate '45000' set message_text = msg;
  end if;
end
//
delimiter ;

It works fine but it looks ugly as hell.

Option C — Aggregates

I’m sure some of you have read the book by Eric Evans (I still haven’t managed to), and this is where the pattern comes from.

The more I think about the problem, the more I see that these validity periods somehow belong together. And if they belong together logically, then what stops us from coding it that way?

Let’s call the object containing everything for an origin-destination pair an aggregate. It’s also a write model: it contains the data necessary for validation when we’d like to write the state of our application.

Aggregate (orig, dest, validity_periods: [{from, until}, {from, until}, …, {from, until}])

The validity periods would go into a json column, or even a text or blob column if your database engine doesn’t have a JSON type.
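As a sketch only (the table and column names are my assumptions, using MySQL’s JSON type in the spirit of the earlier DDL):

-- Sketch of the aggregate's write-model table: one row per route, with all
-- validity periods kept together in a single JSON (or text) column.
CREATE TABLE route_aggregate
(
  id               VARCHAR(255) NOT NULL,
  origin           VARCHAR(255) NOT NULL,
  dest             VARCHAR(255) NOT NULL,
  validity_periods JSON,
  PRIMARY KEY (id)
);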

Now we also keep the original entity, which is simply a read model object: an object used to serve the UI, a read-optimized view of the state of our application.

Read Model Entity (id, orig, dest, from, until, value)

Now, to complete the CQRS pattern, we call the incoming request objects commands: the actions intending to change the state.

Command (orig, dest, from, until, value)

In this case it’s the entity without an ID.

And now the logic as pseudo-code:

AggregateRepository aggregateRepository;
ReadModelRepository readModelRepository;

@Transactional
public void createEntity(Command command) {
    // if the aggregate doesn't exist yet, we need to create a new one!
    Aggregate aggr = aggregateRepository.findBy(command.orig, command.dest);
    boolean valid = aggr.checkValidity(command.from, command.until);
    if (valid) {
        aggr.addValidityPeriod(command.from, command.until);
        aggregateRepository.save(aggr);
        readModelRepository.save(new Entity(command));
    } else {
        // invalid: return a validation error to the client
    }
}
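The checkValidity call above is plain interval arithmetic. A minimal sketch of how the aggregate could implement it (the class, the period holder, and the LocalDate fields are my assumptions):

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class Aggregate {

    static class ValidityPeriod {
        final LocalDate from;
        final LocalDate until;

        ValidityPeriod(LocalDate from, LocalDate until) {
            this.from = from;
            this.until = until;
        }
    }

    private final List<ValidityPeriod> validityPeriods = new ArrayList<>();

    // Two closed intervals [from, until] overlap exactly when each one
    // starts no later than the other one ends.
    public boolean checkValidity(LocalDate from, LocalDate until) {
        return validityPeriods.stream()
                .noneMatch(p -> !from.isAfter(p.until) && !p.from.isAfter(until));
    }

    public void addValidityPeriod(LocalDate from, LocalDate until) {
        validityPeriods.add(new ValidityPeriod(from, until));
    }
}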

An aggregate serves as the transaction boundary for our business objects. Every write operation goes through them, and if you don’t have millions of validity periods, this trivial setup will solve your problem.

And it’s very important to note that we still need a unique index for orig+dest (or the natural key) on the aggregate. We can only have one aggregate instance per route.
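Using the route_aggregate table sketched earlier (a hypothetical name), that is a one-liner:

-- One aggregate row per origin/dest pair.
CREATE UNIQUE INDEX route_aggregate_route_uq ON route_aggregate (origin, dest);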

But since the good old race conditions can nevertheless happen, we need to add a version field to the aggregate in order to handle them with optimistic locking:

Aggregate (orig, dest, validity_periods, version)

If you use JPA, introducing this is very cheap.
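A sketch of what that could look like with JPA annotations (the entity name and the way the periods are mapped are assumptions; the periods column is shown as a plain string here):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class RouteAggregate {

    @Id
    private String id;

    private String origin;
    private String dest;

    // stored as JSON/text; parsing into periods is omitted here
    private String validityPeriods;

    // JPA bumps this on every update and rejects a concurrent stale write
    // with an OptimisticLockException
    @Version
    private Long version;
}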

Summary

There you have it! I’ve shown you three options for avoiding overlapping dates in your application, and although the last option was the point of the article, in some cases the DB-based approach works just fine.

One side note: Option B is the safest to use, especially in an organization where developers and other teams are not working closely together. With Option C it can happen that someone wants to see a record immediately in production and inserts it manually into the read model but not into the aggregate itself. Consistency is gone in that case, and there’s no way to prevent it. A team practicing DevOps and CI/CD plus proper test automation wouldn’t do that, but for the rest the risk is there.

Maybe I’ll expand on that later.
