How to efficiently filter by rental resource availability
While looking into a house rentals website, I noticed an interesting problem: How do they filter the results by booking availability?
Here's Airbnb's UI for this just so the feature I'm discussing is clear.
The search on that website was blazing fast even with tons of results, and it was using Algolia (probably closer to NoSQL, custom search engine in C++), so I knew there was no slow 'JOIN's happening in the background. The filters also included other details related to houses.
I've worked with Elasticsearch and other NoSQL databases, and from a data modeling perspective there has to be some embedded field that the query was using to filter by available dates for it to be this fast.
After looking into the problem for a couple of hours and finding zero write ups on this, I decided to break it down in this post.
So what are the functional requirements here?
- Filter results by a start and end date
Non-functionally, we just want to use a pre-existing database, so we can't come up with custom data structures here.
Building this with a JOIN
Given a table with room reservations (one entry per day reserved for example), we can use this query to filter our results.
This does result in an expensive JOIN however.
Importance of this feature
In any 'resource' rental application, this search feature is probably going to be the most used feature in the app. So an expected minimum number of queries per day is going to be #DAUs x 5. If this feature is slow then your users are probably going to try another site instead.
To avoid JOINs, the easiest workaround is embedding some field on each entry. The filtering would also be a lot faster since it doesn't have to combine entries from two tables to do the filtering.
So what do we embed?
First thought that comes to mind is embedding a list of reserved date tuples:
But this is hard to search efficiently. No database offers filtering on an array of date tuples with a numeric range.
I think finding the right solution here was extra hard for me because I was unconsciously trying to store all this information in one document. The solution became clearer when I relaxed that constraint and allowed partial duplication.
The solution I came to is the following: For every available time range, we store a new document for this resource. Now that we have 'available_from' and 'available_to' fields, it's easy to filter by a numeric range.
White is available, black is booked/unavailable.
With a little bit of extra storage, we've massively sped up this query.
The new query looks something like this, and can be run pretty efficiently (just a numeric range query):
Now how do we populate this data? Upon every reservation made, we'll have to do some post-processing and update these documents / generate new ones as well.
Keep in mind that this isn't suggesting you store your primary data like this. Think of this more of a replica of your data or a completely separate collection/table. In some examples below we completely delete the resource document, which is a problem if it's your primary data source.
Does this cover everything?
To check that this solution covers all cases, here are all the edges I can think of:
- If the rental request covers a subset of the available range, then the resource document is just updated to reflect the new range, and potentially a new document is added to show availability on the other edge. Ex. someone books 3 only here.
- Resource availability has a finite range (ex. available from x to y only) and the booking completely covers this range (x to y request to book). In this case we just delete the remove the document. It might be that the resource is available in the future, but that's a separate document in the resources collection.
- Rental request splits an infinite range (as in the first image posted above). In this case we also get two documents out of it, unless the rental request was on the left edge. In that case the document is only updated:
Sometimes duplicating documents with different properties can provide us with clever ways to do efficient search. The rental availability search problem seems like a prevalent problem in many applications, and we shed the light on an efficient solution that doesn't involve any joins in this writeup.
I hope you enjoyed this design deep dive!
If you'd like to see more of this, make sure to subscribe to this newsletter to let me know that you're interested! Right below is a link to sign up.
You can find me on Twitter every day @iamramiawar, feel free to shoot me a DM!