Build a sequence that is reset everyday

Posted on 2021-04-08 in Programmation

How to create a database sequence that will be reset everyday? That is a sequence that will get one the first time of each day? It sounds easy but it's not that much.

Note

This article will focus on PostgreSQL since it's the database I use. A similar solution may be possible with other database systems, it'll be up to you to adapt if needed.

Note

This article will focus on Django since it's the framework I use. A similar solution is possible with other frameworks. I think all mature ones will give you the tools you need to do this. Again, it'll be up to you to adapt if needed.

Use a Sequence?

It sounds like the obvious solution. If you don't know what a sequence is, it's a feature for PostgreSQL which is used under the hood to create sequential primary keys. We can create custom ones for our needs, and we can also reset them, so they can start over if needed. So PostgreSQL would do all the work for us giving a very simple a robust solution. And it works perfectly with transactions out of the box.

We can see it in action with:

-- Use the NO CYCLE option to be sure never to have the same sequence twice for the same day.
CREATE SEQUENCE mysequence
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 9999999
    START WITH 1 NO CYCLE;

We can then use it like this:

-- Launch this a few times to view the sequence increment.
SELECT nextval('mysequence');

Or with transactions:

  1. In a first psql console:

    BEGIN;
    SELECT nextval('mysequence');
    
  2. In a second psql console while keeping the first transaction opened:

    -- This will display the next numbers of the sequence as intended.
    BEGIN;
    SELECT nextval('mysequence');
    

So we always immediately get the next number, no matter if the transaction was committed or not. This means we can get holes our sequences if a transaction fails, but generally this shouldn't be a big deal.

We can then reset the sequence with a cronjob launched at midnight with:

ALTER SEQUENCE mysequence RESTART WITH 1;

So far so good. However, they are issues with this solution:

  • As stated earlier, we can have holes in the sequence. This shouldn't be a problem in most cases though.

  • We must own the sequence to alter it, so it can be an issue to reset it in a cronjob if you restricted what the database user used to run the site can do. That's what I describe here. If you don't apply this pattern or manage permissions differently for the sequence, it shouldn't be an issue.

  • If we launch the reset of the sequence in a transaction, calls to get the next values will block until the transaction ends. This shouldn't happen since we reset the sequence in a dedicated command from a cronjob at a time where the site doesn't have much traffic. And altering the sequence is immediate if PG can get a lock on it.

  • We need a cronjob to reset the sequence. It looks like nothing, and something that can be easily achieved on all platform. So it looks like a small issue like the ones listed above. In fact, it can be a show stopper: the cronjob will never pass exactly at midnight. In all crontab implementations I've seen, you can plan at most at the minute level. And all platforms I've seen, try to launch it as close as possible of the scheduled time, but it can vary from a few seconds to a few minutes depending on the implementation. And the command itself will need to launch and actually reset the sequence, which will take a big of time. All this means we can be in the following situation:

    1. Ask for the next value in the sequence for the first time of a day at 00:00:01. We expect to get 1, but, since the sequence hasn't been reset yet, we get something else. Let's say 10.
    2. The sequence is reset at 00:00:05.
    3. The day goes one, and we get 10 again near 17:00:00. Now, we got the same sequence twice in the same day.

    Is this an issue? It can be if you require the sequence to generate a unique value for a given day. In my case, that would mean a duplication in a column protected by a unique index (because I need this uniqueness). When can this not be a problem?

    • You don't care about the duplication.
    • If you know your system will never be used near the time the sequence is reset. However, never is a very strong words in systems used by humans. You may also forget about this constraint later and launch automated commands that will need the sequence.

    Can we mitigate the problem?

    • If we have a unique index where we use the sequence, we can detect the duplication and ask for another number from the sequence until we can insert the row. Sadly, I think this always makes the sequence logic leak to other part of the code.
    • We could store the date at which the sequence was reset and combine the date of this reset with the sequence number to get a number unique for each day. This requires a table. Since you have a table, it also means you don't need the sequence. Which leads us to the solution below.

Using a table

We can use a table which stores for each day the sequence number: we store the sequence associated with the date and then retrieve the value for the date, update it and use it.

This has several advantages:

  • It's fairly easy to create the row and update it with Django. We can maintain uniqueness with a unique constraint on the date. Whereas with the sequence, some commands required raw SQL.
  • We shouldn't have holes in the sequence if we handle the increment correctly with locks. These locks are required to avoid duplicated values anyway since Django doesn't support UPDATE my_sequence_table SET inc = inc + 1 WHERE sequence_date = CURRENT_DATE; to update and retrieve the value in one atomic instruction.
  • We can detect potential duplication in the code that handles the sequence. It doesn't leak to other part of our code. This can occur at the very start of a day, when we still haven't created the sequence for a given day and two threads try to create it at the same time. One of the thread will wait for the other one to complete the insertion and then get an integrity error. Meaning we can detect the problem and retry by getting the next value out of the sequence.

On the downsides:

  • The implementation is not obvious and requires locks and nested transactions to work properly. It can be a good idea to add lock_timeout=20s when you connect to the database to be sure locks won't block everything. This can be done like this:

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.str("DB_NAME"),
            "USER": env.str("DB_USER"),
            "PASSWORD": env.str("DB_PASSWORD"),
            "HOST": env.str("DB_HOST"),
            "PORT": env.str("DB_PORT"),
            "OPTIONS": {
                # Configure a statement timeout to avoid running long statement even if the HTTP request
                # was killed or timed out.
                # See https://blog.heroku.com/postgres-essentials#set-a-code-statement_timeout-code-for-web-dynos # noqa: E501
                # Configure lock timeout to avoid blocking queries because of invalid locks.
                "options": "-c statement_timeout=30s -c lock_timeout=20s",
            },
        }
    }
    
  • The table will keep growing while we only need one value per day.

    • Since there are only 365 days in a year, we shouldn't have too many rows in the table. We also have an index (required to maintain uniqueness per day) on the date column we use to retrieve values. So this should be a problem within a reasonable time frame.
    • We can mitigate this by cleaning the table from time to time if needed, so definitely not a big deal.

Here is a possible implementation. Normally with the explanations above and the comments in the code you should be able to understand how it works. If not, please leave a comment!

 1 import logging
 2 from datetime import date
 3 
 4 from django.core.exceptions import ObjectDoesNotExist
 5 from django.core.validators import MinValueValidator
 6 from django.db import models, transaction
 7 from django.db.utils import IntegrityError
 8 
 9 logger = logging.getLogger(__name__)
10 
11 
12 class SequenceManager(models.Manager):
13     @transaction.atomic()
14     def get_next_sequence_for_day(self, day: date):
15         """Get the next sequence for the current day.
16 
17         If the sequence doesn't exist, we will create it. If it already does, we will
18         update it.
19         All this must be done in a transaction so we can correctly lock the sequence
20         row for update and then update it it in one go and be sure no other threads
21         will tangle with it.
22         This way, we are sure not to get duplicated sequences.
23 
24         We need to do a select_for_update and then update/save the model since Django
25         doesn't support the RETURNING feature of UPDATE statements, preventing us
26         to do the update in one and retrieval in one query without using raw SQL.
27         Since we need the model to create the sequence more easily, it makes sense
28         to use the model all the way.
29 
30         Note: since the day column is unique, the database will wait for the insert
31         of row with a given day (say 2021-03-01) before trying to insert another
32         at the same day.
33         This means, we correctly wait for the row to be created by the other thread
34         before moving on.
35         So, if we fail to create the row within this thread, we know another one
36         created it.
37         If we try to insert a row for a different day, nothing blocks as expected.
38         """
39         try:
40             # This block may fail if another thread create the object after we checked
41             # but before we created it. This result in an integrity error, meaning
42             # we cannot use the current transaction for anything.
43             # To prevent this, we nest the transactions. This inner transaction may
44             # become invalid, but the surrounding transaction (at function level)
45             # will stay valid allowing us to update the row while maintaining
46             # data integrity.
47             with transaction.atomic():
48                 logger.debug(f"Trying to update the sequence for {day}.")
49                 sequence = self._get_or_create_sequence_for_day(day)
50                 self._update_sequence(sequence)
51                 logger.debug(f"Succeeded to update the sequence as expected for {day}.")
52         except IntegrityError:
53             logger.debug(
54                 "It looks like the row didn't exist yet so we tried to create it, "
55                 "but another thread created it while we were checking. So we must "
56                 "catch the potential integrity error and try to get the model again."
57             )
58             sequence = self._get_sequence_for_day(day)
59             self._update_sequence(sequence)
60             logger.debug(
61                 f"Succeeded to update the sequence after integrity error for {day}."
62             )
63 
64         return sequence.sequence
65 
66     def _get_or_create_sequence_for_day(self, day: date):
67         try:
68             return self._get_sequence_for_day(day)
69         except ObjectDoesNotExist:
70             logger.debug(f"Creating the sequence for {day}")
71             return self.create(day=day)
72 
73     def _get_sequence_for_day(self, day: date):
74         return self.get_queryset().filter(day=day).select_for_update(of=("self",)).get()
75 
76     def _update_sequence(self, sequence):
77         sequence.sequence += 1
78         sequence.save(update_fields=["sequence"])
79 
80 
81 class Sequence(models.Model):
82     """The goal of this model is to track the daily sequence.
83 
84     **This must not be manipulated directly, always pass by the manager!**
85     """
86 
87     objects = SequenceManager()
88 
89     day = models.DateField(unique=True)
90     sequence = models.PositiveIntegerField(
91         validators=[MinValueValidator(0)], default=0
92     )

You can then use it by calling the get_next_sequence_for_day method on the manager, like that:

Sequence.objects.get_next_sequence_for_day(day)

We can also run a more complete test:

  1. In a first python manage.py shell_plus:

    from time import sleep
    from datetime import date
    
    with transaction.atomic():
        print(Sequence.objects.get_next_sequence_for_day(date.today()))
        print(Sequence.objects.get_next_sequence_for_day(date.today()))
        sleep(10)
    
  2. In a second one:

    from datetime import date
    Sequence.objects.get_next_sequence_for_day(date.today())
    

We will see 1 and 2 printed while in the other terminal. We correctly see the statement to print 3 after waiting for the first transaction to complete. Here we tested for the insertion of the first value of the day. You can of course re-run this code to check the behavior for standard, concurrent usage.

We can also check the case when the transaction fails:

  1. In a first python manage.py shell_plus:

    from time import sleep
    from datetime import date
    
    with transaction.atomic():
        print(LoanApplicationBusinessIdSequence.objects.get_next_sequence_for_day(date.today()))
        print(LoanApplicationBusinessIdSequence.objects.get_next_sequence_for_day(date.today()))
        sleep(10)
        raise RuntimeError()
    
  2. In the second:

    from datetime import date
    LoanApplicationBusinessIdSequence.objects.get_next_sequence_for_day(date.today())
    

We will see 4 and 5 displayed in the first terminal and then the error while the second terminal will wait and then display 4 as expected.

Wrapping up

This was not as easy as I though. The sequence immediately popped into my mind when tasked to tackle this problem. However, after some thoughts, it turned out to bring issues that I couldn't solve with it. Hence my solution with the table, which I decided to use. It's more complex but works as expected. If you have a remark or comment, please speak below.