How to Avoid Race Conditions in Django



Avoiding race conditions using F() expressions

Overview

In this article, you are going to learn how to build a Django application that is free of race conditions, so that the data it stores stays accurate.

Imagine you are building a like system or a voting system for a website or API, and the number of votes or likes isn't consistent or accurate. You can imagine how much havoc that would cause: for instance, the wrong candidate might win an election or contest because of a logic error in your code.

Introduction

This article sets out to fix the race condition in the polls app from the Django documentation tutorial.

The code for our vote() view does have a small problem. It first gets the selected_choice object from the database, then computes the new value of votes, and then saves it back to the database. If two users of your website try to vote at exactly the same time, this might go wrong: The same value, let’s say 42, will be retrieved for votes. Then, for both users the new value of 43 is computed and saved, but 44 would be the expected value. This is called a race condition.

The above is an excerpt from the polls app tutorial.
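To make the interleaving in that excerpt concrete, here is a minimal sketch in plain Python, with no Django or database involved. A dictionary stands in for the row that holds the votes, and the two users' steps are executed one after another in the unlucky order described above: both read, then both write. The names db, read_votes() and write_votes() are hypothetical and exist only for this illustration.

```python
# A plain dict stands in for the database row (hypothetical, no real DB).
db = {"votes": 42}


def read_votes():
    """Simulate a SELECT: copy the current value into Python memory."""
    return db["votes"]


def write_votes(value):
    """Simulate an UPDATE that overwrites the column with a literal value."""
    db["votes"] = value


# Both users read before either one writes back.
user_a = read_votes()    # 42
user_b = read_votes()    # 42

write_votes(user_a + 1)  # user A saves 43
write_votes(user_b + 1)  # user B also saves 43, overwriting A's work

print(db["votes"])       # 43, although 44 was expected
```

Each user did the "right" thing in isolation; the vote is lost only because both computed their new value from the same stale read.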

Simulating a Race Condition

Race conditions are generally hard to reproduce on demand. In particular, you cannot realistically click the radio button of a choice at exactly the same moment in two browsers.

How, then, can you show that a race condition can actually occur?

After all, seeing is believing.

And the moment you see the result, you will be surprised by how much havoc this can cause in your website, API, or data collection.

With a little multithreading, you can prove that a race condition can occur by running two threads that update the same value.

One thread could retrieve, increment, and save a field’s value after the other has retrieved it from the database. The value that the second thread saves will be based on the original value; the work of the first thread will be lost.
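The same lost update can be reproduced with real threads. The sketch below is a toy rather than the polls app: a shared dictionary replaces the database, and a threading.Barrier from the standard library makes both threads finish reading before either one writes, so the unlucky timing happens on every run instead of by chance. The names counter and vote_once() are hypothetical.

```python
import threading

# Shared state standing in for the votes column (hypothetical, no database).
counter = {"votes": 0}

# The barrier releases only when both threads have reached it, so both
# reads are guaranteed to happen before either write.
barrier = threading.Barrier(2)


def vote_once():
    current = counter["votes"]      # retrieve
    barrier.wait()                  # wait until the other thread has read too
    counter["votes"] = current + 1  # increment and save


threads = [threading.Thread(target=vote_once) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["votes"])  # 1, not 2: one thread's work was lost
```

In the real script there is no barrier, so whether an update is lost depends on timing; that is why the results vary from run to run.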

So you will write a Python script that uses multithreading to simulate this scenario against the polls app's database.

**NB:** If you don't know what multithreading is, read the first article in this series.

  1. Create a Python file named race_condition.py in the polls app directory.

  2. Enter the code snippet below in the file.

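import threading

from django.db import connection

from .models import Question


def like():
    # Read the row into Python, increment in memory, then write it back.
    # This read-modify-write sequence is what makes the race possible.
    que = Question.objects.first()
    option = que.choice_set.first()
    option.votes += 1
    option.save()


def thread_task():
    for _ in range(20):
        like()
    # Django opens a separate database connection per thread;
    # close it when the thread is done so it is not leaked.
    connection.close()


def main():
    # Reset the counter before each run.
    que = Question.objects.first()
    option = que.choice_set.first()
    option.votes = 0
    option.save()

    # Two threads vote 20 times each, so the expected total is 40.
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)

    t1.start()
    t2.start()

    t1.join()
    t2.join()

    return option


for i in range(3):
    option = main()
    option.refresh_from_db()  # reload the final value of the same choice
    print(f'Iteration {i} vote counter: {option.votes}')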

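To run the script, open the Django shell from your terminal:

python manage.py shell

Then import the module inside the shell (importing it runs the loop at the bottom of the file):

from polls import race_condition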

The output should look something like this (your numbers will differ):

Iteration 0 vote counter: 20
Iteration 1 vote counter: 22
Iteration 2 vote counter: 32

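Each iteration should end at 40 votes (two threads voting 20 times each), but you get inconsistent, lower values such as 20, 22, and 32 instead. The exact numbers change from run to run, because they depend on how the two threads happen to interleave.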

Avoiding Race Condition

Here is the vote() view from polls/views.py, as given in the docs:

from django.http import HttpResponse, HttpResponseRedirect
from django.shortcuts import get_object_or_404, render
from django.urls import reverse

from .models import Choice, Question


def vote(request, question_id):
    question = get_object_or_404(Question, pk=question_id)
    try:
        selected_choice = question.choice_set.get(pk=request.POST['choice'])
    except (KeyError, Choice.DoesNotExist):
        # Redisplay the question voting form.
        return render(request, 'polls/detail.html', {
            'question': question,
            'error_message': "You didn't select a choice.",
        })
    else:
        selected_choice.votes += 1
        selected_choice.save()
        # Always return an HttpResponseRedirect after successfully dealing
        # with POST data. This prevents data from being posted twice if a
        # user hits the Back button.
        return HttpResponseRedirect(reverse('polls:results', args=(question.id,)))

The docs say there is a small problem with the view that handles voting. The problem lies in these two lines:

selected_choice.votes += 1
selected_choice.save()

The value of selected_choice.votes is pulled from the database into Python memory, incremented by 1 with the familiar + operator, and the object is then saved back to the database. The problem is that the field is updated based on its value when the instance was retrieved, rather than on the field's value in the database when save() or update() is executed. If another request saves a new value in between, that request's update is overwritten.

In the next section, you will see how to hand the increment operation over to the database, which makes it robust and accurate.

F() expression

An F() object represents the value of a model field, transformed value of a model field, or annotated column. It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory.

Instead, Django uses the F() object to generate an SQL expression that describes the required operation at the database level.

from django.db.models import F

selected_choice.votes = F('votes') + 1
selected_choice.save()

Although selected_choice.votes = F('votes') + 1 looks like a regular Python assignment to an instance attribute, it is not. It is an SQL construct describing an operation on the database.

When Django encounters an instance of F(), it overrides the standard Python operators to create an encapsulated SQL expression; in this case, one which instructs the database to increment the database field represented by selected_choice.votes.
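To see what overriding the standard operators means here, the toy below mimics the idea with hypothetical Expression and F classes. It is not Django's implementation (Django's real expressions are far richer and send values as query parameters); it only shows how overloading + can build a piece of SQL text instead of computing a number.

```python
class Expression:
    """Toy SQL expression: it stores SQL text, never a column's value."""

    def __init__(self, sql):
        self.sql = sql

    def __add__(self, other):
        # Overloading + builds a bigger expression instead of adding numbers.
        return Expression(f"{self.sql} + {render(other)}")


class F(Expression):
    """Hypothetical stand-in for a column reference (NOT Django's F)."""


def render(value):
    # Expressions render as their SQL text; plain values become literals.
    return value.sql if isinstance(value, Expression) else repr(value)


expr = F("votes") + 1
print(expr.sql)  # votes + 1
print(f"UPDATE polls_choice SET votes = {expr.sql} WHERE id = 1")
```

The toy never reads a vote count: Python only assembles the text votes + 1, and the database evaluates it when the UPDATE runs.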

If the database is responsible for updating the field, the process is more robust: it will only ever update the field based on the value of the field in the database when the save() or update() is executed, rather than based on its value when the instance was retrieved.

Whatever value is or was on selected_choice.votes, Python never gets to know about it: it is dealt with entirely by the database. All Python does, through Django’s F() class, is create the SQL syntax to refer to the field and describe the operation.

Having the database, rather than Python, update a field’s value avoids a race condition.
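The difference between the two approaches can be observed directly at the SQL level. The sketch below uses Python's built-in sqlite3 module rather than Django, with two connections standing in for two concurrent requests and a simplified table named polls_choice, as Django would name the Choice table by default (the table here is an assumption with only two columns). Both scenarios run the connections in the same interleaved order; only the place where the addition happens changes.

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk SQLite file so that two connections share one database.
path = os.path.join(tempfile.mkdtemp(), "polls.sqlite3")


def connect():
    # isolation_level=None: every statement commits immediately (autocommit).
    return sqlite3.connect(path, isolation_level=None)


admin = connect()
admin.execute("CREATE TABLE polls_choice (id INTEGER PRIMARY KEY, votes INTEGER)")
admin.execute("INSERT INTO polls_choice (id, votes) VALUES (1, 42)")

# Two connections play the role of two concurrent requests.
a, b = connect(), connect()


def read_votes(conn):
    return conn.execute("SELECT votes FROM polls_choice WHERE id = 1").fetchall()[0][0]


# 1) Increment in Python, then write back a literal (like `votes += 1; save()`).
va, vb = read_votes(a), read_votes(b)  # both requests see 42
a.execute("UPDATE polls_choice SET votes = ? WHERE id = 1", (va + 1,))
b.execute("UPDATE polls_choice SET votes = ? WHERE id = 1", (vb + 1,))
python_side = read_votes(admin)        # 43: one vote was lost

# 2) Let the database do the increment (what F('votes') + 1 asks for).
admin.execute("UPDATE polls_choice SET votes = 42 WHERE id = 1")
a.execute("UPDATE polls_choice SET votes = votes + 1 WHERE id = 1")
b.execute("UPDATE polls_choice SET votes = votes + 1 WHERE id = 1")
database_side = read_votes(admin)      # 44: both votes counted

for conn in (admin, a, b):
    conn.close()

print(python_side, database_side)  # 43 44
```

The SQL Django generates for option.save() differs in detail (it also writes the model's other columns and passes values as parameters), but the essential part is the same: votes = votes + 1 is evaluated by the database at the moment the UPDATE runs.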

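Now change the like() function in race_condition.py to use an F() expression, and run the script again from a fresh python manage.py shell session (importing the module a second time in the same session does not re-run it).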

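import threading

from django.db import connection
from django.db.models import F

from .models import Question


def like():
    que = Question.objects.first()
    option = que.choice_set.first()
    # Let the database add 1 to whatever the column holds when the
    # UPDATE runs, instead of incrementing a stale value in Python.
    option.votes = F('votes') + 1
    option.save()


def thread_task():
    for _ in range(20):
        like()
    # Close this thread's own database connection when it is done.
    connection.close()


def main():
    # Reset the counter before each run.
    que = Question.objects.first()
    option = que.choice_set.first()
    option.votes = 0
    option.save()

    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    return option


for i in range(3):
    option = main()
    option.refresh_from_db()  # reload the final value of the same choice
    print(f'Iteration {i} like counter: {option.votes}')

Running the script again now prints:

Iteration 0 like counter: 40
Iteration 1 like counter: 40
Iteration 2 like counter: 40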

So the number of votes is now consistent and correct.

Final Thoughts

In summary, you learned about the following:

  • how multiple threads can be used to simulate a race condition;

  • how to use an F() expression so that the database, rather than Python, updates a field's value;

  • why it is important to avoid race conditions.

References

  1. Django documentation: Writing your first Django app (the polls tutorial).

  2. Django documentation: Query Expressions, F() expressions.