Promoting Cleaner Code with Serializers
There is one fundamental principle that always comes up when talking about clean code: functions should be small, and then smaller still! Code should be broken up into small functions. Every function in a project should serve one purpose, and it should be really damn good at performing that one purpose.
While this principle looks simple and makes sense on paper, it is sometimes forgotten when developers are working on real problems. One such example is input validation. Input validation is a tedious yet important task that prevents a server from acting on malformed or malicious requests. Failing to validate input properly can lead to security compromises and undefined behavior.
But how can input validation turn otherwise great code into a messy, unreadable wall of code? Let’s look at an example that I encountered this week:
Our stakeholders have a database containing information on (possibly) thousands of TBC patients. To make data analysis easier, we are tasked with implementing an API endpoint that receives a set of filtering query parameters and returns the subset of patient information matching the query, in CSV format.
For example, suppose that our endpoint receives the following query:
{ minAge: 30, maxAge: 50, district: "Beji" }
Then our endpoint should retrieve the data on all patients aged 30–50 who reside in the district named “Beji”. All of this information is already stored in our database, so it should be easy to filter. Do note that some filters are omitted here, like the “is_male” attribute that indicates whether only male or only female patients should be included in the response. When a filter is omitted, no filtering should be done on that attribute.
This is a standard web application functionality that can be split into 3 steps:
- Receive (and validate) the query parameters.
- Retrieve the required data.
- Convert the data into CSV format.
None of these steps is hard on its own. Django QuerySets support chained filters, so filtering data based on a subset of the query parameters should be easy. Converting data into CSV format should not be a problem either, with all of the third-party libraries available online.
Now for the tricky part: input validation. In principle, this should be a trivial task that can be done with several conditionals. Just write a few ifs to check whether a certain parameter is given, whether it has the correct format (e.g. an age parameter should be an integer), and so on. One might be tempted to perform this step together with the second step of retrieving the requested data. After all, retrieving the data requires roughly the same number of ifs as the first step. If your program has already validated a parameter, why not perform the filtering right away? This, while feasible, will most likely result in spaghetti code like this former implementation of the solution that I encountered:
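To make the second and third steps concrete, here is a rough sketch in plain Python. A list of dictionaries stands in for the Django QuerySet, and the standard library's csv module does the conversion. The patient records, field names, and helper names (filter_patients, to_csv) are all made up for illustration.

```python
import csv
import io

# Hypothetical patient records standing in for rows from the database.
PATIENTS = [
    {"name": "A", "age": 35, "is_male": True, "district": "Beji"},
    {"name": "B", "age": 62, "is_male": False, "district": "Beji"},
    {"name": "C", "age": 41, "is_male": False, "district": "Cinere"},
]


def filter_patients(patients, query):
    """Apply only the filters present in the query; absent keys filter nothing."""
    result = patients
    if "min_age" in query:
        result = [p for p in result if p["age"] >= query["min_age"]]
    if "max_age" in query:
        result = [p for p in result if p["age"] <= query["max_age"]]
    if "district" in query:
        result = [p for p in result if p["district"] == query["district"]]
    if "is_male" in query:
        result = [p for p in result if p["is_male"] == query["is_male"]]
    return result


def to_csv(rows, fieldnames):
    """Serialize a list of dicts into a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


matched = filter_patients(PATIENTS, {"min_age": 30, "max_age": 50, "district": "Beji"})
csv_text = to_csv(matched, ["name", "age", "is_male", "district"])
```

Each filter narrows the previous result, which mirrors how chained .filter() calls narrow a QuerySet. Note that this sketch assumes the query has already been validated, which is exactly the part that gets messy.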
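The sketch below is a hypothetical reconstruction of that style, not the actual code: it validates and filters in the same breath, using plain dictionaries in place of the real request and QuerySet. The function name, the response shape, and the error messages are my own assumptions.

```python
def get_patients(params, patients):
    """Validate and filter in one pass: the tangled style we want to avoid."""
    result = patients
    if "min_age" in params:
        try:
            min_age = int(params["min_age"])
            result = [p for p in result if p["age"] >= min_age]
        except ValueError:
            return {"status": 400, "error": "min_age must be an integer"}
    if "max_age" in params:
        try:
            max_age = int(params["max_age"])
            result = [p for p in result if p["age"] <= max_age]
        except ValueError:
            return {"status": 400, "error": "max_age must be an integer"}
    if "is_male" in params:
        if params["is_male"] in ("true", "false"):
            is_male = params["is_male"] == "true"
            result = [p for p in result if p["is_male"] == is_male]
        else:
            return {"status": 400, "error": "is_male must be true or false"}
    if "district" in params:
        result = [p for p in result if p["district"] == params["district"]]
    return {"status": 200, "data": result}
```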
Multiple try-except blocks, nested ifs that will be a nightmare for unit tests and test coverage. While it’s not the worst code in existence, it is certainly not the best either.
So, how can we fix this?
The first principle being violated here is obvious: “A function should do one thing!” So, let’s break this code up into two smaller functions:
Much better! For the main get function, at least. We have merely moved the mess of validations into a separate function. On the positive side, should there be a problem with the validation function above, the QuerySet filters below will not be affected!
Now, notice that the ‘joy’ of input validation is still out in the open in our code. The _validate_query() function is still hard to test and to read. Moreover, there are still several holes that the validation does not cover (for example: What if the age is negative? What if the given district name is not valid? What if the district name and the sub-district name are not consistent?).
You may suspect that this kind of validation problem is widespread in software development and that someone must have already written a library to automate the task. You would be right! Django REST Framework offers a solution in the form of serializers.
Serializers in Django REST Framework are designed to perform serialization. Serialization is the process of converting a specialized data representation (e.g. a Django Model object) into a more standardized form such as a Python dictionary or a JSON string. For the sake of this article, I will not explain this process in detail. However, serializers are also good at deserializing data. In practice, this means that a serializer can receive general, unvalidated data, such as a JSON string from an HTTP request body, and transform it into a more suitable format after validating it.
Serializers are designed to be readable: they display only the requirements given by the developer and hide the messy validation process inside their Field classes. They are also customizable enough to suit most of our needs. To demonstrate, let’s try to make our validation phase cleaner by using serializers!
Here is a basic implementation for our specific use case. A quick look at this snippet gives us a few key insights.
Our serializer class should inherit from the serializers.Serializer superclass. There are more variants available in Django REST Framework (such as ModelSerializer for dealing with Django Model classes), but a simple Serializer will do for now.
Every parameter that we wish to validate should be declared as a class attribute of our serializer. Take note that each of them is assigned an instance of one of the Field classes. These assignments state the criteria of our serializer:
- NullBooleanField means that the is_male variable should take the form of a boolean value or be empty. The value can be a Python boolean (True or False) or a string containing the word ‘true’ or ‘false’; the field handles the conversion for us. (Recent Django REST Framework releases drop NullBooleanField in favor of BooleanField(allow_null=True).)
- IntegerField means that both the min_age and max_age parameters should be integers. Any non-numerical representation will be rejected. Unfortunately, negative integers will still be accepted. We will see how we can fix this later.
- CharField means that the district and sub_district parameters should be strings. We can specify a maximum length with the max_length parameter if we need to, but it is not needed for now.
Notice also the required=False parameter in each of the fields. This signifies that if any or all of the parameters mentioned above are absent from our query, the serializer will not raise an error.
As you can see, this specification already covers more than half of our requirements, leaving only 4 unchecked rules:
- min_age and max_age should be positive
- district and sub_district should each hold a valid value
- min_age and max_age should come together, i.e. they should either both be absent or both be present as parameters.
- district and sub_district values should be consistent with each other
For the first one, we can pass a min_value argument to IntegerField, e.g. IntegerField(min_value=0), so that negative ages are rejected. (PositiveIntegerField is a Django model field; Django REST Framework's serializer fields do not include it.)
For the second one, we can replace CharField with ChoiceField and pass it a choices argument, given that a list containing all valid district names exists.
For the last two, we can define a custom validate() method, which is called after all of the field-level validations have passed.
Adding the last 4 checks, we get:
Now, this serializer is ready to be used for our validation.
To utilize our newly created serializer, add:
First, we call the is_valid() method with raise_exception=True so that a ValidationError is raised in case of invalid data. And, because we are working within a REST framework view, according to the Django REST Framework API Guide:
“These exceptions are automatically dealt with by the default exception handler that REST framework provides, and will return HTTP 400 Bad Request responses by default.”
And we are done! No try-except block needed.
In conclusion, we can see that input validation is quite a tricky task for a simple programmer to handle. It is wise not to reinvent the wheel and to use the existing, well-tested serializers instead. Using a serializer also promotes clean code and increases readability.
There are many more features offered by serializers: custom error messages, custom and built-in validators, and other fields not mentioned here, such as DateTimeField, which can be really helpful in other scenarios. Hopefully, this article has demonstrated the basic use of serializers in back-end development and their potential to produce cleaner code.