I was working with a table that had a similar (simplified) structure like this:
| uuid | file_path |
|----------------------------------|---------------------------|
| b8658dfc3e80446c92f7303edf31dcbd | media/private/file_1.pdf |
| 3d750874a9df47388569a23c559a4561 | media/private/file_2.csv |
| d177b7f7d8b046768ab65857451a0354 | media/private/file_3.txt |
| df45742175d7451dad59761f15653d9d | media/private/image_1.png |
| a542966fc193470dab84351c15523042 | media/private/image_2.jpg |
Let’s say the above table is represented by the following Django model:
from django.db import models
class FileCabinet(models.Model):
uuid = models.UUIDField(
primary_key=True, default=uuid.uuid4, editable=False
)
file_path = models.FileField(upload_to="files/")
I needed to extract the file names with their extensions from the file_path
column and
create new paths by adding the prefix dir/
before each file name. This would involve
stripping everything before the file name from a file path and adding the prefix, resulting
in a list of new file paths like this: ['dir/file_1.pdf', ..., 'dir/image_2.jpg']
.
Using Django ORM and some imperative Python code you could do the following:
...
# This will give you a queryset with the file paths.
# e.g. <QuerySet ['media/private/file_1.pdf', ... ]>
file_paths = FileCabinet.objects.values_list("file_path", flat=True)
# Now the file names can be collected in a list via a listcomp.
# This will return: ["dir/file_1.pdf", ..., "dir/image_2.jpg"]
file_paths_new = [
f"dir/{file_path.split('/')[-1]}" for file_path in file_paths
]
...
Here, we use the FileCabinet
model to make a query and obtain the file paths. We then use
Python to split the file paths and extract the file names, and add the prefix dir/
to
create the new paths. While this approach is relatively simple, it can be slow and
resource-intensive if the size of the working dataset is large. This is because the entire
working dataset is loaded into memory and the text manipulation is performed in Python.
To improve performance and efficiency, Django offers a declarative approach using expressions. These expressions allow you to offload operations like this to the database, which can be significantly faster and less resource-intensive than the imperative approach, especially for larger querysets. Here’s how you can achieve the same result in a declarative manner:
...
from django.db.models import F, Value
from django.db.models.functions import (
Concat,
Reverse,
Right,
StrIndex,
)
file_cabinet = polls_models.FileCabinet.objects.annotate(
last_occur=StrIndex(Reverse(F("file_path")), Value("/")),
file_name=Right(F("file_path"), F("last_occur") - 1),
file_path_new=Concat(Value("dir/"), F("file_name")),
)
...
You can see the new file paths by inspecting the file_cabinet
queryset as follows:
file_paths_new = file_cabinet.values_list("file_path_new", flat=True)
This will give you the following queryset:
<QuerySet
['dir/file_1.pdf',
'dir/file_2.csv',
'dir/file_3.txt',
'dir/image_1.png',
'dir/image_2.jpg']
>
Now, let’s step through the each of the ORM functionality that was levereged here:
The annotate
function is being used to add additional information to each returned
FileCabinet
object. This function allows you to specify additional fields that should be
calculated and included in the returned queryset. Inside the annotation method, we use F
objects to reference a model field within the query. They can be used to refer to a field’s
value in the context of an update or filter, rather than referring to the actual field
itself.
Three fields are being added to the FileCabinet
objects: last_occur
, file_name
, and
file_path_new
.
last_occur
is being calculated by using the StrIndex
function. This function takes two
arguments: the string to search and the string to search for. In this case, the string being
searched is the file_path
field, but it has been passed through the Reverse
function to
reverse the string. This is done so that the StrIndex
function starts searching from the
end of the string, rather than the beginning. The second argument to StrIndex
is the
string to search for, which in this case is /
. The StrIndex
function returns the
position of the first occurrence of the search string in the main string.
file_name
is being calculated by using the Right
function. This function takes two
arguments: the string to extract from and the number of characters to extract. In this case,
the string being extracted from is the file_path
field, and the number of characters to
extract is specified by the last_occur
field. The last_occur
field represents the
position of the last occurrence of /
in the file_path field, so extracting the characters
from this position onwards gives us the file name with its extension. The - 1
at the end
is used to remove the /
character itself from the extracted string.
Finally, the file_path_new
is constructed by using the Concat
function. This function
takes a variable number of arguments and concatenates them together into a single string. In
this case, the dir/
prefix is being concatenated with file_name
field.
Perfection!
Recent posts
- Hierarchical rate limiting with Redis sorted sets
- Dynamic shell variables
- Link blog in a static site
- Running only a single instance of a process
- Function types and single-method interfaces in Go
- SSH saga
- Injecting Pytest fixtures without cluttering test signatures
- Explicit method overriding with @typing.override
- Quicker startup with module-level __getattr__
- Docker mount revisited