I was working on a DRF POST API endpoint where the consumer is expected to add a URL containing a PDF file and the system would then download the file and save it to an S3 bucket. While this sounds quite straightforward, there’s one big issue. Before I started working on it, the core logic looked like this:
# src.py
from __future__ import annotations

import tempfile
from shutil import copyfileobj
from urllib.request import urlopen


def save_to_s3(src_url: str, dest_url: str) -> None:
    with tempfile.NamedTemporaryFile() as file:
        with urlopen(src_url) as response:
            # This stdlib function saves the content of the
            # response in 'file'.
            copyfileobj(response, file)

        # Logic to save the file in S3.
        _save_to_s3(dest_url)


if __name__ == "__main__":
    save_to_s3(
        "https://citeseerx.ist.psu.edu/viewdoc/download?"
        "doi=10.1.1.92.4846&rep=rep1&type=pdf",
        "https://s3-url.com",
    )
In the above snippet, there's no guardrail on how large the target file can be. A consumer could bring the server to its knees by posting a link to a ginormous file: the server would stay busy downloading it, consuming memory, disk, and bandwidth the whole time.
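One way to add that guardrail is to stream the response in chunks and abort as soon as a size cap is exceeded, rather than trusting the `Content-Length` header (which can be absent or wrong) or buffering the whole body. This is a minimal sketch, not the article's eventual solution; the `MAX_FILE_SIZE` cap and the `FileTooLargeError` name are illustrative assumptions:

```python
from __future__ import annotations

import io

MAX_FILE_SIZE = 5 * 1024 * 1024  # assumed cap: 5 MiB
CHUNK_SIZE = 64 * 1024


class FileTooLargeError(Exception):
    """Raised when the remote file exceeds the allowed size (illustrative)."""


def download_capped(response: io.BufferedIOBase, file: io.BufferedIOBase) -> None:
    """Copy 'response' into 'file' in chunks, aborting past the cap.

    Works with any readable binary stream, including the response
    object returned by urllib.request.urlopen().
    """
    total = 0
    while chunk := response.read(CHUNK_SIZE):
        total += len(chunk)
        if total > MAX_FILE_SIZE:
            # Stop immediately; at most one extra chunk was read.
            raise FileTooLargeError(f"file exceeds {MAX_FILE_SIZE} bytes")
        file.write(chunk)
```

Dropping this in place of the bare `copyfileobj(response, file)` call would bound how much a single request can download, at the cost of reimplementing the chunked copy loop by hand.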