So, you've opened up the Django admin and changed or deleted a file on one of your models. All good... wait, the old file is still sitting there on disk?
Django has some interesting behavior here: if you update or clear a model's file or image field, it doesn't actually delete the associated file from your filesystem. That's the case whether the file is stored locally or on a hosted storage backend.
It's a pretty common gotcha for those coming to terms with Django, with many expecting Django to do its part to keep the filesystem in line with what is currently saved to the database. And to be fair, it is pretty counter-intuitive for the actual file to be retained when you go ahead and press that delete button.
It hasn't always been the default behavior either. In older versions of Django, files were in fact deleted from the filesystem automatically when you deleted the instance of the model with the file field. The release of Django 1.3 put a stop to this, with avoidance of data-loss scenarios cited as the primary justification in the Django docs.
In earlier Django versions, when a model instance containing a FileField was deleted, FileField took it upon itself to also delete the file from the backend storage. This opened the door to several data-loss scenarios, including rolled-back transactions and fields on different models referencing the same file. As of Django 1.3, the FileField's delete() method won't be called when a model is deleted. If you need cleanup of orphaned files, you'll need to handle it yourself (for instance, with a custom management command that can be run manually or scheduled to run periodically via e.g. cron).
So, if you are going to follow Django's advice, you'll need to set up some kind of scheduler to identify and remove any unused media files from your filesystem at a regular interval. Cron is an option, perhaps paired with something like django-cron, or there is Celery. And there are some great libraries out there, like django-unused-media, which package the bulk of the effort into management commands.
I want something simple
But many will want to keep things simple and avoid task scheduling altogether. After all, it's a fair amount of overhead if you just need to handle files for a handful of models. Or maybe you are implementing some kind of file upload system, and you don't want to retain old, bulky files on your storage any longer than necessary once they're deleted.
If that's the case, you do have some options. Namely, you can implement signals, or extend the save and delete methods directly within your models, to override Django's default behavior and remove the field's file on delete or change. So, let's look through some of the options below.
pre_save and post_delete signals
Probably the safest option is to implement pre_save and post_delete signals for your model. These signals are fired before a model instance is saved (pre_save) and after a model instance is deleted (post_delete), allowing you to run any kind of custom function directly against that instance. So, what we want to do here, in the case of the pre_save signal, is get the path of the old field file via a query on the db and compare that to the path included in the instance. If they differ, we can go ahead and delete the old field file before the save happens.
For the examples below, let's assume you are working with a model named Image, which has an image field also named image, and you want to delete the associated file whenever the field value is changed or removed from the database.
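For reference, a model along those lines might look like the following sketch (the upload_to path is just an illustrative assumption):

```python
from django.db import models


class Image(models.Model):
    # upload_to is illustrative; point it wherever your media should live
    image = models.ImageField(upload_to='images/')
```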
```python
from django.db.models.signals import pre_save, post_delete
from django.dispatch import receiver

from apps.app.models import Image


@receiver(pre_save, sender=Image)
def pre_save_image(sender, instance, *args, **kwargs):
    # A missing primary key means this is a new record,
    # so there is no old file to clean up
    if not instance.pk:
        return

    try:
        stored_image = Image.objects.get(pk=instance.pk).image
    except Image.DoesNotExist:
        return

    # Delete the stored file only if the field value has changed
    if instance.image != stored_image:
        stored_image.delete(save=False)


@receiver(post_delete, sender=Image)
def post_delete_image(sender, instance, *args, **kwargs):
    # The record has been deleted, so remove the file as well
    instance.image.delete(save=False)
```
Let's step through this. For the pre_save signal, we are first checking that a record has actually been created by checking whether its primary key exists. Since this signal will also fire when we create a new instance of the model, we don't want to try to remove an old file in cases where one won't actually exist.
If the instance does exist, we then query the model to get the currently stored file field object. If that exists and differs from the file field contained within the instance to be saved, we can go ahead and run the FileField.delete() method to remove the stored file. For the post_delete signal, things are even simpler. We know the model will have had an associated record, so we can just try to delete the file field associated with the instance using the same FileField.delete() method.
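One thing to keep in mind: @receiver handlers only connect if the module defining them is actually imported. A common pattern is to keep them in a signals.py and import that from your AppConfig's ready() method. The module and app names below are assumptions based on the import path in the example above:

```python
# apps/app/apps.py -- paths and names here are illustrative
from django.apps import AppConfig


class ImageAppConfig(AppConfig):
    name = 'apps.app'

    def ready(self):
        # Importing the module is enough to connect the @receiver handlers
        from . import signals  # noqa: F401
```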
mixins and extending model saves
As another option, we can create a set of model mixins which directly extend the model's save and delete methods in order to do the above cleanup. Although the code might look a little more clunky, this kind of approach can provide a lot of flexibility.
Now, your first instinct might be to override the model's __init__ method in order to store the current instance state. But Django's docs provide some guidance here on using the from_db classmethod instead. We don't need to rewrite the full from_db method as shown in the docs. Instead, let's just include an extra line to pull the current instance state as a dictionary of field names and values, and assign it to a new _stored attribute on the instance. Keep in mind, we are only retaining the field names and their current values, which for a file field translates to the full path of the stored file.
```python
from django.core.files.storage import default_storage
from django.db import models


class UpdateImageMixin(models.Model):
    class Meta:
        abstract = True

    @classmethod
    def from_db(cls, db, field_names, values):
        instance = super().from_db(db, field_names, values)
        # Snapshot the values as loaded from the database, so we can
        # compare them against the instance state at save time
        instance._stored = dict(zip(field_names, values))
        return instance

    def save(self, *args, **kwargs):
        if not self._state.adding:
            stored_image = self._stored.get('image') if hasattr(self, '_stored') else None
            # The stored value is the file's path, so remove it via the
            # storage backend if the field has changed
            if stored_image and self.image != stored_image:
                try:
                    default_storage.delete(stored_image)
                except Exception as e:
                    print(e)
        super().save(*args, **kwargs)
```
Note that we are checking self._state.adding before deleting anything. This is similar to checking whether the primary key already exists for the instance, as we did in the signals approach. It confirms that the record is already there and not being created as part of this save.
The mixin to handle deletion is simpler, as we don't need the from_db method. Instead, we can simply extend the model's delete method so that it also deletes the associated file; there's no need to check whether the field state has changed.
```python
class DeleteImageMixin(models.Model):
    class Meta:
        abstract = True

    def delete(self, using=None, keep_parents=False):
        # Remove the stored file before deleting the record itself
        try:
            self.image.delete(save=False)
        except Exception:
            pass
        super().delete(using=using, keep_parents=keep_parents)
```
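Since both mixins are abstract models, applying them is just a matter of inheritance. A hypothetical model using them might look like this (the field must be named image, since that's what the mixins above reference):

```python
class Image(UpdateImageMixin, DeleteImageMixin):
    # The mixins assume a field named 'image'; upload_to is illustrative
    image = models.ImageField(upload_to='images/')
```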
If you don't want to implement something like the above, you do have the option of using a maintained library which can handle all of this under the surface. The most popular option seems to be django-cleanup. This library makes use of pre/post save and delete signals, similar to the above, in order to keep your filesystem in check with your models. It might be a bit overkill for your app, but again, it'll save you needing to write your own methods.
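For reference, django-cleanup is enabled purely through settings; per its documentation, you add its app config to INSTALLED_APPS (conventionally at the end, so its signal handlers connect after your other apps are loaded):

```python
# settings.py
INSTALLED_APPS = [
    # ... your other apps ...
    'django_cleanup.apps.CleanupConfig',
]
```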
And there you have it. All the pieces you need to keep your filesystem aligned with your Django database. Just look out for how you will handle those rollbacks!