Checking MIME types for uploaded files

Darryl Buswell

Allowing image uploads from users, check. Time to put in a MIME type check for those uploaded files.

If you're allowing file uploads from your users, whether that's for images or not, it might be a good idea to put in a MIME type check to ensure that you are accepting the right type and format of file. What's a MIME type, you ask? Well, put simply, Multipurpose Internet Mail Extensions (MIME) is a standard used to identify files based on their type of content. It allows you to check that a file is actually the type of file you expect, not just according to the extension of the file (e.g. is that image.jpg actually a jpeg image?). And it can be a critical first-line defence against handling malicious uploads.
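To make the extension-versus-content distinction concrete, here is a minimal standard-library sketch. The SIGNATURES table and both helper functions are hypothetical names made up for illustration, and the table only knows two formats. A real detector such as libmagic recognises far more.

```python
import mimetypes

# A couple of well-known file signatures ("magic numbers"), for illustration only.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(head: bytes) -> str:
    """Guess a MIME type from the leading bytes of a file."""
    for signature, mime in SIGNATURES.items():
        if head.startswith(signature):
            return mime
    # Nothing matched: fall back to the generic binary type.
    return "application/octet-stream"


def extension_matches_content(filename: str, head: bytes) -> bool:
    """Compare the type implied by the extension with the sniffed type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed == sniff_mime(head)


# A PNG payload that has been renamed to look like a JPEG.
png_head = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
renamed_ok = extension_matches_content("image.jpg", png_head)
honest_ok = extension_matches_content("image.png", png_head)
```

Here renamed_ok comes out False and honest_ok comes out True: the extension alone says nothing about what is actually inside the file.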

So, how would we achieve something like this using Python and Django? Fortunately, Python has some fantastic libraries out there which take the grunt work out of checking for file MIME types. The python-magic library, for example, is a common choice here. It is a pretty lightweight wrapper around the libmagic C library (which needs to be installed on your system), and it makes MIME type checking a breeze. But how can we best integrate something like magic into Django to help control MIME types for model fields? Well, that's where a custom validator comes in handy.

A custom validator you say?

Django model fields allow easy integration of callables which raise a ValidationError if the field input doesn't meet some criteria. The Django docs offer a simple example which only allows even numbers to be entered into a model IntegerField. But really, we can create custom validators to handle almost any type of validation, including MIME type checking for a FileField. One thing to keep in mind: validators are not run automatically when you call save() on a model. They run when the model is validated, for example through a ModelForm or a call to full_clean(). When that happens, Django will use your custom validator to check that the input meets your criteria. And if it doesn't, it will raise a ValidationError for you to dish out to the user.
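For reference, the even-number validator mentioned above looks roughly like this. It is a sketch modelled on the Django docs example. The 'not-even' error code is made up for illustration, and the fallback ValidationError class is only there so the snippet also runs on a machine without Django installed.

```python
try:
    from django.core.exceptions import ValidationError
except ImportError:
    # Stand-in so this sketch still runs where Django is not installed.
    class ValidationError(Exception):
        def __init__(self, message, code=None, params=None):
            super().__init__(message)
            self.code = code
            self.params = params


def validate_even(value):
    """Raise ValidationError unless value is an even number."""
    if value % 2 != 0:
        raise ValidationError(
            "%(value)s is not an even number",
            code="not-even",  # hypothetical error code
            params={"value": value},
        )
```

You would attach it to a field with something like models.IntegerField(validators=[validate_even]). A validator is just a callable that takes the value and either returns quietly or raises.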

So what's the use case? Well, a very common example is checking MIME types for image uploads. Keep in mind that Django in fact offers an ImageField for image files as standard. And this field will check the file type of any image which is saved to the field. But it will do so by checking against all the image types which are supported by the Pillow library. And that's a very long list of types. In practice, you'll no doubt want stricter control over the exact image formats users are able to upload. For example, you may only want to allow users to upload jpeg or png images so that you can better handle some type of image processing after the file has been uploaded.

So, to do this, we are going to add a custom validator to an ImageField within our Django model and use python-magic to check that any uploaded image is of a certain image MIME type before it is saved.

To start things off, make sure that you have included python-magic in your list of project requirements. As of writing this post, it looks like python-magic is on 0.4.24.


python-magic==0.4.24

Next, let's create our validation class.

validators.py



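import magic
from django.core.exceptions import ValidationError
from django.utils.deconstruct import deconstructible
# ugettext was removed in Django 4.0; gettext is the drop-in replacement.
from django.utils.translation import gettext as _


@deconstructible
class MimetypeValidator:
    """Reject uploaded files whose detected MIME type is not in an allowed list."""

    def __init__(self, mimetypes, message=None, code='file-type'):
        self.mimetypes = mimetypes
        self.message = message
        self.code = code

    def __call__(self, value):
        try:
            # Detect the MIME type from the first 2,048 bytes of the upload.
            mime = magic.from_buffer(value.read(2048), mime=True)
            # Rewind so that later consumers read the file from the start.
            value.seek(0)
        except AttributeError:
            # The value is not a file-like object, so it cannot be checked.
            raise ValidationError(
                _('Value could not be validated for file type %s.') % value,
                code=self.code,
            )

        if mime not in self.mimetypes:
            # Use the custom message if one was given, else the default.
            if self.message:
                raise ValidationError(_(self.message), code=self.code)
            raise ValidationError(
                _('%s is not an acceptable file type.') % value,
                code=self.code,
            )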

So there are a few pieces to this. First off, we are going to allow three parameters to be passed to our MIME validator instance. That includes a list of MIME types, obviously. But we are also going to allow both a custom message and an error code for when a ValidationError is raised. From there, the validation is pretty straightforward. We read in the first 2,048 bytes of the file and use that to retrieve its MIME type. Then, if that MIME type is not in our list of accepted MIME types, we raise a ValidationError. One big caveat here, however, is that you may not successfully identify the MIME types of all files using this method and may in fact have a generic type, such as 'application/octet-stream', returned instead. We will leave it to you as to how you want to handle these edge cases, but you could of course read the entire file as a next step.
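If you do want to fall back to reading the entire file when the first chunk only yields a generic type, one possible shape is sketched below. The detect_mime helper, its sniff parameter and the GENERIC_TYPES set are all hypothetical names. The sniffer is passed in as a callable so that the logic can be tried out without libmagic, and the demo uses a stand-in sniffer rather than a real one.

```python
import io

# MIME types treated as "could not identify" (an assumption for this sketch).
GENERIC_TYPES = {"application/octet-stream"}


def detect_mime(fileobj, sniff, head_bytes=2048):
    """Sniff the head of the file first, then the whole file if that is inconclusive.

    The file position is restored afterwards so the caller can still read the file.
    """
    start = fileobj.tell()
    try:
        mime = sniff(fileobj.read(head_bytes))
        if mime in GENERIC_TYPES:
            # Inconclusive: retry with the full content (held in memory).
            fileobj.seek(start)
            mime = sniff(fileobj.read())
        return mime
    finally:
        fileobj.seek(start)


# Demo with a stand-in sniffer that only "recognises" buffers longer than 4 bytes.
def fake_sniff(data):
    return "image/png" if len(data) > 4 else "application/octet-stream"


upload = io.BytesIO(b"12345678")
result = detect_mime(upload, fake_sniff, head_bytes=4)
```

With python-magic, the sniffer would be something like lambda data: magic.from_buffer(data, mime=True). Bear in mind that the fallback loads the whole upload into memory, so you may want to cap the upload size before going down this path.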

Next, let's head over to our Django model, and add the custom validator.

models.py



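from django.db import models
# Lazy translation is the usual choice for model field text such as help_text.
from django.utils.translation import gettext_lazy as _

from apps.validators import MimetypeValidator


class Model(models.Model):
    # Accepted MIME types. libmagic reports JPEG files as 'image/jpeg';
    # 'image/jpg' is kept only as a precaution.
    mimetype_validator = MimetypeValidator(['image/png', 'image/jpg', 'image/jpeg', 'image/bmp'])

    image = models.ImageField(
        max_length=255,
        blank=False,
        null=False,
        help_text=_('Please provide an image (.jpg, .jpeg, .png, .bmp).'),
        validators=[mimetype_validator]
    )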

Here we are allowing four MIME type strings, which cover png, jpeg and bmp images (libmagic reports JPEG files as image/jpeg, so the image/jpg entry is there only as a precaution). And we have updated the help text for our ImageField to guide the user on what image formats we are allowing. So now, if the user were to upload an image file which has an identified MIME type outside of our accepted list, validation will fail with a ValidationError of "filename.ext is not an acceptable file type.", rather than the image being saved.



