The introduction of the staticfiles app in Django helped a lot with the management of media files by separating them into two big categories: resource files and user-uploaded files.
This separation was a logical step that made it a lot easier to move a project from development to production.
But while it helps a lot, it's still not a perfect solution. The scenario I'm about to paint is one I have to endure with every Django project I work on. I asked questions in the Django community, but never got a clear answer on how to deal with it.
So here's what my typical Django day looks like:
I start a new project using SQLite until my models are mature enough.
When I'm ready to show my project to my client, I switch the database to MySQL or PostgreSQL.
Then I give my client access so he can start fiddling with the admin and add some content.
So far everything is fine. I can carry on working on the backend while my client fills the site with content and learns how to use the admin.
But this is where things start to get ugly.
My client has an app with either a FileField or an ImageField and starts uploading images.
Since we share the same database, I can see the new content, but not the newly uploaded images or files.
Then I also add some images or files and, lo and behold, I now have two different media folders to merge.
Things get even uglier if we each edit the same object.
And if more than one developer is working on the project, it gets even worse and more error-prone. Soon merging folders becomes a real nightmare.
My first, albeit naive, attempt to solve this problem was simple. I created a script that would push or fetch the media across the different stages using rsync. But this got old faster than you can say "merge conflicts". It wasn't a solution at all; it was just a way to run into problems more quickly.
Then I started thinking about Django's awesome file storage API, which can abstract pretty much any kind of file storage, from the filesystem to FTP or the cloud. Name a protocol for storing files and you can implement it with a reasonable amount of effort or, more likely, it already exists somewhere under a FOSS license.
Anyway. I started looking at the storage API to use a custom file storage, something that would sit somewhere between my development environment and my demo server.
But then I realized that because a remote file storage can be either slow or unavailable, you cannot just plug in the custom file storage and hope it will magically work. Since the request blocks on I/O, a file store that is slow or, worse, unavailable can ruin the user experience by dragging the server's response time way up. If there is a response at all.
For this to work well, you need to put something between your file store and your Django app, something that will process the file upload in a different process while your app keeps minding its own business: you need a task queue. Then you learn that you'll have to set up a message broker like RabbitMQ (which speaks AMQP) and use something like Celery to interface with it.
I just want to build a god damn website without having to care about stupid things like uploaded files while in the development/demo phase, not launch a space rocket for god's sake.
This is simply too much complexity and too many failure points to introduce in all my projects; it outweighs any potential benefits.
So back to square one.
However, this time I decided to consider every realistic way to do it, including the worst ways, because as Sherlock Holmes says:
Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.
It's the last time I quote a fictional character, I swear. That said, I think I've found myself a pretty decent solution: I can just use the database to store files.
Now before you close this window laughing (or crying) uncontrollably, I think you should give my idea a chance.
By no means am I suggesting this for projects deployed in production; the performance hit will kill any database no matter how clever your caching scheme is.
But while in the development/demo stage, this can make sense. There aren't a whole lot of requests and there isn't a whole lot of data involved.
It does just what it says it does. Instead of saving files on the filesystem, it saves them to the database.
Great. But I don't want this to be permanent because at some point the files won't be in the database anymore.
So I forked the project to better adapt it to my needs. The first step was to preserve the file name/file path in the model, because later I want to create a management command that will dump the files back to the filesystem so I can push the project to production.
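To give an idea of what such a dump step could look like, here is a minimal sketch. It is not the code from my fork: it uses plain sqlite3 as a stand-in for the Django model, and the stored_file table, its path and content columns, and the dump_files function are all hypothetical names made up for the example. A real management command would wrap the same loop around the actual model.

```python
import os
import sqlite3


def dump_files(conn, media_root):
    """Write every stored file back under media_root; return the paths written."""
    written = []
    root = os.path.abspath(media_root)
    # Hypothetical table: stored_file(path TEXT, content BLOB).
    for rel_path, content in conn.execute("SELECT path, content FROM stored_file"):
        target = os.path.abspath(os.path.join(root, rel_path))
        # Refuse paths that would land outside the media root.
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"unsafe path: {rel_path!r}")
        # Recreate the relative folder structure preserved in the path column.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(content)
        written.append(target)
    return written
```

Because the relative path is preserved in the database, the files land exactly where a regular filesystem storage would have put them.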
But since I preserved the filename, including the full relative path, it introduced another problem: serving the files. The file URLs cannot change or else it will break many things. It was a big issue, because I used the media folder not only to store user-uploaded files, but also to store cached thumbnails generated by easy_thumbnails.
So in short, I needed a way to handle both a path like /media/cache/uploads/blog/photo-1.jpg, which points to a file stored directly on the filesystem, and a path like /media/uploads/blog/photo-1.jpg, which fetches a file from the database.
It turns out that resolving this issue was easier than I expected. I just changed the way I serve static files.
In short, if database_files is installed, it will serve static files with a custom serve method that checks whether the file exists on the filesystem before serving it.
If the file doesn't exist, it will look for it in the database and, if it finds it there, write it to the filesystem before calling the original Django static serve.
If the file exists neither on the filesystem nor in the database, the original static serve method raises a 404 as it normally would.
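The lookup order described above can be sketched roughly like this. It is a simplified, framework-free illustration rather than the actual view: ensure_on_disk and load_from_db are hypothetical names, and load_from_db stands in for whatever query fetches the file's bytes from the database (it returns None when there is no such file).

```python
import os


def ensure_on_disk(media_root, rel_path, load_from_db):
    """Return the local path for rel_path, restoring it from the database if needed.

    Returns None when the file exists neither on disk nor in the database.
    """
    root = os.path.abspath(media_root)
    rel = rel_path.lstrip("/")
    target = os.path.abspath(os.path.join(root, rel))
    if os.path.commonpath([root, target]) != root:
        return None  # the path tries to escape the media root
    if os.path.isfile(target):
        return target  # already on the filesystem, nothing to do
    content = load_from_db(rel)  # hypothetical database lookup
    if content is None:
        return None  # caller falls through to the usual 404
    # Write the database copy to disk so the next request skips the database.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(content)
    return target
```

In a Django project, a view could call something like this first and then hand over to django.views.static.serve with the same path and document_root, which takes care of raising the 404 when the file is still missing.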
Now the only thing I have left to do is implement a last-modified date to perform cache invalidation. Win.
It's almost a good enough solution to use even in production (for small projects, of course), since the database would not be used to serve files at any point; it would only be used to create or update the cached files.
My solution is currently half implemented. At this point it works, but I still need to implement cache invalidation and test it thoroughly. I decided to blog about it early because I wanted some feedback; maybe there are some points or solutions I did not consider.
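For the cache invalidation part, one possible approach (just an idea at this point, not something I have implemented) would be to compare the modification time of the cached file with a last-modified timestamp stored next to the file in the database. In the sketch below, needs_refresh and db_modified_ts are hypothetical names, and the timestamp is assumed to be a POSIX timestamp.

```python
import os


def needs_refresh(cached_path, db_modified_ts):
    """Return True when the cached file is missing or older than the database copy."""
    try:
        # Compare the file's mtime with the (hypothetical) timestamp from the database.
        return os.path.getmtime(cached_path) < db_modified_ts
    except OSError:
        return True  # no cached copy on disk yet
```

When this returns True, the serve method would rewrite the file from the database before serving it.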