Huge File Storage in Database instead of File System August 23, 2009

The quest to store large number of files in database or on regular file system began at a time when number of files to manage went beyond 1 million in one of our project. We started noticing a huge drop in performance of our windows server. After investigating we found out that using pure file system to store more then million files is really a bad solution. And following are key points.

  1. Each operation of file open/close requires kernel operation, because it needs to do various security checks.
  2. Every file operation requires some minimum locking mechanism which uses your cpu in kernel mode.
  3. Above operations being kernel operations, they slow down entire server even if your file access may be of least priority then any transactional database operation.
  4. Each services on server slows down.
  5. Even operating system itself manages a file system database (typically known as File Allocation Table and similar names in other systems).
  6. These databases are optimized but not for millions or trillions files. They are good to work under million files.
  7. It causes huge fragmentation on your storage media.
  8. Not only media, but even File System Database gets fragmented, leading to more slow down.
  9. Durability of storage media also can reduce because of too much fragmented usage.
  10. Backup and Restore operations take too long time. You can easily conduct an experiment, on any ftp client, try to upload only one 1Meg file and then try to upload 100 10KB files, although the total transfer is only 1Meg, you will notice uploading 100 files of smaller size takes longer.
  11. Keeping track of changes/deleted items are even more difficult.
  12. Tools for replication and change tracking for file systems aren’t available for huge file storages.
  13. The web server will need write access to some folders which is always a big security hole.
  14. And file system crash = operating system crash and thats  a huge challenge to bring it back up.
  15. For transferring files you need to open ftp and develop some tools to replicate them.

Finally, after all these observations, we started considering other options.

  1. Network File System, it looks promising, but even at other end of node, its simply an operating system. It still has same problems as mentioned above.
  2. Distributed File Systems like Hadoop and other tools require too much of learning, management and I think they are good when you need file storage as huge as Facebook, Yahoo etc. We are mid size company and our need is 1TB for now.

Database Solution:

Finally we began testing database solution with MySQL and then we moved onto MS SQL, however both have good advantages and disadvantages that we will discuss further.

We have following advantages over file systems, and also I have provided solution detail later on this article.

  1. Database systems have matured enough to handle large data easily today.
  2. They use small number of files to store all relational data in well organized form.
  3. They keep their database files open all the time, so the kernel operations of locks are reduced.
  4. You can consider as if your load of entire server is now concentrated only on your database.
  5. Databases are much advanced in terms of their scheduling of queries, you can reduce priority of your file database and database will easily offer more performance on other transactional databases.
  6. Data storage is transactional, where else file systems are not completely transactional. And file systems require more programming to make them transactional.
  7. Remember if its your customers data, even if its a small picture/icon any video.. if they are paying for your service then its better to store those media files in transactional storage.
  8. Database crash doesnt crash entire server and nor even other databases on same server.
  9. Backing up and restoring databases are easier because of the inbuilt tools.
  10. Replication on database are much easier.
  11. Change tracking is available on database.
  12. Database access is standard its not dependent on ftp/http like tools.
  13. Database can easily take over failures of neighbor nodes in network.
  14. You can easily query database and do multiple operations on files.
  15. The biggest point is, your web server does not need any write permission on any folder !!

Comparison between MySQL and MSSQL for file storage

  1. MySQL offers replication over internet, MSSQL doesnt, you will have to write your own replication program.
  2. MSSQL offers auto grow option for database file, with which you can reduce fragmentation. MySQL’s innodb doesnt have such option for database file auto grow method which leads to more fragmentation.
  3. MSSQL offers easy incremental backup/restore and more over, in case of crash, you can use actual data folder files to restore, where else MySQL’s innodb doesnt support hot backups so either you buy their hotbackup tool or rely on mysql’s backup tool which is not incremental.
  4. MSSQL costs $1600 for standard edition, where else MySQL innodb is free, but hotbackup or incremental backup tools will cost exactly the same. MSSQL wins over mysql for offering better file management and increasing disc durability.

Sample Solution

After reading various articles, we came to conclusion that storing entire file in one blob or image doesnt make sense as it will deteriorate performance like anything.

So we came with easy solution, which is already used by existing file systems. Thats “Breaking down file into smaller blobs, max 512kb”.

As MySQL and MSSQL both have network packet size limit, you should not store more then 512kb in one row. Otherwise you will have to increase network packet size and that will slow down database server. Following is easy small table structure we used.

Table: NSFiles

NSFileID (Auto Number) (bigint)
NSFileName (varchar 50)
NSContentType (varchar 100) ** This is most important for web access

Table: NSFileBlobs

NSFileBlobID (Auto Number) (bigint)
NSFileID (bigint) (indexed) (Reference of NSFiles.NSFileID)
NSBlobIndex (int)
NSBlobData (medium blob or image)

Following are Storage/Retrieval methods specified in C# with using Linq or any similar ORML tool.

//Upload Handler
NSFile file = new NSFile();
file.NSFileName = .....
file.NSContentType = Request.Files[0].ContentType;
file.Insert(); // <-- insert into database and get new unique NSFileID
byte[] buf = new byte[512000];
int i = 0;
using(MemoryStream ms = new MemoryStream(Request.Files[0].FileData))
{
int count = ms.Read(buf,0,512000);
while(count>0)
{
NSFileBlob blob  = new NSFileBlob();
blob.NSFileID = file.NSFileID;
blob.NSBlobIndex = i++;
if(count==512000)
{
blob.NSBlobData = buf;
}
else
{  // last blob will always be less then 512kb
blob.NSBlobData = new byte[count];
Buffer.BlockCopy(buf, 0 , blob.NSBlobData, 0 , count);
}
blob.Insert();
}
}

And following is simple code for Download Handler

// Download Handler
long fileID = long.Parse(Request.QueryString["FileID"]);
NSFile file = NSFile.Adapter.Get(fileID);
Response.ContentType = file.NSContentType; // important for web browser
// following line causes "Save As" box on client
Response.AddHeader("content-disposition","attachment;filename=\"" + file.NSFileName + "\"");
// following loops over all file blobs whose NSFileID is fileID and they are sorted by blob index
NSFileBlob.Adapter.ForEach(
NSFileBlob.Schema.NSFileID == fileID,
NSFileBlob.Schema.NSBlobIndex.SortAs(),
(t)=>{
Response.OutputStream.Write(t.NSBlobData,0,t.NSBlobData.Length);
}
);

  • Share/Bookmark
11 Comments

[...] Read more: Akash Kava [...]

Florin Diaconeasa August 25th, 2009

Another solution could be partitionioning your data using mysql 5.1+. This way you would have better performance in writing and reading also. Also, mysql innodb engine supports different files for each table, so it’s not necessary to have only one big file, as the default mysql innodb configuration does. :)

If interested in more details, you’ve got the email :)

Akash August 25th, 2009

If you do different files per table, they grow very slowly leading to huge fragmentation, problem is there is no auto grow value, if you look at MySQL’s data folder fragmentation and MS SQL’s data folder, we had it first in MySQL and then we went to MS SQL.

Florin Diaconeasa August 25th, 2009

hmm.. yes i didn’t think it thru :)

but also… holding such a number of files would prove to be better on a unix machine, especially open-source, as there are multiple solutions for this kind of problems.

For example, stripping the rights management on a file system(which is not such a crazy idea, already being implemented by facebook) plus other optimizations.

Not trying to start a war here :) , just pointing out some facts. But then again, quite hard to do smthg like this when your company’s profile is MS based

Akash August 26th, 2009

I appreciate your comments, unix/linux based systems optimized for file system storage only like facebook and other companies did can definitely be better then MS or DB solution. I especially mentioned that distributed file systems like hadoop and such systems basically requires good man power expert in linux/unix technologies mainly at C and compiler level. But they are difficult to find, hire and maintain. Hiring 1 system level expert is 100 times difficult then hiring 10 SQL+ASP.NET guys. Thats why I also mentioned that from 1GB to 1TB kind of solution, this database based system makes easier bet. Beyond that certainly you need dedicated staff to manage your files !!

Sarwar Faruque August 26th, 2009

While having no intention of turning this into a Linux vs. Microsoft topic, have you considered using Linux for a file management system? Two years ago, I built a web based document management systems for an university. The system had user access permissions and document version tracking, running on a LAMP stack. While the permission and tracking part was shouldered by the database, all the files were stored on a separate Linux server with a scheduler automatically backing up everything in 2 hour intervals. To date, the number of documents have been steadily growing and while it hasn’t hit the 1 million marker, it certainly has passed the half-million point in the last two years.

Akash August 26th, 2009

1. My post doesnt focus on which solution, MS or Linux to go for, it says about whether file system or database and I am not against anyone storing it in file system or not.
2. File system are not transactional, they are not ACID compliant.
3. People can still use mysql, we are not against anyone, we are talking here about locking and cpu overhead and disc life cycle and durability.

PUK October 28th, 2009

I found this article helpful to bolster what I already instinctively felt that database is better than filesystem. It’s much more portable, atomic, relational, transactional, etc, etc. so thanks.

BTW, This area of the page (Leave a Reply) renders ok in IE (v7) but in Firefox the captcha input is over the email label!

N.B. This is my 4th attempt at getting the captcha right…

PUK

ypaez December 5th, 2009

Great job, nice & easy solution, but looking arround this topic I found that SQL Server 2008 include a new feature called “FILESTREAM Storage” http://technet.microsoft.com/en-us/library/bb933995.aspx
Have you been tested this feature? Does it have a better performance that your solution?

Akash Kava December 9th, 2009

FileStream is good but I dont know if it is ACID compliant, it will be faster but I dont know about replication and transaction. As far as I know its raw file access within SQL Context, but it is not recommended over Blob. Yes if your files are big, and they are of less financial value, like all crap uploaded to youtube, for such data you can use filestream. But if you create a document management service, where your document (files) are of bigger financial value, then you should not use filestream.

Sachin Modak December 21st, 2009

Hi Akash,

I was reading through this blog and I was wondering if you considered having a flat file/control file design to keep track of files? Also something about storing files in a database just does not seem very performance centric? I was wondering if we can discuss this sometime? BTW, we can even do a phone call. Let me know.

sachin.modak@gmail.com

Leave a Reply