Thursday, April 26, 2007

Storing files uploaded into SharePoint Libraries/Galleries on the File System

One of the problems with the current version (and previous versions) of SharePoint is that every file uploaded into SharePoint is stored in the DB no matter what. For most scenarios this is great: you can store metadata alongside your file, have versioning and approval for your file, use SharePoint's check-out and check-in features, etc.

Storing files in the DB also proves useful when your site is running on a server farm. Anyone who has developed a web site on a server farm has hit this problem: a user uploads a file, and it lands on one of the servers. A user who hits one of the other servers can see the file (since we are listing files from a table in the DB), but when they click to download it they get a FileNotFoundException (brings back memories, doesn't it?). Because SharePoint stores all uploaded files in the DB, these server synchronization issues go away when developing for a server farm.

Enough storytelling; let's get on with what this post is all about:

In certain scenarios (even on a server farm) you might need or want the files that the end user uploads into a SharePoint Library or Gallery to be stored in the file system. In some cases you want them stored only in the file system, and in other cases you might just want them replicated to a specific file share on a server. A classic example is when your site's content contributors upload video/audio files into a secure library (one that only they have access to) but you want to make all these files available online through a Streaming Server. In other words, you want the public users of the site to be able to play back the video/audio files through an embedded media player that is connected to your site's Streaming Server.

As far as I know, most streaming servers can only stream video/audio that is provided to them in a specific directory (they can't just go poking around in the SharePoint DB for traces of files!), so what can we do? As far as SharePoint support goes, there is none for this requirement. Hopefully the guys over at Microsoft will think of something by the next version/release of SharePoint, but for now we're stuck with a temporary solution:

Solution: to solve this problem I wrote a list event handler that sits on the appropriate events of the specific Library. Once a file is added (or updated/deleted), it performs the matching operation in the file system. As an example, let's say we want all files added to be copied into our server's c:\StreamingSource directory and also deleted from that directory once they are deleted from SharePoint.

Before we get into code I should point out that when you upload a file into a SharePoint library, the actual upload happens as a two-part process. In the first step the file is uploaded and ItemAdding/ItemAdded is raised. Then the user is presented with the metadata screen for that file, and when they enter the metadata and click the 'Check In' button, ItemUpdating/ItemUpdated is raised. So if your replication of the file depends on the file's metadata, the replication has to happen in ItemUpdating (or ItemUpdated). For this example I'm going to use ItemUpdating to take care of the file replication:

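using System.IO;
using Microsoft.SharePoint;

public class FileReplicatorEventHandler : SPItemEventReceiver
{
    // Copies the library file to the streaming directory when its item is updated.
    public override void ItemUpdating(SPItemEventProperties properties)
    {
        // Read the file's bytes as currently stored in SharePoint.
        byte[] content = properties.ListItem.File.OpenBinary();

        // The file name is the last segment of the item's URL.
        string[] nameParts = properties.AfterUrl.Split('/');
        string fileName = nameParts[nameParts.Length - 1];

        // Create (or overwrite) the copy in the streaming directory.
        using (FileStream fs = new FileStream(@"c:\StreamingSource\" + fileName,
            FileMode.Create, FileAccess.Write, FileShare.None))
        {
            fs.Write(content, 0, content.Length);
        }
    }
}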
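The handler above builds the target path by splitting the URL and concatenating strings. As a hedged sketch only, the same step can be isolated in a small helper that uses Path.Combine and rejects URLs with no usable file name. The class name ReplicaPath and the method Resolve are hypothetical and are not part of SharePoint or of the handler above; the helper uses only System.IO, so it can be tried outside SharePoint.

```csharp
using System;
using System.IO;

// Hypothetical helper: maps an item URL such as "Videos/clip.wmv"
// to a path inside the replication directory.
public static class ReplicaPath
{
    public static string Resolve(string targetDirectory, string itemUrl)
    {
        if (string.IsNullOrEmpty(itemUrl))
            throw new ArgumentException("Item URL is empty.", "itemUrl");

        // Keep only the text after the last '/', as the handler does.
        string fileName = itemUrl.Substring(itemUrl.LastIndexOf('/') + 1);

        // Reject names that are empty or that still carry directory parts.
        if (fileName.Length == 0 || fileName == "." || fileName == ".."
            || Path.GetFileName(fileName) != fileName)
        {
            throw new ArgumentException("Item URL has no usable file name.", "itemUrl");
        }

        return Path.Combine(targetDirectory, fileName);
    }
}
```

Inside the event handler, the path passed to FileStream could then be ReplicaPath.Resolve(@"c:\StreamingSource", properties.AfterUrl), assuming AfterUrl holds the item's URL as it does in the example.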


If our end user deletes an item from the library, we want to make sure that the associated file in our "c:\StreamingSource\" directory is also deleted. Here is the code to take care of that (it goes in the same event handler class):


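public override void ItemDeleting(SPItemEventProperties properties)
{
    // The file name is the last segment of the item's URL.
    string[] nameParts = properties.BeforeUrl.Split('/');
    string fileName = nameParts[nameParts.Length - 1];

    // Remove the replicated copy from the streaming directory.
    File.Delete(@"c:\StreamingSource\" + fileName);
}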


Based on this solution you could write a general component that takes care of any type of file replication on SharePoint. As a matter of fact, if I find the time I'll try to create a general component and upload it here (or if anyone does it before me, comment here so I won't waste my time :) ).
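As a rough idea of what the file-system half of such a general component might look like, here is a minimal sketch. The class name FileReplicator and its Save and Remove methods are hypothetical and are not part of SharePoint; the class knows nothing about event receivers and assumes the caller passes in the item's URL and its bytes.

```csharp
using System;
using System.IO;

// Hypothetical class: keeps a directory in step with files handed to it.
public class FileReplicator
{
    private readonly string targetDirectory;

    public FileReplicator(string targetDirectory)
    {
        this.targetDirectory = targetDirectory;
        Directory.CreateDirectory(targetDirectory); // no-op if it already exists
    }

    // Writes (or overwrites) the copy for the given item URL and returns its path.
    public string Save(string itemUrl, byte[] content)
    {
        string path = PathFor(itemUrl);
        File.WriteAllBytes(path, content);
        return path;
    }

    // Deletes the copy for the given item URL; does nothing if it is absent.
    public void Remove(string itemUrl)
    {
        File.Delete(PathFor(itemUrl));
    }

    // The file name is the last '/'-separated segment of the URL.
    private string PathFor(string itemUrl)
    {
        string fileName = itemUrl.Substring(itemUrl.LastIndexOf('/') + 1);
        if (fileName.Length == 0)
            throw new ArgumentException("Item URL has no file name.", "itemUrl");
        return Path.Combine(targetDirectory, fileName);
    }
}
```

An event handler like the one in this post could hold one FileReplicator per target directory, call Save with the item's URL and the result of OpenBinary(), and call Remove from ItemDeleting. Keeping the file-system logic in its own class also means it can be tested without a SharePoint server.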

9 comments:

ungeeker said...

Hi there, thanks for the post. Seems to me the code is actually getting the previous file version, not the one you are just adding. I'm really stuck trying to access the file for the version about to be added. Am I missing something?

Kevin said...

Hi! Great post! Where would I put this code? I have SPD 2007.

Siva said...

Can you provide some more detailed information, or else provide a general component? I want to know if we can use retention policy, versioning and other SharePoint features while storing files on the file system.

dinesh said...
This comment has been removed by the author.
Dinesh Bolkensteyn said...

Well cool post, but this actually doesn't work properly.

You have to handle both ItemAdding/ItemAdded and ItemUpdating/ItemUpdated.

Indeed, what can happen is that the user uploads a file, and on the meta page just clicks cancel.

The file will anyway already be uploaded, just without metadata.

So if you relied on ItemUpdating/ItemUpdated only to synchronize your filesystem, you missed 1 file, and you're out of sync.

Now the real issue: I did not find (yet?) how to read the file's content from the ItemAdding event (let's say you want to cancel the event if an error occurs while writing to the filesystem, to ensure that you are always in sync).

Indeed, ListItem during ItemAdding is still null (obviously).

But it looks like "BeforeUrl" and "AfterUrl" (by default the same content) are also unusable.

Here is the code:
public override void ItemAdding(SPItemEventProperties properties)
{
    base.ItemAdding(properties);

    byte[] imageData;
    using (WebClient webClient = new WebClient())
    {
        webClient.UseDefaultCredentials = true;
        imageData = webClient.DownloadData(properties.WebUrl + "/" + properties.BeforeUrl);
    }

    properties.ErrorMessage = "The image had " + imageData.Length + " bytes.";
    properties.Cancel = true;
}

The DownloadData() method is going to throw a 404 (Not Found) exception, because SharePoint has not yet made the picture available over the web.

So the big question is, how to get the content ?

It seems that, moreover, HttpContext.Current is null as well. So again, no (dirty) way.

The only "solution" I see is to go for ItemAdded, but then you have no way to cancel the event in case of error.

Dinesh Bolkensteyn said...

@ungeeker : Yes, that's probably the behaviour if you use Updating. In Updated however, you should get the latest version. If you found how to get the latest version in Updating, please let me know how.

@Kevin : You put this code in an event handler that you deploy to the GAC. You use Visual Studio and not SharePoint Designer to do so. This is C# code.

@Siva : The code above is actually going to store a *copy* on the filesystem. The file is still available from within SharePoint (with versioning, retention policy, etc.). So it's both in the SharePoint database and on the filesystem.

anil.b said...

Hi dinesh,

I am able to get the HttpContext.Current object in the event listener, but how do I read the file content from it?

jrbudnack said...

Here's a posting on how to bind the HttpContext object in your event receiver, and access the current file contents using the Request.Files collection:

How To Access The Latest File Version in the ItemUpdating Event Receiver

Dinesh Bolkensteyn said...

Thanks for the update. So it is indeed possible :) Nevermind ^^