Wednesday, March 31, 2010

Efficient batch deletion from a SharePoint list without filling up the site Recycle Bin

For a data management function I need a performant way to first delete all items from a SharePoint list before re-filling it with up-to-date content. Looping through the list and deleting each item one by one is not acceptable: it would instantiate an internal SPRequest object for every single list item deletion, and take longer to complete. The time-efficient answer is the SPWeb method ProcessBatchData(). You feed this method a batch string of multiple delete commands, which it performs in a single SharePoint request.
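As a rough illustration of the batch string that ProcessBatchData() expects, the Python sketch below builds the commonly used CAML `<Batch>` format with one `Delete` method per list item. It is not the author's control code. The function name `build_delete_batch` is hypothetical, and the set of `SetVar` elements may need adjusting: document libraries, for example, are commonly described as also needing an `owsfileref` variable. In C#, the resulting string would be passed to `SPWeb.ProcessBatchData(string)`.

```python
import uuid
from typing import Iterable


def build_delete_batch(list_id: uuid.UUID, item_ids: Iterable[int]) -> str:
    """Build a CAML batch string holding one Delete method per list item ID.

    Hypothetical helper: mirrors the batch format typically handed to
    SPWeb.ProcessBatchData(); it does not talk to SharePoint itself.
    """
    methods = []
    for method_no, item_id in enumerate(item_ids, start=1):
        methods.append(
            f'<Method ID="{method_no}">'
            f"<SetList>{list_id}</SetList>"        # GUID of the target list
            f'<SetVar Name="ID">{int(item_id)}</SetVar>'  # list item ID to delete
            '<SetVar Name="Cmd">Delete</SetVar>'
            "</Method>"
        )
    # OnError="Continue" asks SharePoint to keep processing after a failed method.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Batch OnError="Continue">' + "".join(methods) + "</Batch>"
    )
```

For example, `build_delete_batch(list_guid, [3, 7, 9])` produces a single string with three `<Method>` elements. All three deletions then travel in one request instead of three separate ones.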
However, ProcessBatchData has the disadvantage that all deleted items are put in the SharePoint SPSite Recycle Bin. Kind of ridiculous for a programmatic batch-based deletion. And because of the large number of deleted items (the very reason for doing the deletion batch-based...), the Recycle Bin becomes pretty much unworkable, and thus useless for UI-initiated restore of manually deleted content items.
SharePoint does not provide an elegant way to prevent this standard behaviour. Elsewhere on the web it is suggested to temporarily disable the Recycle Bin at SPWebApplication level. However, this suffers from two drawbacks:
  1. The current user needs to have the SPFarm administrator role. This is unlikely for a functional management role, and unacceptable from a security viewpoint.
  2. Disabling the Recycle Bin at web application level has a nasty side effect: it clears all Recycle Bins in the web application. Not only those of your current context site and site collection, but all of them in the web application. This effectively destroys the backup purpose of the Recycle Bin for manually deleted items, and the effect is not isolated to your scope but hits the entire web application. This is functionally unacceptable.
So if it's not possible to prevent the batch-deleted items from appearing in the Recycle Bin, they must be deleted from there afterwards. This could be done via a call to SPContext.Current.Web.RecycleBin.DeleteAll(). But this clears the entire Recycle Bin, still potentially removing too much. What we need is an approach that deletes exactly those items from the Recycle Bin that were put there as a result of the list batch deletion. The Recycle Bin has a different API than regular SharePoint lists, so ProcessBatchData cannot be used to delete the same items from the Recycle Bin. Yet it is undesirable to first clear the SharePoint list via a batch deletion, and then be forced to remove the same items from the Recycle Bin one by one in a loop. Luckily, the SPRecycleBinItemCollection class exposes its own batch deletion method: SPRecycleBinItemCollection.Delete(Guid[]). It has a different method signature, which requires you to first determine the GUID of each SPRecycleBinItem.
Also, you must keep each deletion chunk below a threshold. SharePoint locks the database tables while executing the Recycle Bin batch deletion, which hangs the SharePoint server when deleting a larger set. The Recycle Bin cleanup can therefore be done asynchronously in the background, chunk by chunk.
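The selection and chunking logic can be sketched in Python, independent of the SharePoint object model. This is a sketch under stated assumptions, not the author's background cleanup code:

- `RecycleBinItem` is a hypothetical stand-in loosely modeled on SPRecycleBinItem (ID, DirName, DeletedDate).
- Identifying the batch-deleted items by source folder plus a deletion timestamp is an assumed criterion.
- The default chunk size of 100 is an arbitrary placeholder, since the actual threshold is not stated here.
- `delete_chunk` represents the call to SPRecycleBinItemCollection.Delete(Guid[]).

```python
import threading
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterator, List, Sequence


@dataclass
class RecycleBinItem:
    # Hypothetical stand-in for SPRecycleBinItem; only the fields the filter needs.
    id: uuid.UUID
    dir_name: str           # folder the item was deleted from (assumed comparable to the list URL)
    deleted_date: datetime  # moment the item landed in the Recycle Bin


def select_batch_deleted(items: Sequence[RecycleBinItem], list_dir: str,
                         since: datetime) -> List[uuid.UUID]:
    """Return IDs of items deleted from list_dir at or after 'since' (the batch start)."""
    return [i.id for i in items if i.dir_name == list_dir and i.deleted_date >= since]


def chunked(ids: Sequence[uuid.UUID], size: int) -> Iterator[List[uuid.UUID]]:
    """Yield consecutive slices of at most 'size' IDs."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def purge_in_chunks(ids: Sequence[uuid.UUID],
                    delete_chunk: Callable[[List[uuid.UUID]], None],
                    chunk_size: int = 100) -> int:
    """Call delete_chunk once per chunk and return the number of calls made."""
    calls = 0
    for chunk in chunked(ids, chunk_size):
        delete_chunk(chunk)  # in SharePoint: the Recycle Bin's Delete(Guid[]) call
        calls += 1
    return calls


def start_background_purge(ids: Sequence[uuid.UUID],
                           delete_chunk: Callable[[List[uuid.UUID]], None],
                           chunk_size: int = 100) -> threading.Thread:
    """Run purge_in_chunks on a worker thread so the caller does not wait for it."""
    worker = threading.Thread(target=purge_in_chunks,
                              args=(ids, delete_chunk, chunk_size), daemon=True)
    worker.start()
    return worker
```

Keeping each chunk small bounds how much work a single Delete call does at once. Starting the purge on a worker thread lets the calling request return immediately, in line with the idea of cleaning up asynchronously.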
The code for efficiently batch-deleting all items of a list, and then clearing the same deleted items from the Recycle Bin, is as follows:

1. Control for time-efficiently purging all items from a SharePoint list

2. Clean up the Recycle Bin in the background

Monday, March 29, 2010

Beware: using ImageButton can cause a 401 Unauthorized on an external-facing site

In one of our custom web parts we use the ASP.NET ImageButton class. Instead of setting the image source via the ImageUrl property, we specify the image via CSS. All works fine. That is, until we "went anonymous". Suddenly, when browsing to a page with our web part on it, the (acceptance test) end user was confronted with the IE login box. Via Fiddler (how I love this HTTP debugging tool) I quickly detected the indirect cause: IE issued an HTTP GET for the URL of the Pages document library. In authenticated mode this passes unnoticed. But for an anonymous user this HTTP GET results in a 401 Unauthorized, because it tries to directly access the Pages folder URL.
So the next question was: what caused the browser to issue this request for the Pages URL? Well, it was the direct result of our intentionally not setting the ImageUrl property, in favor of flexible CSS specification. The resulting HTML is then something like:
<input type="image" name=" ... " class="ImgClass" src="" ... />
IE (6) interprets the empty image src attribute as referring to the relative root of the web page. In our case, with the web part placed on a publishing page, this translates to a request for the Pages folder URL. Firefox and Chrome do not exhibit this behavior; they simply ignore the empty source attribute, as they should. The way to prevent IE from requesting the unauthorized Pages URL is to set a non-empty value for the ImageUrl property. It is not even required to point to an existing image file: in that case the HTTP request silently results in a 404 Not Found response, and since the image is set via CSS, the end user notices nothing in the UI. However, I dislike polluting the HTTP traffic between browser and server with requests that are known beforehand to fail. Therefore I point ImageUrl to the current, existing image file. Via CSS it is then still possible to flexibly change the displayed image later.
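To catch this before an anonymous user does, the rendered page markup can be scanned for image inputs whose src is empty. The Python sketch below uses the standard-library html.parser and is not part of the original fix. The function name `find_empty_image_src` is hypothetical, and the sample markup in the usage note is made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class EmptyImageSrcFinder(HTMLParser):
    """Collect the name of every <input type="image"> whose src attribute is blank."""

    def __init__(self) -> None:
        super().__init__()
        self.offenders: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing tags
        # (<input ... />) also arrive here via the default handle_startendtag.
        if tag != "input":
            return
        a = dict(attrs)
        is_image = (a.get("type") or "").lower() == "image"
        # A present-but-empty (or valueless) src is what triggers the stray request.
        if is_image and "src" in a and not (a["src"] or "").strip():
            self.offenders.append(a.get("name") or "")


def find_empty_image_src(html: str) -> List[str]:
    """Return the names of image inputs rendered with an empty src."""
    parser = EmptyImageSrcFinder()
    parser.feed(html)
    parser.close()
    return parser.offenders
```

Running it over the HTML of a page containing the web part, for example `<input type="image" name="btnSearch" class="ImgClass" src="" />`, flags `btnSearch`. Once ImageUrl points to an existing image, the list comes back empty.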