Wednesday, March 31, 2010

Efficient batch deletion from SharePoint List without filling up the site RecycleBin

For a data management function I need a performant way to first delete all items from a SharePoint list before refilling it with up-to-date content. Looping through the list and deleting each item one by one is not acceptable: it instantiates an internal SPRequest object for every single list item deletion and takes longer to complete. The time-efficient answer is the SPWeb method ProcessBatchData(). You feed this method a batch string of multiple delete commands, which are then performed in a single SharePoint request.
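To make the "batch string" concrete, here is a minimal sketch of how such a string could be assembled. It assumes the commonly used Batch/Method format with one Delete command per list item ID; the class and method names (BatchDeleteSketch, BuildDeleteBatch) are hypothetical, and the helper is plain C# so it does not need a SharePoint reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BatchDeleteSketch
{
    // Hypothetical helper: builds one batch string holding a Delete command
    // for every given list item ID. Assumes the common Batch/Method format.
    public static string BuildDeleteBatch(Guid listId, IEnumerable<int> itemIds)
    {
        StringBuilder batch = new StringBuilder();
        batch.Append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Batch>");
        foreach (int itemId in itemIds)
        {
            batch.Append("<Method>");
            // The list the command applies to, identified by its GUID.
            batch.Append("<SetList Scope=\"Request\">").Append(listId.ToString()).Append("</SetList>");
            // The ID of the list item to delete.
            batch.Append("<SetVar Name=\"ID\">").Append(itemId).Append("</SetVar>");
            batch.Append("<SetVar Name=\"Cmd\">Delete</SetVar>");
            batch.Append("</Method>");
        }
        batch.Append("</Batch>");
        return batch.ToString();
    }
}
```

On a SharePoint server the resulting string would then be handed to ProcessBatchData() on the SPWeb that contains the list, with the item IDs collected from the list beforehand.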
However, ProcessBatchData has the disadvantage that all deleted items are put in the SharePoint SPSite Recycle Bin. That is rather ridiculous for a programmatic batch-based deletion. And because of the large number of deleted items (the very reason for doing the deletion batch-based), it makes the Recycle Bin pretty much unworkable, and thus useless for UI-initiated restores of manually deleted content items.
SharePoint does not provide an elegant way to prevent this standard behaviour. Elsewhere on the web it is suggested to temporarily disable the Recycle Bin at SPWebApplication level. This, however, suffers from two drawbacks:
  1. The current user needs to have the SPFarm administrator role. This is unlikely for a functional management role, and unacceptable from a security viewpoint.
  2. Disabling the Recycle Bin at web application level has a nasty side effect: it clears all Recycle Bins in the web application. Not only that of your current context site and site collection, but all of them. This effectively destroys the Recycle Bin's backup purpose for manually deleted items, and the effect is not isolated to your scope but hits the entire web application. This is functionally unacceptable.
So if it is not possible to prevent the batch-deleted items from appearing in the Recycle Bin, they have to be deleted from there afterwards. This could be done via a call to SPContext.Current.Web.RecycleBin.DeleteAll(). But that clears the entire Recycle Bin, still potentially removing too much. What we need is an approach that deletes exactly those items from the Recycle Bin that were put there as a result of the list batch deletion. The Recycle Bin has a different API than regular SharePoint lists, so using ProcessBatchData to delete the same items from the Recycle Bin is not possible. But it is undesirable to first clear the SharePoint list via a batch deletion, and then be forced to remove the same items from the Recycle Bin in a loop. Luckily the SPRecycleBinItemCollection class exposes its own method to issue a batch deletion: SPRecycleBinItemCollection.Delete(Guid[]). It has a different method signature, which requires you to first determine the GUID of each SPRecycleBinItem. You must also take care to keep each chunk of deletions below a threshold: SharePoint locks the database tables while executing the Recycle Bin batch deletion, which hangs the SharePoint server when deleting a larger set. The Recycle Bin cleanup can be done asynchronously in the background, chunk by chunk.
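The chunking itself can be sketched independently of SharePoint. The helper below is a minimal illustration, not the control described in this post: the names (RecycleBinChunkSketch, DeleteInChunks) are hypothetical, the actual deletion is passed in as a delegate, and a suitable chunk size is an assumption you would have to determine for your own farm.

```csharp
using System;
using System.Collections.Generic;

public static class RecycleBinChunkSketch
{
    // Hypothetical helper: hands the IDs to deleteChunk in arrays of at most
    // chunkSize elements and returns the number of delete calls made.
    public static int DeleteInChunks(IList<Guid> ids, int chunkSize, Action<Guid[]> deleteChunk)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        int calls = 0;
        for (int offset = 0; offset < ids.Count; offset += chunkSize)
        {
            // The last chunk may be smaller than chunkSize.
            int length = Math.Min(chunkSize, ids.Count - offset);
            Guid[] chunk = new Guid[length];
            for (int i = 0; i < length; i++)
            {
                chunk[i] = ids[offset + i];
            }
            deleteChunk(chunk);
            calls++;
        }
        return calls;
    }
}
```

On a SharePoint server the delegate would wrap the batch call, for example DeleteInChunks(ids, 200, chunk => site.RecycleBin.Delete(chunk)), where ids holds the GUIDs of the Recycle Bin items that stem from the list purge and 200 is an arbitrary example value.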
The code for efficiently batch-deleting all items of a list, and then clearing the same deleted items from the Recycle Bin, is as follows:

1. Control for time-efficient purging of all items from a SharePoint list

2. Clean up the Recycle Bin in the background

6 comments:

  1. This code assumes that SharePoint will return the recycle bin array in date ascending order. The testing I've done shows it does not always do so. Could it be made to do so?

  2. The code below assures that the query will return in deleted date order.

    SPRecycleBinQuery RBQuery = new SPRecycleBinQuery();
    RBQuery.OrderBy = SPRecycleBinOrderBy.DeletedDate;
    SPRecycleBinItemCollection recycleBinAfterBatchDelete = site.GetRecycleBinItems(RBQuery);

  3. Hi Christopher,

    Thanks for your comment + code suggestion.
    However, it is not needed in this context.
    The deletion process operates on an index basis in the Recycle Bin; deleted items are appended to, not inserted into, the Recycle Bin collection. Note also that in this context the deletion date will be the same for all deleted items, since they were deleted from the source SPList via ProcessBatchData in one single operation.

  4. Fantastic, thank you for providing this. Saved me lots of time!

  5. What would happen if a user manually deletes an item while your batch operation is being processed? This case could lead to the deletion of an unwanted item.

    I guess if you only have a single (and huge) ``ProcessBatchData`` transaction your code would be almost perfect. There is only a tiny time frame between ``ProcessBatchData`` and the count of the recycle bin items which could lead to such a discrepancy.

    On the other hand it doesn't feel right to include thousands of commands inside a single transaction. Good investigation though. Sometimes SP feels like the wrong platform to me...

    Replies
    1. You make a valid observation: the approach is indeed vulnerable when it runs in parallel with human deletions. Note that the massive deletion from a single list will typically be a maintenance action. (If I recall correctly, I came up with this as a component to re-provision a site with updated content as a new version during the project lifetime, but I am not sure, as this post is from ages ago; I think it was even SP 2007 then. Nice to see it is still being found and "valued".) Overlap with human deletion can be prevented by making the site (collection) read-only during the maintenance window. This is normal change execution policy.
