Tuesday, December 29, 2015

The long quest to reinstate SharePoint Explorer View performance

In our company, business end-users like SharePoint Explorer View because it feels familiar, much like accessing shared folders. These users reported that SharePoint Explorer View was no longer performing. It used to perform satisfactorily, but for some time the responsiveness had structurally degraded: initially opening the Explorer View took 30 seconds or even minutes. On an immediate retry, it typically opens directly. Wait a while, and the problem repeats itself.
It proved problematic to establish the root cause of this performance degradation. SharePoint Explorer View functionality depends on a stack of IT components: local, network and SharePoint server side.
Based on a literature study (internet search) and our own common sense, we identified the following list of potential causes:
  1. Slow SharePoint processing (IIS, SharePoint code)
  2. Slow SharePoint content retrieval (SQL, storage)
  3. Slow or blocking network
  4. Slow authentication handling (NT Explorer as web client to SharePoint server, e.g. see Prompt for Credentials When Accessing FQDN Sites From a Windows system)
  5. Slow Web Proxy Auto Detection (see Explorer View very poor performance, Slow response working with WebDAV resources on Windows systems)
  6. SMB protocol blocked (Source: Microsoft Whitepaper "Understanding and Troubleshooting SharePoint Explorer View")
  7. Interference with IIS WebDAV Publication Service at SharePoint server (SharePoint – Open with Windows Explorer – problems, 5. WebDAV Publishing, and Explorer view does not work in some scenarios when the SharePoint farm is on Windows Server 2008 R2)
  8. Competition with network requests from other locally running programs
  9. Network requests blocked or delayed by anti-virus processing
  10. Slowness in local WebDAV client processing (Windows Explorer as WebDAV client)
  11. Outdated local Windows binaries (Explorer binaries (Shell32.dll), WebDAV binaries (Webclnt.dll, Davclnt.dll, Mrxdav.sys) & SMB binaries (Mrxsmb.sys, Mrxsmb20.sys, Rdbss.sys))
  12. Interference with Internet Explorer add-ons
It was hard to identify whether any of the above really had a negative impact. This is mainly because it is difficult to get an overall view of the processing across all the involved IT components: local, virus scanner, network, firewall, load balancer, server side, SQL, storage, … In reality it boils down to inspecting the behaviour of each single component, and then trying to correlate it with the behaviour of the other components to gain insight into the overall picture. We inspected IIS logs to detect whether the long response times originated in SharePoint's processing of the WebDAV protocol, but this did not lead to root cause identification. Then I used Fiddler to see what was going on over the wire. As Fiddler is limited to the HTTP protocol, I also used Wireshark to dig deeper at the wire level and to include other protocols. None of these exercises identified the cause. If the investigation had any outcome, it was my cautious conclusion that the delay is neither on the wire nor in SharePoint processing, but rather originates on the local client. To investigate that, I used Process Monitor, benchmarking scenarios with and without opening Explorer View. I did see some noticeable differences, and thus suspects for the delay: antivirus processing and extensive additional activity of svchost.exe. But I could neither confirm nor refute these as symptom or cause.
As progress in the problem investigation had stalled, we involved Microsoft Premier Support. The engineers started by investigating our captures of problem occurrences (Wireshark, Fiddler, Process Monitor, Netsh). Their initial analysis confirmed my own finding that the problem was not due to network delay. The next suspect on the Microsoft list was local interference from antivirus filters. To confirm or exclude this as the cause, we uninstalled the antivirus software on a test client. Afterwards, the problem still manifested on that test client. Via a renewed scan of the network captures, Microsoft identified a third suspect: delay caused in the local WebClient by provider priority handling, with the SMB protocol ranked above WebDAV. I had actually thought of this myself a few weeks before, inspired by an old (2006!) whitepaper from Microsoft Services Support, as the symptoms of our issue matched what is described in that whitepaper. However, for unclear reasons (lack of understanding of the complexity of the WebClient handling, no trust in the whitepaper given its age?) my suggestion of this as a problem area was not investigated in depth by our operations service provider. Now that Microsoft suggested basically the same, I convinced our operations team to conduct a pragmatic validation to confirm or exclude it as the problem cause. The simple first test is to block ICMP traffic on the SharePoint farm, and then retest the performance of SharePoint Explorer View. But here too the results were negative: no performance improvement in Explorer View was observed after blocking ICMP traffic.
We finally had a breakthrough when we noticed that the Explorer View slowness did not occur in the infrastructure scenario in which the laptop connects to our SharePoint farm via VPN. It was then a matter of identifying the differences between the IT component stacks for on-premises access and VPN access. I also made Wireshark captures of the two scenarios. In these captures we noticed that in the on-premises scenario, retrieval of the browser Proxy Auto-Configuration (PAC) file repeatedly timed out. In the VPN scenario, this effect did not occur. The explanation is the different network path used to retrieve the PAC file: the on-premises path included a blocking IT node (which actually is a cloud-hosted solution).
As it turned out, this was indeed the root cause. We resolved the blocking of the PAC file by caching it on the network perimeter, and we immediately regained the performance of SharePoint Explorer View.
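A quick way to compare PAC retrieval between two network paths is to time repeated downloads of the file. The following is a minimal sketch, not the tooling used in the investigation above (that was Wireshark). The function names and the URL in the usage note are made up for illustration, and it assumes the PAC file is served over plain HTTP(S) at a known address.

```python
import time
import urllib.request
from typing import Callable, Dict, List


def fetch_pac(url: str, timeout: float) -> bytes:
    """Download the PAC file; raises on timeout or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_pac_retrieval(
    url: str,
    attempts: int = 5,
    timeout: float = 10.0,
    fetch: Callable[[str, float], bytes] = fetch_pac,
) -> List[Dict[str, object]]:
    """Fetch the PAC file several times, recording duration and outcome."""
    results = []
    for attempt in range(1, attempts + 1):
        started = time.perf_counter()
        try:
            body = fetch(url, timeout)
            outcome = f"ok ({len(body)} bytes)"
        except Exception as error:  # timeouts and HTTP errors both end up here
            outcome = f"failed: {error}"
        elapsed = time.perf_counter() - started
        results.append({"attempt": attempt, "seconds": elapsed, "outcome": outcome})
    return results
```

Running `time_pac_retrieval("http://proxy.example.internal/proxy.pac")` (a placeholder address) once on the office network and once over VPN, and comparing the `seconds` and `outcome` columns, would show whether one path times out repeatedly.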
Bonus: the last mile for accelerating Explorer View performance ...
With this infra correction, performance of ‘SharePoint Explorer View’ is greatly improved: it opens almost immediately, in < 1 second. That is, except for the very 1st time: that takes 4-6 seconds.
The cause of this is at Windows OS level: Windows Explorer uses the WebClient component to connect via the WebDAV protocol. The very 1st time, this WebClient component must be instantiated / brought to life, which takes the additional 3-4 seconds.
If you also want to get rid of those seconds, you can set the startup mode of the WebClient service to ‘automatic’.
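Besides the Services console, the startup mode can be changed from a script. The sketch below wraps the standard Windows command `sc config WebClient start= auto`. The helper names are my own, it has to run from an elevated (administrator) session, and whether you want this rolled out on every client is a decision for your own desktop management.

```python
import subprocess
from typing import Callable, List

SERVICE_NAME = "WebClient"  # Windows service used by Explorer for WebDAV


def build_autostart_command(service: str = SERVICE_NAME) -> List[str]:
    """Return the sc.exe command that sets the service to automatic start."""
    # sc.exe expects "start=" and its value as two separate tokens.
    return ["sc", "config", service, "start=", "auto"]


def set_autostart(run: Callable[..., object] = subprocess.run) -> List[str]:
    """Execute the command; raises if sc.exe reports a failure."""
    command = build_autostart_command()
    run(command, check=True)
    return command
```

Calling `set_autostart()` on a Windows client applies the change; the service then starts with Windows instead of on first use of Explorer View.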

Monday, December 21, 2015

Our approach to deliver aligned business solutions

In my role as Collaboration Solution Architect I align with our business stakeholders to deliver company-internal collaboration solutions. In these solution alignments, the technology is of secondary importance. It acts both as an enabler and as a constraint. Our technology options are in line with the generic enterprise IT architecture: ‘SAP and Microsoft, unless’. So we use the SAP platform for some dedicated collaboration scenarios, and Microsoft for almost all the rest. As most SharePoint customers have come to do, we also impose governance on the collaboration solution setups we are allowed to deliver to our business. The overarching principle is to deliver only future-proof solutions. This translates to applying standard / out-of-the-box platform capabilities where feasible, and refraining from building custom solutions just for the sake of building custom solutions. And in case a custom solution is needed to deliver the requested functionality, we comply with the Microsoft guideline to stay away from farm-based solutions and instead rely on the Add-In model.
So, how does this work in practice? Let me clarify with a true example. A few months ago I was invited to an alignment meeting with our internal finance department. The title of the meeting was ‘BI portal’. In the meeting, the business stakeholders explained their functional intent and elaborated on all of their requirements. Being aware of the technology strategy, they realized upfront that the target platform would be SharePoint. And they had already formed their own picture of how the solution should look.
The next step was to map their functional vision and detailed requirements to feasibility. The initial focus here is to discuss and challenge the functional vision. Not that I’m the subject expert, they are, but it still makes sense to ‘walk through’ the functional vision to evaluate its true added business value. Next is to map the vision and the detailed requirements to the technology. What will the generic setup be, what can be delivered out-of-the-box via SharePoint features, what can be delivered through customization, what would require custom solutions, and what is not allowed due to our governance constraints? Helpful in both alignments, the functional and the technical, is to use examples of solutions delivered for others, to trigger and inspire. These example solutions can be your own, but of course can also come from anywhere. The SharePoint community is very generous in sharing knowledge and experiences.
As follow-up to the alignment meeting, I sketched a potential solution direction at a high level. I deliberately use PowerPoint as the format for this, as by its nature it keeps you from writing too extensively. The outline of the solution architecture is: a) sketch of the context, b) main requirements, c) UI impressions (mockups), d) global design utilizing SharePoint platform capabilities, e) the information architecture. a) and b) serve to verify whether my understanding of the request is valid, and c) is to agree on the user experience.
After fine-tuning the business aspects with the stakeholders, the next step is to get technical consent: d) and e). At a minimum, communicate the setup of the solution direction to technical peers and our SharePoint operations. In our company we have formalized this via ‘Templace Control Teams’ per technology platform.
Once both functional and technical alignment are reached, next is to ‘build’ it in an agile manner. Deliver a first version, not feature-complete yet, but one that already has functional meaning and value. Demonstrate it to the business in a ‘sprint demo’, discuss behaviour and new insights, and deliver these in the next ‘sprint’.
If the solution setup is restricted to SharePoint standard only, with potential customization as 'SharePoint content' (HTML, CSS, JavaScript), it is possible to build up the application directly in the production collaboration space. I typically choose, though, to first build it in my own 'development/playground' site, and after business agreement deploy it to the target location by repeating the 'content-based' provisioning.

Thursday, December 17, 2015

Beware: List Validation Settings also effective in Workflow execution

I’m provisioning a ‘BPM-light’ process in our SharePoint ‘business platform’. The SharePoint building blocks used are Document Sets, Content Types and SharePoint Designer Workflow. The ‘light’ solution worked correctly when demonstrated to the designated end-users at the first evaluation ('sprint'). Of course they had some additional fine-tuning requests, which is just the way to deliver and align on business functionality in an agile manner. After I implemented some of the minor additional changes, the workflow no longer functioned and reported an error.
The workflow error message exposed a problem with setting a value in the current item, for a field of type Choice.
This worked before, so what changed that made it fail now?
Well, the explanation was found in my recent changes: to ensure correct user input when creating a new Document Set, I had added a validation rule. The validation rule is simple: on creation the State value must always be ‘Draft’ (*).
The explanation for the error introduced when setting the ‘State’ field value from workflow execution is that validations are not limited to UI/form handling. The validation settings are applied at SharePoint level whenever a change to an item is made, and thus also to an item change initiated from a workflow.
I tried to build in a differentiator in the validation to only require ‘State=Draft’ upon Document Set creation. But I could not get a working validation rule for that. I tried:
  • =OR([Created]<[Modified],[State]="Draft")
  • =OR((INT(Modified-Created) > 0),(State="Draft"))
  • =OR((DATEDIFF(Modified,Created,"s")>0),(State="Draft"))
But all 3 formulas return False when, on modification, the state is set to a value other than ‘Draft’. The explanation, as I noticed, is that the values of ‘Created’ and ‘Modified’ are always both zero (0) at validation time. Likely a sequencing issue.
(*) Note: For optimal user experience I would have preferred to modify the form itself, and hide or disable the ‘State’ field to prevent the user from changing it to a non-allowed value. However, SharePoint does not support customizing the NewForm for Document Sets. The only option you have is to replace the standard Document Set page (_layouts/NewDocSet.aspx) with another server-side version [How to: Create a New Document Set Form in SharePoint Server 2010 (ECM)], but our SharePoint governance rules (‘future proof solutions’) do not allow that.
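To make the failure concrete, here is a small Python model of the first formula. It only mirrors the boolean logic of `=OR([Created]<[Modified],[State]="Draft")`; it says nothing about how SharePoint evaluates formulas internally, and the state value 'Approved' and the timestamp numbers are invented for illustration.

```python
def rule_created_before_modified(created: float, modified: float, state: str) -> bool:
    """Model of =OR([Created]<[Modified],[State]="Draft")."""
    return created < modified or state == "Draft"


# What the rule was meant to see on a later workflow update:
# Created lies before Modified, so any state is accepted.
intended = rule_created_before_modified(created=100.0, modified=200.0, state="Approved")

# What was observed: both timestamps are 0 at validation time,
# so the first operand is False and only State="Draft" can pass.
observed = rule_created_before_modified(created=0.0, modified=0.0, state="Approved")

print(intended, observed)  # True False
```

With both dates at zero, the date comparison can never distinguish creation from modification, so the rule collapses to a plain `State="Draft"` check and the workflow update is rejected.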

Wednesday, December 9, 2015

Lessons learned with Add-In update execution

In our intranet we have Add-In instances installed throughout the site hierarchy. The content owners of subsites are free to use any of the provided Add-Ins as they see fit. In a release, IT takes care of automatically updating all the installed instances of the Add-In(s) included in the release. This is done in the following steps:
  • Add the updated Add-In(s) to the SharePoint Add-In catalog;
  • Then first manually update ('Get It') the installed Add-In on the rootweb, so the result is immediately visible to end-users;
  • Next, via a PowerShell script, update all the Add-In instances in the site hierarchy: traverse the hierarchy, and on each hostweb that has the Add-In installed, execute 'Update-SPAppInstance'. Under the SharePoint hood, this delegates the Add-In update(s) to execution via a timer job.
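The traversal in the last step can be sketched as follows. This is a Python model of the logic only, not our PowerShell maintenance script: the `Web` class and the `update` callback are hypothetical stand-ins, and in the real script the callback's role is played by 'Update-SPAppInstance'.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Web:
    """Hypothetical stand-in for a SharePoint web in the site hierarchy."""
    url: str
    installed_addins: List[str] = field(default_factory=list)
    subwebs: List["Web"] = field(default_factory=list)


def walk(web: Web) -> Iterator[Web]:
    """Yield the web itself, then all of its descendants, depth first."""
    yield web
    for child in web.subwebs:
        yield from walk(child)


def update_everywhere(
    root: Web, addin: str, update: Callable[[Web, str], None]
) -> List[str]:
    """Call `update` on every host web that has `addin` installed."""
    updated = []
    for web in walk(root):
        if addin in web.installed_addins:
            update(web, addin)  # real script: Update-SPAppInstance
            updated.append(web.url)
    return updated
```

Returning the list of touched URLs makes it easy to log which host webs were queued for update, which helps when the timer job takes hours to work through them.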
We've learned 3 important lessons with this Add-In update approach:
  1. The completion of the Add-In update throughout the site hierarchy is very time-consuming. In our intranet we've experienced an elapsed time of over 10 hours.
  2. Until the Add-In update has fully completed, it is not possible to manually update the same Add-In. A situation in which you want or need to do that, and we encountered one, is when during the release it turns out that the updated Add-In has an issue. As Add-In rollback is then no longer possible (once an Add-In update has been executed), the remedy is to deploy a fix on top of the update.
  3. Most surprising: until the update of Add-In 'X' has fully completed, the manual update of any other Add-In 'Y' is also blocked from completing.
The lesson we took from these observations is that in intranet releases, we first complete all manual Add-In updates. This includes the potential additional installation of a required 'fix' on an 'updated Add-In'. Only once all manual Add-In updates are completed, and we have ascertained that the effect of each is as expected, do we execute the long-running Add-In update(s) throughout the entire site hierarchy via the PowerShell maintenance script.

Wednesday, December 2, 2015

Peculiar but Explained: Access Denied on page in search result preview

This morning I noticed that a page present in the search results displayed 'Sorry, you don't have access to this page' in the preview pane. Peculiar, as an authorization principle of SharePoint Enterprise Search is that the search result is security trimmed, and only returns results that the logged-on user is authorized to see. Brainstorming with a colleague, we came up with the explanation. The search result actually contained the URL of a SharePoint subweb, for which I do have read access. The page configured as 'Welcome Page' (landing page) in this subweb is not [yet] checked in, and therefore not available to me. And as the SharePoint Search preview applies the SharePoint publishing 'Welcome Page' redirection when hovering over a (sub)web URL, this explains the 'Access Denied' experience in the preview pane. This was confirmed: now that the landing page has been checked in, the preview shows an impression of that page instead of the 'Access Denied'.