Thursday, November 6, 2014

How to Prevent FaultException<TDetail> From Eating Your Stack Trace

Eating 
Any action that is performed by the CRM SDK that results in an exception on the CRM server, is returned as a System.ServiceModel.FaultException<Microsoft.Xrm.Sdk.OrganizationServiceFault> exception.  The interesting problem that this creates, is the FaultException<TDetail> exception class overrides the ToString() method of the Exception class, eating the entire stack trace.  So given the following hypothetical program:
public void Test()
{
    using (var service = GetOrganizationServiceProxy())
    {
        try
        {
            return service.GetEntity<SystemUser>(Guid.NewGuid());
        }
        catch (FaultException<OrganizationServiceFault> ex)
        {
            Log(ex);
            throw ex;
        }
    }
}

The log would only contain the following text:


System.ServiceModel.FaultException`1[Microsoft.Xrm.Sdk.OrganizationServiceFault]: systemuser With Id = f483d6ed-ed8c-4d5d-b6e4-145e8d8ec993 Does Not Exist (Fault Detail is equal to Microsoft.Xrm.Sdk.OrganizationServiceFault).

Which doesn’t contain the stack trace at all. For a simple program like this, it’s pretty easy to figure out where the exception occurred, but in an enterprise level application with hundreds or thousands of sdk calls, it becomes nearly impossible. After looking at the base Microsoft Exception class implementation I came up with the FullStackTraceException class:

/// <summary>
/// Provides a method for handling any exception types that "eat" the stacktrace in it's implementation of ToString()
/// </summary>
[Serializable]
public class FullStackTraceException : Exception
{
    private readonly Exception _exception;

    private FullStackTraceException(Exception ex)
    {
        _exception = ex;
    }

    public static Exception Create(Exception exception)
    {
        return CreateInternal((dynamic)exception);
    }

    private static Exception CreateInternal(Exception exception)
    {
        return exception;
    }

    public static FullStackTraceException Create<TDetail>(FaultException<TDetail> exception)
    {
        return new FullStackTraceException(exception);
    }

    private static FullStackTraceException CreateInternal<TDetail>(FaultException<TDetail> exception)
    {
        return new FullStackTraceException(exception);
    }

    public override String ToString()
    {
        String s = _exception.ToString();

        if (_exception.InnerException != null)
        {
            s = String.Format("{0} ---> {1}{2}   --- End of inner exception stack trace ---{2}", s, _exception.InnerException, Environment.NewLine);
        }

        string stackTrace = _exception.StackTrace;
        if (stackTrace != null)
        {
            s += Environment.NewLine + stackTrace;
        }

        return s;
    }
}



It uses static factories to determine if the exception is of the FaultException<TDetail> type or not.  If it isn’t, the exception is returned unchanged.  If it is, a new version of FullStackTraceException is returned, with an overridden ToString method that includes the stack trace as it would normally appear.  So if the first example was changed to:


public void Test()
{
    using (var service = GetOrganizationServiceProxy())
    {
        try
        {
            return service.GetEntity<SystemUser>(Guid.NewGuid());
        }
        catch (FaultException<OrganizationServiceFault> ex)
        {
            Log(FullStackTraceException.Create(ex));
            throw ex;
        }
    }
}

The ToString() call in Log would return


System.ServiceModel.FaultException`1[Microsoft.Xrm.Sdk.OrganizationServiceFault]: systemuser With Id = f483d6ed-ed8c-4d5d-b6e4-145e8d8ec993 Does Not Exist (Fault Detail is equal to Microsoft.Xrm.Sdk.OrganizationServiceFault).

Server stack trace:
   at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)


Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at Microsoft.Xrm.Sdk.IOrganizationService.Retrieve(String entityName, Guid id, ColumnSet columnSet)
   at Microsoft.Xrm.Sdk.Client.OrganizationServiceProxy.RetrieveCore(String entityName, Guid id, ColumnSet columnSet)
   at Microsoft.Xrm.Sdk.Client.OrganizationServiceProxy.Retrieve(String entityName, Guid id, ColumnSet columnSet)
   at DLaB.Utilities.ClientSideOrganizationService.Retrieve(String entityName, Guid id, ColumnSet columnSet) in c:\TFS\DLaB.Utilities\ClientSideOrganizationService.cs:line 88
   at DLaB.Extensions.GetEntity[T](IOrganizationService service, Guid id, ColumnSet columnSet) in c:\TFS\DLaB.Extensions.cs:line 1214
   at DLaB.Extensions.GetEntity[T](IOrganizationService service, Guid id) in c:\TFS\DLaB.Extensions.cs:line 1185
   at DLaB.Example.Test() in c:\TFS\Example.cs:line 7

Which includes the appropriate client side stack trace, allowing for much simpler debugging!  (In actual use, the FullStackTraceException.Create(ex) would occur in the Log method itself to keep the code DRY.)



Enjoy!

Wednesday, August 27, 2014

How To Make It Easier To Create A Plugin For The XrmToolBox

The XrmToolBox is an awesome collection of tools for CRM.  If you haven’t used it before, stop reading this and download it now.  Go ahead.  I’ll wait.

Welcome back…

To date I’ve created 2 plugins for XrmToolBox, Option Set Converter and Early Bound Generator.  While developing them I discovered that there really needed to be a base plugin control to make it simpler to perform CRM calls and perform asynchronous calls.  The base class that I ended up with made development a lot easier for me, and could be very helpful for the rest of the community, so I’m going to highlight it now, the PluginUserControlBase.

UpdateConnection

The first improvement the base makes is simply to implement the required IMsCrmToolsPluginUserControl interface.  The documentation shows UpdateConnection has to check the string name being being passed in to determine which method to call:

public void UpdateConnection(IOrganizationService newService,
                     string actionName = "",
                     object parameter = null)
{
    Service = newService;

    if (actionName == "WhoAmI")
    {
        ProcessWhoAmI();
    }
}


This is a really painful implementation since each method that you need a connection to CRM, will require an if statement for the method to actually call.  It also allows for late bound exceptions if the action name is updated.  Here is my implementation:



public virtual void UpdateConnection(IOrganizationService newService, ConnectionDetail detail, string actionName, object parameter)
{
    Service = newService;
    ConnectionDetail = detail;
    OnConnectionUpdated(new ConnectionUpdatedEventArgs(newService, detail));
    if (actionName == String.Empty)
    {
        return;
    }

    MethodInfo method;
    if (parameter == null)
    {
        method = GetType().GetMethod(actionName);
        if (method == null)
        {
            throw new Exception("Unable to find method " + GetType().Name + "." + actionName);
        }
        method.Invoke(this, null);
    }
    else
    {
        var externalCaller = parameter as ExternalMethodCallerInfo;
        if (externalCaller == null)
        {
            method = GetType().GetMethod(actionName, new[] {parameter.GetType()});
            if (method == null)
            {
                throw new Exception("Unable to find method " + GetType().Name + "." + actionName);
            }
            method.Invoke(this, new[] {parameter});
        }
        else
        {
            externalCaller.ExternalAction();
        }
    }
}


The Service and ConnectionDetail get set, then an Event is raised for any plugins that need to know when the connection is updated.  Then, rather than having an if statement, reflection is used to lookup the correct method to execute.  The ExternalMethodCallerInfo check also allows an action to be used, incase a method needs to be called with more than one parameter.



 



ExecuteMethod



The documentation shows a button clicked event handler having to check if the Service is populated, and then passing in the string name of the method to call:



private void BtnWhoAmIClick(object sender, EventArgs e)
{
    if (Service == null)
    {
        if (OnRequestConnection != null)
        {
            var args = new RequestConnectionEventArgs {ActionName = "WhoAmI", Control = this};
            OnRequestConnection(this, args);
        }
    }
    else
    {
        ProcessWhoAmI();
    }
}


This is also really annoying, having to check each time if the service is populated, and calling it differently depending on it being populated or not.  If the plugin has 3 or 4 actions that require the Crm Service, the OnRequestConnection logic will have to be duplicated multiple times.  I created an ExecuteMethod method to do all of that automagically, using actions rather than strings:



/// <summary>
/// Checks to make sure that the Plugin has an IOrganizationService Connection, before calling the action.
/// </summary>
/// <param name="action"></param>
public void ExecuteMethod(Action action)
{
    if (Service == null)
    {
        var name = action.GetMethodInfo().Name;
        if (name.Contains("__"))
        {
            throw new ArgumentOutOfRangeException("action",
                @"The Action of an Execute Method must not be a lambda.  Use the ExecuteAction(action, parameter) Method.");
        }

        OnRequestConnection(this, new RequestConnectionEventArgs
        {
            ActionName = action.GetMethodInfo().Name,
            Control = this
        });
    }
    else
    {
        action();
    }
}

/// <summary>
/// Checks to make sure that the Plugin has an IOrganizationService Connection, before calling the action.
/// </summary>
/// <param name="action"></param>
/// <param name="parameter"></param>
public void ExecuteMethod<T>(Action<T> action, T parameter)
{
    var caller = parameter as ExternalMethodCallerInfo;
    if (Service == null)
    {
        if (caller == null)
        {
            OnRequestConnection(this, new RequestConnectionEventArgs
            {
                ActionName = action.GetMethodInfo().Name,
                Control = this,
                Parameter = parameter
            });
        }
        else
        {
            OnRequestConnection(this, new RequestConnectionEventArgs
            {
                ActionName = "Recaller",
                Control = this,
                Parameter = parameter
            });
        }
    }
    else if (caller == null)
    {
        action(parameter);
    }
    else
    {
        caller.ExternalAction.Invoke();
    }
}

private void Recaller(ExternalMethodCallerInfo info)
{
    info.ExternalAction.Invoke();
}


WorkAsync



The final improvement that I wanted to highlight was WorkAsync methods which eliminate the need to have duplicated code to create a background worker and update statuses: 



WorkAsync(
    "Message to Display when Starting a CRM Call",
    (w, e) => { /* Work To Do Asynchronously */ },
         e => { /* Cleanup when work has completed */ });


The end result is a much much more simplified plugin:



public partial class SampleTool : PluginUserControlBase
{


    private void BtnWhoAmIClick(object sender, EventArgs e)
    {
        ExecuteMethod(ProcessWhoAmI);
    }

    private void ProcessWhoAmI()
    {
        WorkAsync("Retrieving your user id...",
            (e) => // Work To Do Asynchronously
            {  
                var request = new WhoAmIRequest();
                var response = (WhoAmIResponse) Service.Execute(request);

                e.Result = response.UserId;
            },
            e =>  // Cleanup when work has completed
            {  
                MessageBox.Show(string.Format("You are {0}", (Guid)e.Result));
            }
            );
    }


    private void BtnCloseClick(object sender, EventArgs e)
    {
        base.CloseToolPrompt();
    }
}


Interested in creating a plugin for the XrmToolBox?  Just download the Early Bound Generator and reference the DLab.XrmToolboxCommon dll.



Happy Coding!

Monday, March 17, 2014

How To Interpret AccessRights Numbers In CRM Security Exceptions

When a user attempts to preform an action in CRM there are two potential types of security issues that could arise.
  1. The user’s access level for the privilege required to perform the action is none e.g., attempting to share a contact:
    Contact_No_Share
  2. The user has an access level for the privilege required to perform the action, but it isn’t generous enough e.g., attempting to share a contact with a user outside of the business unit:
    Contact_Org_Share
In either case, if action requires a single privilege, and the user doesn’t have the privilege (or doesn’t have high enough access rights for the privilege), you’ll get an error like the following:
CrmException:
 SecLib::AccessCheckEx failed. Returned hr = -2147187962,
 ObjectID: abe86dd6-d7aa-e311-ade6-cfcd73da4a68,
 OwnerId: db2f6110-1a9d-df11-9d85-005056bb728e, 
 OwnerIdType: 8 and
 CallingUser: edce293b-789d-e311-950c-ed9561bb126a.

 ObjectTypeCode: 4201,
 objectBusinessUnitId: 966fe7c8-89c1-de11-853f-005056bb728e,
 AccessRights: prvShareActivity


This gives combined with the Security Role UI to Privilege Mapping gives all the information required to determine the security rights required.  The Calling user, object, object’s type, object’s owner, object’s owner type, and object’s business, as well as the Access Right required.  Sometimes though, the error looks like this:


CrmException:
 SecLib::AccessCheckEx failed. Returned hr = -2147187962,
 ObjectID: abe86dd6-d7aa-e311-ade6-cfcd73da4a68,
 OwnerId: db2f6110-1a9d-df11-9d85-005056bb728e, 
 OwnerIdType: 8 and
 CallingUser: edce293b-789d-e311-950c-ed9561bb126a.

 ObjectTypeCode: 4201,
 objectBusinessUnitId: 966fe7c8-89c1-de11-853f-005056bb728e,
 AccessRights: 262153


In this case the Access Rights is a little more cryptic.  This is what happens when an operation requires multiple privileges, e.g. Share and Append.  To “decrypt” it you’ll need to know how enum flags work, and what the actual powers of two are in this case.


Below are the values as taken from the Microsoft.Crm.Sdk.Messages.AccessRights Enum

Name Value Binary Value Description
ReadAccess 1 0001 Specifies the right to read the specified type of object.
WriteAccess 2 0010 Specifies the right to update (write to) the specified object.
AppendAccess 4 0100 Specifies the right to append the specified object to another object.
AppendToAccess 16 0001 0000 Specifies the right to append another object to the specified object.
CreateAccess 32 0010 0000 Specifies the right to create an instance of the object type.
DeleteAccess 65536 0001 0000 0000 0000 0000 Specifies the right to delete the specified object.
ShareAccess 262144 0100 0000 0000 0000 0000 Specifies the right to share the specified object.
AssignAccess 524288 1000 0000 0000 0000 0000 Specifies the right to assign the specified object to another security principal.


In the error above, 262153 is

0100 0000 0000 0000 1001

But wait, There is no Access Right with a binary value of 1000.  Since I’ve been unable to find any other documentation, my guess is that something on our installation (that was an upgrade from a CRM 4.0 server) still has a reference to the CRM 4.0 enum values in which AppendToAccess is actually 8 (1000) rather than 16 (0001 0000).


Using that assumption and the table above, 2621534 is made up of ShareAccess, AppendToAccess, and ReadAccess.  This means the user has to have sufficient Access Levels for Share, Append, and Read privileges of the entity, in order to successfully perform the requested operation.

Friday, February 21, 2014

Improved Method To Avoid A System.InvalidOperationException Multi-Threaded Race Condition In CRM 2011

In order to improve performance creating connections to CRM via the C# SDK, we followed the steps outlined here and here, to cache the ServiceManagement and Credentials objects.  After some quick testing, everything ran smoothly and we pushed the changes to prod.  A couple days later, these exceptions started showing up in our logs:
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Collections.Generic.List`1.Enumerator.MoveNext()
   at Microsoft.Xrm.Sdk.Client.ServiceConfiguration`1.CreateLocalChannelFactory()
A quick Google search turned up this page, which listed the problem being caused by a race condition when creating a connection to CRM, and enabling the proxy types on the OrganizationServiceProxy.  I decided to look into what the SDK was actually doing to create the race condition and this is what I found in the call site where the exception was being thrown:
private ChannelFactory<TService> CreateLocalChannelFactory()
{      
     lock (ServiceConfiguration<TService>._lockObject)
      {
          ServiceEndpoint local_0 = new ServiceEndpoint(this.CurrentServiceEndpoint.Contract, this.CurrentServiceEndpoint.Binding, this.CurrentServiceEndpoint.Address);
          foreach (IEndpointBehavior item_0 in (Collection<IEndpointBehavior>)this.CurrentServiceEndpoint.Behaviors)
              local_0.Behaviors.Add(item_0);
          local_0.IsSystemEndpoint = this.CurrentServiceEndpoint.IsSystemEndpoint;
          local_0.ListenUri = this.CurrentServiceEndpoint.ListenUri;
          local_0.ListenUriMode = this.CurrentServiceEndpoint.ListenUriMode;
          local_0.Name = this.CurrentServiceEndpoint.Name;
          ChannelFactory<TService> local_2 = new ChannelFactory<TService>(local_0);
          if (this.ClaimsEnabledService || this.AuthenticationType == AuthenticationProviderType.LiveId)
              ChannelFactoryOperations.ConfigureChannelFactory<TService>(local_2);
          local_2.Credentials.IssuedToken.CacheIssuedTokens = true;
          return local_2;
      }
}
Even with the Lock statement, the CurrentServiceEndpoind.Behaviors call apparently was still throwing the exception. I then checked the OrganizationServiceProxy to see what EnableProxyTypes() was doing:
public void EnableProxyTypes(Assembly assembly)
{
      ClientExceptionHelper.ThrowIfNull((object) assembly, "assembly");
      ClientExceptionHelper.ThrowIfNull((object) this.ServiceConfiguration, "ServiceConfiguration");
      OrganizationServiceConfiguration serviceConfiguration = this.ServiceConfiguration as OrganizationServiceConfiguration;
      ClientExceptionHelper.ThrowIfNull((object) serviceConfiguration, "orgConfig");
      serviceConfiguration.EnableProxyTypes(assembly);
}
Hmm, nothing there looked to be updating the Behaviors collection. Must be something in the OrganizationServiceConfiguration’s EnableProxyType():
public void EnableProxyTypes(Assembly assembly)
{
      ClientExceptionHelper.ThrowIfNull((object) assembly, "assembly");
      ClientExceptionHelper.ThrowIfNull((object) this.CurrentServiceEndpoint, "CurrentServiceEndpoint");
      lock (this._lockObject)
      {
          ProxyTypesBehavior local_0 = this.CurrentServiceEndpoint.Behaviors.Find<ProxyTypesBehavior>();
          if (local_0 != null)
              ((Collection<IEndpointBehavior>) this.CurrentServiceEndpoint.Behaviors).Remove((IEndpointBehavior) local_0);
          this.CurrentServiceEndpoint.Behaviors.Add((IEndpointBehavior) new ProxyTypesBehavior(assembly));
      }
}
And there it was, the cause of the race condition. One thread calls CreateLocalChannelFactory() while another thread calls EnableProxyTypes(). Even though they are different OrganizationsServiceProxies, they share the same ServiceConfiguration. Even though they are both wrapped in lock statements, they are using different lock objects.

The fix suggested by the only Google result for this error is to add a check to see if the ServiceConfiguration’s CurrentServiceEndpoint has any EndpointBehaviors before enabling the proxy types. This still potentially (although very unlikely) still allows for the exception to occur. I decided to simplify it. Since the only thing the OrganizationServiceProxy does in it’s EnableProxyTypes() is pass the call onto the ServiceConfiguration’s EnableProxyTypes(), and since the ServiceConfiguration is shared amongst all threads, the call to EnableProxyTypes() can be performed directly after the creation of the ServiceConfiguration, before it is returned and used by the OrganizationServiceProxy. This removes the race condition as well as having to check for any existing behaviors before calling EnableProxyTypes(). Below is the utility class that we use to create our OrganizationServiceProxies, sharing the ServiceConfiguration and Credentials.

Before diving into the code, here are some things that aren’t shown below:
  • CrmServiceEntity is a class that contains all of the information required to connect to CRM. It overrides Equals(), so it is a valid Key to use.
  • GetOrAddSafe is an extension method that ensures that only once CrmServiceCreationInfo object gets created per CrmServiceEntity.
And now the code, which is pretty simple. A call comes in to CreateService, with a CrmServiceEntity parameter. GetOrAddSafe is then called, looking for any existing value in the ConcurrentDictionary. If none is found, it calls the constructor for CrmServiceCreationInfo with the CrmServiceEntity parameter. Keep in mind this is locked and will not be called twice for the same CrmServiceEntity.

The ServiceConfiguration and ClientCredential get created as normal, but then if Proxy Types are enabled, the EnableProxyTypes is called on the Service Configuration. There is some reflection magic required to call the method since it is an internal class, as well as a fail safe incase Microsoft ever changes the class name. But the end result is before the ServiceConfiguration object is ever returned, it should have it’s proxy settings set, which means the race condition will never happen. Enjoy!
private static ConcurrentDictionary<CrmServiceEntity, CrmServiceCreationInfo> _crmServiceCreationInfos = new ConcurrentDictionary<CrmServiceEntity, CrmServiceCreationInfo>();
private static readonly object _crmServiceCreationLock = new object();
private static OrganizationServiceProxy CreateService(CrmServiceEntity entity)
{
      var crmServiceCreationInfo = _crmServiceCreationInfos.GetOrAddSafe(_crmServiceCreationLock, entity, e => new CrmServiceCreationInfo(e));
      var orgService = new OrganizationServiceProxy(crmServiceCreationInfo.ServiceConfiguration, crmServiceCreationInfo.ClientCredential);
      if (entity.ImpersonationUserId != Guid.Empty)
      {
          orgService.CallerId = entity.ImpersonationUserId;
      }
      return orgService;
}


private class CrmServiceCreationInfo{
      public IServiceManagement<IOrganizationService> ServiceConfiguration { get; set; }
      public ClientCredentials ClientCredential { get; set; }
      public CrmServiceCreationInfo(CrmServiceEntity entity)
      {
          var orgUri = GetOrganizationServiceUri(entity);
          ServiceConfiguration = ServiceConfigurationFactory.CreateManagement<IOrganizationService>(orgUri);
          ClientCredential = GetCredentials(entity);
          if (entity.EnableProxyTypes)
          {
              // As of at least CRM 2011 Rollup 15 there exists the potential that sharing the Service Configuration and EnablingProxyTypes could cause a 
              // System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
              //    at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
              //    at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
              //    at System.Collections.Generic.List`1.Enumerator.MoveNext()
              //    at Microsoft.Xrm.Sdk.Client.ServiceConfiguration`1.CreateLocalChannelFactory()
              // http://social.microsoft.com/Forums/en-US/d8d81294-5c11-4490-824d-649c653c7335/linq-exception-occurs-while-retrieving-paged-crm-data-in-a-multihreaded-manner

              // Rather than not enabling the proxy types if it has already been enabled which could still cause the issue,
              // enable it here which is guaranteed to only execute once.
              // type should be of type OrganizationServiceConfiguration which is an internal type.  If something changes
              // Create a temporary OrganizationServiceProxy to then fix the issue.

              var type = ServiceConfiguration.GetType();
              var method = type.GetMethod("EnableProxyTypes", new[] { typeof(System.Reflection.Assembly) });
              if (method == null)
              {
                  LogManager.GetCurrentClassLogger().Warn("EnableProxyTypes doesn't exist for " + type.FullName);
                  using (var orgService = new OrganizationServiceProxy(ServiceConfiguration, ClientCredential))
                  {
                      orgService.EnableProxyTypes(GetEarlyBoundProxyAssembly());
                  }
              }
              else
              {
                  method.Invoke(ServiceConfiguration, new Object[] { GetEarlyBoundProxyAssembly() });
              }
          }
      }
}

Monday, January 27, 2014

Why Is CRM Displaying A Section That Has Been Defined As Hidden?

After being head down in CRM for the last 2 years, there are few things that surprise me, but the situation I’m about to describe, threw me for a loop:Huh

Our client had an entity with multiple forms.  All the forms looked virtually identical with the 40 same basic fields, split up among 5 different sections.  4 of the sections were defined as visible, but one wasn’t.  It contained the name of the entity (which was programmatically calculated from various fields, which explains why it was on the form but hidden) along with a couple other fields that were required by the JavaScript of the form, in order to process the business rules of the entity, but the actual business user wouldn’t care about.

Everything had been working fine until we upgraded the on-premise version of CRM from rollup 11 to rollup 15.  After the rollup, we found a bug where on one of the forms, for a particular subset of data, the hidden section was mysteriously being displayed.  I tried everything I could think of, defaulting the section to being hidden, setting it to being hidden as the last function call of the onLoad JavaScript, even setting all of the fields within the hidden section as being hidden.  All to no avail. 

Finally, in a desperate attempt to determine why the section was being displayed, I searched for “setVisible” in the JavaScript of the page, and placed a break point everywhere in the source code.  20 breakpoints later one of the breakpoints was triggered and I finally had a call stack to point me to the problem.  Although it was all minified JavaScript, chopped full of 3 character variables and functions containing only a ‘$’ a letter and a number, one of the function’s names was in English and caught my eye, “setFocus”.  Why was Microsoft setting the focus on a random field on my custom form?  A few debugging steps later, and I had my answer:

CRM has JavaScript that runs after the onLoad function of the form to find the first field on the form that is not read-only, and then sets the focus on that field.  As a part of the setFocus function, it ensures the field is visible by not only setting the field as visible, but setting it’s parent section as visible as well.  So what was happening on this one particular form was all of the fields on the form before the hidden section had been marked as read-only, and when Microsoft’s JavaScript searched for the first field that was not read-only, it found the name field in the hidden section, set the focus on it, which made the section visible.

The solution was relatively simple, make all the fields in the section read-only, and update any JavaScript that updated them to always force submit.  Making the fields read-only removed them from being valid focusable candidates, and adding force submit to each field ensured that any updates the JavaScript made would still get posted back and updated in CRM.  Problem Solved.

Debugging Microsoft’s minified JavaScript is never fun or easy, but sometimes it’s a necessity.  Happy debugging!

Tuesday, January 7, 2014

Lessons Learned Deleting 312 Million Records From CRM’s PrincipalObjectAccess Table

Unrestrained growth of the POA (PrincipalObjectAccess) SQL table in CRM is nothing new.  Performance of on premise CRM installations can start to be noticeably affected at 10 million records or so.  You can imagine my surprise when I learned my client had just over 1/3 of a billion records!  Microsoft has a script to clean it up, which probably works great at cleaning up a couple million records, but is not really an option when needing to clean up 312 million. 

After multiple modifications to Microsoft's scripts in our test environments we ended up splitting/modifying the Microsoft into 4 different scripts.

  1. Create and populate ToDeletePOAEntries Table.
  2. Drop the existing indexes on the POA table, add a single index just with the columns needed for the join
  3. Delete the POA records
  4. Reset the index changes made in Script 2.

The smartest thing we did was probably sending an e-mail, every time the script deleted a million records.  This allowed us to see trends in how long it was taking to delete and attempt to fix the issue.

Here is a graph of the e-mails received for the first 82 million, which took just over 9 hours:

First82Million

 

We wrongly assumed that the increase in deletions was due to the indexes needing to be rebuilt, but rebuilding the index didn’t change the trend.  We then decided to try rebuilding the ToDeletePOAEntries table and this is what we saw for the next 48 million over the course of 4 hours:

MiddleMillions

We then edited the script to make the cleanup of the ToDeletePOAEntries automatic.  If I had to do it over again, I’d add an Identity column to the ToDeletePOAEntries table, and update the where of the join for each loop to start at the next batch of entries.  This is what we then saw with the noticeable peeks required to recreate the ToDeletePOAEntries table every 25 million:FinalMillions

 

Here are the scripts that were used.  Hopefully they’ll be helpful for anyone else out there in the same position we were in. It took just over 30 hours to delete 312 million records with CRM itself completely shutdown.

Script 1: Creates ToDeletePOAEntries table, populating it with all of the records that need to be removed out of the POA table.  This is basically the first part of the Microsoft script,

USE [Contoso_MSCRM]
GO

IF NOT EXISTS (SELECT * FROM sys.sysobjects
WHERE id = object_id(N'[dbo].[ToDeletePOAEntries]')
AND ObjectProperty(id, N'IsUserTable') = 1)

create table ToDeletePoaEntries
(
ObjectId uniqueidentifier,
Otc int
)
IF NOT EXISTS (SELECT * FROM sys.sysindexes si
INNER JOIN sys.sysobjects so ON si.id = so.id
WHERE so.id = OBJECT_ID(N'[dbo].[ToDeletePoaEntries]')
AND OBJECTPROPERTY(so.id, N'IsUserTable') = 1
AND si.name LIKE '%mainindex%')

CREATE UNIQUE NONCLUSTERED INDEX [mainindex] ON [dbo].[ToDeletePoaEntries]
(
[ObjectId] ASC,
[Otc] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

----- Insert records to be deleted in ToDeletePoaEntries

-- go through all user-owned entities which are not replicated and don't support duplicate detection
declare entity_cursor cursor local FORWARD_ONLY READ_ONLY
for select distinct e.ObjectTypeCode, e.BaseTableName, a.PhysicalName from EntityView e
inner join AttributeView a on e.EntityId = a.EntityId and a.IsPKAttribute = 1
where e.IsReplicated = 0 and e.IsDuplicateCheckSupported = 0 and e.OwnershipTypeMask & 1 = 1

open entity_cursor

declare @baseTableName sysname
declare @otc nvarchar(20)
declare @primaryKey sysname
declare @totalCollected int = 0
declare @currentCollected int
declare @tempRowCount int = 0
declare @collectstatement nvarchar(max)fetch next from entity_cursor
into @otc, @baseTableName, @primaryKeywhile @@FETCH_STATUS = 0
begin
print 'Cleaning up POA for ' + @baseTableName

set @currentCollected = 0


set @collectstatement = 'insert into ToDeletePoaEntries(ObjectId, Otc)
select distinct poa.ObjectId, poa.ObjectTypeCode
from PrincipalObjectAccess poa left join '
 + @baseTableName + ' e on poa.ObjectId = e.' + @primaryKey +
' where e.' + @primaryKey + ' is null and poa.ObjectTypeCode = ' + @otc;

print @collectstatement

exec(@collectstatement)
set @tempRowCount = @@ROWCOUNT
set @currentCollected = @currentCollected + @tempRowCount
print CAST(@currentCollected as nvarchar(20)) + ' records collected for ' + @baseTableName
set @totalCollected = @totalCollected + @currentCollected

fetch next from entity_cursor
into @otc, @baseTableName, @primaryKey
end

close entity_cursor
deallocate entity_cursor

print CAST(@totalCollected as nvarchar(20)) + ' total records collected'

Script 2: Drop the existing indexes on the POA table, add a single index just with the columns needed for the join.  This isn’t included in Microsoft’s script, and I don’t have any data as far as how helpful this was, but it should result in the least amount of work required by the database in order to delete a record.

USE [Contoso_MSCRM]

go

/****** Object:  Index [ndx_PrimaryKey_PrincipalObjectAccess]    Script Date: 12/18/2013 08:44:05 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'ndx_PrimaryKey_PrincipalObjectAccess')
  ALTER TABLE [dbo].[principalobjectaccess]
    DROP CONSTRAINT [ndx_PrimaryKey_PrincipalObjectAccess]

go

/****** Object:  Index [ndx_Cover]    Script Date: 12/18/2013 08:44:15 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'ndx_Cover')
  DROP INDEX [ndx_Cover] ON [dbo].[PrincipalObjectAccess] WITH ( online = OFF )

go

/****** Object:  Index [fndx_Sync_VersionNumber]    Script Date: 12/18/2013 08:44:33 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'fndx_Sync_VersionNumber')
  DROP INDEX [fndx_Sync_VersionNumber] ON [dbo].[PrincipalObjectAccess] WITH (
  online = OFF )

go

/****** Object:  Index [cndx_PrincipalObjectAccess]    Script Date: 12/18/2013 08:44:45 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'cndx_PrincipalObjectAccess')
  DROP INDEX [cndx_PrincipalObjectAccess] ON [dbo].[PrincipalObjectAccess] WITH
  (
  online = OFF )

go

/****** Object:  Index [UQ_PrincipalObjectAccess]    Script Date: 12/18/2013 08:45:56 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'UQ_PrincipalObjectAccess')
  ALTER TABLE [dbo].[principalobjectaccess]
    DROP CONSTRAINT [UQ_PrincipalObjectAccess]

go

CREATE NONCLUSTERED INDEX [IX_CleanupIndex]
  ON [dbo].[PrincipalObjectAccess] ( [objectid] ASC, [objecttypecode] ASC )
  WITH (statistics_norecompute = OFF, sort_in_tempdb = OFF, ignore_dup_key = OFF
, drop_existing = OFF, online = OFF, allow_row_locks = ON, allow_page_locks = ON
) ON [PRIMARY]

go 

Script 3: Delete the POA records.  This script has the most changes from the Microsoft version.  It wasn’t until we actually started running the scripts in Prod that we finally figured out our main bottle neck, the join of the POA table to the ToDeletePOAEntries.  For every million records deleted, the next time through the loop, the SQL server would attempt to find records that had just been deleted first, before joining on the records not yet deleted.  There is vast rooms for improvement here I’m sure, but this is what we came up with over winter holiday.

USE [Contoso_MSCRM]

go

-- Delete query
-- This scripts cleans up orphaned POA records for selected entities
DECLARE @deleteBatchSize INT = 50000
DECLARE @deleteBatchSizeNVarChar NVARCHAR(10) = Cast(
        @deleteBatchSize AS NVARCHAR(10))
DECLARE @totalDeleted INT = 0
DECLARE @currentDeleted INT
DECLARE @deletestatement NVARCHAR(max)
DECLARE @retry INT = 1
DECLARE @tempRowCount INT = 0
DECLARE @recipientList NVARCHAR(1000) = 'MSmith@Contoso.com;JDoe@Contoso.com'
DECLARE @backupScript NVARCHAR(1000)
DECLARE @backupCount INT = 0
DECLARE @bodyText NVARCHAR(4000)
DECLARE @subjectText NVARCHAR(1000)
DECLARE @baseTableName SYSNAME
DECLARE @otc NVARCHAR(20)
DECLARE @primaryKey SYSNAME
DECLARE @totalCollected INT = 0
DECLARE @currentCollected INT
DECLARE @tempRowCountRecreate INT = 0
DECLARE @collectstatement NVARCHAR(max)

WHILE ( @retry > 0 )
  BEGIN
      BEGIN try
          SET @currentDeleted = 0
          SET @tempRowCount = 0
          -- delete all records of the current entity type which don't have corresponding object in the base table
          SET @deletestatement = 'delete top (' + @deleteBatchSizeNVarChar
                                 +
') from PrincipalObjectAccess from PrincipalObjectAccess poa join ToDeletePoaEntries e on poa.ObjectId = e.ObjectId and poa.ObjectTypeCode = e.Otc'

    PRINT @deletestatement

    -- delete PrincipalObjectAccess records in batches
    EXEC(@deletestatement)

    SET @tempRowCount = @@ROWCOUNT
    SET @currentDeleted = @currentDeleted + @tempRowCount

    WHILE @tempRowCount = @deleteBatchSize
      BEGIN
          EXEC(@deletestatement)

          SET @tempRowCount = @@ROWCOUNT
          SET @currentDeleted = @currentDeleted + @tempRowCount

          PRINT Cast(@currentDeleted AS NVARCHAR(20))
                + ' records deleted '
                + Cast(Getutcdate() AS NVARCHAR(50))

          -- Every 1,000,000 records notify the team that we have deleted another 1,000,000 records, with the total record count
          IF ( @currentDeleted % 1000000 = 0 )
            BEGIN
                PRINT '1,000,000 records deleted'

                SET @bodyText = CONVERT(NVARCHAR(20), @currentDeleted)
                                + ' POA records deleted.';

                EXECUTE msdb.dbo.Sp_send_dbmail
                  @profile_name = 'SQLNotify Profile',
                  @recipients = @recipientList,
                  @body = @bodyText,
                  @subject = 'POA Cleanup Progress Email',
                  @importance = 'High'
            END

          -- Every 25,000,000 records Rebuild ToDeletePoaEntries
          IF ( @currentDeleted % 25000000 = 0
                OR @currentDeleted <= 100000 )
            BEGIN
                PRINT '25,000,000 records deleted'

                -- DROP AND RECREATE TABLE
                DROP INDEX [mainindex] ON [dbo].[ToDeletePoaEntries]

                DROP TABLE todeletepoaentries

                CREATE TABLE todeletepoaentries
                  (
                     objectid UNIQUEIDENTIFIER,
                     otc      INT
                  )

                ----- Insert records to be deleted in ToDeletePoaEntries
                -- go through all user-owned entities which are not replicated and don't support duplicate detection
                DECLARE entity_cursor CURSOR local forward_only read_only FOR
                  SELECT DISTINCT e.objecttypecode,
                                  e.basetablename,
                                  a.physicalname
                  FROM   entityview e
                         INNER JOIN attributeview a
                                 ON e.entityid = a.entityid
                                    AND a.ispkattribute = 1
                  WHERE  e.isreplicated = 0
                         AND e.isduplicatechecksupported = 0
                         AND e.ownershiptypemask & 1 = 1

                OPEN entity_cursor

                SET @totalCollected = 0
                SET @tempRowCountRecreate = 0

                FETCH next FROM entity_cursor INTO @otc, @baseTableName,
                @primaryKey

                WHILE @@FETCH_STATUS = 0
                  BEGIN
                      PRINT 'Cleaning up POA for ' + @baseTableName

                      SET @currentCollected = 0
                      SET @collectstatement = 'insert into ToDeletePoaEntries(ObjectId, Otc)  select distinct poa.ObjectId, poa.ObjectTypeCode  from PrincipalObjectAccess poa left join ' + @baseTableName
                                              + ' e on poa.ObjectId = e.' +
                                              @primaryKey
                                              + ' where e.' + @primaryKey
                                              +
                      ' is null and poa.ObjectTypeCode = '
                                              +
                                              @otc;

                      PRINT @collectstatement

                      EXEC(@collectstatement)

                      SET @tempRowCountRecreate = @@ROWCOUNT
                      SET @currentCollected =
                      @currentCollected + @tempRowCountRecreate

                      PRINT Cast(@currentCollected AS NVARCHAR(20))
                            + ' records collected for ' + @baseTableName

                      SET @totalCollected = @totalCollected + @currentCollected

                      FETCH next FROM entity_cursor INTO @otc, @baseTableName,
                      @primaryKey
                  END

                CLOSE entity_cursor

                DEALLOCATE entity_cursor

                CREATE UNIQUE NONCLUSTERED INDEX [mainindex]
                  ON [dbo].[ToDeletePoaEntries] ( [objectid] ASC, [otc] ASC )
                  WITH (statistics_norecompute = OFF, sort_in_tempdb = OFF,
                ignore_dup_key
                =
                OFF, drop_existing = OFF, online = OFF, allow_row_locks = ON,
                allow_page_locks = ON) ON [PRIMARY]

                PRINT Cast(@totalCollected AS NVARCHAR(20))
                      + ' total records collected'
            END
      END

    -- We are done! Delete the ToDeletePoaTable and Notify the team
    DROP TABLE [dbo].[todeletepoaentries]

    EXECUTE msdb.dbo.Sp_send_dbmail
      @profile_name = 'SQLNotify Profile',
      @recipients = @recipientList,
      @body = 'All POA Records to be deleted have been deleted.',
      @subject = 'POA Cleanup Complete',
      @importance = 'High'

    -- Reset the @retry value to 0, which will allow the code to exit the WHILE loop
    SET @retry = 0
END try

    BEGIN catch
        PRINT 'Retry ' + CONVERT(NVARCHAR(10), @retry)

        SET @retry = @retry + 1
        -- Hit an error - Notify the team
        SET @bodyText = Error_message();
        SET @subjectText = 'POA Cleanup Progress Email - Error '
                           + CONVERT(NVARCHAR(10), @retry);

        EXECUTE msdb.dbo.Sp_send_dbmail
          @profile_name = 'SQLNotify Profile',
          @recipients = @recipientList,
          @body = @bodyText,
          @subject = @subjectText,
          @importance = 'High'

        -- There was an error so let's reindex
        ALTER INDEX ALL ON dbo.principalobjectaccess reorganize;

        UPDATE STATISTICS dbo.principalobjectaccess;
    END catch
END --END OF THE RETRY LOOP
PRINT 'EXECUTION SUCCEED' 

Script 4: Reset the indexes on the POA table to bring them back to their original condition:

USE [Contoso_MSCRM]

go

IF EXISTS (SELECT *
           FROM   sys.sysindexes si
                  INNER JOIN sys.sysobjects so
                          ON si.id = so.id
           WHERE  so.id = Object_id(N'[dbo].[ToDeletePoaEntries]')
                  AND Objectproperty(so.id, N'IsUserTable') = 1
                  AND si.name LIKE '%mainindex%')
  DROP TABLE [dbo].[todeletepoaentries]

go

USE [Contoso_MSCRM]

go

/****** Object:  Index [IX_CleanupIndex]    Script Date: 12/19/2013 16:02:29 ******/
IF EXISTS (SELECT *
           FROM   sys.indexes
           WHERE  object_id = Object_id(N'[dbo].[PrincipalObjectAccess]')
                  AND name = N'IX_CleanupIndex')
  DROP INDEX [IX_CleanupIndex] ON [dbo].[PrincipalObjectAccess] WITH ( online =
  OFF )

go

/****** Object:  Index [cndx_PrincipalObjectAccess]    Script Date: 12/18/2013 08:52:57 ******/
CREATE CLUSTERED INDEX [cndx_PrincipalObjectAccess]
  ON [dbo].[PrincipalObjectAccess] ( [objectid] ASC, [principalid] ASC )
  WITH (pad_index = OFF, statistics_norecompute = OFF, sort_in_tempdb = OFF,
ignore_dup_key = OFF, drop_existing = OFF, online = OFF, allow_row_locks = ON,
allow_page_locks = ON, FILLFACTOR = 100) ON [PRIMARY]

go

/****** Object:  Index [fndx_Sync_VersionNumber]    Script Date: 12/18/2013 08:53:03 ******/
CREATE UNIQUE NONCLUSTERED INDEX [fndx_Sync_VersionNumber]
  ON [dbo].[PrincipalObjectAccess] ( [versionnumber] ASC )
WHERE ([VersionNumber] IS NOT NULL)
WITH (pad_index = OFF, statistics_norecompute = OFF, sort_in_tempdb = OFF,
ignore_dup_key = OFF, drop_existing = OFF, online = OFF, allow_row_locks = ON,
allow_page_locks = ON, FILLFACTOR = 100) ON [PRIMARY]

go

/****** Object:  Index [ndx_Cover]    Script Date: 12/18/2013 08:53:13 ******/
CREATE NONCLUSTERED INDEX [ndx_Cover]
  ON [dbo].[PrincipalObjectAccess] ( [objecttypecode] ASC, [principalid] ASC,
[accessrightsmask] ASC, [inheritedaccessrightsmask] ASC )
  WITH (pad_index = OFF, statistics_norecompute = OFF, sort_in_tempdb = OFF,
ignore_dup_key = OFF, drop_existing = OFF, online = OFF, allow_row_locks = ON,
allow_page_locks = ON, FILLFACTOR = 80) ON [PRIMARY]

go

/****** Object:  Index [ndx_PrimaryKey_PrincipalObjectAccess]    Script Date: 12/18/2013 08:53:21 ******/
ALTER TABLE [dbo].[principalobjectaccess]
  ADD CONSTRAINT [ndx_PrimaryKey_PrincipalObjectAccess] PRIMARY KEY NONCLUSTERED
  ( [principalobjectaccessid] ASC )WITH (pad_index = OFF, statistics_norecompute
  = OFF, sort_in_tempdb = OFF, ignore_dup_key = OFF, online = OFF,
  allow_row_locks = on, allow_page_locks = on, FILLFACTOR = 100) ON [PRIMARY]

go

/****** Object:  Index [UQ_PrincipalObjectAccess]    Script Date: 12/18/2013 08:53:33 ******/
ALTER TABLE [dbo].[principalobjectaccess]
  ADD CONSTRAINT [UQ_PrincipalObjectAccess] UNIQUE NONCLUSTERED ( [principalid]
  ASC, [objectid] ASC, [objecttypecode] ASC )WITH (pad_index = OFF,
  statistics_norecompute = OFF, sort_in_tempdb = OFF, ignore_dup_key = OFF,
  online = OFF, allow_row_locks = on, allow_page_locks = on, FILLFACTOR = 100)
  ON [PRIMARY]

go