It was a long night - one lasting 27 hours, actually. It was the moment before deploy. Your team is counting on you. Your customer is counting on you. You are counting on you. You hit deploy. You wait. You pray. You've cut some corners to make the deadline, but you hope it won't show. And then the unthinkable happens. The Azure role starts to recycle. Now what!?
I tried everything I knew up to that point. I ran the solution locally in the emulator, and it worked. As I found out later, that had a logical explanation as well. I went through the failed IIS request logs, and I went through the Event Logs. Mind you, this was a challenge because our IT department has, for some reason, blocked outgoing ports as well :-). But that's a different story.
The only symptom we could see was "Role is restarting..." in the Azure Management web site. When the IIS logs turned out to be empty, I started to suspect something else was going on. After a lot of searching, I ran into this link: http://blogs.msdn.com/b/kwill/archive/2013/10/03/troubleshooting-scenario-7-role-recycling.aspx.
Following the instructions on the site and links on it, I RDPed to the VM in Azure, and ran the following in a PowerShell window:
md c:\tools; Import-Module bitstransfer; Start-BitsTransfer http://dsazure.blob.core.windows.net/azuretools/AzureTools.exe c:\tools\AzureTools.exe; c:\tools\AzureTools.exe
From there, the rudimentary-looking but very useful AzureTools downloaded and ran. I downloaded the X64 Debuggers, then used the Attach Debugger utility on the Utils tab, which is described like this:
"This is arguably one of the most useful utilities in AzureTools. This will let you attach a debugger to a process which fails and exits immediately on startup. The most common scenario is a role which is recycling and you want to attach WinDBG to the WaIISHost/WaWorkerHost process, but it crashes too quickly for you to manually attach the debugger. In Azure the normal trick of setting the Image File Execution Options debugger registry key doesn’t work and this Attach Debugger utility is the only consistent way I have found to attach a debugger. To use, first download a debugger (ie. double-click the X64 Debuggers And Tools-x64_en-us tool from the Tools tab), enter the process name you are interested in, click Attach Debugger, and wait for the Azure guest agent to automatically start that process again."
If you have never used WinDbg, make sure to read an article or two before doing so. This is a good start. I was really lucky: the process was throwing an exception that never bubbled up to the Azure host, which is why I hadn't seen any information about it.
This is what I got in the WinDbg window:
Microsoft.WindowsAzure.ServiceRuntime Critical: 201 : ModLoad: 00007ff8`8d6d0000 00007ff8`8d7d6000 D:\Windows\Microsoft.NET\Framework64\v4.0.30319\diasymreader.dll
Role entrypoint could not be created:
System.TypeLoadException: Unable to load the role entry point due to the following exceptions:
-- System.IO.FileLoadException: Could not load file or assembly 'Autofac, Version=3.3.0.0, Culture=neutral, PublicKeyToken=17863af14b0044da' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
File name: 'Autofac, Version=3.3.0.0, Culture=neutral, PublicKeyToken=17863af14b0044da'
WRN: Assembly binding logging is turned OFF.
To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1.
Note: There is some performance penalty associated with assembly bind failure logging.
To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].
---> System.Reflection.ReflectionTypeLoadException: Unable to load one or more of the requested types. Retrieve the LoaderExceptions property for more information.
at System.Reflection.RuntimeModule.GetTypes(RuntimeModule module)
at System.Reflection.RuntimeModule.GetTypes()
at System.Reflection.Assembly.GetTypes()
at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.GetRoleEntryPoint(Assembly entryPointAssembly)
--- End of inner exception stack trace ---
at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.GetRoleEntryPoint(Assembly entryPointAssembly)
at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.CreateRoleEntryPoint(RoleType roleTypeEnum)
at Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.InitializeRoleInternal(RoleType roleTypeEnum)
(d10.7e8): CLR exception - code e0434352 (first chance)
(d10.7e8): CLR exception - code e0434352 (first chance)
ModLoad: 00007ff8`a2000000 00007ff8`a20b6000 D:\Windows\SYSTEM32\clbcatq.dll
ntdll!NtTerminateProcess+0xa:
00007ff8`a2150f0a c3 ret
From here on out, things were pretty clear! We weren't using a RoleEntryPoint, and the exception was happening very, very early on (think init inside Global.asax). The underlying issue was an invalid binding redirect in our web.config. It was invalid because we use NAnt to build the web.config file (long story there...), and someone didn't update the redirect versions after updating the NuGet packages. It worked locally because the old DLL was still somewhere in a cache.
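A check along these lines, run as part of the build, could flag a stale redirect before it reaches Azure. This is only a minimal sketch in Python, not something we actually had in place: the function name `find_stale_redirects`, the sample config, and the "deployed" version 3.5.0.0 are all made up for illustration. The only things taken from the real incident are the Autofac assembly name, its public key token, and the 3.3.0.0 version from the exception above. It assumes you can produce a map of assembly name to the version of the DLL you are actually shipping.

```python
import xml.etree.ElementTree as ET

# Namespace used by <assemblyBinding> in .NET Framework config files.
ASM_NS = {"asm": "urn:schemas-microsoft-com:asm.v1"}

# Illustrative config fragment; the redirect still points at the old version.
SAMPLE_CONFIG = """
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Autofac" publicKeyToken="17863af14b0044da" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-3.3.0.0" newVersion="3.3.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
"""


def find_stale_redirects(config_xml, deployed_versions):
    """Return (assembly, redirect_target, deployed_version) for each mismatch.

    deployed_versions maps an assembly name to the version of the DLL that
    will actually be deployed (hypothetical input, gathered however you like).
    Assemblies missing from that map are skipped rather than reported.
    """
    root = ET.fromstring(config_xml)
    stale = []
    for dep in root.iterfind(".//asm:dependentAssembly", ASM_NS):
        identity = dep.find("asm:assemblyIdentity", ASM_NS)
        redirect = dep.find("asm:bindingRedirect", ASM_NS)
        if identity is None or redirect is None:
            continue
        name = identity.get("name")
        target = redirect.get("newVersion")
        deployed = deployed_versions.get(name)
        if deployed is not None and deployed != target:
            stale.append((name, target, deployed))
    return stale


if __name__ == "__main__":
    # 3.5.0.0 is a made-up "deployed" version for the sake of the example.
    for name, target, deployed in find_stale_redirects(
        SAMPLE_CONFIG, {"Autofac": "3.5.0.0"}
    ):
        print(f"{name}: redirect targets {target}, but {deployed} is deployed")
```

Failing the build when the returned list is non-empty would turn a recycling role into an ordinary build error, which is a lot easier to diagnose at 3 AM.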
I haven't been able to reproduce the issue properly just yet, but I will try again soon. When I succeed, I will update this post. Until then, I hope the links I found can help someone struggling with the Role Recycle problem.