I recently upgraded TFS 2012 to TFS 2013 Update 4 in production. The upgrade went very smoothly until the next day, when users began to come online. As traffic ramped up, CPU utilization on the application tier servers started to bounce up and down, causing significant user impact. Looking at the event log, I could see that IIS was restarting the application domain over and over because it detected a change in the file system. Initially, we thought the cause was McAfee. We validated the exclusions and even added some more, without success. McAfee was a reasonable place to start, but, as I suspected, it was not the root cause.
Here is what the event log on both app tier servers revealed:
The application is beginning to shutdown. The application is being shutdown for the following reason: ConfigurationChange
These events continued and became more frequent as load on the servers increased, effectively making the situation worse.
I eventually opened a support call with MS. The engineer said they had seen this quite a few times over the last several months. I found a few users via Google who had similar issues, but no recommended fix. After we talked it through, the MS engineer suggested moving the TFS cache directory out from under the Web Services folder. I was desperate at this point and open to anything, so I moved it up one folder, out from under Web Services. After making the move through the TFS Admin Console and restarting IIS, the issue immediately went away. It has now been over an hour since the change, with no issues so far.
This appears to be the recommended fix for the issue at this point in time. I do not fully understand why MS defaults the cache folder to a location underneath the TFS web code in IIS. Moving the cache out of that folder also takes it out of the scope of the file system watchers that IIS applies to the web application's directory.
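If you manage several application tier servers, it can help to sanity-check that the configured cache path really sits outside the Web Services folder before and after the change. The snippet below is a minimal sketch of that check only; it does not read or modify any TFS configuration (the actual move was done through the TFS Admin Console). The folder paths and the `is_inside` helper are hypothetical examples, so substitute the values from your own servers.

```python
from pathlib import PureWindowsPath


def is_inside(child: str, parent: str) -> bool:
    """Return True if `child` is `parent` itself or any folder nested beneath it.

    PureWindowsPath compares case-insensitively and drops trailing
    backslashes, which matches how Windows treats these paths. The check is
    done per path component, so "Web Services2" is not treated as being
    inside "Web Services".
    """
    child_path = PureWindowsPath(child)
    parent_path = PureWindowsPath(parent)
    return child_path == parent_path or parent_path in child_path.parents


# Hypothetical example paths; replace them with the values from your own server.
WEB_ROOT = r"C:\Program Files\Microsoft Team Foundation Server 12.0\Application Tier\Web Services"
OLD_CACHE = WEB_ROOT + r"\_tfs_data"
NEW_CACHE = r"C:\Program Files\Microsoft Team Foundation Server 12.0\Application Tier\_tfs_data"

for label, cache in (("old", OLD_CACHE), ("new", NEW_CACHE)):
    status = "UNDER the web root" if is_inside(cache, WEB_ROOT) else "outside the web root"
    print(f"{label} cache location is {status}: {cache}")
```

Comparing path components rather than doing a plain string prefix match avoids false positives on sibling folders whose names happen to start with the same text.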