Friday, September 4, 2015

Exchange Team Blog: Introducing the Microsoft Office 365 Hybrid Configuration Wizard

Running Exchange 2013 CU8 or higher? Download the new wizard!

The Exchange hybrid team has been working hard over the past year getting the third version of the Hybrid Configuration Wizard (HCW) ready. This new version is called the Microsoft Office 365 Hybrid Configuration Wizard. This article tells you what’s new and shows you how to run the wizard. We also explain the various issues that have been addressed with the new wizard, and touch on some of the telemetry we collect with every run of the wizard. We think this new wizard has enough of the old to reduce the learning curve while adding plenty of enhancements to make your hybrid deployment as friction-free as possible.

Microsoft Office 365 Hybrid Configuration Wizard Stand-Alone Application

This version of the HCW is a standalone application that is downloaded from the service. This is an important change because one of the bigger limitations of the previous versions of the HCW was that it was included with the on-premises product. This led to the following issues:

  • Up-To-Date hybrid experience: When you ran the HCW you got the experience consistent with your on-premises version of Exchange Server. This meant that if you ran Exchange 2013 CU7 you got the CU7 experience, and if you ran Exchange 2013 CU9 you got the CU9 experience in the HCW. Each customer could have a different HCW experience.

Solution: The new HCW will download the latest version every time it is run, therefore providing the latest and improved experience. As soon as we make changes to, or fix any issues in the HCW, customers will see the benefits immediately.

  • HCW not tied to Cumulative Updates: Since the previous versions of the HCW were part of the on-premises product, they were updated per the regular Exchange serviceability model. This means that the hybrid team had to wait for a new Cumulative Update (Rollup for Exchange 2010) every three months to deliver any enhancements or changes. For a component like hybrid that is a problem: we have to be agile enough to handle changes not just on-premises, but also in the service.

Solution: Again, every time you attempt to run the HCW we will ensure you have the latest version. This version will of course go through its rounds of validation, but it is in no way tied to the releases of a CU. No more waiting months for fixes!

  • Piloting Changes: As we move forward with this new HCW we will be making some aggressive changes. In the months ahead we want to add more capabilities to HCW. One of the most important changes in HCW will be the ability to roll out feature changes slowly and in a controlled manner.

Solution: We have built in the capability to allow customers who are on “First Release”, and any other customers we specify (for example TAP customers), to see the latest version of the HCW. Often the latest release and the production release will be the same version, but we do have the ability to pilot versions of the HCW as needed.

Improvements to error handling

The HCW has a lot of dependencies and relies on various prerequisites for a successful completion. For example, you have to add an external TXT record for the HCW to create the Federation Trust, you have to have your certificates properly installed on your Exchange servers, and you have to have Internet access from your Exchange servers, to name a few. I am not trying to scare you away from hybrid; in fact, the wizard walks you through most of the prerequisites. I am instead trying to point out that there are many failure points for the HCW to contend with.

Until now the solution was to provide you an error message that included a stack trace. These error messages are extremely difficult to decipher, and often the first reaction after a couple of failed Internet searches was to call support. Figure 1 shows the old error experience for those who may not be familiar with it.

image
Figure 1: Old error messages

Our goal is to allow you to successfully configure without an error, but we also want to make sure that we give you the information needed to get past any hurdles you may face. Figure 2 shows a sample of the new (much more informative) error experience. In the sample you can see the following major improvements to the error experience:

  • Improved Title: We have added the ability to see what Phase and Task were being completed at the time of the failure. For instance, you can know if we failed at the prerequisite check or configuration phase. You will also immediately know if you failed to create the Organization Relationship or Outbound Connectors.
  • Error code: We have added a new error code for all the possible error messages in the new wizard. You will now see all errors prepended with a code HCW8***. This change allows for our errors to be easily searched and it allows them to remain searchable even if we change the context of the errors.
  • Humans can read the errors: One of the previous challenges was that we provided a stack trace as the error message instead of just a friendly actionable string. We now keep the stack trace in the logs for anyone who may want that information.
  • New “More info” feature: We added the “More info…” option under the error message. We have associated a KB or TechNet article, as the most likely solution, with EVERY error message the HCW throws. Simply click the “More info…” link and you will be taken to that solution.
  • Access to log files: You can easily access the HCW log file by clicking on the link that says “Open Log File”. In addition, you will find the log file on the system where you ran the new wizard by going to “%appdata%\Microsoft\Exchange Hybrid Configuration”. Keep in mind that the old location for the logs in the Exchange install directory is no longer used.
  • Coolest addition: When you run the HCW you will more than likely have the Exchange Admin Center already open, but there is a chance that if you run into an issue you will need to use either your on-premises or Exchange Online PowerShell. The new HCW error experience includes a link that will open the on-premises and/or Exchange Online PowerShell. We already have the credentials you entered into the wizard, so you can seamlessly open PowerShell by using those credentials. In addition, we open the Exchange Online PowerShell with a blue background and the Exchange on-premises PowerShell with a black background so you can easily differentiate the two.

image
Figure 2: Awesome error experience
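If you want to collect the HCW logs from a script, the folder mentioned above can be resolved from the APPDATA environment variable. The following is a minimal sketch in Python; the function name `hcw_log_dir` is made up for illustration, and only the folder name comes from this article.

```python
import ntpath

# Folder name quoted in the article; everything else here is illustrative.
HCW_LOG_SUBDIR = r"Microsoft\Exchange Hybrid Configuration"


def hcw_log_dir(env):
    """Return the HCW log folder for the given environment mapping.

    `env` is any mapping with an APPDATA entry, e.g. os.environ on Windows.
    """
    appdata = env.get("APPDATA")
    if not appdata:
        raise KeyError("APPDATA is not set; this only makes sense on Windows")
    # ntpath joins with backslashes regardless of the host platform.
    return ntpath.join(appdata, HCW_LOG_SUBDIR)
```

On the machine where you ran the wizard, you would call `hcw_log_dir(os.environ)` after `import os`.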

Top issues solved by the new HCW

About a year ago we came out with a tool to help you troubleshoot your hybrid experience. This tool collects and parses the HCW log and provides a link to an article that gives a solution to your issue. The tool has been run thousands of times and has given us great insights into what the top failure points are for the HCW. This telemetry tells us what we need to focus on and allows us to see any failure trends, but in the end we were limited to the information gathered from folks who ran the HCW troubleshooter.

Because we want to be as helpful as possible, we now by default upload the HCW logs to the service when you run the new wizard. Gathering this data will allow us to serve you better by limiting the amount of time it takes for someone in support to find out more about your environment and it allows us to see any trending issues and failure points that we need to address. Even with the limited amount of logs we have collected from the troubleshooter, we have been able to identify the following issues and are addressing them in the new HCW. I think you will see why the log collection is so important to the hybrid team.

Note: If you want to opt out of uploading the hybrid logs, you can do that by setting the registry value below on the machine where you are running the HCW:
1. Navigate to the following location in the registry, create the path if needed:
Exchange 2016: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v16\Update-HybridConfiguration
Exchange 2013: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v15\Update-HybridConfiguration
2. Create REG_DWORD “DisableUploadLogs” with value 1
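If you need to apply this opt-out on several machines, the two steps can be scripted. Here is a minimal sketch using Python's standard `winreg` module, assuming it runs elevated on the Windows machine in question. The helper names `opt_out_subkey` and `disable_log_upload` are made up for illustration; the key paths and the value name are the ones listed above.

```python
# Registry locations quoted in the article, keyed by Exchange version.
HYBRID_KEYS = {
    "2016": r"SOFTWARE\Microsoft\ExchangeServer\v16\Update-HybridConfiguration",
    "2013": r"SOFTWARE\Microsoft\ExchangeServer\v15\Update-HybridConfiguration",
}


def opt_out_subkey(exchange_version):
    """Return the HKLM subkey for the given Exchange version ("2013" or "2016")."""
    try:
        return HYBRID_KEYS[exchange_version]
    except KeyError:
        raise ValueError(f"unsupported Exchange version: {exchange_version!r}") from None


def disable_log_upload(exchange_version):
    """Create the key if needed and set DisableUploadLogs = 1 (Windows only)."""
    import winreg  # standard library, available only on Windows

    subkey = opt_out_subkey(exchange_version)
    # CreateKeyEx opens the key, creating the path if it does not exist.
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, subkey, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "DisableUploadLogs", 0, winreg.REG_DWORD, 1)
```

Calling `disable_log_upload("2013")` would then cover steps 1 and 2 for an Exchange 2013 machine.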

TXT proof string issues

Any time you are required to add a DNS entry you are dealing with a potential for failure. The HCW includes a step where you need to add a record to an external DNS to prove to the Azure Authorization Service (known as Microsoft Federation Gateway) that you own the domain. This step may seem trivial, but it accounts for ~15% of our HCW failures.

Usually the TXT proof string gets messed up in one of two ways:

  • Incorrect string entered: When creating the DNS record to prove domain ownership, we often see that the incorrect value was provided. This is in large part due to the way the HCW copied the value. In the previous version of the HCW, when you copied the TXT string, we prepended the words “Domain Proof” so it looked similar to “Domain Proof = t4jnhkjdesy78hrn…”.

Solution: The fix is simple. Moving forward, the “copy link” option in the HCW copies only the part of the string that is needed, which should lead to fewer issues with incorrect TXT strings.

  • Domain name lockouts: The point of providing this TXT string to the external DNS is so the service can validate that you own the domain and federation certificate. After a few failed attempts to validate a domain we lock you out from federating that domain for a few hours. The purpose of this lockout is to prevent a denial of service attack. Often this issue occurs because someone put the wrong value in DNS (see the first bullet), someone created the record and did not wait for replication of the record, or someone created the record in internal (not external) DNS.

Solution: To resolve this we created a new external endpoint in the service that will perform the DNS lookup for the TXT record and only try to federate the domain if the record is correct or if that new service endpoint cannot be found. The logic for this is as follows:

  1. First we try to hit the new external service endpoint and see if the TXT record is resolvable externally and is correct in DNS. If so, we move forward with federating the domain.
  2. If the record is either wrong or not resolvable, we inform you that you need to verify the record and wait for replication.
  3. If the new external TXT validation service is not reachable, we will warn you that we could not verify the TXT record but allow you to continue anyway.
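The three steps above boil down to a small decision table. The sketch below models it in Python under stated assumptions: the `lookup` callable stands in for the external validation endpoint (its real interface is not described here), and the names `TxtCheck`, `check_txt` and `next_step` are made up for illustration.

```python
from enum import Enum


class TxtCheck(Enum):
    CORRECT = "correct"          # record resolves externally and matches
    WRONG_OR_MISSING = "wrong"   # record is wrong or not resolvable
    UNREACHABLE = "unreachable"  # validation endpoint could not be reached


def check_txt(domain, expected_proof, lookup):
    """Classify the TXT record using a caller-supplied lookup function.

    `lookup(domain)` is a hypothetical stand-in for the external endpoint:
    it returns the TXT strings it found, or raises ConnectionError when the
    endpoint itself cannot be reached.
    """
    try:
        records = lookup(domain)
    except ConnectionError:
        return TxtCheck.UNREACHABLE
    if expected_proof in records:
        return TxtCheck.CORRECT
    return TxtCheck.WRONG_OR_MISSING


def next_step(check):
    """Map the result of the external TXT check to what the wizard does next."""
    if check is TxtCheck.CORRECT:
        return "federate"
    if check is TxtCheck.WRONG_OR_MISSING:
        return "ask_to_verify_and_wait"
    # Endpoint unreachable: warn, but let the admin continue anyway.
    return "warn_and_allow_continue"
```

The important design point is the third branch: an unreachable validation service produces a warning rather than a hard stop, so the wizard never attempts federation with a record it already knows to be wrong, but also never blocks you because of a problem on the service side.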

Figure 3 shows the new TXT experience you will be getting with the Microsoft Office 365 Hybrid Configuration Wizard.

image
Figure 3: TXT records

Missing Certificate in Wizard

The HCW has a screen that asks you for the “Transport Certificate”. The HCW looks to ensure this certificate is installed on every server that you designated to be part of the Send and Receive Connector Configuration, as shown on the pages in Figure 4.

image
Figure 4: Send and Receive Connector

In order for the certificate to properly display you need to ensure that the following has been completed on all of the servers designated in the wizard pages shown in figure 4:

  • The Certificate must be a third party trusted certificate.
  • The proper names must be on the certificate, such as mail.contoso.com or *.contoso.com.
  • The SMTP service must be assigned to the certificate on each of the sending and receiving servers.
  • The certificates must have a private key.

These requirements are nothing new, but if you have a large environment, getting all of this correct on a large number of servers can be a tough task. If even one server was missing any of the requirements, we would fail to show you the certificate. In previous versions of the HCW you were left with a blank screen (see figure 5) which offered no direction or solution.

image
Figure 5: Blank certificate

The Microsoft Office 365 Hybrid Configuration Wizard experience will not remove the certificate requirements, but it will help you solve the issue. The HCW will now show you a list of certificates that meet the requirements, and it will show you the servers that do not have a proper certificate installed (see figure 6). This will allow you to either remove those servers from the HCW receive and send connector pages, or properly install the certificate on those servers.

image
Figure 6: Better certificate error
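To make the new behavior concrete, here is a minimal Python sketch of the kind of check described above: given the certificates found on each selected server, report which certificates are usable and which servers need attention. This is an illustration, not the wizard's actual code. The `Cert` fields mirror the four requirements listed earlier, and the rule for combining servers (a certificate is offered if it is valid on every server that has at least one valid certificate) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    thumbprint: str
    third_party_trusted: bool
    names_ok: bool        # e.g. mail.contoso.com or *.contoso.com present
    smtp_assigned: bool   # SMTP service assigned to the certificate
    has_private_key: bool


def meets_requirements(cert):
    """True when the certificate satisfies all four requirements."""
    return (cert.third_party_trusted and cert.names_ok
            and cert.smtp_assigned and cert.has_private_key)


def summarize(certs_by_server):
    """Return (usable thumbprints, servers with no valid certificate)."""
    valid = {
        server: {c.thumbprint for c in certs if meets_requirements(c)}
        for server, certs in certs_by_server.items()
    }
    flagged = sorted(server for server, prints in valid.items() if not prints)
    covered = [prints for prints in valid.values() if prints]
    # Assumption: offer certificates valid on every server that has any.
    common = set.intersection(*covered) if covered else set()
    return sorted(common), flagged
```

With one server holding a good certificate and another holding the same certificate without a private key, `summarize` returns the certificate as usable and names the second server as the one to fix or remove.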

A more efficient Hybrid experience

One of the things we tried to do with the HCW is ensure that we perform the various configurations in the most efficient way possible (this is our ongoing green effort). A good example of an inefficient task that the HCW previously performed was the Mailbox Replication Service (MRS) enablement process. In the HCW logs collected from the troubleshooter, we could see that this step often took an extremely long time to complete. What we do now is enable the migration endpoint on the servers in your environment, so that you can start moving mailboxes when the HCW is complete without having to enable the endpoint yourself. One of the cmdlets that we used in the previous version of the HCW was Get-WebServicesVirtualDirectory. In a larger, often geographically dispersed environment, this cmdlet could take over eight hours to run. In many cases you would end up getting the following error:

ERROR: Updating hybrid configuration failed with error 'Subtask Configure execution failed: Configuring organization relationship settings. Execution of the Set-WebServicesVirtualDirectory cmdlet had thrown an exception. This may indicate invalid parameters in your Hybrid Configuration settings. Unable to access the configuration system on the remote server. Make sure that the remote server allows remote configuration

Solution: We have resolved this issue in the new HCW by using the -ADPropertiesOnly option with Get-WebServicesVirtualDirectory. With this option the HCW reads the MRS settings using a local directory call instead of waiting for a response from every server in the environment. This change, along with a few others in this area, makes the process take around 15 minutes instead of 8 hours in these large environments (your deployment times will vary). This is just one example of the type of cleanup we did in the HCW to improve the reliability and speed of the configuration tasks.

Autodiscover issues in HCW

The single most common failure point for the HCW is the inability to retrieve the Federation Information via the Autodiscover call initiated by the Get-FederationInformation cmdlet. The output of this cmdlet is needed in order to create the Organization Relationships so you can do things like free busy sharing. This accounts for nearly 30% of all HCW failures based on the logs collected from the troubleshooter (are you starting to see the importance of these log files?). When looking at the issue there are certain things the wizard cannot directly address. For instance, at times the issues are related to an improperly configured firewall, or someone doesn’t have a third-party certificate for IIS on the Exchange servers. However, a good portion of you have had things configured correctly and still we failed to complete the Get-FederationInformation cmdlet.

One of the things this cmdlet does is use DNS settings from the server you are connected to in order to resolve the Autodiscover endpoint and retrieve the federation information. Many customers do not have a DNS record created for Autodiscover internally, since there is often no need for one. The internal Outlook client uses the Service Connection Point to find the Autodiscover endpoint, so there is no need for the record from an Outlook standpoint. However, the Get-FederationInformation cmdlet does not use the Service Connection Point. Therefore, if there is no forwarding configured for this zone in DNS, the Get-FederationInformation cmdlet will be unable to resolve the Autodiscover endpoint and the HCW will fail.

Solution: To resolve this issue, we have added a new method for checking for the federation information. We still try to use local DNS first and if it fails we then will try to hit an external service to see if we can get the federation information externally. This will ensure that if you have Autodiscover published properly externally the HCW will complete as expected. See figure 7 for details:

image
Figure 7: Get-FedInfo
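The "local first, external second" approach described above is a plain fallback pattern. Here is a minimal Python sketch of it. Both lookup callables and the `AutodiscoverError` exception are hypothetical placeholders; they do not correspond to a real API.

```python
class AutodiscoverError(Exception):
    """Raised by a lookup that cannot retrieve federation information."""


def get_federation_info(domain, local_lookup, external_lookup):
    """Try local DNS first; fall back to the external service on failure.

    Returns a tuple of (federation info, which path succeeded). If the
    external lookup also fails, its exception propagates to the caller.
    """
    try:
        return local_lookup(domain), "local DNS"
    except AutodiscoverError:
        return external_lookup(domain), "external service"
```

Note that the external path only rescues the run if Autodiscover is published properly externally; when both paths fail, the error still surfaces.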

OAuth Integration

Another common failure point is the OAuth portion of the HCW. The HCW today shows you an option to configure OAuth if you are Exchange 2013 native, but not if you coexist with previous versions of Exchange. OAuth is required for some features today, such as cross-premises discovery and automatic archive retention. Because of that, we want to ensure that OAuth is configured by default so that all of the hybrid features work when you complete the HCW.

One downside to this is that the OAuth configuration experience previously had a high rate of failure. We have gone through and fixed a good portion of the experience, and we have also added logic to the new HCW so that if the OAuth portion fails, we disable the OAuth configuration by disabling the IntraOrganizationConnector, let you know we disabled it, and give you remediation steps. This ensures that a failed OAuth configuration does not prevent other hybrid features, such as cross-premises free/busy, from working.
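In other words, OAuth is attempted, and a failure is contained rather than fatal. The following minimal Python sketch shows that shape; `configure` and `disable_connector` are hypothetical callables standing in for the real configuration steps.

```python
def configure_oauth(configure, disable_connector):
    """Attempt OAuth setup; on failure, disable the connector and carry on.

    Returns (oauth_enabled, message) so the caller can tell the admin
    what happened instead of aborting the whole run.
    """
    try:
        configure()
    except Exception as exc:
        # Contain the failure so other hybrid features keep working.
        disable_connector()
        return False, f"OAuth failed ({exc}); IntraOrganizationConnector disabled"
    return True, "OAuth configured"
```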

Many more…

The above are just a few of the issues that have been addressed with the latest version of the HCW. There are many other examples that we could have used, such as a couple of issues we addressed with mail flow, multi-forest deployments, and many more. In this latest version we strove for feature parity while improving the failure rate and allowing for future innovation. We think we have hit the mark.

Running the HCW

Now that we have covered some of the new features and benefits of running the Microsoft Office 365 Hybrid Configuration Wizard, let’s take a guided tour. We are not going to go through each option in depth, as most of them have not changed from Exchange 2013.

How to find the new HCW

We have not moved the location of the HCW in the Exchange Admin Center, and the look and feel of the entry point is consistent with previous versions of the Exchange 2013 HCW. The only difference is that instead of calling local code when you click “configure” or “modify” in the hybrid node of EAC, we now launch the ClickOnce application. Figure 8 shows the entry point.

image
Figure 8: Entry Point

HCW Landing Page

The next screen you will see is the HCW landing page, which is a page that serves two purposes. The first and most important purpose is that we can redirect a small subset of customers (based on pre-defined criteria) to an alternate HCW experience. As discussed previously in this blog, this allows us to pilot new features without affecting the production HCW experience. The second benefit of this landing page is that it allows us to provide a proper error message if the browser version, popup blockers, etc. are not configured in a way that would support the HCW. When you are on the landing page you will select the “click here” option to download the HCW. See figure 9 for a view of the landing page.

image
Figure 9: Landing page

Welcome Screen

The Welcome screen (see figure 10) provides a link that explains what a hybrid configuration is, along with a second link that explains what the HCW application is going to do. The second link is at the bottom-left of the screen and says “What does this application do?” On this screen you simply click next to continue.

image
Figure 10: Welcome screen

Server Detection Page

The next screen allows you to choose which server you will use to perform your hybrid configuration. This is the machine that the HCW will remote PowerShell into in order to perform all of the hybrid configuration tasks.

The selected server must be running a version of Exchange that is within two releases of our currently released Cumulative update. This means at launch the new HCW will work if you are connecting to an Exchange 2013 CU8 or newer version of Exchange. However, when Exchange 2013 CU11 releases you will see that we will no longer allow you to run the new HCW from Exchange 2013 CU8 and will require a minimum of CU9. Keep in mind that even though the HCW will allow you to proceed if you are two versions older than the current release (n-2), we actually only support going one version back for Hybrid (n-1).

If you select a server that is running an unsupported version, the HCW will provide you with an error stating that you are not running a supported version. In addition, the HCW will provide you with a list of servers that are running a supported version (if any exist).

image
Figure 11: Unsupported version
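The n-2 and n-1 rules described above are easy to mix up, so here is a minimal Python sketch of them. It is a simplification that treats Exchange 2013 Cumulative Updates as plain integers; the function name `classify` and the three labels are made up for illustration.

```python
def classify(server_cu, current_cu):
    """Classify a server's CU against the currently released CU.

    "supported": within one release (n-1), which is what we support for hybrid
    "allowed":   two releases back (n-2), the wizard will still run
    "blocked":   older than n-2, the wizard shows the unsupported-version error
    """
    if server_cu >= current_cu - 1:
        return "supported"
    if server_cu >= current_cu - 2:
        return "allowed"
    return "blocked"
```

For example, a CU8 server is classified as allowed while CU10 is the current release, and becomes blocked once CU11 ships, at which point CU9 is the minimum.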

The HCW will try to select the best server to perform the configuration tasks from using the following logic:

  1. First we look to see if the server we are on is running the latest supported version of Exchange in the organization.
  2. Next we look to see if there is an existing Exchange server in the site running the latest supported version of Exchange.
  3. Finally, we attempt to connect to an out of site Exchange server (typically in a different geographical location) running the latest supported version.
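The detection order above can be sketched as a three-step search. This is an illustrative Python model under stated assumptions: every server passed in is running a supported version, versions are simplified to a single CU number, and the `Server` type and `pick_server` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    site: str
    cu: int  # simplified version number, assumed to be supported


def pick_server(local, others):
    """Pick the server to configure from, preferring the closest one."""
    candidates = [local, *others]
    latest = max(server.cu for server in candidates)

    # 1. The server we are on, if it runs the latest version.
    if local.cu == latest:
        return local
    # 2. Another server in the same site running the latest version.
    for server in others:
        if server.site == local.site and server.cu == latest:
            return server
    # 3. An out-of-site server running the latest version.
    for server in others:
        if server.cu == latest:
            return server
    raise AssertionError("unreachable: some candidate always has the latest CU")
```

Preferring the local server and then the local site keeps the remote PowerShell session close to where you are running the wizard, and only reaches across sites when nothing nearer is up to date.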

If you do not like the server selection the HCW made via the above mentioned detection logic you can manually specify the server name that you want to connect to. You can use the short name (ServerName) or the long name (ServerName.Contoso.com) in the provided box to select the appropriate server running the supported version of Exchange.

The last option on this page allows you to select the tenant location. For most the tenant location is simply “Microsoft Office 365” but if your Office 365 is operated by 21 Vianet, you can also use the “21 Vianet” option.

image
Figure 12: Server detection

Credentials page

The main improvement on this page is the fact that we do not force you to type in your on-premises credentials. However, if you are not signed in as the user with the Organization Management Role you can manually override this behavior and provide separate credentials.

image
Figure 13: Credentials page

Connection Status page

We will then show you the connection status window, which will let you know if improper credentials were provided on the previous step. Usually this is a pretty uneventful window and you just click next.

image
Figure 14: Connection status

Mail flow options page

The rest of the questions in the HCW from this page on are related to the mail flow options. The experience and windows you see from this point forward may vary depending on the options selected. For more information on the mail flow options you have please review this article.

image
Figure 15: Mail flow options

Receive and Send Connector Configuration

This page of the wizard allows you to select the Exchange 2013 and/or Exchange 2016 servers that you intend on configuring for sending and receiving mail for your on-premises environment. You can have a mix of 2013 and 2016 servers selected. We do not allow you to choose Exchange 2010 servers from these menus.

image
Figure 16: Receive Connector

image
Figure 17: Send Connector

Certificate selection page

We described the enhancements to this certificate selection page earlier in this post, where we covered the experience you will get if a valid certificate cannot be found on any one of the sending and receiving servers selected on the previous pages (figure 16 and figure 17). The certificate page shown here is what you should expect to see when the certificates are installed properly on all servers. In this case you will get a list of certificates that meet all of the requirements and are installed on all of the selected servers. In most cases the list includes only one certificate.

image
Figure 18: Certificate

FQDN for Mail Flow

The final question in the wizard will allow the HCW to properly configure the smart host settings on the outbound connector in Exchange Online. You will usually provide the FQDN that matches your MX record in this window.

image
Figure 19: FQDN

Update page

Up to this point in the HCW, no modifications have been made to your on-premises or Exchange Online environment. When you select the update option on this page, we start making modifications based on the answers you provided on the previous screens. Similar to the old version of the HCW, we store those answers in the local Active Directory in a configuration object known as your desired state. We then read from that configuration object to make the modifications.

image
Figure 20: Update

Wrapping this up

The Exchange hybrid configuration process is something that has evolved rapidly over the past few years. We have done a lot over that time to simplify these complex configurations. With this latest version we have continued that trend by adding flexibility for innovation, more HCW stability, better HCW performance, a cleaner configuration experience, and (if needed) a proper error experience. However, our tools and services are built for you, so let us know what you think. When you try out the wizard, send us feedback through the feedback widget in the HCW. Just look for the “give feedback” link on the bottom of the page in the wizard and please rate the experience.

The Exchange Hybrid Team


