Channel: Clustering and High-Availability

Cluster Resources


Hi Cluster Fans,

 

Resources can be hard to find, so every few months we will be publishing an updated list of useful documents, guides and information to this blog (http://blogs.msdn.com/clustering/). 

 

If there is any other useful Microsoft content that you feel is missing, let us know by clicking the 'Email' link in the upper-right corner of the page and sending us the resource name and URL.

 

Thanks,
Symon Perriman
Program Manager
Clustering & HA

Microsoft

 

Useful Sources

·         Blog: Cluster Team: http://blogs.msdn.com/clustering/

·         Blog: Ask Core: Clustering

·         Training: Course 6423a: Implementing and Managing WS08 Clustering

·         Website: Cluster Technical Resources

·         Website: Cluster Information Portal

·         Website: Clustering Forum (2008)

·         Website: Clustering Forum (2008 R2)

·         Website: Clustering Newsgroup

 

 

Windows Server 2008 R2

·         Blog Guide: Deploying Cluster Shared Volumes (CSV)

·         Blog Guide: Cluster Shared Volumes (CSV): Disk Ownership

·         Blog Guide: PowerShell for Network Load Balancing (NLB) in Windows Server 2008 R2

·         Blog Guide:  PowerShell for Failover Clustering in Windows Server 2008 R2

·         Blog Guide: Live Migration Traffic

·         Blog Guide: How to manually defrag or ChkDisk a CSV disk

·         TechNet Guide: Using Live Migration in Windows Server 2008 R2

·         TechNet Guide: What’s new in R2 Clustering

·         Webcast: Innovating High Availability with Cluster Shared Volumes (CSV)

·         Webcast: Failover Clustering Feature Roadmap in WS08 R2

·         Whitepaper: Windows Server 2008 R2 & Microsoft Hyper-V Server 2008 R2 - Hyper-V Live Migration Overview & Architecture

·         Webcast: Windows Server 2008 R2 Live Migration

·         Webcast: Clustering in a Virtual World

·         Website: Clustering Forum (2008 R2)

 

 

Architecture

·         Blog Guide: Cluster Virtual Adapter (NetFT)

·         Blog Guide: PlumbAllCrossSubnetRoutes

·         Whitepaper: Failover Cluster Architecture Overview

 

 

Core

·         Guide: Server Core

·         TechNet: Installation

·         Utility: Remote Server Administration Tools (simplifies Server Core configurations)

·         Webcast: How Microsoft does IT: Enhancing High Availability with Server Core in Windows Server 2008

 

 

Deployment / Migration / Upgrade

·         Blog Guide: Migration Options for Hardware

·         Blog Guide: PrintBRM Error 0x80070043 workaround

·         Blog Guide: DHCP Database migration to Windows Server 2008

·         Blog Guide: PrintBRM.exe, 0x80070043 and Print Clusters - A Workaround

·         KB Guide: Exchange 2003: Move Mailbox 

·         KB Article: SQL Server 2008 Rolling Upgrades

·         KB Article: Cluster Nodes as Domain Controllers (DCs)

·         MSDN Guide: SQL Upgrade Paths

·         MSDN Guide: SQL Cluster Upgrade

·         TechNet Guide: Migrating Cluster Settings

·         TechNet Guide: Failover Clustering Deployment

·         TechNet Guide: Validating Hardware for a Failover Cluster

·         TechNet Guide: Installing a Failover Cluster

·         TechNet Guide: Creating a Failover Cluster

·         TechNet Guide: Cluster Requirements

·         TechNet Guide: Validating a cluster

o   Blog Guide: Validation Warning: Teredo

o   Blog Guide: Validation Warning: Patch GUID

 

·         TechNet Guide: Configuring Accounts in Active Directory

·         TechNet Guide: Recommended Clustering Hotfixes (2003)

·         TechNet Guide: Recommended Clustering Hotfixes (2003 SP2)

·         TechNet Guide: Recommended Clustering Hotfixes (2008)

·         TechNet Guide: Exchange 2007 Overview

·         TechNet Guide: Exchange 2007 Cmdlets

·         TechNet Guide: Print Migration Overview

·         TechNet Guide: UI: Print Migration Tool/Wizard

·         Utility: File Server Migration Toolkit (FSMT) (2008)

 

Exchange Server

·         Lab: TechNet Virtual Lab: Exchange Server 2007 Standby Continuous Replication

·         Lab: TechNet Virtual Lab: Using Cluster Continuous Replication (CCR) in Exchange 2007

·         TechNet: Installing Cluster Continuous Replication (CCR) on 2008

·         TechNet: Deploying Exchange 2003 in a Cluster

·         TechNet: Planning for Cluster Continuous Replication (CCR)

·         TechNet: Installing CCR on Windows Server 2008

·         TechNet: How to create an Exchange SCC Failover Cluster with CMD

·         Webcast: How Microsoft IT Implemented New Storage Designs for Exchange Server 2007

·         Webcast: Exchange 2007 High Availability Deep Dive

 

 

File Server

·         Blog Guide: File Share ‘Scoping’ in Windows Server 2008 Failover Clusters

·         Blog Guide: Share Subdirectories in Windows Server 2008

·         TechNet Guide: Configuring a Two-Node File Server Failover Cluster

·         TechNet Guide: Creating a Clustered File Server checklist

·         TechNet Guide: Create a Shared Folder in a Clustered File Server

·         Webcast: TechNet Webcast: Prepare Yourself for Windows Server 2008 (Part 5 of 8): New File Server Features

·         Webcast: How Microsoft IT Deploys Windows 2008 Clusters for File Services

·         Webcast: New File Server Features of Windows Server 2008 (Level 200)

 

Hyper-V

·         Blog Guide: Deploying a HA Virtual Machine (2008)

·         Blog Guide: HA Virtual Machine Deployment Considerations (2008)

·         Blog Guide: Network Load Balancing (NLB) and Virtual Machines

·         Blog Guide: Adding a Pass-Through Disk to a HA VM

·         Blog Guide: SCVMM: Intelligent Placement

·         Blog Guide: Monitor Network Traffic for a VM on a Cluster

·         TechNet Case Study: How Microsoft IT Designs the Virtualization Host & Network Infrastructure

·         TechNet Case Study: Best Practices for Deploying VMs using Hyper-V

·         TechNet Guide: Getting Started with Hyper-V

·         TechNet Guide: High-Availability for a Server Running Hyper-V

·         TechNet Guide: Design for a Failover Cluster in Which All Nodes Run Hyper-V

·         TechNet Guide: Requirements and Recommendations for Failover Clusters in Which All Nodes Run Hyper-V

·         TechNet Guide: Failover Cluster in which the Servers run Hyper-V

·         TechNet Webcast: 24 Hours of Windows Server 2008 (Part 24 of 24): High Availability with Hyper-V

·         TechNet Webcast: Creating Business Continuity Solutions Using Windows Virtualization

·         TechNet Webcast: High Availability with Hyper-V

·         Webcast: Top 10 VMWare Myths, including CSV and live migration

·         Webcast: Hyper-V Quick Migration on a Failover Cluster

·         Whitepaper: Quick Migration with Hyper-V

·         Whitepaper: Testing Hyper-V and Failover Clustering

 

 

Miscellaneous Resources

·         Blog Guide: Add a New Disk to a Cluster (2008)

·         Blog Guide: Configuring Auditing for a Cluster (2008)

·         Blog Guide: Cluster Recovery (2003)

·         KB Article: The Microsoft Support Policy for Windows Server 2008 Failover Clusters

·         TechNet Guide: Configuring the Quorum in a Failover Cluster

·         TechNet Guide: Managing a Failover Cluster

·         TechNet Guide: Modifying Settings for a Failover Cluster

·         TechNet Guide: The Failover Cluster Management Snap-In

·         TechNet Guide: Understanding Backup and Recovery Basics for a Failover Cluster

·         TechNet Guide: Support Policy

·         TechNet Guide: Windows Server 2008 Itanium / IA64 support

·         Webcast: Top 10 Windows Server 2008 Failover Clustering Enhancements over Windows Server 2003 Clustering, Based on Best Practices (Level 300)

·         Webcast: Failover Clustering 101

·         Webcast: Achieving High Availability with Windows Server “Longhorn” Clustering (Level 200)

·         Whitepaper: Microsoft’s HA Strategy

·         Whitepaper: Overview of Failover Clustering

·         Whitepaper: HA with Microsoft MPIO (2003, 2008)

·         Website: Windows Logo site

·         Webcast: Introduction to Failover Clustering

 

Multi-Site Clustering

·         Cluster Team Site: http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-multisite.aspx

·         KB Article: Deployment Considerations for Windows Server 2008 failover cluster nodes on different, routed subnets

·         Webcast: TechNet Webcast: Geographically Dispersed Failover Clustering in Windows Server 2008 Enterprise

·         Webcast: How You Can Achieve Greater Availability with Failover Clustering Across Multiple Sites (Level 300) 

·         Whitepaper: Multi-site Clustering

·         Webcast: Multi-Site Clustering in Windows Server 2008

 

 

Network Load Balancing

·         Blog Guide: Network Load Balancing (NLB) and Virtual Machines

·         KB Article: NLB Troubleshooting Overview

·         KB Article: Create/manage/destroy NLB clusters via NLB Manager remotely from another server, or from RSAT client (admin pack) on Vista

·         Presentation: Server Core: Install the NLB feature

·         TechNet Guide: Configuring NLB with Terminal Services

·         TechNet Guide: NLB Deployment Guide

·         TechNet Guide: Implementing a new NLB Cluster

·         TechNet Guide: Verifying the NLB Cluster and Enabling Client Access

·         TechNet Guide: Overview of NLB

·         TechNet Guide: Creating NLB Clusters

·         TechNet Guide: Managing NLB Clusters

·         TechNet Guide: Setting NLB Parameters

·         TechNet Guide: Controlling Hosts on NLB clusters

·         TechNet Guide: Troubleshooting for System Event Messages Related to NLB Cluster

·         TechNet Guide: User Interface: NLB Manager

·         TechNet Guide: Upgrading a NLB Cluster

·         TechNet Guide: Upgrading a Network Load Balancing (NLB) Cluster

·         Webcast: 24 Hours of Windows Server 2008 (Part 23 of 24): Failover Clustering and Network Load Balancing

 

 

Other Resources / Workloads

·         Blog Guide: Configure Multiple Instances of MSDTC (2008)

·         Blog Guide: Installing MSDTC (2003)

·         Blog Guide: Optimize Print Cluster (2003)

·         Blog Guide: Creating and Configuring a Generic Application Resource

·         TechNet Guide: Configuring Generic Resources

·         TechNet Guide: Configure a Service or Application for High Availability

 

 

Scripting

·         Blog Guide: Creating a Cluster using WMI

·         Blog Guide: CLI: Cluster Resource Groups

·         Blog Guide: CLI: Quorum

·         Blog Guide: CLI: Disk Resources

·         Blog Guide: CLI: Cluster Creation

·         Blog Guide: CLI: Adding Disks

·         TechNet: How to create an Exchange SCC Failover Cluster with CMD

 

 

SQL Server

 

Utilities

·         Utility: Failover Cluster Management Pack for Operations Manager 2007

·         Utility: ClusPrep: Cluster Configuration Validation Wizard (2003)

·         Utility: Remote Server Administration Tools (RSAT) (simplifies Server Core configurations)

·         Utility: File Server Migration Toolkit (FSMT) (2008)

 


Network Load Balancing in R2: Extended Affinity


Hello!

 

I am Rohan Mutagi.  My job at Microsoft is to do something that everyone likes: criticizing others ☺, specifically, other people's code.  Yes, I am a tester and it's my role to find bugs in Network Load Balancing (NLB).  Over the next few months I will be blogging more about the changes that NLB went through in Windows Server 2008 R2.  In this post, I will focus on NLB Extended Affinity (TCP).

 

 

What is Extended Affinity?

To understand how NLB does load balancing, please refer to this TechNet article about various forms of affinity and their impact on load balancing decisions.

 

Extended Affinity is an extension to Single and Network affinity.  NLB does not rely on any network protocol's state to make its load balancing decisions.  As a result, NLB works with a wide variety of protocols, documented and undocumented, stateless (HTTP, UDP, etc.) and stateful (RDP, SSL, etc.).  This makes NLB more flexible in deployment and easier to manage, since we don't have to configure the load balancer to work with every protocol that it needs to handle.  However, some applications would benefit from being able to explicitly associate a connection with a server.

 

An example would be an online retailer running IIS with shopping carts.  When a customer shops at the store, the intended purchases are saved in a shopping cart which is stored on one of the nodes in the cluster.  To keep the products in the shopping cart, the customer must stay connected to that same node.  However, configuration changes to the cluster (such as adding a new VIP or node) that cause cluster convergence may then direct the customer to another cluster node, losing the purchases saved in that shopping cart.  Now the customer may become frustrated and the retailer may lose money.

 

Another instance could be SSL, where the SSL session can consist of multiple TCP connections.  In normal operation, if single affinity is used, NLB guarantees that all connections coming from the same source IP will hit the same server, including multiple TCP connections of the same SSL session.  However, configuration changes might cause different connections of the same SSL session to be accepted by different servers during convergence.  As a result, the SSL session is broken.

 

With Extended Affinity, NLB provides the ability to keep a client connection associated with an NLB server across re-convergence.  This association holds until the timeout specified by the administrator for the given port rule expires without any new traffic on the same connection.

 

Scenario

1.       We have a 2 node NLB cluster. (VIP: 2.2.2.2)

2.       Web browser Client (1.1.1.1) connects via SSL to NLB VIP (2.2.2.2).

3.       That particular connection is handled by IIS Server on NLB NODE1.

4.       Client Requests a web page that involves filling a web form.

5.       Client spends 20 minutes filling this form that would, once submitted, need to be stored on NODE1.

6.       In the meantime, on the server, admin adds a new node (NODE3) to the NLB Cluster.

7.       Now the connection (1.1.1.1 -> 2.2.2.2) is owned by NODE3.

8.       The client submits the web form.

 

 

Without Extended Affinity

9.       Since the ownership of the connection (1.1.1.1 -> 2.2.2.2) has moved to NODE3, the server rejects the packet from the client.

10.   The browser tries to re-establish the SSL connection and this time hits a new server.

11.   The new server rejects the “form data” that the browser provides since there is no authentication for this client on this node (NODE3).  Thus the data that the client filled in is lost.

 

 

With Extended Affinity

9.       The server notices that stickiness is enabled for that particular connection (1.1.1.1 -> 2.2.2.2) and routes the connection to its correct owner (NODE1) despite the configuration change that caused the connection ownership to move to the new node (NODE3).

10.   The browser successfully communicates with the server and the transaction completes.

 

 

 

Applying Extended Affinity

The following sections detail how to use Extended Affinity in your Windows Server 2008 R2 NLB Cluster.

 

Using NLBManager

Extended Affinity can be modified by following the steps below:

 

1.       Right-click the cluster and select "Cluster Properties".

 

       

 

  

2.       In the Cluster Properties dialog box, click the "Port Rules" tab.

 

 

 

 

3.       Choose the appropriate port rule and click Edit:

 

 

 

 

4.       Select the appropriate affinity and set the "Timeout" value to the required value.  Click OK.

 

          

 

 

5.       You should now see the new “Timeout” value reflect the amount you set (10 minutes in this example).

 

          

 

 

Using PowerShell

Using PowerShell, you can set the timeout for the default port rule with the Set-NlbClusterPortRule CMDlet.  For more information about using PowerShell with NLB, visit: http://blogs.msdn.com/clustering/archive/2008/12/26/9253786.aspx.

 

The CMDlet below displays all the port rules that are configured on a cluster on the current machine.  The “Timeout” value shows the currently configured Extended Affinity timeout.  If this value is set to 0, Extended Affinity is not enabled for the given port rule.  In the example below, the timeout for all three port rules is set to 0, meaning Extended Affinity is not enabled on any of them.
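As a minimal sketch, the listing comes straight from the Get CMDlet (run on a cluster host with the NLB PowerShell module available):

```powershell
# List every NLB port rule on the local machine; the Timeout property
# shows the Extended Affinity timeout in minutes (0 = disabled).
Get-NlbClusterPortRule
```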

 

       

 

 

Now let’s enable Extended Affinity for the 2nd port rule using PowerShell.

 

1.       Get the required port rule using the Get-NlbClusterPortRule command we used above, but this time let's add a filter to find the port rule that is configured on port 443 and bound to the cluster on the network interface Test-4.

 

         
 
 

2.       Apply Extended Affinity to this port rule by using Set-NlbclusterPortRule to modify its timeout value.
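Putting the two steps together, a hedged sketch (the port number 443 and interface name "Test-4" come from this walkthrough; substitute your own values):

```powershell
# Find the port rule for port 443 on the interface named "Test-4",
# then enable Extended Affinity by setting a 10-minute timeout.
Get-NlbClusterPortRule -Port 443 -InterfaceName "Test-4" |
    Set-NlbClusterPortRule -NewTimeout 10
```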

 

       

 

 

Review:

Get-NlbClusterPortRule -Port <YourPortNumberHere> -InterfaceName <NetworkInterfaceName> | Set-NlbClusterPortRule –NewTimeout <NewTimeoutValueInMinutes>

 

That concludes the overview of the new Extended Affinity feature for NLB in Windows Server 2008 R2.  Thanks for reading this blog post.  If you have any questions, feel free to contact us by clicking the ‘Email’ link in the upper-right corner of the page.

 

Thanks,

Rohan Mutagi
Software Development Engineer in Test
Clustering & High-Availability
Microsoft

PowerShell for NLB: Part 1: Getting Started


Hi NLB Fans,

 

NLB provides users with various methods to manage clusters.  In Windows Server 2008, there are 3 ways to manage an NLB cluster:

 

1.       Network Load Balancing Manager GUI (nlbmgr.exe)

2.       NLB command line tool (Nlb.exe)

3.       NLB WMI Provider (root\MicrosoftNLB namespace)

 

In Windows Server 2008 R2, the NLB team has introduced a PowerShell interface for configuring, managing and debugging NLB.  This awesome new feature makes it very easy to administer systems in an automated way.

 

In this blog post we will explore NLB's support for PowerShell.  We will elaborate on the original post PowerShell for NLB, providing more details on the naming convention, samples, and CMDlet discovery.

 

This blog post contains the following sections:

 

·         PowerShell Naming convention

·         Exploring NLB CMDlets

o   Using Get-Command

o   Using command Auto-completion

o   Using Argument auto completion

o   Getting examples to use

 

Future blog posts in this series will discuss:

·         NLB common scenarios

·         Basics of Debugging NLB with PowerShell

 

NLB PowerShell follows the PowerShell CMDlet guidelines in naming and execution of the NLB CMDlets. Here we will explore the general naming conventions that will make it easy to further understand and explore NLB CMDlets.

 

PowerShell Naming Convention

A CMDlet is made up of two parts, a verb and a noun, joined by a hyphen.  An NLB example would be:

 

PS > Get-NlbCluster

  

The ‘Get’ example above is split into two parts, the verb (Get) and the noun (NlbCluster), separated by a hyphen.  As a rule of thumb, the verb defines the action to be performed on the noun.  In the above example, we want to "Get" all instances of "NlbCluster".

 

To view all the NLB CMDlets, run PS > Get-Command -Module NetworkLoadBalancingClusters

 

 

 

A list of all the NLB supported verbs can be seen below:

 

 

 

A list of all the NLB supported nouns can be seen below:
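Both lists can be regenerated from the module itself; a sketch (the Verb and Noun properties assume the standard CmdletInfo objects that Get-Command returns):

```powershell
# Unique verbs exposed by the NLB module:
Get-Command -Module NetworkLoadBalancingClusters |
    Select-Object -ExpandProperty Verb | Sort-Object -Unique

# Unique nouns exposed by the NLB module:
Get-Command -Module NetworkLoadBalancingClusters |
    Select-Object -ExpandProperty Noun | Sort-Object -Unique
```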

 

 

 

Exploring NLB CMDlets

PowerShell makes it quite easy to use CMDlets, even if you have no prior knowledge of the NLB CMDlets.  PowerShell provides two main features that help with exploring/learning CMDlets.

 

Get-Command

You can use Get-Command to explore the existing CMDlets that are available.  This CMDlet, in conjunction with knowledge of the Verb-Noun pairing, is a powerful way of getting to the CMDlet of interest.

 

Quick Syntax for this command

> Get-Command -Module NetworkLoadBalancingClusters [-Noun | -Verb <String>]

> Get-Command <CommandFilter> -CommandType <CommandType>

 

Example usage

Let's say we want to delete a node from the current cluster.  We know our end goal, but don't know how to achieve it via PowerShell.  Using the above syntax we can try to reach our goal.  The action we want to perform is "delete", and the noun that we want to act on is "NLB cluster node".

 

First we try to find all commands that start with the verb "delete" and are of type CMDlet by running > Get-Command delete-* -CommandType cmdlet, but we do not find any results.

 

Instead of "delete", let’s try "Remove".  Below we see that we found the CMDlet we are looking for.

 

 

 

We could have approached this in a different way.  We could have searched for the noun "Node" and filtered further on the exact verb.
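Both search styles can be sketched as follows (the noun filter assumes the NLB CMDlet names used throughout this post):

```powershell
# Verb-first search: "Delete" finds nothing, but "Remove" works.
Get-Command Remove-* -CommandType cmdlet

# Noun-first search, then filter further on the exact verb:
Get-Command -Module NetworkLoadBalancingClusters -Noun NlbClusterNode |
    Where-Object { $_.Verb -eq 'Remove' }
```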

 

 

 

As we can see with the above examples, we can intuitively guess the Verb-Noun pair for the NLB operation we want to perform, and use the Get-Command CMDlet to get the exact CMDlet.

 

The list below shows the usage of Get-Command to list out all the supported NLB CMDlets:

 

 

 

Command Auto-Completion

Another way to find out which CMDlets exist is to use the command auto-completion key <TAB>.

PowerShell completes partially typed command names when you press <TAB>.

 

Examples

1.       Open PowerShell window with the NLB modules loaded.

2.       Type Add-NLBCluster<Tab>

 

This will automatically complete the above CMDlet, and display "Add-NLBClusterNode" on the screen.

 

This is another handy way to see which CMDlets are supported.  Another example would be:

Start-NLB<TAB> would display Start-NLBCluster

Hitting <TAB> again, would display Start-NLBClusterNode

 

Argument Auto-Completion

Now that we know how to find the CMDlet of interest, let’s see how we further use this information to formulate the exact command that we need to execute.  PowerShell supports automatic expansion of the command arguments.  Once you have typed in a CMDlet you can type a hyphen (-) and hit <TAB> key to automatically expand the available arguments for the given CMDlet.

 

Examples

1.       Open a PowerShell Window with the NLB Module loaded

2.       Enter > Get-NLBCluster-<TAB>

3.       You will see that the “HostName” parameter will be auto-completed

4.       Hit <TAB> again and you will see the text “InterfaceName” appear by the text prompt.

 

Using <TAB> you can cycle through all the available arguments that the given CMDlet supports.  If you went past an argument while hitting <TAB>, you can go back to it using the <SHIFT+TAB> key sequence.

 

This auto-completion can be further “filtered” by typing the first few characters of the argument you are interested in.  For example, if you want to look for the “InterfaceName” parameter, you can try the following:

1.       Open a PowerShell Window with the NLB Module loaded

2.       Type “Get-NLBCluster -i“ <TAB>

3.       This will directly show you the parameters that begin with the letter “I”, in this case “InterfaceName”

 

 

Get-Help

As you may know from the ‘Help Documentation’ section of the earlier NLB blog post, the Get-Help CMDlet is incredibly powerful.

 

The final thing that I would like to bring up in this section is the use of the -Examples argument for help.  As the name suggests, you can quickly see the examples of a given CMDlet via the “-Examples” argument.

 

Example
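A sketch of what that looks like, using one of the NLB CMDlets mentioned in this post:

```powershell
# Show only the usage examples from a CMDlet's help:
Get-Help New-NlbCluster -Examples
```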

 

 

 

Another awesome support option is the “-Online” option. This will launch the web browser with online content that is up-to-date with the latest information regarding the CMDlet (of course, this may not work if you are using a Server Core installation which cannot access Internet Explorer).

Example:

> Get-Help -Online New-NlbCluster


 

Rohan Mutagi & Gary Jackman
Clustering & High-Availability Test Team
Microsoft

PowerShell Help Online & Management Pack Updates for Failover Clustering & NLB


Hi Cluster Fans,

 

We have added Windows Server 2008 R2 PowerShell help on TechNet and have updated our Management Packs for System Center Operations Manager (SCOM) to support 2008 R2, for both Failover Clustering and Network Load Balancing.

 

PowerShell Help Online 

With PowerShell help online you can see the same information as the inbox Get-Help CMDlet in an easier to browse format, with more examples and information added over time.  The website can also be launched in your default web browser directly from PowerShell, assuming your machine has a web browser.  This is done by adding -Online to the Get-Help CMDlet.

 

Failover Clustering

The main site for PowerShell for Failover Clustering is at: http://technet.microsoft.com/en-us/library/ee461009.aspx

 

Here’s an example for Failover Clustering:  PS > Get-Help Test-Cluster -Online

 

Network Load Balancing

The main site for PowerShell for Network Load Balancing is at: http://technet.microsoft.com/en-us/library/ee817138.aspx

 

Here’s an example for Network Load Balancing:  PS > Get-Help New-NLBCluster -Online

 

SCOM Management Pack Updates

We’ve updated our Management Packs for System Center Operations Manager (SCOM) for both Failover Clustering & NLB to add new features and support Windows Server 2008 R2.

 

Failover Clustering

The SCOM Failover Clustering Management Pack provides both proactive and reactive monitoring of your Windows Server 2003, Windows Server 2008 and Windows Server 2008 R2 cluster deployments. It monitors Cluster services components—such as nodes, networks, and resource groups—to report issues that can cause downtime or poor performance.

 

The main site for the Management Pack for Failover Clustering is at: http://www.microsoft.com/downloads/details.aspx?FamilyId=AC7F42F5-33E9-453D-A923-171C8E1E8E55

 

Some of the improvements include:

·         Support for discovery and monitoring of Windows Server 2008 R2 clusters and functionality such as Cluster Shared Volumes

·         MP scalability improvements (the MP supports monitoring of 300 resource groups per cluster)

·         Noise reduction, for example clustered resources are no longer discovered and monitored by default (resource groups are monitored by default)

·         Configuration or hardware issues that interfere with starting the Cluster service

·         Alerts about connectivity problems that affect communication between cluster nodes or between a node and a domain controller

·         Active Directory Domain Services (AD DS) settings that affect the cluster; for example, permissions needed by the computer account that is used by the cluster

·         Configuration issues with the network infrastructure needed by the cluster; for example, issues with Domain Name System (DNS)

·         Issues with the availability of a cluster resource, such as a clustered file share

·         Issues with the cluster storage

 

Network Load Balancing

The SCOM Network Load Balancing (NLB) Management Pack provides discoveries, monitors, alerts, and warnings to help the operator understand the state of NLB clusters and NLB servers running Windows Server 2008 and Windows Server 2008 R2. The Management Pack can provide early warnings that an operator can use to proactively monitor the state of the NLB servers in the computing environment.

 

The main site for the Management Pack for Network Load Balancing is at: http://www.microsoft.com/downloads/details.aspx?FamilyID=dc17e093-bdd7-4cb3-9981-853776ed90be

 

Some of the improvements include:

·         Support for discovery and monitoring of Windows Server 2008 R2 NLB clusters

·         Monitor the NLB Node status

·         Based on the status of individual cluster nodes, determine the overall state of the cluster.

·         Where an integration management pack exists, determine the health state of a cluster node by looking at the health state of the load balanced application, such as IIS

·         Alert on errors and warnings that are reported by the NLB driver, such as an incorrectly configured NLB cluster

·         Ability to take a node out of the NLB cluster if the underlying load-balanced application becomes unhealthy, and add the node back to the cluster when the application becomes healthy again

·         Noise reduction on some alerts

 

Enjoy these improvements to your clustering experience!

 

Thanks,

Symon Perriman

Program Manager II

Clustering & High-Availability

Microsoft

PowerShell for NLB: Common Scenarios


Hi,

 

This is the second blog in our series of posts on PowerShell for Network Load Balancing (NLB).  The first post introduces you to the CMDlets: http://blogs.msdn.com/clustering/archive/2009/10/28/9913877.aspx

 

Most of the NLB CMDlets have the following common parameters.

 

 -InterfaceName

Specifies the interface to which NLB is bound

 -NodeName

Specifies the name of the cluster node that you want to manage

 

Most CMDlets require a reference to a cluster object.  To get a cluster object you can run Get-NlbCluster and pass the output object to the desired CMDlet, or use the -InterfaceName parameter.

 

We will discuss running CMDlets and using the output as input of another CMDlet in future posts.

 

Creating a New Cluster

New-NLBCluster

A new cluster can be created using the New-NlbCluster CMDlet.  This is a synchronous command, meaning that it only returns after completing the operation.  You can also use this CMDlet to create a new cluster on remote nodes.  To do this, the managing system must have Windows Server 2008 R2 installed and the cluster node must be running Windows Server 2008 or higher.

 

New-NLBCluster has the following parameters of interest.

 

 -InterfaceName

Specifies the interface to which NLB is bound

 -ClusterPrimaryIP

The cluster’s primary IP address. More IP addresses can be added via Add-NLBClusterVIP

 -HostName

We can create a cluster on a remote machine by passing the machine name here

 -ClusterName

Specifies the name of the new cluster (optional)

 -DedicatedIP

This will add a dedicated IP address to the stack that can be used to reach this machine directly

 -OperationMode

The cluster operation mode can be one of the following: unicast, multicast, igmpmulticast

 

 

Example
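A hedged sketch of creating a cluster with the parameters described above (the interface name, IP address, and cluster name are illustrative placeholders):

```powershell
# Create a new NLB cluster bound to the given interface, using the
# VIP 2.2.2.2 from this series as the primary cluster IP.
New-NlbCluster -InterfaceName "Local Area Connection" `
    -ClusterPrimaryIP 2.2.2.2 `
    -ClusterName "MyNlbCluster" `
    -OperationMode multicast
```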

 

 

 

Adding Nodes to a Cluster

Add-NLBClusterNode

Once a cluster has been created, we may want to add more nodes to the cluster. This can be achieved via the Add-NLBClusterNode CMDlet.

Parameters of interest:

 -InterfaceName

Specifies the interface to which NLB is bound

 -HostName

We can create a cluster on a remote machine by passing the machine name here

 -NewNodeName

The name of the new node that needs to be added to the cluster

 -NewNodeInterface

Interface on which we want to bind NLB on the new node

 

Example
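A hedged sketch of adding a node (node and interface names are illustrative):

```powershell
# Get the local cluster and add NODE2 to it, binding NLB to the
# "Local Area Connection" interface on the new node.
Get-NlbCluster -InterfaceName "Local Area Connection" |
    Add-NlbClusterNode -NewNodeName NODE2 `
        -NewNodeInterface "Local Area Connection"
```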

 

 

Managing Port Rules

Set-NLBClusterPortRule

After creating a new NLB cluster you may want to modify the port rules before adding any nodes.  To do so you will want to use the Set-NLBClusterPortRule CMDlet.

 

Set-NLBClusterPortRule modifies existing port rules.  For example, when creating a new cluster, the default port rule is added.  If you want to customize it, you can either delete the existing port rule and create a new one, or modify the existing rule.  Modifying the existing port rule is the better approach because you run only one command rather than two.

 

Set-NLBClusterPortRule has the following parameters that I believe are the most useful.  As always, for detailed help please run Get-Help Set-NLBClusterPortRule.

 

 -NewStartPort

Specifies the new start port for the cluster port rule. The acceptable range is between 0 and 65535

 -NewEndPort

Specifies the new end port for the cluster port rule. The acceptable range is between 0 and 65535

 -NewAffinity

Specifies the new affinity for the cluster port rule. There are three possible values for port rule affinity: none, single, and network

 -NewIP

Specifies the new IP address for the cluster port rule

 -NewTimeout

Specifies the new timeout in minutes for the cluster port rule. The acceptable range is between 0 and 240

 -InterfaceName

Specifies the interface to which NLB is bound

 -Port

Specifies a port number within the port rule to set

 

Example

This shows how to change the port rule:
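The original example is not preserved; here is a sketch that narrows the single default rule down to ports 80-443 (placeholder values, run on the local node):

```powershell
# Modify the default port rule to cover only ports 80-443
Get-NLBClusterPortRule | Set-NLBClusterPortRule -NewStartPort 80 -NewEndPort 443
```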

 

 

 

The previous example assumes that only one port rule exists prior to modifying it.  If multiple port rules exist prior to running the command and you want to modify the StartPort or EndPort, you will get an error because the resulting port ranges (as specified by the start and end ports) would overlap.

 

Example

If multiple port rules exist, you should use the -Port parameter to identify the specific rule to modify:
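The original screenshot is missing; per the note below, the example used -Port to select a rule and changed its affinity rather than the port range. A sketch (placeholder values):

```powershell
# Select the port rule containing port 80 and change its affinity to network
Set-NLBClusterPortRule -Port 80 -NewAffinity network
```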

 

 

 

You may have noticed that the example shows changing affinity instead of the port range.   I did this to set up for the next example where I change the affinity to single affinity on both port rules. 
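A sketch of that next example, applying single affinity to every port rule at once (assumes the cluster now has two rules):

```powershell
# Pipe every port rule into Set-NLBClusterPortRule to apply single affinity
Get-NLBClusterPortRule | Set-NLBClusterPortRule -NewAffinity single
```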

 

 

 


Managing Cluster Nodes

Set-NLBClusterNode

To manage NLB node properties such as host priority, initial host state or persisted suspend state, you need to use Set-NLBClusterNode.

 

 -HostPriority

Specifies the host priority or host ID for the cluster node. The value should be between 1 and 32

 -InitialHostState

Specifies the initial host state for the cluster node. The value is either started, stopped, or suspended

 

By default, Set-NLBClusterNode manages only one node at a time.  For example, when running the command from one of the nodes, the local host is the node that is managed.
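The original example is not preserved; a sketch of managing the local node (the priority value is a placeholder):

```powershell
# Change the host priority (host ID) of the local node
Set-NLBClusterNode -HostPriority 4
```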

 

 

  

If you want to run a command that executes on all nodes, you can first run Get-NLBClusterNode and pipe the output to Set-NLBClusterNode.
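A sketch of that pipeline (the initial host state value is a placeholder):

```powershell
# Apply the same setting to every node in the cluster
Get-NLBClusterNode | Set-NLBClusterNode -InitialHostState stopped
```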

 

 

  

To view all node properties, run Get-NLBClusterNode and pipe the output through the Format-List CMDlet.
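The original listing is missing; the command itself looks like this:

```powershell
# List every property of every cluster node
Get-NLBClusterNode | Format-List *
```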

                                                          

  


 

Controlling Cluster Nodes

 Start-NLBClusterNode & Stop-NLBClusterNode

To control the state (such as stop or start) of the cluster or a node, there is a CMDlet for the respective action or "verb" and the respective object.  For example, to stop a cluster node you run Stop-NLBClusterNode, while the Start-NLBClusterNode CMDlet will start the specific cluster node.

 

The CMDlet I want to discuss here is the Stop-NLBClusterNode command, specifically its -Timeout parameter. This new parameter lets you control how long to wait before forcing the Stop operation on the node. Now you don’t have to wait for the drain to complete before doing a stop; you can simply run the command with a timeout value, as in the example below.

 

In creating the CMDlets, we combined stop and drainstop into single CMDlets: Stop-NLBCluster and Stop-NLBClusterNode.

 

 -Drain

Drains existing traffic before stopping the cluster node

 -Timeout

Specifies the number of minutes to wait for the drain operation before stopping the cluster node

 

Example

This example will do the following:

1.       Drain all the connections on the Cluster

2.       If there are no outstanding connections, stop the cluster immediately

3.       If all connections are not drained within 10 minutes, force stop the node, breaking all remaining connections to that particular node.
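A sketch of that command (the 10-minute timeout comes from the scenario above):

```powershell
# Drain connections, then force-stop the node if draining takes over 10 minutes
Stop-NLBClusterNode -Drain -Timeout 10
```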

 

 

 

 

Debugging NLB with PowerShell

Get-NLBClusterDriverInfo

The NLB team has added an awesome CMDlet, Get-NLBClusterDriverInfo, which replaces the nlb.exe binary that you may have used. It is a loaded CMDlet with lots of options. Note that this CMDlet does not provide any remoting capabilities, so it does not take a hostname as an input parameter.

 

1.       Getting the Cluster configuration: When this CMDlet is run without any arguments, it returns the basic cluster configuration on the current machine.
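The original output is not preserved; the call itself is simply:

```powershell
# Show the basic NLB cluster configuration on the current machine
Get-NLBClusterDriverInfo
```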

 

 

 

2.       We can determine if a given connection will be handled by the current node using the -filter argument.  This argument requires the following additional arguments to be set:

 -ClientIP

IP Address of the client in question

 -ClientPort

If known, the client source port. This can be set to 0, if unknown

 -ServerPort

The destination port on the server. For example, HTTP is typically port 80

 -ServerIP

The server's IP address. For incoming connections, this means the VIP

 

In the following example, we check whether a TCP connection coming from client 1.1.1.1 will be accepted by the NLB server on port 80, whose VIP is 1.1.1.2.
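A sketch of that query, using the parameters listed above:

```powershell
# Check whether this node will handle TCP traffic from 1.1.1.1 to VIP 1.1.1.2:80
Get-NLBClusterDriverInfo -Filter -ClientIP 1.1.1.1 -ClientPort 0 `
    -ServerIP 1.1.1.2 -ServerPort 80
```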

 

 

 

Stay tuned for more NLB PowerShell information!

 

 

Thanks,

Rohan Mutagi & Gary Jackman
Clustering & High-Availability Test Team
Microsoft

Site-aware Failover Clusters in Windows Server 2016


Windows Server 2016 debuts site-aware clusters. Nodes in stretched clusters can now be grouped based on their physical location (site). Cluster site-awareness enhances key operations during the cluster lifecycle, such as failover behavior, placement policies, heartbeating between the nodes, and quorum behavior. In the remainder of this blog I will explain how you can configure sites for your cluster, the notion of a “preferred site”, and how site-awareness manifests itself in your cluster operations.

Configuring Sites

A node’s site membership can be configured by setting the Site node property to a numerical value identifying the site.

For example, in a four node cluster with nodes – Node1, Node2, Node3 and Node4, to assign the nodes to Sites 1 and Site 2, do the following:

  • Launch Microsoft PowerShell© as an Administrator and type:

Technical Preview 4

(Get-ClusterNode Node1).Site=1
(Get-ClusterNode Node2).Site=1
(Get-ClusterNode Node3).Site=2
(Get-ClusterNode Node4).Site=2

Technical Preview 5

In Technical Preview 5 the syntax to configure sites has changed to the following:

#Create Site Fault Domains
New-ClusterFaultDomain –Name Seattle –Type Site –Description “Primary” –Location “Seattle DC”
New-ClusterFaultDomain –Name Denver –Type Site –Description “Secondary” –Location “Denver DC”

#Set Fault Domain membership
Set-ClusterFaultDomain –Name Node1 –Parent Seattle
Set-ClusterFaultDomain –Name Node2 –Parent Seattle

Set-ClusterFaultDomain –Name Node3 –Parent Denver
Set-ClusterFaultDomain –Name Node4 –Parent Denver

Configuring sites enhances the operation of your cluster in the following ways:

Failover Affinity

  • Groups failover to a node within the same site, before failing to a node in a different site
  • During Node Drain VMs are moved first to a node within the same site before being moved cross site
  • The CSV load balancer will distribute within the same site

Storage Affinity

Virtual Machines (VMs) follow storage and are placed in the same site where their associated storage resides. VMs will begin live migrating to the same site as their associated CSV after 1 minute of the storage being moved.

Cross-Site Heartbeating

You now have the ability to configure the thresholds for heartbeating between sites. These thresholds are controlled by the following new cluster properties:

Property             Default Value   Description
CrossSiteDelay       1000            Amount of time between each heartbeat sent to nodes on dissimilar sites in milliseconds
CrossSiteThreshold   20              Missed heartbeats before interface considered down to nodes on dissimilar sites

 To configure the above properties launch PowerShell© as an Administrator and type:

(Get-Cluster).CrossSiteDelay = <value>
(Get-Cluster).CrossSiteThreshold = <value>

You can find more information on other properties controlling failover clustering heartbeating here.

The following rules define the applicability of the thresholds controlling heartbeating between two cluster nodes:

  • If the two cluster nodes are in two different sites and two different subnets, then the Cross-Site thresholds will override the Cross-Subnet thresholds.
  • If the two cluster nodes are in two different sites and the same subnets, then the Cross-Site thresholds will override the Same-Subnet thresholds.
  • If the two cluster nodes are in the same site and two different subnets, then the Cross-Subnet thresholds will be effective.
  • If the two cluster nodes are in the same site and the same subnets, then the Same-Subnet thresholds will be effective.

Configuring Preferred Site

In addition to configuring the site a cluster node belongs to, a “Preferred Site” can be configured for the cluster. The Preferred Site is a preference for placement. The Preferred Site will be your Primary datacenter site.

Before the Preferred Site can be configured, the site being chosen as the preferred site needs to be assigned to a set of cluster nodes. To configure the Preferred Site for a cluster, launch PowerShell© as an Administrator and type:

(Get-Cluster).PreferredSite = <Site assigned to a set of cluster nodes>

Configuring a Preferred Site for your cluster enhances operation in the following ways:

Cold Start

During a cold start, VMs are placed in the preferred site

Quorum

  • Dynamic Quorum drops weights from the Disaster Recovery site (DR site i.e. the site which is not designated as the Preferred Site) first to ensure that the Preferred Site survives if all things are equal. In addition, nodes are pruned from the DR site first, during regroup after events such as asymmetric network connectivity failures.
  • During a Quorum Split i.e. the even split of two datacenters with no witness, the Preferred Site is automatically elected to win
    • The nodes in the DR site drop out of cluster membership
    • This allows the cluster to survive a simultaneous 50% loss of votes
    • Note that the LowerQuorumPriorityNodeID property previously controlling this behavior is deprecated in Windows Server 2016

Preferred Site and Multi-master Datacenters

The Preferred Site can also be configured at the granularity of a cluster group i.e. a different preferred site can be configured for each group. This enables a datacenter to be active and preferred for specific groups/VMs.

To configure the Preferred Site for a cluster group, launch PowerShell© as an Administrator and type:

(Get-ClusterGroup -Name <GroupName>).PreferredSite = <Site assigned to a set of cluster nodes>

Placement Priority

Groups in a cluster are placed based on the following site priority:

  1. Storage affinity site
  2. Group preferred site
  3. Cluster preferred site

Hyper-converged with Windows Server 2016


One of the big hot features in Windows Server 2016 which has me really excited is Storage Spaces Direct (S2D).  With S2D you will be able to create a hyper-converged private cloud.  A hyper-converged infrastructure (HCI) consolidates compute and storage into a common set of servers.  Leveraging internal storage which is replicated, you can create a true Software-defined Storage (SDS) solution.

This is available in the Windows Server 2016 Technical Preview today!  I encourage you to go try it out and give us some feedback.  Here’s where you can learn more:

Presentation from Ignite 2015:

Storage Spaces Direct in Windows Server 2016 Technical Preview
https://channel9.msdn.com/events/Ignite/2015/BRK3474

Deployment guide:

Enabling Private Cloud Storage Using Servers with Local Disks

https://technet.microsoft.com/en-us/library/mt126109.aspx

Claus Joergensen’s blog:

Storage Spaces Direct
http://blogs.technet.com/b/clausjor/archive/2015/05/14/storage-spaces-direct.aspx

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

Configuring Site Awareness with Multi-active Disaggregated Datacenters


In a previous blog, I discussed the introduction of site-aware Failover Clusters in Windows Server 2016. In this blog, I am going to walk through how you can configure site-awareness for your multi-active disaggregated datacenters. You can learn more about Software Defined Storage and the advantages of a disaggregated datacenter here.

Consider the following multi-active datacenters, with a compute and a storage cluster, stretched across two datacenters. Each cluster has two nodes on each datacenter.

To configure site-awareness for the stretched compute and storage clusters proceed as follows:

Compute Stretch Cluster

1)     Assign the nodes in the cluster to one of the two datacenters (sites).

  • Open PowerShell© as an Administrator and type:
(Get-ClusterNode Node1).Site = 1
(Get-ClusterNode Node2).Site = 1
(Get-ClusterNode Node3).Site = 2
(Get-ClusterNode Node4).Site = 2

2)     Configure the site for your primary datacenter.

(Get-Cluster).PreferredSite = 1

Storage Stretch Cluster

In multi-active disaggregated datacenters, the storage stretch cluster hosts a Scale-Out File Server (SoFS). For optimal performance, ensure that the site hosting the Cluster Shared Volumes comprising the SoFS follows the site hosting the compute workload. This avoids the cost of cross-datacenter network traffic.

1)     As in the case of the compute cluster, assign the nodes in the storage cluster to one of the two datacenters (sites).

(Get-ClusterNode Node5).Site = 1
(Get-ClusterNode Node6).Site = 1
(Get-ClusterNode Node7).Site = 2
(Get-ClusterNode Node8).Site = 2

2)     For each Cluster Shared Volume (CSV) in the cluster, configure the preferred site for the CSV group to be the same as the preferred site for the Compute Cluster.

$csv1 = Get-ClusterSharedVolume "Cluster Disk 1" | Get-ClusterGroup
($csv1).PreferredSite = 1 

3)  Set each CSV group in the cluster to automatically failback to the preferred site when it is available after a datacenter outage.

($csv1).AutoFailbackType = 1

Note: Step 2 and 3 can also be used to configure the Preferred Site for a CSV group in a hyper-converged data-center deployment. You can learn more about hyper-converged deployments in Windows Server 2016 here.

 


How can we improve the installation and patching of Windows Server? (Survey Request)


Do you want your server OS deployment and servicing to move faster? We’re a team of Microsoft engineers who want your experiences and ideas around solving real problems of deploying and servicing your server OS infrastructure. We prefer that you don’t love server OS deployment already, and we’re interested even if you don’t use Windows Server. We need to learn it and earn it.

Click the link below if you wish to fill out a brief survey and perhaps participate in a short phone call.

https://aka.ms/deployland

Many Thanks!!!

-Rob.

Managing Failover Clusters with 5nine Manager


Hi Cluster Fans,

It is nice to be back on the Cluster Team Blog!  After founding this blog and working closely with the cluster team for almost eight years, I left Microsoft last year to join a Hyper-V software partner, 5nine Software.  I’ve spoken with thousands of customers and realized that Failover Clustering is so essential to Hyper-V that a majority of all VMs are using it, and that businesses of all sizes are doing this, not just enterprises.  Most organizations need continual availability for their services to run 24/7, and their customers expect it.  Failover Clustering is now commonplace even amongst small and medium-sized businesses.  I was able to bring my passion for cluster management to 5nine’s engineering team, and into 5nine’s most popular SMB product, 5nine Manager.  This blog provides an overview of how 5nine Manager can help you centralize management of your clustered resources.

 Create a Cluster

5nine Manager lets you discover hosts and create a Failover Cluster.  It will allow you to specify nodes, run Cluster Validation, provide a client access point, and then create the cluster.

Validate a Cluster

Failover Cluster validation is an essential task in all deployments, as it is required for a supported cluster.  With 5nine Manager you can test the health of your cluster during configuration, or afterwards as a troubleshooting tool.  You can granularly select the different tests to run, and see the same graphical report you are familiar with.

Host Best Practice Analyzer

In addition to testing the clustering configuration, you can also run a series of advanced Hyper-V tests on each of the hosts and Scale-Out File Servers through 5nine Manager.  The results will provide recommendations to enhance your node’s stability and performance.

Configure Live Migration Settings

It is important to have a dedicated network to Live Migration to ensure that its traffic does not interfere with cluster heartbeats or other important traffic.  With 5nine Manager you can specify the number of simultaneous Live Migrations and Storage Live Migrations, and even copy those settings to the other cluster nodes.

View Cluster Summary

5nine Manager has a Summary Dashboard which centrally reports the health of the cluster and its VMs.  It quickly identifies nodes or VMs with problems, and lists any alerts from its resources.  This Summary Dashboard can also be refocused at the Datacenter, Cluster, Host, and VM level for more refined results.

Manage Cluster Nodes

Using 5nine Manager you can configure your virtual disk and network settings.  You can also perform standard maintenance tasks, such as to Pause and Resume a cluster node, which can live migrate VMs to other nodes.  A list of active and failed cluster tasks is also displayed through the interface.

Manage Clustered VMs

You can manage any type of virtual machine that is supported by Hyper-V, including Windows Server, Windows, Linux, UNIX, and Windows Server 2016 Nano Server.  5nine Manager lets you centrally manage all your virtual machines, including the latest performance and security feature for virtualization.  The full GUI console even runs on all versions of Windows Server, including the otherwise GUI-less Windows Server Core and Hyper-V Server.

Cluster Status Report

It is now easy to create a report about the configuration and health of your cluster, showing you information about the configuration and settings for every resource.  This document can be exported and retained for compliance.

Host Load Balancing

5nine Manager allows you to pair cluster nodes and hosts to form a group that will load balance VMs.  It live migrates the VMs between hosts when customizable thresholds are exceeded.  This type of dynamic optimization ensures that a single host does not get overloaded, providing higher availability and greater performance for the VMs.

Cluster Logs

Sometimes it can be difficult to see all the events from across your cluster.  5nine Manager pulls together all the logs for your clusters, hosts and VMs to simplify troubleshooting.

Cluster Monitoring

5nine Manager provides a Monitor Dashboard to provide current and historical data about the usage of your clusters, hosts and VMs.  It will show you which VMs are consuming the most resources, the latest alarms, and a graphical view of CPU, memory, disk and network usage.  You can also browse through previous performance data to help isolate a past issue.

Hyper-V Replica with Clustering

Hyper-V Replica allows a virtual machine’s virtual hard disk to be copied to a secondary location for disaster recovery.  Using 5nine Manager you can configure the Replication Settings on a host, then apply them to other cluster nodes and hosts. 

 

You can also configure replication on a virtual machine that is running on a cluster node with the Hyper-V Replica Broker configured.  The health state of the replica is also displayed in the centralized console.

 

Failover Clustering should be an integral part of your virtualized infrastructure, and 5nine Manager provides a way to centrally manage all your clustered VMs.  Failover cluster support will continue to be enhanced in future releases of 5nine Manager.

Thanks!
Symon Perriman
VP, 5nine Software
Hyper-V MVP
@SymonPerriman

Troubleshooting Hangs Using Live Dump


In this blog post https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ we discussed what a Cluster Shared Volumes (CSV) event ID 5120 means, and how to troubleshoot it. In particular, we discussed the reason for auto-pause due to STATUS_IO_TIMEOUT (c00000b5), and some options on how to troubleshoot it. In this post we will discuss how to troubleshoot it using LiveDumps, which enables debugging the system with no downtime for your system.

First, let’s discuss what LiveDump is. Some of you are probably familiar with kernel crash dumps https://support.microsoft.com/en-us/kb/927069. You might face at least two challenges with kernel dumps:

  1. Bugcheck halts the system resulting in downtime
  2. Entire contents of memory are dumped to a file.  On a system with a lot of memory, you might not have enough space on your system drive for the OS to save the dump

The good news is that LiveDump solves both of these issues. LiveDump is a feature that was added in Windows Server 2012 R2. For the purpose of this discussion, you can think of LiveDump as an OS feature that allows you to create a consistent snapshot of kernel memory and save it to a dump file for future analysis. Taking this snapshot will NOT cause a bugcheck, so there is no downtime. LiveDump does not include all kernel memory; it excludes information that is not valuable in debugging, such as pages on the standby list and file caches. The kind of LiveDump that cluster collects for you also omits pages consumed by the hypervisor, and in Windows Server 2016 the cluster additionally excludes the CSV Cache from the LiveDump. As a result, a LiveDump has a much smaller file size than what you would get when you bugcheck the server, and does not require as much space on your system drive.  In Windows Server 2016 there is also a new bugcheck option called an “Active Dump”, which similarly excludes unnecessary information to create a smaller dump file during bugchecks.

You can create a LiveDump manually using LiveKD from Windows Sysinternals (https://technet.microsoft.com/en-us/sysinternals/bb897415.aspx). To generate a LiveDump, run the command “livekd -ml -o <path to a dump file>” from an elevated command prompt. The path to the dump file does not have to be on the system drive; you can save it to any location. Here is an example of creating a live dump on a Windows 10 desktop with 12 GB of RAM, which resulted in a dump file of only about 2.7 GB.

D:\>livekd -ml -o d1.dmp
LiveKd v5.40 - Execute kd/windbg on a live system
Sysinternals - www.sysinternals.com

Copyright (C) 2000-2015 Mark Russinovich and Ken Johnson

Saving live dump to D:\d1.dmp... done.

D:\>dir *.dmp

Directory of D:\

02/25/2016 12:05 PM     2,773,164,032 d1.dmp
1 File(s) 2,773,164,032 bytes
0 Dir(s) 3,706,838,417,408 bytes free

If you are wondering how much disk space you would need for a LiveDump, you can generate one using LiveKD and check its size.

You might wonder what is so great about LiveDump for troubleshooting. Logs and traces work well when something fails, because hopefully the log contains a record where a component admits that it is failing operations and points at the cause. LiveDump is great when we need to troubleshoot a problem where something is taking a long time and nothing is technically failing. If we start a watchdog when an operation begins, and the watchdog expires before the operation completes, we can take a dump of the system and try to walk the wait chain for that operation to see who owns it and why it is not completing. Looking at a LiveDump is just like looking at a kernel dump. It requires some skill and an understanding of Windows internals, so it has a steep learning curve for customers, but it is a great tool for Microsoft support and product teams who already have that expertise. If you reach out to Microsoft support with an issue where something is stuck in the kernel, along with a live dump taken while it was stuck, then the chances of promptly root-causing the issue are much higher.

Windows Server Failover Clustering has many watchdogs that control how long it should wait for cluster resources to execute calls like resource online or offline, or how long it should wait for CSVFS to complete a state transition. From our experience we know that in most of these scenarios something will be stuck in the kernel, so we automatically ask Windows Error Reporting (WER) to generate a LiveDump. It is important to note that LiveKd uses a different API that produces a LiveDump without checking any other conditions; cluster uses Windows Error Reporting, and Windows Error Reporting throttles LiveDump creation. We use WER because it manages disk space consumption for us, and it also sends telemetry information about the incident to Microsoft, where we can see what issues are affecting customers. This helps us prioritize and strategize fixes. Starting with Windows Server 2016 you can control WER telemetry through the common telemetry settings; before that, there was a separate control panel applet to control what WER is allowed to share with Microsoft.

By default, Windows Error Reporting will allow only one LiveDump per report type per 7 days and only one LiveDump per machine per 5 days. You can change that by setting the following registry keys:

reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v SystemThrottleThreshold /t REG_DWORD /d 0 /f
reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v ComponentThrottleThreshold /t REG_DWORD /d 0 /f

Once a LiveDump is created, WER launches a user-mode process that creates a minidump from the LiveDump, and immediately afterwards deletes the LiveDump. The minidump is only a couple hundred kilobytes, but unfortunately it is not helpful, because it contains the call stack of only the thread that invoked LiveDump creation, and we need all the other threads in the kernel to track down where we are stuck. You can tell WER to keep the original LiveDumps using these two registry keys:

reg add "HKLM\Software\Microsoft\Windows\Windows Error Reporting\FullLiveKernelReports" /v FullLiveReportsMax /t REG_DWORD /d 10 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\CrashControl" /v AlwaysKeepMemoryDump /t REG_DWORD /d 1 /f

Set FullLiveReportsMax to the number of dumps you want to keep; the decision on how many to keep depends on how much free space you have and the size of a LiveDump.
You need to reboot the machine for the Windows Error Reporting registry keys to take effect.
LiveDumps created by Windows Error Reporting are located in the %SystemDrive%\Windows\LiveKernelReports folder.
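To check what has been collected, you can list that folder; a sketch in PowerShell:

```powershell
# List any LiveDump files WER has saved (path from the paragraph above)
Get-ChildItem "$env:SystemDrive\Windows\LiveKernelReports" -Recurse
```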

Windows Server 2016

In Windows Server 2016, Failover Cluster LiveDump creation is on by default. You can turn it on or off by manipulating the lowest bit of the cluster DumpPolicy public property. By default, this bit is set, which means the cluster is allowed to generate LiveDumps.

PS C:\Windows\system32> (get-cluster).DumpPolicy
1118489

If you set this bit to 0 then cluster will stop generating LiveDumps.

PS C:\Windows\system32> (get-cluster).DumpPolicy=1118488

You can set it back to 1 to enable it again

PS C:\Windows\system32> (get-cluster).DumpPolicy=1118489

Changes take effect immediately on all cluster nodes. You do NOT need to reboot the cluster nodes.

Here is the list of LiveDump report types generated by cluster. Dump files will have report type string as a prefix.

Report Type   Description
CsvIoT        A CSV volume AutoPaused due to STATUS_IO_TIMEOUT, and the cluster on the coordinating node created a LiveDump
CsvStateIT    CSV state transition to Init state is taking too long
CsvStatePT    CSV state transition to Paused state is taking too long
CsvStateDT    CSV state transition to Draining state is taking too long
CsvStateST    CSV state transition to SetDownLevel state is taking too long
CsvStateAT    CSV state transition to Active state is taking too long

You can learn more about CSV state transition in this blog post:

Following is the list of LiveDump report types that cluster generates when cluster resource call is taking too long

Report Type   Description
ClusResCO     Cluster resource Open call is taking too long
ClusResCC     Cluster resource Close call is taking too long
ClusResCU     Cluster resource Online call is taking too long
ClusResCD     Cluster resource Offline call is taking too long
ClusResCK     Cluster resource Terminate call is taking too long
ClusResCA     Cluster resource Arbitrate call is taking too long
ClusResCR     Cluster resource Control call is taking too long
ClusResCT     Cluster resource Type Control call is taking too long
ClusResCI     Cluster resource IsAlive call is taking too long
ClusResCL     Cluster resource LooksAlive call is taking too long
ClusResCF     Cluster resource Fail call is taking too long

You can learn more about cluster resource state machine in these two blog posts:

You can control what resource types will generate LiveDumps by changing value of the first bit of the resource type DumpPolicy public property. Here are the default values:

C:\> Get-ClusterResourceType | ft Name,DumpPolicy

Name                                DumpPolicy
----                                ----------
Cloud Witness                       5225058576
DFS Replicated Folder               5225058576
DHCP Service                        5225058576
Disjoint IPv4 Address               5225058576
Disjoint IPv6 Address               5225058576
Distributed File System             5225058576
Distributed Network Name            5225058576
Distributed Transaction Coordinator 5225058576
File Server                         5225058576
File Share Witness                  5225058576
Generic Application                 5225058576
Generic Script                      5225058576
Generic Service                     5225058576
Health Service                      5225058576
IP Address                          5225058576
IPv6 Address                        5225058576
IPv6 Tunnel Address                 5225058576
iSCSI Target Server                 5225058576
Microsoft iSNS                      5225058576
MSMQ                                5225058576
MSMQTriggers                        5225058576
Nat                                 5225058576
Network File System                 5225058577
Network Name                        5225058576
Physical Disk                       5225058577
Provider Address                    5225058576
Scale Out File Server               5225058577
Storage Pool                        5225058577
Storage QoS Policy Manager          5225058577
Storage Replica                     5225058577
Task Scheduler                      5225058576
Virtual Machine                     5225058576
Virtual Machine Cluster WMI         5225058576
Virtual Machine Configuration       5225058576
Virtual Machine Replication Broker  5225058576
Virtual Machine Replication Coor... 5225058576
WINS Service                        5225058576

By default, Physical Disk resources will produce LiveDumps. You can disable that by setting the lowest bit to 0. Here is an example of how to do that for the Physical Disk resource:

(Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058576

Later on, you can enable it again:

(Get-ClusterResourceType -Name "Physical Disk").DumpPolicy=5225058577

Changes take effect immediately on all new calls; there is no need to offline/online the resource or restart the cluster.

The last group is the report types that the cluster service generates when it observes that some operations are taking too long.

Report Type    Description
ClusWatchDog   Cluster service watchdog

Windows Server 2012 R2

We had such a positive experience troubleshooting issues using LiveDump on Windows Server 2016 that we’ve backported a subset of the functionality to Windows Server 2012 R2. You need to make sure that you have all the recommended patches outlined here. On Windows Server 2012 R2, LiveDump is not generated by default; it can be enabled using the following PowerShell command:

Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 1

LiveDump can be disabled using the following command:

Get-Cluster | Set-ClusterParameter -create LiveDumpEnabled -value 0
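To check the current setting on an existing cluster, the corresponding read looks roughly like this (a sketch; the parameter only appears after it has been created with one of the commands above):

```powershell
# Query the LiveDumpEnabled private property on the cluster object
Get-Cluster | Get-ClusterParameter -Name LiveDumpEnabled
```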

Only CSV report types were backported; as a result, you will not see LiveDumps from cluster resource calls or the cluster service watchdog.  Windows Error Reporting throttling will also need to be adjusted, as discussed above.

CSV AutoPause due to STATUS_IO_TIMEOUT (c00000b5)

Let’s see how LiveDump helps troubleshoot this issue. In the blog post https://blogs.msdn.microsoft.com/clustering/2014/12/08/troubleshooting-cluster-shared-volume-auto-pauses-event-5120/ we discussed that it is usually caused by an IO on the coordinating node taking a long time. As a result, CSVFS on a non-coordinating node gets the error STATUS_IO_TIMEOUT and notifies the cluster service about that event. The cluster service creates a LiveDump with report type CsvIoT on the coordinating node where the IO is taking a long time. If we are lucky and the IO has not completed before the LiveDump is generated, we can load the dump in WinDbg, try to find the IO that is taking a long time, and see who owns it.

Thanks!
Vladimir Petter
Principal Software Engineer
High-Availability & Storage
Microsoft

 

Additional Resources:

To learn more, here are others in the Cluster Shared Volume (CSV) blog series:

Cluster Shared Volume (CSV) Inside Out
http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx

Cluster Shared Volume Diagnostics
http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

Cluster Shared Volume Performance Counters
http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

Cluster Shared Volume Failure Handling
http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx

Troubleshooting Cluster Shared Volume Auto-Pauses – Event 5120
http://blogs.msdn.com/b/clustering/archive/2014/12/08/10579131.aspx

Troubleshooting Cluster Shared Volume Recovery Failure – System Event 5142
http://blogs.msdn.com/b/clustering/archive/2015/03/26/10603160.aspx

Cluster Shared Volume – A Systematic Approach to Finding Bottlenecks
https://blogs.msdn.microsoft.com/clustering/2015/07/29/cluster-shared-volume-a-systematic-approach-to-finding-bottlenecks/

Failover Cluster Node Fairness in Windows Server 2016


Windows Server 2016 introduces the Node Fairness feature to optimize the utilization of nodes in a Failover Cluster. During the lifecycle of your private cloud, certain operations (such as rebooting a node for patching) result in the Virtual Machines (VMs) in your cluster being moved. This could result in an unbalanced cluster where some nodes are hosting more VMs while others are underutilized (such as a freshly rebooted server). The Node Fairness feature seeks to identify overcommitted nodes and re-distribute VMs from those nodes. VMs are live migrated to idle nodes with no downtime. Failure policies such as anti-affinity, fault domains and possible owners are honored. Thus, the Node Fairness feature seamlessly balances your private cloud.

Heuristics for Balancing

Node Fairness evaluates a node’s load based on the following heuristics:

  1. Current Memory pressure: Memory is the most common resource constraint on a Hyper-V host
  2. CPU utilization of the Node averaged over a 5 minute window: Mitigates a node in the cluster becoming overcommitted

Controlling Aggressiveness of Balancing

The aggressiveness of balancing based on the Memory and CPU heuristics can be configured using the cluster common property ‘AutoBalancerLevel’. To control the aggressiveness, run the following in PowerShell:

(Get-Cluster).AutoBalancerLevel = <value>
AutoBalancerLevel   Aggressiveness   Behavior
1 (default)         Low              Move when host is more than 80% loaded
2                   Medium           Move when host is more than 70% loaded
3                   High             Move when host is more than 60% loaded

 

NodeFairness

Controlling Node Fairness

Node Fairness is enabled by default, and when load balancing occurs can be configured with the cluster common property ‘AutoBalancerMode’. To control when Node Fairness balances the cluster:

Using Failover Cluster Manager:

  1. Right-click on your cluster name and select the “Properties” option

Image1

2.  Select the “Balancer” pane

Image2

Using PowerShell:

Run the following:

(Get-Cluster).AutoBalancerMode = <value>
AutoBalancerMode   Behavior
0                  Disabled
1                  Load balance on node join
2 (default)        Load balance on node join and every 30 minutes
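For example, to balance only when a node joins and to raise the aggressiveness to Medium, the two common properties can be set together (a sketch; run on any cluster node, values taken from the tables in this post):

```powershell
# Query the current settings
(Get-Cluster).AutoBalancerMode
(Get-Cluster).AutoBalancerLevel

# Balance only on node join, with Medium aggressiveness
(Get-Cluster).AutoBalancerMode = 1
(Get-Cluster).AutoBalancerLevel = 2
```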

 

Node Fairness vs. SCVMM Dynamic Optimization

The Node Fairness feature provides in-box functionality targeted at deployments without System Center Virtual Machine Manager (SCVMM). For SCVMM deployments, SCVMM Dynamic Optimization is the recommended mechanism for balancing virtual machine load in your cluster. SCVMM automatically disables the Node Fairness feature when Dynamic Optimization is enabled.

Speeding Up Failover Tips-n-Tricks


From time to time people ask me for suggestions on what tweaks they can make to get a Windows Server Failover Cluster to fail over faster. In this blog I’ll discuss a few tips-n-tricks.

  1. Disable NetBIOS over TCP/IP – Unless you need legacy OS compatibility, NetBIOS does nothing but slow you down.  You want to disable NetBIOS in a couple of different places:
    1. Every Cluster IP Address resource – Here is the syntax (again, this needs to be set on all IP Address resources).  Note: NetBIOS is disabled on all Cluster IP Addresses by default in Windows Server 2016.
      Get-ClusterResource “Cluster IP address” | Set-ClusterParameter EnableNetBIOS 0
    2. Base Network Interfaces – In the Advanced TCP/IP Settings, go to the WINS tab and select “Disable NetBIOS over TCP/IP”.  This needs to be done on every network interface.
      NetBIOS
  2. Go Pure IPv6 – Going pure IPv6 will give faster failover as a result of optimizations in how Duplicate Address Detection (DAD) works in the TCP/IP stack.
  3. Avoid IPsec on Servers – Internet Protocol Security (IPsec) is a great security feature, especially for client scenarios, but it comes at a cost and really shouldn’t be used on servers. Specifically, enabling a single IPsec policy will reduce overall network performance by ~30% and significantly delay failover times.
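To apply the first tip across every IP Address resource in the cluster at once, something along these lines works (a sketch; it assumes the default resource type name “IP Address”, and the change typically takes effect the next time each resource is brought online):

```powershell
# Disable NetBIOS on every clustered IP Address resource
Get-ClusterResource |
    Where-Object { $_.ResourceType -eq "IP Address" } |
    ForEach-Object { $_ | Set-ClusterParameter EnableNetBIOS 0 }
```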

Those are a few things I’ve found you can do to speed up failover and reduce downtime.

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

NetFT Virtual Adapter Performance Filter


In this blog I will discuss what the NetFT Virtual Adapter Performance Filter is and the scenarios when you should or should not enable it.

The Microsoft Failover Cluster Virtual Adapter (NetFT) is a virtual adapter used by the Failover Clustering feature to build fault tolerant communication routes between nodes in a cluster for intra-cluster communication.

When the Cluster Service communicates with another node in the cluster, it sends data over TCP down to the NetFT virtual adapter.  NetFT then sends the data over UDP down to the physical network card, which sends it over the network to the other node.  See the diagram below:

NetFT

When the data is received by the other node, it follows the same flow in reverse: up the physical adapter, then to NetFT, and finally up to the Cluster Service.  The NetFT Virtual Adapter Performance Filter is a filter in Windows Server 2012 and Windows Server 2012 R2 that inspects traffic inbound on the physical NIC and reroutes cluster traffic addressed to NetFT directly to the NetFT driver.  This bypasses the physical NIC’s UDP/IP stack and delivers increased cluster network performance.

NetFTPerfFilter

When to Enable the NetFT Virtual Adapter Performance Filter

The NetFT Virtual Adapter Performance Filter is disabled by default because it can cause issues with Hyper-V clusters that have a guest cluster running in VMs on top of them.  Issues have been seen where the NetFT Virtual Adapter Performance Filter on the host incorrectly routes NetFT traffic bound for a guest VM to the host itself.  This can result in communication issues with the guest cluster in the VM.  More details can be found in this article:

https://support.microsoft.com/en-us/kb/2872325

If you are deploying any workload other than Hyper-V with guest clusters, enabling the NetFT Virtual Adapter Performance Filter will optimize and improve cluster performance.
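The filter is exposed as a binding on each physical network adapter, so one way to inspect and enable it is via the NetAdapter cmdlets. This is a sketch only: the display-name wildcard below is an assumption, so verify the exact binding name with Get-NetAdapterBinding on your own nodes first.

```powershell
# List the binding and its current state on every adapter (verify the name)
Get-NetAdapterBinding |
    Where-Object { $_.DisplayName -like "*Failover Cluster Virtual Adapter Performance Filter*" }

# Enable it - only on clusters that do NOT host Hyper-V guest clusters
Get-NetAdapter -Physical |
    Enable-NetAdapterBinding -DisplayName "*Failover Cluster Virtual Adapter Performance Filter*"
```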

NetFTPerfFilter2

Windows Server 2016

Due to changes in the networking stack in Windows Server 2016, the NetFT Virtual Adapter Performance Filter has been removed.

Thanks!
Elden Christensen
Principal PM Manager
High-Availability & Storage
Microsoft

Using a PowerShell script to make any application highly available


Author:
Amitabh Tamhane
Senior Program Manager
Windows Server, Microsoft

OS releases: Applicable to Windows Server 2008 R2 or later

Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!!

The Generic Script is a built-in resource type included in Windows Server Failover Clusters. Its advantage is flexibility: you can make applications highly available by writing a simple script. For instance, you can make any PowerShell script highly available! Interested?

We created GenScript in ancient times and it supports only Visual Basic scripts – even in Windows Server 2016. This means you can’t directly configure a PowerShell script as a GenScript resource. However, in this blog post, I’ll walk you through a sample Visual Basic script – and associated PowerShell scripts – to build a custom GenScript resource that works well with PowerShell.

Pre-requisites: This blog assumes you have a basic understanding of Windows Server Failover Clustering and its built-in resource types.

Disclaimer: Microsoft does not intend to officially support any source code/sample scripts provided as part of this blog. This blog is written only for a quick walk-through on how to run PowerShell scripts using GenScript resource. To make your application highly available, you are expected to modify all the scripts (Visual Basic/PowerShell) as per the needs of your application.

Visual Basic Shell

It so happens that a Visual Basic script can call a PowerShell script, pass it parameters, and read its output. Here’s a Visual Basic shell sample that uses some custom private properties:

 

'<your application name> Resource Type

Function Open( )
    Resource.LogInformation "Enter Open()"

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.AddProperty("PSScriptsPath")
    End If

    If Resource.PropertyExists("Name") = False Then
        Resource.AddProperty("Name")
    End If

    If Resource.PropertyExists("Data1") = False Then
        Resource.AddProperty("Data1")
    End If

    If Resource.PropertyExists("Data2") = False Then
        Resource.AddProperty("Data2")
    End If

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.AddProperty("DataStorePath")
    End If

    '...Result...
    Open = 0

    Resource.LogInformation "Exit Open()"
End Function


Function Online( )
    Resource.LogInformation "Enter Online()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Online = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Online = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Online.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Online PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "Online Success"
        Online = 0
    Else
        Resource.LogInformation "Online Error"
        Online = 1
    End If

    Resource.LogInformation "Exit Online()"
End Function

Function Offline( )
    Resource.LogInformation "Enter Offline()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Offline = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Offline = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Offline.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Offline PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "Offline Success"
        Offline = 0
    Else
        Resource.LogInformation "Offline Error"
        Offline = 1
    End If

    Resource.LogInformation "Exit Offline()"
End Function

Function LooksAlive( )
    '...Result...
    LooksAlive = 0
End Function

Function IsAlive( )
    Resource.LogInformation "Entering IsAlive"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        IsAlive = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        IsAlive = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_IsAlive.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling IsAlive PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Resource.LogInformation "IsAlive Success"
        IsAlive = 0
    Else
        Resource.LogInformation "IsAlive Error"
        IsAlive = 1
    End If

    Resource.LogInformation "Exit IsAlive()"
End Function

Function Terminate( )
    Resource.LogInformation "Enter Terminate()"

    '...Check for required private properties...

    If Resource.PropertyExists("PSScriptsPath") = False Then
        Resource.LogInformation "PSScriptsPath is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "PSScriptsPath is " & Resource.PSScriptsPath

    If Resource.PropertyExists("Name") = False Then
        Resource.LogInformation "Name is a required private property."
        Terminate = 1
        Exit Function
    End If
    Resource.LogInformation "Name is " & Resource.Name

    If Resource.PropertyExists("Data1") = False Then
        Resource.LogInformation "Data1 is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data1 is " & Resource.Data1

    If Resource.PropertyExists("Data2") = False Then
        Resource.LogInformation "Data2 is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "Data2 is " & Resource.Data2

    If Resource.PropertyExists("DataStorePath") = False Then
        Resource.LogInformation "DataStorePath is a required private property."
        Terminate = 1
        Exit Function
    End If
    '...Resource.LogInformation "DataStorePath is " & Resource.DataStorePath

    PScmd = "powershell.exe -file " & Resource.PSScriptsPath & "\PS_Terminate.ps1 " & Resource.PSScriptsPath & " " & Resource.Name & " " & Resource.Data1 & " " & Resource.Data2 & " " & Resource.DataStorePath

    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")

    Resource.LogInformation "Calling Terminate PS script= " & PSCmd
    rv = WshShell.Run(PScmd, , True)
    Resource.LogInformation "PS return value is: " & rv

    '...Translate result from PowerShell ...
    '...1 (True in PS) == 0 (True in VB)
    '...0 (False in PS) == 1 (False in VB)
    If rv = 1 Then
        Terminate = 0
    Else
        Terminate = 1
    End If

    Resource.LogInformation "Exit Terminate()"
End Function

Function Close( )
    '...Result...
    Close = 0
End Function

 

Entry Points

In the above sample VB script, the following entry points are defined:

  • Open – Ensures all necessary steps complete before starting your application
  • Online – Function to start your application
  • Offline – Function to stop your application
  • IsAlive – Function to validate your application startup and monitor health
  • Terminate – Function to forcefully cleanup application state (ex: Error during Online/Offline)
  • Close – Ensures all necessary cleanup completes after stopping your application

Each of the above entry points is defined as a function (ex: “Function Online( )”). Failover Cluster then calls these entry point functions as part of the GenScript resource type definition.

Private Properties

For resources of any type, Failover Cluster supports two types of properties:

  • Common Properties – Generic properties that can have unique value for each resource
  • Private Properties – Custom properties that are unique to that resource type. Each resource of that resource type has these private properties.

When writing a GenScript resource, you need to evaluate if you need private properties. In the above VB sample script, I have defined five sample private properties (only as an example):

  • PSScriptsPath – Path to the folder containing PS scripts
  • Name
  • Data1 – some custom data field
  • Data2 – another custom data field
  • DataStorePath – path to a common backend store (if any)

The above private properties are shown as an example only; you are expected to modify the VB script to customize it for your application.

PowerShell Scripts

The Visual Basic script simply connects the Failover Cluster’s RHS (Resource Hosting Subsystem) to the PowerShell scripts. You may notice the “PScmd” variable containing the actual command that is invoked to perform each action (Online, Offline, etc.) by calling into the corresponding PS script.

For this sample, here are the four PowerShell scripts (named to match the VB script above):

  • PS_Online.ps1 – To start your application
  • PS_Offline.ps1 – To stop your application
  • PS_Terminate.ps1 – To forcefully clean up your application
  • PS_IsAlive.ps1 – To monitor the health of your application

Example of PS scripts:

Entry Point: Online

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Online.log"

@"
    Starting Online...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your online script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

Entry Point: Offline

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Offline.log"

@"
    Starting Offline...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your offline script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Entry Point: Terminate

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_Terminate.log"

@"
    Starting Terminate...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your terminate script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Entry Point: IsAlive

Param(
    # Sample properties…
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateNotNullOrEmpty()]
    [string]
    $PSScriptsPath,

    #
    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Name,

    #
    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data1,

    #
    [Parameter(Mandatory=$true, Position=3)]
    [ValidateNotNullOrEmpty()]
    [string]
    $Data2,

    #
    [Parameter(Mandatory=$true, Position=4)]
    [ValidateNotNullOrEmpty()]
    [string]
    $DataStorePath
)

$filePath = Join-Path $PSScriptsPath "Output_IsAlive.log"

@"
    Starting IsAlive...
    Name= $Name
    Data1= $Data1
    Data2= $Data2
    DataStorePath= $DataStorePath
"@ | Out-File -FilePath $filePath

$error.clear()

### Do your isalive script logic here

if ($errorOut -eq $true)
{
    "Error $error" | Out-File -FilePath $filePath -Append
    exit $false
}

"Success" | Out-File -FilePath $filePath -Append
exit $true

 

Parameters

The private properties are passed in as arguments to the PS script. In the sample scripts, these are all string values. You can potentially pass in different value types with more advanced VB script magic.

Note: Another way to simplify this is to write only one PS script in which the entry points are functions, with a single primary function called by the VB script. To achieve this, you can pass in an additional parameter giving the context of the action expected (ex: Online, Offline, etc.).
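A minimal sketch of that single-script approach (the script name PS_Main.ps1 and the $EntryPoint parameter are hypothetical; the VB script would pass the action name as the first argument, followed by the private properties):

```powershell
# PS_Main.ps1 - hypothetical single-script dispatcher for all entry points
Param(
    [Parameter(Mandatory=$true, Position=0)]
    [ValidateSet("Online","Offline","Terminate","IsAlive")]
    [string] $EntryPoint,

    [Parameter(Mandatory=$true, Position=1)]
    [ValidateNotNullOrEmpty()]
    [string] $PSScriptsPath,

    [Parameter(Mandatory=$true, Position=2)]
    [ValidateNotNullOrEmpty()]
    [string] $Name
)

switch ($EntryPoint) {
    "Online"    { <# start your application here #> }
    "Offline"   { <# stop your application here #> }
    "Terminate" { <# force-cleanup your application here #> }
    "IsAlive"   { <# health-check your application here #> }
}

exit $true   # report success back to the VB script
```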

Step-By-Step Walk-Through

Great! Now that you have the VB Shell & Entry Point Scripts ready, let’s make the application highly available…

Copy VB + PS Scripts to Server

It is important to copy the VB script and all PS scripts to a folder on each cluster node. Ensure that the scripts are copied to the same folder on all cluster nodes. In this walk-through, the VB + PS scripts are copied to the “C:\SampleScripts” folder:

Copy Scripts

Create Group & Resource

Using PowerShell:

Create Group Resource

The “ScriptFilePath” private property gets added automatically. This is the path to the VB script file. No other private properties get added at this point (see above).
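The screenshot above corresponds to commands along these lines (a sketch; the group and resource names match the rest of this walk-through):

```powershell
# Create a group (role) and a Generic Script resource inside it
Add-ClusterGroup -Name SampleGroup
Add-ClusterResource -Name SampleResUsingPS -Group SampleGroup -ResourceType "Generic Script"
```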

You can also create Group & Resource using Failover Cluster Manager GUI:

Add Resource

Specify VB Script

To specify the VB script, set the “ScriptFilePath” private property as:

Get Properties - Not Set

When the VB script is specified, the cluster automatically calls the Open entry point in the VB script. In the above VB script, the additional private properties are added as part of the Open entry point.
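In PowerShell, setting the script path looks roughly like the following (the file name GenScript.vbs is hypothetical; use whatever you named your VB script):

```powershell
Get-ClusterResource -Name SampleResUsingPS |
    Set-ClusterParameter -Name ScriptFilePath -Value "C:\SampleScripts\GenScript.vbs"
```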

Configure Private Properties

You can configure the private properties defined for the Generic Script resource as:

Configure Properties

In the above example, “PSScriptsPath” was specified as “C:\SampleScripts”, which is the folder where all my PS scripts are stored. The additional example private properties Name, Data1, Data2 and DataStorePath are set with custom values as well.
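Those property assignments can be scripted as follows (a sketch; the values are placeholders for your own application data):

```powershell
$res = Get-ClusterResource -Name SampleResUsingPS
$res | Set-ClusterParameter -Name PSScriptsPath -Value "C:\SampleScripts"
$res | Set-ClusterParameter -Name Name          -Value "MyApp"
$res | Set-ClusterParameter -Name Data1         -Value "CustomData1"
$res | Set-ClusterParameter -Name Data2         -Value "CustomData2"
$res | Set-ClusterParameter -Name DataStorePath -Value "\\Share\MyAppStore"
```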

At this point, the Generic Script resource using PS scripts is now ready!

Starting Your Application

To start your application, you simply need to bring online the group (ex: SampleGroup) or the resource (ex: SampleResUsingPS). You can start the group or resource using PowerShell as:

Start Application
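The screenshot above corresponds to commands along these lines (a sketch):

```powershell
# Bring the whole group online, then check the resource state
Start-ClusterGroup -Name SampleGroup
Get-ClusterResource -Name SampleResUsingPS
```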

You can use Failover Cluster Manager GUI to start your Group/Role as well:

Start Application GUI

To view your application state in Failover Cluster Manager GUI:

View Online Application GUI

Verify PS script output:

In the sample PS script, the output log is stored in the same directory as the PS script corresponding to each entry point. You can see the output of PS scripts for Online & IsAlive Entry Points below:

Verify Scripts

Awesome! Now, let’s see what it takes to customize the generic scripts for your application.

Customizing Scripts For Your Application

The sample VB script above is a generic shell that any application can reuse. There are a few important things that you may need to edit:

  1. Defining Custom Private Properties: The “Function Open” in the VB script defines sample private properties. You will need to edit it to add/remove private properties for your application.
  2. Validating Custom Private Properties: “Function Online”, “Function Offline”, “Function Terminate” and “Function IsAlive” validate whether the private properties are set (in addition to whether they are required). You will need to edit the validation checks for any private properties added/removed.
  3. Calling the PS scripts: The “PSCmd” variable contains the exact syntax of the PS script that gets called. For any private properties added/removed you would need to edit that PS script syntax as well.
  4. PowerShell scripts: The parameters of the PowerShell scripts would need to be edited for any private properties added/removed. In addition, your application-specific logic would need to be added where indicated by the comments in the PS scripts.

Summary

Now you can use PowerShell scripts to make any application highly available with Failover Clusters!!!

The sample VB script & the corresponding PS scripts allow you to take any custom application & make it highly available using PowerShell scripts.
Thanks,
Amitabh


Failover Clustering @ Ignite 2016


I am packing my bags getting ready for Ignite 2016 in Atlanta, and I thought I would post all the cluster and related sessions you might want to check out.  See you there!
If you couldn’t make it to Ignite this year, don’t worry: you can stream all these sessions online.

Cluster

  • BRK3196 – Keep the lights on with Windows Server 2016 Failover Clustering
  • BRK2169 – Explore Windows Server 2016 Software Defined Datacenter

Storage Spaces Direct for clusters with no shared storage:

  • BRK3088 – Discover Storage Spaces Direct, the ultimate software-defined storage for Hyper-V
  • BRK2189 – Discover Hyper-converged infrastructure with Windows Server 2016
  • BRK3085 – Optimize your software-defined storage investment with Windows Server 2016
  • BRK2167 – Enterprise-grade Building Blocks for Windows Server 2016 SDDC: Partner Offers

Storage Replica for stretched clusters:

  • BRK3072 – Drill into Storage Replica in Windows Server 2016

SQL Clusters

  • BRK3187 – Learn how SQL Server 2016 on Windows Server 2016 are better together
  • BRK3286 – Design a Private and Hybrid Cloud for High Availability and Disaster Recovery with SQL Server 2016

Thanks!
Elden Christensen
Principal PM Manager
High Availability & Storage

Failover Clustering Sets for Start Ordering


StartOrdering

In a private cloud there may be multi-tier applications which are deployed across a set of virtual machines, such as a database running in one VM and an application leveraging that database running in another VM. It may be desirable to have start ordering for highly available virtual machines which have dependencies.

 

Sets:

Virtual machines and other clustered applications are controlled by cluster resources, and those resources are inside of a Cluster Group. A cluster group represents the smallest unit of failover within a cluster.

A new concept is being introduced in Windows Server 2016 called a “Set”. A set can contain one or more groups, and sets can have dependencies on each other. This enables creating dependencies between cluster groups for controlling start ordering. While Sets were primarily designed for virtual machines, they are generic cluster infrastructure that works with any clustered role, such as SQL Server.

Here are some details on how to create and manage Sets.

Basic Set Creation:

To create a new Set and place the App group in it:

PS C:\> New-ClusterGroupSet -Name SetforApp -Group App

Name                : SetforApp
GroupNames          : {App}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

Now create a Set and place the database group in it:

PS C:\> New-ClusterGroupSet -Name SetforDatabase -Group Database

Name                : SetforDatabase
GroupNames          : {Database}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

To view the newly created Sets:

PS C:\> Get-ClusterGroupSet

Name                : SetforApp
GroupNames          : {App}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

Name                : SetforDatabase
GroupNames          : {Database}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

Now let’s add a dependency between the two newly created Sets, so that the App set will depend on the Database set.

PS C:\> Add-ClusterGroupSetDependency -Name SetforApp -ProviderSet SetforDatabase

To view the newly created dependency between the Sets, run the following. You will see that the Set named “SetforApp” contains a single group named “App” and, based on its ProviderNames property, depends on the Set named “SetforDatabase”.

PS C:\> Get-ClusterGroupSetDependency


Name                : SetforApp
GroupNames          : {App}
ProviderNames       : {SetforDatabase}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

Name                : SetforDatabase
GroupNames          : {}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

After completing these steps, the result is that the Database group will be brought online first and once complete there will be a delay of 20 seconds and then the App group will be brought online.

Set Configuration:

With the defaults, dependencies between sets will start the next set 20 seconds after all the groups come online. There are a few configuration settings to modify the start behavior of dependencies between sets:

  • StartupDelayTrigger – This defines what action should trigger the start and can have one of two values
    • Online – Waits until the group has reached an online state
    • Delay – Waits the number of seconds as defined by StartupDelay (default)
  • StartupDelay – This defines a delay time in seconds (default value of 20) which is used if StartupDelayTrigger is set to Delay
  • StartupCount – This defines the number of groups in the set which must have achieved StartupDelayTrigger before the Set is considered started.
    • -1 for all groups in the set (default)
    • 0 for majority of groups in the set
    • N (user defined) for the specific number of groups
      • Note: If N exceeds the number of groups in the set, it effectively results in All behavior.
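As a concrete sketch of these settings (the values below are illustrative, not recommendations), the Database set from the earlier example could be tuned so that it is considered started once a majority of its groups are online, with a 30-second delay before dependent sets start:

```powershell
# Illustrative values: 0 = majority of groups, with a 30-second delay
Set-ClusterGroupSet -Name SetforDatabase -StartupCount 0 -StartupDelay 30
```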

You can view the configuration with the following syntax:

PS C:\> Get-ClusterGroupSetDependency -Name SetforApp

Name                : SetforApp
GroupNames          : {App}
ProviderNames       : {SetforDatabase}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : False
StartupDelay        : 20

A set can be configured for Online with the following syntax:

PS C:\> Set-ClusterGroupSet -name SetforApp -StartupDelayTrigger Online

Infrastructure Groups

There may be some groups which you wish to start before all others, such as a utility VM for example. This might be a VM which runs a domain controller, or a DNS server, or maybe a storage appliance. These infrastructure groups may need to be running before attempting to start any other tenant VM which is running apps. It would be cumbersome to create a set, and make all other sets dependent on it. To simplify this configuration, a single property can be configured on a set.

A set can be marked to start before all others with the following setting:

  • IsGlobal – This defines if the set should start before all other sets

Example of configuring a set:

PS C:\> Set-ClusterGroupSet -name SetforInfra -IsGlobal 1

Now you can see the set is configured as True for IsGlobal.

PS C:\> Get-ClusterGroupSetDependency

Name                : SetforInfra
GroupNames          : {ApplianceVM}
ProviderNames       : {}
StartupDelayTrigger : Delay
StartupCount        : 4294967295
IsGlobal            : True
StartupDelay        : 20

PowerShell Cmdlet Reference

The only interface for VM Start Ordering is PowerShell; there is no Failover Cluster Manager support in Windows Server 2016. Here is a list of all the relevant Set cmdlets:

  • New-ClusterGroupSet
  • Remove-ClusterGroupSet
  • Set-ClusterGroupSet
  • Get-ClusterGroupSet
  • Get-ClusterGroupSetDependency
  • Add-ClusterGroupToSet
  • Add-ClusterGroupSetDependency
  • Remove-ClusterGroupSetDependency
  • Remove-ClusterGroupFromSet
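As a hedged sketch of how these cmdlets compose (the group name “App2” is hypothetical, and the parameter names are assumed to mirror the creation cmdlets shown earlier):

```powershell
# Add another clustered role (group) to an existing set
Add-ClusterGroupToSet -Name SetforApp -Group App2

# Tear down the configuration: remove the dependency, then the sets
Remove-ClusterGroupSetDependency -Name SetforApp -ProviderSet SetforDatabase
Remove-ClusterGroupFromSet -Name SetforApp -Group App2
Remove-ClusterGroupSet -Name SetforApp
```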

Thanks!
Elden Christensen
Principal PM Manager
High Availability & Storage

Deploying IaaS VM Guest Clusters in Microsoft Azure


Authors: Rob Hindman and Subhasish Bhattacharya, Program Managers, Windows Server

In this blog I am going to discuss deployment considerations and scenarios for IaaS VM Guest Clusters in Microsoft Azure.

IaaS VM Guest Clustering in Microsoft Azure


A guest cluster in Microsoft Azure is a Failover Cluster comprised of IaaS VMs. This allows hosted VM workloads to fail over across the guest cluster. This provides a higher availability SLA for your applications than a single Azure VM can provide. It is especially useful in scenarios where the VM hosting a critical application needs to be patched or requires configuration changes.

    SQL Server Failover Cluster Instance (FCI) on Azure

    A sizable SQL Server FCI install base today is on expensive SAN storage on-premises. In the future, we see this install base taking the following paths:

    1. Conversion to virtual deployments leveraging SQL Azure (PaaS): Not all on-premises SQL FCI deployments are a good fit for migration to SQL Azure.
    2. Conversion to virtual deployments leveraging Guest Clustering of Azure IaaS VMs and low-cost software-defined storage technologies such as Storage Replica (SR) and Storage Spaces Direct (S2D): This is the focus of this blog.
    3. Maintaining a physical deployment on-premises while leveraging low cost SDS technologies such as SR and S2D
    4. Preserving the current deployment on-premises


    Deployment guidance for the second path can be found here

    Creating a Guest Cluster using Azure Templates:

    Azure templates reduce the complexity of your deployment and speed your path to production. In addition, they provide a repeatable mechanism to replicate your production deployments. The following are recommended templates to use for your IaaS VM guest cluster deployments to Azure.

    1. Deploying Scale out File Server (SoFS)  on Storage Spaces Direct

      Find template here


    2. Deploying SoFS on Storage Spaces Direct (with Managed Disk)

      Find template here


    3. Deploying SQL Server FCI on Storage Spaces Direct

      Find template here


    4. Deploying SQL Server AG on Storage Spaces Direct

      Find template here


    5. Deploying a Storage Spaces Direct Cluster-Cluster replication with Storage Replica and Managed Disks

      Find template here


    6. Deploying Server-Server replication with Storage Replica and Managed Disks

    Find template here


    Deployment Considerations:

    Cluster Witness:

    It is recommended to use a Cloud Witness for Azure Guest Clusters.
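    A Cloud Witness can be configured with Set-ClusterQuorum; the storage account name and access key below are placeholders for your own Azure storage account:

    ```powershell
    # Placeholder storage account name and access key
    Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-access-key>"
    ```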


    Cluster Authentication:

    There are three options for Cluster Authentication for your guest cluster:

    1. Traditional Domain Controller

      This is the default and predominant cluster authentication model where one or two (for higher availability) IaaS VM Domain Controllers are deployed.


    Azure template to create a new Azure VM with a new AD Forest can be found here


    Azure template to create a new AD Domain with 2 Domain Controllers can be found here


    2. Workgroup Cluster

    A workgroup cluster reduces the cost of the deployment because no DC VMs are required, and it reduces dependencies on Active Directory, simplifying deployment. It is an ideal fit for small deployments and test environments. Learn more here.
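    As a sketch, a workgroup cluster is created by specifying a DNS administrative access point, since there is no Active Directory to hold a computer object (node names below are placeholders):

    ```powershell
    # Create an Active Directory-detached (workgroup) cluster
    New-Cluster -Name MyWorkgroupCluster -Node Node1,Node2 -AdministrativeAccessPoint DNS
    ```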


    3. Using Azure Active Directory

    Azure Active Directory provides a multi-tenant cloud based directory and identity management service which can be leveraged for cluster authentication. Learn more here


    Cluster Storage:

    There are three predominant options for cluster storage in Microsoft Azure:

    1. Storage Spaces Direct


      Creates virtual shared storage across Azure IaaS VMs. Learn more here

    2. Application Replication


    Replicates data at the application layer across Azure IaaS VMs. A typical scenario is seen with SQL Server 2012 (or higher) Availability Groups (AG).

    3. Volume Replication

    Replicates data at the volume layer across Azure IaaS VMs. This is application agnostic and works with any solution. In Windows Server 2016, volume replication is provided in-box with Storage Replica. Third-party solutions for volume replication include SIOS DataKeeper.

    Cluster Networking:

    The recommended approach to configure the IP address for the VCO (for instance, for the SQL Server FCI) is through an Azure load balancer. The load balancer holds the IP address on one cluster node at a time. The below video walks through the configuration of the VCO through a load balancer.
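    As a rough sketch of this pattern (the resource name, network name, IP address, and probe port below are placeholders for your environment), the cluster IP address resource is pointed at the load balancer front-end IP and health probe port:

    ```powershell
    # Placeholder values - replace with your environment's names and addresses
    $ClusterNetworkName = "Cluster Network 1"
    $IPResourceName     = "IP Address for the VCO"   # hypothetical resource name
    $ILBIP              = "10.0.0.100"               # load balancer front-end IP

    Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{
        Address    = $ILBIP
        ProbePort  = 59999        # must match the load balancer health probe port
        SubnetMask = "255.255.255.255"
        Network    = $ClusterNetworkName
        EnableDhcp = 0
    }
    ```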

     

    Storage Space Direct Requirements in Azure:

    • Number of IaaS VMs: A minimum of 2
    • Data Disks attached to VMs:
      • A minimum of 4 data disks per cluster is required, i.e. 2 data disks per VM
      • Data disks must be Premium Azure Storage
      • Minimum data disk size of 512 GB
    • VM Size: The following are the guidelines for minimum VM deployment sizes.
      • Small: DS2_V2
      • Medium: DS5_V2
      • Large: GS5
      • It is recommended to run the DiskSpd utility to evaluate the IOPS provided for a VM deployment size. This will help in planning an appropriate deployment for your production environment. The following video outlines how to run the DiskSpd tool for this evaluation.
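    As an illustrative DiskSpd invocation (all parameters are example values, not tuned recommendations): 8 KB blocks, 30% writes, random I/O, 4 threads with 8 outstanding I/Os each, running for 60 seconds against a 1 GB test file on the shared volume:

    ```powershell
    # Example run; -Sh disables software and hardware caching, -L captures latency stats
    .\diskspd.exe -b8K -d60 -o8 -t4 -r -w30 -Sh -c1G -L C:\ClusterStorage\Volume1\testfile.dat
    ```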

    Using Storage Replica for a File Server

    The following are the workload characteristics for which Storage Replica is a better fit than Storage Spaces Direct for your guest cluster.

    • Large number of small random reads and writes
    • Lots of metadata operations
    • Information Worker features that don’t work with Cluster Shared Volumes.


    UPD using File Share (SoFS) Guest Cluster

    Remote Desktop Services (RDS) requires a domain-joined file server for user profile disks (UPDs). This can be facilitated by deploying a SoFS on a domain-joined IaaS VM guest cluster in Azure. Learn about UPDs and Remote Desktop Services here

    Container Storage Support with Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D), SMB Global Mapping


    By Amitabh Tamhane

    Goals: This topic provides an overview of providing persistent storage for containers with data volumes backed by Cluster Shared Volumes (CSV), Storage Spaces Direct (S2D) and SMB Global Mapping.

    Applicable OS releases: Windows Server 2016, Windows Server RS3


    With Windows Server 2016, many new infrastructure and application workload features were added that deliver significant value to our customers today. Amongst this long list, two very distinct features that were added: Windows Containers & Storage Spaces Direct!

    1.   Quick Introductions

    Let’s review a few technologies that have evolved independently. Together these technologies provide a platform for persistent data store for applications when running inside containers.

    1.1         Containers

    In the cloud-first world, our industry is going through a fundamental change in how applications are being developed & deployed. New applications are optimized for cloud scale, portability & deployment agility. Existing applications are also transitioning to containers to achieve deployment agility.

    Containers provide a virtualized operating system environment where an application can safely & independently run without being aware of other applications running on the same host. With applications running inside containers, customers benefit from the ease of deployment, ability to scale up/down and save costs by better resource utilization.

    More about Windows Containers.

    1.2         Cluster Shared Volumes

    Cluster Shared Volumes (CSV) provides multi-host read/write file system access to a shared disk. Applications can read/write the same shared data from any node of the Failover Cluster. The shared block volume can be provided by various storage technologies such as Storage Spaces Direct (more on it below), traditional SANs, or an iSCSI Target.

    More about Cluster Shared Volumes (CSV).

    1.3         Storage Spaces Direct

    Storage Spaces Direct (S2D) enables highly available & scalable replicated storage amongst nodes by providing an easy way to pool locally attached storage across multiple nodes.

    Create a virtual disk on top of this single storage pool & any node in the cluster can access this virtual disk. CSV (discussed above) seamlessly integrates with this virtual disk to provide read/write shared storage access for any application deployed on the cluster nodes.

    S2D works seamlessly when configured on physical servers or any set of virtual machines. Simply attach data disks to your VMs and configure S2D to get shared storage for your applications. In Azure, S2D can also be configured on Azure VMs that have premium data disks attached for faster performance.

    More about Storage Spaces Direct (S2D). S2D Overview Video.

    1.4         Container Data Volumes

    With containers, any persistent data needed by the application running inside will need to be stored outside of the container or its image. This persistent data can be some shared read-only config state or read-only cached web-pages, or individual instance data (ex: replica of a database) or shared read-write state. A single containerized application instance can access this data from any container host in the fabric or multiple application containers can access this shared state from multiple container hosts.

    With Data Volumes, a folder inside the container is mapped to another folder on the container host using local or remote storage. Using data volumes, applications running inside containers access their persistent data without being aware of the infrastructure storage topology. The application developer can simply assume a well-known directory/path contains the persistent data needed by the application. This enables the same container application to run on various deployment infrastructures.

    2.   Better Together: Persistent Store for Container Fabric

    This data volume functionality is great, but what if a container orchestrator decides to place the application container on a different node? The persistent data needs to be available on all nodes where the container may run. These technologies together provide a seamless way to deliver a persistent store for container fabric.

    2.1         Data Volumes with CSV + S2D

    Using S2D, you can leverage locally attached storage disks to form a single pool of storage across nodes. After the single pool of storage is created, simply create a new virtual disk, and it automatically gets added as a new Cluster Shared Volume (CSV). Once configured, this CSV volume gives you read/write access to the container persistent data shared across all nodes in your cluster.

    With Windows Server 2016 (plus latest updates), we have now enabled support for mapping container data volumes on top of Cluster Shared Volumes (CSV) backed by S2D shared volumes. This gives an application container access to its persistent data no matter where the container orchestrator places the container instance.

    Configuration Steps

    Consider this example (assumes you have Docker & container orchestrator of your choice already installed):

    1. Create a cluster (in this example, a 4-node cluster):

    New-Cluster -Name <name> -Node <list of nodes>

    (Note: The generic warning text above is referring to the quorum witness configuration which you can add later.)

    2. Enable Cluster S2D functionality:

    Enable-ClusterStorageSpacesDirect or Enable-ClusterS2D

    (Note: To get the optimal performance from your shared storage, it is recommended to have SSD cache disks. It is not a must have for getting a shared volume created from locally attached storage.)

    Verify single storage pool is now configured:

    Get-StoragePool S2D*

    3. Create a new virtual disk + CSV on top of S2D:

    New-Volume -StoragePoolFriendlyName *S2D* -FriendlyName <name> -FileSystem CSVFS_REFS -Size 50GB

     

    Verify new CSV volume getting created:

    Get-ClusterSharedVolume

    This shared path is now accessible on all nodes in your cluster.

    4. Create a folder on this volume and write some data:
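    A minimal sketch of this step (the folder and file names are arbitrary examples):

    ```powershell
    # Create a shared folder on the CSV and write a test file into it
    New-Item -ItemType Directory -Path C:\ClusterStorage\Volume1\ContainerData
    Set-Content -Path C:\ClusterStorage\Volume1\ContainerData\hello.txt -Value "Hello from the CSV"
    ```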

    5. Start a container with a data volume linked to the shared path above:

    This assumes you have installed Docker and are able to run containers. Start a container with a data volume:

    docker run -it --name demo -v C:\ClusterStorage\Volume1\ContainerData:G:\AppData nanoserver cmd.exe

    Once started, the application inside this container will have access to “G:\AppData”, which is shared across multiple nodes. Multiple containers started with this syntax can get read/write access to this shared data.

    Inside the container, G:\AppData will be mapped to the CSV volume’s “ContainerData” folder. Any data stored in “C:\ClusterStorage\Volume1\ContainerData” will be accessible to the application running inside the container.

    2.2         Data Volumes with SMB Global Mapping (Available in Windows Server RS3 Only)

    Now what if the container fabric needs to scale independently of the storage cluster? Typically, this is possible through SMB share remote access. With containers, wouldn’t it be great to support container data volumes mapped to a remote SMB share?

    In Windows Server RS3, there is new support for SMB Global Mapping, which allows a remote SMB share to be mapped to a drive letter. This mapped drive is then accessible to all users on the local host. This is required to enable container I/O on the data volume to traverse the remote mount point.

    With Scaleout File Server, created on top of the S2D cluster, the same CSV data folder can be made accessible via SMB share. This remote SMB share can then be mapped locally on a container host, using the new SMB Global Mapping PowerShell.

    Caution: When using SMB global mapping for containers, all users on the container host can access the remote share. Any application running on the container host will also have access to the mapped remote share.

    Configuration Steps

    Consider this example (assumes you have Docker & container orchestrator of your choice already installed):

    1. On the container host, globally map the remote SMB share:

    $creds = Get-Credential

    New-SmbGlobalMapping -RemotePath \\contosofileserver\share1 -Credential $creds -LocalPath G:

    This command will use the credentials to authenticate with the remote SMB server. It then maps the remote share path to the G: drive letter (any other available drive letter can be used). Containers created on this container host can now have their data volumes mapped to a path on the G: drive.

    2. Create containers with data volumes mapped to the local path where the SMB share is globally mapped.
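    A sketch of this step, mirroring the earlier CSV example (the container name and folder names are hypothetical):

    ```powershell
    # Map a folder under the globally mapped G: drive into the container
    docker run -it --name demo2 -v G:\ContainerData:G:\AppData1 nanoserver cmd.exe
    ```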

    Inside the container, G:\AppData1 will then be mapped to the remote share’s “ContainerData” folder. Any data stored on globally mapped remote share will then be accessible to the application running inside the container. Multiple containers started with this syntax can get read/write access to this shared data.

    This SMB Global Mapping support is an SMB client-side feature that can work on top of any compatible SMB server, including:

    • Scaleout File Server on top of S2D or Traditional SAN
    • Azure Files (SMB share)
    • Traditional File Server
    • 3rd party implementation of SMB protocol (ex: NAS appliances)

    Caution: SMB global mapping does not support DFS, DFSN, DFSR shares in Windows Server RS3.

    2.3 Data Volumes with CSV + Traditional SANs (iSCSI, FCoE block devices)

    In Windows Server 2016, container data volumes are now supported on top of Cluster Shared Volumes (CSV), and CSV already works with most traditional block storage devices (iSCSI, FCoE). Mapping container data volumes to CSV therefore enables you to reuse your existing storage topology for your container persistent storage needs.

    How to Switch a Failover Cluster to a New Domain


    In this blog I will describe some new capabilities in Windows Server, version 1709 that enables changing a deployed Failover Cluster from one domain to another.

    For the last two decades, changing the domain membership of a Failover Cluster has always required that the cluster be destroyed and re-created. This is a time-consuming process, and we have worked to improve this.

    This enables scenarios such as building a Failover Cluster in one location and then shipping it to its final location, or moving clusters into a new domain structure after a company merger.

    Moving a Cluster from one domain to another is a straightforward process. To accomplish this, we introduced two new PowerShell cmdlets.

    • New-ClusterNameAccount – creates a Cluster Name Account in Active Directory
    • Remove-ClusterNameAccount – removes the Cluster Name Accounts from Active Directory

    In the following example, this is my setup and goal:

    • 2-node Windows Server, version 1709 Failover Cluster
    • In the Cluster, the Cluster Name is CLUSCLUS and I have a File Server called FS-CLUSCLUS
    • Both nodes are member of the same domain
    • Both nodes and Cluster need to move to a new domain

    The process to accomplish this is to change the cluster from domain membership to a workgroup, and then join it to the new domain. For example:

    Steps to Change Domain Membership

    Create a local Administrator account with the same name and password on all nodes.
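    A sketch of this step using the LocalAccounts cmdlets (the account name is a placeholder; run it on every node with the same password):

    ```powershell
    # Create a temporary local admin account for the domain move
    $password = Read-Host -AsSecureString -Prompt "Temporary local admin password"
    New-LocalUser -Name "ClusterMove" -Password $password -PasswordNeverExpires
    Add-LocalGroupMember -Group "Administrators" -Member "ClusterMove"
    ```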

    Log on to the first node with a domain user account that has Active Directory permissions to the Cluster Name Object (CNO) and Virtual Computer Objects (VCO) and open PowerShell.

    Ensure all cluster Network Name resources are in an Offline state and run the below command to change the type of the Cluster to a workgroup.

    Remove-ClusterNameAccount -Cluster CLUSCLUS -DeleteComputerAccounts

    Use Active Directory Users and Computers to ensure the CNO and VCO computer objects associated with all cluster names have been removed.

    If so, it is a good idea to go ahead and stop the Cluster Service on both nodes and set the service to MANUAL so that it does not start during this process.

    Stop-Service -Name ClusSvc
    
    Set-Service -Name ClusSvc -StartupType Manual

    Change the nodes’ domain membership to a workgroup, reboot, then join them to the new domain, and reboot again.
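    A sketch of the unjoin/join sequence (the domain names and credentials below are placeholders):

    ```powershell
    # Leave the old domain and reboot
    Remove-Computer -UnjoinDomainCredential OLDDOMAIN\Admin -WorkgroupName "WORKGROUP" -Force -Restart

    # After the reboot, join the new domain and reboot again
    Add-Computer -DomainName NEWDOMAINNAME.com -Credential NEWDOMAINNAME\Admin -Restart
    ```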

    Once the nodes are in the new domain, log on to a node with local Administrator account, start the Cluster Service, and set it back to Automatic.

    Start-Service -Name ClusSvc
    
    Set-Service -Name ClusSvc -StartupType Automatic

    Bring the Cluster Name and all other cluster Network Name resources to an Online state.

    Start-ClusterGroup -Name "Cluster Group"
    
    Start-ClusterResource -Name FS-CLUSCLUS

    We now need to make the Cluster part of the new domain, with its associated Active Directory objects. To do this, run the command below. The network name resources must be in an online state.

    New-ClusterNameAccount -Name CLUSTERNAME -Domain NEWDOMAINNAME.com -UpgradeVCOs

    Please note that if you do not have any additional groups with network names (i.e. a Hyper-V Cluster with only virtual machines), the -UpgradeVCOs parameter switch is not needed.

    Use Active Directory Users and Computers to check the new domain and ensure the associated computer objects were created. If they have, then bring the remaining resources in the file server group online.

    Start-ClusterGroup -Name FS-CLUSCLUS

     
