Microsoft Download Center Archive

HPC Pack 2008 R2 SP4 Cumulative Update for Job Scheduling, Cluster Management, and SOA Runtime Issues on Windows Azure

  • Published:
  • Version: 3.4.4236.0
  • Category: Update
  • Language: English

This update fixes several issues in HPC Pack 2008 R2 with SP4 related to job scheduling, cluster management, and the SOA runtime on a cluster with Windows Azure nodes.

  • This update is a cumulative fix for a Microsoft HPC Pack 2008 R2 with Service Pack 4 (SP4) cluster. It supersedes KB2816845 and KB2852726 and includes some additional fixes.

    This update fixes the following issues that may occur when you manage a Microsoft HPC Pack 2008 R2 with SP4 cluster that contains Windows Azure nodes:

    Job Scheduling Issues
    • Job state is incorrect or appears to be “stuck”.
      • Job remains in running state with tasks completed or failed
      • Jobs fail with "parent job cannot be validated" exception
      • Job completes successfully but is marked failed after failover of a high availability (HA) head node
      • Jobs remain in running state after database access errors
      • Job remains in draining state and prevents taking compute nodes offline
      • Job in running state cannot be cancelled
      • Cannot cancel jobs when the compute node is running a CPU intensive job
      • Tasks fail on Windows Azure compute nodes with error message: Exception 'Safe handle has been closed' when creating the task
      • Clusrun jobs fail on compute nodes in Windows Azure
      • Error ‘The password given is too short’ occurs when submitting a job on Windows Azure nodes
    • Issues with exceptions and memory leaks have been addressed.
      • Crash in job scheduler during a large deployment to Windows Azure
      • Job scheduler memory leak after cancelling multiple jobs running in Windows Azure
      • Database timeout exceptions with large deployments to Windows Azure
      • Exception occurs when viewing job state from the command line
      • Exception that the job identifier is invalid when creating a task
      • Error message "Object reference not set to an instance of an object." for a failed job
      • Job fails validation with message “Node AZURECN-xxxx specified in required/requested nodes could not be found. Check the required/requested nodes to ensure the names are correct and try again”
    • An issue with privileges has been addressed.
      • On Windows Azure worker nodes, group ‘users’ is missing the privilege ‘Allow Log on Locally’

    Cluster Management Issues
    • Windows Azure compute nodes fail to deploy or have an incorrect state. This hotfix provides greater tolerance for network latency and failures in communication between HPC services and Windows Azure.
      • Compute nodes in Windows Azure appear unreachable but are available in the Management Portal
      • Windows Azure compute nodes remain in the online state and cannot be deleted or stopped if the head node in a high availability cluster fails
      • The list of Windows Azure compute nodes becomes out of sync between the HPC Management Service and the HPC Job Scheduler Service for multiple deployments if one deployment fails
      • Compute nodes in Windows Azure repeatedly change between the reachable and unreachable states because an incorrect deployment ID is reported when there was a failure creating the deployment and the action is retried
      • There is a long delay between trying to stop a Windows Azure compute node and the operation failing
      • A deployed Windows Azure compute node in an offline state does not come online when an availability policy is enabled after the start time is passed
      • Cannot add Windows Azure compute nodes after high availability failover during a large deployment
      • Configuration package is not applied after a Windows Azure compute node is ready
      • Failure or timeout when uploading proxy certificates to Windows Azure
    • Issues with exceptions and memory leaks have been addressed.
      • Invalid XML in the Windows Azure configuration file when a startup script parameter contains special characters
      • HPC PowerShell cmdlets may leak memory or hang during certain operations
      • A memory leak in hpcmanagement.exe can occur with a large number of node templates
      • Crash in HPC Cluster Manager when reconnecting to an HA head node with the virtual cluster name

    SOA Runtime Issues
    • When Windows Azure nodes start, they fail to synchronize the SOA service package or application packages from Windows Azure Storage. This issue is more likely to occur in deployments with a large number of compute node role instances or a large deployment package to upload. This hotfix makes the synchronization more resilient to failures when accessing Windows Azure Storage.
    • When the SOA session broker is running on a clustered broker node, SOA message level preemption is enabled, and a SOA task is preempted, the task does not exit as expected. When the task cancel grace period expires, the task is stopped by the scheduler. This hotfix resolves the problem by making the task exit gracefully.
    • When SOA session broker is running on a clustered broker node, the autoshrink feature of SOA doesn’t work because the broker fails to make a task exit. This hotfix resolves the problem by making the task exit gracefully.
    • The BrokerResponseEnumerator.MoveNext() method and BrokerResponse.Result property return error message “Heartbeat lost for broker node” when clients using the SOA session API attempt to retrieve more than 632 responses.

    This update fixes the following issue that may occur when you run HPC SOA jobs on a Microsoft HPC Pack 2008 R2 with SP4 cluster that contains HA broker nodes:

    • An HPC SOA job may fail with an “Access Denied” error when running on a cluster with HA broker nodes.

    This update fixes the following issue that may occur when you run a large number of jobs on a Microsoft HPC Pack 2008 R2 with SP4 cluster:

    • Job submission failures with unclear error messages
Knowledge Base Articles:

Files

Status: Deleted

This download is no longer available on microsoft.com. The downloads below are archives provided by the Internet Archive Wayback Machine from the Microsoft Download Center prior to August 2020.

File                 SHA1 Hash                                   Size
KB2867769-x64.exe    77b2713ca74744cb8c555fd992a66ca0ec871b94    34.44 MB
KB2867769-x86.exe    570654754d87b7618d26ac1df5941543c02429cd    2.87 MB
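
Because these files now come from an archive rather than from microsoft.com, it can be worth checking a downloaded copy against the SHA1 hash listed above before running it. The following is a minimal Python sketch of such a check; the function names `sha1_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib

# SHA1 values copied from the Files table on this page.
EXPECTED_SHA1 = {
    "KB2867769-x64.exe": "77b2713ca74744cb8c555fd992a66ca0ec871b94",
    "KB2867769-x86.exe": "570654754d87b7618d26ac1df5941543c02429cd",
}


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, file_name):
    """Return True if the file's SHA1 equals the value listed for `file_name`."""
    return sha1_of_file(path).lower() == EXPECTED_SHA1[file_name].lower()
```

For example, `matches_published_hash(r"C:\Downloads\KB2867769-x64.exe", "KB2867769-x64.exe")` returns True only when the saved file has the listed hash; the download folder in this call is an assumed location.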

System Requirements

Operating Systems: Windows 7, Windows HPC Server 2008 R2, Windows Server 2008 R2

Installation Instructions

  • This update needs to be run on all compute nodes, broker nodes, head nodes, workstation nodes, unmanaged server nodes, and computers running the HPC Pack Client Utilities.

    Perform the following actions before installing the update:
    1. Take all compute, workstation, and unmanaged server nodes offline and wait for all current jobs to drain
    2. Change the availability policy setting in the Windows Azure node template to manual
    3. Stop all existing Windows Azure compute nodes
    4. Close any HPC Cluster Manager and HPC Job Manager applications that are connected to the cluster head node
    5. Back up all HPC databases after all active operations on the cluster have stopped

    Applying the update
    To start the download, click the Download button next to the appropriate file (KB2867769-x64.exe for the 64-bit version, KB2867769-x86.exe for the 32-bit version) and then:
    1. Click Save to copy the download to your computer.
    2. Close any open HPC Cluster Manager or HPC Job Manager windows.
      Note: Any open HPC Cluster Manager or HPC Job Manager may unexpectedly quit or show an error message during the update process if left open. This does not affect installation of the update.
    3. Run the download on the head node by using an administrator account, and reboot the head node.
      Note: If you have HA head nodes, first update the active node, move the node to passive, and after failover occurs update the new active node. Do this for all head nodes in the cluster.
    4. Log on interactively, or use clusrun, to deploy the fix to the compute, broker, workstation, and unmanaged server nodes.

    To use clusrun to deploy an HPC Pack 2008 R2 update on clusters that are running Service Pack 2, or higher:
    1. Copy the appropriate version of the update to a shared folder such as \\headnodename\HPCUpdates.
    2. Open an elevated command prompt window and type the clusrun command for the appropriate version of the update, for example:
      clusrun /nodegroup:ComputeNodes \\headnodename\HPCUpdates\KB2867769-x64.exe -unattend -SystemReboot
      clusrun /nodegroup:BrokerNodes \\headnodename\HPCUpdates\KB2867769-x64.exe -unattend -SystemReboot
      Note: HPC Pack updates, other than Service Packs, are not automatically applied when you add a new node to the cluster or reimage an existing node. You must either manually apply or use clusrun to apply the update after adding or reimaging a node, or modify your node template to include a line to install the appropriate updates from a file share on your head node.
      Note: If the cluster administrator does not have administrative privileges on workstation nodes and unmanaged server nodes, the clusrun utility may not be able to apply the update. In these cases the update should be performed by the administrator of the workstation nodes and unmanaged server nodes.
    3. To complete the update on workstation nodes and unmanaged server nodes, you may need to reboot them.
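
The clusrun commands in step 2 follow a fixed pattern, so they can be generated once per node group. The following is a minimal Python sketch that only builds and prints the command strings; it does not run clusrun. The function name `clusrun_command` and its parameters are hypothetical, while the node group names, share name, and switches are taken from the example commands in the steps above.

```python
def clusrun_command(node_group, head_node,
                    installer="KB2867769-x64.exe", share="HPCUpdates"):
    """Build the clusrun command line from the instructions for one node group."""
    # UNC path to the installer on the head node share, e.g. \\head\HPCUpdates\file.exe
    installer_path = "\\\\{0}\\{1}\\{2}".format(head_node, share, installer)
    return "clusrun /nodegroup:{0} {1} -unattend -SystemReboot".format(
        node_group, installer_path)


# Print one command per node group; "headnodename" is a placeholder.
for group in ("ComputeNodes", "BrokerNodes"):
    print(clusrun_command(group, "headnodename"))
```

Replace `headnodename` with the name of your head node, and pass `installer="KB2867769-x86.exe"` for 32-bit nodes.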

    To update computers that run HPC Pack Client Utilities, perform the following actions:
    1. Stop any HPC client applications including HPC Job Manager and HPC Cluster Manager
    2. Run the update executable
    3. Reboot your client computer


    Uninstalling the update
    To uninstall the update:
    1. Take all compute nodes, workstation nodes, and unmanaged server nodes offline and wait for all current jobs to drain
    2. Change the availability policy setting in the Windows Azure node template to manual
    3. Back up all HPC databases
    4. Stop existing Windows Azure compute nodes. You will need to redeploy Azure nodes after uninstalling the update. You do not need to delete them if you want to redeploy them in the future.
    5. You can uninstall the update in any order across all types of nodes.

    Some updates may apply to more than one piece of HPC Pack software. In order to uninstall those updates, remove them in the following order:
    1. Update for HPC Pack 2008 R2 Services for Excel 2010
    2. Update for HPC Pack 2008 R2 Client Components
    3. Update for HPC Pack 2008 R2 Server Components
    4. Update for HPC Pack 2008 R2 MS-MPI Redistributable Pack
    Note: If you don't follow this order, you might not be able to uninstall the updates because of dependencies across components.
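
The removal order above can be captured as a simple ordered list. The following is a minimal Python sketch that sorts whichever of these updates are installed on a computer into that order; the function name `uninstall_sequence` is hypothetical, and the sketch does not detect or remove anything itself.

```python
# Component updates in the removal order listed in these instructions.
UNINSTALL_ORDER = [
    "Update for HPC Pack 2008 R2 Services for Excel 2010",
    "Update for HPC Pack 2008 R2 Client Components",
    "Update for HPC Pack 2008 R2 Server Components",
    "Update for HPC Pack 2008 R2 MS-MPI Redistributable Pack",
]


def uninstall_sequence(installed):
    """Return the installed updates sorted into the documented removal order."""
    unknown = [name for name in installed if name not in UNINSTALL_ORDER]
    if unknown:
        raise ValueError("Not in the documented order: {0}".format(unknown))
    return sorted(installed, key=UNINSTALL_ORDER.index)
```

Passing the names of the updates present on a computer returns them in the order in which they should be removed.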

    Note: If you have HA head nodes, uninstall the updates on a passive head node, move the node to active, and then repeat uninstallation on the new passive head node. Do this for all head nodes in the cluster.

Related Resources