Microsoft Download Center Archive

HPC Pack 2012 R2 Update 3 Fixes

  • Published:
  • Version: 4.05.5161
  • Category: Update
  • Language: English

This update fixes SOA, Management and Scheduler issues of HPC Pack 2012 R2 Update 3.

  • This update fixes some known issues of HPC Pack 2012 R2 Update 3:

    SOA fixes
    - Removed the 4 MB message size limit - SOA clients can now send requests larger than 4 MB. A large request is split into smaller messages when it is persisted in MSMQ, which has a 4 MB message size restriction;
    - Configurable broker dispatcher capacity - Users can specify the broker dispatcher capacity instead of relying on the value calculated from cores. This gives more accurate grow and shrink behavior when the resource type is node or socket. See the sample below:
          <loadBalancing dispatcherCapacityInGrowShrink="0"/>
       If the value is 0, the dispatcher capacity is calculated automatically from the number of cores. If the value is a positive integer, the dispatcher capacity is the value specified.
       Dispatcher capacity is the number of requests that a service host can handle at a time; by default it is the number of cores the service host occupies. This value can also be set per session via sessionStartInfo.BrokerSettings.DispatcherCapacityInGrowShrink (a scripted sketch follows this list);
    - An optional parameter ‘jobPriority’ has been added to the ExcelClient.OpenSession method for Excel VBA;
    - Added a GPU unit type to the SOA session API so that you can request GPU resources for the SOA job;
    - Fixed an issue where high-availability (HA) broker nodes might not be found by the system because of an AccessViolationException in the session service;
    - Fixed an issue where a SOA job might get stuck in the queued state;
    - Reduced the SOA job queue time in Balanced/Graceful Preemption mode;
    - Fixed an issue where a durable session job might run away when the client sends requests without flushing and then disconnects;
    - Fixed a broker worker crash that could occur in rare situations;
    - Fixed an issue where a session job might stall when Azure worker role nodes are redeployed in a large deployment;
    - Fixed an issue where SOA requests might fail under rare conditions in a large Azure burst deployment;
    - Added ParentJobIds to SessionStartInfo in the SOA session API so that parent jobs can be specified during session creation;
    - Added ServiceHostIdleTimeout for SOA services; the default value is 60 minutes;
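    The per-session broker settings and parent job list mentioned above can also be set through the SOA session .NET API. The following PowerShell sketch is illustrative only; the assembly path, head node name, service name, job id, and the collection semantics of ParentJobIds are assumptions to verify against the HPC Pack SDK documentation:
          Add-Type -Path "$env:CCP_HOME\bin\Microsoft.Hpc.Scheduler.Session.dll"    # assumed SDK assembly location
          $startInfo = New-Object Microsoft.Hpc.Scheduler.Session.SessionStartInfo("MyHeadNode", "EchoService")
          $startInfo.BrokerSettings.DispatcherCapacityInGrowShrink = 4              # override the per-core default for this session
          $startInfo.ParentJobIds.Add(1234)                                         # assumed collection usage; make the session job wait for job 1234
          $session = [Microsoft.Hpc.Scheduler.Session.Session]::CreateSession($startInfo)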

    Scheduler and API fixes
    - Fixed an overflow in the AllocationHistory table; this fix requires SQL Server 2012 or later;
    - Added the cluster property JobCleanUpDayOfWeek to specify on which day of the week HPC Pack cleans up the scheduler database. For example, to have the service clean up jobs every Saturday, an administrator can run:
          Set-HPCClusterProperty -JobCleanUpDayOfWeek "Saturday"
    - Fixed an issue where a task might fail with a “The parameter is incorrect” message on both on-premises and Azure HPC IaaS clusters;
    - Fixed a scheduler crash during startup;
    - Enabled GPU-related metrics;
    - Improved error handling in the Linux node manager;
    - Fixed a deadlock when finishing a job or task that could block scheduling for the whole cluster;
    - Fixed an issue where a job stuck in the canceling state would not release resources to other jobs, blocking the whole cluster;
    - Improved performance (added several SQL indexes) when there is a large amount of historical data;
    - Added the cluster property DisableResourceValidation. An administrator can set this value to true to skip validating whether a job’s resource requirements can be met by the currently available resources. This allows users to submit jobs to a cluster before any resources have been added or provisioned. To change the setting:
          Set-HPCClusterProperty -DisableResourceValidation $newValue
    - Included job modifications in job audit events. To see all job modifications and activities, view the “Activity Log” in the job management UI or the output of the command “job view <jobid> /history”;
    - Added a new job action “Hold” in the job GUI. You can now hold a running job so that no new resources are allocated to it; the job stays in the “Draining” state while it still has active tasks running;
    - Fixed an issue where the node release task might be skipped for an exclusive job;
    - Fixed an issue where clusrun might fail to get output from Azure compute nodes because the compute node IP addresses change during auto grow shrink;
    - Task execution filter - Added a task execution filter for Linux compute nodes that calls administrator-customized scripts each time a task is executed on a Linux node. This enables scenarios such as executing tasks under an Active Directory account on Linux nodes and mounting a user’s home folder for task execution. For more information, see “Get started with HPC Pack task execution filter”;
    - Maximum task memory - You can set the environment variable ‘CCP_MAXIMUMMEMORY’ on a task; the task is then marked as failed if it tries to exceed this memory limit on a Windows compute node. This setting currently does not apply to Linux compute nodes (a command-line sketch follows at the end of this list);
    - Task-level node group - Added initial support for specifying node group information on individual tasks instead of at the job level. Keep the following in mind when using this feature:
          1. It is recommended to use this feature in Queued scheduling mode.
          2. You can assign only one requested node group per task, and you should not specify node groups on the job itself.
          3. The node groups you specify for the tasks within a job should not overlap.
          4. The task’s requested node group can currently be specified through the scheduler API, the job GUI, or the CLI.
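    The following is a minimal sketch of the CCP_MAXIMUMMEMORY setting described above, using the HPC job CLI from a PowerShell prompt on the head node or a client. The job name, job id 42, application name, and the 2048 MB limit are illustrative; check "job add /?" for the exact option syntax on your cluster:
          job new /jobname:MemCapDemo                         # prints the id of the new job; assume it is 42
          job add 42 /env:CCP_MAXIMUMMEMORY=2048 MyApp.exe    # mark the task failed if it tries to exceed 2048 MB on a Windows node
          job submit /id:42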

    Management fixes
    - Fixed a socket exhaustion issue when AzureStorageConnectionString is not configured correctly;
    - Fixed an issue where the SDM service could sometimes consume 100% of the CPU time on the head node;
    - Added ‘Networks’ to the object returned by the ‘Get-HpcNode’ PowerShell cmdlet;
    - Added support for new Azure role sizes in Azure bursting, including the Av2 and H series;
    - Fixed an issue where an administrator might fail to remove an HPC user whose account had already been removed from Active Directory;
    - Added GPU support on workstation nodes as well as Linux nodes;
    - Fixed an issue with selecting the OS version when adding an Azure Batch pool in HPC Pack;
    - Fixed an issue where the heat map sometimes showed as empty;
    - Improved the auto grow shrink script to bring a node online as soon as it starts, instead of waiting for all nodes to reach the OK state, so the node can accept jobs as soon as it is started;
    - The auto grow shrink script now supports growing and shrinking compute nodes created with VM scale sets;
    - Fixed an issue where the auto grow shrink script sometimes did not use the certificate for Azure authentication even when a certificate was configured;
    - Fixed an issue where Export-HpcConfiguration.ps1 exported the built-in “LinuxNode Template”, which should not be exported;
    - Added support for the excludeNodeGroups property in the built-in auto grow shrink; you can specify node groups whose nodes should be excluded from the auto grow shrink logic;
    - Added an option to stop a node from syncing HPC cluster administrators to its local Administrators group. To do this, add the following value on the target node under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\HPC (a PowerShell sketch follows):
           Name: DisableSyncWithAdminGroup
           Type: REG_DWORD
           Data: 1
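    For example, the value can be created with PowerShell in an elevated session on the target node (or pushed out with clusrun); this sketch uses the documented key, name, type, and data:
           New-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\HPC" -Name "DisableSyncWithAdminGroup" -PropertyType DWord -Value 1 -Force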

    GUI Improvements
    - Show the total cores in use, running jobs, and running tasks in the heat map view status bar when no node is selected;
    - You can now copy “allocated nodes” from the job/task detail page;
    - Custom properties page - In the Job dialog, you can now view and edit a job’s custom properties. If a property value is a link, the link is displayed on the page and can be clicked. If you would like a file location to be clickable as well, use the format file:///<location>, for example file:///c:/users;
    - Substitution of mount point - When a task is executed on a Linux node, the user usually can’t open the working directory. Within the job management UI you can now substitute the mount point by specifying the job custom properties linuxMountPoint and windowsMountPoint so that the user can access the folder as well (a scripted sketch follows this list). For example, you can create a job with the following settings:
          Custom Property: linuxMountPoint = /gpfs/Production
          Custom Property: windowsMountPoint = Z:\Production
          Task Working Directory: /gpfs/Production/myjob
    Then when you view the job in the GUI, the working directory value in the Job dialog > View Tasks page > Details tab will be z:\production\myjob, and if you previously mounted /gpfs to your local Z: drive, you will be able to view the job output file.
    - Set subscribed information for a node - An administrator can set a node’s subscribed cores or sockets from the GUI: select offline nodes and perform the Edit Properties action;
    - No copy job - If you set the job custom property noGUICopy to true, the Copy action in the GUI is disabled;
    - Improved HPC Job Manager heat map performance when there are more than 1,000 nodes;
    - Added support for copying multiple SOA jobs from the SOA job view in the correct format;
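    If you prefer to set such custom properties from a script rather than the GUI, the following PowerShell sketch uses the scheduler .NET API; the assembly path and head node name are assumptions to adapt to your installation:
          Add-Type -Path "$env:CCP_HOME\bin\Microsoft.Hpc.Scheduler.dll"    # assumed assembly location
          $scheduler = New-Object Microsoft.Hpc.Scheduler.Scheduler
          $scheduler.Connect("MyHeadNode")
          $job = $scheduler.CreateJob()
          $job.SetCustomProperty("linuxMountPoint", "/gpfs/Production")
          $job.SetCustomProperty("windowsMountPoint", "Z:\Production")
          $scheduler.AddJob($job)                                           # add the job without submitting it; submit it when ready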

    REST API and WebComponent fixes
    - Added a new REST API, Info/DateTimeFormat, to query the DateTime format used by the HPC Pack REST server so that the client side can parse DateTime values in the correct format (a sketch follows this list);
    - Improved job searching in the HPC Web Portal. Now, to get all jobs whose names contain “MyJobName”, specify “%MyJobName” in the search box;
    - Added new OData filter parameters “TaskStates”, “TaskIds”, and “TaskInstanceIds” to the REST API GetTaskList;
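    As a minimal sketch, the new endpoint and filters can be called with Invoke-RestMethod from PowerShell. The base URL, resource paths, and authentication details below are assumptions; consult the HPC Pack web service API reference for the exact URIs in your deployment:
          $base = "https://myheadnode/WindowsHpc"                                            # assumed REST endpoint of the HPC Pack web service
          $cred = Get-Credential
          Invoke-RestMethod -Uri "$base/Info/DateTimeFormat" -Credential $cred               # query the DateTime format used by the server
          Invoke-RestMethod -Uri "$base/Job/42/Tasks?TaskStates=Failed" -Credential $cred    # list only the failed tasks of job 42 (id is illustrative)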

    Note: This QFE was refreshed in early June 2017 to address patching issues when an existing cluster uses a remote database with a customized scheduler database name.

    This QFE supersedes the earlier QFE 3134307, QFE 3147178, and QFE 3161422.

    Known issue:
    Installation of the upgrade package KB3189996 does not support SQL Server 2008 R2 or earlier.

Files

Status: Live

This download is still available on microsoft.com. The downloads below will come directly from the Microsoft Download Center.

File                   SHA1 Hash                                   Size
HpcWebComponents.msi   a82fd8527f394d32e521204643db55081d517e41    787 KB
KB3189996-x64.exe      f7bd3413fea9c8ba8aa505324771e61283d1db66    62.13 MB
KB3189996-x86.exe      0d01ba21644125bc15a8a94a34ef27d0816e96cf    62.07 MB

File sizes and hashes are retrieved from the Wayback Machine’s indexes. They may not match the latest versions of files hosted on Microsoft servers.

System Requirements

Operating Systems: Windows 7, Windows 8, Windows 8.1, Windows Server 2012, Windows Server 2012 R2

  • HPC Pack 2012 R2 Update 3 (build 4.5.5079.0) with or without QFE 3134307/3147178/3161422 (build 4.5.5094.0/4.5.5102.0/4.5.5111) installed.

Installation Instructions

  • This update needs to be run on all head nodes, broker nodes, workstation nodes, compute nodes and clients

    Before applying the update, check that HPC Pack 2012 R2 Update 3 is installed. The version number (in HPC Cluster Manager, click Help > About) should be 4.5.5079.0 (or 4.5.5094.0/4.5.5102.0/4.5.5111 if an earlier QFE is installed). Take all nodes offline and ensure that all active jobs are finished or canceled. If there are Azure nodes, make sure they are stopped before applying this patch. After all active operations on the cluster have stopped, back up the head node (or head nodes) and all HPC databases by using a backup method of your choice.

    Important:
    The upgrade package KB3189996 does not support uninstallation. After you upgrade, if you want to restore to HPC Pack 2012 R2 Update 3, you must completely uninstall the HPC Pack 2012 R2 Update 3 features from the head node computer and the other computers in your cluster, then reinstall HPC Pack 2012 R2 Update 3 and restore the data in the HPC databases.

    Applying the update

    To start the download, click the Download button next to the appropriate file (KB3189996-x64.exe for the 64-bit version, KB3189996-x86.exe for the 32-bit version) and then:

    1. Click Save to copy the download to your computer.
    2. Close any open HPC Cluster Manager or HPC Job Manager windows.
    Note: Any instances of HPC Cluster Manager or HPC Job Manager that are left open may unexpectedly quit or show an error message during the update process. This does not affect installation of the update.
    3. Run the download on the head node using an administrator account, and reboot the head node.
    Note: If you have high availability head nodes, run the fix on the active node, move the node to passive, and after failover occurs run the fix on the new active node. Do this for all head nodes in the cluster.
    4. Log on interactively, or use clusrun to deploy the fix to the compute nodes, broker nodes, unmanaged server nodes, and workstation nodes.
    To use clusrun to patch the QFE on the compute nodes, broker nodes, unmanaged server nodes, and workstation nodes:
          a. Copy the appropriate version of the update to a shared folder such as \\headnodename\HPCUpdates.
          b. Open an elevated command prompt window and type the appropriate clusrun command for the operating system of the patch, e.g.:
          clusrun /nodegroup:ComputeNodes \\headnodename\HPCUpdates\KB3189996-x64.exe -unattend -SystemReboot
          clusrun /nodegroup:BrokerNodes \\headnodename\HPCUpdates\KB3189996-x64.exe -unattend -SystemReboot
    Note: HPC Pack updates, other than Service Packs, are not applied automatically when you add a new node to the cluster or re-image an existing node. You must either apply the update manually (or with clusrun) after adding or reimaging a node, or modify your node template to include a line that installs the appropriate updates from a file share on your head node.
    Note: If the cluster administrator doesn’t have administrative privileges on workstation nodes and unmanaged server nodes, the clusrun utility may not be able to apply the update. In these cases the update should be applied by the administrator of the workstation and unmanaged server nodes.
          c. To update workstation nodes and unmanaged server nodes, you may need to reboot.

    5. To update computers that run HPC Pack client applications, apply the following actions:
          a. Stop any HPC client applications including HPC Job Manager and HPC Cluster Manager
          b. Run the update executable
          c. Reboot your client computer

    6. You can run get-hpcpatchstatus.ps1 (under %CCP_HOME%bin) on the head node, compute nodes, and clients to check the patch status. After the update, the client and server versions will be 4.05.5158.0.

    7. To update on premise Linux compute nodes:
          a. Set up a file share to share the update binaries from the head node to the Linux compute nodes (for details, see “Get started with on-premises Linux compute nodes”). Here we assume an SMB share C:\SmbShare has been created on the head node as \\headnodename\SmbShare and mounted on all the Linux compute nodes at /smbshare.
          b. Find the on-premises Linux compute node installation binaries in the following folder: %CCP_DATA%InstallShare\LinuxNodeAgent. Then copy the binaries hpcnodeagent.tar.gz and setup.py into \\headnodename\SmbShare on the head node, and check that the files can be seen at the path /smbshare from the Linux compute nodes.
          c. Run the update command from /smbshare on the Linux compute nodes to update the HPC Pack 2012 R2 Update 3 node manager, e.g.:
                python setup.py -update
          d. (Optional) You can check whether the Linux compute nodes have been updated successfully by running “/opt/hpcnodemanager/nodemanager -v” on the Linux compute nodes; the updated node manager version is 1.7.11.0 (a verification sketch follows these steps).
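    After patching, the following PowerShell sketch, run from an elevated prompt on the head node, verifies the result. The “LinuxNodes” node group name is the built-in default and may differ in your cluster:
          & (Join-Path $env:CCP_HOME "bin\get-hpcpatchstatus.ps1")           # head node (run likewise on compute nodes and clients); version should be 4.05.5158.0
          clusrun /nodegroup:LinuxNodes /opt/hpcnodemanager/nodemanager -v   # Linux nodes should report node manager version 1.7.11.0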

Related Resources