Microsoft Download Center Archive

HPC Pack 2016 Update 1 fixes

  • Published:
  • Version: 5.01.6112.0
  • Category: Application
  • Language: English

This update fixes some issues of HPC Pack 2016 Update 1

  • This update fixes some known issues of HPC Pack 2016 Update 1, as described below

    Scheduler Fixes
     - Fix a “task min unit not specified” error when submitting a job after switching the job’s unit type;
     - Add a new value “KEEP” for the job environment variable HPC_CREATECONSOLE. When this value is specified, a new logon console session is created on the compute nodes if one does not exist (otherwise the existing session is attached), and the console session is kept after the job completes;
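     For example, to opt in for a single job (a hedged sketch, assuming the /jobenv switch of the job command in your HPC Pack version; myapp.exe is a placeholder for your own command):
            job submit /jobenv:HPC_CREATECONSOLE=KEEP myapp.exe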
     - Fix a regression for cross-domain users when running jobs on Windows compute nodes;
     - Fix an issue where the connection of a scheduler object is not closed when the object is garbage-collected;
     - Fix a task failure issue when cgroup is not installed on Linux nodes;
     - We now generate a hostfile or machinefile for Intel MPI, Open MPI, MPICH and other MPI applications on Linux nodes. A host file or machine file containing the node or core allocation information for the MPI application is generated when the rank 0 task starts. Use the job or task environment variable $CCP_MPI_HOSTFILE in the task command to get the file name, and $CCP_MPI_HOSTFILE_FORMAT to specify the format of the host file or machine file. Currently, four formats are supported, shown below for an allocation of N nodes with 4 cores each:
           1. The default host file format:
                 nodename1
                 nodename2
                 …
                 nodenameN
           2. When $CCP_MPI_HOSTFILE_FORMAT=1, the format is:
                 nodename1:4
                 nodename2:4
                 …
                 nodenameN:4
           3. When $CCP_MPI_HOSTFILE_FORMAT=2, the format is like:
                 nodename1 slots=4
                 nodename2 slots=4
                 …
                 nodenameN slots=4
           4. When $CCP_MPI_HOSTFILE_FORMAT=3, the format is like:
                 nodename1 4
                 nodename2 4
                 …
                 nodenameN 4
     Here is an example of how you can use this in an MPI PingPong run:
           source /opt/intel/impi/`ls /opt/intel/impi`/bin64/mpivars.sh && mpirun -f $CCP_MPI_HOSTFILE IMB-MPI1 pingpong
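     If you run Open MPI instead, format 2 (the “slots=” form) matches Open MPI’s hostfile syntax. A hedged sketch, assuming Open MPI’s mpirun is on the PATH of the Linux nodes and CCP_MPI_HOSTFILE_FORMAT=2 has been set as a job or task environment variable:
            mpirun --hostfile $CCP_MPI_HOSTFILE IMB-MPI1 pingpong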
     - Mutual trust for multi-node tasks (usually MPI tasks) on Linux nodes is now set automatically for all users, including cluster administrators. It is no longer required to set extendedData with the “HPCCred.exe” tool;
     - By default, the scheduler uses the job’s RunAs user credential to perform an “Interactive” logon on the compute node, and the “Interactive” logon permission may be denied by your domain policy. We now introduce a new job environment variable "HPC_JOBLOGONTYPE" so that users can specify a different logon type to mitigate the issue. The value of the job environment variable can be set to 2, 3, 4, 5, 7, 8 or 9 as below; for more details, refer to "https://msdn.microsoft.com/en-us/library/windows/desktop/aa378184(v=vs.85).aspx"
          public enum LogonType
          {
                 Interactive = 2,
                 Network = 3,
                 Batch = 4,
                 Service = 5,
                 Unlock = 7,
                 NetworkClearText = 8,
                 NewCredentials = 9,
          }
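     For example, to run a job with a batch logon (value 4) instead of the default interactive logon (a hedged sketch, assuming the /jobenv switch of the job command; myapp.exe is a placeholder for your own command):
            job submit /jobenv:HPC_JOBLOGONTYPE=4 myapp.exe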
     - Fix an issue where a job would be stuck in the Queued state and block other jobs when it meets all of these conditions: the unit type is Node, it runs on a single node, and a node group is assigned;
     - Fix a regression in the activation/submission filter;
     - Enable HTML-formatted email notification;
     - Fix an issue where the HPC Pack 2012 R2 Update 3 scheduler API may not be able to get node and group information from an HPC Pack 2016 Update 1 cluster (by adding back .NET Remoting for the scheduler node service on port 6729);
     - User mapping changes. With this change, cluster administrators are no longer mapped to the Linux “root” user by default. They are instead mapped to a local Linux user with the same name (with or without the domain prefix) on the Linux compute node; that user is a member of the “sudo” group if it was created by the HPC Pack Linux node manager. Only the Windows local system account “NT AUTHORITY\SYSTEM” is mapped to the Linux root user. When you use Clusrun, you can specify this local system account in the HPC Cluster Manager Clusrun GUI or through the Clusrun command “Clusrun /user:system <your command>”. Set the environment variable “CCP_MAP_ADMIN_USER” to “0” to map cluster administrators to the Linux root user as in the previous default behavior; in this case, however, mutual trust for the root user between Linux compute nodes is not set up automatically;
     - A checkbox named ‘Run as local system account "NT AUTHORITY\SYSTEM"’ is added to the Clusrun dialog in HPC Cluster Manager. HPC administrators can run clusrun commands on Linux compute nodes as the root user by checking it; see the example below;
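     For example, to verify that commands issued this way run as root on the Linux nodes (a hedged sketch, assuming a node group named LinuxNodes; the id command should report uid=0(root)):
            clusrun /user:system /nodegroup:LinuxNodes id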
     - Hpccred.exe improvements. The command “hpccred listcreds” can be used to display the credentials owned by the current user. HPC administrators can use “hpccred listcreds [/owner:username]” and “hpccred setcreds [/owner:username]” to display or set credentials owned by other users. The extended data of each cluster user is filled with an RSA key for SSH automatically if the user does not set it manually; see the example below;
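     For example (a hedged sketch; CONTOSO\alice is a placeholder user name, and the /owner forms require HPC administrator rights):
            hpccred listcreds
            hpccred listcreds /owner:CONTOSO\alice
            hpccred setcreds /owner:CONTOSO\alice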
     - Fix an issue where our services do not authenticate the client machine in 3-headnode cluster mode with certificate authentication;
     - Fix a NullReferenceException when the HPC registry key is absent on the client machine;

    SOA Fixes
     - Fix an issue so that an exception while looking for a SOA service registration file in one path no longer stops the search from continuing in the other paths;
     - Fix an issue where the SOA service registration folder could not be on a network share hosted on a machine other than the head node. When the share is on another machine, just make sure the share allows read access for the <Domain>\<HeadNode>$ machine account, as in the example below;
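     For example, on the remote file server (a hedged sketch; the share name HpcServiceRegistration, the path D:\HpcServiceRegistration and the account CONTOSO\HEADNODE$ are placeholders for your own share and head node machine account):
            net share HpcServiceRegistration=D:\HpcServiceRegistration /grant:"CONTOSO\HEADNODE$",READ
            icacls D:\HpcServiceRegistration /grant "CONTOSO\HEADNODE$:(OI)(CI)RX"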
     - Fix an issue where the SOA service host may not be able to access the SOA service registration share in 3-headnode cluster mode;
     - Fix an issue that causes a SOA job to fail if the system.ServiceModel/bindings configuration section is missing in the corresponding service configuration file;
     - Remove the authentication requirement for non-admin users getting the service registration file via the HTTPS endpoint;
     - Fix an issue where a non-domain-joined SOA client cannot connect to the session service using the Net.Tcp binding;
     - Fix an Azure storage connection string loading issue in the SOA data service;
     - Fix two issues in v2 proxy client/operation contract mode: the client/session does not time out when idle, and the total request counter is incorrectly deducted;

    Management Fixes
     - Fix an issue where the reporting service may not work in a single-head-node cluster;
     - DB connection string plugin improvement: in addition to the scheduler service, the DB connection strings in the monitoring, reporting, diagnostics and SDM services are now also refreshed when the connection fails, so that users can use a customized assembly as a plugin to refresh the DB connection strings in these services;
     - Fix a deployment failure when auto scale is enabled for a Batch pool;
     - Fix an issue where the auto grow shrink service does not grow for jobs without an auto-calculated grow-by min or max and with a specified min of one core. In other words, the issue happens if GrowByMin is set to false (the default) and the job is specified with 1-[N] or *-[N] cores, or GrowByMin is set to true and the job is specified with 1-[N] or 1-* cores, where * means auto-calculated;
     - Fix a node selection performance issue in HPC Cluster Manager;
     - Fix an auto grow issue in a group of nodes with mixed numbers of cores when the resource type of the jobs is Node or Socket;
     - Fix a case-sensitivity issue in node group names for the auto grow shrink service;
     - Fix an issue where busy online nodes are selected for grow if the queued job has RequestedNodes specified without NodeGroups;
     - Add the missing script files Import-HPCConfiguration.ps1/Export-HPCConfiguration.ps1 and fix an issue where Set-HpcClusterName.ps1/Move-HpcNode.ps1 may not work;
     - Fix an issue where compute nodes may enter WinPE many times during bare-metal deployment;
     - Make the availability set optional for Azure IaaS node deployment;
     - Fix a NullReferenceException in bare-metal deployment if no deployment certificate is specified;
     - Fix an issue where Azure IaaS nodes fail to deploy when the head node name has fewer than 4 characters;
     - Fix an issue where Linux nodes become unavailable when a NIC on the head node is associated with a "169.254.*" IP address;
     - Fix an issue where the management service fails to start when there is an invalid IP address in the hosts file;
     - Fix a "Specified cast is not valid" issue when running the Get-HPCMetricValueHistory cmdlet in PowerShell;
     - Fix a job property deserialization issue in the REST API;

Files

Status: Live

This download is still available on microsoft.com. The downloads below will come directly from the Microsoft Download Center.

File                           SHA1 Hash                                  Size
HpcApplicationType.sfpkg       522a97738098e7b1573f7aafce437a7171cee7f0   71.90 MB
hpcnodeagent.tar.gz            ba2bed59249f4588320bb38049becdef6d170b8f   4.63 MB
KB4135110_x64.exe              fec9282dd9bfcac928f0c62216c87fad1ab9a507   12.50 MB
KB4135110_x86.exe              aae1ed577106be3e74767dc472ae628ceefb9a92   7.42 MB
Upgrade-HpcApplication.ps1     e1eaa70a3115307ed48dda67afccadac283dc918   9 KB

File sizes and hashes are retrieved from the Wayback Machine’s indexes. They may not match the latest versions of files hosted on Microsoft servers.

System Requirements

Operating Systems: Windows 10, Windows 7, Windows 8, Windows 8.1, Windows Server 2012, Windows Server 2012 R2, Windows Server 2016

  • HPC Pack 2016 Update 1 (build 5.1.6086.0) installed

Installation Instructions

  • Installation Instructions
    This update needs to be run on all head nodes, broker nodes, workstation nodes, compute nodes (Windows and Linux), Azure IaaS nodes and clients

    Before applying the update, please check that HPC Pack 2016 Update 1 is installed. The version number (in HPC Cluster Manager, click Help->About) should be 5.1.6086.0. Please take all nodes offline and ensure that all active jobs have finished or been canceled. If there are Azure PaaS nodes, please make sure they are stopped before applying this patch. After all active operations on the cluster have stopped, please back up the head node (or head nodes) and all HPC databases by using a backup method of your choice.

    To start the download, click the Download button next to the appropriate file (KB4135110_x64.exe for the 64-bit version, KB4135110_x86.exe for the 32-bit version) and then:

    Applying the update on Single Headnode
    1. Click Save to copy the download to your computer;
    2. Close any open HPC Cluster Manager or HPC Job Manager windows;
      Note: Any open instances of HPC Cluster Manager or HPC Job Manager left open may unexpectedly quit or show an error message during the update process. This does not affect installation of the update;
    3. Run the download on the head node using an administrator account, and reboot the head node;
    4. The version number (Start HPC Cluster Manager, click Help->About) now should be 5.1.6114.0;
    5. If you want to revert the patching, please go to Control Panel --> Programs and Features --> View installed updates and uninstall the updates below in order (please don’t reboot in between): KB4135110 under “Microsoft ® HPC Pack 2016 Web Components”, “Microsoft ® HPC Pack 2016 Client Components” and then “Microsoft ® HPC Pack 2016 Server Components”; then reboot;

    Applying the update on Three Headnodes
    Note: If you have at least three head nodes, you also need to download HpcApplicationType.sfpkg and Upgrade-HpcApplication.ps1 and put them together on one of the head nodes, then run through the steps below:
    1. To upgrade the Service Fabric application, open an elevated PowerShell command prompt window and run:
           .\Upgrade-HpcApplication.ps1
    2. After the Service Fabric application upgrade is done, run KB4135110_x64.exe on the head nodes one by one;
    3. Reboot the head nodes one by one (please check https://localhost:10400 to make sure the previously rebooted head node is back to a healthy state before you reboot the next one, so that your services stay available);
    4. The version number (Start HPC Cluster Manager, click Help->About) now should be 5.1.6114.0 on all headnodes;
    5. If you want to revert the patching, on every head node please go to Control Panel --> Programs and Features --> View installed updates and uninstall the updates below in order (please don’t reboot in between): KB4135110 under “Microsoft ® HPC Pack 2016 Web Components”, “Microsoft ® HPC Pack 2016 Client Components” and then “Microsoft ® HPC Pack 2016 Server Components”; then downgrade the Service Fabric application with the commands below:
           Connect-ServiceFabricCluster
           $hpcApplication = Get-ServiceFabricApplication -ApplicationName fabric:/HpcApplication
           $appParameters = @{}
           foreach($appParam in $hpcApplication.ApplicationParameters)
           {
               $appParameters[$appParam.Name] = $appParam.Value
           }

           Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/HpcApplication -ApplicationTypeVersion 1.0.1 -ApplicationParameter $appParameters -HealthCheckStableDurationSec 60 -UpgradeDomainTimeoutSec 1800 -UpgradeTimeoutSec 3000 -FailureAction Rollback -Monitored | Out-Null
    You can run "Get-ServiceFabricApplicationUpgrade -ApplicationName fabric:/HpcApplication" to track the upgrade status. If you find it is stuck at "PreUpgradeSafetyCheck" because some service fails to cancel, you can try to manually kill the corresponding process on the affected node. When it finishes, reboot the head nodes one by one (wait until the previously rebooted node is fully healthy in the Service Fabric cluster before you reboot the next one).

    Applying the update on Windows nodes
    1. Log on interactively, or use clusrun to deploy the fix to the compute nodes, broker nodes, unmanaged server nodes, Azure IaaS nodes and workstation nodes;
    If you want to use clusrun to patch the QFE on the compute nodes, broker nodes, unmanaged server nodes, Azure IaaS nodes and workstation nodes:
           a. Copy the appropriate version of the update to a shared folder such as \\headnodename\HPCUpdates
           b. Open an elevated command prompt window and type the appropriate clusrun command for the operating system of the patch, e.g.:
                  clusrun /nodegroup:ComputeNodes \\<headnodename>\HPCUpdates\KB4135110_x64.exe -unattend -SystemReboot
                  clusrun /nodegroup:BrokerNodes \\<headnodename>\HPCUpdates\KB4135110_x64.exe -unattend -SystemReboot
     Note: HPC Pack updates, other than service packs, are not automatically applied when you add a new node to the cluster or re-image an existing node. You must either apply the update manually or via clusrun after adding or re-imaging a node, or modify your node template to include a line that installs the appropriate updates from a file share on your head node.
     Note: If the cluster administrator doesn’t have administrative privileges on workstation nodes and unmanaged server nodes, the clusrun utility may not be able to apply the update. In these cases the update should be performed by the administrators of the workstations and unmanaged servers.
    2. If you want to revert the patching, please go to Control Panel --> Programs and Features --> View installed updates and uninstall the updates below in order (please don’t reboot in between): KB4135110 under “Microsoft ® HPC Pack 2016 Web Components”, “Microsoft ® HPC Pack 2016 Client Components” and then “Microsoft ® HPC Pack 2016 Server Components”; then reboot;

    Applying the update on Linux nodes
    1. For a 3-headnode cluster, download and copy “hpcnodeagent.tar.gz” to the remote install share of the HPC cluster (by default \\<HN>\REMINST\LinuxNodeAgent); for a single-head-node cluster, this package is replaced during head node patching, so please back up the existing one so that you can downgrade to the original version later.
    2. Mount the share on the Linux nodes (the command below creates the /mnt/share mount point on all Linux nodes and mounts the share on it):
           Clusrun /env:CCP_MAP_ADMIN_USER=0 /user:system /NodeGroup:LinuxNodes mkdir /mnt/share ^& mount -t cifs //<yourheadnode>/REMINST/LinuxNodeAgent /mnt/share -o vers=2.1,domain=<domainname>,username=<username>,password='<password>',dir_mode=0777,file_mode=0777
    3. Run Clusrun as root on all Linux nodes to update the package:
           Clusrun /env:CCP_MAP_ADMIN_USER=0 /user:system /NodeGroup:LinuxNodes /workdir:/mnt/share echo "python /mnt/share/setup.py -update" ^| at now + 1 minute
    4. Wait for clusrun to complete; the actual update will start within a minute on the Linux nodes. After the update completes, you can check the Linux agent version by running the Clusrun command below (the version should now be 2.3.4.0):
           Clusrun /env:CCP_MAP_ADMIN_USER=0 /user:system /NodeGroup:LinuxNodes /workdir:/opt/hpcnodemanager ./nodemanager -v
    5. If you want to revert to the original Linux agent version, restore “hpcnodeagent.tar.gz” to the old version and apply the same steps above;

    Applying the update on Client
    1. To update computers that run HPC Pack client applications, perform the following actions:
           a. Stop any HPC client applications including HPC Job Manager and HPC Cluster Manager;
           b. Run the update executable;
           c. Reboot your client computer;
    2. If you want to revert the patching, please go to Control Panel --> Programs and Features --> View installed updates and uninstall KB4135110 under “Microsoft ® HPC Pack 2016 Client Components”, then reboot;
