mersenneforum.org  

2020-11-29, 16:24   #1
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^2·5·251 Posts
Xeon Phi draft

Draft/placeholder, please wait.
This will be a reference thread. Don't post comments here. Post comments in a discussion thread.


Part of this was originally posted at https://www.mersenneforum.org/showpo...&postcount=128

An inexpensive way of experimenting with the Xeon Phi many-integrated-core hardware is to pick up a used Knights Corner coprocessor card. Depending on model and seller, these currently (late 2020) range from about $50 to $200. With the low cost comes more complexity. See https://en.wikipedia.org/wiki/Xeon_Phi
In the model numbers, an A suffix indicates active cooling; a P suffix indicates passive cooling, which depends on a server case designed to propel air through the card.

Knights Corner is an older, slower, limited instruction set design with high power requirements.
There are many models. Here are two examples.
The 7120A has an integral blower propelling air lengthwise. (Post with photos) It needs:
PCIe socket (I think any flavor)
2 slots width
power connectors for 300W TDP: 75W PCIe slot + 75W 6-pin connector + 150W 8-pin connector
PCIe support for >4GB addressing in the system BIOS
A high priced example is https://www.amazon.com/Intel-Xeon-1-...HL/ref=sr_1_26

The 5110P or similar needs cooling air flow provided by the system it's installed in (think high-pressure blower, not ordinary fan) or a custom connection (3D printed?) to a high-pressure blower. See https://www.mersenneforum.org/showpo...5&postcount=23 for photos with a cover removed. It needs:
PCIe socket
2 slots width
power connector for 225W TDP (75W from PCIe, other 150W from 8-pin connector)
PCIe support for >4GB addressing in the system BIOS
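
As a quick sanity check on the connector requirements listed above, the per-connector wattages can be added up against each card's TDP. This is only an illustrative sketch: the wattages are the ones given in this post, while the dictionary layout and the function name are made up for the example.

```python
# Power sources per card, in watts, as listed in this post.
# CARD_POWER_SOURCES and available_power are illustrative names only.
CARD_POWER_SOURCES = {
    "7120A": {"tdp": 300, "sources": {"PCIe slot": 75, "6-pin": 75, "8-pin": 150}},
    "5110P": {"tdp": 225, "sources": {"PCIe slot": 75, "8-pin": 150}},
}

def available_power(card):
    """Sum the wattage of all power sources listed for a card."""
    return sum(CARD_POWER_SOURCES[card]["sources"].values())

for name, info in CARD_POWER_SOURCES.items():
    total = available_power(name)
    status = "ok" if total >= info["tdp"] else "short"
    print(f"{name}: {total} W available vs {info['tdp']} W TDP ({status})")
```

In both cases the slot plus the auxiliary connectors exactly match the TDP, so there is no headroom if a connector is left unplugged.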

MPSS (Intel Manycore Platform Software Stack) is the software that runs on the host system to talk to the uOS Linux variant embedded on the coprocessor card. Versions of MPSS are available for Linux and for Windows hosts.

Both of those are Knights Corner, which does not support AVX512. Puget Systems has some old articles about it. https://www.pugetsystems.com/all_hpc.php?query=phi
See also https://www.mersenneforum.org/showthread.php?t=16912

Knights Landing was made available both as the main multicore CPU for system builders and as an add-in coprocessor. As the basis of a system, it apparently needs either RHEL or CentOS, or a current build of Windows 10. No MPSS is needed, since it is the main (only) processor in the system in that case. The Knights Landing coprocessor cards (72xx) are very hard to find. Knights Landing will run Mlucas, mprime, and prime95 with AVX512 instruction set support, and will run Mfactor. Compared to Knights Corner, the purchase price is higher, the performance is higher (about triple, per the Wikipedia article tables), and the power consumption is lower. Occasionally a used system becomes available for ~$500 (late 2020 pricing). See https://www.mersenneforum.org/showpo...8&postcount=34 These systems were configured oddly, with the case power switch nonfunctional.
https://www.mersenneforum.org/showpo...0&postcount=48 begins Ernst's saga, over 16 days of occasional effort, of getting a usable Linux with GUI in place on his system. I think a more direct path is to begin with the big 7.7GB CentOS ISO, not the minimal one, and a wired network connection.
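
Once a Linux install is up on a Knights Landing system, one way to confirm that the kernel sees the AVX512 support mentioned above is to look at the CPU flags in /proc/cpuinfo. The following is a minimal sketch, assuming a Linux host; the function names are hypothetical, and the flag list (avx512f, avx512cd, avx512er, avx512pf) is the subset Knights Landing is expected to report.

```python
# Sketch: report which AVX-512 feature flags the Linux kernel lists for the CPU.
# Assumes a Linux host with /proc/cpuinfo; function names are illustrative.
from pathlib import Path

# Flags expected on Knights Landing: foundation, conflict detection,
# exponential/reciprocal, and prefetch.
KNL_AVX512_FLAGS = ("avx512f", "avx512cd", "avx512er", "avx512pf")

def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def avx512_report(cpuinfo_text):
    """Map each expected AVX-512 flag name to True/False."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in KNL_AVX512_FLAGS}

path = Path("/proc/cpuinfo")
if path.exists():
    for name, present in avx512_report(path.read_text()).items():
        print(f"{name}: {'yes' if present else 'no'}")
else:
    print("/proc/cpuinfo not found (not a Linux host?)")
```

This only reads the host processor's flags, so it says nothing about a Knights Corner coprocessor card plugged into the system.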

The system has the SuperMicro K1SPE motherboard.
(previously posted) It's a different style case, but the following has a lot of info on the K1SPE MB and BIOS; https://www.supermicro.com/manuals/s...r/MNL-1891.pdf (and yes it was well hidden)
downloads: https://www.supermicro.com/support/r...urce_links.cfm

First Windows 10 install: https://www.mersenneforum.org/showpo...4&postcount=57
Updating to build 1909 of Windows 10 paid off: it recognized all cores, and prime95 ran successfully after automatically defaulting to 4 cores per worker. The processor turned out to be a 7250 with 68 cores and a 1.4GHz base clock, rather than the 64-core, 1.3GHz 7210 listed online, for a nice throughput boost (nominally about 14%, from 68 × 1.4 versus 64 × 1.3). Shipper's error is our gain.
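
For a rough feel of what the 7250 is worth over the 7210, a back-of-envelope estimate can multiply core count by base clock. This is a sketch under the loud assumption that throughput scales with cores times base clock, which ignores memory bandwidth, turbo, and AVX512 clock behavior, so measured prime95 throughput may differ. It also shows the worker count that follows from 4 cores per worker.

```python
# Back-of-envelope comparison of the two Knights Landing parts mentioned here.
# Assumption: throughput scales with cores x base clock (a simplification).
def nominal_throughput(cores, base_ghz):
    """Cores times base clock, in core-GHz."""
    return cores * base_ghz

phi_7210 = nominal_throughput(64, 1.3)
phi_7250 = nominal_throughput(68, 1.4)
ratio = phi_7250 / phi_7210
print(f"7250 vs 7210 nominal ratio: {ratio:.3f}")

# prime95 defaulted to 4 cores per worker on the 68-core part.
workers = 68 // 4
print(f"workers at 4 cores each: {workers}")
```

On these assumptions the nominal advantage comes out a little over 14%, and 68 cores at 4 per worker gives 17 workers.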
I had intended a hybrid dual-boot Windows 10 & WSL2 Ubuntu 18.04 / native Ubuntu 18.04 installation. WSL2 is a nonstarter since the processor lacks VT-x support.
Some observations reiterated from here:
1) Sometimes mine seems to get stuck during the BIOS initialization / POST, requiring a power cycle to try again.
2) The POST sequence is interminable but the time window for F11, F12, or DEL to select options from the white SuperMicro boot screen is brief. It would be nice to be able to shorten the one or lengthen the other.
3) Haven't experimented with BIOS settings to possibly skip / disable some portions of the initialization. SuperMicro tech support contact discouraged trying it.
4) The BIOS seems to support a commercial-size-kitchen-sink set of approaches. Disabling the unused ones might provide a considerable startup speedup, if possible, by eliminating timeout periods for things that ain't gonna happen (IPMI IP# issuance for example).
5) Jumper changes are another possibility, such as BMC disable.
6) One more way my system is wired oddly: documentation for the motherboard indicates the two adjacent RJ45 jacks are regular LAN ports, but if the one nearer the USB (#7 in fig 5-2 of the manual found online) is connected, DHCP fills in an IPMI IP# (remote console via IP) instead of providing LAN connectivity.
Some good news: checking prime.log and the prime95 worker windows shows no sign of errors detected in the 17 workers' LL DC progress on 58.3M-59.1M exponents, at 31-35% each so far with a few Jacobi checks each. These should all complete by the end of November 2020. (Subsequent to this, mine became very difficult to coax through or even into POST.)

At the first opportunity, set the BIOS to Always Start to avoid the "shuts down and won't restart" issue (73); or use the workaround (74) and then visit the BIOS soon after.

The high core count exceeds the capabilities of some Windows temperature monitoring utilities. CoreTemp minimized to the System Tray works, and confirms Ernst's observations of the big water loop cooler keeping core temps in the 50-56C range at full load with the cpu-side panel off.

Observed power input (as indicated by the self-monitoring of a Cyberpower sine-output UPS) is ~298 W for the 7250-equipped system including a 500GB HDD, and ~265 W for the 7210-equipped system including a 1TB HDD, while running prime95 on all 68 or 64 cores. Also running Mfactor (64 processes on the 7250) made no observable difference. Idle power of the 7250 system was ~132 W.
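
The power readings above translate into per-core and per-day figures with simple arithmetic. This is a minimal sketch using only the UPS-reported numbers from this post; the variable and function names are made up for illustration, and the wattages are whole-system wall power (including the HDD), not CPU package power.

```python
# Figures are the UPS-reported wall power from this post, in watts.
LOAD_W = {"7250": 298, "7210": 265}
CORES = {"7250": 68, "7210": 64}
IDLE_7250_W = 132

def watts_per_core(model):
    """Whole-system load power divided by core count."""
    return LOAD_W[model] / CORES[model]

def kwh_per_day(watts):
    """Energy used in 24 hours at a constant draw."""
    return watts * 24 / 1000

for model in LOAD_W:
    print(f"{model}: {watts_per_core(model):.2f} W/core, "
          f"{kwh_per_day(LOAD_W[model]):.2f} kWh/day at full load")
print(f"7250 load minus idle: {LOAD_W['7250'] - IDLE_7250_W} W")
```

Multiplying the kWh/day figure by a local electricity rate gives the daily running cost.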




(MORE)

Last fiddled with by kriesel on 2021-01-13 at 15:30

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.