2020-07-23, 17:04   #8
Xeon Phi

Knight's Corner PCIe 7120A Coprocessor

Software Setup
Because it's a coprocessor with its own RAM and OS, installation and setup involve two inherently different operating systems (coprocessor uOS and host Linux or Windows) and communication between them. This arrangement is discussed in the Puget Systems online articles.

I chose to initially install mine on an active PCIe extender, in a system with an Asrock BTC Pro motherboard and i7-4790 cpu, and plenty of power. (An error, as we'll see later.)
First attempt, Asrock: partitioned the drive for Windows 10, a FAT32 common area, and Linux; installed Windows 10 first and Ubuntu later; then installed GRUB Customizer on Ubuntu and used it to make Windows 10 the default boot, to accommodate my usual use.
I was pleasantly surprised to see Ubuntu recognize and make sense of the Windows NTFS partition.

(MPSS download, unzip, install)

This system had plenty of power to run the card, but after progressing through the MPSS install process, I ran into "not enough resources", checked the BIOS settings, and found it had no way to set >4GB addressing for PCI/PCIe. I remembered setting exactly that in preparation for card installation but apparently it had been done to some other system. Dead end for this attempt. Maybe useful practice up to that point.

Second attempt, Lenovo D30 with Windows 10 already installed. Evicted an RX550 to free up space. This system had a Radeon VII installed before and has a ~1 kW power supply. The physical installation of the 7120A is a very tight fit though.
Install MPSS after Cygwin and doing ssh-keygen.
Was able to get various mic utilities to run, display coprocessor status, etc.

Before I got back to it, the system crashed with stop code 116. So not sure if I recorded all the relevant installation process & messages.

The coprocessor should be able to run Linux GIMPS cpu applications such as Mlucas, mprime, Mfactor, subject to the limits of on-card RAM size.

It's expected to have about a third of the performance of the 7250 while using the same electrical power, and that's when using its unusual instruction set, IMCI, which as far as I know is not supported in any GIMPS software. Performance using more common instructions is likely to be lower yet.

ssh/login issues / setup
Similar issue on the coprocessor. Sorting that out should prove interesting.
Meanwhile it's been removed to free the slot and power for use of a fast GPU.

(add photos)

Knight's Landing as CPU on SuperMicro K1SPE motherboard
Taking advantage of an eBay listing for a 7210 bare bones system, I placed an order ($499 US plus taxes & shipping) before alerting Ernst and the rest of GIMPS to the opportunity. The system received was a 7250 Xeon Phi with large radiator, liquid loop, no DRAM, no OS, no HD, and unexpectedly no power cord. The front power switch was not connected. Motherboard info is here. It's a different style case, but the following has a lot of info on the K1SPE MB and BIOS. Motherboard related downloads are at; enter K1SPE as motherboard model.
It includes IPMI among its features. Haven't tried using that yet. SuperMicro's downloads related to IPMI might be useful for such an attempt. Windows IPMI support
From the Linux perspective,
And there's also this cautionary tale

I chose to set up for dual boot from the start; install and partition the boot drive as part of first OS install.

I installed Windows 10 Pro x64 from a USB DVD drive, then did successive updates until current. The DVD version of Windows I had handy recognized 28 of the 68 real cores, ~1.8MB of L1 cache, 14MB of the L2 cache, and 16GB MCDRAM. Prime95 v30.3b6 run attempts crashed before even appearing on the monitor. If curious about the details, see this post and its attached image; CPU-Z capture here.

Updating Windows 10 to v1909 brought recognition of all the processor's cores, all 4.3MB of L1 cache, all 34MB of L2 cache and 16GB MCDRAM, and effortless running of prime95 V30.3b6. (More about that here.) Defaulting to dozens of workers with 4 cores each was not what I would have chosen. It's nearly optimal for aggregate throughput, but latency for the current GIMPS 100M wavefront could be a problem. I switched it later to 4 workers. That change sped getting enough successful double checks done to allow assignments at lower GIMPS assignment category numbers.
Benchmarking with hyperthreading in prime95 does x2, x3, and x4 HT, which are almost always progressively slower. (More detail at and its following post.) Benchmark results in second attachment of this post.
It commonly runs prime95 or prime95+Mfactor at above the nominal clock rate, sometimes at full turbo rate or even above. (With the motherboard-top side cover off.) While preparing this I checked clock rate, and was surprised to see 0.00 GHz displayed occasionally by Task Manager, 1.49 GHz otherwise.

I've also run Mfactor on it as a multitude of single-thread processes under Windows. Even managing them with batch files, when the system crashes or power fails, recovery from a large number of processes is unpleasant.

Haven't gotten around to installing Linux in the other drive partition yet. I had planned to also put Ubuntu on it, but Ernst Mayer's experience attempting Linux installs suggests the full CentOS image and a wired NIC are the way to go for Linux on this type of system. This post reiterates Intel's statements on OS compatibility.

This system has generally been very problematic for getting through POST and a boot. One morning it took 45 minutes and many attempts to POST, completing boot the second time it reached any sign of Windows starting. It had always failed on the few consecutive previous days, when I was less persistent.
It also seems VERY sensitive to being tilted or touched while running, or even stepping on the LAN cable that lies on some carpet. These actions produce a red HDD light on the motherboard and all function ceases, until a power cycle, POST and boot succeed. Supermicro first level tech support did not have much to say about that.

An attempt at a WSL2 install was a bust. The cpu lacks some required virtualization support, or I missed some BIOS setting or Windows installation choice to enable it. WSL 1, however, does not require such hardware virtualization support. After some fumbling about and searching online, I was able to demote the WSL installation from v2 to v1 by removing and reinstalling it, and finally, in PowerShell, clearing a lingering setting that was causing trouble:
wsl --set-default-version 1

I was then able to install Ubuntu 18.04 LTS to operate under WSL1. This and the following updates/installs were done while prime95 ran on Windows. Running top in Ubuntu showed only ~400MB of RAM available, so I stopped one of the four prime95 workers, increasing available RAM for Ubuntu on WSL1 to ~4.5GB.

#what I've found I need to do on WSL 1 after a fresh Ubuntu 18.04 LTS install,
#to be able to build Mlucas V20.0 natively
sudo apt update
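which gcc
which make
# libgmp-dev is a package, not a program, so which never finds it; ask dpkg instead
dpkg -s libgmp-dev

# as needed, for each null response to which above (or a "not installed" report from dpkg), do the relevant following line; typically it's all 3: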
sudo apt-get install gcc
sudo apt-get install libgmp-dev
sudo apt-get install make
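
The check-then-install steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the missing_tools helper name is made up for illustration, and the script prints the apt-get commands rather than running them, so nothing is installed until the printed lines are reviewed and run by hand.

```shell
#!/bin/sh
# Sketch only: missing_tools is a hypothetical helper, not part of Mlucas or Ubuntu.
# Print each named command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Dry run: show what would need installing, without touching the system.
for tool in $(missing_tools gcc make); do
    echo "sudo apt-get install $tool"
done

# libgmp-dev is a package, not an executable, so ask dpkg about it (if dpkg exists).
if command -v dpkg >/dev/null 2>&1 && ! dpkg -s libgmp-dev >/dev/null 2>&1; then
    echo "sudo apt-get install libgmp-dev"
fi
```

On a fresh install this would typically print all three install lines, matching the "typically it's all 3" note above.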

#the build process
wget --no-check-certificate
tar xJf mlucas_v20.txz
cd ./mlucas_v20
cd ./obj
ls -l M*
cat Makefile
The parallel make finds 64 of the 7250's 68 full cores, and uses many of them, which makes for an interesting 'top' display and a quick build. Stop prime95 entirely, and user activity on the system, for the following:
#test for Ernst's recommended cases, 4 cores one process, without and with HT
./Mlucas -s m -cpu 0:3 -iters 1000 >& test4c.log
mv mlucas.cfg mlucas.cfg.4c.4t
./Mlucas -s m -cpu 0:3,68:71 >& test4c8t.log
mv mlucas.cfg mlucas.cfg.4c.8t
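
The second test above pairs cores 0:3 with logical CPUs 68:71, that is, the hyperthread siblings sit at an offset of 68 (the physical core count) in the numbering used there. A small helper can build such -cpu strings for other core groups. The cpu_arg name is hypothetical, and extending the same offset pattern beyond 2 threads per core is an assumption, not something verified on this system.

```shell
#!/bin/sh
# Sketch only: cpu_arg is a hypothetical helper for building Mlucas -cpu strings.
# Usage: cpu_arg FIRST_CORE NUM_CORES TOTAL_CORES THREADS_PER_CORE
# Assumes sibling thread t of core n is logical CPU n + t*TOTAL_CORES,
# which matches the 0:3,68:71 example for t = 1 on a 68-core 7250.
cpu_arg() {
    first=$1; ncores=$2; total=$3; threads=$4
    last=$((first + ncores - 1))
    out=""
    t=0
    while [ "$t" -lt "$threads" ]; do
        lo=$((first + t * total))
        hi=$((last + t * total))
        out="${out:+$out,}$lo:$hi"   # comma-separate successive ranges
        t=$((t + 1))
    done
    printf '%s\n' "$out"
}

cpu_arg 0 4 68 1   # 4 cores, no HT
cpu_arg 0 4 68 2   # 4 cores, 2 threads each
cpu_arg 4 4 68 2   # the next group of 4 cores, 2 threads each
```

The first two calls reproduce the 0:3 and 0:3,68:71 arguments used in the tests above.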
The Intel spec sheet for the 7250 says no VT-x or VT-d support. There are several other quite large "datasheets" available for the Xeon Phi x200:

Datasheet Volume 1, Electrical (147 pages)

Datasheet Volume 2, Registers (378 pages)

Thermal/Mechanical Specification and Design Guide (114 pages)

Specification Update (19 pages)

Some observations and possible BIOS tweaks later are in this post.
Also, configure BIOS power-returns setting at the FIRST opportunity, from "Last State" to "Power On". See posts 73 and 74 of the Xeon Phi forum thread for why.

The high core count (68 on a 7250) is too much for some Windows utilities for checking core loads or temperatures. See

An attempt to add both a 64GB DIMM and an RX550 low profile gpu worked only in the mechanical sense. There was just enough space to maneuver the gpu into installation position under the coolant lines that pass over the PCIe slots, only into the slot nearest the cpu. All 10+ attempts to start failed, by instant illumination of the HDD LED red on the motherboard, with no POST progress or video signal produced. The power and reset switches connecting to the motherboard are ineffectual after that LED lights; the power must be cut externally to try again.

Removing the gpu but leaving the DIMM in place in slot A eventually produced a successful POST and boot after many tries. The system crashed about an hour later. A second boot ran overnight and continued until shutdown, >12 hours. Prime95 operation was drastically slowed however; two of 4 workers running primality tests showed 40+ TIMES the previous iteration time, and one showed 80+ TIMES; there's no comparison data for the fourth, which was running P-1 factoring. Windows Task Manager was confused, showing 234 cores and similarly but not quite proportionally overstated cache amounts. The logical processor count was correct.

Which memory mode the system ran on is unknown, but it appears to be flat mode or hybrid, since Windows 10 Task Manager reported 80GB in 9 occupied out of 14 slots; 16GB of MCDRAM occupies 8 nominal slots, and the 6 DIMM slots have only A occupied, with a 64GB DIMM. Cache amounts were increased also. I saw nowhere in the BIOS portion of the manual to select memory mode. Since the DRAM's presence was devastating to performance, it was removed and returned for a refund.

More recently, I found the following in the specification update:
"KNL25. Operating With DDR4-2400 Memory May Cause Unpredictable System Behavior
Problem: Operating the processor with DDR4 memory configured to operate at 2400 MT/s may cause unpredictable system behavior.
Implication: When the erratum occurs, the system will exhibit unpredictable system behavior.
Workaround: It is possible for BIOS to contain a workaround for this erratum.
Status: No Fix"

There are multiple indications the hardware does not support DIMMs larger than 64GB each. SuperMicro's web pages list DIMM product numbers up to 32GB DIMM or 64GB LRDIMM as possible types to install, but nothing larger. Intel's cpu specifications for the 7210 or 7250 indicate a maximum addressable memory of 384GB, consistent with 64GB as the maximum DIMM size in the 6 DIMM slots. Page 12 of the datasheet volume 1 says "Six DDR4 channels, each channel limited to one DIMM per channel (max. DIMM capacity is 64 GB), 384 GB capacity total."

There's also a considerable cost premium currently per GB with higher density; 8 or 16 GB DIMMs are about $55 ($7 or $3.5/GB); 32GB are about $125 ($4/GB); 64GB LRDIMM ~$300 each ($4.7/GB); 128GB ~$1000 each ($7.8/GB) or more. DIMM memory provides much lower bandwidth than the MCDRAM. Fully populating the slots would probably help. Even at 32GB DIMM prices that costs more than the base unit.
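
The per-GB figures above are simple division, and can be rechecked as prices move. A minimal sketch, assuming awk is available; the per_gb helper name is made up, and the prices are just the approximate ones quoted above.

```shell
#!/bin/sh
# Sketch only: per_gb is a hypothetical helper; prices are the rough figures quoted above.
# Usage: per_gb PRICE_DOLLARS SIZE_GB  -> dollars per GB, one decimal place
per_gb() {
    LC_ALL=C awk -v price="$1" -v gb="$2" 'BEGIN { printf "%.1f\n", price / gb }'
}

# price:size pairs for 8, 16, 32, 64 and 128 GB modules
for pair in 55:8 55:16 125:32 300:64 1000:128; do
    price=${pair%%:*}
    gb=${pair##*:}
    echo "${gb}GB at \$${price}: \$$(per_gb "$price" "$gb")/GB"
done

# Filling all 6 slots with 32GB DIMMs, for comparison with the $499 base unit.
echo "6 x 32GB DIMMs: \$$((6 * 125))"
```

Six 32GB DIMMs at about $125 each come to $750, which is the sense in which full population costs more than the $499 base unit.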

The indicated RAM speed is 7200MHz for the 16GB of MCDRAM in the package.
With one 32GB DIMM, total RAM is 48GB; filling all 6 slots with 32GB DIMMs would give 192+16=208GB. That might make it pretty effective at P-1 factoring, if the massive slowdown seen with the 2400 MHz 64GB LRDIMM does not recur.
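
The totals here and the 80GB reported earlier with the 64GB DIMM all follow the same sum: 16GB of MCDRAM plus the installed DIMMs. A small sketch, assuming flat mode (where MCDRAM adds to the DIMM capacity rather than acting as cache); the total_gb name is hypothetical.

```shell
#!/bin/sh
# Sketch only: total_gb is a hypothetical helper.
# Assumes flat memory mode, where the 16GB of MCDRAM is added to DIMM capacity.
MCDRAM_GB=16

# Usage: total_gb NUM_DIMMS DIMM_SIZE_GB
total_gb() {
    echo $((MCDRAM_GB + $1 * $2))
}

total_gb 1 32   # one 32GB DIMM
total_gb 6 32   # all six slots with 32GB DIMMs
total_gb 1 64   # the earlier single 64GB LRDIMM
```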

Updating to Windows 10 Pro v20H2 was slow but has completed.

The same RX550 that seemingly caused boot problems in the 7250 system was successfully installed and operating in a 7210 system on the same motherboard type.

I have installed a single 32GB 2133 MHz DIMM in slot A (listed on eBay as "HMA84GR7MFR4N-TF Hynix 32GB PC4-17000P-R DDR4-2133P ECC REG 2RX4 Memory Module"). Windows Task Manager displays 48GB of installed RAM, and 9 of 14 slots filled. Prime95 V30.6 seems confused about the L2 cache, indicating 118x1MB on the 68-core 7250 system, probably because Windows is too: the CPU pane of Task Manager shows 234 cores, 272 logical processors, 14.6MB of L1 cache, and 118MB of L2 cache.
Similar issues are seen with 6 such DIMMs installed.
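
To put numbers on how far Task Manager overstates things with a DIMM installed, the reported values can be divided by the correct ones seen earlier without DIMMs (68 cores, 4.3MB L1, 34MB L2). A quick sketch, assuming awk is available; the ratio helper name is made up.

```shell
#!/bin/sh
# Sketch only: ratio is a hypothetical helper.
# Usage: ratio REPORTED ACTUAL -> overstatement factor, two decimal places
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "cores:    $(ratio 234 68)"
echo "L1 cache: $(ratio 14.6 4.3)"
echo "L2 cache: $(ratio 118 34)"
```

All three come out near 3.4 to 3.5 but not identical, which is the "similarly but not quite proportionally overstated" behavior noted earlier.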

The memory configuration for all the preceding was "Flat". It may fare better using the 16GiB of MCDRAM as "Cache". Windows compatibility as yet untested. There are some options indicated in the BIOS setup screens as not Windows compatible.

"the exact path is Advanced -> Chipset Configuration -> North Bridge -> Uncore Configuration."

(For more background, see the Xeon Phi discussion thread in the hardware subforum)
Attached Thumbnails

Name:	7250 with a 32GB 2133 HYNIX DIMM added.png

Last fiddled with by kriesel on 2022-10-13 at 15:02