#12 |
Sep 2016
2·5·37 Posts |
Just curious, what happens if you remove all the computation and test the raw access pattern?
This could be a good starting point for inserting hardware counters.
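A minimal sketch of what "raw access pattern only" could look like: touch one byte per 64-byte line of each buffer in a given order and time the walk, with no arithmetic beyond a checksum. The names `walk` and `time_walk` and the 64-byte line size are assumptions for illustration. In Python the interpreter overhead will swamp any cache effect, so this only shows the shape of the experiment; a real test would be written in C or assembly and read the hardware counters around the loop.

```python
import time

LINE_SIZE = 64  # assumed cache-line size in bytes


def walk(buffers, order):
    """Touch one byte per cache line of each buffer, visiting buffers in `order`."""
    total = 0
    for index in order:
        buf = buffers[index]
        for offset in range(0, len(buf), LINE_SIZE):
            total += buf[offset]  # the only "work" is the memory read itself
    return total


def time_walk(buffers, order):
    """Return (elapsed_seconds, checksum) for one pass over the buffers."""
    start = time.perf_counter()
    checksum = walk(buffers, order)
    return time.perf_counter() - start, checksum
```

Running `time_walk` once with `order = range(len(buffers))` and once with a shuffled copy of that order gives two timings for the same set of reads; the checksums should match, which confirms both passes touched the same data.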
#13 | |
"/X\(‘-‘)/X\"
Jan 2013
13×239 Posts |
#14 |
"/X\(‘-‘)/X\"
Jan 2013
13×239 Posts |
Do you have a DDR5 system for testing on as well? I only mention this because DDR5 splits each DIMM into two independent 32-bit subchannels, each of which delivers a 64-byte cache line in a single 16-beat burst, whereas DDR4 delivers it as one 8-beat burst over a single 64-bit channel. If you hit main memory this may have an impact on your algorithm's performance.
#15 | |
P90 years forever!
Aug 2002
Yeehaw, FL
8168₁₀ Posts |
I struggled with this for quite a while, trying various padding strategies. I can't say I learned much about how memory layouts affect performance. I did find that streaming stores were beneficial once the data no longer fit in the L3 cache. In the end, the best results were obtained by allocating the array of gwnums linearly in memory, then just randomly scrambling the array of pointers.
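The "allocate linearly, then scramble the pointers" idea can be sketched roughly as follows. This is not the gwnum library's code: `allocate_scrambled`, `count`, `item_size`, and `seed` are hypothetical names, and `memoryview` slices stand in for the C pointers. The point is only that the data stays in one contiguous block while the order of the handles is randomized.

```python
import random


def allocate_scrambled(count, item_size, seed=None):
    """Carve `count` fixed-size items from one contiguous block, then shuffle the handles.

    Returns (block, items): `block` owns the memory, `items` is a list of
    memoryview slices into it in random order.
    """
    block = bytearray(count * item_size)  # one linear allocation
    view = memoryview(block)
    items = [view[i * item_size:(i + 1) * item_size] for i in range(count)]
    random.Random(seed).shuffle(items)  # scramble the handles; the data does not move
    return block, items
```

Code that walks `items` in index order now visits the underlying block in a random order, so consecutive items no longer sit at a fixed stride from one another.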
#16 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2²·47·59 Posts |