mersenneforum.org Running fstrim on SSD while mprime is running might cause errors in mprime

2019-10-15, 06:59 #1 AwesomeMachine   Apr 2018 USA 13₁₀ Posts
Running fstrim on SSD while mprime is running might cause errors in mprime
The system throwing the errors has a good history. After every output I get this: Hardware errors have occurred during the test! 1 Jacobi error. This started after I ran fstrim, while mprime was running, on the file system that mprime and its files reside on. An SSD's firmware does its own wear leveling, but it has no native way to tell which parts of the file system are no longer in use by the operating system. Fstrim is a program that reports the unused areas of a mounted file system to the SSD so the firmware knows it can reuse them. I suspect there is a flaw somewhere in the fstrim > kernel > filesystem > mprime > filesystem chain such that fstrim marks parts of mprime's files as not in use when in fact they are. Since the problem seems unique to mprime, it is possible that it uses some old kernel calls that fail under certain more recently developed circumstances, or with less sophisticated file formats. I am really not fit to troubleshoot this possibility. But I will say it is probably better to close mprime before running fstrim.
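A minimal sketch of the precaution suggested above: refuse to trim while an mprime process is still running. It assumes a Linux `/proc` layout and a process name of exactly `mprime`; the helper names `running_pids` and `trim_if_idle` are hypothetical, and `fstrim` itself needs root.

```python
import os
import subprocess

def running_pids(name, proc_root="/proc"):
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited while we were scanning, or access was denied.
            continue
    return sorted(pids)

def trim_if_idle(mountpoint, name="mprime", proc_root="/proc"):
    """Run `fstrim -v <mountpoint>` only if no process called `name` exists."""
    pids = running_pids(name, proc_root)
    if pids:
        print(f"{name} is still running (PIDs {pids}); not trimming.")
        return False
    subprocess.run(["fstrim", "-v", mountpoint], check=True)
    return True
```

Calling `trim_if_idle("/")` as root would trim the root file system only after mprime has been quit. This does not explain the errors; it only enforces the "close mprime first" habit.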
2019-11-16, 00:44 #2 phillipsjk   Nov 2019 5×13 Posts
It is possible that the SSD has buggy firmware. Unless there is a kernel bug, the fstrim command should only tell the SSD to TRIM unallocated space. A workaround may be to disable automatic TRIM (I think it would be in the mount options) and only run it manually, monthly or similar. If the error persists without running TRIM, you may actually have an unrelated hardware error (I would guess RAM).
Last fiddled with by phillipsjk on 2019-11-16 at 00:45 Reason: Grammar, spelling
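As a rough way to check the mount-option guess above, the following sketch lists mount points that carry the `discard` option, which is how continuous (automatic) TRIM is usually enabled on Linux file systems. It assumes input in `/proc/mounts` format; the function name `mounts_with_discard` is made up for illustration.

```python
def mounts_with_discard(mounts_text):
    """Return mount points whose option list contains `discard`.

    `mounts_text` is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        options = fields[3].split(",")
        # `discard` may appear bare or with a value (e.g. btrfs `discard=async`).
        if any(opt == "discard" or opt.startswith("discard=") for opt in options):
            found.append(fields[1])
    return found
```

On a live system one could pass it `open("/proc/mounts").read()`. An empty result suggests no file system is mounted with continuous TRIM, although a periodic job such as a timer or cron entry could still be running fstrim on a schedule.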
2020-02-22, 00:20 #3 AwesomeMachine   Apr 2018 USA 1101₂ Posts
Happened again!
The problem occurred again. No fstrim was run between times.
Code:
[Worker #1 Feb 21 18:38] Iteration: 37610000 / 101988773 [36.87%], ms/iter: 52.999, ETA: 39d 11:47
[Worker #1 Feb 21 18:38] Hardware errors have occurred during the test!
[Worker #1 Feb 21 18:38] 1 Gerbicz/double-check error.
[Worker #1 Feb 21 18:38] Confidence in final result is excellent.
[Worker #1 Feb 21 18:40] Gerbicz error check passed at iteration 37611256.
[Worker #3 Feb 21 18:40] M103931309 stage 1 is 32.05% complete. Time: 467.809 sec.
[Worker #4 Feb 21 18:41] Iteration: 9890000 / 103946203 [9.51%], ms/iter: 45.156, ETA: 49d 03:46
[Worker #2 Feb 21 18:45] Iteration: 35440000 / 101992529 [34.74%], ms/iter: 44.817, ETA: 34d 12:31
Only happens when I run fstrim. System isn't configured for auto trim, only manual trim. It is possible it's a drive firmware bug, but those aren't generally application specific. This time I paused mprime, but did not exit completely. If I remember in a few months, when I next trim the file system, I'll completely exit mprime and see if that makes a difference. I predict it will!
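To correlate error reports with the times fstrim was run, the error-count lines can be pulled out of the log. This is a small sketch that assumes the line format shown in the log above; other mprime versions may print differently, and the name `error_reports` is hypothetical.

```python
import re

# "[Worker #1 Feb 21 18:38] message" as seen in the posted log.
LINE_RE = re.compile(r"^\[Worker #(\d+) (\w{3} +\d+ \d{2}:\d{2})\] (.*)$")
# "1 Gerbicz/double-check error." or "2 Jacobi errors."
ERROR_RE = re.compile(r"^(\d+) (.+) errors?\.$")

def error_reports(log_text):
    """Return (worker, timestamp, count, kind) for each error-count line."""
    reports = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        e = ERROR_RE.match(m.group(3))
        if e:
            reports.append(
                (int(m.group(1)), m.group(2), int(e.group(1)), e.group(2))
            )
    return reports
```

Comparing the returned timestamps against a note of when each trim was run would show whether the first error really appears only after a trim.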
2020-02-22, 04:57 #4 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 6,449 Posts
The data in RAM is being corrupted, thus you get the error reported. So if you are sure it is related to fstrim then there can be a number of possible causes. Buggy driver (already mentioned above). Bad PSU dropping voltage when the drive is drawing more current during the trim. Overheating of the system during trim. Etc. But also be open to the idea that trim is just a coincidence. It could be a flaky RAM stick. Cosmic ray upsets. Alpha decay in the RAM packaging. Overzealous clocking of some part. Etc.
Last fiddled with by retina on 2020-02-22 at 04:58
2021-10-07, 23:49 #5 AwesomeMachine   Apr 2018 USA 13 Posts
PSU doubtful cause
Well, I doubt it's the PSU, because it's a laptop, and the mprime program itself requires more power than executing the trim command. The drive passes every test of its functionality. The problem only occurs with the combination of mprime and fstrim. And now the problem has mysteriously disappeared without even the most insignificant hardware change. I doubt the RAM was being written over, because that has nothing to do with the issue, and if it were the cause, it would occur in other scenarios. Alpha particles were a problem for system memory in the 1970s. So, probably not currently relevant. I surmise the program, to avoid making huge files outright, uses sparse files, and fstrim doesn't handle sparse files well if they are open for r/w. Mprime, when stopped temporarily, presumably keeps its files open. When the mprime program is quit, using the menu item, it writes its data and closes the files. Then, fstrim has no trouble determining the correct boundaries. Or, since I'm guessing, I might be completely incorrect! I want to thank the contributors to this discussion thread, for sparking my mind to think.
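The sparse-file guess above can at least be checked. A file is sparse when it occupies fewer disk blocks than its length implies, which `os.stat` exposes on Linux and most Unix systems. This is only a sketch: the name `is_sparse` is made up, the 512-byte unit for `st_blocks` is the usual Unix convention, and compressing file systems can make a dense file look sparse.

```python
import os

def is_sparse(path):
    """True if the file occupies fewer disk blocks than its size implies."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and most Unix systems.
    return st.st_blocks * 512 < st.st_size
```

Running it on the save files in mprime's working directory would show whether any of them are sparse at all. If none are, the sparse-file explanation can be dropped.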


All times are UTC.
