mersenneforum.org How-to guide for running LL tests on the Amazon EC2 cloud

2017-03-04, 02:23  #23
pdr ("Philip Rogers", Feb 2017, San Francisco, CA, 2 Posts)

Thank you for this awesome guide! I did some cost analysis of EC2 configurations and found some interesting results:

- c4.large (Xeon E5-2666 v3; 1 physical core, 2 hyperthreads) is the most cost-efficient configuration, at around $6.21 per 80M LL test.
- p2.xlarge (GPU+CPU) is not cost-efficient compared to pure CPU tests on c4.large. At current spot prices, p2.xlarge runs around $9.77 per 80M LL test.
- There's a performance benefit to using one worker per physical core instead of the guide's one thread per physical core.
- The x1.16xlarge (2 × Xeon E7-8880 v3; 32 physical cores, 64 hyperthreads) was the fastest machine I tested, with a single 80M LL test taking around 30 hours. The x1.32xlarge (4 Xeons) probably takes around 20 hours per 80M LL test. Neither is cost-effective.
- Matching curtisc's performance on EC2 will set you back around $136k/yr :)

Here's a link to the raw data: https://docs.google.com/document/d/1...LeHcSwpLk/view

2017-03-04, 20:42  #24
CRGreathouse (Aug 2006, 13533₈ Posts)

Very cool! Thanks for doing the experiment!

2017-03-04, 21:07  #25
GP2 (Sep 2003, 5×11×47 Posts)

Hey, thanks for giving it a try.

Note: when using this guide, make sure you use the latest version of the user_data_body script (attached at the bottom of this message). Versions prior to 1.07 had a bug that failed to detect some of the existing (already-running) instances, which caused problems. Also, don't forget to edit the script to put in your own FILE_SYSTEM_ID values.

$6.21 for an 80M exponent sounds like the right ballpark. I did a rough estimate just now and got about $6.70 for a 79M exponent, assuming 1.25 cents per hour for a c4.large.
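A quick sanity check on that ballpark (my own arithmetic, not from the post; the implied test duration is back-derived from the quoted numbers rather than measured):

```python
# Back-of-the-envelope check of GP2's estimate, using figures from the post.
spot_price = 0.0125      # dollars/hour for a c4.large (1.25 cents/hour)
cost_per_test = 6.70     # GP2's rough cost for one 79M LL test, in dollars

hours = cost_per_test / spot_price
print(round(hours), "instance-hours, i.e. about", round(hours / 24, 1), "days")
```

So $6.70 per test implies roughly three weeks of continuous running on a single c4.large core, which is consistent with the ~30-hour figure quoted for the much larger x1.16xlarge.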
Of course, you have to search for the region with the cheapest spot prices (hint: the only American state whose name shares no letters with "mackerel"), because spot prices in other regions can be a lot higher, and there seems to be no effective arbitrage mechanism keeping prices consistent across regions.

When running p2.xlarge, remember that you can simultaneously run mprime on the CPU and CUDALucas on the GPU (running different exponents on each, obviously), and they don't interfere with one another. The user_data_body script lets you do that, so this two-for-one should be factored into the cost calculation. On the other hand, spot prices for p2 instances can fluctuate fairly dramatically, much more so than for c4 instances, so it's hard to get a handle on how much a p2 is really costing you at any given time.

You should probably exclude all the t2.* instances from benchmarks. They aren't intended for sustained use, and Amazon will drastically throttle them after you exceed a rather small quota of CPU time per month, so they're basically unusable for number-crunching applications.

x1.16xlarge instances aren't really suited for LL testing; I use the nearly 1 TB of memory to run humongous GMP-ECM on a few of the 32 cores and mprime ECM on the remaining cores, hunting for additional factors of small exponents just for fun.

It's hard to do exact benchmarks, because you're sharing physical machines with other AWS users: their code runs on the other cores of the same physical machine and sometimes competes with you for cache, at least at the higher levels like L3. The mprime/Prime95 program is very sensitive to cache usage, so this can have some effect. However, there is a "dedicated tenancy" option that you can specify for your spot instances, so that you don't share hardware with other AWS users.
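To make the region hunt concrete, here is a minimal sketch. The prices below are invented for illustration only; real numbers come from the EC2 spot price history (for example via `aws ec2 describe-spot-price-history`), and the set of regions to compare is up to you:

```python
# Illustrative only: sample per-region c4.large spot prices in dollars/hour.
# These values are made up for the example; fetch live ones from the EC2
# spot price history before deciding where to launch.
sample_prices = {
    "us-east-1": 0.0180,
    "us-east-2": 0.0125,
    "us-west-2": 0.0165,
    "eu-west-1": 0.0190,
}

# Pick the region whose current spot price is lowest.
cheapest = min(sample_prices, key=sample_prices.get)
print(cheapest, sample_prices[cheapest])
```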
When mprime runs on a multi-core AWS instance, it sometimes fails to properly detect which virtual cores share the same physical core (with hyperthreading). It does a runtime check at startup each time, and sometimes this randomly fails to detect the layout of the cores correctly. This is usually only a problem for .2xlarge instances and larger. The workaround is to add an AffinityScramble2 line.

For .xlarge instances, add this line to local.txt (local-init.txt):
Code:
AffinityScramble2=0213

For .2xlarge instances, add this line to local.txt (local-init.txt):
Code:
AffinityScramble2=04152637

For .4xlarge instances, add this line to local.txt (local-init.txt):
Code:
AffinityScramble2=08192A3B4C5D6E7F

For c4.8xlarge instances, I'm not really sure, because it exceptionally has 18 cores instead of 16 and I haven't tested it. Add this line to local.txt (local-init.txt):
Code:
AffinityScramble2=0I1J2K3L4M5N6O7P8Q9RASBTCUDVEWFXGYHZ

For the other .8xlarge instances (other than c4.8xlarge), which have 16 cores rather than 18 (such as r4.8xlarge), add this line to local.txt:
Code:
AffinityScramble2=0G1H2I3J4K5L6M7N8O9PAQBRCSDTEUFV

For .16xlarge instances, I'm not really sure because I haven't tested it. Add this line to local.txt (local-init.txt):
Code:
AffinityScramble2=0W1X2Y3Z4a5b6c7d8e9fAgBhCiDjEkFlGmHnIoJpKqLrMsNtOuPvQwRxSyTzU(V)

I haven't really tested the last two (.8xlarge and .16xlarge); they might be incorrect. Edit: see the next message.

The AffinityScramble2 lines will be obsolete in the next version (29.1) of mprime/Prime95, which will use improved code to automatically detect the layout of the cores. But if you're running version 28.10, you could try running the benchmarks with and without the AffinityScramble2 line and see if it makes a difference. The comparison is complicated by the fact that when the AffinityScramble2 line is omitted, sometimes the layout of the cores is correctly determined and sometimes it isn't, seemingly at random.
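If you want to derive an AffinityScramble2 string for an untested instance type rather than trusting the guesses above, here is a sketch (my own helper, not part of the guide) that builds one from (processor, physical id, core id) triplets, following the character encoding implied by the examples above:

```python
# Sketch: build an AffinityScramble2 string from /proc/cpuinfo triplets.
# Encoding implied by the examples above: '0'-'9' for logical CPUs 0-9,
# 'A'-'Z' for 10-35, 'a'-'z' for 36-61.  CPUs 62 and up have no single
# character, which is presumably why the .16xlarge guess trails off
# with "U(V)".

def cpu_char(n):
    """Encode logical CPU number n as one AffinityScramble2 character."""
    if 0 <= n < 10:
        return str(n)
    if n < 36:
        return chr(ord('A') + n - 10)
    if n < 62:
        return chr(ord('a') + n - 36)
    raise ValueError("logical CPU %d has no single-character encoding" % n)

def scramble(entries):
    """entries: (processor, physical id, core id) triplets.
    Groups hyperthread siblings by (physical id, core id) and lists
    each physical core's siblings consecutively."""
    siblings = {}
    for proc, phys, core in entries:
        siblings.setdefault((phys, core), []).append(proc)
    return ''.join(cpu_char(proc)
                   for key in sorted(siblings)
                   for proc in sorted(siblings[key]))

# .xlarge layout: vCPUs 0,1 are cores 0,1; vCPUs 2,3 are their HT siblings.
print(scramble([(0, 0, 0), (1, 0, 1), (2, 0, 0), (3, 0, 1)]))  # -> 0213

# c4.8xlarge layout from the /proc/cpuinfo dump in the next message:
# processors 0-8 socket 0, 9-17 socket 1, 18-26 socket 0, 27-35 socket 1.
c48 = [(p, (p // 9) % 2, p % 9) for p in range(36)]
print(scramble(c48))  # -> 0I1J2K3L4M5N6O7P8Q9RASBTCUDVEWFXGYHZ
```

This reproduces the untested c4.8xlarge guess exactly from its cpuinfo layout, which at least shows the guess is internally consistent with the pattern of the tested strings.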
So you'd have to pay attention to the output at startup to see whether it did or didn't. If your benchmarks can confirm the effectiveness of AffinityScramble2, I can add it to the guide. It seemed to help when I ran it, but I just did a quick visual check rather than a proper benchmark.

Last fiddled with by GP2 on 2017-03-04 at 22:12. Reason: delete paragraphs which misread spot prices in the data table

2017-03-04, 21:37  #26
GP2 (Sep 2003, 5·11·47 Posts)

Here is the /proc/cpuinfo layout for c4.8xlarge and x1.16xlarge instances; maybe someone can help verify the AffinityScramble2 lines in the previous message.

c4.8xlarge (36 vCPUs; 2 sockets, 9 cores each):
Code:
processors  0-8  : physical id 0, core ids 0-8
processors  9-17 : physical id 1, core ids 0-8
processors 18-26 : physical id 0, core ids 0-8
processors 27-35 : physical id 1, core ids 0-8

x1.16xlarge (64 vCPUs; 2 sockets, 16 cores each):
Code:
processors  0-15 : physical id 0, core ids 0-15
processors 16-31 : physical id 1, core ids 0-15
processors 32-47 : physical id 0, core ids 0-15
processors 48-63 : physical id 1, core ids 0-15

So in both cases, processor N and processor N + (number of vCPUs)/2 are hyperthread siblings on the same physical core.

The x1.32xlarge has 128 vCPUs (64 physical cores). Can mprime handle that many cores?

Last fiddled with by GP2 on 2017-03-04 at 21:49

2017-03-04, 22:28  #27
henryzz (Just call me Henry, "David", Sep 2007, Liverpool (GMT/BST), 2·3²·331 Posts)

Quote: Originally Posted by GP2
  The x1.32xlarge has 128 vCPUs (64 physical cores). Can mprime handle that many cores?

I think you may need 29.x.

2017-04-04, 00:34  #28
ATH (Einyen, Dec 2003, Denmark, 6271₈ Posts)

About a week ago I ordered a c4.8xlarge spot instance, and I managed to choose "Dedicated - run a dedicated instance". I did not think much of it, just that it sounded preferable to "shared hardware instance". Luckily I did it only a few days before the end of the month, so I noticed it on the final bill. The cost is an extra $2 per hour if you have at least one dedicated instance running, over 7 times more than the instance itself costs, and it is not mentioned anywhere on the spot request form; I just went back and checked.
That little click just cost me an extra ~$300 until I noticed it today, so be careful. It could have been much worse: I saw here that back in 2013 the charge was lowered from $10 per hour to $2 per hour: https://aws.amazon.com/blogs/aws/ec2...ice-reduction/

2017-04-04, 04:47  #29
GP2 (Sep 2003, 5×11×47 Posts)

Quote: Originally Posted by ATH
  About a week ago I ordered a c4.8xlarge spot instance, and I managed to choose "Dedicated - run a dedicated instance".

At first, that doesn't seem to make sense, because a c4.8xlarge uses all 18 cores of a physical machine, so you can't have anyone else running on it at the same time. But with spot instances, one user could be running on a physical machine that someone else was running on mere minutes earlier. Maybe the paranoid fear is that the second user could somehow read leftover data in memory or on the local hard drive, or find some way to install an exploit that would let them read the data of subsequent users. Maybe best security practices dictate that some very time-consuming decontamination procedure is needed before that physical machine can be used by others, including reflashing the firmware and reformatting the hard drives. Who knows?

You could try contacting customer support, pleading ignorance, and see if they'll reverse the unexpected charges.

2017-04-05, 05:15  #30
LaurV (Romulan Interpreter, "name field", Jun 2011, Thailand, 2×11×449 Posts)

"Learn more about RDS reserved instances", ha! Should I click on it? I am a bit afraid it may say something about my mother or tell me I am futile and send me back to the books... Edit: I clicked on it! It says "error 404"... (on the link provided by ATH, pricing tab, last paragraph, "learn more about...")... haha... either he has no reserved instances, or they banned him too (actually, I am missing him a little bit, you know...)
Last fiddled with by LaurV on 2017-04-05 at 05:19

2017-04-05, 16:53  #31
VBCurtis ("Curtis", Feb 2005, Riverside, CA, 12056₈ Posts)

Quote: Originally Posted by LaurV
  "Learn more about RDS reserved instances", ha! Should I click on it? (actually, I am missing him a little bit, you know...)

Likewise, though I have no doubt it's a net gain for the forum. If only he had something like a blog where he tore apart public musings from supposed scientists... (I laughed out loud at the RDS instances! Thanks)

2017-04-09, 14:40  #32
ATH (Einyen, Dec 2003, Denmark, 3,257 Posts)

They actually did refund all the $296 I spent on the dedicated hardware, so that is great service. I did not specifically ask for a refund in my ticket, so maybe that was the correct way to do it. I just said that I got this unexpected extra cost (which was over 7 times the actual cost of the instance) and that I think they should add a warning about the extra charge to the spot request form in the future.

Last fiddled with by ATH on 2017-04-09 at 14:41
2017-08-03, 18:21  #33
GP2 (Sep 2003, 5·11·47 Posts)

One additional factor to consider when trying to determine which instance type is most cost-effective: if your AWS account is more than one year old, then certain charges which were free in your first year become non-free.

In particular, each c4 instance of any size (c4.large, c4.xlarge, etc.) uses 8 GB of EBS-backed storage for the root filesystem. In your first year of usage this is free, but after that it is charged at $0.10 per GB-month, or $0.80 per month per instance. This adds up: if you are running 100 instances, you would pay an additional $80 per month in total. Unfortunately you can't use less than 8 GB for the root filesystem, and you can't specify an "instance store" AMI to try to avoid the EBS charges, because those aren't compatible with c4 instances.

There are 720 hours in a 30-day month, so if you have a c4.large instance (one core) with a spot price of, say, 1.6 cents per hour, the additional $0.80 per month is the equivalent of an additional 50 hours per month that you are billed for, on top of the actual 720 hours, or about an additional 7%. If the spot price for a c4.large instance were to fall to 1.0 cents per hour, that same $0.80 would be the equivalent of an additional 80 hours per month, or approximately an additional 11%.

The additional charges are less significant for larger instances: for a c4.xlarge instance (two cores) with a spot price of, say, 3.2 cents per hour, the additional $0.80 per month represents only an additional 3.5%.

So the EBS charges slightly worsen the cost-effectiveness of the one-core c4.large instances versus the two-core c4.xlarge instances. Of course there are other factors as well: the actual spot prices fluctuate, and it will very often not be the case that the two-core c4.xlarge instances cost exactly twice as much per hour as the one-core c4.large instances.
And performance-wise, the c4.xlarge will usually have a throughput a few percent less than that of two c4.large instances when doing double-check exponents in the 40M range, although the throughputs seem to be nearly equivalent for first-time exponents in the 70M range. So there are multiple factors to consider when deciding whether running c4.large or c4.xlarge instances is more cost-effective at any given time.

Note that anything bigger, from the four-core c4.2xlarge all the way up to the 18-core (not 16-core) c4.8xlarge, is rarely worthwhile: the price of an N-core instance for larger N will usually be a lot more than N times the cost of a one-core c4.large instance, and the throughput of an N-core instance running mprime will usually be significantly less than the total throughput of N one-core c4.large instances.
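The EBS-overhead percentages above can be reproduced with a few lines of arithmetic (all input values are taken from the post; the function name is mine):

```python
# Reproduce GP2's EBS-overhead arithmetic from the post above.
EBS_GB = 8              # root filesystem size per c4 instance, in GB
EBS_RATE = 0.10         # dollars per GB-month after the first free year
HOURS_PER_MONTH = 720   # 30-day month

def ebs_overhead_pct(spot_price_per_hour):
    """EBS charge expressed as extra billed hours, as a percent of a month."""
    ebs_monthly = EBS_GB * EBS_RATE                 # $0.80 per month
    extra_hours = ebs_monthly / spot_price_per_hour
    return 100.0 * extra_hours / HOURS_PER_MONTH

print(round(ebs_overhead_pct(0.016), 1))  # c4.large at 1.6 cents/hr -> 6.9
print(round(ebs_overhead_pct(0.010), 1))  # c4.large at 1.0 cents/hr -> 11.1
print(round(ebs_overhead_pct(0.032), 1))  # c4.xlarge at 3.2 cents/hr -> 3.5
```

Since the $0.80 monthly charge is fixed per instance, the overhead scales inversely with the hourly spot price, which is why it weighs more heavily on the cheap one-core instances.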
