mersenneforum.org  

2007-11-15, 23:39   #1
roger
Oct 2006
2²·5·13 Posts
Breaking up files

Thanks to whoever posted how to combine files, but how do you break them up? I've got a 6.5GB file from NewPGen, and when I try to sieve it, it says "out of memory." I set the max RAM to the largest value it would accept, but it still gave me the same error.

I don't know why it is giving me this error though. I looked at how many k's it left for every billion sieved through, and I should only be getting around 350M total, well under the 1B maximum.

Thanks!

2007-11-16, 00:27   #2
rogue
"Mark"
Apr 2003
Between here and the
1011101111010₂ Posts

You probably need to increase the size of virtual memory, but with a file that large, you are going to experience performance issues.

2007-11-16, 01:20   #3
roger
Oct 2006
104₁₆ Posts

Yeah, I tried increasing the Max RAM, that didn't help.
Also, I have another file (1.5GB), and it was removing k's at around 20000/second when sieving in 1B segments. Now when I try to sieve it, it's doing around 15/second.

It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that??

Thanks!

Last fiddled with by roger on 2007-11-16 at 01:22

2007-11-16, 04:03   #4
rogue
"Mark"
Apr 2003
Between here and the
2×5×601 Posts

I wasn't referring to RAM, but to virtual memory. If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want. The main problem is that with virtual memory you will be swapping physical memory to disk which can significantly slow the process down.

BTW, to edit the file, I would suggest that you write a small program or script to split it.
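
A small script along these lines could do it. This is only a sketch in Python, assuming the sieve file is plain text with one entry per line. The function name `split_by_lines`, the `piece_` prefix, and the `repeat_header` option are made up for illustration. Whether each piece needs the file's first line copied to its top depends on what the sieving program expects, so that is left as a switch.

```python
def split_by_lines(src, lines_per_piece, prefix="piece_", repeat_header=False):
    """Split text file `src` into pieces of at most `lines_per_piece` lines.

    The input is streamed one line at a time, so memory use stays small
    no matter how large the file is. Returns the piece names in order.
    """
    if lines_per_piece < 1:
        raise ValueError("lines_per_piece must be at least 1")

    pieces = []
    out = None
    count = 0
    # Binary mode copies line endings untouched (no newline translation).
    with open(src, "rb") as fin:
        # Assumption: if repeat_header is set, the first line is a header
        # that every piece should start with.
        header = fin.readline() if repeat_header else b""
        for line in fin:
            if out is None or count == lines_per_piece:
                if out is not None:
                    out.close()
                name = f"{prefix}{len(pieces):04d}"
                out = open(name, "wb")
                out.write(header)
                pieces.append(name)
                count = 0
            out.write(line)
            count += 1
    if out is not None:
        out.close()
    return pieces
```

Calling `split_by_lines("big.txt", 50_000_000)` would write `piece_0000`, `piece_0001`, and so on into the current directory; the file name and line count here are placeholders.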

Last fiddled with by rogue on 2007-11-16 at 04:05

2007-11-16, 04:16   #5
jasong
"Jason Goatcher"
Mar 2005
6661₈ Posts

I think that in the future, as more cores are added, people will have less and less RAM per core. Total RAM will continue to go up, but I think it will show a downward trend when divided by the number of cores.

For this reason, I think it would be a good idea for sieving programmers to consider the possibility of having sieving files that do double-duty, meaning one file is being acted on by more than one factoring program thread. This could cut down on total RAM usage and mean that people will be more likely to actually run the sieve software on more cores, meaning more throughput.

2007-11-16, 04:24   #6
Xyzzy
"Mike"
Aug 2002
3×19×137 Posts

Quote:
It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that??
No idea if this is helpful:

Code:
SPLIT(1)                         User Commands                        SPLIT(1)

NAME
       split - split a file into pieces

SYNOPSIS
       split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
       Output  fixed-size  pieces of INPUT to PREFIXaa, PREFIXab, ...; default
       size is 1000 lines, and default PREFIX is ‘x’.  With no INPUT, or  when
       INPUT is -, read standard input.

       Mandatory  arguments  to  long  options are mandatory for short options
       too.

       -a, --suffix-length=N
              use suffixes of length N (default 2)

       -b, --bytes=SIZE
              put SIZE bytes per output file

       -C, --line-bytes=SIZE
              put at most SIZE bytes of lines per output file

       -d, --numeric-suffixes
              use numeric suffixes instead of alphabetic

       -l, --lines=NUMBER
              put NUMBER lines per output file

       --verbose
              print a diagnostic to standard error  just  before  each  output
              file is opened

       --help display this help and exit

       --version
              output version information and exit

       SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.

AUTHOR
       Written by Torbjorn Granlund and Richard M. Stallman.

REPORTING BUGS
       Report bugs to <bug-coreutils@gnu.org>.

COPYRIGHT
       Copyright © 2006 Free Software Foundation, Inc.
       This  is  free  software.   You may redistribute copies of it under the
       terms      of      the      GNU      General       Public       License
       <http://www.gnu.org/licenses/gpl.html>.   There  is NO WARRANTY, to the
       extent permitted by law.

SEE ALSO
       The full documentation for split is maintained as a Texinfo manual.  If
       the  info  and  split programs are properly installed at your site, the
       command

              info split

       should give you access to the complete manual.

split 5.97                       January 2007                         SPLIT(1)

2007-11-16, 06:08   #7
roger
Oct 2006
2²×5×13 Posts

Quote:
If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want.
How would I do that?

Quote:
BTW, to edit the file, I would suggest that you write a small program or script to split it.
Sorry, I don't have any programming skills

@xyzzy: I'm not sure if that's what I'm looking for (ideally a command in DOS, like the combining one "copy /B file1+file2 result"), especially seeing that it says 'default size is 1000 lines', because my file is 6.5GB, which is literally millions if not half a billion lines.

2007-11-16, 11:25   #8
Xyzzy
"Mike"
Aug 2002
7809₁₀ Posts

Quote:
@xyzzy: I'm not sure if that's what I'm looking for (ideally a command in DOS, like the combining one "copy /B file1+file2 result"), especially seeing that it says 'default size is 1000 lines', because my file is 6.5GB, which is literally millions if not half a billion lines.
From above:

Code:
       -l, --lines=NUMBER
              put NUMBER lines per output file
We're sure somebody has ported "split" for DOS.
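
To pick NUMBER, it helps to know roughly how many lines the file holds without reading all 6.5GB of it. Here is a rough sketch in Python, assuming lines are of similar length throughout the file. The names `estimate_line_count` and `lines_per_piece`, and the sample size, are illustrative and not part of any tool mentioned in this thread.

```python
import math
import os
from itertools import islice


def estimate_line_count(path, sample_lines=100_000):
    """Estimate the number of lines from the file size and a sample.

    Only the first `sample_lines` lines are read; the estimate is the
    file size divided by the average sampled line length in bytes.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        sample = list(islice(f, sample_lines))
    if not sample:
        return 0
    avg_bytes = sum(len(line) for line in sample) / len(sample)
    return round(size / avg_bytes)


def lines_per_piece(total_lines, pieces):
    """Line count to pass to `split -l` to get about `pieces` output files."""
    return math.ceil(total_lines / pieces)
```

The result of `lines_per_piece(estimate_line_count("big.txt"), 10)` could then be used as the NUMBER for `-l`; the file name and the piece count of 10 are placeholders.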

2007-11-16, 14:17   #9
R.D. Silverman
Nov 2003
2⁶×113 Posts

Quote:
Originally Posted by Xyzzy
From above:

Code:
       -l, --lines=NUMBER
              put NUMBER lines per output file
We're sure somebody has ported "split" for DOS.
Yep. It is part of the MKS toolkit; a pretty good collection of
Unix tools that run under DOS. I use them all the time. head, tail,
split, sort, uniq, chmod, ps, etc. etc. etc.

2007-11-16, 16:00   #10
smh
"Sander"
Oct 2002
52.345322,5.52471
29·41 Posts

See http://sourceforge.net/project/showf...?group_id=9328 for a collection of unix utilities.

Just put the files from the wbin directory somewhere in your system path, and you can use them from any directory in a cmd window.

2007-11-16, 16:33   #11
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
1861₁₆ Posts

Quote:
Originally Posted by Xyzzy
From above:

Code:
       -l, --lines=NUMBER
              put NUMBER lines per output file
We're sure somebody has ported "split" for DOS.
You can find a similar application for the Windows command line (and presumably DOS too) here:
http://www.fourmilab.ch/splits/
It's less flexible than the Linux one you listed the man output for, but it does the trick as long as you just need to split up a file into even-kilobyte chunks.

However, the one that smh listed is probably a more true port of the Linux one, so it would probably be more flexible.
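
One caveat with fixed-size byte chunks: a chunk boundary can land in the middle of a line, which a program reading the pieces line by line may not accept. The man page's `-C` option avoids that by putting only whole lines in each piece. A minimal sketch of the same idea in Python follows, assuming a plain-text input; `split_by_size` and the `chunk_` prefix are hypothetical names, and this is not how any of the tools above are implemented.

```python
def split_by_size(src, max_bytes, prefix="chunk_"):
    """Split `src` into pieces of at most `max_bytes`, never cutting a line.

    A single line longer than `max_bytes` still gets a piece of its own,
    so that one piece can exceed the cap. Returns the piece names in order.
    """
    if max_bytes < 1:
        raise ValueError("max_bytes must be at least 1")

    pieces = []
    out = None
    written = 0
    with open(src, "rb") as fin:
        for line in fin:
            # Start a new piece when this line would push the current one
            # over the cap (unless the current piece is still empty).
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                name = f"{prefix}{len(pieces):04d}"
                out = open(name, "wb")
                pieces.append(name)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return pieces
```

Joining the pieces back in order (for example with the "copy /B" command mentioned earlier in the thread) should give back the original file, since every byte is written exactly once.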

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.