mersenneforum.org Breaking up files

 2007-11-15, 23:39 #1 roger     Oct 2006 22·5·13 Posts Breaking up files Thanks to whoever posted how to combine files, but how do you break them up? I've got a 6.5GB file from NewPGen, and when I try to sieve it, it says "out of memory." I set the max RAM to the highest value it would accept, but it still gave the same error, and I don't see why. I looked at how many k's it removed for every billion sieved, and I should only end up with around 350M candidates total, well under the 1B maximum. Thanks!
 2007-11-16, 00:27 #2 rogue     "Mark" Apr 2003 Between here and the 10111011110102 Posts You probably need to increase the size of virtual memory, but with a file that large, you are going to experience performance issues.
 2007-11-16, 01:20 #3 roger     Oct 2006 10416 Posts Yeah, I tried increasing the max RAM; that didn't help. Also, I have another file (1.5GB) that was removing k's at around 20,000/second when sieving in 1B segments. Now when I try to sieve it, it's doing around 15. It sounds like cutting the files up (like NewPGen does by itself) is the best way to go. How does one do that? Thanks! Last fiddled with by roger on 2007-11-16 at 01:22
 2007-11-16, 04:03 #4 rogue     "Mark" Apr 2003 Between here and the 2×5×601 Posts I wasn't referring to RAM, but to virtual memory. If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want. The main problem is that with virtual memory you will be swapping physical memory to disk which can significantly slow the process down. BTW, to edit the file, I would suggest that you write a small program or script to split it. Last fiddled with by rogue on 2007-11-16 at 04:05
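As a rough illustration of what such a script could look like (a sketch only, not tested against real NewPGen output; it assumes the file's first line is a header that every chunk must repeat, and the chunk size and filename prefix are placeholders you'd adjust):

```python
def split_sieve_file(path, chunk_lines=10_000_000, prefix="chunk"):
    """Split a large sieve file into pieces of at most chunk_lines
    candidate lines each, repeating the header line in every piece.
    Returns the number of chunk files written."""
    parts = 0
    with open(path) as src:
        header = src.readline()  # assumed single header line
        out, count = None, 0
        for line in src:
            if out is None:
                # start a new chunk: chunk000.txt, chunk001.txt, ...
                out = open(f"{prefix}{parts:03d}.txt", "w")
                out.write(header)
                parts += 1
            out.write(line)
            count += 1
            if count >= chunk_lines:
                out.close()
                out, count = None, 0
        if out is not None:
            out.close()
    return parts
```

Because it streams line by line, it never holds more than one line in memory, so it can handle a 6.5GB file without issue.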
 2007-11-16, 04:16 #5 jasong     "Jason Goatcher" Mar 2005 66618 Posts I think that in the future, as more cores are added, people will have less and less RAM per core. I think RAM amounts will continue to go up, but I think it will portray a downward trend when divided by number of cores. For this reason, I think it would be a good idea for sieving programmers to consider the possibility of having sieving files that do double-duty, meaning one file is being acted on by more than one factoring program thread. This could cut down on total RAM usage and mean that people will be more likely to actually run the sieve software on more cores, meaning more throughput.
 2007-11-16, 04:24 #6 Xyzzy     "Mike" Aug 2002 3×19×137 Posts
Quote:
 It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that??
No idea if this is helpful:

Code:
SPLIT(1)                         User Commands                        SPLIT(1)

NAME
split - split a file into pieces

SYNOPSIS
split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
Output  fixed-size  pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is ‘x’.  With no INPUT, or  when
INPUT is -, read standard input.

Mandatory  arguments  to  long  options are mandatory for short options
too.

-a, --suffix-length=N
use suffixes of length N (default 2)

-b, --bytes=SIZE
put SIZE bytes per output file

-C, --line-bytes=SIZE
put at most SIZE bytes of lines per output file

-d, --numeric-suffixes
use numeric suffixes instead of alphabetic

-l, --lines=NUMBER
put NUMBER lines per output file

--verbose
print a diagnostic to standard error  just  before  each  output
file is opened

--help display this help and exit

--version
output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.

AUTHOR
Written by Torbjorn Granlund and Richard M. Stallman.

REPORTING BUGS
Report bugs to <bug-coreutils@gnu.org>.

This  is  free  software.   You may redistribute copies of it under the
terms      of      the      GNU      General       Public       License
<http://www.gnu.org/licenses/gpl.html>.   There  is NO WARRANTY, to the
extent permitted by law.

The full documentation for split is maintained as a Texinfo manual.  If
the  info  and  split programs are properly installed at your site, the
command

       info split

should give you access to the complete manual.

split 5.97                       January 2007                         SPLIT(1)

 2007-11-16, 06:08 #7 roger     Oct 2006 22×5×13 Posts
Quote:
 If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want.
How would I do that?

Quote:
 BTW, to edit the file, I would suggest that you write a small program or script to split it.
Sorry, I don't have any programming skills

@xyzzy: I'm not sure that's what I'm looking for (ideally a command in DOS, like the combining one, "copy /B file1+file2 result"), especially since it says "default size is 1000 lines" and my file is 6.5GB, which is literally millions, if not half a billion, lines.

 2007-11-16, 11:25 #8 Xyzzy     "Mike" Aug 2002 780910 Posts

Quote:
 @xyzzy: I'm not sure if that's what I'm looking for (ideally a command in dos, like the combining one "copy /B file1+file2 result"), especially seeing it says 'default size 1000 lines, because my file is 6.5GB = literally millions if not half a billion lines.
From above:

Code:
       -l, --lines=NUMBER
              put NUMBER lines per output file
We're sure somebody has ported "split" for DOS.
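For instance, a hypothetical run (the filename and line count are placeholders) that cuts a sieve file into 10-million-line pieces named piece_aa, piece_ab, and so on:

```shell
# Split sieve.txt into files of at most 10,000,000 lines each,
# written as piece_aa, piece_ab, piece_ac, ...
split -l 10000000 sieve.txt piece_
```

Note this does not repeat a header line in each piece; if the sieving program needs the NewPGen header, it would have to be copied into each chunk afterward.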

 2007-11-16, 14:17 #9 R.D. Silverman     Nov 2003 26×113 Posts

Quote:
 Originally Posted by Xyzzy From above: Code:  -l, --lines=NUMBER put NUMBER lines per output file We're sure somebody has ported "split" for DOS.
Yep. It is part of the MKS Toolkit, a pretty good collection of Unix tools that run under DOS. I use them all the time: head, tail, split, sort, uniq, chmod, ps, etc.

 2007-11-16, 16:00 #10 smh     "Sander" Oct 2002 52.345322,5.52471 29·41 Posts See http://sourceforge.net/project/showf...?group_id=9328 for a collection of unix utilities. Just put the files in the wbin directory somewhere in your system path and you can use them in any directory from a cmd window
 2007-11-16, 16:33 #11 mdettweiler     A Sunny Moo Aug 2007 USA (GMT-5) 186116 Posts

Quote:
 Originally Posted by Xyzzy From above: Code:  -l, --lines=NUMBER put NUMBER lines per output file We're sure somebody has ported "split" for DOS.
You can find a similar application for the Windows command line (and presumably DOS too) here:
http://www.fourmilab.ch/splits/
It's less flexible than the Linux one you listed the man output for, but it does the trick as long as you just need to split up a file into even-kilobyte chunks.

However, the one that smh listed is probably a more true port of the Linux one, so it would probably be more flexible.
