mersenneforum.org  

Old 2023-02-05, 10:51   #1
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE

One trillion digits of Pi download

Around five years ago, I finished a calculation of 1e12 digits of Pi, and wanted to share the results. Back then, I had problems getting a torrent set up to work; with a bit of James Heinrich's guidance, it now works. The torrent file is attached.

Be aware that this is a huge download (around 406 GB)!
Attached Files
File Type: zip 1tdp.zip (509.0 KB, 31 views)
Old 2023-02-05, 13:06   #2
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary


Quote:
Originally Posted by kruoli View Post
Around five years ago, I finished a calculation of 1e12 digits of Pi,
...
Be aware that this is a huge download (around 406 GB)!
That is pretty bad compression if the file really contains only the digits:
Code:
? log(10)/log(256)*10^12/2^30.
%1 = 386.72332825224873934294090483912618086
Old 2023-02-05, 14:08   #3
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE


FWIW, I used pigz with the -H flag (only use Huffman compression) and increased the block size dramatically.

Another option would have been to use the digit compressor by Mysticial, but that is not a "standard" compression.

For Huffman coding of the decimal digits, one should expect a tree with six codes of length 3 bits and four of length 4 bits. That gives an expected file size of 6e11 * 3/8 + 4e11 * 4/8 = 4.25e11 bytes, which is nearly 396 binary GB. This is of course more than the optimum, but also less than the actual size, likely because of overhead and suboptimal tree choices.
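
As a quick GP sketch of the same estimate (for ten equally likely symbols, a Huffman code indeed assigns six 3-bit and four 4-bit codewords):
Code:
? \\ expected bits per decimal digit, over 10^12 digits, converted to GiB
? (6*3 + 4*4)/10. * 10^12 / 8 / 2^30
\\ ~ 395.81 GiB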

The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte. To stay practical, I went with the pigz approach instead. Yes, I used around 5 % more space than optimal, but I think I can argue that this is not useless: you can use all standard gz tools like zcat, and extracting is fast. Base conversion takes much more time and LOTS of memory.
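
A rough check of that 5 % figure in GP (assuming the 406 GB torrent size and the ~386.7 GiB optimum from above are measured in the same binary units):
Code:
? 406/386.72 - 1
\\ ~ 0.05, i.e. about 5 % above the information-theoretic minimum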
Old 2023-02-05, 15:26   #4
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary


Quote:
Originally Posted by kruoli View Post
The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte. To stay practical, I went with the pigz approach instead.
That is actually much worse:
Code:
? 10^12/2/2^30.
%2 = 465.66128730773925781250000000000000001
A better, close-to-optimal scheme that needs no large-integer arithmetic is to code 12 digits in 5 bytes; this works since 10^12 < 256^5. In GiB that gives:
Code:
? 10^12/12*5/2^30.
%10 = 388.05107275644938151041666666666666668
For a real compression problem see:
https://www.spoj.com/problems/MAGIC2/

The first few solvers (including me) achieve a better compression rate than the popular programs give.
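
A minimal GP sketch of that 12-digits-in-5-bytes packing (the helper names pack12/unpack12 are made up for illustration), showing that nothing beyond 40-bit integers is needed:
Code:
? \\ pack a 12-digit chunk (0 <= d < 10^12 < 256^5) into 5 big-endian bytes
? pack12(d) = vector(5, i, (d \ 256^(5-i)) % 256);
? \\ reassemble the chunk from its 5 bytes
? unpack12(v) = sum(i=1, 5, v[i] * 256^(5-i));
? unpack12(pack12(141592653589))
\\ returns 141592653589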
Old 2023-02-05, 15:36   #5
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair


Quote:
Originally Posted by kruoli View Post
The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte.
I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.

Old 2023-02-05, 15:43   #6
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE


Quote:
Originally Posted by R. Gerbicz View Post
That is actually much worse:
Code:
? 10^12/2/2^30.
%2 = 465.66128730773925781250000000000000001
What? Only if I used 1e12 hexadecimal digits. But 1e12 hexadecimal digits are not equivalent to 1e12 decimal digits (in terms of information content); far fewer hexadecimal digits are needed. With that, we should get to your "optimal" value, or nearly so.
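
A GP sketch of that point: roughly 8.3e11 hexadecimal digits carry the same information as 1e12 decimal digits, and packing two of them per byte lands on the same optimum as before:
Code:
? log(10)/log(16) * 10^12        \\ hexadecimal digits needed
\\ ~ 8.3048e11
? log(10)/log(16) * 10^12 / 2 / 2^30.   \\ two hex digits per byte, in GiB
\\ ~ 386.72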
Quote:
Originally Posted by retina View Post
I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.
Yes, exactly because of this. :D
Quote:
Originally Posted by R. Gerbicz View Post
A better, close-to-optimal scheme that needs no large-integer arithmetic is to code 12 digits in 5 bytes; this works since 10^12 < 256^5. In GiB that gives:
Code:
? 10^12/12*5/2^30.
%10 = 388.05107275644938151041666666666666668
For a real compression problem see:
https://www.spoj.com/problems/MAGIC2/

The first few solvers (including me) achieve a better compression rate than the popular programs give.
Thanks, this would definitely be a solution if someone needs to save the last 5 % of space. I also thought of using more than one byte per Huffman token, but that is also non-standard.
Old 2023-02-05, 16:02   #7
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary


Quote:
Originally Posted by retina View Post
I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.
Yeah, clever. I had misread it as two decimal digits per byte.
Old 2023-02-07, 15:46   #8
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013


Just out of curiosity, I took the first million decimal and the first million hexadecimal digits of pi to see how various compression commands/algorithms fare. Obviously none are as good as plain binary.

Source files were 1,000,002 bytes (from All Digits of Pi).

brotli pi_dec_1m.txt => 424825
brotli pi_hex_1m.txt => 500051

bzip2 pi_dec_1m.txt => 431435
bzip2 pi_hex_1m.txt => 509456

gzip -9 pi_dec_1m.txt => 470449
gzip -9 pi_hex_1m.txt => 569818

lz4 -z pi_dec_1m.txt => 948602
lz4 -z pi_hex_1m.txt => 1000021

cat pi_dec_1m.txt | ~/go/bin/snappy-compress => 851417
cat pi_hex_1m.txt | ~/go/bin/snappy-compress => 1000140

xz -z pi_dec_1m.txt => 437952
xz -z pi_hex_1m.txt => 519644

zip -9 pi_dec_1m.txt => 470593
zip -9 pi_hex_1m.txt => 569962

cat pi_dec_1m.txt | zstd -z => 484361
cat pi_hex_1m.txt | zstd -z => 516778

The conclusion is that algorithms at both ends of the alphabet do better, while those in the middle barely compress if at all.
Old 2023-02-07, 15:49   #9
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE


Compared to pigz -H (437102 bytes), only brotli and bzip2 produce smaller files.
Old 2023-02-08, 20:43   #10
pinhodecarlos
 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK


Thank you. I'm still downloading it; another day and it should be complete.
Old 2023-02-20, 00:46   #11
Mysticial
 
Sep 2016


y-cruncher's .ycd format does 19 decimal digits in 8 bytes. (1.40% overhead)

Not as good as 3 digits / 10 bits (0.34% overhead), but the 8-byte alignment worked better from a coding perspective.

In retrospect, I could've used 40-byte chunks for 320-bit blocks with 3 digits/10 bits as that would achieve both the 1024/1000 efficiency and alignment at the same time. But I made this decision some 10 years ago, and the 1.40% -> 0.34% improvement isn't big enough to justify redesigning the whole thing.
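
Those overhead figures check out in GP (sketch): 19 digits carry 19*log2(10) ~ 63.1 bits of information but occupy 64, and 3 digits carry ~ 9.97 bits but occupy 10.
Code:
? 64/(19*log(10)/log(2)) - 1     \\ 19 digits in 8 bytes
\\ ~ 0.0140  (1.40 % overhead)
? 10/(3*log(10)/log(2)) - 1      \\ 3 digits in 10 bits
\\ ~ 0.0034  (0.34 % overhead)
? 320/(96*log(10)/log(2)) - 1    \\ 96 digits in a 40-byte block: same 0.34 %, but 8-byte aligned
\\ ~ 0.0034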