mersenneforum.org One trillion digits of Pi download

2023-02-05, 10:51   #1
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2640₈ Posts

Around five years ago, I finished a calculation of 1e12 digits of Pi and wanted to share the results. Back in the day, I had problems getting a torrent setup to work. With a bit of James Heinrich's guidance, it now works. The torrent file is attached.

Be aware that this is a huge download (around 406 GB)!
Attached Files
 1tdp.zip (509.0 KB, 31 views)

2023-02-05, 13:06   #2
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

1,621 Posts

Quote:
 Originally Posted by kruoli Around five years ago, I finished a calculation of 1e12 digits of Pi, ... Be aware that this a huge download (around 406 GB)!
That is pretty bad compression if the file really contains only the digits:
Code:
? log(10)/log(256)*10^12/2^30.
%1 = 386.72332825224873934294090483912618086
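The same entropy bound can be reproduced outside PARI/GP; a quick Python sanity check (not from the thread):

```python
import math

# Information content of one uniformly random decimal digit, in bits:
bits_per_digit = math.log2(10)            # ~3.3219 bits

# Minimum size of 1e12 such digits, in GiB (2^30 bytes per GiB):
optimal_gib = 1e12 * bits_per_digit / 8 / 2**30
print(optimal_gib)                        # ~386.72 GiB, matching the PARI result
```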

2023-02-05, 14:08   #3
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2⁵·3²·5 Posts

FWIW, I used pigz with the -H flag (use Huffman compression only) and increased the block size dramatically. Another option would have been the digit compressor by Mysticial, but that is not a "standard compression".

For Huffman, one should expect a tree with six codes of length 3 bits and four of length 4 bits. That gives an expected file size of 6e11 · 3/8 + 4e11 · 4/8 = 4.25e11 bytes, which is nearly 396 binary GB. This is of course more than optimal, but also less than the real value, likely because of overhead and suboptimal tree choices.

The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte. I used this procedure to be practical. Yes, I used around 5 % more data than optimal, but I think I can argue that this is not useless: all standard gz tools like zcat work, and extracting is fast. Base conversion takes much more time and LOTS of memory.
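The expected tree above (six 3-bit codes, four 4-bit codes) is exactly what Huffman's algorithm produces for ten equiprobable symbols; a minimal Python sketch (not from the thread) to verify:

```python
import heapq

def huffman_code_lengths(weights):
    """Return the Huffman code length of each symbol, given its weight."""
    # Heap of (total weight, symbol indices under this node); merging two
    # nodes adds one bit to the code of every symbol underneath them.
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, s1 = heapq.heappop(heap)
        w2, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, s1 + s2))
    return lengths

lengths = huffman_code_lengths([1] * 10)   # ten equiprobable digits
print(sorted(lengths))                     # six codes of length 3, four of length 4
avg_bits = sum(lengths) / 10               # 3.4 bits per digit
print(1e12 * avg_bits / 8)                 # 4.25e11 bytes, i.e. ~396 binary GB
```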
2023-02-05, 15:26   #4
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

1,621 Posts

Quote:
 Originally Posted by kruoli The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte. I used this procedure to be practical.
That is actually much worse:
Code:
? 10^12/2/2^30.
%2 = 465.66128730773925781250000000000000001
A better, close-to-optimal scheme that doesn't need large integers: code 12 digits using 5 bytes. This works since 10^12 < 256^5, and gives a size in binary GB of:
Code:
? 10^12/12*5/2^30.
%10 = 388.05107275644938151041666666666666668
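This 12-digits-in-5-bytes packing is easy to prototype; a minimal Python sketch (the function names are illustrative only, not from any real tool):

```python
def pack12(digits):
    """Pack a string of 12 decimal digits into 5 bytes (10^12 < 256^5)."""
    return int(digits).to_bytes(5, "big")

def unpack12(blob):
    """Recover the 12-digit string, preserving leading zeros."""
    return str(int.from_bytes(blob, "big")).zfill(12)

packed = pack12("897932384626")     # a 12-digit chunk of pi's expansion
assert len(packed) == 5             # 5 bytes per 12 digits => ~388 binary GB total
assert unpack12(packed) == "897932384626"
```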
For a real compression problem see:
https://www.spoj.com/problems/MAGIC2/

The first few solvers (including me) achieve a better compression rate than the popular programs give.

2023-02-05, 15:36   #5
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

1A3A₁₆ Posts

Quote:
 Originally Posted by kruoli The best option − of course − would be to use the hexadecimal representation and then use two hexadecimal digits per byte.
I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.

Last fiddled with by retina on 2023-02-05 at 15:37

2023-02-05, 15:43   #6
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2⁵·3²·5 Posts

Quote:
 Originally Posted by R. Gerbicz That is actually much worse: Code: ? 10^12/2/2^30. %2 = 465.66128730773925781250000000000000001
What? Only if I used 1e12 hexadecimal digits. But 1e12 hexadecimal digits are not equal to 1e12 decimal digits (when looking at entropy); the hexadecimal expansion of the same number is considerably shorter, so a lot of digits fall away. We should get to your "optimal" value, or nearly so.
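The point can be made concrete: 1e12 decimal digits correspond to only about 8.3e11 hexadecimal digits, and packing two of those per byte lands on the entropy bound computed earlier. A quick Python check (not from the thread):

```python
import math

# A hex digit carries exactly 4 bits, so the hex expansion equivalent to
# 1e12 decimal digits is shorter by a factor of log(10)/log(16):
hex_digits = 1e12 * math.log(10) / math.log(16)   # ~8.305e11 hex digits
gib = hex_digits / 2 / 2**30                      # two hex digits per byte
print(gib)                                        # ~386.72 GiB, the optimum
```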
Quote:
 Originally Posted by retina I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.
Yes, exactly because of this. :D
Quote:
 Originally Posted by R. Gerbicz A better, close to optimal, and don't need large integers to code 12 digits using 5 bytes, this works since 10^12<256^5, and gives a GB size: Code: ? 10^12/12*5/2^30. %10 = 388.05107275644938151041666666666666668 For a real compression problem see: https://www.spoj.com/problems/MAGIC2/ The first few solvers (including me) have better compression rate than the popular programs are giving.
Thanks, this would definitely be a solution if someone has to save the last 5 % of data. I also thought of using more than one byte per Huffman token, but this is also non-standard.

2023-02-05, 16:02   #7
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

1,621 Posts

Quote:
 Originally Posted by retina I like your alternative way of expressing "plain binary": two hexadecimal digits per byte.
Yeah, clever. And I had misread it as two decimal digits per byte.

2023-02-07, 15:46   #8
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

2·11²·13 Posts

Just out of curiosity, I took the first million decimal and the first million hexadecimal digits of pi to see how various compression commands/algorithms fare. Obviously not as good as plain binary. Source files were 1,000,002 bytes (from All Digits of Pi).
Code:
brotli pi_dec_1m.txt                          =>  424825
brotli pi_hex_1m.txt                          =>  500051
bzip2 pi_dec_1m.txt                           =>  431435
bzip2 pi_hex_1m.txt                           =>  509456
gzip -9 pi_dec_1m.txt                         =>  470449
gzip -9 pi_hex_1m.txt                         =>  569818
lz4 -z pi_dec_1m.txt                          =>  948602
lz4 -z pi_hex_1m.txt                          => 1000021
cat pi_dec_1m.txt | ~/go/bin/snappy-compress  =>  851417
cat pi_hex_1m.txt | ~/go/bin/snappy-compress  => 1000140
xz -z pi_dec_1m.txt                           =>  437952
xz -z pi_hex_1m.txt                           =>  519644
zip -9 pi_dec_1m.txt                          =>  470593
zip -9 pi_hex_1m.txt                          =>  569962
cat pi_dec_1m.txt | zstd -z                   =>  484361
cat pi_hex_1m.txt | zstd -z                   =>  516778
The conclusion is that algorithms at both ends of the alphabet do better, while those in the middle barely compress, if at all.
2023-02-07, 15:49   #9
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2640₈ Posts

Compared to pigz -H (437102 bytes), only brotli and bzip2 produce smaller files.
2023-02-08, 20:43   #10
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2²·5·257 Posts

Thank you. I'm still downloading it; another day and it should be completed.
2023-02-20, 00:46   #11
Mysticial

Sep 2016

373 Posts

y-cruncher's .ycd format stores 19 decimal digits in 8 bytes (1.40% overhead). Not as good as 3 digits per 10 bits (0.34% overhead), but the 8-byte alignment worked better from a coding perspective.

In retrospect, I could have used 40-byte chunks: 320-bit blocks with 3 digits per 10 bits would achieve both the 1024/1000 efficiency and the alignment at the same time. But I made this decision some 10 years ago, and the 1.40% → 0.34% improvement isn't big enough to justify redesigning the whole thing.
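The 19-digits-in-8-bytes idea works because 10^19 < 2^64, so each chunk fits in one unsigned 64-bit word. A toy Python sketch (the real .ycd file layout is not shown in the thread, so the little-endian word order here is an assumption for illustration only):

```python
import math
import struct

def pack19(digits):
    """Pack 19 decimal digits into one 64-bit word (10^19 < 2^64)."""
    return struct.pack("<Q", int(digits))           # 8 little-endian bytes

def unpack19(blob):
    """Recover the 19-digit string, preserving leading zeros."""
    return str(struct.unpack("<Q", blob)[0]).zfill(19)

assert unpack19(pack19("1415926535897932384")) == "1415926535897932384"

# Overhead versus the information-theoretic minimum of log2(10) bits/digit:
overhead = 64 / (19 * math.log2(10)) - 1
print(f"{overhead:.2%}")                            # ~1.40%, as stated above
```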

