2020-10-04, 22:45   #5
frmky

The script just counts the number of lines. A simple
zcat L1510.dat.gz | wc -l
agrees with the webpage. Sometimes the project gets odd garbage in returned files because we use a forgiving validation check. That's likely the cause here.
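If the file uses msieve-style relation lines of the form a,b:... (an assumption about this file's layout), one quick way to spot garbage is to count the lines that don't start with an integer pair:
zcat L1510.dat.gz | grep -cvE '^-?[0-9]+,-?[0-9]+:'   # lines that don't look like "a,b:..."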

Edit: It doesn't seem to be that. I just ran a quick remdups and got
Found 305246658 unique, 88901570 duplicate (22.6% of total), and 57712 bad relations.
Largest dimension used: 1000 of 1000
Average dimension used: 931.4 of 1000
*** Some redundant relations may have been retained (increase DIM)
*** 47650 (quasi-unique) relations were not hashed

So it seems there are just over 305M unique relations there, out of the ~394M total lines.
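For a rough cross-check without running remdups, the unique count can be approximated in the shell by deduplicating on the a,b prefix, again assuming the a,b pair precedes the first colon on each line:
zcat L1510.dat.gz | cut -d: -f1 | sort -u | wc -l   # distinct a,b pairs
sort spills to temporary files, so this works even on ~400M lines, just slowly; it won't catch the bad relations that remdups rejects, so expect only an approximate match.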
