![]() |
![]() |
#1 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
52·211 Posts |
![]()
My cell phone has a feature called T9WORD ... not sure what it means or if it is only Canadian ... but anyway:
This feature is for Text Messaging and uses intelligence to determine what it thinks I am trying to spell. To clarify, before T9WORD if I wanted to spell: "WORD" I would press: 9 once for W 6 three times for O (m-n-o) 7 three times for R (p-q-r) 3 for D --- 8 key strokes total. With T9WORD I press 9-6-7-3 (once each - 4 key strokes). As I press each number it determines what word it could be and "guesses" that. Now if it guesses wrong and I wanted "WORE" instead of "WORD" I just need to press a key labelled NEXT (0 on my phone) to have it change to the next option for the digits pressed. So WORD becomes WORE becomes YORE and back to WORD as I press NEXT repeatedly. If the digits entered do not form a word it recognizes there are other options but that is not relevant to this puzzle. Anyway to the puzzle portion: What digit sequence produces the longest NEXT-CHAIN, i.e. the most possible words. My example above had a chain of 3 words. I just made up this puzzle so honestly I don't have the answer but I thought collectively we could find it. If you prefer we can also find the longest chain for each number of digits from 2 to n. I don't know what dictionary my phone uses so for the sake of this puzzle let me just suggest we use the Webster dictionary: http://www.webster.com Now you're probably thinking: Do you accept slang words, capatilized, abbreviations, etc.? There is no money involved, only pride, so I don't want to get into big debates about whether a word is acceptable or not. If you can find it in the dictionary, it's good enough for me. |
![]() |
![]() |
![]() |
#2 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
52×211 Posts |
![]()
in case you don't have a phone / cell phone.
The digit 2 represents a,b, and c 3 = d,e,f 4 = g,h,i 5 = j,k,l 6 = m,n,o 7 = p,q,r,s 8 = t,u,v 9 = w,x,y,z |
![]() |
![]() |
![]() |
#3 | |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
23·1,361 Posts |
![]() Quote:
As to the puzzle. I am guessing it should be around 4 letters long, unless it ends in 3-3 (ed, de, fe, ee) or 3-3-3. Last fiddled with by Uncwilly on 2007-11-30 at 23:08 |
|
![]() |
![]() |
![]() |
#4 |
"Jason Goatcher"
Mar 2005
DB316 Posts |
![]()
If the feature is trainable, then it would simply be a matter of taking the time to train it. I remember a program I had on my Atari which worked as a spellchecker. it started out with about 100 words and you trained it by telling it whether or not a word it flagged was correct or not. Cool idea, but the first few documents you gave it took more time to spellcheck than they did to type.
Kind of reminds me of the whole concept of computer programmers being intelligently lazy people. A little work at the beginning to cut down on work later. |
![]() |
![]() |
![]() |
#5 |
Oct 2007
Manchester, UK
1,373 Posts |
![]()
This sounds like something that would be perfect for a regular expression or 153.4 million. Generate a list of numbers up to say 9 (have to go up to 2, up to 3, up to 4 etc. all individually) digits long (with leading zeroes) in base 8 (0-7 === 2-9), then from those create a list of regular expressions, then run them all on a wordlist.
Eg. 012 (=== 234) would become the reg exp /^[a|b|c][d|e|f][g|h|i]$/i. 000000012 (=== 222222234) would become the reg exp /^[a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][d|e|f][g|h|i]$/i. An alternative to creating a larger number of reg exps would be to make them more complicated, but that would drastically increase the time to run a test. Eg. 012345670 (=== 234567892) would become the reg exp /^[a|b|c]([d|e|f]([g|h|i]([j|k|l]([m|n|o]([p|q|r|s]([t|u|v]([w|x|y|z]([a|b|c])?)?)?)?)?)?)?)?$/i The major downside approaching the problem this way is that running _millions_ of regular expressions on a word list that contains 216,555 words (I don't know how many are in the webster list, so I'm going off SOWPODS), will take a REALLY long time and require running a splash over 33.2 trillion tests. Not to mention that longer reg exps take longer to run. The process could be sped significantly by separating out all of the two, three, four etc. letter words and only running the appropriate reg exps on them. Code:
One letters: 8 RE combinations, don't know if there are any 1 letter words in the list, I can only think of "a", "I", and maybe "O". Two letters: 64 RE combinations, 121 words, 7,744 operations. Three letters: 512 RE combinations, 1,229 words, 629,248 operations. Four letters: 4,096 RE combinations, 5,155 words, 21,114,880 operations. Five letters: 32,768 RE combinations, 11,812 words, 387055616 operations. Six letters: 262,144 RE combinations, 20,964 words, 5495586816 operations. Seven letters: 2,097,152 RE combinations, 31,229 words, 65491959808 operations. Eight letters: 16,777,216 RE combinations, 38,043 words, 638255628288 operations. Nine letters: 134,217,728 RE combinations, 36,847 words, 4945520623616 operations. At least the memory usage could be kept small by only keeping a list of the top 10 reg exps. I should mention that I haven't really thought this through, so my numbers may well be wrong, and there are no doubt additional ways to speed up the search. However, I decided to throw together some JavaScript to see how quickly it's possible to run tests, here's the output from it. Code:
To generate 64 regular expressions and check 124 words it took: 0 hours 0 mins 0.015 secs First: /^[m|n|o][m|n|o]$/i with 6 matches. Second: /^[a|b|c][g|h|i]$/i with 5 matches. Third: /^[d|e|f][d|e|f]$/i with 5 matches. Fourth: /^[g|h|i][m|n|o]$/i with 5 matches. Fifth: /^[m|n|o][d|e|f]$/i with 5 matches. Since I didn't spend much time on the code, it's pretty unoptimised. I might tidy it up at some point and post it on the forum if anyone is interested. Though if you seriously want to stand a chance of answering your question, I think C would be the way to go, JavaScript isn't going to set any speed records no matter how optimised. Perhaps an even faster way would be to run through all strings one character at a time and put them in a very multi-dimentional array (every character deeper adds an extra dimention). Eg. words[0] = first char abc words[0][0] = second char abc words[0][0][0] = third char abc In C this method will probably eat up quite a lot of RAM, but might turn out to be pretty speedy. Yet another method might be to split each word into an array of it's characters, then loop through and convert all the characters into the corresponding number on the phone keypad, then rejoin the array. At that point it would be a matter of comparing numbers to find the most common, and computers are pretty damn fast when it comes to numbers, especially integers. |
![]() |
![]() |
![]() |
#6 | |
Jun 2003
The Texas Hill Country
32·112 Posts |
![]() Quote:
Given a flat file of all the words, all of the remaining operations can be performed with simple Unix commands. |
|
![]() |
![]() |
![]() |
#7 |
Oct 2007
Manchester, UK
1,373 Posts |
![]()
It's possible to download the dictionary from this page on Scrabulous, it's just a simple word list, one word to a line.
|
![]() |
![]() |
![]() |
#8 |
Jun 2003
The Texas Hill Country
32×112 Posts |
![]()
The Scrabulous dictionary has 1/4 million words.
Code:
M21:Warehouse rkw$ wc -l ~/Downloads/sowpods.txt 267751 /Volumes/Home/Users/rkw/Downloads/sowpods.txt Code:
M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | uniq -c -f 1 | sort -r -k 1 | head -n 10 14 paper 72737 13 pomp 7667 13 acres 22737 12 purer 78737 12 popes 76737 12 poker 76537 12 pawer 72937 12 pater 72837 12 gomer 46637 11 pomes 76637 M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | grep " 72737$" paper 72737 papes 72737 pards 72737 parer 72737 pares 72737 pases 72737 raper 72737 rapes 72737 rarer 72737 rares 72737 raser 72737 rases 72737 sards 72737 saser 72737 |
![]() |
![]() |
![]() |
#9 |
Oct 2007
Manchester, UK
25358 Posts |
![]()
Heh, I have no idea what those commands mean, but WOW that code is so much shorter than what I came up with.
There's a very odd thing when I run my code, Firefox uses twice the RAM and takes 20 times as long to run it than Internet Exploder. Ah well, I guess even IE has to have something going for it. Anyway, here's what mine outputs: Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.531 secs To determine the top 5: 0 hours 0 mins 3.406 secs First: 72737 with 14 words. PAPER, PAPES, PARDS, PARER, PARES, PASES, RAPER, RAPES, RARER, RARES, RASER, RASES, SARDS, SASER Second: 22737 with 13 words. ACRES, BARDS, BARER, BARES, BARFS, BASER, BASES, CAPER, CAPES, CARDS, CARER, CARES, CASES Third: 7667 with 13 words. POMP, POMS, PONS, POOP, POOR, POOS, ROMP, ROMS, ROOP, ROOS, SOMS, SONS, SOOP Fourth: 46637 with 12 words. GOMER, GONER, GOODS, GOOFS, HOMER, HOMES, HONDS, HONER, HONES , HOODS, HOOFS, INNER Fifth: 72837 with 12 words. PATER, PATES, PAVER, PAVES, RATER, RATES, RAVER, RAVES, SATES, SAVER, SAVES, SCUDS Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.453 secs To determine the top 5 by length (minimum 2 words): 0 hours 0 mins 3.360 secs First: 232735377637737 with 2 words and 16 characters. BEARDLESSNESSES, CEASELESSNESSES Second: 247666843727437 with 2 words and 16 characters. CHROMOTHERAPIES, CHRONOTHERAPIES Third: 268476767462427 with 2 words and 16 characters. ANTHROPOPHOBIAS, ANTHROPOPHOBICS Fourth: 333328483637737 with 2 words and 16 characters. DEFECTIVENESSES, EFFECTIVENESSES Fifth: 333378372362437 with 2 words and 16 characters. DEFERVESCENCIES, EFFERVESCENCIES Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.469 secs To determine the top 5 by length: 0 hours 0 mins 3.281 secs First: 42774637737 with 4 words and 12 characters. GASPINESSES, GASSINESSES, HAPPINESSES, HARSHNESSES Second: 78664637737 with 4 words and 12 characters. RUMMINESSES, RUNNINESSES, STONINESSES, SUNNINESSES Third: 25663637737 with 3 words and 12 characters. ALONENESSES, ALOOFNESSES, BLONDNESSES Fourth: 28886646537 with 3 words and 12 characters. BUTTONHOLDS, BUTTONHOLER, BUTTONHOLES Fifth: 68372665464 with 3 words and 12 characters. OVERBOOKING, OVERCOOKING, OVERCOOLING Code:
74335464 with 7 words and 9 characters. PIDDLING, PIFFLING, RIDDLING, RIFFLING, SHEELING, SIDELING, SIFFLING 7277464 with 10 words and 8 characters. PAPPING, PARPING, PARRING, PARSING, PASSING, RAPPING, RAPPINI, RASPING, SAPPING, SASSING 2666437 with 9 words and 8 characters. ANOMIES, BOMMIES, BONNIER, BONNIES, BOOMIER, BOONIES, COMMIES, CONOIDS, COOMIER |
![]() |
![]() |
![]() |
#10 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
52·211 Posts |
![]()
I'm impressed how quickly the results have come in and how thorough they are!!!
By the way my phone apparently does not have that elaborate a dictionary. For example: When I enter 72737 I only get 4 of the 14: paper, rapes, rarer, rares 22737 gives me 7 of 13 (better): cases, cards, bases, acres, cares, bards, bares |
![]() |
![]() |
![]() |
#11 |
Oct 2007
Manchester, UK
1,373 Posts |
![]()
Yeah, SOWPODS is a word list that integrates lots and LOTS of words, many of which are bizarre and unusual, but nevertheless valid. Since texts are mainly used for quick and short messages the phone probably only has the most common words in it's list.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
running OS in text mode | wildrabbitt | Information & Answers | 10 | 2014-12-14 06:50 |
Output text in data format? | TheMawn | PrimeNet | 47 | 2014-08-31 16:08 |
Intelligent Text Messaging - Part II | petrw1 | Puzzles | 4 | 2007-12-24 23:01 |
Text books... | Xyzzy | Soap Box | 9 | 2007-07-10 17:09 |
A text problem | akruppa | Puzzles | 13 | 2005-12-30 20:44 |