mersenneforum.org  

Go Back   mersenneforum.org > Fun Stuff > Puzzles

Reply
 
Thread Tools
Old 2007-11-30, 21:32   #1
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

52·211 Posts
Default Intelligent Text Messaging...

My cell phone has a feature called T9WORD ... not sure what it means or if it is only Canadian ... but anyway:

This feature is for Text Messaging and uses intelligence to determine what it thinks I am trying to spell.

To clarify, before T9WORD if I wanted to spell: "WORD" I would press:
9 once for W
6 three times for O (m-n-o)
7 three times for R (p-q-r)
3 for D
--- 8 key strokes total.

With T9WORD I press 9-6-7-3 (once each - 4 key strokes). As I press each number it determines what word it could be and "guesses" that. Now if it guesses wrong and I wanted "WORE" instead of "WORD" I just need to press a key labelled NEXT (0 on my phone) to have it change to the next option for the digits pressed. So WORD becomes WORE becomes YORE and back to WORD as I press NEXT repeatedly.

If the digits entered do not form a word it recognizes there are other options but that is not relevant to this puzzle.

Anyway to the puzzle portion:

What digit sequence produces the longest NEXT-CHAIN, i.e. the most possible words.

My example above had a chain of 3 words.

I just made up this puzzle so honestly I don't have the answer but I thought collectively we could find it.

If you prefer we can also find the longest chain for each number of digits from 2 to n.

I don't know what dictionary my phone uses so for the sake of this puzzle let me just suggest we use the Webster dictionary: http://www.webster.com

Now you're probably thinking: Do you accept slang words, capatilized, abbreviations, etc.?

There is no money involved, only pride, so I don't want to get into big debates about whether a word is acceptable or not. If you can find it in the dictionary, it's good enough for me.
petrw1 is offline   Reply With Quote
Old 2007-11-30, 22:20   #2
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

52×211 Posts
Default some clarification ...

in case you don't have a phone / cell phone.

The digit 2 represents a,b, and c
3 = d,e,f
4 = g,h,i
5 = j,k,l
6 = m,n,o
7 = p,q,r,s
8 = t,u,v
9 = w,x,y,z
petrw1 is offline   Reply With Quote
Old 2007-11-30, 23:02   #3
Uncwilly
6809 > 6502
 
Uncwilly's Avatar
 
"""""""""""""""""""
Aug 2003
101×103 Posts

23·1,361 Posts
Default

Quote:
Originally Posted by petrw1 View Post
With T9WORD I press 9-6-7-3 (once each - 4 key strokes). As I press each number it determines what word it could be and "guesses" that. Now if it guesses wrong and I wanted "WORE" instead of "WORD" I just need to press a key labelled NEXT (0 on my phone) to have it change to the next option for the digits pressed.
Used to use this feature on my phone, but it took me more time to do it that way. 2 reasons: it doesn't predict common abbrevations, I used many words and names that aren't in the simple dictionary.


As to the puzzle. I am guessing it should be around 4 letters long, unless it ends in 3-3 (ed, de, fe, ee) or 3-3-3.

Last fiddled with by Uncwilly on 2007-11-30 at 23:08
Uncwilly is online now   Reply With Quote
Old 2007-12-01, 05:47   #4
jasong
 
jasong's Avatar
 
"Jason Goatcher"
Mar 2005

DB316 Posts
Default

If the feature is trainable, then it would simply be a matter of taking the time to train it. I remember a program I had on my Atari which worked as a spellchecker. it started out with about 100 words and you trained it by telling it whether or not a word it flagged was correct or not. Cool idea, but the first few documents you gave it took more time to spellcheck than they did to type.

Kind of reminds me of the whole concept of computer programmers being intelligently lazy people. A little work at the beginning to cut down on work later.
jasong is offline   Reply With Quote
Old 2007-12-01, 14:24   #5
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

1,373 Posts
Default

This sounds like something that would be perfect for a regular expression or 153.4 million. Generate a list of numbers up to say 9 (have to go up to 2, up to 3, up to 4 etc. all individually) digits long (with leading zeroes) in base 8 (0-7 === 2-9), then from those create a list of regular expressions, then run them all on a wordlist.

Eg.
012 (=== 234) would become the reg exp /^[a|b|c][d|e|f][g|h|i]$/i.
000000012 (=== 222222234) would become the reg exp /^[a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][d|e|f][g|h|i]$/i.

An alternative to creating a larger number of reg exps would be to make them more complicated, but that would drastically increase the time to run a test. Eg.
012345670 (=== 234567892) would become the reg exp /^[a|b|c]([d|e|f]([g|h|i]([j|k|l]([m|n|o]([p|q|r|s]([t|u|v]([w|x|y|z]([a|b|c])?)?)?)?)?)?)?)?$/i

The major downside approaching the problem this way is that running _millions_ of regular expressions on a word list that contains 216,555 words (I don't know how many are in the webster list, so I'm going off SOWPODS), will take a REALLY long time and require running a splash over 33.2 trillion tests. Not to mention that longer reg exps take longer to run.

The process could be sped significantly by separating out all of the two, three, four etc. letter words and only running the appropriate reg exps on them.
Code:
One letters:	8 RE combinations, don't know if there are any 1 letter words in the list, I can only think of "a", "I", and maybe "O".
Two letters:	64 RE combinations, 121 words, 7,744 operations.
Three letters:	512 RE combinations, 1,229 words, 629,248 operations.
Four letters:	4,096 RE combinations, 5,155 words, 21,114,880 operations.
Five letters:	32,768 RE combinations, 11,812 words, 387055616 operations.
Six letters:	262,144 RE combinations, 20,964 words, 5495586816 operations.
Seven letters:	2,097,152 RE combinations, 31,229 words, 65491959808 operations.
Eight letters:	16,777,216 RE combinations, 38,043 words, 638255628288 operations.
Nine letters:	134,217,728 RE combinations, 36,847 words, 4945520623616 operations.
It seems like it would be feesable to scan up to five letter words in a relatively short time scale, but six letters and beyond is where it would get sticky. All combined, to test up to nine letter words in this way would require a little under 5.655 trillion tests.

At least the memory usage could be kept small by only keeping a list of the top 10 reg exps.

I should mention that I haven't really thought this through, so my numbers may well be wrong, and there are no doubt additional ways to speed up the search. However, I decided to throw together some JavaScript to see how quickly it's possible to run tests, here's the output from it.
Code:
To generate 64 regular expressions and check 124 words it took: 0 hours 0 mins 0.015 secs

First: /^[m|n|o][m|n|o]$/i with 6 matches.
Second: /^[a|b|c][g|h|i]$/i with 5 matches.
Third: /^[d|e|f][d|e|f]$/i with 5 matches.
Fourth: /^[g|h|i][m|n|o]$/i with 5 matches.
Fifth: /^[m|n|o][d|e|f]$/i with 5 matches.
A small number of words and simple regular expressions makes for a very quick completion time, but even generating the regular expressions for 5 letter words takes around 6 seconds. Oh, and though it doesn't output it, the words for the winner are "mm", "mo", "no", "om", "on" and "oo".

Since I didn't spend much time on the code, it's pretty unoptimised. I might tidy it up at some point and post it on the forum if anyone is interested. Though if you seriously want to stand a chance of answering your question, I think C would be the way to go, JavaScript isn't going to set any speed records no matter how optimised.

Perhaps an even faster way would be to run through all strings one character at a time and put them in a very multi-dimentional array (every character deeper adds an extra dimention).

Eg.
words[0] = first char abc
words[0][0] = second char abc
words[0][0][0] = third char abc

In C this method will probably eat up quite a lot of RAM, but might turn out to be pretty speedy.

Yet another method might be to split each word into an array of it's characters, then loop through and convert all the characters into the corresponding number on the phone keypad, then rejoin the array. At that point it would be a matter of comparing numbers to find the most common, and computers are pretty damn fast when it comes to numbers, especially integers.
lavalamp is offline   Reply With Quote
Old 2007-12-01, 16:26   #6
Wacky
 
Wacky's Avatar
 
Jun 2003
The Texas Hill Country

32·112 Posts
Default

Quote:
Originally Posted by lavalamp View Post
Yet another method might be to split each word into an array of it's characters, then loop through and convert all the characters into the corresponding number on the phone keypad, then rejoin the array.
Since the dictionary is of a reasonable size, if you can obtain a listing of its entries, it would be easiest to transform each word into its corresponding code and then count how many entries correspond to each code.

Given a flat file of all the words, all of the remaining operations can be performed with simple Unix commands.
Wacky is offline   Reply With Quote
Old 2007-12-01, 16:30   #7
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

1,373 Posts
Default

It's possible to download the dictionary from this page on Scrabulous, it's just a simple word list, one word to a line.
lavalamp is offline   Reply With Quote
Old 2007-12-01, 17:55   #8
Wacky
 
Wacky's Avatar
 
Jun 2003
The Texas Hill Country

32×112 Posts
Default

The Scrabulous dictionary has 1/4 million words.

Code:
M21:Warehouse rkw$ wc -l  ~/Downloads/sowpods.txt 
  267751 /Volumes/Home/Users/rkw/Downloads/sowpods.txt
and produces these results
Code:
M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | uniq -c -f 1 | sort -r -k 1 | head -n 10
  14 paper 72737
  13 pomp 7667
  13 acres 22737
  12 purer 78737
  12 popes 76737
  12 poker 76537
  12 pawer 72937
  12 pater 72837
  12 gomer 46637
  11 pomes 76637

M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | grep " 72737$"
paper 72737
papes 72737
pards 72737
parer 72737
pares 72737
pases 72737
raper 72737
rapes 72737
rarer 72737
rares 72737
raser 72737
rases 72737
sards 72737
saser 72737
Wacky is offline   Reply With Quote
Old 2007-12-01, 19:31   #9
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

25358 Posts
Default

Heh, I have no idea what those commands mean, but WOW that code is so much shorter than what I came up with.

There's a very odd thing when I run my code, Firefox uses twice the RAM and takes 20 times as long to run it than Internet Exploder. Ah well, I guess even IE has to have something going for it.

Anyway, here's what mine outputs:
Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.531 secs
To determine the top 5: 0 hours 0 mins 3.406 secs

First: 72737 with 14 words.
PAPER, PAPES, PARDS, PARER, PARES, PASES, RAPER, RAPES, RARER, RARES, RASER, RASES, SARDS, SASER

Second: 22737 with 13 words.
ACRES, BARDS, BARER, BARES, BARFS, BASER, BASES, CAPER, CAPES, CARDS, CARER, CARES, CASES

Third: 7667 with 13 words.
POMP, POMS, PONS, POOP, POOR, POOS, ROMP, ROMS, ROOP, ROOS, SOMS, SONS, SOOP

Fourth: 46637 with 12 words.
GOMER, GONER, GOODS, GOOFS, HOMER, HOMES, HONDS, HONER, HONES , HOODS, HOOFS, INNER

Fifth: 72837 with 12 words.
PATER, PATES, PAVER, PAVES, RATER, RATES, RAVER, RAVES, SATES, SAVER, SAVES, SCUDS
And I also modified it to output the top 5 by length with a minimum of two words:
Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.453 secs
To determine the top 5 by length (minimum 2 words): 0 hours 0 mins 3.360 secs

First: 232735377637737 with 2 words and 16 characters.
BEARDLESSNESSES, CEASELESSNESSES 

Second: 247666843727437 with 2 words and 16 characters.
CHROMOTHERAPIES, CHRONOTHERAPIES 

Third: 268476767462427 with 2 words and 16 characters.
ANTHROPOPHOBIAS, ANTHROPOPHOBICS 

Fourth: 333328483637737 with 2 words and 16 characters.
DEFECTIVENESSES, EFFECTIVENESSES 

Fifth: 333378372362437 with 2 words and 16 characters.
DEFERVESCENCIES, EFFERVESCENCIES
With minimum of three words:
Code:
To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.469 secs
To determine the top 5 by length: 0 hours 0 mins 3.281 secs

First: 42774637737 with 4 words and 12 characters.
GASPINESSES, GASSINESSES, HAPPINESSES, HARSHNESSES 

Second: 78664637737 with 4 words and 12 characters.
RUMMINESSES, RUNNINESSES, STONINESSES, SUNNINESSES 

Third: 25663637737 with 3 words and 12 characters.
ALONENESSES, ALOOFNESSES, BLONDNESSES 

Fourth: 28886646537 with 3 words and 12 characters.
BUTTONHOLDS, BUTTONHOLER, BUTTONHOLES 

Fifth: 68372665464 with 3 words and 12 characters.
OVERBOOKING, OVERCOOKING, OVERCOOLING
And here are some other random "interesting" ones:
Code:
74335464 with 7 words and 9 characters.
PIDDLING, PIFFLING, RIDDLING, RIFFLING, SHEELING, SIDELING, SIFFLING 

7277464 with 10 words and 8 characters.
PAPPING, PARPING, PARRING, PARSING, PASSING, RAPPING, RAPPINI, RASPING, SAPPING, SASSING

2666437 with 9 words and 8 characters.
ANOMIES, BOMMIES, BONNIER, BONNIES, BOOMIER, BOONIES, COMMIES, CONOIDS, COOMIER
lavalamp is offline   Reply With Quote
Old 2007-12-03, 16:24   #10
petrw1
1976 Toyota Corona years forever!
 
petrw1's Avatar
 
"Wayne"
Nov 2006
Saskatchewan, Canada

52·211 Posts
Default

I'm impressed how quickly the results have come in and how thorough they are!!!

By the way my phone apparently does not have that elaborate a dictionary.

For example:
When I enter 72737 I only get 4 of the 14:
paper, rapes, rarer, rares

22737 gives me 7 of 13 (better):
cases, cards, bases, acres, cares, bards, bares
petrw1 is offline   Reply With Quote
Old 2007-12-03, 18:02   #11
lavalamp
 
lavalamp's Avatar
 
Oct 2007
Manchester, UK

1,373 Posts
Default

Yeah, SOWPODS is a word list that integrates lots and LOTS of words, many of which are bizarre and unusual, but nevertheless valid. Since texts are mainly used for quick and short messages the phone probably only has the most common words in it's list.
lavalamp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
running OS in text mode wildrabbitt Information & Answers 10 2014-12-14 06:50
Output text in data format? TheMawn PrimeNet 47 2014-08-31 16:08
Intelligent Text Messaging - Part II petrw1 Puzzles 4 2007-12-24 23:01
Text books... Xyzzy Soap Box 9 2007-07-10 17:09
A text problem akruppa Puzzles 13 2005-12-30 20:44

All times are UTC. The time now is 00:09.


Sat Jan 28 00:09:25 UTC 2023 up 162 days, 21:37, 0 users, load averages: 1.07, 1.01, 1.10

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔