Skip to main content

Cheating at a word game using SQL

The game

I recently came across this word game called fonetix, where you are given letter pairs, and you have to assemble up to 8-character words out of the tiles.
They have to be valid English words, of course. You're playing against the clock, and supposedly it also increases the difficulty by each level. There is a scoreboard, but you won't exactly get on there no matter how sharp your word game is:
The top one is, fittingly, "stop cheating". Well, you might guess how they achieved those high scores, but I still want to play the game, not just post a fictious high score.
Even so, as on later levels I've found that finding even 4-letter words starts to become difficult.
But that might be only me. So I started wondering, what is my brain missing? I'm a non-native English speaker by the way, so that might be the culprit (and definitely not the hangover).

So, I wanted a way to search all available solutions within the time limit, and still have time left over to enter the longest solution into the game. I'm not aiming for the time bonus - at this time.
That is my cheat.

The wordlist

The first step towards my cheating solution is well, to get an English word list. There are several available throughout the web, mainly for security research purposes, so I trust you wouldn't have any issues finding one. The one I downloaded has about 460 thousand. Upon opening it however, there is an issue I notice right away:
2
1080
&c
10-point
10th
11-point
12-point
16-point
18-point
1st
2,4,5-t
2,4-d
20-point
2D
2nd

So, these will definitely not show up in the game. Normally I would deal with this with a clever unix command (bash on windows FTW), but this is a SQL blog, dammit!

The setup

So, we need to load the words into the database, and clean it up a little bit.
How convenient, I already had a Toad editor open on an Oracle instance, so this example is in Oracle's SQL flavor.
To load the words into the database, first I created a table, and then ran Toad's built in data import wizard to load the records. It's fairly self-explanatory, I won't bore you with the details here.
Here's my table setup:

As you can see at the last part of the script, I created another table using CTAS with just the words that fit my criteria for the game. 

The first filter condition does a regexp matching on the entire word, and it just matches lower and uppercase alphabetical characters, and only words that are between 2 and 8 characters in length (I know, should have set the lower limit at 4, oh well).
The second filter condition filters out words whose lengths are not multiples of 2, since each tile has 2 letters, we're only looking for 4, 6 and 8-character words.
The resulting table has ~96k records, much easier to work with.
Now, we just need a way to quickly search through the wordlist and get to our solutions!

The cheat

I'll post the script first, and the explanation after:

So the way it works: You start the game, and then enter the tiles in a sequential order into the first string. Execute the query, and it gives you the answer, ordered by the longest first, and you have ample time to enter that into the game, and score a lot of points.

Query breakdown

The second named subquery ltr2 (my subquery naming game isn't strong) breaks it up into two-letter tiles, one per row. You could enter them like that manually, but most likely you'd run out of time, so this is a nice little shortcut. (except, you'll see later)

The way it does this is using the magic of hierarchical queries. This is not a true hierarchical query, because the hierarchy criteria is only specified that it's not more than 9 levels deep. (this is a nice way of counting to 9, by the way).

LEVEL is a built-in pseudocolumn that indicates which level of the hierarchy you're currently on. In this case it is used like a simple counter.
From that point it's just a simple SUBSTR to cut up the string into nine bite-sized pieces, so the rest of the SQL can ingest it.

The rest of the SQL does a cartesian product on the 9-row subquery, 4 times, so it ends up with 9^4 records, which is exactly 6561. This will contain all of the possible combinations of the tiles for the longest word (8 characters). However, we might not always have an 8-character word match, so the filter conditions take care of matching 4, 6 or 8-character words.

So this is fine, until you try to execute it:
You don't have that much time!
Luckily, the execution plan reveals where the issue is:

As you can see, the CBO estimates there is only 1 record coming from the hierarchical query, whereas it should be 9! (plan rows 12, 15, 18, 21).
As a result, it does a nested loop join between the tile table and the word table, which given the cardinalities is at least 600 million iterations! Not exactly what nested loops are for, now is it?

Let's give the CBO the information it needs via some more manual work:

This way since it's doing the set operation on 9 atomic queries, it will immediately know that the result is gonna be 9 records, and sure enough,
The execution plan of the "tuned" query:

As you can see, now it performs a concatenation of 3 hash joins, which is still way more effective than the single nested loop join.

There is, however, a way to tune the original query to give the same results:

Throughout my love-hate relationship with this RDBMS I've found that Oracle doesn't ignore the cardinality hint quite as often as it does some other hints. You can see the effect of the hint immediately on the execution plan:

So the moral of the story is, learn your words, kids!

Comments

  1. Lol, every score on the scoreboard was me. I just used JavaScript to send an AJAX POST request. The back-end of the game won't accept any nicknames longer than 15 characters, so if it's above that it defaults to "Micheal, VSauce". And the max score is 69696969, and anything above defaults to that. That's not me being silly, it's the game-maker. I assume the "stop cheating" was posted by whoever made the game, since its score is above the max.

    ReplyDelete

Post a Comment