LJ Archive

Work the Shell

Where's That Pesky Hidden Word?

Dave Taylor

Issue #252, April 2015

Word search—Dave tackles the complex task of writing a script to generate word searches from a list of words. Doable? We'll see.

I've been promising my 11-year-old for a long time now that I'd write a program that would let you build custom word searches based on a list of words given by the user. I wrote one years and years ago in C, but since I can't find that code any more and wanted to tackle another interesting project for this column, that's what I'm going to look at herein.

There aren't any well-established algorithms around word searches, so it's also a good opportunity to come up with something from scratch, which means that as with the most interesting projects, we'll actually need to think about the program before we start coding. It's good discipline!

I also should admit that I've loved solving word search puzzles since I was a young pup, so there's an element of enjoyment for me in this project too, not just the pleasure of doing something for my daughter!

In case you've never seen a word search, it's typically a grid of letters, and within that is a set of listed words that can be found running horizontally, vertically or diagonally, spelled forward or backward. Figure 1 shows a tiny example.

Figure 1. Word Search Example

Looking Figure 1, how many words can you find? I can find CAT, DOG, GOD (which is, of course, DOG backward), TIC, COP and, more nerdy, ROM and ARG too. And this brings up one interesting characteristic of word searches: there always are incidental words that can appear, so one important task for the resultant program is to ensure that no obscenities appear.

Upon first glance at a word search, it seems like the way to do it is to populate the grid randomly, then flip letters to make the words fit. But, it seems to me that a better strategy is essentially to make a crossword puzzle, then fill in the empty holes with random letters.

So that's going to be our first strategy for building the word search. Each word also will be randomly oriented horizontal, vertical or diagonal, as well as forward or backward. For now, let's just worry about forward or backward, meaning the initial word orientation code will look like this:

orient()
{
   # randomly pick an orientation and 
   # shift the word to match
   local direction;
   word=$1   # to make things neat and tidy
  if [ $(( RANDOM % 2 )) -eq 1 ] ; then
     # we need to reverse the value of $word
     word="$(echo $word | rev )"
   fi
}

Arrays are created by initializing them in the Bash shell, in a format that looks like this:

arrayname=( value1, value2, ... valueN )

And since this is going to be a lot about arrays, let's start by loading up the wordlist as an array, orienting words randomly as we proceed.

First, here's a really easy way to read a file in as an array of word values:

wordlist=( $(cat $1) )

Easy enough, but now let's step through the array word by word to reverse any that are randomly selected by the orient() function:

count=0
while [ ! -z "${wordlist[count]}" ]
do
   orient ${wordlist[count]};
   wordlist[$count]=$word
   echo "word $count = ${wordlist[count]}"
   count=$(( $count + 1 ))
done

With just this snippet and a word file that contains “cat”, “dog” and “car”, a single invocation looks like this:

$ sh wordsearch.sh wordlist.txt 
word 0 = cat
word 1 = god
word 2 = rac

That's a reasonable enough start. We now can read in a wordlist file and randomly reverse individual words as we go. Now, let's create a grid array and try inserting the words one by one.

And here's a wrinkle associated with the Bash shell: although it supports arrays, it doesn't support multidimensional arrays, which is rather a pain in the booty. So to have a 5x5 grid, we'll need five arrays of five elements. To start, let's initialize them at the beginning of the script:

row1=( "-" "-" "-" "-" "-" )
row2=( "-" "-" "-" "-" "-" )
row3=( "-" "-" "-" "-" "-" )
row4=( "-" "-" "-" "-" "-" )
row5=( "-" "-" "-" "-" "-" )

Then, further down, a simple function will allow us to print out the grid in an attractive format:

showgrid()
{
   echo "${row1[0]}  ${row1[1]}  ${row1[2]}  ${row1[3]}  
         ${row1[4]}"
   echo "${row2[0]}  ${row2[1]}  ${row2[2]}  ${row2[3]}  
         ${row2[4]}"
   echo "${row3[0]}  ${row3[1]}  ${row3[2]}  ${row3[3]}  
         ${row3[4]}"
   echo "${row4[0]}  ${row4[1]}  ${row4[2]}  ${row4[3]}  
         ${row4[4]}"
   echo "${row5[0]}  ${row5[1]}  ${row5[2]}  ${row5[3]}  
         ${row5[4]}"
}

We'll end up rewriting this function down the road to make it more flexible for an N x M size grid, but for now, let's just proceed with 5x5 so we can get into the algorithm itself.

Now the actual work of the script: inserting words into the grid.

Initially, of course, it's easy because we're pretty much guaranteed the word will fit if it's less than five letters long, but as more and more words are put into the grid, it becomes harder to fit each one.

To simplify things, we're going to look at inserting words only horizontally or vertically to start. It turns out that diagonal insertions are a bit more nuanced. That's okay, we'll circle back and add it once we get the basics working.

To start, the function fitword(), given a word (that might already be reversed), randomly picks an orientation and starting location that'll fit, then hands it to a horizontal or vertical insertion function for actual placement testing:

fitword()
{
    # fit word "$1" into the grid with a random orientation
    success=0
    wordlength=$( echo $1 | wc -c )     # always +1
    wordlength=$(( $wordlength -1 ))    #  and now it's fixed
    case $(( $RANDOM % 2 )) in
      0 ) # horizontal
         until [ $success -eq 1 ] ; do
           startpoint=$(( $cols - $wordlength ))
           col=$(( $RANDOM % $startpoint ))
           row=$(( $RANDOM % 5 ))
           Hinsert $1 $col $row
           success=$?                # what does Hinsert return?
         done
         ;;

      1 ) # vertical
         until [ $success -eq 1 ] ; do
           startpoint=$(( $rows - $wordlength ))
           row=$(( $RANDOM % $startpoint ))
           col=$(( $RANDOM % 5 ))
           Vinsert $1 $row $col
           success=$?
         done
         ;;
  esac
}

For now, Hinsert() and Vinsert() can both just return the numeric success value of “1”, so they're super easy to write. But, let's focus on fitword(), because that's where the action's really happening so far.

Consider a quick invocation with our three words into a 5x5 grid:

$ sh wordsearch.sh wordlist.txt 
word 0 = cat
Hinsert called with word cat and startloc 0, 0
word 1 = god
Hinsert called with word god and startloc 0, 0
word 2 = rac
Vinsert called with word rac and startloc 0, 1

A close look reveals that the first two words (the second of which already has been reversed) are going to be placed horizontally, both at the same starting point of 0,0. Clearly that won't work, but we'll come back to it (that's why the insertion statement is in a repeat loop: because there's an element of brute-force insertion we'll need to exploit).

The third word is going to be inserted vertically, and it too already has been reversed, with an attempted first location of row 0, column 1 (which won't work either: “cat” being inserted at 0,0 means that 0,1 will be an “a”).

This is going to be a tricky script, isn't it? Let's dig into it further next month, as I've run out of space here, but in the meantime, start thinking about how you'd address this interesting problem and drop me a note if you have a non-brute-force solution to offer.

Dave Taylor has been hacking shell scripts for more than 30 years—really. He's the author of the popular Wicked Cool Shell Scripts (and just completed a 10th anniversary revision to the book, coming very soon from O'Reilly and NoStarch Press). He can be found on Twitter as @DaveTaylor and more generally at his tech site www.AskDaveTaylor.com.

LJ Archive