So I'm still on a bit of a 'full-text search' kick in SQL Server, and I've been thinking about theoretical ways to speed up a search for a substring across an entire table (and multiple columns therein).

As you can see in the previous post, right now I'm simply doing a 'LIKE' comparison on the necessary fields. This works OK on small tables, but as the number of rows grows it's going to get slow: a LIKE with a leading wildcard can't use an index, so every search is a full scan of the table.
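For context, the current approach looks something like this (the column names here are assumptions for illustration; only ProductName appears later in the post):

```sql
-- Current approach: a leading-wildcard LIKE forces a scan of every row,
-- since no index on ProductName can be used for '%...%' patterns.
SELECT *
FROM Products
WHERE ProductName LIKE '%widget%'
   OR ProductDescription LIKE '%widget%';
```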

So theoretically I could create my own makeshift full-text index, using a custom table: one that lists every individual word and which row it appears in. Something like:

Words (WordID, WordText, ProductID)

OK - that's not very efficient, since the same word gets stored once for every product it appears in. We'd need to normalize that a bit: one table holding every unique word, and a second table matching words to rows in the original table.

Products -> ProductWords (ProductID, WordID) -> Words (WordID, WordText)
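A rough sketch of that schema in T-SQL might be (column types and constraints are assumptions; the post only names the tables and key columns):

```sql
-- One row per unique word seen anywhere in the indexed columns.
CREATE TABLE Words (
    WordID   INT IDENTITY PRIMARY KEY,
    WordText NVARCHAR(100) NOT NULL UNIQUE
);

-- Many-to-many link between products and the words they contain.
-- Assumes Products has an INT primary key named ProductID.
CREATE TABLE ProductWords (
    ProductID INT NOT NULL REFERENCES Products (ProductID),
    WordID    INT NOT NULL REFERENCES Words (WordID),
    PRIMARY KEY (ProductID, WordID)
);
```

The UNIQUE constraint on WordText doubles as the index the LIKE search will scan.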

Then we'd need a trigger on the original Products table that nukes any references to the modified row in ProductWords, then splits ProductName (and any other columns we want to search in) into words and re-creates the necessary Words and ProductWords rows.
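One way that trigger could look, as a sketch. The trigger name is made up, it only indexes ProductName, and it assumes SQL Server 2016+ for STRING_SPLIT (on older versions you'd need your own split function):

```sql
CREATE TRIGGER trg_Products_IndexWords
ON Products
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Nuke the old word links for the affected rows.
    DELETE pw
    FROM ProductWords pw
    JOIN inserted i ON pw.ProductID = i.ProductID;

    -- Add any words we haven't seen before.
    INSERT INTO Words (WordText)
    SELECT DISTINCT s.value
    FROM inserted i
    CROSS APPLY STRING_SPLIT(i.ProductName, ' ') s
    WHERE s.value <> ''
      AND NOT EXISTS (SELECT 1 FROM Words w WHERE w.WordText = s.value);

    -- Re-create the product-to-word links.
    INSERT INTO ProductWords (ProductID, WordID)
    SELECT DISTINCT i.ProductID, w.WordID
    FROM inserted i
    CROSS APPLY STRING_SPLIT(i.ProductName, ' ') s
    JOIN Words w ON w.WordText = s.value
    WHERE s.value <> '';
END;
```

A real version would also want an AFTER DELETE trigger to clean up links, and to index more columns than just ProductName.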

So now we can do the 'LIKE' comparison against just the Words table, and use the ProductWords table to link each match back to the product it came from.
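The search itself then becomes a join (again a sketch, assuming the schema above):

```sql
-- The LIKE now scans only the (smaller) Words table;
-- the joins fan the matches back out to products.
SELECT DISTINCT p.*
FROM Words w
JOIN ProductWords pw ON pw.WordID = w.WordID
JOIN Products p ON p.ProductID = pw.ProductID
WHERE w.WordText LIKE '%widget%';
```

The DISTINCT matters because a product containing several matching words would otherwise come back once per word.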

Obviously this method is only a win when the number of distinct words across your Products table is smaller than the number of actual products; otherwise the LIKE is still scanning the same number of rows, just in a different table. It also only matches substrings within a single word, not phrases that span word boundaries.

What do you think? Do you have ideas on how to perform an efficient text search across a table when you don't have access to SQL Server's full-text indexes?