LINQ to Text
This week's episode of Hanselminutes was all about LINQ to XML, a feature of the upcoming 3.5 release of the .NET Framework and its associated compilers. You can read more about it here, if you aren't already all over it.
Scott has talked about the concept of "languages within languages" before, and he mentioned it again in this episode when Carl brought up XPath. The idea is that you have to context-switch to write XPath or, say, SQL from C# or VB, because each is essentially a whole different language contained within a literal string in your program. Having to change gears halfway through writing a method is not a good way to code, and because these "embedded languages" aren't checked at compile time, your software is more open to runtime errors (or worse yet, problems that raise no error but corrupt data in some way).
This is where LINQ comes to the rescue, because most of these mini-languages are query-oriented, and LINQ brings the query syntax into C# or VB. So no more context switching.
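As a rough illustration of the difference, here is the same query written twice: once as an XPath string and once as a LINQ to XML query. The sample document, the class name QueryStyles and both method names are made up for this sketch.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class QueryStyles
{
    // Hypothetical sample document, invented for this illustration.
    public const string Xml =
        "<books><book price=\"10\">A</book><book price=\"30\">B</book></books>";

    // XPath lives inside a string literal, so a typo in the
    // expression only surfaces when the code runs.
    public static string[] ViaXPath(XDocument doc) =>
        doc.XPathSelectElements("/books/book[@price > 20]")
           .Select(b => b.Value)
           .ToArray();

    // The same filter as a C# query expression: the query syntax
    // and the comparison are checked by the compiler.
    public static string[] ViaLinq(XDocument doc) =>
        (from b in doc.Descendants("book")
         where (int)b.Attribute("price") > 20
         select b.Value).ToArray();
}
```

Both methods return the titles of books priced above 20. The element and attribute names are still plain strings in the LINQ version, so it is the shape of the query that gets compile-time checking, not the names.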
This got me thinking about a third query-oriented mini-language: Regular Expressions. We use them to "query" strings - searching for patterns within text. Could there conceivably be a "LINQ to Text" that does away with RegEx? What would the syntax look like?
var emailAddresses = from w in myString.Words() where w.Contains("@") select w;
I don't know whether this would make it any easier, but I do know that RegEx is pretty hard. Maybe LINQ to Text could solve that problem.
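For that query to compile, something has to supply Words(), and .NET has no such method on string. A minimal sketch, assuming a hypothetical extension method that splits on whitespace and trims surrounding punctuation (the class names and FindEmailLikeWords are also made up here):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TextQueryExtensions
{
    // Hypothetical helper: not part of the .NET Framework.
    // Splits on whitespace, then trims common punctuation from each word.
    public static IEnumerable<string> Words(this string text)
    {
        char[] whitespace = { ' ', '\t', '\r', '\n' };
        char[] punctuation = { '.', ',', ';', ':', '!', '?', '(', ')', '<', '>', '"', '\'' };
        return text
            .Split(whitespace, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(punctuation))
            .Where(w => w.Length > 0);
    }
}

public static class Demo
{
    // Runs the query from the post against any input string.
    public static List<string> FindEmailLikeWords(string myString)
    {
        var emailAddresses = from w in myString.Words()
                             where w.Contains("@")
                             select w;
        return emailAddresses.ToList();
    }
}
```

Checking for "@" is only a crude filter. It will happily return things that are not valid addresses, so a real LINQ to Text would still need something as expressive as a pattern underneath.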
Comments
# Andrew
4/09/2007 9:20 AM
Strangely enough, there's already some stuff on this out there.
blogs.msdn.com/.../734383.aspx
Hanselman actually tagged that on del.icio.us a week ago :)
# mabster
4/09/2007 9:30 AM
Interesting. Eric's post is about LINQ to text *files*, rather than to pure text in a string. As in, querying a collection of lines. It's not really a replacement for RegEx.
Eric's idea is great for selecting individual lines from a file, but it wouldn't be useful for, say, selecting all the phone numbers matching a certain pattern out of a string (regardless of whether they span lines).
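To make the distinction concrete, here is a rough sketch that runs one pattern two ways: line by line, and over the whole string. The phone number pattern and the names PhoneQueries, PerLine and WholeText are assumptions made up for the example, not anything from Eric's post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhoneQueries
{
    // Assumed pattern for illustration only:
    // three digits, one whitespace character or hyphen, four digits.
    private static readonly Regex Phone = new Regex(@"\d{3}[\s-]\d{4}");

    // Line-oriented query: each line is searched on its own,
    // so a number broken across two lines is never seen.
    public static string[] PerLine(string text) =>
        (from line in text.Split('\n')
         from Match m in Phone.Matches(line)
         select m.Value).ToArray();

    // Whole-string query: a match may span a line break.
    public static string[] WholeText(string text) =>
        (from Match m in Phone.Matches(text)
         select m.Value).ToArray();
}
```

Given the input "call 555-1234 or 555\n9876", PerLine finds only the first number, while WholeText also finds the one split across the line break.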