IanG on Tap

Ian Griffiths in Weblog Form (RSS 2.0)

Learning PowerShell: FINDSTR Equivalent

Saturday 3 June, 2006, 05:58 PM

I've been using Windows PowerShell as my main command prompt now for a while. Good command prompt skills are extremely useful, so I wanted to learn PowerShell because it's a profound improvement over any command line technology I've seen before. And I'm not just talking about the Windows Command Line (which in turn is a heck of a lot better than the DOS prompt. Yes, CMD.EXE is not the DOS prompt, despite popular opinion to the contrary.) I used Linux for years before moving to Windows, and UNIX-style shells now look stone-age next to PowerShell. Using byte streams to pipe one program's output into another looks clunky and perverse once you've seen a convincing alternative.

Yes, I like PowerShell. (Except for the name, obviously. I preferred either of the codenames - Monad and MSH. Sigh.)

Unfortunately, I have a bit of a bad habit: if there's a task I know how to do with the good old Windows command prompt, I've tended to carry on doing things that way rather than looking for a PowerShellish approach.

This laziness is unfortunate, because the new techniques are often more powerful. Moreover, working out the new techniques is a great way to learn the new tool, so I'm passing up opportunities to learn new stuff. Bad Ian. So I'm trying to be a bit more diligent about finding these new approaches. Here's an example of one such learning exercise I recently went through when trying to perform a task from the command line.

The Task

Sometimes I have to use parts of the Win32 API with which I'm unfamiliar, or which I've forgotten how to use. On these occasions I often find it useful to go looking for things in the Windows SDK header files. When you find a function or type definition in a header file, there's often a lot of related information nearby. This can help you discover features more quickly than you will with the documentation. It has been particularly useful lately as I've been investigating new Windows Vista APIs - often the documentation for these is incomplete, because the product is still in beta. Sometimes the header files offer the only clues as to how to get things done.

For example, suppose I was trying to discover more about the new Command Link controls in Windows Vista. I would probably go to the Include directory in the SDK and type this:

findstr /si COMMANDLINK *.h

This example finds all the header files containing the text 'commandlink'. There are three, it turns out. If I had been relying on the SDK documentation, I'd probably have guessed at CommCtrl.h, but I'm not sure I'd have found vsstyle.h or credentialprovider.h so quickly. This illustrates how this technique can point you in new directions as you try to learn unfamiliar APIs.

Basic FINDSTR in PowerShell

Here's the simple equivalent of a basic FINDSTR:

gci -r -i *.h | select-string COMMANDLINK

This will search all files ending in ".h" and print out lines that contain the text 'COMMANDLINK'. (gci is a standard alias for get-childitem. We're asking it to recurse through the directory structure and include any item matching *.h.) The comparison is case-insensitive by default; the main reason I put COMMANDLINK in upper case was to make this parameter look different from the command.
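To try this without an SDK installed, here is a minimal sketch that builds a throwaway directory of header files and runs the same pipeline with the cmdlet and parameter names spelled out. The file names and contents are invented for illustration, and the sketch assumes a reasonably recent PowerShell release rather than the preview build discussed here.

```powershell
# Build a throwaway tree of sample files (names and contents are made up).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.h')   -Value '#define BS_COMMANDLINK 1', 'int unrelated;'
Set-Content -Path (Join-Path $sub  'b.h')   -Value '// commandlink glyph', '#define CP_COMMANDLINK 1'
Set-Content -Path (Join-Path $root 'c.txt') -Value 'COMMANDLINK in a non-header file'

# Long-hand form of: gci -r -i *.h | select-string COMMANDLINK
$hits = @(Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK')

# The same search with case sensitivity switched on skips the lower-case comment.
$exact = @(Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK' -CaseSensitive)

$hits    # default formatting: file, line number, line text

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```

The default search should report three lines across the two headers and ignore c.txt, while the case-sensitive one should report only the two upper-case definitions.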

Summarizing Output

That's fine for a relatively small number of matches, but sometimes the volume of output can be overwhelming. Sometimes you just want the filenames. You can do this with FINDSTR by adding the /m switch:

findstr /sim COMMANDLINK *.h

This just prints out:

CommCtrl.h
credentialprovider.h
vsstyle.h

We can do the same thing in PowerShell, but it looks a little different. It's not a case of just passing an extra option to one of the commands as it was with FINDSTR. We use a different approach which may seem more complex, but which is actually much more powerful. It will enable us to go places we cannot go with FINDSTR.

Before we look at the solution, it's useful to understand how the PowerShell example shown above is doing something fundamentally different from the FINDSTR equivalent. Although the command prints out the filename, line number, and line content, its output is not actually in string form. Its output consists of Microsoft.PowerShell.Commands.MatchInfo objects. It just happens that the default formatting for a MatchInfo is to print out the filename, line number, and content.
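A quick way to check this is to look at the type and properties of what select-string emits. The sketch below uses a single made-up sample file and assumes a reasonably recent PowerShell release; only the type name and the Filename, LineNumber and Line properties come from the behaviour described above.

```powershell
# One sample header with invented contents.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.h') -Value 'int unrelated;', '#define BS_COMMANDLINK 1'

# Take the first match object from the pipeline.
$hit = Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK' |
    Select-Object -First 1

# The .NET type behind the line that normally gets printed.
$typeName = $hit.GetType().FullName
$typeName

# Individual properties are available directly; no text parsing is needed.
$hit | Select-Object Filename, LineNumber, Line

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```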

This means we can pipe the objects coming out of select-string into PowerShell's grouping command:

gci -r -i *.h | select-string COMMANDLINK | group Filename

The output looks like this:

Count Name                      Group
----- ----                      -----
    2 CommCtrl.h                {CommCtrl.h, CommCtrl.h}
    7 credentialprovider.h      {credentialprovider.h, credentialprovider.h,...
    5 vsstyle.h                 {vsstyle.h, vsstyle.h, vsstyle.h, vsstyle.h...}

This not only tells us which files contain the string we were looking for, it also tells us how many matches each file contains, which is nice. As far as I know there's no way of doing this with FINDSTR, so we're already ahead. (Obviously it's a bit more verbose, but you could set up a script file to streamline the usage.)

The only slightly unsatisfactory thing is that last column - it's not terribly useful. The simple thing would be to get rid of it. We can do this with a select command specifying just the first two columns:

gci -r -i *.h | select-string COMMANDLINK | group Filename | select Count, Name

This produces:

Count Name
----- ----
    2 CommCtrl.h
    7 credentialprovider.h
    5 vsstyle.h

Oddly, this output is indented almost half way across the screen, whereas the previous command's output was aligned to the left. (Indentation not reproduced here.) I'm not entirely sure why it's doing that, although it only happens when Count is there. It might be a 'feature' of the preview build I'm running.
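Current PowerShell releases also have a -NoElement switch on Group-Object that leaves the Group member out altogether, giving the same two-column summary without a separate select. The sketch below assumes such a release (it makes no claim about the preview build discussed here), adds a sort by match count, and uses made-up sample files.

```powershell
# Build a throwaway tree of sample headers (names and contents are made up).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.h') -Value '#define BS_COMMANDLINK 1', 'int unrelated;'
Set-Content -Path (Join-Path $root 'b.h') -Value '// commandlink glyph', '#define CP_COMMANDLINK 1'

# Group matches by file, keep only Count and Name, busiest file first.
$summary = @(Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK' |
    Group-Object -Property Filename -NoElement |
    Sort-Object -Property Count -Descending)

$summary

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```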

Enhancing the Summary

Even though the command above solves the problem I set out to solve, I now want more. Having seen the Group column we just carefully removed, it occurs to me that it was almost useful. The repeated display of the filename wasn't very helpful, but what if we could make the group column select some other attribute? E.g. maybe it could show a list of the line numbers where it found matches.

This is easily achieved. In the previous example, we used select to pick out the columns we wanted. This command also lets us generate new columns by supplying script blocks:

gci -r -i *.h | select-string COMMANDLINK | group Filename | select Count, Name, { $_.Group | foreach { $_.LineNumber} }

I'm not sure if this is the most effective way of doing this - I'm fairly new to PowerShell. It might be that there's some better way of saying “Modify this particular column of my output table in this way please.” But the code above certainly has the right effect. The select statement generates a new column, consisting of a collection generated by iterating through the existing Group column contents and pulling out the line number. The results look like this:

Count Name                       $_.Group | foreach { $_.L
                                 ineNumber}
----- ----                       -------------------------
    2 CommCtrl.h                 {7675, 7676}
    7 credentialprovider.h       {338, 441, 524, 525...}
    5 vsstyle.h                  {59, 60, 113, 122...}

As before it is printing out file names and match counts, but now it's also printing a list of as many of the matching line numbers as will fit.

The truncating of the line number lists to 4 entries is a bit annoying. I initially thought that this was related to the strange formatting with the Count column. But if I leave out the Count, it still only shows a maximum of 4 matches per group, even though it has space to show all of the matches. I suspect this upper limit of 4 items is just a feature of the default rendering of collection-like columns. We could probably work around this by adding more nested script, but I'm not hugely bothered by it. There's a bigger problem I want to fix.

That third column heading is really ugly. It has used the body of my expression, and that makes for a pretty poor kind of label. I really just want 'Line Numbers'.

It turns out that you can supply both an expression and a label. You do this by passing an associative array to select. This array must contain Expression and Name members:

gci -r -i *.h | select-string COMMANDLINK | group Filename | select Count, Name, @{Expression={ $_.Group | foreach { $_.LineNumber} }; Name="Line Numbers"}

(Note: the documentation for PowerShell RC1 suggests that my associative array should be using 'Label' rather than 'Name', but I get an error if I try that. I'm not sure if that's because the documentation is out of date, or because I have an out of date version of PowerShell. I'm running Vista Beta 2, along with Beta 2 WinFX, SDK, etc., and I'm not completely sure whether that's the very latest PowerShell. You might need to change it to get this example running on your system.)

This command produces the following output:

Count Name                       Line Numbers
----- ----                       ------------
    2 CommCtrl.h                 {7675, 7676}
    7 credentialprovider.h       {338, 441, 524, 525...}
    5 vsstyle.h                  {59, 60, 113, 122...}

Much nicer. The only remaining nit is the curious formatting of the Count column - it's still wildly indented on my system. And if I make it the 2nd column, it ends up being ridiculously wide. The documentation suggests that I can fix this by using another associative array for the Count column, adding a Width entry to the array. But that doesn't seem to work on the version I'm running. Ah well. I guess that's beta software for you.
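One way around the four-item truncation is to have the expression return a single string rather than a collection, so there is nothing left for the formatter to abbreviate. The sketch below assumes a PowerShell version that has the -join operator (the earliest releases do not) and that accepts Name in the associative array; the sample file and its contents are invented.

```powershell
# One sample header with six matching lines (invented contents).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'many.h') -Value (1..6 | ForEach-Object { "// COMMANDLINK entry $_" })

# Same summary as above, but the line numbers are joined into one string.
$rows = @(Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK' |
    Group-Object -Property Filename |
    Select-Object Count, Name, @{
        Name       = 'Line Numbers'
        Expression = { ($_.Group | ForEach-Object { $_.LineNumber }) -join ', ' }
    })

$rows

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```

The trade-off is that the column now holds text rather than numbers, so it suits display better than further processing.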

Script File

All that remains is to put the command into a script file so that we don't need to type the thing every time. This requires a slight change:

get-childitem -r -i $args[0] | select-string $args[1] | group-object Filename | select-object Count, Name, @{Expression={ $_.Group | foreach { $_.LineNumber} }; Name="Line Numbers"}

Notice the use of $args – this allows the filename pattern and string to be passed in as arguments to the script. I've also used the full Cmdlet names rather than aliases here, just to avoid any potential ambiguity. If you save this somewhere on your path as quickfind.ps1 and then sign the file with a suitable certificate, you can launch it like so:

quickfind *.h COMMANDLINK

and it will display the summarized results.
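The positional $args lookups can also be written as named parameters, which makes the command self-describing. What follows is only a sketch: the function name Find-InFiles, its parameter names and the sample files are all hypothetical, and it assumes a PowerShell version that supports param blocks with [Parameter()] attributes.

```powershell
# Hypothetical wrapper around the pipeline; the names are illustrative only.
function Find-InFiles {
    param(
        [Parameter(Mandatory = $true)][string] $Include,   # file pattern, e.g. *.h
        [Parameter(Mandatory = $true)][string] $Pattern,   # text to search for
        [string] $Path = '.'                                # directory to start from
    )
    Get-ChildItem -Path $Path -Recurse -Include $Include |
        Select-String -Pattern $Pattern |
        Group-Object -Property Filename |
        Select-Object Count, Name, @{
            Name       = 'Line Numbers'
            Expression = { $_.Group | ForEach-Object { $_.LineNumber } }
        }
}

# Build a throwaway tree of sample headers (names and contents are made up).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.h') -Value '#define BS_COMMANDLINK 1', 'int unrelated;'
Set-Content -Path (Join-Path $root 'b.h') -Value '// commandlink glyph', '#define CP_COMMANDLINK 1'

$result = @(Find-InFiles -Include *.h -Pattern COMMANDLINK -Path $root)
$result

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```

Because the parameters are declared in the same order as the original arguments, a positional call such as Find-InFiles *.h COMMANDLINK should still work from the current directory.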

What Just Happened?

This was a fairly simple task, and not all that earth-shattering given how effusive I was about PowerShell at the start. But it does give a flavour of what I think is so great. The key is that it's a thoroughly .NET-based environment. When you pipe one command into another, you're connecting up streams of objects. You can do this because everyone agrees on what structured data looks like thanks to the .NET framework's common type system. (And of course it means you can use any object from the .NET framework class libraries anywhere in your scripts. Not that I did that in this example, but you're free to do it.)

This makes life much easier than having to bend your data into and out of textual representations as it flows from one program to another. This means you don't need to resort to a suite of text mangling technologies just to stitch programs together.
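To illustrate the point about calling into the .NET class libraries from a pipeline, here is a small sketch that passes each match's Path property to System.IO.Path and then calls a String method on the result. The sample files are invented, and the choice of these particular .NET members is purely for illustration.

```powershell
# Build a throwaway tree of sample headers (names and contents are made up).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hdr-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
Set-Content -Path (Join-Path $root 'a.h') -Value '#define BS_COMMANDLINK 1'
Set-Content -Path (Join-Path $root 'b.h') -Value '// commandlink glyph', '#define CP_COMMANDLINK 1'

# Each match is an object, so .NET static and instance methods apply directly.
$names = @(Get-ChildItem -Path $root -Recurse -Include *.h |
    Select-String -Pattern 'COMMANDLINK' |
    ForEach-Object { [System.IO.Path]::GetFileNameWithoutExtension($_.Path).ToUpperInvariant() } |
    Sort-Object -Unique)

$names

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```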

Copyright © 2002-2024, Interact Software Ltd. Content by Ian Griffiths. Please direct all Web site inquiries to webmaster@interact-sw.co.uk