Victor Vogelpoel

Excellence is in the details

PowerShell: Measure-ScriptCode (calculating script code metrics)


In my current project for a client, my team and I are building a hefty framework in PowerShell 3. I wondered how many lines of code the framework spans and what its other code metrics look like. I asked Dutch PowerShell MVP Jeff Wouters (http://jeffwouters.nl) if he knew a tool for calculating PowerShell code metrics, but he didn’t know of any. So I spent a little time during my Christmas holiday to come up with a tool of my own: Measure-ScriptCode. It was surprisingly easy to calculate the metrics.

The Metrics

First of all, these are the things I would like Measure-ScriptCode to calculate:

  • Number of lines of code
  • Number of comments, where a multi-line comment block counts as one
  • Number of functions, filters and workflows
  • Number of script files, module files and manifest files
  • Total number of lines, words and characters

To satisfy your curiosity, these are the metrics of our project framework:

Files                : 189
Modules              : 8
Manifests            : 8
CodeLines            : 11704
Comments             : 3823
Functions            : 262
Workflows            : 8
Filters              : 1
Characters           : 559350
Lines                : 18773
Words                : 67540

As an example, the script file “Write-HelloWorld.ps1” below has the following metrics: (only) 7 lines of code, 8 comments (a multi-line comment block counts as one comment), 1 function and a total of 36 lines.

# Write-HelloWorld.ps1
# Sample file for calculating PowerShell script code metrics
# Jan 2014
# If this works, Victor Vogelpoel <victor.vogelpoel@macaw.nl> wrote this.
# If it doesn't, I don't know who wrote this.

Set-PSDebug -Strict
Set-StrictMode -Version Latest

function Write-HelloWorld
{
  # Just write it to the host!    # This line has no code, just comment
  Write-Host "Hello World!"       # A line of comment on a codeline
  Write-Verbose "Hello World!"

<#
.Synopsis
  Writes hello world to the host
.DESCRIPTION
  Long description
.EXAMPLE
  Example of how to use this cmdlet
.EXAMPLE
  Another example of how to use this cmdlet
.INPUTS
  Inputs to this cmdlet (if any)
.OUTPUTS
  Output from this cmdlet (if any)
.NOTES
  General notes
.COMPONENT
  The component this cmdlet belongs to
.ROLE
  The role this cmdlet belongs to
.FUNCTIONALITY
  The functionality that best describes this cmdlet
#>
}

Calculating metrics

At first, I thought I could use regular expressions to extract the information from a script file. However, multi-line comments are a tough nut to crack, as are nested functions (or even functions inside comments), and I seem to need regexp lessons all over again every time I pick these expressions up… Jeff had already mentioned the AST, or Abstract Syntax Tree, and this PowerShell 3 feature made it a breeze to calculate the metrics I wanted.

First of all, the abstract syntax tree for a file is created with the following method of the PowerShell language parser:

$tokenAst = $null
$parseErrorsAst = $null
# Use the PowerShell 3 file parser to create the scriptblock AST, tokens and error collections
$scriptBlockAst = [System.Management.Automation.Language.Parser]::ParseFile($file, [ref]$tokenAst, [ref]$parseErrorsAst)

The file is parsed and the method returns the Abstract Syntax Tree for the file; the collection of tokens (and the collection of parse errors) is filled in through the [ref] arguments:

  • The token collection contains an interpretation of all PowerShell ‘words’ and symbols in the file and also includes NewLine tokens, which I will use later on.
    Every comment, even a multi-line comment, is returned as a single token, so comments can easily be counted.
  • The ScriptBlock AST is the actual structure of the script, in abstract terms like ‘CommandAst’ or ‘FunctionDefinitionAst’. I’ll use this to extract ‘function’ information.
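
To get a feel for both results, here is a minimal sketch that parses a small inline script and lists what comes back. It assumes Parser.ParseInput (the string-based sibling of ParseFile) so that no file is needed; the $sample text and the variable names are made up for illustration only.

```powershell
# Parse an inline sample instead of a file, so the sketch is self-contained.
# $sample is a hypothetical script used for illustration only.
$sample = @'
# say hello
function Write-Hello
{
  Write-Host "Hello"   # trailing comment
}
'@

$tokens = $null
$errors = $null
$ast    = [System.Management.Automation.Language.Parser]::ParseInput($sample, [ref]$tokens, [ref]$errors)

# One entry per token: its kind and the line it starts on
$tokens | ForEach-Object { '{0,-18} line {1}' -f $_.Kind, $_.Extent.StartLineNumber }

# The distinct types of all AST nodes in the tree (nested script blocks included)
$nodeTypes = $ast.FindAll({ $true }, $true) | ForEach-Object { $_.GetType().Name }
$nodeTypes | Sort-Object -Unique
```

The token list shows the Comment and NewLine tokens that the line counting relies on, and the list of node types includes FunctionDefinitionAst and CommandAst.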

Calculating comments

Extracting the comments from the script file couldn’t be simpler:

$comments += @($tokenAst | where { $_.Kind -eq "Comment" } ).Length
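
As a quick check that a comment block really counts as one, the sketch below runs the same filter on an inline sample. Again, ParseInput and the $sample text are my assumptions for the sake of a self-contained example; the filter itself is the one shown above.

```powershell
# Hypothetical sample: one line comment, one trailing comment, one block comment
$sample = @'
# line comment
Write-Host "Hi"   # trailing comment
<#
  block comment
  spanning several lines
#>
'@

$tokens = $null
$errors = $null
[void][System.Management.Automation.Language.Parser]::ParseInput($sample, [ref]$tokens, [ref]$errors)

# Same filter as above: every comment is a single token of kind 'Comment'
$commentTokens = @($tokens | Where-Object { $_.Kind -eq 'Comment' })
$commentTokens.Count

# The extent of each token tells which lines the comment covers
$commentTokens | ForEach-Object { 'lines {0}-{1}' -f $_.Extent.StartLineNumber, $_.Extent.EndLineNumber }
```

The count is three: the four-line block comment is a single token whose extent runs from line 3 to line 6.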

Calculating Lines of Code

Calculating the lines of code is only a little more complex than counting the comments, again thanks to the tokenizer. This is what a line of code is not:

  • an empty line
  • a line with just whitespace
  • a comment line, with or without prefixing whitespace
  • a multiline comment block <# .. #>

Fortunately, the tokenizer returns no whitespace; a line ending is represented with a NewLine token.

For me, a line of code is a (non-comment and non-empty) line with at least one statement, including a line with just script brackets (for example, in the Write-HelloWorld code, the line following “function Write-HelloWorld”). Calculating the lines of code in a script is actually no more than eliminating all ‘Comment’ tokens from the token stream and counting the NewLine tokens that do not directly follow another NewLine. If two (or more) NewLine tokens are adjacent, then we found an empty line in the script; if another token precedes the NewLine token, then the line contained a statement and should be counted as a line of code.

The following pipeline keeps only the token ‘kind’, filters out tokens of kind ‘Comment’ and drops every NewLine that directly follows another NewLine. The number of lines of code is the number of NewLines left in the resulting stream, minus 1.

# Calculate the 'lines of code': any line not containing comment or comment-block and not an empty or whitespace line.
# Remove comment tokens from the tokenAst, remove all double newlines and count all the newlines (minus 1)
$prevTokenIsNewline = $false
$codeLines  += @($tokenAst | select -ExpandProperty Kind |  where { $_ -ne "comment" } | where {
                             if ($_ -ne "NewLine" -or (!$prevTokenIsNewline))
                             {
                                 $_
                             }
                             $prevTokenIsNewline = ($_ -eq "NewLine")
                         } | where { $_ -eq "NewLine" }).Length-1
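
A different way to arrive at a similar number, shown here only as a sketch and not what Measure-ScriptCode does, is to collect the line numbers on which any non-comment token sits. This avoids the NewLine bookkeeping and the minus 1 correction, at the price of a slightly different definition: every line of a multi-line token such as a here-string counts as code. The $sample text and the variable names are illustrative assumptions.

```powershell
# Hypothetical sample with a comment line, an empty line and four lines of code
$sample = @'
# header comment

function Write-Hello
{
  Write-Host "Hello"   # trailing comment
}
'@

$tokens = $null
$errors = $null
[void][System.Management.Automation.Language.Parser]::ParseInput($sample, [ref]$tokens, [ref]$errors)

# Token kinds that never make a line count as code
$ignoredKinds = 'Comment', 'NewLine', 'EndOfInput'

# Every line touched by a remaining token is a line of code
$codeLineNumbers = $tokens |
    Where-Object { $ignoredKinds -notcontains $_.Kind.ToString() } |
    ForEach-Object { $_.Extent.StartLineNumber..$_.Extent.EndLineNumber } |
    Sort-Object -Unique

$codeLineCount = @($codeLineNumbers).Count
$codeLineCount
```

For this sample the code lines are lines 3 to 6, so the count is 4. The two approaches may disagree on edge cases, so treat this as an alternative definition rather than a drop-in replacement.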

Calculating functions, workflows and filter functions

Extracting the number of functions is quite easy, thanks to the AST: just ask the AST for all FunctionDefinitionAst nodes (including nested functions):

$functionAst = $scriptBlockAst.FindAll({ $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst]}, $true)

The counts of functions, filters and workflows are now easy to separate:

# Count the specific implementation: 'function', 'filter' or 'workflow'
$functions += @($functionAst | where { (!$_.IsFilter) -and (!$_.IsWorkflow) }).Length
$filters   += @($functionAst | where { $_.IsFilter }).Length
$workflows += @($functionAst | where { $_.IsWorkflow }).Length
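
The sketch below applies the same FindAll call to an inline sample with a nested function and a filter, and labels each definition. The $sample text and names are made up for illustration, and ParseInput is again my stand-in for ParseFile. The sample deliberately contains no workflow, because the workflow keyword is only supported in Windows PowerShell and not in PowerShell 6 and later.

```powershell
# Hypothetical sample: an outer function, a nested function and a filter
$sample = @'
function Get-Outer
{
  function Get-Inner { 'inner' }
  Get-Inner
}

filter Select-Even { if ($_ % 2 -eq 0) { $_ } }
'@

$tokens = $null
$errors = $null
$ast    = [System.Management.Automation.Language.Parser]::ParseInput($sample, [ref]$tokens, [ref]$errors)

# $true makes FindAll descend into nested script blocks, so Get-Inner is found too
$definitions = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst] }, $true))

# Label each definition by its implementation kind
$definitions | ForEach-Object {
    $kind = if ($_.IsFilter) { 'filter' } elseif ($_.IsWorkflow) { 'workflow' } else { 'function' }
    '{0,-8} {1} (line {2})' -f $kind, $_.Name, $_.Extent.StartLineNumber
}

$functionCount = @($definitions | Where-Object { (!$_.IsFilter) -and (!$_.IsWorkflow) }).Count
$filterCount   = @($definitions | Where-Object { $_.IsFilter }).Count
```

This finds three definitions: the functions Get-Outer and Get-Inner and the filter Select-Even.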

And now, Measure-ScriptCode itself:

# Measure-ScriptCode.ps1
# Return metrics about the script and module files
# PowerShell 3 is required as Abstract Syntax Trees are used.
# Jan 2014
# If this works, Victor Vogelpoel <victor.vogelpoel@macaw.nl> wrote this.
# If it doesn't, I don't know who wrote this.

#requires -version 3
Set-PSDebug -Strict
Set-StrictMode -Version Latest

function Measure-ScriptCode
{
  [CmdletBinding()]
  param
  (
    [Parameter(Mandatory=$true, Position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, HelpMessage="One or more PS1 or PSM1 files to calculate code metrics for")]
    [Alias('PSPath', 'Path')]
    [string[]]$ScriptFile
  )

  begin
  {
    $files          = 0
    $modules        = 0
    $manifests      = 0

    $lines          = 0
    $words          = 0
    $characters     = 0
    $codeLines      = 0
    $comments       = 0
    $functions      = 0
    $workflows      = 0
    $filters        = 0
    $parseErrors    = 0
  }

  process
  {
    foreach ($file in $ScriptFile)
    {
      if ($file -like "*.ps1") { $files++ }
      if ($file -like "*.psm1") { $modules++ }
      if ($file -like "*.psd1") { $manifests++ }

      $fileContentsArray  = Get-Content -Path $file

      if ($fileContentsArray)
      {
        # First, measure basic metrics
        $measurement        = $fileContentsArray | Measure-Object -Character -IgnoreWhiteSpace -Word -Line
        $lines              += $measurement.Lines
        $words              += $measurement.Words
        $characters         += $measurement.Characters

        $tokenAst           = $null
        $parseErrorsAst     = $null
        # Use the PowerShell 3 file parser to create the scriptblock AST, tokens and error collections
        $scriptBlockAst     = [System.Management.Automation.Language.Parser]::ParseFile($file, [ref]$tokenAst, [ref]$parseErrorsAst)

        # Get the number of comment lines and comments on the end of a code line
        $comments           += @($tokenAst | where { $_.Kind -eq "Comment" } ).Length

        # Calculate the 'lines of code': any line not containing comment or commentblock and not an empty or whitespace line.
        # Remove comment tokens from the tokenAst, remove all double newlines and count all the newlines (minus 1)
        $prevTokenIsNewline = $false
        $codeLines          += @($tokenAst | select -ExpandProperty Kind |  where { $_ -ne "comment" } | where {
                                 if ($_ -ne "NewLine" -or (!$prevTokenIsNewline))
                                 {
                                   $_
                                 }
                                 $prevTokenIsNewline = ($_ -eq "NewLine")
                               } | where { $_ -eq "NewLine" }).Length-1

        $parseErrors        += @($parseErrorsAst).Length

        if ($null -ne $scriptBlockAst)
        {
          # Find all functions, filters and workflows in the AST, including nested functions
          $functionAst    = $scriptBlockAst.FindAll({ $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst]}, $true)

          # Count the specific implementation: 'function', 'filter' or 'workflow'
          $functions      += @($functionAst | where { (!$_.IsFilter) -and (!$_.IsWorkflow) }).Length
          $filters        += @($functionAst | where { $_.IsFilter }).Length
          $workflows      += @($functionAst | where { $_.IsWorkflow }).Length
        }
      }
    }
  }

  end
  {
    return [PSCustomObject]@{
         Files                   = $files
         Modules                 = $modules
         Manifests               = $manifests

         CodeLines               = $codeLines
         Comments                = $comments
         Functions               = $functions
         Workflows               = $workflows
         Filters                 = $filters

         ParseErrors             = $parseErrors
         Characters              = $characters
         Lines                   = $lines
         Words                   = $words
    }
  }
}
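
A possible way to run the function over a whole folder is sketched below. It assumes Measure-ScriptCode.ps1 has already been dot-sourced into the session, and $root is a hypothetical folder (here simply the current location). The sketch pipes plain full paths as strings, on the assumption that Parser.ParseFile wants an ordinary file system path.

```powershell
# Assumption: Measure-ScriptCode is already loaded, e.g. with: . .\Measure-ScriptCode.ps1
# $root is a hypothetical folder; replace it with the root of your own scripts.
$root = (Get-Location).Path

# Collect the full paths of all script, module and manifest files below $root
$scriptFiles = @(Get-ChildItem -Path $root -Recurse -File |
    Where-Object { $_.Extension -in '.ps1', '.psm1', '.psd1' } |
    Select-Object -ExpandProperty FullName)

# Only call the function when it is available and there is something to measure
if ($scriptFiles.Count -gt 0 -and (Get-Command Measure-ScriptCode -ErrorAction SilentlyContinue))
{
    $scriptFiles | Measure-ScriptCode
}
```

Because the counters are initialised in the begin block and returned in the end block, the whole pipeline produces a single object with the totals over all files.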

You can also find the code in my GitHub gists: Measure-ScriptCode and Write-HelloWorld. As a bonus, I added a PowerShell 2 version that uses the PSParser tokenizer instead of the PowerShell 3 language parser and its AST: Measure-ScriptCodePS2.ps1. The price to pay is that the number of functions, filters and workflows is missing.

Author: Victor Vogelpoel

Dad, SharePoint technical specialist, PowerShell architect, photographer and just a guy whose life happens while trying to plan it.
