Victor Vogelpoel

Excellence is in the details

PowerShell: Compare-Directory.ps1 – comparing file contents and directories with Compare-Object and MD5 hash


In an after-site-deployment scenario, I needed to compare site directories and files on the front-ends in a load-balanced web server farm to make sure these sites contained identical files and directory structure. When an administrator applied a hotfix, all too often files were not deployed properly, and the site behaved inconsistently because of differences in files or configuration between front-ends. To diagnose these kinds of problems more quickly, I developed the PowerShell command Compare-Directory, which compares a reference directory with one or more difference directories.

Learning from Compare-Object

When designing Compare-Directory, I took a good look at PowerShell's own Compare-Object, which does a hell of a job comparing two objects:

Compare-Object -ReferenceObject $ref -DifferenceObject $diff -ExcludeDifferent -IncludeEqual `
               -PassThru -CaseSensitive -SyncWindow $sw

After playing with Compare-Object, requirements arose for Compare-Directory:

  • I wanted to support ExcludeDifferent, IncludeEqual and PassThru in Compare-Directory as well.
  • I wanted to be able to exclude files in the comparison.
  • I wanted to be able to exclude directories in the comparison.
  • I wanted to recurse the comparison to subdirectories.
  • I wanted to compare a reference directory with one or more difference directories.

The output of Compare-Directory should be similar to Compare-Object.

But how to compare files and directories from the reference directory to the difference directory? If I use Get-ChildItem to gather the files and (sub)folders from the reference directory, the full paths are bound to differ from those of the files and (sub)folders in the difference directory. To illustrate, consider this example reference directory “FrontEnd1-Site” and difference directory “FrontEnd2-Site”, both subdirectories of “C:\Test”:

Reference directory                    Difference directory
C:\Test\FrontEnd1-Site\bin\site.dll    C:\Test\FrontEnd2-Site\bin\site.dll
C:\Test\FrontEnd1-Site\help.txt
C:\Test\FrontEnd1-Site\index.htm       C:\Test\FrontEnd2-Site\index.htm
C:\Test\FrontEnd1-Site\web.config      C:\Test\FrontEnd2-Site\web.config

Comparing the results of Get-ChildItem "C:\Test\FrontEnd1-Site" -Recurse with the results of Get-ChildItem "C:\Test\FrontEnd2-Site" -Recurse using Compare-Object would report every item as different. So comparing on FullName, Name or BaseName would not produce the desired result.
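
To see why, here is a minimal sketch that uses plain path strings (rather than the FileInfo objects Get-ChildItem would return) for two of the example files. The variable names are illustrative only. Because each full path embeds its own base directory, Compare-Object finds no matches at all:

```powershell
# Full paths as they would appear under each front-end (illustrative strings only).
$referencePaths  = 'C:\Test\FrontEnd1-Site\index.htm', 'C:\Test\FrontEnd1-Site\web.config'
$differencePaths = 'C:\Test\FrontEnd2-Site\index.htm', 'C:\Test\FrontEnd2-Site\web.config'

# Each string differs in its base directory, so nothing is considered equal.
$result = @(Compare-Object -ReferenceObject $referencePaths -DifferenceObject $differencePaths)

# Two entries are marked '<=' (reference only) and two are marked '=>' (difference only).
$result | Format-Table InputObject, SideIndicator -AutoSize
```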

Comparing files and directories

Comparing two directories involves two aspects: directory structure and file content. Beginning with the latter, comparing files is simple: compare file content (without taking file dates into account). The easiest way is to calculate a hash for each file and compare the hashes. Enter Get-MD5.

function Get-MD5
{
  [CmdletBinding(SupportsShouldProcess=$false)]
  param
  (
    [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, HelpMessage="file(s) to create hash for")]
    [Alias("File", "Path", "PSPath", "String")]
    [ValidateNotNull()]
    $InputObject
  )

  begin
  {
    $cryptoServiceProvider    = [System.Security.Cryptography.MD5CryptoServiceProvider]
    $hashAlgorithm            = new-object $cryptoServiceProvider
  }

  process
  {
    $hashByteArray = ""

    $item = Get-Item $InputObject -ErrorAction SilentlyContinue
    if ($item -is [System.IO.DirectoryInfo])
    {
      throw "Cannot create hash for directory"
    }

    if ($item)
    {
      $InputObject = $item
    }

    if ($InputObject -is [System.IO.FileInfo])
    {
      $stream         = $null;
      $hashByteArray  = $null

      try
      {
        $stream         = $InputObject.OpenRead();
        $hashByteArray  = $hashAlgorithm.ComputeHash($stream);
      }
      finally
      {
        if ($stream -ne $null)
        {
          $stream.Close();
        }
      }
    }
    else
    {
      $utf8             = new-object -TypeName "System.Text.UTF8Encoding"
      $hashByteArray    = $hashAlgorithm.ComputeHash($utf8.GetBytes($InputObject.ToString()));
    }

    Write-Output ([BitConverter]::ToString($hashByteArray)).Replace("-","")
  }
}

Get-MD5 "C:\Test\FrontEnd1-Site\bin\site.dll" returns the MD5 hash for the file:

PS> Get-MD5 "C:\Test\FrontEnd1-Site\bin\site.dll"
8FDBD9537FC1D6BA4FD2F1B8944A2686
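
On PowerShell 4.0 and later, the built-in Get-FileHash cmdlet can produce the same kind of hash without a custom function. The sketch below is an alternative to Get-MD5, not part of Compare-Directory; it writes a small temporary file (a made-up sample) so it can run anywhere:

```powershell
# Create a throwaway sample file containing the text 'hello'.
$sampleFile = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($sampleFile, 'hello')

# Get-FileHash returns an object whose Hash property is a hex string.
$md5 = (Get-FileHash -Path $sampleFile -Algorithm MD5).Hash
$md5

# Clean up the sample file.
Remove-Item -Path $sampleFile
```

Like Get-MD5, this hashes file content only, so file dates do not influence the result.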

The hash of a file in the reference directory can now easily be compared with the hash of the same file in the difference directory. The first step in the compare process is collecting the files and directories with Get-ChildItem and calculating the hashes. And this is why I love PowerShell: I can add the hash as a NoteProperty member named “MD5Hash” to the FileInfo object returned by Get-ChildItem; note the Add-Member call that does this.

Get-ChildItem -Path $DirectoryPath -Exclude $ExcludeFile -Recurse:$Recurse | foreach {
    $hash = ""
    if (!$_.PSIsContainer) { $hash = Get-MD5 $_    }

    # Add the MD5 hash as a new property on the DirectoryInfo/FileInfo object
    $item = $_ | Add-Member -Name "MD5Hash" -MemberType NoteProperty -Value $hash -PassThru
}

Now how to compare the file and directory structure? We need to compare file and directory structures relative to the base of the reference and difference directories. To illustrate:

Reference directory                   Difference directory
\bin\site.dll                         \bin\site.dll
\help.txt
\index.htm                            \index.htm
\web.config                           \web.config

For this purpose, I need to add another member, “RelativeBaseName”, to the FileInfo and DirectoryInfo objects returned by Get-ChildItem. A function called Get-Files retrieves files and folders and adds the two new members:

PS> Get-Files "C:\Test\FrontEnd1-Site" -Recurse | select fullname, RelativeBaseName, MD5Hash `
        | ft -AutoSize

FullName                            RelativeBaseName MD5Hash                        
--------                            ---------------- -------                        
C:\Test\FrontEnd1-Site\bin          \bin\                                           
C:\Test\FrontEnd1-Site\help.txt     \help.txt        9E267E67EDC0AAA2D4DFDEDA885635BB
C:\Test\FrontEnd1-Site\index.htm    \index.htm       B0C46E58F3FDC897F20697FFA7726A0A
C:\Test\FrontEnd1-Site\web.config   \web.config      CC45796B7296B5A2D3EC6DED46626144
C:\Test\FrontEnd1-Site\bin\site.dll \bin\site.dll    4F750B7EFCB6520AE01E01D082D7D476
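
The RelativeBaseName values can be derived by stripping the base directory from each full path. Here is a minimal sketch with hard-coded path strings (the variable names are illustrative), using the same Substring approach that Get-Files uses in the full script:

```powershell
$baseDirectory = 'C:\Test\FrontEnd1-Site'
$fullNames = @(
    'C:\Test\FrontEnd1-Site\help.txt'
    'C:\Test\FrontEnd1-Site\bin\site.dll'
)

# Drop the leading base-directory characters; what remains starts with a backslash.
$relativeNames = @($fullNames | ForEach-Object { $_.Substring($baseDirectory.Length) })
$relativeNames
```

This assumes the base directory is given without a trailing backslash; with one, the relative names would lose their leading backslash.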

Now get a similar file/directory set for the difference directory and Compare-Object can be used to compare the two sets on properties RelativeBaseName and MD5Hash!

PS> $referencefiles  = Get-Files "C:\Test\FrontEnd1-Site" -Recurse
$differencefiles = Get-Files "C:\Test\FrontEnd2-Site" -Recurse

Compare-Object -ReferenceObject $referencefiles -DifferenceObject $differencefiles `
               -Property RelativeBaseName, MD5Hash

RelativeBaseName MD5Hash                          SideIndicator Item    
---------------- -------                          ------------- ----    
\index.htm       3B47F479A6075169531A4B34DF3154A6 =>            index.htm
\help.txt        9E267E67EDC0AAA2D4DFDEDA885635BB <=            help.txt
\index.htm       B0C46E58F3FDC897F20697FFA7726A0A <=            index.htm

Hey! It looks like file index.htm is different and help.txt can only be found in the reference directory (it is missing from the difference directory). I’ve included Compare-Directory.ps1 and the sample front-end site directories in a ZIP, bundled with this article.
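
The property-based comparison can also be tried without any files on disk. The sketch below uses stand-in objects in place of the Get-Files results (PowerShell 3.0 or later for [pscustomobject]); the hash values are placeholders, not real MD5 hashes, and the variable names are illustrative:

```powershell
# Stand-ins for the objects Get-Files returns; hash values are placeholders.
$referenceItems = @(
    [pscustomobject]@{ RelativeBaseName = '\help.txt';   MD5Hash = 'HASH-HELP' }
    [pscustomobject]@{ RelativeBaseName = '\index.htm';  MD5Hash = 'HASH-INDEX-V1' }
    [pscustomobject]@{ RelativeBaseName = '\web.config'; MD5Hash = 'HASH-CONFIG' }
)
$differenceItems = @(
    [pscustomobject]@{ RelativeBaseName = '\index.htm';  MD5Hash = 'HASH-INDEX-V2' }
    [pscustomobject]@{ RelativeBaseName = '\web.config'; MD5Hash = 'HASH-CONFIG' }
)

# Items count as equal only when both the relative name and the hash match.
$differences = @(Compare-Object -ReferenceObject $referenceItems -DifferenceObject $differenceItems `
                                -Property RelativeBaseName, MD5Hash)
$differences | Format-Table -AutoSize
```

A changed file shows up twice, once per side, because its name matches but its hash does not; a missing file shows up once.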

Without further ado, here’s the syntax of the Compare-Directory script cmdlet:

Compare-Directory [-ReferenceDirectory] <DirectoryInfo>
                  [-DifferenceDirectory] <DirectoryInfo[]>
                  [-Recurse]
                  [-ExcludeFile <String[]>]
                  [-ExcludeDirectory <String[]>]
                  [-ExcludeDifferent]
                  [-IncludeEqual]
                  [-PassThru]
                  [<CommonParameters>]

The ExcludeFile and ExcludeDirectory parameters list the names of files and directories to exclude from the comparison. Specifying ExcludeDifferent displays only the characteristics of compared objects that are equal; IncludeEqual also displays characteristics of files that are equal. By default, only characteristics that differ between the reference and difference files are displayed. PassThru passes the compared files themselves to the pipeline instead of the SideIndicator view (as above).

# Compare-Directory.ps1
# Compare files in one or more directories and return file difference results
# Victor Vogelpoel <victor.vogelpoel@macaw.nl>
# Sept 2013

# Compare-Directory -ReferenceDirectory "C:\Compare-Directory\FrontEnd1-Site"  -DifferenceDirectory "C:\Compare-Directory\FrontEnd2-Site"

function global:Compare-Directory
{
    [CmdletBinding()]
    param
    (
        [Parameter(Mandatory=$true, position=0, ValueFromPipelineByPropertyName=$true, HelpMessage="The reference directory to compare one or more difference directories to.")]
        [System.IO.DirectoryInfo]$ReferenceDirectory,

        [Parameter(Mandatory=$true, position=1, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, HelpMessage="One or more directories to compare to the reference directory.")]
        [System.IO.DirectoryInfo[]]$DifferenceDirectory,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Recurse the directories")]
        [switch]$Recurse,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Files to exclude from the comparison")]
        [String[]]$ExcludeFile,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Directories to exclude from the comparison")]
        [String[]]$ExcludeDirectory,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Displays only the characteristics of compared objects that are equal.")]
        [switch]$ExcludeDifferent,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Displays characteristics of files that are equal. By default, only characteristics that differ between the reference and difference files are displayed.")]
        [switch]$IncludeEqual,

        [Parameter(Mandatory=$false, ValueFromPipelineByPropertyName=$true, HelpMessage="Passes the objects that differed to the pipeline.")]
        [switch]$PassThru
    )

    begin
    {
        function Get-MD5
        {
            [CmdletBinding(SupportsShouldProcess=$false)]
            param
            (
                [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true, HelpMessage="file(s) to create hash for")]
                [Alias("File", "Path", "PSPath", "String")]
                [ValidateNotNull()]
                $InputObject
            )

            begin
            {
                $cryptoServiceProvider    = [System.Security.Cryptography.MD5CryptoServiceProvider]
                $hashAlgorithm             = new-object $cryptoServiceProvider
            }

            process
            {
                $hashByteArray = ""

                $item = Get-Item $InputObject -ErrorAction SilentlyContinue
                if ($item -is [System.IO.DirectoryInfo])    { throw "Cannot create hash for directory" }
                if ($item)                                     { $InputObject = $item }

                if ($InputObject -is [System.IO.FileInfo])
                {
                    $stream         = $null;
                    $hashByteArray    = $null

                    try
                    {
                        $stream                 = $InputObject.OpenRead();
                        $hashByteArray             = $hashAlgorithm.ComputeHash($stream);
                    }
                    finally
                    {
                        if ($stream -ne $null)
                        {
                            $stream.Close();
                        }
                    }
                }
                else
                {
                    $utf8             = new-object -TypeName "System.Text.UTF8Encoding"
                    $hashByteArray     = $hashAlgorithm.ComputeHash($utf8.GetBytes($InputObject.ToString()));
                }

                Write-Output ([BitConverter]::ToString($hashByteArray)).Replace("-","")
            }
        }

        function Get-Files
        {
            [CmdletBinding(SupportsShouldProcess=$false)]
            param
            (
                [string]$DirectoryPath,
                [String[]]$ExcludeFile,
                [String[]]$ExcludeDirectory,
                [switch]$Recurse
            )

            $relativeBasenameIndex = $DirectoryPath.ToString().Length

            # Get the files from the first deploypath
            # and ADD the MD5 hash for the file as a property
            # and ADD a filepath relative to the deploypath as a property
            Get-ChildItem -Path $DirectoryPath -Exclude $ExcludeFile -Recurse:$Recurse | foreach {
                $hash = ""
                if (!$_.PSIsContainer) { $hash = Get-MD5 $_    }

                # Added two new properties to the DirectoryInfo/FileInfo objects
                $item = $_ |
                    Add-Member -Name "MD5Hash" -MemberType NoteProperty -Value $hash -PassThru |
                    Add-Member -Name "RelativeBaseName" -MemberType NoteProperty -Value ($_.FullName.Substring($relativeBasenameIndex)) -PassThru

                # Test for directories and files that need to be excluded because of ExcludeDirectory
                if ($item.PSIsContainer) { $item.RelativeBaseName += "\" }
                if ($ExcludeDirectory | where { $item.RelativeBaseName -like "\$_\*" })
                {
                    Write-Verbose "Ignore item `"$($item.Fullname)`""
                }
                else
                {
                    Write-Verbose "Adding `"$($item.Fullname)`" to result set"
                    Write-Output $item
                }
            }
        }

        $referenceDirectoryFiles = Get-Files -DirectoryPath $referenceDirectory -ExcludeFile $ExcludeFile -ExcludeDirectory $ExcludeDirectory -Recurse:$Recurse
    }

    process
    {
        if ($DifferenceDirectory -and $referenceDirectoryFiles)
        {
            foreach($nextPath in $DifferenceDirectory)
            {
                $nextDifferenceFiles = Get-Files -DirectoryPath $nextpath -ExcludeFile $ExcludeFile -ExcludeDirectory $ExcludeDirectory -Recurse:$Recurse

                ###################################################
                # Compare the contents of the two file/directory arrays and return the results
                $results = @(Compare-Object -ReferenceObject $referenceDirectoryFiles -DifferenceObject $nextDifferenceFiles -ExcludeDifferent:$ExcludeDifferent -IncludeEqual:$IncludeEqual -PassThru:$PassThru -Property RelativeBaseName, MD5Hash)

                if (!$PassThru)
                {
                    foreach ($result in $results)
                    {
                        $path         = $ReferenceDirectory
                        $pathFiles    = $referenceDirectoryFiles
                        if ($result.SideIndicator -eq "=>")
                        {
                            $path         = $nextPath
                            $pathFiles    = $nextDifferenceFiles
                        }

                        # Find the original item in the files array
                        $itemPath = (Join-Path $path $result.RelativeBaseName).ToString().TrimEnd('\')
                        $item = $pathFiles | where { $_.fullName -eq $itemPath }

                        $result | Add-Member -Name "Item" -MemberType NoteProperty -Value $item
                    }
                }

                Write-Output $results
            }
        }
    }

<#
		.SYNOPSIS
			Compares a reference directory with one or more difference directories.

		.DESCRIPTION
			Compare-Directory compares a reference directory with one or more difference
			directories. Files and directories are compared both on filename and contents
			using an MD5 hash.

			Internally, Compare-Object is used to compare the directories. The behavior
			and results of Compare-Directory are similar to Compare-Object.

		.PARAMETER  ReferenceDirectory
			The reference directory to compare one or more difference directories to.

		.PARAMETER  DifferenceDirectory
			One or more directories to compare to the reference directory.

		.PARAMETER Recurse
			Include subdirectories in the comparison.

		.PARAMETER ExcludeFile
			File names to exclude from the comparison.

		.PARAMETER ExcludeDirectory
			Directory names to exclude from the comparison. Directory names are
			relative to the Reference or Difference directory path.

		.PARAMETER ExcludeDifferent
			Displays only the characteristics of compared files that are equal.

		.PARAMETER IncludeEqual
			Displays characteristics of files that are equal. By default, only
			characteristics that differ between the reference and difference files
			are displayed.

		.PARAMETER PassThru
			Passes the objects that differed to the pipeline instead of the
			comparison results with the SideIndicator view.

		.EXAMPLE
			Compare-Directory -reference "D:\TEMP\CompareTest\path1" -difference "D:\TEMP\CompareTest\path2" -ExcludeFile "web.config" -recurse

			Compares directories "D:\TEMP\CompareTest\path1" and "D:\TEMP\CompareTest\path2" recursively, excluding "web.config"
			Only differences are shown. Results:

			RelativeBaseName  MD5Hash                          SideIndicator Item
			----------------  -------                          ------------- ----
			bin\site.dll      87A1E6006C2655252042F16CBD7FB41B =>            D:\TEMP\CompareTest\path2\bin\site.dll
			index.html        02BB8A33E1094E547CA41B9E171A267B =>            D:\TEMP\CompareTest\path2\index.html
			index.html        20EE266D1B23BCA649FEC8385E5DA09D <=            D:\TEMP\CompareTest\path1\index.html
			web_2.config      5E6B13B107ED7A921AEBF17F4F8FE7AF <=            D:\TEMP\CompareTest\path1\web_2.config
			bin\site.dll      87A1E6006C2655252042F16CBD7FB41B =>            D:\TEMP\CompareTest\path2\bin\site.dll
			index.html        02BB8A33E1094E547CA41B9E171A267B =>            D:\TEMP\CompareTest\path2\index.html
			index.html        20EE266D1B23BCA649FEC8385E5DA09D <=            D:\TEMP\CompareTest\path1\index.html
			web_2.config      5E6B13B107ED7A921AEBF17F4F8FE7AF <=            D:\TEMP\CompareTest\path1\web_2.config

		.EXAMPLE
			Compare-Directory -reference "D:\TEMP\CompareTest\path1" -difference "D:\TEMP\CompareTest\path2" -ExcludeFile "web.config" -recurse -IncludeEqual

			Compares directories "D:\TEMP\CompareTest\path1" and "D:\TEMP\CompareTest\path2" recursively, excluding "web.config".
			Results include the items that are equal:

			RelativeBaseName    MD5Hash                          SideIndicator Item
			----------------    -------                          ------------- ----
			bin 	                                             ==            D:\TEMP\CompareTest\path1\bin
			bin\site2.dll       98B68D681A8D40FA943D90588E94D1A9 ==            D:\TEMP\CompareTest\path1\bin\site2.dll
			bin\site3.dll       9408C4B29F82260CBBA528342CBAA80F ==            D:\TEMP\CompareTest\path1\bin\site3.dll
			bin\site4.dll       0616E1FBE12D468F611F07768D70C2EE ==            D:\TEMP\CompareTest\path1\bin\site4.dll
			...
			bin\site8.dll       87A1E6006C2655252042F16CBD7FB41B =>            D:\TEMP\CompareTest\path2\bin\site8.dll
			index.html          02BB8A33E1094E547CA41B9E171A267B =>            D:\TEMP\CompareTest\path2\index.html
			index.html          20EE266D1B23BCA649FEC8385E5DA09D <=            D:\TEMP\CompareTest\path1\index.html
			web_2.config        5E6B13B107ED7A921AEBF17F4F8FE7AF <=            D:\TEMP\CompareTest\path1\web_2.config

		.EXAMPLE
			Compare-Directory -reference "D:\TEMP\CompareTest\path1" -difference "D:\TEMP\CompareTest\path2" -ExcludeFile "web.config" -recurse -ExcludeDifferent

			Compares directories "D:\TEMP\CompareTest\path1" and "D:\TEMP\CompareTest\path2" recursively, excluding "web.config".
			Results only include the files that are equal; different files are excluded from the results.

		.EXAMPLE
			Compare-Directory -reference "D:\TEMP\CompareTest\path1" -difference "D:\TEMP\CompareTest\path2" -ExcludeFile "web.config" -recurse -Passthru

			Compares directories "D:\TEMP\CompareTest\path1" and "D:\TEMP\CompareTest\path2" recursively, excluding "web.config" and returns NO comparison
			results, but the different files themselves!

			FullName
			--------
			D:\TEMP\CompareTest\path2\bin\site3.dll
			D:\TEMP\CompareTest\path2\index.html
			D:\TEMP\CompareTest\path1\index.html
			D:\TEMP\CompareTest\path1\web_2.config

		.LINK
			Compare-Object
	#>
}

You can also download the PowerShell script Compare-Directory.ps1 at this gist, or download the script and sample bogus files in this archive: Compare-Directory.ZIP.

Well, have fun with Compare-Directory and make sure to let me know how you’re using it.

Author: Victor Vogelpoel

Dad, SharePoint technical specialist, PowerShell architect, photographer and just a guy whose life happens while trying to plan it.

Comments

  1. Hey Victor, thanks for this. It’s a nice piece of work. I just had one question. I’m not seeing any way to specify the exclusion of a file in a specific directory. In my exclude list I’d like to be explicit about which file with a certain name is excluded, in case there are multiple files with the same name in different directories.

    Thanks.

  2. Sean, thank you for your comment.

    File exclusion is handled by Get-ChildItem, so if you specify a file name to exclude, Get-ChildItem will exclude every file with that name in every subdirectory. Compare-Directory is currently not designed to handle your requirement.

    What you can do is modify Compare-Directory to remove the Get-Files functionality and feed the reference and difference file arrays to the function to work out the differences. You’ll have to figure out how to exclude specific files in specific directories. You could use Get-ChildItem to get all files from a directory and remove specific files from the resulting FileInfo array before feeding it to the modified Compare-Directory function… (I would call the function “Compare-Files” by now 😉)

  3. Yep. That makes sense. Thanks for the response.
