Introduction:
This article describes an easy approach to determining whether two files are exactly the same. The purpose of the test is to determine whether a file has been edited or tampered with in any way by comparing it against an original. The code and sample application demonstrate two methods for determining the status of the file.
The approach is recommended by Microsoft and is mentioned in Matthew MacDonald's Visual Basic .NET book published by Microsoft Press. I have found it useful for determining whether a file has been altered by comparing the suspect file against the original.
Figure 1. The Sample Application in Use
Getting Started
To get started, unzip the included project and open the solution in the Visual Studio 2005 environment. In Solution Explorer, you should note the following:
Figure 2. Solution Explorer
As you can see, there is only a single form in this Windows application project (frmMain.vb). No additional references or resources were added to the project; only the default settings are needed to support the code.
The design of the form is simple: there are two sets of controls (a text box and a button), used in conjunction with an Open File Dialog to search for and load two files. One file is the source file, and the second is the file that will be compared against the source. Two additional buttons are used to kick off either of the two tests that will be run against the two selected files. Lastly, there is a button used to terminate the application:
Figure 3. The Main Form Designer
The Code: Main Form (frmMain.vb)
The main form class includes two imports which are
necessary to support the sample application:
Imports System.Security.Cryptography
Imports System.IO
System.Security.Cryptography exposes the HashAlgorithm class, which allows the application to convert the content of a file stream or byte array into a hash value. The hash of the test file can then be compared against the hash of the source file. This approach is sensitive to even the most minor change (such as removing or adding a single space).
System.IO is imported to allow for the manipulation of the files themselves.
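To illustrate that sensitivity outside the form, here is a small console sketch that hashes two strings differing by a single character. It assumes SHA-1 via SHA1.Create(); under the .NET Framework's default configuration the parameterless HashAlgorithm.Create() used later in this article also returns SHA-1, but that overload is obsolete and not supported on .NET Core and .NET 5 or later. The module name, the HashText helper, and the sample strings are hypothetical.

```vbnet
Imports System
Imports System.Security.Cryptography
Imports System.Text

Module HashSensitivityDemo

    ' Hashes the ASCII bytes of a string with SHA-1 (an assumption; see above)
    ' and returns the result in BitConverter's hyphenated hex form.
    Function HashText(ByVal text As String) As String
        Using sha As SHA1 = SHA1.Create()
            Dim bytes As Byte() = Encoding.ASCII.GetBytes(text)
            Return BitConverter.ToString(sha.ComputeHash(bytes))
        End Using
    End Function

    Sub Main()
        ' Two hypothetical inputs that differ by exactly one character.
        Dim h1 As String = HashText("row the boat")
        Dim h2 As String = HashText("row the goat")
        Console.WriteLine(h1)
        Console.WriteLine(h2)
        Console.WriteLine("Hashes equal: " & (h1 = h2).ToString())
    End Sub

End Module
```

Both hashes have the same length (20 bytes for SHA-1), yet almost every hex pair differs, which is why a single equality check on the hashes is enough to detect the edit.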
The first block of code in the application is used to terminate the application whenever the user clicks the "Exit" button:
Public Class frmMain

    Private Sub btnExit_Click(ByVal sender As System.Object, _
                              ByVal e As System.EventArgs) Handles btnExit.Click
        Application.Exit()
    End Sub
Following the exit button click event handler, the next two code blocks are used
to handle the click events for the browse buttons used on the form. Since the
two handlers are roughly the same, I will only show one of them here:
    Private Sub btnBrowseSrc_Click(ByVal sender As System.Object, _
                                   ByVal e As System.EventArgs) Handles btnBrowseSrc.Click
        OpenFileDialog1.Title = "Open File"
        OpenFileDialog1.Filter = "Files (*.*)|*.*"

        ' Nothing to do if the user cancels the dialog.
        If OpenFileDialog1.ShowDialog = Windows.Forms.DialogResult.Cancel Then
            Exit Sub
        End If

        Dim sFilePath As String = OpenFileDialog1.FileName

        ' Only display the path if the selected file actually exists.
        If System.IO.File.Exists(sFilePath) Then
            txtSourceFile.Text = sFilePath
        End If
    End Sub
This is all pretty common: the Open File Dialog is configured to display the title "Open File", and the filter is set to display all files. If the user clicks the Cancel button, the subroutine exits. When the user selects a file through the dialog, the subroutine checks whether the file exists and, if it does, sets the Text property of the appropriate text box to the path of the file.
The next block of code is used to execute the hash-based test of the two selected files:
    Private Sub btnTest_Click(ByVal sender As System.Object, _
                              ByVal e As System.EventArgs) Handles btnTest.Click

        ' In the .NET Framework, the default HashAlgorithm is SHA-1.
        Dim myHash As HashAlgorithm = HashAlgorithm.Create()

        If txtTestFile.Text = String.Empty OrElse _
           txtSourceFile.Text = String.Empty Then
            MessageBox.Show("Set all form fields prior to initiating a test", _
                            "Missing Form Data", MessageBoxButtons.OK)
            Exit Sub   ' nothing to compare yet
        End If

        ' Hash the test file. FileMode.Open (rather than OpenOrCreate) avoids
        ' silently creating an empty file when a path is wrong.
        Dim fs1 As New FileStream(txtTestFile.Text, FileMode.Open, FileAccess.Read)
        ' VB array bounds are upper bounds, so Length - 1 gives exactly Length bytes.
        Dim fs1Bytes As Byte() = New Byte(CInt(fs1.Length) - 1) {}
        fs1.Read(fs1Bytes, 0, fs1Bytes.Length)
        Dim arr1() As Byte = myHash.ComputeHash(fs1Bytes)
        fs1.Close()

        ' Hash the source (original) file the same way.
        Dim fs2 As New FileStream(txtSourceFile.Text, FileMode.Open, FileAccess.Read)
        Dim fs2Bytes As Byte() = New Byte(CInt(fs2.Length) - 1) {}
        fs2.Read(fs2Bytes, 0, fs2Bytes.Length)
        Dim arr2() As Byte = myHash.ComputeHash(fs2Bytes)
        fs2.Close()

        If BitConverter.ToString(arr1) = BitConverter.ToString(arr2) Then
            MessageBox.Show("The file examined has not been tampered with.", _
                            "Hash Test Passed")
        Else
            MessageBox.Show("The file examined has been tampered with.", _
                            "Hash Test Failed")
        End If

        ' Display both hashes: arr2 is the source (original) file, arr1 the test file.
        MessageBox.Show("Original Hash: " & Environment.NewLine & _
                        BitConverter.ToString(arr2) & Environment.NewLine & _
                        "Test Hash: " & Environment.NewLine & _
                        BitConverter.ToString(arr1), _
                        "Hash Test Results")
    End Sub
The subroutine starts by creating an instance of the HashAlgorithm class called "myHash". Next, it validates that each of the two text boxes holding the paths to the source and test files contains text.
The next bit of code is as follows:
        Dim fs1 As New FileStream(txtTestFile.Text, FileMode.Open, FileAccess.Read)
        Dim fs1Bytes As Byte() = New Byte(CInt(fs1.Length) - 1) {}
        fs1.Read(fs1Bytes, 0, fs1Bytes.Length)
        Dim arr1() As Byte = myHash.ComputeHash(fs1Bytes)
        fs1.Close()
This code creates a file stream, passing it the path to the test file and a file mode. A byte array sized to the length of the file stream is created and then populated with the stream's content. A second byte array is then created to hold the value returned by the hash algorithm's ComputeHash method, which is passed the byte array read from the file stream. Lastly, the file stream is closed. The same process is then applied to the source file in the next bit of code.
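A single call to FileStream.Read is not guaranteed to return every requested byte, so a more defensive version of the stream, byte array, and hash steps keeps reading until the array is full. The following console sketch assumes SHA-1 (via SHA1.Create()) to match the .NET Framework default; the module and the ReadWholeFile and HashFile helpers are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module ReadThenHash

    ' Reads a whole file through a FileStream. Stream.Read may return fewer
    ' bytes than requested, so keep reading until the buffer is full.
    Function ReadWholeFile(ByVal filePath As String) As Byte()
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            ' VB array bounds are upper bounds: Length - 1 yields exactly Length elements.
            Dim buffer As Byte() = New Byte(CInt(fs.Length) - 1) {}
            Dim offset As Integer = 0
            While offset < buffer.Length
                Dim count As Integer = fs.Read(buffer, offset, buffer.Length - offset)
                If count = 0 Then
                    Throw New EndOfStreamException("File was shorter than its reported length.")
                End If
                offset += count
            End While
            Return buffer
        End Using
    End Function

    ' Hashes the bytes read above: stream -> byte array -> ComputeHash.
    Function HashFile(ByVal filePath As String) As String
        Using sha As SHA1 = SHA1.Create()
            Return BitConverter.ToString(sha.ComputeHash(ReadWholeFile(filePath)))
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "abc")   ' UTF-8 without a BOM, so 3 bytes
        ' prints A9-99-3E-36-47-06-81-6A-BA-3E-25-71-78-50-C2-6C-9C-D0-D8-9D
        Console.WriteLine(HashFile(tempFile))
        File.Delete(tempFile)
    End Sub

End Module
```

The Using block also guarantees the stream is closed even if reading throws, which the explicit Close() call alone does not.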
When the hash for each of the files has been generated, the subroutine uses System.BitConverter to convert each of the two byte arrays into a string of hexadecimal pairs and compares the two strings. If they are identical, the user is informed that the file has not been tampered with or changed; if they do not match, the user is informed of the mismatch. In either case, the two hash values are then displayed so the user can see the difference (or lack of one) for themselves. Any change to a file, however minor, results in a completely different hash.
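BitConverter.ToString renders each byte as two uppercase hexadecimal digits joined by hyphens, which is what makes the string comparison work. Comparing the byte arrays directly avoids building the strings at all. The sketch below shows both; the module name and the SameHash helper are hypothetical.

```vbnet
Imports System

Module CompareHashes

    ' Compares two hash values element by element instead of
    ' converting them to strings first.
    Function SameHash(ByVal a As Byte(), ByVal b As Byte()) As Boolean
        If a.Length <> b.Length Then Return False
        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Dim h1 As Byte() = New Byte() {&HA, &HFF, &H3}
        Dim h2 As Byte() = New Byte() {&HA, &HFF, &H4}
        Console.WriteLine(BitConverter.ToString(h1)) ' prints 0A-FF-03
        Console.WriteLine(BitConverter.ToString(h2)) ' prints 0A-FF-04
        Console.WriteLine(SameHash(h1, h2))          ' prints False
    End Sub

End Module
```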
The next subroutine is used to handle the Byte Test button click event; that
code is as follows:
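    Private Sub btnByteCompare_Click(ByVal sender As System.Object, _
                                     ByVal e As System.EventArgs) Handles btnByteCompare.Click

        If txtTestFile.Text = String.Empty OrElse _
           txtSourceFile.Text = String.Empty Then
            MessageBox.Show("Set all form fields prior to initiating a test", _
                            "Missing Form Data", MessageBoxButtons.OK)
            Exit Sub
        End If

        Dim fs1 As New FileStream(txtTestFile.Text, FileMode.Open, FileAccess.Read)
        Dim fs1Bytes As Byte() = New Byte(CInt(fs1.Length) - 1) {}
        fs1.Read(fs1Bytes, 0, fs1Bytes.Length)
        fs1.Close()

        Dim fs2 As New FileStream(txtSourceFile.Text, FileMode.Open, FileAccess.Read)
        Dim fs2Bytes As Byte() = New Byte(CInt(fs2.Length) - 1) {}
        fs2.Read(fs2Bytes, 0, fs2Bytes.Length)
        fs2.Close()

        ' Compare only the bytes both files share, so a shorter file
        ' cannot cause an IndexOutOfRangeException.
        Dim shorter As Integer = Math.Min(fs1Bytes.Length, fs2Bytes.Length)
        For i As Integer = 0 To shorter - 1
            If fs1Bytes(i) <> fs2Bytes(i) Then
                MessageBox.Show("The file examined has been tampered with at position " & _
                                i.ToString(), "Byte Test Failed")
                Exit Sub
            End If
        Next

        ' Identical prefixes but different lengths still means the files differ.
        If fs1Bytes.Length <> fs2Bytes.Length Then
            MessageBox.Show("The file examined has been tampered with at position " & _
                            shorter.ToString(), "Byte Test Failed")
            Exit Sub
        End If

        MessageBox.Show("The file examined has not been tampered with.", "Byte Test Passed")
    End Sub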
This subroutine starts by opening a file stream for each of the two files (source and test) and reading the content of each into a byte array. It then loops through the arrays, comparing the two files byte by byte. If the files match from beginning to end, the user is told that the file has not been tampered with; if the files do not match at any position in the byte array, the user is told the position at which the first mismatch occurred.
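Loading both files into memory is fine for small files, but the same byte-by-byte idea also works in fixed-size chunks, so large files are never held in memory at once. The following console sketch is one way to do that; the module, the FindFirstDifference and ReadChunk helpers, the 4096-byte chunk size, and the sample text are all hypothetical choices, not part of the article's sample application.

```vbnet
Imports System
Imports System.IO

Module StreamingByteCompare

    ' Returns the zero-based offset of the first byte that differs, or -1 when
    ' the files are identical. If one file is a prefix of the other, the offset
    ' returned is the length of the shorter file.
    Function FindFirstDifference(ByVal file1 As String, ByVal file2 As String) As Long
        Const ChunkSize As Integer = 4096
        Using fs1 As New FileStream(file1, FileMode.Open, FileAccess.Read), _
              fs2 As New FileStream(file2, FileMode.Open, FileAccess.Read)
            Dim buf1(ChunkSize - 1) As Byte
            Dim buf2(ChunkSize - 1) As Byte
            Dim position As Long = 0
            Dim n1, n2 As Integer
            Do
                n1 = ReadChunk(fs1, buf1)
                n2 = ReadChunk(fs2, buf2)
                Dim common As Integer = Math.Min(n1, n2)
                For i As Integer = 0 To common - 1
                    If buf1(i) <> buf2(i) Then Return position + i
                Next
                If n1 <> n2 Then Return position + common
                position += n1
            Loop While n1 > 0
        End Using
        Return -1
    End Function

    ' Fills the buffer until it is full or the stream ends; returns the bytes read.
    Private Function ReadChunk(ByVal s As Stream, ByVal buffer As Byte()) As Integer
        Dim total As Integer = 0
        While total < buffer.Length
            Dim n As Integer = s.Read(buffer, total, buffer.Length - total)
            If n = 0 Then Exit While
            total += n
        End While
        Return total
    End Function

    Sub Main()
        Dim original As String = Path.GetTempFileName()
        Dim edited As String = Path.GetTempFileName()
        File.WriteAllText(original, "row the boat")
        File.WriteAllText(edited, "row the goat")
        Console.WriteLine(FindFirstDifference(original, edited)) ' prints 8
        File.Delete(original)
        File.Delete(edited)
    End Sub

End Module
```

Like the form's Byte Test, the reported position is a zero-based byte offset; a length difference is reported as a mismatch rather than being ignored.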
Testing the Application
To prepare for the test, create a file in Notepad, type some text into it, and save it on the file system. Next, create an exact duplicate of the file. Use these two files as the source and test files for the application.
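If you would rather script the preparation than use Notepad, a sketch along these lines writes a source file and an exact byte-for-byte copy of it. The file names source.txt and test.txt, the sample text, and the CreateTestPair helper are hypothetical; any text file and location will do.

```vbnet
Imports System
Imports System.IO

Module PrepareTestFiles

    ' Writes a source file and an exact duplicate of it to the given folder.
    ' File names are hypothetical. Returns {sourcePath, testPath}.
    Function CreateTestPair(ByVal folder As String, ByVal text As String) As String()
        Dim sourcePath As String = Path.Combine(folder, "source.txt")
        Dim testPath As String = Path.Combine(folder, "test.txt")
        File.WriteAllText(sourcePath, text)
        ' Overwrite any earlier copy so both files start out identical.
        File.Copy(sourcePath, testPath, True)
        Return New String() {sourcePath, testPath}
    End Function

    Sub Main()
        Dim paths As String() = CreateTestPair(Path.GetTempPath(), "row the boat")
        Console.WriteLine("Source file: " & paths(0))
        Console.WriteLine("Test file:   " & paths(1))
    End Sub

End Module
```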
Build and launch the application and use the browse
buttons to load the two files created per the last paragraph. Once the two files
have been set, click on the "Hash Test" button. You should see this result
displayed:
Figure 4. Hash Test Results for Identical Files
Figure 5. Original and Test Hash Comparison
Dismiss the dialog boxes by clicking OK on each of
them. Now click on the Byte Test button; the results displayed should match this
example:
Figure 6. Byte Test Results for Two Identical Files
Now, open the duplicate file in Notepad and edit one letter in the text. In the example, my text file contained the string shown in Figure 7. In that string, I replaced the "b" in boat with a "g" to turn boat into goat. Save the file and repeat the test.
Figure 7. Notepad with Sample Text
When the test is repeated, the results for the hash
test will be as follows:
Figure 8. Hash Test Results after Edit of Test File
Figure 9. Different Hash for Original and Test Files After Edit of Test File
Figure 10. Byte Test Failure Pointing to Position of Mismatch
As can be seen from the results, the hash of the test file after a single-character change is entirely different from the hash of the original, and the mismatch is easily detected by the comparison. Similarly, the byte array test easily pinpointed the position of the failure by comparing the two files byte by byte. Position 82 in this case is the zero-based offset of the byte where the "b" in boat was swapped for the "g" in goat.
Summary
This example was intended to show a couple of ways in which two files may be compared to determine whether they are identical. While it shows only two approaches, several variations can be applied. For example, the HashAlgorithm class's ComputeHash method also has an overload that accepts a Stream, so it can hash the file stream directly without first reading it into a byte array.
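As a sketch of that variation, the stream can be passed straight to ComputeHash, which reads it to the end. SHA-1 via SHA1.Create() is assumed here to mirror the .NET Framework default; the module and the HashFileFromStream and FilesMatch helpers are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module StreamHash

    ' Hashes a file by handing the open stream directly to ComputeHash,
    ' with no intermediate byte array.
    Function HashFileFromStream(ByVal filePath As String) As String
        Using sha As SHA1 = SHA1.Create()
            Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
                Return BitConverter.ToString(sha.ComputeHash(fs))
            End Using
        End Using
    End Function

    ' True when both files produce the same hash.
    Function FilesMatch(ByVal sourcePath As String, ByVal testPath As String) As Boolean
        Return HashFileFromStream(sourcePath) = HashFileFromStream(testPath)
    End Function

    Sub Main()
        Dim original As String = Path.GetTempFileName()
        Dim copy As String = Path.GetTempFileName()
        File.WriteAllText(original, "row the boat")
        File.Copy(original, copy, True)
        Console.WriteLine(FilesMatch(original, copy))  ' prints True
        File.WriteAllText(copy, "row the goat")
        Console.WriteLine(FilesMatch(original, copy))  ' prints False
        File.Delete(original)
        File.Delete(copy)
    End Sub

End Module
```

Because the hash algorithm pulls data from the stream in blocks, this version also avoids loading large files into memory in full.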