Introduction
When we need to detect changes in a file system or in a directory of files, we generally reach for the FileSystemWatcher class provided in .NET. However, after learning about its side effects, I have come to see it as a class that only hints at changes and offers little real benefit for this purpose. Another reason not to use FileSystemWatcher is that it does not look at file content; it only reports file system activity in general. For these reasons I have found hashing to be a better way to detect changes.
Background
In this article I will try to answer some common questions programmers ask about hashing: how long will it take to compute the hashes of the files in my directory? What if the parent folder contains subfolders? Will it be fast enough for a normal application deployment file structure containing a few MB of files? To answer these questions I wrote a small utility and ran it on a file structure of around 45 files totaling a few MB. The result was fast enough: it took only 50-60 milliseconds to compute the hashes, and about the same time to validate them.
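The measurement described above can be approximated with a few lines of code. The following is only a minimal sketch, not the utility itself: the class name `DirectoryHashTimer` and its method names are hypothetical, SHA1 with Base64 output is an assumed choice, and the timings you see will depend on your disk and file sizes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the article's utility.
public static class DirectoryHashTimer
{
    // Hashes every file under root, including subfolders,
    // and returns a map of file path -> Base64-encoded SHA1 hash.
    public static Dictionary<string, string> HashDirectory(string root)
    {
        var hashes = new Dictionary<string, string>();
        using (var sha = SHA1.Create())
        {
            foreach (var path in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
            {
                using (var stream = File.OpenRead(path))
                {
                    hashes[path] = Convert.ToBase64String(sha.ComputeHash(stream));
                }
            }
        }
        return hashes;
    }

    // Times one full pass over the directory tree and reports how many files were hashed.
    public static TimeSpan TimeHashing(string root, out int fileCount)
    {
        var stopwatch = Stopwatch.StartNew();
        var hashes = HashDirectory(root);
        stopwatch.Stop();
        fileCount = hashes.Count;
        return stopwatch.Elapsed;
    }
}
```

Calling `DirectoryHashTimer.TimeHashing(folder, out count)` on a deployment folder gives a rough figure to compare against the 50-60 milliseconds reported above. The first run may be slower than later ones because the operating system has not yet cached the files.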
Using the code
Please look at the code below. I tried computing the hash with both the MD5 and SHA1 algorithms, and both took about the same time to hash the file content. Note that we are hashing the actual file content here: any change in the content, even a single added space or character, changes the hash of the whole file. It is also important to note that a change in file attributes, such as the last modification time, does not affect the hash.
- using System;
- using System.IO;
- using System.Security.Cryptography;
- using System.Text;
-
- public class DeploymentFile
- {
- public string FilePath
- {
- get;
- set;
- }
- public bool IsFilePathValid
- {
- get;
- set;
- }
- public string HashedValue
- {
- get;
- set;
- }
- public bool IsFileModified
- {
- get;
- set;
- }
-
- public DeploymentFile(string filePath)
- {
- FilePath = filePath;
- IsFilePathValid = true;
- IsFileModified = false;
- if (File.Exists(filePath))
- HashedValue = ComputeHashSHA(filePath);
- else
- IsFilePathValid = false;
- }
-
- public bool IsExist(string FilePath)
- {
- return File.Exists(FilePath);
- }
-
- public string ComputeHashSHA(string filename)
- {
- using(var sha = SHA1.Create())
- {
- using(var stream = File.OpenRead(filename))
- {
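- // Encode the raw hash bytes as Base64; decoding them as text with Encoding.Default can lose information.
- return Convert.ToBase64String(sha.ComputeHash(stream));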
- }
- }
- }
- }
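The claim that the hash follows the file content and ignores file attributes can be checked directly. The sketch below is an illustration under assumptions: the class `HashSensitivityDemo` and its methods are hypothetical names, and it overwrites whatever file path it is given, so it should only be pointed at a scratch file.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical demonstration class, not part of the article's utility.
public static class HashSensitivityDemo
{
    // Returns the Base64-encoded SHA1 hash of the file content.
    public static string HashFile(string path)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }

    // Returns true when changing only the last-write time leaves the hash
    // unchanged and appending a single space changes it.
    // WARNING: overwrites the file at 'path'.
    public static bool Demonstrate(string path)
    {
        File.WriteAllText(path, "deployment settings");
        string original = HashFile(path);

        // Change an attribute only: the content is untouched.
        File.SetLastWriteTimeUtc(path, DateTime.UtcNow.AddDays(-1));
        string afterTouch = HashFile(path);

        // Change the content by one character.
        File.AppendAllText(path, " ");
        string afterEdit = HashFile(path);

        return original == afterTouch && original != afterEdit;
    }
}
```

`HashSensitivityDemo.Demonstrate(Path.GetTempFileName())` should return true, which matches the behavior described above.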
Shown below is the code for the form which displays all controls. You may observe that I am using a stopwatch to measure the time taken for the whole process of computation of the hash.
Important: if the message box appears, the stopwatch keeps running until the user closes it, so the measured time includes the time the user takes to respond. To measure accurately, disable the message box.
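- using System;
- using System.Collections.Generic;
- using System.Diagnostics;
- using System.IO;
- using System.Linq;
- using System.Windows.Forms;
-
- public partial class FileValidator : Form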
- {
- public FileValidator()
- {
- InitializeComponent();
- }
- List<DeploymentFile> DeployList;
- List<DeploymentFile> ValidationList;
- String filePath;
-
-
- #region ComputeHash
- private void ComputeHash_Click(object sender, EventArgs e)
- {
- DeployList = new List<DeploymentFile>();
- foreach (var item in GetListOfFilesInDeployFolder())
- DeployList.Add(new DeploymentFile(item));
- FilesGrid.DataSource = DeployList;
- }
-
-
- #endregion ComputeHash
-
- #region ValidateFileHash
- private void ValidateHash_Click(object sender, EventArgs e)
- {
-
- Stopwatch stopwatch = new Stopwatch();
-
- stopwatch.Start();
- bool Abort = false;
- List<string> filesList = GetListOfFilesInDeployFolder();
- ValidationList = new List<DeploymentFile>();
- foreach (var item in DeployList)
- ValidationList.Add(new DeploymentFile(item.FilePath));
-
- // Compare the counts first so the path comparison below cannot index past the end of filesList.
- if (ValidationList.Count != filesList.Count) Abort = true;
-
- for (int i = 0; !Abort && i < ValidationList.Count; i++)
- {
- if (ValidationList[i].FilePath != filesList[i]) Abort = true;
- }
-
- if (!Abort && ValidationList.Exists((x) => x.IsFilePathValid == false))
- Abort = true;
-
- if (Abort)
- {
-
- MessageBox.Show("Files/Folder structure changed or modified since last check");
- }
-
- if (!Abort)
- {
- for (int i = 0; i < ValidationList.Count; i++)
- if (ValidationList[i].HashedValue != DeployList[i].HashedValue)
- {
- ValidationList[i].IsFileModified = true;
- Abort = true;
- }
- }
-
- FilesGrid.DataSource = ValidationList;
-
- stopwatch.Stop();
- label1.Text = "Time taken in Validation : " + stopwatch.Elapsed;
-
- }
-
- #endregion
-
-
- private List<string> GetListOfFilesInDeployFolder()
- {
- filePath = textBox1.Text;
- return Directory.GetFiles(filePath, "*", SearchOption.AllDirectories).ToList();
- }
-
-
- private void FileValidator_Load(object sender, EventArgs e)
- {
- FilesGrid.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.DisplayedCells;
- }
-
-
-
- }
The above screenshot displays the time taken for computing and validating the hashes. If a file is modified between clicking the compute hash button and clicking the check for modifications button, the modification shows up in the IsFileModified column. I also record the file structure and compare it with the current file structure; any change in a file path is shown in the IsFilePathValid column.
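The form compares the two file lists position by position. As an alternative sketch, and not the approach used in the form, the two snapshots could be keyed by file path so that added, removed, and modified files are reported separately. The `SnapshotDiff` class below is hypothetical, and it assumes each snapshot is a dictionary of file path to hash string, compared with the dictionary's default (case-sensitive) key comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for comparing two path -> hash snapshots.
public class SnapshotDiff
{
    public List<string> Added = new List<string>();
    public List<string> Removed = new List<string>();
    public List<string> Modified = new List<string>();

    // True when the second snapshot differs from the first in any way.
    public bool HasChanges
    {
        get { return Added.Count + Removed.Count + Modified.Count > 0; }
    }

    public static SnapshotDiff Compare(
        IDictionary<string, string> before,
        IDictionary<string, string> after)
    {
        var diff = new SnapshotDiff();

        // Files from the first snapshot are either gone, changed, or identical.
        foreach (var pair in before)
        {
            string newHash;
            if (!after.TryGetValue(pair.Key, out newHash))
                diff.Removed.Add(pair.Key);
            else if (newHash != pair.Value)
                diff.Modified.Add(pair.Key);
        }

        // Files that exist only in the second snapshot are new.
        foreach (var path in after.Keys)
        {
            if (!before.ContainsKey(path))
                diff.Added.Add(path);
        }

        return diff;
    }
}
```

Keying by path means the result does not depend on the order in which the files are listed, and one added or deleted file does not cause every later file to be flagged as a mismatch.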
Points of Interest
It is interesting to find that the SHA1 and MD5 algorithms take a similar time when there are few files. As the file count and the file sizes increase, MD5 is more efficient than SHA1. However, SHA1 is more trusted in developer circles. I think MD5 is the better choice here because we are not really concerned with security; we are more concerned with the integrity of the file content. Below are shown some of the comparisons of hash algorithms.
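Readers who want to check the relative speed of the two algorithms on their own machine could use something like the following. This is a rough sketch and not the measurement behind the observations above: the class `HashAlgorithmTimer` is hypothetical, it hashes an in-memory buffer of random bytes instead of real files, and a single Stopwatch run is only an approximate benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hypothetical micro-benchmark helper, not part of the article's utility.
public static class HashAlgorithmTimer
{
    // Hashes the same buffer 'iterations' times and returns the total elapsed time.
    public static TimeSpan Time(HashAlgorithm algorithm, byte[] data, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            algorithm.ComputeHash(data);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Prints the time each algorithm needs for the same workload.
    public static void Compare(int sizeInBytes, int iterations)
    {
        var data = new byte[sizeInBytes];
        new Random(42).NextBytes(data); // fixed seed so every run hashes the same bytes

        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            // One untimed call each, so one-time startup cost is not measured.
            md5.ComputeHash(data);
            sha1.ComputeHash(data);

            Console.WriteLine("MD5 : " + Time(md5, data, iterations));
            Console.WriteLine("SHA1: " + Time(sha1, data, iterations));
        }
    }
}
```

For example, `HashAlgorithmTimer.Compare(1024 * 1024, 100)` hashes a 1 MB buffer 100 times with each algorithm. Results vary with hardware and runtime version, so the numbers are best treated as indicative only.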