JavaScript

The “Find Duplicate” method in the FileWalker class

The FileWalker class can now generate a hash from a file, which we then use to compare files with each other. To store files and their information (such as file path, extension, hash and size), we need a data structure.

I already wrote a tutorial on this data structure: Data structure: create a Storage class to store duplicate files. There I explained how to store data (file information) by creating two objects of key-value pairs, and how to retrieve data when a new entry (file) is found.

Let’s create the findDuplicate(file, stat, hash) method by applying the logic from the tutorial mentioned above.

First, we’ll modify the reset() method by adding two properties, this.files = {} and this.dupFiles = {}. Open the walker.js file and make the changes shown below:

reset(){
 this.isPaused = false;
 this.queue = [];
 this.files = {}; // this line
 this.dupFiles = {}; // and this.
 this.filter_dir = () => false;
 this.filter_file = () => false;
}

Next, edit the generateHash(file,stat) method by adding a call to this.findDuplicate(file,stat,hash) before the return statement, as shown in the following code:

generateHash(file,stat){
 ...
 this.emit('hash',file,stat,hash);
 this.debug&&console.log('hash emitted');
 this.findDuplicate(file,stat,hash);
 return;
 ...

Next, create the findDuplicate method:

findDuplicate(file,stat,hash){
 // Assumes path is Node's path module, e.g. const path = require('path');
 const ext  = path.extname(file),
       size = stat.size;

 // Look up hash -> extension -> size in this.files, creating missing levels.
 let hashExist = this.files[hash]
               ? this.files[hash]
               : (this.files[hash] = {}),

     extExist = hashExist[ext]
              ? hashExist[ext]
              : (hashExist[ext] = {}),

     sizeExist = extExist[size];

 // No similar file seen yet: remember this one and stop.
 if (sizeExist === undefined){
  extExist[size] = file;
  return;
 }

 // A similar file exists: do the same lookup in this.dupFiles.
 let hashDExist = this.dupFiles[hash]
                ? this.dupFiles[hash]
                : (this.dupFiles[hash] = {}),

     extDExist = hashDExist[ext]
               ? hashDExist[ext]
               : (hashDExist[ext] = {}),

     sizeDExist = extDExist[size];

 // First duplicate of this file: store and emit both paths.
 if (sizeDExist === undefined){
  let duplicates = [sizeExist,file];
  extDExist[size] = duplicates;
  this.emit('duplicate',duplicates,size,ext,hash);
  this.debug&&console.log('New duplicate emitted');
  return;
 }

 // Further duplicate: append it and emit only the new path.
 sizeDExist.push(file);
 this.emit('duplicate',[file],size,ext,hash);
 this.debug&&console.log('duplicate emitted');
 return;
}

That’s it. The findDuplicate method receives file, stat and hash from the generateHash method, and stores the received file in the this.files object if a similar file does not already exist there.

Note: A file is considered similar (a duplicate) if its size, file extension and hash are equal to another file’s size, extension and hash.
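To make this criterion concrete, here is a minimal sketch of the nested hash, extension, size lookup on its own. The helper name lookupOrStore, the hash value 'abc123' and the file paths are hypothetical and only serve as an illustration; they are not part of walker.js.

```javascript
// Hypothetical helper, not part of walker.js: mirrors the
// hash -> extension -> size nesting used by this.files.
function lookupOrStore(store, hash, ext, size, file) {
  const byHash = store[hash] || (store[hash] = {});
  const byExt = byHash[ext] || (byHash[ext] = {});
  if (byExt[size] === undefined) {
    byExt[size] = file; // first file with this hash/extension/size
    return undefined;
  }
  return byExt[size];   // path of the earlier, matching file
}

const files = {};
const first = lookupOrStore(files, 'abc123', '.txt', 1024, '/docs/a.txt');
const second = lookupOrStore(files, 'abc123', '.txt', 1024, '/backup/a.txt');
const other = lookupOrStore(files, 'abc123', '.md', 1024, '/docs/a.md');

console.log(first);  // undefined (nothing stored yet)
console.log(second); // /docs/a.txt (same hash, extension and size)
console.log(other);  // undefined (extension differs, so not a duplicate)
```

Only the second call finds a match, because all three values have to agree before a file counts as a duplicate.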

If a similar file already exists in the this.files object, we store the existing and the received file (wrapped in an array) in the this.dupFiles object and emit the duplicate event.

If a similar file already exists in both this.files and this.dupFiles, we push the received file onto the existing array in this.dupFiles and emit the duplicate event with the received file wrapped in an array.
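The three cases can be tried outside the app with a small stand-in class. This is only a sketch: MiniWalker, the hash 'h1' and the file paths are hypothetical, and the class keeps just the two storage objects and the event emitting that findDuplicate needs, with shortened variable names.

```javascript
const EventEmitter = require('events');
const path = require('path');

// MiniWalker is a hypothetical stand-in for FileWalker.
class MiniWalker extends EventEmitter {
  constructor() {
    super();
    this.files = {};
    this.dupFiles = {};
  }

  findDuplicate(file, stat, hash) {
    const ext = path.extname(file);
    const size = stat.size;

    const byHash = this.files[hash] || (this.files[hash] = {});
    const byExt = byHash[ext] || (byHash[ext] = {});
    const original = byExt[size];
    if (original === undefined) { // case 1: first file of its kind
      byExt[size] = file;
      return;
    }

    const dupByHash = this.dupFiles[hash] || (this.dupFiles[hash] = {});
    const dupByExt = dupByHash[ext] || (dupByHash[ext] = {});
    const group = dupByExt[size];
    if (group === undefined) {    // case 2: first duplicate
      const duplicates = [original, file];
      dupByExt[size] = duplicates;
      this.emit('duplicate', duplicates, size, ext, hash);
      return;
    }

    group.push(file);             // case 3: further duplicate
    this.emit('duplicate', [file], size, ext, hash);
  }
}

const walker = new MiniWalker();
const events = [];
// Copy the array, because the stored one is extended by later duplicates.
walker.on('duplicate', (files) => events.push([...files]));

const stat = { size: 10 };
walker.findDuplicate('/a/one.txt', stat, 'h1'); // stored, no event
walker.findDuplicate('/b/one.txt', stat, 'h1'); // emits both paths
walker.findDuplicate('/c/one.txt', stat, 'h1'); // emits only the new path

console.log(events);
// [ [ '/a/one.txt', '/b/one.txt' ], [ '/c/one.txt' ] ]
console.log(walker.dupFiles.h1['.txt'][10]);
// [ '/a/one.txt', '/b/one.txt', '/c/one.txt' ]
```

Note that the first duplicate event carries the original path as well, so a listener learns about both files without having to look into this.files.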

To learn more about the data structure we used for this.dupFiles and this.files, visit: Data structure: creating Storage class to store duplicate files. I originally created that class for our Duplicate File Finder app, but then I changed my mind and created the findDuplicate method inside the FileWalker class to simplify the code.

Next, we'll create a child process file, walkerHelper.js, which will work as a bridge between walker.js and renderer.js: it sends and receives messages, then processes them to stop, pause or start the file walker.