
The "Hash files" method in the FileWalker class

In this tutorial we’ll add another method, generateHash, to our FileWalker class. This method opens a file, reads roughly the first 4 KB of it (the file header), generates a hash from those bytes, and emits an event containing the file's hash. We’ll use this hash to compare files with each other and find duplicates.

See How to generate hash from a file or string in Node.js for the full tutorial on the Node.js crypto module, which lets us create a hash from a string or a file.

Open walker.js, which we created in Creating pause and resume-able file walker, and add crypto = require('crypto') as shown in the following code to include the crypto module:

//walker.js
const fs = require('fs'),
crypto = require('crypto'),
path = require('path'),
{EventEmitter} = require('events');

constructor (debug)...
...

Next, edit the start method and add this.generateHash(entry,stat); after the this.emit('file',entry,stat); line to invoke the generateHash method, as shown below:

  if (stat.isFile()){
   if (this.filter_file(entry,stat)){
    this.debug&&console.log('filterFile: '+entry);
    return this.next();
   }
   this.debug&&console.log('File: '+entry);
   this.emit('file',entry,stat);
   this.generateHash(entry,stat);
   this.next();
  }

Now, we’ll create the generateHash method, as shown below:

generateHash(file,stat){
 // Read at most defaultLength bytes from the start of the file.
 const defaultLength = 4200,
 len = stat.size < defaultLength 
       ? stat.size
       : defaultLength,
 pos = 0, offset = 0;

 fs.open(file, 'r', (err, fd) => {
  if (err) {
   this.emit('error',err,file,stat);
   this.debug&&console.log(err);
   return;
  }

  const buffer = Buffer.alloc(len);
  fs.read(fd, buffer, offset, len, pos,
         (readErr, bytesRead, buffer) => {
   // Close the descriptor whether or not the read succeeded,
   // so that a failed read does not leak it.
   fs.close(fd, (closeErr) => {
    if (closeErr){
     this.emit('error',closeErr,file,stat);
     this.debug&&console.log(closeErr);
    }
   });

   if (readErr){ 
    this.emit('error',readErr,file,stat);
    this.debug&&console.log(readErr);
    return;
   }

   // Whether 'whirlpool' is available depends on the OpenSSL build;
   // crypto.getHashes() lists the supported algorithms.
   const hash = crypto
             .createHash('whirlpool')
             .update(buffer)
             .digest('hex');
   
   this.emit('hash',file,stat,buffer,hash);
   this.debug&&console.log('hash emitted');
  });
 })
}

The defaultLength = 4200 setting means we read the first 4200 bytes of a file, or the entire file if it is smaller than that; see the code: len = stat.size < defaultLength ? stat.size : defaultLength. The rest of the code is already described in the tutorials linked above.
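The header-hashing idea can be tried on its own, outside the FileWalker class. The following is only a minimal sketch under a few assumptions: hashFileHeader is a hypothetical helper (it is not part of walker.js), it uses the synchronous fs calls to stay short, and it swaps 'whirlpool' for 'sha256' because SHA-256 is available in every Node.js build.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Hypothetical helper (not part of walker.js): hash the first maxBytes of a file.
// Assumption: 'sha256' is used here instead of the tutorial's 'whirlpool'.
function hashFileHeader(file, maxBytes = 4200, algorithm = 'sha256') {
  const fd = fs.openSync(file, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    const len = Math.min(size, maxBytes);
    const buffer = Buffer.alloc(len);
    // Skip the read for empty files; there is nothing to read.
    const bytesRead = len > 0 ? fs.readSync(fd, buffer, 0, len, 0) : 0;
    return crypto
      .createHash(algorithm)
      .update(buffer.subarray(0, bytesRead))
      .digest('hex');
  } finally {
    // Always release the file descriptor, even if the read throws.
    fs.closeSync(fd);
  }
}

// Two temporary files that share the same first 4200 bytes but differ afterwards.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hash-demo-'));
const a = path.join(dir, 'a.bin');
const b = path.join(dir, 'b.bin');
fs.writeFileSync(a, Buffer.alloc(5000, 1));
fs.writeFileSync(b, Buffer.concat([Buffer.alloc(4200, 1), Buffer.alloc(800, 2)]));

const hashA = hashFileHeader(a);
const hashB = hashFileHeader(b);
console.log(hashA === hashB); // true: only the first 4200 bytes are hashed

fs.rmSync(dir, { recursive: true, force: true });
```

Note what the example shows: two files with identical headers but different tails produce the same hash. A header hash is therefore best treated as a fast way to find candidate duplicates, not as proof that two files are identical.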

Next, we’ll modify the constructor by moving everything except the super() call and the this.debug = debug ? true : false assignment into a new method. The current constructor looks like this:

constructor (entry,debug){
 super();
 this.isPaused = false;
 this.queue = [];
 this.debug = debug ? true : false;
 this.filter_dir = () => false;
 this.filter_file = () => false;
 this.start(entry); 
}

The modified constructor takes only one parameter, debug, and no longer accepts an entry:

constructor (debug){
 super();
 this.debug = debug ? true : false;
 this.reset();
}

We’ll recreate the removed properties inside the reset() method, which is also useful when we need to reset a FileWalker instance from outside:

reset(){
 this.isPaused = false;
 this.queue = [];
 this.filter_dir = () => false;
 this.filter_file = () => false;
}

Next, we create a new method that accepts an entry, that is, a directory to scan:

addToQueue(entry){
 Array.isArray(entry) 
  ? Array.prototype.push.apply (this.queue, entry)
  : this.queue.push (entry)
}

The addToQueue method accepts a string or an array and adds the received entry or entries to the queue.
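To see both input shapes in action, here is a small standalone sketch. QueueDemo is a hypothetical stand-in that holds only the queue (the real FileWalker class has more state), and it uses spread syntax in place of Array.prototype.push.apply, which behaves the same way for this purpose.

```javascript
// Hypothetical stand-in for FileWalker that keeps only the queue.
class QueueDemo {
  constructor() {
    this.queue = [];
  }

  // Accepts a single path (string) or an array of paths.
  addToQueue(entry) {
    if (Array.isArray(entry)) {
      this.queue.push(...entry); // append every element of the array
    } else {
      this.queue.push(entry); // append the single entry
    }
  }
}

const demo = new QueueDemo();
demo.addToQueue('/a/path');
demo.addToQueue(['/b/path', '/c/path']);
console.log(demo.queue); // [ '/a/path', '/b/path', '/c/path' ]
```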

Here is how we’ll use the FileWalker class:

//walkerHelper.js
// Assumes walker.js exports the FileWalker class.
const walkerClass = require('./walker');
const walker = new walkerClass();
walker.addToQueue(['/a/path','/b/path']);
walker.next();
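The hashes arrive through the 'hash' event, whose arguments are (file, stat, buffer, hash) as emitted by generateHash. The sketch below shows one possible way to consume that event and group files by hash. It is an illustration only: a plain EventEmitter stands in for the FileWalker instance, the events are emitted by hand, and the paths and hash values are made up.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a FileWalker instance (assumption: the real class extends EventEmitter).
const walker = new EventEmitter();

// Maps each hash to the list of files that produced it.
const filesByHash = new Map();

walker.on('hash', (file, stat, buffer, hash) => {
  const group = filesByHash.get(hash) || [];
  group.push(file);
  filesByHash.set(hash, group);
});

// An EventEmitter throws when 'error' is emitted without a listener,
// so it is worth registering one before the walker starts.
walker.on('error', (err, file) => {
  console.error('Could not hash ' + file + ': ' + err.message);
});

// Simulated events with made-up paths and hashes, for illustration only.
walker.emit('hash', '/a/one.txt', { size: 10 }, Buffer.from('x'), 'abc');
walker.emit('hash', '/b/copy.txt', { size: 10 }, Buffer.from('x'), 'abc');
walker.emit('hash', '/b/other.txt', { size: 12 }, Buffer.from('y'), 'def');

// Any group with more than one file is a set of candidate duplicates.
const duplicates = [...filesByHash.values()].filter((group) => group.length > 1);
console.log(duplicates); // [ [ '/a/one.txt', '/b/copy.txt' ] ]
```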

Our FileWalker class is now almost complete. Click here to download the new FileWalker class.

Next, we’ll create a walkerHelper.js file that runs as a child process, to keep the fs-intensive processing off the GUI thread (the Electron renderer process). Then we’ll create a Storage class to store the files, hashes and stats received from the FileWalker class, and use this information to find duplicate files.