I already covered the data structure in a separate tutorial: Data structure: create a Storage class to store duplicate files. There I explained how to store data (file information) in two objects as key-value pairs and how to retrieve that data when a new entry (file) is found.
Let’s create the findDuplicate(file, stat, hash) method by applying the logic from that tutorial.
First, we’ll modify the reset() method by adding two properties, this.files = {} and this.dupFiles = {}. Open the walker.js file and make the changes shown below:
    reset(){
        this.isPaused = false;
        this.queue = [];
        this.files = {}; // this line
        this.dupFiles = {}; // and this.
        this.filter_dir = () => false;
        this.filter_file = () => false;
    }
Next, edit the generateHash(file,stat) method by adding a call to this.findDuplicate(file,stat,hash) just before the return keyword, as shown in the following code:
    generateHash(file,stat){
        ...
        this.emit('hash',file,stat,hash);
        this.debug&&console.log('hash emitted');
        this.findDuplicate(file,stat,hash);
        return;
        ...
Next, create the findDuplicate method:
    findDuplicate(file,stat,hash){
        const ext = path.extname(file),
              size = stat.size;

        let hashExist = this.files[hash] ? this.files[hash] : (this.files[hash] = {}),
            extExist = hashExist[ext] ? hashExist[ext] : (hashExist[ext] = {}),
            sizeExist = extExist[size];

        if (sizeExist === undefined){
            extExist[size] = file;
            return;
        }

        let hashDExist = this.dupFiles[hash] ? this.dupFiles[hash] : (this.dupFiles[hash] = {}),
            extDExist = hashDExist[ext] ? hashDExist[ext] : (hashDExist[ext] = {}),
            sizeDExist = extDExist[size];

        if (sizeDExist === undefined){
            let duplicates = [sizeExist,file];
            extDExist[size] = duplicates;
            this.emit('duplicate',duplicates,size,ext,hash);
            this.debug&&console.log('New duplicate emitted');
            return;
        }

        sizeDExist.push(file);
        this.emit('duplicate',[file],size,ext,hash);
        this.debug&&console.log('duplicate emitted');
        return;
    }
That’s it. The findDuplicate method receives file, stat and hash from the generateHash method and stores the received file in the this.files object if no similar file exists there yet.
Note: A file is considered similar (a duplicate) if its size, file extension and hash are all equal to another file’s size, extension and hash.
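To make the rule in the note concrete, here is a minimal sketch of the comparison. The isSimilar helper and the shape of its arguments (file, size, hash) are assumptions made for illustration only; the tutorial’s code does not define such a function and performs the same check implicitly through nested object keys.

```javascript
const path = require('path');

// Hypothetical helper, not part of the FileWalker class.
// Each argument is assumed to look like { file, size, hash }.
function isSimilar(a, b) {
  return a.hash === b.hash &&
    a.size === b.size &&
    path.extname(a.file) === path.extname(b.file);
}

const first = { file: '/photos/cat.jpg', size: 2048, hash: 'abc123' };
const copy = { file: '/backup/cat-copy.jpg', size: 2048, hash: 'abc123' };
const renamed = { file: '/backup/cat.jpeg', size: 2048, hash: 'abc123' };

console.log(isSimilar(first, copy));    // same hash, size and extension
console.log(isSimilar(first, renamed)); // extension differs
```

Because path.extname returns the extension exactly as written, a rule like this is case-sensitive: cat.JPG and cat.jpg would not be treated as similar unless the extension is normalized first.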
If a similar file already exists in the this.files object, we store the existing file and the received file (wrapped in an array) in the this.dupFiles object and emit the duplicate event.
If a similar file already exists in both this.files and this.dupFiles, we push the received file onto the existing array in this.dupFiles and emit the duplicate event with only the received file wrapped in an array.
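The three cases described above can be tried out in isolation with a small sketch. MiniWalker is a hypothetical stand-in for FileWalker that keeps only the duplicate bookkeeping, the nested helper is an assumption introduced to shorten the lookups, and the file paths and the hash value 'abc' are made up. It is meant as an illustration of the behaviour, not as a replacement for the method shown earlier.

```javascript
const { EventEmitter } = require('events');
const path = require('path');

// Hypothetical helper: returns obj[a][b], creating empty objects as needed.
function nested(obj, a, b) {
  const level1 = obj[a] || (obj[a] = {});
  return level1[b] || (level1[b] = {});
}

// Hypothetical stand-in for FileWalker with only the duplicate logic.
class MiniWalker extends EventEmitter {
  constructor() {
    super();
    this.files = {};    // hash -> ext -> size -> first file seen
    this.dupFiles = {}; // hash -> ext -> size -> array of similar files
  }

  findDuplicate(file, stat, hash) {
    const ext = path.extname(file);
    const size = stat.size;

    // Case 1: no similar file yet, remember this one and stop.
    const seen = nested(this.files, hash, ext);
    if (seen[size] === undefined) {
      seen[size] = file;
      return;
    }

    // Case 2: first duplicate, store and emit both files.
    const dups = nested(this.dupFiles, hash, ext);
    if (dups[size] === undefined) {
      dups[size] = [seen[size], file];
      this.emit('duplicate', dups[size], size, ext, hash);
      return;
    }

    // Case 3: further duplicate, store it and emit only the new file.
    dups[size].push(file);
    this.emit('duplicate', [file], size, ext, hash);
  }
}

const walker = new MiniWalker();
const events = [];
// Copy the array, because the stored array keeps growing in case 3.
walker.on('duplicate', (files) => events.push(files.slice()));

walker.findDuplicate('/a/one.txt', { size: 10 }, 'abc');   // case 1, no event
walker.findDuplicate('/b/two.txt', { size: 10 }, 'abc');   // case 2
walker.findDuplicate('/c/three.txt', { size: 10 }, 'abc'); // case 3
walker.findDuplicate('/d/four.md', { size: 10 }, 'abc');   // other extension, case 1

console.log(events);
```

A listener therefore receives both paths the first time a duplicate group is found and a single path for every later member, so it can append whatever it receives to its own list for that hash, extension and size.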
To learn more about the data structure used for this.dupFiles and this.files, visit: Data structure: creating Storage class to store duplicate files. I originally created that class for our Duplicate File Finder app, but then I changed my mind and implemented the findDuplicate method inside the FileWalker class to simplify the code.
Next, we'll create a child process file, walkerHelper.js, which will act as a bridge between walker.js and renderer.js: it sends and receives messages and processes them to stop, pause or start the file walker.