We’ll create a data class for our Duplicate File Finder app to store the results of the FileWalker class.
Before creating the data class, we need a way to tell whether a received file is a duplicate of another file. We consider a file a duplicate if the following properties match those of another file:
- File size
- File extension
- File type
    /* We'll compare the following file properties to find duplicate files */
    if (size == anotherFile.size && type == anotherFile.type && ext == anotherFile.ext)

Note: this is not the actual code.
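To make the idea concrete, here is a minimal runnable sketch of that comparison. The isDuplicate function and the shape of the sample objects are assumptions for illustration only; the Storage class built in this tutorial does not use them.

```javascript
// Hypothetical helper: compares the three properties listed above.
// The field names (size, type, ext) are assumed for this sketch.
function isDuplicate(a, b) {
  return a.size === b.size && a.type === b.type && a.ext === b.ext;
}

const first = { size: 1024, type: 'hash1', ext: '.txt' };
const second = { size: 1024, type: 'hash1', ext: '.txt' };
const third = { size: 2048, type: 'hash1', ext: '.txt' };

console.log(isDuplicate(first, second)); // true
console.log(isDuplicate(first, third));  // false
```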
Let’s start by creating our data class, Storage, to store files:

    const path = require('path');

    class Storage {
      constructor () {
        this.files = {};
        this.dupFiles = {};
      }
    }
The Storage class has two properties:
- this.files stores the “not duplicate” files
- this.dupFiles stores the duplicate files
Next, we’ll create the add method:

    add (file, stat, hash){ }

The add method has three parameters: file, stat and hash. Back in our FileWalker class, the hash event gives us the following:
- file: the full path of the file, e.g. D:\>BrainBell\file.txt
- stat: the Stat object of file
- buffer: the 4 KB file chunk
- hash: the 512-bit Whirlpool hash of buffer
The FileWalker class doesn’t give us the file type. Files usually store information such as their type in their header, so we’ll use the received hash as the file type, because it is the hash of the buffer that holds the file header.
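As a rough illustration of where such a hash could come from, the sketch below hashes a 4 KB header chunk with Node’s built-in crypto module. It uses sha512 rather than whirlpool, because whirlpool may not be available in every Node.js build; the hashHeader name and the sample buffers are assumptions and not part of FileWalker.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hashes at most the first 4 KB of a buffer.
// sha512 stands in for whirlpool here (an assumption of this sketch).
function hashHeader(buffer) {
  const header = buffer.subarray(0, 4096);
  return crypto.createHash('sha512').update(header).digest('hex');
}

// Two buffers with an identical first 4 KB but different tails.
const a = Buffer.concat([Buffer.alloc(4096, 1), Buffer.from('tail A')]);
const b = Buffer.concat([Buffer.alloc(4096, 1), Buffer.from('tail B')]);

console.log(hashHeader(a) === hashHeader(b)); // true
```

Because only the header chunk is hashed in this sketch, two different files can end up with the same hash, which is one reason to compare the extension and size as well.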
We’ll build a nested JavaScript object, written as an object literal: {hash:{extension:{size:file}}}. The following example shows how we’ll structure the received files:
Example: dupFiles data structure

    {
      hash1:{
        txt:{ 1024:["D:\>a.txt","D:\>b.txt"] },
        php:{ 1024:["D:\>a.php","D:\>b.php"] }
      }
    }

Example: files data structure

    {
      hash1:{
        txt:{ 1024:"D:\>a.txt" },
        php:{ 1024:"D:\>a.php" }
      },
      hash2:{
        txt:{ 1034:"D:\>c.txt", 1029:"D:\>d.txt" }
      }
    }
In JavaScript, such an object is a data structure that maps names to values, also known as a dictionary. We’ll use the following technique to build the structure above and find the duplicate files.
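As a small sketch of the dictionary idea, the snippet below builds one three-level entry using bracket notation, since the keys are only known at runtime. The values are sample data chosen for illustration.

```javascript
// Keys are computed at runtime, so bracket notation is used throughout.
const files = {};
const hash = 'hash1', ext = 'txt', size = 1024;

files[hash] = {};                 // level 1: hash
files[hash][ext] = {};            // level 2: extension
files[hash][ext][size] = 'a.txt'; // level 3: size -> file

console.log(JSON.stringify(files)); // {"hash1":{"txt":{"1024":"a.txt"}}}
console.log(files.hash1.txt[1024]); // a.txt
console.log(files.hash2);           // undefined (a missing key reads as undefined)
```

Reading a missing key yields undefined rather than throwing, which is what makes a simple `=== undefined` check usable at each level.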
The files object structure is:
- files[hash]: the files object stores each unique hash
- files[hash][ext]: the hash object stores each unique file extension ext
- files[hash][ext][size]: the ext object stores each unique file size size
- files[hash][ext][size] = file: the size key stores a single file for comparison

For example, if a file, file1.txt, is stored in the files object as {hash:{txt:{430:file1.txt}}} and another file, file2.txt, has the same hash, extension and size, {hash:{txt:{430:file2.txt}}}, we consider file2.txt a duplicate of file1.txt.
Let’s start creating the add method to implement the above technique for the Storage class:

    add (file, stat, hash){
      let ext = path.extname(file),
          size = stat.size,
          hashExist = this.files[hash];

      if (hashExist === undefined){
        this.files[hash] = {};
        this.files[hash][ext] = {};
        this.files[hash][ext][size] = file;
        return;
      }
    }
The add method extracts the file’s extension and size, then looks up the hash in the files object. If the hash does not exist, it creates a new entry, using the file properties hash, ext and size as keys.
Next, if the hash already exists in the files object, we look up the file extension:

    ...
    if (hashExist === undefined){
      this.files[hash] = {};
      this.files[hash][ext] = {};
      this.files[hash][ext][size] = file;
      return;
    }

    let extExist = hashExist[ext];
    if (extExist === undefined) {
      hashExist[ext] = {};
      hashExist[ext][size] = file;
      return;
    }
    ...
The add method looks up the extension ext under the existing hash. If ext does not exist, it stores the remaining information in the existing hash object.
Next, if the ext already exists, we look up the file size:

    ...
    if (extExist === undefined) {
      hashExist[ext] = {};
      hashExist[ext][size] = file;
      return;
    }

    let sizeExist = extExist[size];
    if (sizeExist === undefined){
      extExist[size] = file;
      return;
    }
    ...
The add method looks up the size value and, if it does not exist, assigns file to it.
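The three lookups follow the same pattern, so they can also be written more compactly. The sketch below is an alternative formulation, not the tutorial’s code: addUnique is a hypothetical standalone function that returns the previously stored file when all three keys already exist, and undefined otherwise.

```javascript
// Hypothetical compact variant of the hash -> ext -> size lookups.
function addUnique(files, file, ext, size, hash) {
  const byHash = files[hash] || (files[hash] = {}); // create level on demand
  const byExt = byHash[ext] || (byHash[ext] = {});
  const existing = byExt[size];
  if (existing === undefined) {
    byExt[size] = file; // first file seen with this hash/ext/size
  }
  return existing; // undefined means "not a duplicate"
}

const files = {};
console.log(addUnique(files, 'a.txt', '.txt', 430, 'h1')); // undefined
console.log(addUnique(files, 'b.txt', '.txt', 431, 'h1')); // undefined
console.log(addUnique(files, 'c.txt', '.txt', 430, 'h1')); // a.txt
```

The step-by-step version in the tutorial is longer but makes each level of the structure explicit, which is easier to follow when learning the technique.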
At this point we consider the received file a duplicate of an existing file, because all three lookups succeeded: hashExist, extExist and sizeExist. We’ll add this duplicate file to dupFiles. Let’s look at the dupFiles object structure:
- dupFiles[hash]: the dupFiles object stores each unique hash
- dupFiles[hash][ext]: the hash object stores each unique file extension ext
- dupFiles[hash][ext][size]: the ext object stores each unique file size size
- dupFiles[hash][ext][size] = [file]: unlike in the files object, the size key in dupFiles stores an array of matching files, for example: {hash:{txt:{430:[file1.txt,file2.txt,file3.txt]}}}
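A small sketch of how such an array grows: the first time a match is found, the array is seeded with both the stored file and the new one, and later matches are appended. The variable names and values here are illustrative only.

```javascript
const dupFiles = {};
const hash = 'hash1', ext = 'txt', size = 430;

// First duplicate found: store the original together with the new file.
dupFiles[hash] = {};
dupFiles[hash][ext] = {};
dupFiles[hash][ext][size] = ['file1.txt', 'file2.txt'];

// Any further duplicate is appended to the same array.
dupFiles[hash][ext][size].push('file3.txt');

console.log(JSON.stringify(dupFiles));
// {"hash1":{"txt":{"430":["file1.txt","file2.txt","file3.txt"]}}}
```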
We now have the matching file, sizeExist, and the received file. We’ll add the sizeExist file to dupFiles if it is not already there, and then add the received file as well. Let’s complete the remaining part of the add method by adding the duplicate files to dupFiles:
    let hashDExist = this.dupFiles[hash];
    if (hashDExist === undefined){
      this.dupFiles[hash] = {};
      this.dupFiles[hash][ext] = {};
      this.dupFiles[hash][ext][size] = [sizeExist, file];
      return;
    }

    // Look inside the existing hash entry, not at the top level of dupFiles
    let extDExist = hashDExist[ext];
    if (extDExist === undefined){
      hashDExist[ext] = {};
      hashDExist[ext][size] = [sizeExist, file];
      return;
    }

    let sizeDExist = extDExist[size];
    if (sizeDExist === undefined){
      extDExist[size] = [sizeExist, file];
      return;
    }

    sizeDExist.push(file);
That’s it. We’ve created the Storage class; now let’s combine the code chunks:
The Storage class
    //Storage.js
    const path = require('path');

    module.exports = class Storage {
      constructor () {
        this.files = {};
        this.dupFiles = {};
      }

      add (file, stat, hash){
        let ext = path.extname(file),
            size = stat.size,
            hashExist = this.files[hash];

        // First file with this hash
        if (hashExist === undefined){
          this.files[hash] = {};
          this.files[hash][ext] = {};
          this.files[hash][ext][size] = file;
          return;
        }

        let extExist = hashExist[ext];
        if (extExist === undefined) {
          hashExist[ext] = {};
          hashExist[ext][size] = file;
          return;
        }

        let sizeExist = extExist[size];
        if (sizeExist === undefined){
          extExist[size] = file;
          return;
        }

        // Hash, extension and size all match: record a duplicate
        let hashDExist = this.dupFiles[hash];
        if (hashDExist === undefined){
          this.dupFiles[hash] = {};
          this.dupFiles[hash][ext] = {};
          this.dupFiles[hash][ext][size] = [sizeExist, file];
          return;
        }

        let extDExist = hashDExist[ext];
        if (extDExist === undefined){
          hashDExist[ext] = {};
          hashDExist[ext][size] = [sizeExist, file];
          return;
        }

        let sizeDExist = extDExist[size];
        if (sizeDExist === undefined){
          extDExist[size] = [sizeExist, file];
          return;
        }

        sizeDExist.push(file);
      }

      getDuplicateFiles(){
        return this.dupFiles;
      }

      getFiles(){
        return this.files;
      }
    }
This is the complete Storage class code. We’ve added two more methods that return the files and dupFiles objects. In the next tutorial, I’ll show you how to use this class in our Duplicate File Finder app, inside the walkerHelper.js file, and how to display the results in the app’s user interface.
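Since the object returned by getDuplicateFiles() is nested three levels deep, whatever consumes it has to walk the hash, extension and size levels. The sketch below shows one possible way to flatten it into a list of groups; listDuplicateGroups and the sample data are assumptions for illustration and are not necessarily how the next part does it.

```javascript
// Hypothetical helper: turns {hash:{ext:{size:[files]}}} into a flat list.
function listDuplicateGroups(dupFiles) {
  const groups = [];
  for (const hash of Object.keys(dupFiles)) {
    for (const ext of Object.keys(dupFiles[hash])) {
      for (const size of Object.keys(dupFiles[hash][ext])) {
        // Object keys are strings, so the size is converted back to a number.
        groups.push({ ext, size: Number(size), files: dupFiles[hash][ext][size] });
      }
    }
  }
  return groups;
}

// Sample data shaped like the dupFiles object (values are made up).
const sample = {
  hash1: {
    '.txt': { 1024: ['a.txt', 'b.txt'] },
    '.php': { 1024: ['a.php', 'b.php'] }
  }
};

const groups = listDuplicateGroups(sample);
console.log(groups.length);   // 2
console.log(groups[0].files); // [ 'a.txt', 'b.txt' ]
```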