Hub -> centralized servers
Centralized servers - multiple - interconnected - ( let's say 4-5 )
→ predefined IP addresses in code ( hardcoded ) -> used for initiating connections
→ every peer keeps at least 2 central server connections at all times ( fault tolerance )
→ central servers use a hash table mapping file names to peer information ( DHT )
Peer connects with central servers
→ peer informs the server of the file to be uploaded -- upload -- this is saved in the central server's hash table
→ connection kept alive, checked every 10 mins
Search
→ contact central server with filename ( search module )
→ file name exact matching
→ multiple files with the same hash are stored in a different table ( replication )
Download
→ get file meta and corresponding peers
→ divide the file into chunks and download parts ( from which peer, which chunk ) ( load balancing )
→ each chunk verification by hash matching ( chunk security and validation )
→ complete file hash matching ( file authenticity )
< key, value > -> key = file name, value = peer ip address, timestamp, file hash, size
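As a sketch, the hub's hash table entry described above could look like the following (field and function names are assumptions, not taken from the project code):

```python
import time

# Illustrative hub-side DHT entry: key = file name, value = a list of
# { peer_ip, timestamp, file_hash, size } records. Several peers may
# hold the same file ( replication ), hence a list per file name.
def register_upload(dht, file_name, peer_ip, file_hash, size):
    entry = {
        "peer_ip": peer_ip,
        "timestamp": time.time(),
        "file_hash": file_hash,
        "size": size,
    }
    dht.setdefault(file_name, []).append(entry)
```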
resolving IP addresses? -> node unique IDs?
each hub holds a range of unique IDs for peers
-> new hub joining protocol
-> new peer joining protocol
-> unique id or ip address
-> file upload protocol
-> file indexing
-> file download protocol
-> hash verification
-> peer or hub runtime -> connection alive checking protocol
Hub:
Uid :: 32 bytes
Port1 :: listens for peer hubs
Port2 :: data servers communication
Cmd :: ./hub -p1 3000 -p2 4000
Dnode
Uid :: assigned by any one of the hubs
Port1 :: data port for other data nodes
Port2 :: rpc client port
Upload files
Download files
Cmd :: ./dnode -h ip:port -p1 5000 -p2 6000
RPC client
./rpc_client -dnode ip:port -u file_name
./rpc_client -dnode ip:port -d file_name
- Hub communication
Join
-d filename
Result :: index_data of the file, destination servers' IP addresses
-u filename
Result :: destination IP addresses
File indexing
Split the file into chunks of 2MB
Hash each chunk (32 bytes)
Hash the concatenation of the chunk hashes, this will serve as file hash.
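The indexing steps above can be sketched as follows (SHA-256 is assumed as the 32-byte hash; the function name is illustrative):

```python
import hashlib

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB chunks, as described above

def index_file(path):
    # Split the file into 2 MB chunks and hash each chunk (32 bytes).
    chunk_hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            chunk_hashes.append(hashlib.sha256(chunk).digest())
    # The file hash is the hash of the concatenated chunk hashes.
    file_hash = hashlib.sha256(b"".join(chunk_hashes)).digest()
    return chunk_hashes, file_hash
```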
DHT::
File_name , file_hash
File_hash, {index_data, destination_server_uids}
In-memory data structure
Dnode_uid, dnode_ip, dnode_port, dnode_flags (32 bytes + 4 bytes + 2 bytes + 2 bytes) (40 bytes)
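A possible packing of that 40-byte in-memory record (32-byte uid + 4-byte IPv4 address + 2-byte port + 2-byte flags); the wire layout and names are assumptions, only the field sizes come from the notes:

```python
import socket
import struct

def pack_dnode(uid: bytes, ip: str, port: int, flags: int) -> bytes:
    # 32 B uid + 4 B IPv4 + 2 B port + 2 B flags = 40 B total
    if len(uid) != 32:
        raise ValueError("uid must be 32 bytes")
    return uid + socket.inet_aton(ip) + struct.pack("!HH", port, flags)

def unpack_dnode(record: bytes):
    uid, ip_raw, rest = record[:32], record[32:36], record[36:40]
    port, flags = struct.unpack("!HH", rest)
    return uid, socket.inet_ntoa(ip_raw), port, flags
```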
Hub-hub communication
1) search for file_name
2) file_hash
2 roles:
hub
map for file name, file hash
map for file hash, file information < list of peers to download from >
alive peers list
client
maintains map for uploaded files
map for file name, file hash
map for file hash, file information:
file location
size
file name
hub:
    initialization:
        initialize DHT
        initialize unique IDs for peers
        open ports ( two server ports )
    while running:
        check peer state
        on upload request:
            check whether a file with the same hash is already present
            if the filename and size also match ( replication ):
                add as another peer for the file
            else:
                create a new index entry with this file hash
        on download request:
            check file by file name ( exact matching )
            for peers last active more than 30 mins ago, check whether they are still alive:
                if not, mark them as inactive
                if alive, verify the uploaded file is still available from the peer
            send the peer list to the requesting peer
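The hub's upload-request handling above could be sketched like this (index structure and names are assumptions):

```python
# A file whose hash is already indexed gains another peer ( replication,
# provided name and size also match ); otherwise a new entry is created.
def handle_upload(index, file_hash, file_name, size, peer_uid):
    if file_hash in index:
        entry = index[file_hash]
        if entry["name"] == file_name and entry["size"] == size:
            entry["peers"].add(peer_uid)  # another replica holder
    else:
        index[file_hash] = {"name": file_name, "size": size,
                            "peers": {peer_uid}}
```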
peer:
    initialization:
        predefined IP address list of hubs
        try connecting to 2 of them
        get unique ID and client details
    on system start:
        recheck hub connections
        check whether previously uploaded files are still available
    upload:
        divide file into chunks and keep the chunk hashes
        create the file hash
        send file details to hubs ( filename, size, uid, file hash, chunk hashes )
    download:
        request the peer list from hubs
        check connections with the peers on the list
        keep a queue of chunks
        download chunks from different peers
        verify each chunk
        at the end, verify the whole file hash
    seed ( replicate upload )
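The per-chunk and whole-file verification in the download flow above can be sketched as follows (SHA-256 is assumed; the file hash is the hash of the concatenated chunk hashes, per the indexing scheme):

```python
import hashlib

def verify_chunk(chunk: bytes, expected_hash: bytes) -> bool:
    # Per-chunk integrity check against the hash published by the uploader.
    return hashlib.sha256(chunk).digest() == expected_hash

def verify_file(chunk_hashes, expected_file_hash: bytes) -> bool:
    # Whole-file authenticity: hash of the concatenated chunk hashes.
    return hashlib.sha256(b"".join(chunk_hashes)).digest() == expected_file_hash
```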
checkup by hub:
    check whether the file is still present for upload ( if last checkup was more than 30 mins ago )
    if not, return FALSE
    else, send TRUE
if a node rejects a download request:
    the downloader sends FAIL to the hub
    the hub rechecks the connection with the corresponding peer for the file
    if refused, remove the peer from the file's list of IP addresses
    if it was the only peer, delete the file entry entirely
if a hub fails:
    peers find a new hub and resend upload details for their files
if a peer fails:
    while downloading, restart the chunk download from another peer
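The hub's 30-minute liveness check above could be sketched as follows (the uid -> last-seen-timestamp map and function name are assumptions):

```python
import time

ALIVE_WINDOW = 30 * 60  # peers last active within 30 mins count as alive

def prune_dead_peers(last_seen, now=None):
    # Return the set of peer uids still considered active; peers whose
    # last activity is older than the window are treated as inactive.
    now = time.time() if now is None else now
    return {uid for uid, ts in last_seen.items() if now - ts <= ALIVE_WINDOW}
```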
# dnode directory
Make sure to create bin and test directories in the root folder of the project.
For example, if your project is at ~/Desktop/PeerToPeer_DistributedFileSystem,
make sure that ~/Desktop/PeerToPeer_DistributedFileSystem/test and ~/Desktop/PeerToPeer_DistributedFileSystem/bin exist as well.
The samples folder exists only because we are testing the system on a single local machine: when dnode1 runs, it uploads the file present in the samples folder (see dnode1.sh; the path of the uploaded file is specified there).