After taking a look a peer to peer systems in the previous blog post, let us look at some key technologies which speeds up the lookup service in peer to peer systems.

In this blog post, we will be looking at distributed hash tables.

## Searching and Addressing

Before diving into distributed hash tables, let us first look at how we do searching and addressing in general.

There are 2 ways to find objects, depending on:

- How the network is formed
- Where objectives are places
- How objectives can be found efficiently

Examples include:

- Google search (Searching)
- DNS / IP Routing (Addressing)
- Napster (Searching)
- Gnutella (Searching)
- KaZaA (Searching)
- BitTorrent (Addressing)

## Searching VS Addressing

Addressing:

- Object location can be made efficient
- Each object is uniquely identifiable
- Need to know unique names
- Need to maintain structure required for addressing

Searching:

- No need to know unique names (More user friendly)
- Hard to make efficient
- Need to compare actual objects to know if they are the same

## 2 Types of P2P Networks

Unstructured P2P Networks:

- Cause the need for searching
- Does not mean complete lack of structure (Can have graphs structure like power-law, hierarchy, etc)
- Peers are free to join anywhere, choose neighbors freely.
- Objects are stored anywhere

**Structured networks**

- Allows for addressing, deterministic routing
- Network structure determines where peers belong in the net and where objects are stored.

In this blog, we will be focusing on how distributed hash tables (DHTs) can be used to build structured networks.

## A Key Value store

It is a type of database which contains entries in the form of (key, value) pairs. The key is typically the ID (EG: Name of a person, IP Address) The value is the usually the data (EG: Phone number of a person)

Key value stores usually consists of 2 operations:

`Put(key, value)`

`Get(key)`

It looks something like the table below

Name | Contact |
---|---|

John | 12345678 |

Mary | 87654321 |

Bob | 12345678 |

Alice | 87654321 |

The naive implementation takes O(N) to find the location of an object. Can we do better?

## Hash Tables

Hash tables are data structures that map keys to values. They contain fixed size buckets that allows for O(1) lookup time, deletions and insertions.

A hash function maps keys to the hash buckets with the desirable properties. They are usually fast to compute and distribute keys uniformly across the buckets.

## Distributed Hash Tables

Similar to normal hash tables, distributed hash tables borrow the idea of hash tables and distribute buckets to peers.

The core question is how to find out which peer is responsible for which hash bucket and how to route traffic between peers.

## Principles of DHTs

- Each node is responsible for one or more buckets
- As nodes join and leave, the responsibility changes

- Nodes communicate among themselves to find the responsible node
- Scalable communications makes DHTs efficient

- DHTs support all the hash table operations.

## Implementation of DHTs

In this blog post, we will be looking at 2 different implementation of DHTs

- Chord (2001)
- CAN (2001)

There are many other examples of DHTs but they are all based on the same abstractions.

- Store key-value pairs
- When given a key, can retrieve/store the values
- No semantics associated with key or value

The major differences are in the design of the namespace and the routing in the overlay.

## Chord

It is a DHT algorithm designed by MIT in 2001. This is algorithm is used in P2P storage systems.

It uses the SHA-1 Hash function in practice This results in at 160-bit object/node identification.

The same hash function is used for both objects and nodes

- Node IDs are hashed from IP addresses
- Object IDs are hashed from object names

They are organized in a ring which wraps around The nodes themselves keep track of predecessors and successors

### Assigning indices

Each identifier is represented using m bits (In the case of SHA-1 160 bits).

The convention used by Chord is to assign objects to the node that has the closest identifier that is larger than the object’s identifier.

The closest in this case is the immediate successor.

Imagine we use a hash functions that outputs a number from 1 to 5. It will result in a space that is similar to the one below.

Given if there is a node at position 2 and 4 and treating intermediate as moving as a clockwise manner in the diagram:

- Node 2 will be responsible for the objects 5 and 1.
- Node 4 is responsible for 2, 3 and 4.

This is because Node 2 is the first immediate node that is available after 5 and 1. Similarly, Node 4 is the first immediate node that is available after 2, 3 and 4.

### Searching for keys

However, if we follow trace the nodes linearly when searching for the key, we will end up with a linear search which runs in O(N) time. This is not ideal when the hash table is large and the number of nodes is very large.

Chord adds shortcuts in the form of a finger table to make the searching more efficient.

- Each node n maintains a finger table that includes at most m shortcuts (Where m is log(n)).
- The ith finger/shortcut is at least 2^(i-1) apart from the previous finger.

In the example, the color nodes indicates that nodes are filled by peers (Nodes 1, 4, 5).

In this case, the finger table for node 1 will be:

Start | Interval | Node |
---|---|---|

2 | [2, 3) | 4 |

3 | [3, 5) | 4 |

5 | [5, 1) | 5 |

It is the first available node which is larger than the start of the interval.

**Note** that the node itself is not included within the finger table as it knows where to find itself.

Similarly, the finger table for node 4 will be:

Start | Interval | Node |
---|---|---|

5 | [5, 6) | 5 |

6 | [6, 0) | 1 |

0 | [0, 4) | 1 |

And the finger table for node 5 will be:

Start | Interval | Node |
---|---|---|

6 | [6, 7) | 1 |

7 | [7, 1) | 1 |

2 | [1, 5) | 1 |

When looking up a value within a certain interval, a node will simple look for the node that is assigned to that region in the finger table.

Let us walk through how to search for a key using the same example above. The diagram is replicated below for your convenience.

Lets say node 5 wants to find the value for key 3. This are the steps it will take to find the node containing the resource.

- Node 5 will look at its finger table and find the node responsible for the interval, in this case it is node 1.
- Node 5 will then ask node 1 to find the value for key 3.
- Node 1 will check its finger table and find the node responsible for the interval, in this case it is node 4
- Node 5 will then query node 4 to find the value for key 3 and finds it in node 4.

As we can see from the process, each time there is a query, the number of nodes that needs to be queried is reduced by half. This will reduce the runtime for this algorithm to O(log(N)) instead of O(N) previously.

## Node join

When a new node, n, joins the network, the following happens

- Node n is initialized
- Other nodes will be notified to update their predecessors and finger tables. (Only those who are affected)
- Node n takes over responsible keys from its successor

For the initialization, the node n can check if the `ith`

entry in the finger table is correct for the `i+1th`

entry. This results in `O(log(N)log(N))`

time.

## Node Leave

To handle node departure, each node is required to know the IP address of its 2 successors. Each node periodically pings its two successors to see if they are alive.

Given the example below, if node 5 leaves:

- Node 4 will detect that it is unable to ping node 5.
- It will then make 1 its immediate successor and ask node 1 who its immediate successor is.
- It makes node 1’s immediate successor its second successor.
- It will update its finger tables.

The re-establishment of the finger tables will take `O(log(N)log(N))`

time.

## CAN

Scalable content-addressable network (CAN) is a DHT algorithm designed by the University of California, Berkeley in 2001 and is published within the same conference as Chord.

The namespace of CAN is a d-dimensional torus. This means that the nodes are arranged in a d-dimensional grid and the edges wrap around.

A 1-dimensional torus can be thought of as a ring. A 2-dimensional torus can be thought of as a plane with edges that wrap around (EG: (0,0) is beside (0, n) and (n, 0)).

Each node keeps track of its neighbors only and there is no need to store shortcuts.

Traversing between nodes will be similar to traversing through a d-dimensional euclidean space.

### Searching for keys

Assuming a 2 dimensional euclidean space, each node contains 4 neighbors, 2 for each axis.

For each dimension in CAN, there will be one hash function responsible for hashing the keys. By passing the object through each of the hash functions, we can determine the coordinate for the object.

Based on the direction of the coordinate, the node knows which direction to traverse in.

For example, if the node is at (0, 0) and the key is at (1, 1), the node will traverse to the right first before traversing up.

Routing takes in the worst case `O(dn^(1/d)) for d dimensions and n nodes.

### Node Join

When a new node joins, it will randomly choose a coordinate (x, y) and will be assigned to the node at (x, y). By traversing from the other peer, it will find the current node assigned to the coordinate and splits the node in half.

Now the new node owns one half and the old node owns the other half.

When a new node joins, the split occurs along the x dimension first before the y dimension. In the 2 dimensional space, each zone is a square or a 1 by 2 narrow rectangle.

### Node Leave

When a node leaves, the nodes merge back to a neighbor if possible. Otherwise, a neighbor node might need to temporarily handle multiple zones.

### Extensions

The runtime of CAN depends on the dimensionality of the implementation. By increasing the dimension beyond the 2nd dimension, the routing table size and hash functions increases linearly according to the number of dimensions but the path to reach a coordinate is reduced.

Although routing takes `O(dn ^(1/d))`

, the average path length is `O((d/4) * n ^ (1/d))`

as only half the nodes will need to be traversed in an average case.