Most activities and work in today’s society depend on a constant exchange of information. For example, companies, especially multinationals, spread their projects across the many offices they have worldwide, so those locations must communicate and exchange information for the projects to progress. Universities are another example: they need a system for exchanging information with students, such as grades and exams.
The beginnings of P2P
That’s why, around 1996, the first P2P application appeared: Hotline Connect, created by Adam Hinkley and intended as a tool for universities and companies to distribute files. The application used a centralized structure and soon became obsolete, as it depended on a single server; and because it was designed for Mac OS, it attracted little interest among users.
Napster
It was with Napster, in 1999, that P2P networks caught the attention of users. This music exchange system used a hybrid P2P network model: in addition to peer-to-peer communication, it included a central server to organize the peers. Its main problem was that the server introduced a single point of failure and a high risk of bottlenecks.
The Rise of the P2P Networks
This is why new topologies emerged, such as decentralized ones. Their main feature is that they need no central server to organize the network; Gnutella is an example of this topology. Another type is structured P2P networks, which focus on organizing content rather than users; JXTA is an example. There are also networks based on a Distributed Hash Table (DHT), such as Chord.
Next, we’ll look at each of these types of P2P networks in more detail.
Early P2P systems: a hybrid approach
Early P2P systems, such as Napster or SETI@home, were the first to move heavier tasks from servers to users’ computers. With the help of the Internet, which makes it possible to combine all the resources that users provide, they achieved greater storage capacity and computing power than servers alone.
However, without an infrastructure acting as an intermediary between peers, the system would become chaotic, as each peer would end up acting independently.
The solution to this problem is to introduce a central server responsible for coordinating the peers (how much coordination it does can vary greatly from one system to another). These systems are called hybrid systems because they combine the client-server model with the P2P network model.
Many people believe that this approach should not be classified as a true P2P system, as it introduces a centralized component (the server). Despite this, it has been and continues to be very successful.
How the hybrid system works
In this type of system, when an entity connects to the network (using a P2P application), it registers with the server. The server therefore knows at all times which peers are registered, which allows those peers to offer services to one another. Normally, communication between peers is point-to-point, as peers do not form large networks.
The problems with the hybrid system
The main problem with this design is that it introduces a single point of failure and a high probability of a bottleneck (in data transfer, a point whose capacity is lower than the load arriving at it, so it limits the whole system). As the network grows, the server load grows with it, and if the server cannot scale, the system will collapse. And if the server goes down, the network cannot reorganize itself.
Usage of the hybrid system
Despite this, many systems still use this model. The approach is useful for systems that do not tolerate inconsistencies and do not require many resources for coordination tasks. Napster is a good example. It emerged in 1999, created by Shawn Fanning and Sean Parker, with the idea of sharing music files between users.
In Napster, users connect to a central server, which maintains a list of connected users and the files they make available. When a user wants a file, they search the server, which returns a list of all peers that have it.
The user then picks the peer that can best serve the file (for example, the one with the best transfer rate) and downloads it directly from that peer, without intermediaries. Napster quickly became very popular, reaching 26 million users in 2001, to the discomfort of record labels and musicians.
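The register-then-search flow of a hybrid system can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Napster’s actual protocol: the class name CentralIndex, the peer addresses, and the rate_kbps field are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical stand-in for the central server of a hybrid P2P system."""

    def __init__(self):
        # file name -> {peer address: advertised transfer rate in kB/s}
        self._files = defaultdict(dict)

    def register(self, peer, files, rate_kbps):
        """A peer connects and announces the files it shares."""
        for name in files:
            self._files[name][peer] = rate_kbps

    def unregister(self, peer):
        """A peer disconnects; drop it from every file entry."""
        for peers in self._files.values():
            peers.pop(peer, None)

    def search(self, name):
        """Return the peers holding `name`, fastest first."""
        peers = self._files.get(name, {})
        return sorted(peers, key=peers.get, reverse=True)


index = CentralIndex()
index.register("alice:6699", ["song.mp3", "demo.mp3"], rate_kbps=56)
index.register("bob:6699", ["song.mp3"], rate_kbps=512)

candidates = index.search("song.mp3")
best_peer = candidates[0]  # the download itself would then go peer to peer
```

Note that the server only answers the question "who has this file?"; the file never passes through it. It also shows the weakness discussed earlier: if the single CentralIndex disappears, no peer can find any other.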
Why it failed
That’s why the RIAA (Recording Industry Association of America) and several record companies, in an effort to put an end to it, took legal action against the company, which led to the shutdown of its servers. This brought the network down, as users could no longer download music files. As a result, instead of ending piracy, the shutdown drove a large proportion of users to other exchange systems such as Gnutella and Kazaa.
Later, around 2008, Napster became an MP3 music sales company, with lots of songs available for download.
Unstructured P2P networks
Another way to share files is to use a non-centralized network, i.e. a network where any kind of intermediary between users is eliminated so that the network itself is in charge of organizing communication between peers.
How it works
In this approach, when one user knows another, a connection is established between them, so that they form a network that more users can join. To find a file, a user issues a query that floods the network in order to reach as many users as possible who have that information.
For example, to search on Gnutella, the interested user sends a search request to their neighbors, who forward it to theirs. To avoid overloading the network with a single request, the broadcast horizon is limited to a certain distance from the originating host by giving each request a lifetime (time to live, TTL), which decreases each time the message is forwarded to another user.
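Flooding with a limited lifetime can be sketched as a breadth-first traversal. The code below is a simplified model rather than the Gnutella wire protocol: the function name flood_search and the dictionaries describing the network are assumptions made for illustration.

```python
from collections import deque


def flood_search(neighbors, shared, origin, wanted, ttl):
    """Flood a query from `origin`; return the hosts that hold `wanted`.

    neighbors: dict mapping each host to the list of hosts it knows
    shared:    dict mapping each host to the set of files it shares
    ttl:       how many times the query may be forwarded
    """
    hits = set()
    seen = {origin}                 # hosts that already received the query
    queue = deque([(origin, ttl)])
    while queue:
        host, remaining = queue.popleft()
        if host != origin and wanted in shared.get(host, set()):
            hits.add(host)
        if remaining == 0:
            continue                # lifetime exhausted: do not forward
        for peer in neighbors.get(host, []):
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, remaining - 1))
    return hits


# A chain a - b - c - d in which only d shares the file.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
shared = {"d": {"rare.ogg"}}

near = flood_search(neighbors, shared, "a", "rare.ogg", ttl=2)  # stops at c
far = flood_search(neighbors, shared, "a", "rare.ogg", ttl=3)   # reaches d
```

With ttl=2 the query dies one hop short of the only host that has the file, so the search returns nothing even though the file exists on the network.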
The problems with Gnutella
The main problem with this model is that as the network grows, the query message reaches only a small fraction of its users. If we are looking for something well-known, some host within our broadcast horizon will almost certainly have it. If we are looking for something rare, however, we may not find it: by limiting the broadcast horizon, we leave out hosts that may hold the information we want.
Today, pure non-centralized P2P networks have largely been replaced by newer designs, such as supernodes.
SUPERNODES – A hierarchy in unstructured networks
The main problems with unstructured networks were the broadcast horizon and the size of the network. There are two possible solutions: either we increase the broadcast horizon, or we decrease the size of the network.
If we increase the broadcast horizon, the number of hosts to which we must send the query message grows exponentially. As we have already seen, this would cause problems on the network, up to its collapse. If we instead reduce the effective size of the network, by using supernodes, systems scale much better.
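A quick calculation shows why widening the horizon is so costly. The sketch assumes an idealized network in which every host forwards the query to the same number of hosts that have not seen it yet; real networks overlap, so these figures are an upper bound. The function name hosts_reached and the fan-out of 4 are illustrative choices.

```python
def hosts_reached(fanout, ttl):
    """Upper bound on hosts contacted when each host forwards to `fanout` new hosts."""
    return sum(fanout ** hop for hop in range(1, ttl + 1))


# Hosts contacted for lifetimes 1 through 7 with a fan-out of 4.
growth = [hosts_reached(4, ttl) for ttl in range(1, 8)]
```

Each extra hop multiplies the traffic by roughly the fan-out, which is why simply raising the lifetime does not scale.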
How it works
The main idea of this system is that the network is divided into many end nodes and a small group of well-connected supernodes, to which the end nodes are connected. To be a supernode, a host must be able to offer sufficient resources to other users, especially bandwidth. This network of supernodes, to which only a few hosts can belong, keeps the searchable network small enough not to lose search efficiency.
It works much like the hybrid model: the end nodes connect to the supernodes, which act as servers, so that users connect to other users only to download. Supernodes store information about what each user has, so they can shorten a search by pointing the requester to the end nodes that have what it is looking for.
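The division of labor between end nodes and supernodes can be sketched as follows. This is a toy model under assumptions of my own: the Supernode class, its attach and search methods, and the node names are hypothetical and do not reproduce any particular client.

```python
class Supernode:
    """Hypothetical supernode: indexes the files of its attached end nodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}        # file name -> set of end nodes that hold it
        self.others = []       # the other supernodes this one is linked to

    def attach(self, end_node, files):
        """An end node connects and reports what it shares."""
        for f in files:
            self.index.setdefault(f, set()).add(end_node)

    def local_lookup(self, wanted):
        """End nodes attached to this supernode that hold `wanted`."""
        return set(self.index.get(wanted, set()))

    def search(self, wanted):
        """Answer from the local index, then ask the linked supernodes."""
        holders = self.local_lookup(wanted)
        for other in self.others:
            holders |= other.local_lookup(wanted)
        return holders


# Two linked supernodes, each with end nodes attached (all names are made up).
sn_a, sn_b = Supernode("A"), Supernode("B")
sn_a.others.append(sn_b)
sn_b.others.append(sn_a)

sn_a.attach("leaf1", ["song.mp3"])
sn_b.attach("leaf2", ["song.mp3", "film.avi"])

holders = sn_a.search("song.mp3")   # end nodes to download from directly
```

Queries travel only among the few supernodes, never among the many end nodes, which is what keeps the search horizon small.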
This type of structure is still widely used today, since it is very useful for exchanging popular content or searching by keyword. Because the supernode network is small, these systems scale very well and have no single point of failure as the hybrid model does. In exchange, they are less robust against attacks and network outages, and search results lose precision because of replication across supernodes. If a small number of supernodes fail, the network splits into smaller partitions.
Structured P2P networks
This approach developed in parallel with the supernode approach described above. Its main feature is that instead of organizing nodes, it focuses on organizing content: it groups similar content on the network and creates an infrastructure that enables efficient search, among other things.
The peers organize a new virtual network layer, an “overlay network,” which sits on top of the basic P2P network. In this overlay network, proximity between hosts depends on the content they share: the more resources two hosts have in common, the closer they are to each other.
In this way, the search is carried out efficiently within a nearby horizon and without reducing the size of the network. JXTA is an example: peers act on a virtual network and are free to form and leave peer groups. Search messages thus normally remain within the virtual network, and the group acts as a clustering mechanism, bringing together peers with the same or similar interests.
This approach offers great performance and accurate searches if the virtual network accurately reflects the similarity between nodes and searches. But it also has disadvantages: setting up and maintaining the virtual network is costly in systems where hosts join and leave very quickly, and it is not well suited to searches that include Boolean operators, as these would require nodes capable of searching with more than one term.
A subclass within this type of P2P network is distributed hash tables.
Distributed hash tables (DHT)
The main characteristic of DHTs is that they do not organize the overlay network by its content or services. These systems divide up their entire workspace using identifiers, which are assigned to the peers in the network, making each peer responsible for a small portion of the total workspace. These identifiers can be, for example, integers in the range [0, 2^n - 1], where n is a fixed number.
Each peer participating in this network acts as a small database (together, all the peers form a distributed database). This database organizes its information as (key, value) pairs. But to know which peer is in charge of storing a given (key, value) pair, we need the key to be an integer in the same range used to number the participating peers.
Since a key may not be an integer, we need a function that converts keys into integers in that range. This is the hash function. One of its characteristics is that, given different inputs, it can produce the same output value, though only with very low probability.
So instead of a distributed database, we talk about a Distributed Hash Table (DHT), because what each peer actually uses to store a (key, value) pair is not the key as such but the hash of the key.
We have already mentioned that each peer is responsible for a portion of the network’s workspace. But how is a (key, value) pair assigned to the appropriate peer?
To do this, a rule is followed: once the hash of the key has been calculated, the (key, value) pair is assigned to the peer whose identifier is the closest one at or after the calculated hash (its immediate successor). If the calculated hash is greater than every peer identifier, the search wraps around modulo 2^n, so the pair goes to the peer with the smallest identifier.
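The assignment rule can be sketched in Python. The identifier size (16 bits), the use of SHA-1 as the hash function, and the peer identifiers below are assumptions chosen only to make the example concrete; the names to_id and responsible_peer are hypothetical.

```python
import bisect
import hashlib

N_BITS = 16                  # assumed identifier size: range [0, 2**16 - 1]
SPACE = 2 ** N_BITS


def to_id(text):
    """Hash arbitrary text into the identifier range using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % SPACE


def responsible_peer(key_id, peer_ids):
    """Return the first peer identifier >= key_id, wrapping around modulo 2**n."""
    ordered = sorted(peer_ids)
    pos = bisect.bisect_left(ordered, key_id)
    return ordered[pos % len(ordered)]   # past the last peer: back to the first


peers = [100, 20000, 45000, 61000]       # made-up peer identifiers

owner_mid = responsible_peer(150, peers)     # next peer upward is 20000
owner_wrap = responsible_peer(62000, peers)  # beyond 61000, wraps to 100
owner_song = responsible_peer(to_id("song.mp3"), peers)
```

Because every peer applies the same hash function and the same rule, any of them can work out which peer is responsible for a key without asking a central server.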
Having covered the basic operation of DHTs, let’s look at an example implementation: the Chord protocol.
Distributed search protocol in P2P networks: CHORD
Chord is one of the most popular distributed lookup protocols for P2P networks. It uses the SHA-1 hash function to assign identifiers to both the peers and the stored information. These identifiers are arranged in a circle (taking all values modulo 2^m), so that each node knows its predecessor and its immediate successor.
To keep the network consistent, when a node leaves, all its keys pass to its immediate successor, so that the network stays up to date and lookups do not return wrong results.
To find the node responsible for a key, nodes pass messages to one another until they find it. But, because of the circular layout of the network, in the worst case a query can traverse half of the nodes, which makes lookups very expensive.
To avoid this and reduce the cost, each node keeps a routing table with the addresses of nodes at certain distances from it. When a node wants to find the node responsible for key k, it checks whether its routing table holds that node’s address; if it does, it sends the query directly; if not, it sends the query to the node in its table that is closest to k and whose identifier is less than k.
With this enhancement, the search cost drops from the order of N/2 to the order of log N, where N is the number of nodes in the network.
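The effect of the routing table can be seen in a small simulation. The sketch assumes a 6-bit identifier ring, ten made-up node identifiers, and a table whose i-th entry is the successor of n + 2^i, which is how Chord’s routing table is usually described; everything runs in one process, with no real networking, and all function names are illustrative.

```python
M = 6                      # assumed identifier size: a ring of 2**6 = 64 ids
RING = 2 ** M


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise: (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def successor_of(ident, nodes):
    """First node at or after ident on the ring (nodes sorted ascending)."""
    return next((n for n in nodes if n >= ident), nodes[0])


def build_tables(nodes):
    """Routing table of node n: successor of n + 2**i for i = 0 .. M-1."""
    return {n: [successor_of((n + 2 ** i) % RING, nodes) for i in range(M)]
            for n in nodes}


def lookup(start, key, tables):
    """Return (responsible node, number of times the query was forwarded)."""
    node, hops = start, 0
    while not in_half_open(key, node, tables[node][0]):
        # Forward to the farthest known node that still precedes the key.
        node = next(f for f in reversed(tables[node]) if in_open(f, node, key))
        hops += 1
    return tables[node][0], hops


def walk(start, key, tables):
    """Baseline: follow immediate successors only."""
    node, hops = start, 0
    while not in_half_open(key, node, tables[node][0]):
        node, hops = tables[node][0], hops + 1
    return tables[node][0], hops


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
tables = build_tables(nodes)
with_table = lookup(8, 54, tables)   # path 8 -> 42 -> 51, answer 56
without_table = walk(8, 54, tables)  # visits every node from 8 to 51
```

In this ring, node 8 resolves key 54 after forwarding the query twice, whereas walking the successors one by one takes seven forwards. Each jump roughly halves the remaining distance to the key, which is where the logarithmic cost comes from.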
What are P2P Downloads
This type of download is done through a P2P connection, which means we need at least two parties. The name P2P comes from “peer to peer”: it is what allows two or more computers to connect directly. For example, it can be used to share information between one computer and another. It is a method of sending and receiving files, just like in Crux.
This kind of file-exchange network has been with us for many years and has been widely used for Internet downloads. Fundamentally, one person or computer shares data with another person or computer; that is the P2P exchange.
No intermediary is required in this type of download. Files go directly from one point to another, and both parties act simultaneously as clients and servers. These networks are not limited to downloading, either: we can also use them to make VoIP calls, for example.
What are Torrent Downloads
We have an alternative with Torrent downloads. In a way, we can say that this is a variant of P2P downloads.
Torrent is a file format that stores information about content shared over the network. Essentially, we download a small file and then open it with an application that can read that information and fetch the content.
It is mainly used for sharing large files. Of course, it can also be used for voice communications, for example. One of the main differences is that we don’t download from a single server, but rather join a kind of swarm where many users can download and upload content simultaneously.
This type of download can relieve servers of storage and resource load. For example, a Linux distribution may have its own server from which users download the installation file, and serving it consumes limited resources. If the distribution offers only the torrent file, users can download the content with an application that reads it, without depending on a single server.
You could say that P2P is the type of network and Torrent is the type of file. These are terms that are ultimately related.
Are P2P networks secure?
There are many myths associated with Internet downloads. We often read that downloading files can infect our computers, or that attackers can use platforms of this type to steal passwords or access the victim’s systems. All of this can happen, but what matters most is how we use P2P platforms to download.
A P2P network lets you use the resources of a network to share content with other users, distributing bandwidth for this purpose.
Therefore, the speed and quality of a download depend on factors such as the Internet speed we have contracted (which sets the maximum) and how the connection is being used at the time (for example, many connected computers can limit it).
Now, is it safe to use P2P networks to download? The truth is that if we don’t use these services carefully, they can be a risk. We could run into problems, as we’ll see below: for example, downloading a file that is actually malware put there as bait.
P2P networks can contain files of all kinds. By this, we also mean that they can be small documents, but also large files. Cybercriminals can take advantage of this flexibility to sneak in viruses and any malware to achieve their goals.
Data leaks
Our personal information, device data, or the files we keep on our devices could be compromised. A data leak could put our privacy at risk. This is one of the dangers of making mistakes when using P2P networks.
We must always keep in mind what we are sharing and with whom. We should also avoid giving the applications we use more permissions than necessary, since they could access personal and sensitive content that we don’t want leaked onto the network. If we mistakenly download a virus via these types of networks, we may expose our personal information.
Malware distribution
P2P networks are used to download content, and, as we know, not all of it is legal: we can find files that are protected by copyright. Sometimes a movie, game, or other file we’ve found has been maliciously modified and isn’t what it should be.
This is where attackers introduce malware. They create a file with the name of the movie, music, or book we want to download, when in reality what we download to our computer is malware that could steal information. There are many types of viruses, and they can be hidden in text, audio, and video files, so attackers have plenty of options.
Resource consumption
While this is not a problem directly related to security, it is another disadvantage of using P2P networks. It depends mainly on the resources of our system: the fewer we have, the worse it is. It also affects our Internet connection, as P2P traffic can consume bandwidth and cause browsing problems.
Every time we upload data or share files with third parties, our computer devotes resources to making it possible. We may then have difficulty browsing from other devices.
How to use P2P networks safely
We have seen that if we do not take adequate measures, P2P networks can be a problem that affects our security. Now, let’s give some basic advice to try to minimize the risk and do everything right.
Choose the right program to use
The first thing is to choose correctly which program we are going to use. There are many options, and not all of them are safe; we could install dangerous software that was created to attack us. We need to research in advance and read other users’ comments.
Therefore, the first step is to choose carefully which program to use and where to download it. You should always download it from official sources, avoiding third-party sites that can be a problem.
We can assure you that Crux is one of the safest programs to use. We’re constantly working to make sure our software follows the latest cybersecurity recommendations, keeping it clean as a whole. The same goes for the files uploaded and downloaded using the software, which are scanned to keep all of our users safe from malware distributors.
Download only trusted files
When downloading files, you need to make sure they are reliable. We shouldn’t download software if we don’t know whether it is legitimate or who uploaded it to the network. We could be downloading malware without realizing it. Once again, it’s important to read any comments other users may have left.
It is also a good idea to scan the files we have downloaded first, or even to open them on another, isolated computer. This way, we avoid compromising our main computer and putting our privacy at risk. This is a simple option to put into practice, for example by using a Linux system for these tests.
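One simple check before opening a download is to compare its checksum with the one published by the original source, when the source publishes one. The sketch below assumes a SHA-256 checksum is available from a trusted page; the function names are illustrative. A matching checksum only shows that the file is the one the publisher released, not that it is free of malware, so it complements an antivirus scan rather than replacing it.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file without loading it all in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """True if the file's digest equals the checksum the publisher lists."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, matches_published("distro.iso", checksum_from_website) would return False for a file that was modified after the publisher released it (both arguments here are placeholders).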
Install an Anti-Virus
Of course, you should always have a good antivirus. Especially when using platforms like these, we need to be well protected. Some examples are Windows Defender, Avast, and Bitdefender.
Many such programs are available, both free and paid, for all types of platforms and devices. No matter what operating system you use, you should always have security tools.
Keep your equipment updated
Another point to keep in mind is to always have the latest versions and patches installed. Vulnerabilities sometimes appear that attackers can exploit, and they remain a problem until we patch them. This applies to the P2P program itself, but also to the operating system, the antivirus, and so on. The objective is to correct any vulnerabilities as they appear.
Our advice is to always keep the equipment updated, along with the P2P programs we are going to use. This way, we minimize the risk when uploading or sharing files over the network via this type of service.
In short, a P2P network is not in itself a danger. However, we could run into problems if we mistakenly download an unsafe file and infect the system with malware. That’s why we must always be careful, use security programs, and keep everything updated.
Conclusion
As we have seen, there are many types of P2P networks, each with its strengths and weaknesses. None is better than the others in every respect, which means that when programming a P2P application, for example, we have several options to choose from, each with its own characteristics.
One thing to keep in mind is how the way information is shared has evolved. By the end of the last millennium, P2P networks were in wide use and, for most people, they were the only known way to share information. Today, the trend has changed. People now prefer to exchange files via large servers, in some cases paying to keep their files on them.
Some questions that may come to mind are: What is the future of P2P networks? What ways of organizing information are evolving?
One possible evolution is the jump from P2P to P4P. What is P4P? In summary, P4P, also known as hybrid P2P, is a small evolution of P2P whose main characteristic is that the service providers, the ISPs, play an essential role within the network: a search looks first among the participating nodes that belong to the same ISP.