Model checking with Spin

2 07 2009

Real engineers, when faced with complex design problems build and analyze prototypes to reduce the risk of implementing a flawed design. Can the same be done with software also? How can we show if the system meets its design requirements? It is hard to devise all the possible test cases especially for concurrent software.

One of the most powerful formal methods is model checking. In principle model checking generates all possible states of a program and checks that the correctness specifications hold in each state. Model checking is challenging because the model should describe the system in sufficient detail and yet the model must be sufficiently simple so that the model checker can perform the verification within the reasonable limits of time and memory.

In this blog post I will model the Peterson’s algorithm and verify the mutual exclusion property. We will use Spin model checker. Spin accepts design specification in the verification language Promela. Promela is not an implementation language but a systems description language. The emphasis in the language is on the modelling of process synchronization and coordination, and not on computation.

/** Model specified using Promela for Peterson's algorithm **/
/* condition for mutual exclusion */
#define mutex (critical <= 1)
/* flag value is 1 when the process wants to enter CS */
/* turn holds the ID of the process whose turn it is */
bool flag[2], turn;
/* number of processes in the critical section */
byte critical = 0;
/* instantiating two processes */
active [2] proctype process()
{
        /* process identifiers start from 0 */
        pid i = _pid;
        pid j = 1 - _pid;
        /* do statement*/
        do
        /* only one choice for execution */
        ::     /* process wants to enter CS */
                flag[i] = true;
                /* but asks other process to go ahead */
                turn = j;
        /* wait until the other process is not interested, */
        /* or it is his own turn */
        /* process is blocked until the statement is true */
                !(flag[j] && turn == j);
                /* critical section */
                critical++;
                critical--;
                /* end of critical section */
                flag[i] = false;
        od;
}
Correctness claims are specified in the syntax of standard Linear Temporal Logic – a logic which is built for reasoning about things that change over time. Mutual exclusion can be expressed using the temporal operator for always – [].

For verification, the negation of the LTL formula is added with the -a argument (verification mode) and -f argument (fairness condition of contention manager) to the Spin command which generates the C source code for the model checker. The file is compiled with the -DSAFETY argument so that it is optimized for checking safety properties. Absence of an example on executing pan will indicate that the property is not violated.
$ spin -a -f '![]mutex' peterson.pml
$ gcc -DSAFETY -o pan pan.c
$ ./pan
Fortunately there is no ‘Model Checking for Dummies’ book as of now. To get started on this subject, I would suggest that you read the chapter on ‘Verification by Model Checking’ in the book Logic in Computer Science followed by the manual of the appropriate model checker.





Combining different software licenses

22 06 2009

Any big software is made up of small modules. With the rise of open source, the developer can now use different open source libraries, toolkits, languages etc. Though open source, these components might be covered under different and even mutually exclusive licenses. Also the developer might have opted for a new license to release his final product. It is up to the developer to make sure that combining code does not violate the licensing terms that apply to each piece.

You must have heard of General Public License (GPL), Lesser GPL (LGPL), Apache license, BSD license, Mozilla Public License along with their version numbers. The problem in combining different licenses is that inbound rights – the rights granted from others to you should be as broad as outbound rights – the rights you granted to others.

Apache Software License is incompatible with the GPLv2 because it has certain patent termination cases that the GPL does not require.

Apache Software License is incompatible with the GPLv2 because it has certain patent termination cases that the GPL does not require.

The first step in understanding the problem is to know about the restricting nature of the open source licenses.

  1. Permissive
    Permissive licenses like Berkeley Software Distribution (BSD) and Apache permit the software to become proprietary.
  2. Copyleft
    They contain flow-down provisions that must be included in any of the downstream outbound licenses. For example, all licensees who accept code under GPL, must provide it to others under GPL. GPL requires that executables are accompanied with the complete corresponding machine-readable source code. The GPL says:

    10.  Automatic Licensing of Downstream Recipients
    Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License.
    You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License.

It is possible for the developer to divide the code base into different parts each covered in its appropriate license. However, GPL covers all code in the same executable, except proprietary system libraries. In contrast, LGPL code can be combined with any other code by using the LGPL code as a dynamically linked library. GPLv2 itself is not compatible with GPLv3. Eclipse Public License allows derivative works to choose their own license for their contributions.

The more you read and interpret the license documents, technical definitions seem fuzzy and legal language becomes philosophical. You can get a brief idea about the compatibility of common licenses on the FLOSS license slide.

I would recommend that you go for permissive licenses like BSD. The more permissive the license, the more likely the software will be used. But if you are a believer in freedom of software and don’t want your code to be used in any proprietary program then GNU GPL is the answer.





Making selfish BitTorrent clients history

16 06 2009

It is evident from the study of different exploits that we need to keep a check on the number of peers that we send to a newly connected client.

The tracker has a list of all the peers for the torrent. A new field for timestamp is to be added for each peer. The following steps should be followed when a client connects (with its unique IP and port address) to the tracker:

  • check if it is in the connected peers list
  • if not or if the time when it last contacted the tracker exceeds 1800 seconds (1800 is selected since it is the average time of scheduled exchange of messages between tracker and the peer), then send it the list of peers
  • add/update to the list of peers in the trackers, the timestamp describing when it established the connection with the tracker

The client can then only contact the tracker after a specified time of 1800 seconds to give information about its status and a new list of peers. This will help in removing peers which have moved out of the network. But genuine clients might have to restart the download or the peers in their list might have died.

When a client connects again within the 1800 seconds period:

  • the clients will have to provide the original list of peers given to them
  • tracker will verify the list and give a new set of replacement only for the peers in the list which have died out

So the repeated query for a list of peer sets will not yield new peers. Hence the methods which are dependent on having large peer set will be useless. There would not be any significant overhead in the network as even if the client sends the list of peers it has, it will only get a small set of peers which have died out in return. Otherwise it would have got a new random list of all the peers.

The tracker has to do some additional work in checking the timestamps and died out peers in the list sent by newly connected clients. However at present the capabilities of the tracker are underutilized, so a small overhead should not be a problem.

Different BitTorrent Clients

Different BitTorrent Clients

Some other clients also apply enhancements to overcome the free-riding behavior as described below:

  • Azureus client stores the information from which it has received the subpieces and immediately bans the IP address once the hash verification fails. Only a recent subset of history is necessary to determine the trustworthiness of a given peer.
  • The trackers of the sharing communities should verify the upload numbers sent by the peers. For instance, the sum of all reported download and upload amounts could be analyzed over different torrents and time periods, in order to detect and ban dishonest peers.
  • Many clients now implement super-seeding. Rather than claiming to have every piece from the outset, the seeder claims to have no pieces. As peers connect, the seed will inform a peer that it has received a new piece, one that has not yet been sent to any other peers. Thus allowing nodes to hide their properties does not allow selfish peers to detect the seeds.

The BitTorrent protocol is based on the obedience of clients. This appears unrealistic as more users will employ free riding clients because of their reluctance to upload. The users should realize that they need to contribute their resources in order to improve the download rates. On the other hand, in many countries, it is illegal to upload but okay to download, maybe these rogue clients can come to their rescue.





Free riding on BitTorrent

15 06 2009

I will, in this post, describe the loopholes in the BitTorrent protocol and how they are exploited by the rogue clients.

Loopholes in the BitTorrent protocol

  • Peer list during startup
    During start-up phase a new peer connects with tracker and requests for a list of peers (seeders and leechers) having the pieces of the file. The tracker generally responds with 50 peer addresses per announcement. This parameter can be increased to at most 200 in the announce request. Tracker announcements are repeated at an interval received in the first announce response, usually in the order of once every 1800 seconds. However downloaders may rerequest on nonscheduled times if an event happens or they need more peers.
  • No upload to seeders
    Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder. So the peer having connection with seed does not have to upload any data to the seed. Seeders upload to all peers in the round-robin fashion.
  • Optimistic Unchoke
    The purpose of optimistic unchoking policy is to allow newly joined peers without any pieces of torrent to bootstrap to the swarm. Which peer is optimistically unchoked rotates every 30 seconds. Newly connected peers are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation. This gives them a decent chance of getting a complete piece to upload.

How rogue clients work?

BitTorrent Protocol vs Rogue Clients

  • Downloading only from seeds
    A selfish client, upon connecting, repeatedly asks for new lists. Since most trackers perform some form of load balancing, after a short period of time, such a client has the information for most of the seeds in the torrent. The selfish client then only connects and downloads pieces from the seeds and ignores the leechers. Since seeds are typically high-bandwidth clients, the selfish client are able to sustain high download rates without any uploads.
  • BitThief
    BitThief re-announces itself frequently during the startup period to the tracker in order to get as many remote peer addresses as quickly as possible. BitThief also sends an empty list of available pieces during connection setup and it does not inform the remote peer about any new pieces it acquires. So the BitThief client has a big list of different seeders and leechers for the torrent. The default BitTorrent client has 4 open connections (i.e. it is connected with at most 4 peers at a time). However the BitThief opens a large number of connections.

    Having more connections increases the number of seeders in the swarm, so the client benefits from their round robin unchoking periods. BitThief never uploads to the leechers and is regularly snubbed by them. The leechers in the swarm would also include BitThief in their periodical optimistic unchoke slot.

  • Pretending to upload
    Since, many times, torrent files are of pure quality or are fake, a lot of sharing communities have emerged. Sharing communities require their users to upload at least as much data as they download, i.e., to keep their sharing ratio above 1. The community sites make use of the tracker announcements which every client performs regularly. In these announcements the client reports the current amount of data downloaded and uploaded. However, the client can announce bogus information and fake peers so that the tracker’s peer list fills up with dozens of clients which do not exist.
  • Advertising false pieces
    A selfish peer when asked for a sub-piece can send garbage data. The receiving leecher is able to verify the SHA1 hash only after receiving all sub-pieces of a piece.

    When selecting which piece to start downloading next, peers generally download pieces which the fewest of their own peers have, referred to as rarest first. A client can announce it has rare pieces, so more peers try to connect with it, they get garbage values while the rogue client will have a piece of the file.

Next post will list the enhancements used by some clients to overcome the free-riding behavior and propose a method to limit the peer list a client has.





Hello world!

15 06 2009

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!








Follow

Get every new post delivered to your Inbox.