Monthly Archives: October 2016

Second example of MDS, using travelling time between cities as distance

Introduction

A while ago, I wrote a post on Multidimensional Scaling in R and Gephi, using an example based on geographical distances between some Dutch cities. As I mention in that post, the example is a bit silly, because reproducing the geographical distances between the cities in an MDS configuration has little added value. I therefore decided to write a new post, in which I use the same set of cities but a different kind of distance between them: travelling time when using public transport (e.g., train, bus, tram). As in the first post on this topic, I use R and Gephi in the demonstration. For a brief introduction to MDS, please see my first post on this topic, referenced above (I also mention several academic references there).

The travelling distances

As I mentioned in the introduction, I use travelling times (using public transport) between a set of cities in the Netherlands for this example. Of course, these distances are not absolute: they depend on the time of day that you are travelling, possible disruptions on the way, and so on. To determine the travelling times, I made use of the website 9292.nl, which is the website that you can use to plan journeys with public transport in the Netherlands. For each journey, I assumed that we would want to arrive at our destination at around 09:00 on Monday, October 24th, 2016. If I had picked another time of day, or another day of the week, the results of this exercise could have been different.

When you search for possible journeys, the website 9292.nl typically shows four options that bring you to your destination around the requested time (see the screenshot below), and (among other things) it indicates how long each journey takes.

Screenshot of 9292.nl

I always picked the option that would bring us to the destination quickest. I also assumed that travelling from city A to city B takes the same amount of time as travelling from city B to city A. I tested this assumption for a few cities: it does not hold for every pair, but the differences are small enough to be considered negligible. The picture below shows the distance matrix that I constructed this way, where the distances are travelling times in minutes when using public transport.

Travelling Distances
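The symmetry assumption described above was checked by hand for a few cities. As a minimal sketch, here is how all pairs could be checked at once. It also shows how the two directions could be averaged into a symmetric matrix. Averaging is an assumption on my part and not what the post does, because the post simply treats one direction as representative. The travel times below are made-up placeholders, not the 9292.nl results.

```python
def max_asymmetry(times):
    """Largest |t[a][b] - t[b][a]| over all city pairs, and the pair it occurs for."""
    cities = list(times)
    worst, worst_pair = 0, None
    for i, a in enumerate(cities):
        for b in cities[i + 1:]:
            diff = abs(times[a][b] - times[b][a])
            if diff > worst:
                worst, worst_pair = diff, (a, b)
    return worst, worst_pair


def symmetrise(times):
    """Average the two directions of every pair, giving a symmetric matrix."""
    return {a: {b: (times[a][b] + times[b][a]) / 2 for b in times[a]} for a in times}


# Placeholder numbers (minutes) for illustration only; NOT the 9292.nl travelling times.
times = {
    "Rotterdam": {"Rotterdam": 0, "Utrecht": 40, "Breda": 25},
    "Utrecht": {"Rotterdam": 44, "Utrecht": 0, "Breda": 60},
    "Breda": {"Rotterdam": 25, "Utrecht": 60, "Breda": 0},
}
print(max_asymmetry(times))  # (4, ('Rotterdam', 'Utrecht'))
```

If the largest difference is small compared with typical travelling times, treating the matrix as symmetric is a reasonable simplification.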

Running the MDS analysis in R

I performed the MDS analysis in R (see my first post for a more detailed discussion of how this works). As in my earlier post on MDS, I ran R inside Emacs, using Emacs Speaks Statistics (ESS), just in case you are wondering. I saved the matrix shown above as a csv-file, using semicolons as the column delimiters. I then started an R session and loaded the vegan package, which contains the MDS function that I want to use (see the screenshot below for a full log of the commands that I used). I imported the csv-file into a data frame and ran the metaMDS function, requesting a solution with 2 dimensions. As the log below shows, I got a solution with a stress value of about 0.0917, which indicates a reasonably good fit. I then plotted the configuration produced by the solution, which is also shown in the screenshot below.

Running the MDS in R
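The stress value in the log summarises how well the distances in the 2-dimensional configuration match the input travelling times. To illustrate what that number measures, here is a small Python sketch of Kruskal's stress-1 for a given configuration. It obtains the disparities by monotone (isotonic) regression, as non-metric MDS does. This is not vegan's implementation. It does not treat tied travelling times specially, and it only scores a configuration without searching for one. It therefore need not reproduce the 0.0917 from the R log exactly.

```python
import math


def monotone_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum of pooled values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Pool the last two blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress1(dissimilarities, coords):
    """Stress-1 of a configuration.

    dissimilarities: {(city_a, city_b): travelling time}
    coords: {city: (dim1, dim2)}
    """
    # Order the pairs by their input dissimilarity (ties are not treated specially).
    pairs = sorted(dissimilarities, key=dissimilarities.get)
    dist = [math.dist(coords[a], coords[b]) for a, b in pairs]
    # Disparities: the non-decreasing sequence (in that order) closest to `dist`.
    disparities = monotone_regression(dist)
    num = sum((d - dh) ** 2 for d, dh in zip(dist, disparities))
    den = sum(d ** 2 for d in dist)
    return math.sqrt(num / den)


# Toy configuration whose distances preserve the order of the dissimilarities.
coords = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 4.0)}
print(kruskal_stress1({("A", "B"): 30, ("A", "C"): 40, ("B", "C"): 50}, coords))  # 0.0
```

Because only the rank order of the travelling times matters for the disparities, any configuration that preserves that order perfectly has a stress of zero, whatever its scale.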

Visualising the result

To make a nicer visualisation, I exported the coordinates of the MDS configuration to a new csv-file. For this, I made two vectors (dim1 and dim2) and put them in a data frame as two columns (see the screenshot below for the commands I used). I then wrote the resulting data frame to disk with the write.table() function.

Exporting the coordinates
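For readers not using R, the export step described above amounts to writing one row per city with its two coordinates. The following Python sketch does this with the standard csv module. The header name `city`, the semicolon delimiter (chosen to mirror the input file) and the coordinate values are all assumptions for illustration. They are not taken from the author's write.table() output.

```python
import csv
import io


def write_coordinates(cities, dim1, dim2, out):
    """Write one row per city with its two MDS coordinates, separated by semicolons."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["city", "dim1", "dim2"])  # header names are assumptions
    for city, x, y in zip(cities, dim1, dim2):
        writer.writerow([city, x, y])


# Placeholder coordinates, not the values from the R session.
buffer = io.StringIO()
write_coordinates(["Utrecht", "Rotterdam"], [0.1, -0.2], [0.3, 0.05], buffer)
print(buffer.getvalue())
```

To write a real file, pass in a handle from `open("coordinates.csv", "w", newline="")` instead of the in-memory buffer.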

The file that I created can be seen in the screenshot below. Before I imported this file into Gephi, I made some small changes. The original file has three columns: one column with the city names, and two columns with the coordinates on the two dimensions. I renamed the first column to Id, then copied the whole column to a new column that I called Label. These columns are typically used in a Gephi nodes list, which is exactly how I am going to import the data. I did not change anything in the other two columns.

Adjusted coordinate matrix
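The manual edit described above could also be scripted: rename the first column to Id and add a copy of it called Label. The sketch below assumes a semicolon-delimited file. It places Label directly after Id, but the post does not say where the Label column went, so that position is an assumption.

```python
import csv
import io


def to_gephi_nodes(src, dst, delimiter=";"):
    """Rename the first column to Id and add a copy of it as a Label column."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    # Label is placed right after Id here; the column order is an assumption.
    writer.writerow(["Id", "Label"] + header[1:])
    for row in reader:
        writer.writerow([row[0], row[0]] + row[1:])  # coordinates left unchanged


# Placeholder input in the shape of the exported coordinate file.
src = io.StringIO("city;dim1;dim2\nUtrecht;0.1;0.3\nRotterdam;-0.2;0.05\n")
dst = io.StringIO()
to_gephi_nodes(src, dst)
print(dst.getvalue())
```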

I started up Gephi and imported the data from the Data Laboratory (see my first post for details). In this step it is especially important to import the coordinate data as double values. To reproduce the layout in Gephi, I used the MDS Layout that I wrote for Gephi some time ago. A visual inspection of the original plot that I made with R already showed that the MDS configuration places the cities in a layout similar to the one we normally see on geographical maps, except that the x-axis (dimension 1) corresponds to the north-south axis of a geographical map, and the y-axis (dimension 2) corresponds to the west-east axis. For a more intuitive map, we want this the other way around. In the MDS Layout menu, I therefore selected dimension 2 as my x-axis and dimension 1 as my y-axis. After running the algorithm, I got the layout shown below (I only repositioned the labels in Inkscape).

Gephi Layout
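Swapping the axes, as done in the MDS Layout menu, is a reflection in the line y = x. An MDS solution is only determined up to rotation, reflection and translation, so this swap does not change any of the distances between the cities. The short Python check below illustrates this on placeholder coordinates. The function names are hypothetical.

```python
import math
from itertools import combinations


def swap_axes(coords):
    """Use dimension 2 as x and dimension 1 as y (a reflection in the line y = x)."""
    return {city: (d2, d1) for city, (d1, d2) in coords.items()}


def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of cities."""
    return {(a, b): math.dist(coords[a], coords[b]) for a, b in combinations(sorted(coords), 2)}


# Placeholder configuration, not the actual MDS output.
coords = {"Utrecht": (0.0, 1.0), "Rotterdam": (2.0, -1.0), "Breda": (3.5, 0.5)}
swapped = swap_axes(coords)
before, after = pairwise_distances(coords), pairwise_distances(swapped)
print(all(math.isclose(before[p], after[p]) for p in before))  # True
```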

Interpreting the result

So what do we have here? In the 2-dimensional configuration that we have created, the distances between the cities approximately reflect the time it takes to travel between them by public transport. In the previous example, we used geographical distances between the cities, which meant that we could plot the resulting configuration on a blank map of the Netherlands and thereby create a fairly accurate geographical map. That was possible because that configuration (based on geographical distances) and the blank map we pasted it on both model ‘geographical space’. Our new configuration, however, is based on travelling times and models a different kind of space. Let us call this space the ‘travelling time space’ for now.

In principle, these two spaces are not really comparable. However, I thought it would still be interesting to plot both of them on a blank map of the Netherlands. Therefore, I took the configuration that I produced in my previous example (again, based on geographical distances) and copied its coordinates into the file of the new configuration (based on travelling time). Thus, I now have a file with 4 coordinate columns: 2 hold the coordinates of the cities in the space of geographical distances, and 2 hold the coordinates of the cities in the space of travelling times (I switched some of the dimensions around to make sure that both configurations are oriented in an intuitive way). The resulting file can be seen below.

Both configurations - data file
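Copying the coordinates of one configuration into the file of the other, as described above, is a join on city name. As a minimal Python sketch, here it is with a check that both configurations contain the same cities, so that no row ends up half-empty. The column layout (city, geographical x/y, travelling-time x/y) follows the description in the text. The function name and values are hypothetical.

```python
def merge_configurations(geo, travel):
    """Join two {city: (x, y)} configurations on city name into 5-field rows:
    (city, geo_x, geo_y, travel_x, travel_y)."""
    mismatched = set(geo) ^ set(travel)
    if mismatched:
        raise ValueError(f"cities present in only one configuration: {sorted(mismatched)}")
    return [(city, *geo[city], *travel[city]) for city in sorted(geo)]


# Placeholder coordinates for illustration only.
geo = {"Utrecht": (1.0, 2.0), "Rotterdam": (0.5, 1.5)}
travel = {"Utrecht": (3.0, 4.0), "Rotterdam": (2.5, 3.0)}
for row in merge_configurations(geo, travel):
    print(row)
```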

I imported this file into Gephi, plotted the two configurations, and exported them separately. I then used Inkscape to combine the two plots, making two different versions of the combination. In one version I used the city of Utrecht as my reference point (making sure that the position of Utrecht overlaps in both plots), because it is a central and major public transport hub in the Netherlands. In the other version I used Rotterdam as my reference point, because this led to a result in which the cities in the two configurations lie much closer to each other (also, because I have lived in Rotterdam for quite some time, I am a bit biased and think that Rotterdam is the best reference point that you can have in the Netherlands). I coloured the cities that serve as reference points green to emphasise their special role in the plots, the cities in the geographical space blue, and the cities in the travelling time space orange. I then pasted the results on top of a blank map of the Netherlands, making sure that the positions of the cities in the geographical space (blue) are more or less correct. It should be noted that combining the two spaces is, in principle, not possible, but by doing so we can make some simple comparisons of the relative distances between the cities in the two types of space. Using the blank map of the Netherlands mostly makes this exercise less boring, but (as we will see below) it may also help us explain some observations. The resulting plots can be seen below.

Combined map A

Combined map B
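The alignment above was done by hand in Inkscape, by making the reference city coincide in both plots. The Python sketch below does the same thing numerically. It translates the travelling-time configuration so that the chosen reference city lands on that city's position in the geographical configuration. The optional `scale` parameter is a hypothetical addition, because the post does not say how the two plots were sized relative to each other. A Procrustes fit, which also allows rotation and reflection, would be a more systematic alternative, but it is not what the post does.

```python
def anchor_on(reference, fixed, moving, scale=1.0):
    """Scale `moving` about its reference city (scale is a hypothetical knob),
    then translate it so the reference city lands on its position in `fixed`."""
    rx, ry = moving[reference]
    fx, fy = fixed[reference]
    return {
        city: (fx + scale * (x - rx), fy + scale * (y - ry))
        for city, (x, y) in moving.items()
    }


# Placeholder coordinates: `geo` plays the blue map, `travel` the orange one.
geo = {"Utrecht": (5.0, 5.0), "Rotterdam": (3.0, 4.0)}
travel = {"Utrecht": (1.0, 1.0), "Rotterdam": (0.0, 0.0)}
print(anchor_on("Utrecht", geo, travel))    # Utrecht anchored at (5.0, 5.0)
print(anchor_on("Rotterdam", geo, travel))  # Rotterdam anchored at (3.0, 4.0)
```

Changing the reference city changes only the translation. This is why the Utrecht and Rotterdam versions differ by a shift of the whole orange configuration.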

What is immediately obvious is that the relative distances between the cities in the travelling time space differ from those in the geographical space. Some cities seem to be closer to each other (e.g., Utrecht and Zwolle, Leeuwarden and Groningen, Rotterdam and Breda), while others are further apart (e.g., Utrecht and Maastricht, Utrecht and Rotterdam, Zwolle and Den Helder, Rotterdam and Middelburg). The interpretations I give below should really be taken with a grain of salt, because they are all based on intuitions, rough guesses, and pairwise comparisons of the cities’ positions, whereas it is probably much more helpful to consider the configuration as a whole (but I don’t want to break my brain; I am going to need it for a bit longer).
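The pairwise comparisons above were made by eye on the combined maps. One way to make them slightly more systematic is sketched below, as an assumption rather than something done in the post. First, divide each configuration's pairwise distances by their own mean, so that the arbitrary scales become comparable. Then take the ratio per pair. A ratio below 1 means the pair is relatively closer in the travelling time space, and a ratio above 1 means it is relatively further apart. This still inherits the caveat from the text: it looks at pairs, not at the configuration as a whole.

```python
import math
from itertools import combinations


def relative_distance_ratios(geo, travel):
    """Per city pair: (normalised travel-time distance) / (normalised geographical distance).

    Each configuration's distances are divided by their own mean (an assumed
    normalisation) so that the two arbitrary MDS scales become comparable.
    """
    pairs = list(combinations(sorted(geo), 2))

    def normalised(coords):
        d = [math.dist(coords[a], coords[b]) for a, b in pairs]
        mean = sum(d) / len(d)
        return [x / mean for x in d]

    g, t = normalised(geo), normalised(travel)
    return {pair: tv / gv for pair, gv, tv in zip(pairs, g, t)}


# Placeholder configurations: B has moved towards A in the "travel" space.
geo = {"A": (0, 0), "B": (4, 0), "C": (0, 4)}
travel = {"A": (0, 0), "B": (2, 0), "C": (0, 4)}
for pair, ratio in relative_distance_ratios(geo, travel).items():
    print(pair, round(ratio, 2))
```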

In both combined maps, Den Helder and Middelburg are further away from all the other cities in the travelling time space than in the geographical space. Intuitively, I think this makes sense, because their geographical locations seem to make these two cities relatively hard to reach by public transport (or even in general). For example, there is a natural barrier between Den Helder on the one hand, and Leeuwarden, Groningen and Zwolle on the other. It is possible to travel between Leeuwarden and Den Helder more or less directly by bus (I suspect this bus crosses the “Afsluitdijk”, indicated on the map by the line between the land mass of Den Helder and the land mass of Leeuwarden and Groningen), but it is not possible to make that journey by train (you would have to travel via Utrecht). The estuary in which Middelburg is located can also be understood as a kind of natural barrier. According to a map of Dutch train tracks, there is only one train track that leads to Middelburg, coming (more or less) from the direction of Breda.

Other observations are a bit harder to explain. For example, Utrecht seems to have moved quite a bit to the northeast (this can be seen most easily in the plot where Rotterdam is the reference point). This might be because Utrecht is less affected by the natural barrier formed by the “IJsselmeer”, and is therefore relatively close to Den Helder as well as to Groningen and Leeuwarden. However, that does not explain, for example, why the distance between Utrecht and Rotterdam has increased. Here, it may have something to do with the exact route taken by trains that run between these two cities, and the number of stops they have to make on the way. It may also be necessary to consider the distances between many cities at once to fully understand these observations. For example, Rotterdam and Breda are pulled towards each other because there is a high-speed train service between these two cities that does not serve most of the other cities. At the same time, Breda and Utrecht are probably pushed away from each other because they are relatively poorly (or indirectly) connected. Thus, these can be understood as pulling (Rotterdam-Breda) and pushing (Utrecht-Breda) forces that act against each other, while also interacting with other pulling and pushing forces, finally leading to a rather complex picture. As I mentioned before, offering a good interpretation of these distances, and especially of the differences between the geographical and the travelling time distances, is a bit tricky.

Closing statements

This was another simple example of performing MDS in R and Gephi, and I hope you enjoyed it. As promised in the first post, I will try to write posts with additional examples in the future, probably focusing more on “social distances”, thereby moving further away from the more intuitive types of distance such as geographical distance and travelling time. I hope these future examples will make clear that the concepts of space and distance can be generalised in ways that we would not consider in our daily lives, and that they can show us things that are interesting and relevant from a social scientific point of view (for example, some of Bourdieu’s most famous work makes heavy use of forms of ‘social distance’). For now, I say goodbye.