Additional functions for directed, a-cyclic graphs

Earlier, I posted R-scripts that include functions for finding paths in directed, a-cyclic graphs. I (and some colleagues) use this type of graph to model processes (sequences of events) as networks. We refer to this type of graph as event graphs (Boons et al. 2014; Spekkink and Boons, in press), where events are represented by nodes, and the relationships between events are represented by arcs and/or edges. The underlying principles of this approach were based primarily on the work of Abell (1987) on the Syntax of Social Life, where he introduces narrative graphs.

In addition to the path-finding algorithm, I recently created a plugin for Gephi, called Lineage, that identifies all the ancestors and descendants of a node (chosen by the user) in directed graphs. Based on the logic that I used for the Lineage plugin I also did a complete rewrite of the function (for R) that I designed for finding paths in directed, a-cyclic graphs (see this post). I decided to use the same logic to write some additional functions that explore ancestors and descendants of nodes in directed a-cyclic graphs. I wrote four new functions for R:

  • GetAncestors(arcs, node): This function returns all the ancestors of a given node, which is submitted by the user by passing it as an argument of the function (node). The other argument (arcs) should be an edge list, structured as a 2-column matrix. The first column should report ‘source’ nodes, and the second column should report the ‘target’ nodes that the ‘source’ nodes link to (see table below for an example of such an edge list). This edge lists represents the network under investigation.
  • GetDescendants(arcs, node): This function does the same as the GetAncestors() function, but returns the descendants of the input node, rather than its ancestors.
  • CommonAncestors(arcs, nodes): This function finds all the common ancestors of a pair of nodes, or multiple pairs of nodes. In this case too, the arcs argument should be an edge list, as described earlier. The nodes argument can either be a vector of two nodes, or a matrix with multiple pairs of nodes (similar to the edge list displayed in the table below). The function writes a file in which it is reported whether the submitted pairs of nodes have ancestors in common, how many ancestors they have in common, and what these ancestors are. The file is placed in the current working directory of the user.
  • CommonDescendants(arcs, nodes): Same as the CommonAncestors() function, but in this case the function returns the common descendants of the pairs of nodes submitted by the user.

All four functions assume that the events are labelled with numbers. Unless the edge lists that are submitted by the user have numerical variables, the functions won’t work as they should.

Source Target
1 2
1 3
2 4
2 5

So what are the functions useful for? I assume that different people may come up with different uses, but I use them specifically as a step in the identification of emergent linkages between events (see Spekkink and Boons, in press). It will take me some paragraphs to explain this.

In my PhD research I introduced a distinction between intentional linkages between events, and emergent linkages between events. An intentional linkage exists between two events A and B if in event B actors intentionally respond to conditions created in event A. For example, event A might concern the development of a plan, and plan B may concern a feasibility study of that plan. I used the idea of intentional linkages to identify different streams of events in a process, that may, for example, represent different projects that unfold independent from each other. In my thesis, I used intentional linkages to identify independent projects that were later assembled into a larger collaborative process, and can thus be understood to serve as building blocks of the collaboratie process. The figure below shows a phase model that I visualize in the conclusions of my thesis, and that also aims to visualize the idea of different building blocks existing independently at first, and being assembled into a collaborative process at a later stage.


Emergent linkages refer to another type of relationship between events. I found that actors involved in independent events often address similar (or the same) issues in their activities. It might, for example, be the case that different actors are working on different projects that are all related to biobased economy, but that actors do not react to each other’s projects. They may not even be aware of each other’s activities. To capture these similarities between events (i.e., the similarities of issues addressed) I introduced the notion of emergent linkages. I defined emergent linkages such that they refer to similarities between events, but that they can only exist between events that are not connected by a path of intentional linkages. The reason for the latter condition is that I was only interested in coincidental similarities. If similarities exist between events that are also connected by a path of intentional linkages, then there is a very strong chance that the similarity is also intentional. To explain what I mean, let me go back to my earlier example, concerning the intentional linkage between event A (a plan is developed), and event B (a feasibility study is performed). Say that the plan concerns the construction of a biofuel factory. In this case it will be logical that both the plan and the feasibility study address issues related to biofuel production. These similarities are not emergent, in my view, and that is why my definition of emergent linkages excludes them.

There are different approaches to identifying similarities between events. The approach that I used is described in Spekkink and Boons (in press). In short, in this approach the similarities between events are calculated as correlations between the column profiles of an incidence matrix, where the rows represent issues, and the columns represent events (a ‘1’ in a given cell indicates that the corresponding issue was addressed in the corresponding cell). The problem with this approach (and other approaches) is that the similarities between all events are calculated this way. This includes the similarities that I do not consider to be emergent. Thus, I need some way of filtering out the non-emergent linkages from the complete list of similarities between events. Based on my definition, I used to filter out the similarities between events that are also on a path of intentional linkages together (this was the reason why I wrote the function for finding paths in directed, a-cyclic graphs). Lately, I have been thinking about expanding this definition, because there are cases where two events are not on a path together, but do have one or more common ancestors. Similarities between these events may also be considered intentional in some cases. For example, consider the development of a large program (event A), the details of which are worked out in multiple, independent working groups (events B, C, and D). Events B, C and D may not be intentionally linked, but they do have a common ancestor (event A), and they all inherit issues from that ancestor. I considered introducing a more restrictive definition of emergent linkages where not only nodes that share a path of intentional linkages are excluded, but also events that have common ancestors. That is the reason why I wrote the CommonAncestors() function. I included the other three functions because it only took small adaptations to make them, and they might be useful for someone.

You can download the script with the functions here.