Monthly Archives: July 2016

New piece of software: CSV-parser for Gephi

I wrote a new program, primarily for my own use, though it is probably useful to others as well. To explain why I wrote it, I have to give you some background:

In my own research I tend to work with so-called Event Sequence Datasets (see my thesis), which I at some point store in a CSV format, because that is a format that many of the software packages I work with can import. One characteristic of the files that I work with is that a cell can hold multiple values, separated by some delimiter character (usually a semicolon). Often, these values are codes that I assigned to some entity in the dataset.

I like to visualise the contents of my datasets with Gephi, and Clement Levallois wrote a very useful plugin for Gephi that allows you to import data from files that have multiple values in one cell. However, I wanted a bit more control over the process of parsing the data from my files into something that can be read by Gephi (as well as by other software, such as the Neo4j CSV-importer). This includes the possibility to add properties to the nodes list, based on information included in the input file. Also, because I potentially want to use the output files in other software, I wanted something that works independently of Gephi. I therefore decided to write my own CSV-parser, and chose to do it in C++, making heavy use of the Qt4 GUI library.
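To make the multi-value cells concrete, here is a minimal sketch of how one cell could be split into its separate values. It is an illustration in plain standard C++, not the program's actual code (which uses Qt), and the function name `splitCell` is hypothetical. It assumes that the in-cell delimiter never appears inside a value.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one CSV cell into its individual values.
// 'delimiter' is the in-cell separator (a semicolon in the files described above).
// Spaces and tabs around each value are trimmed, and empty values are skipped.
std::vector<std::string> splitCell(const std::string& cell, char delimiter = ';') {
    std::vector<std::string> values;
    std::size_t start = 0;
    while (start <= cell.size()) {
        // Find the end of the current value: the next delimiter or the end of the cell.
        std::size_t end = cell.find(delimiter, start);
        if (end == std::string::npos) {
            end = cell.size();
        }
        const std::string value = cell.substr(start, end - start);

        // Trim surrounding whitespace; a value with no other characters is dropped.
        const std::size_t first = value.find_first_not_of(" \t");
        if (first != std::string::npos) {
            const std::size_t last = value.find_last_not_of(" \t");
            values.push_back(value.substr(first, last - first + 1));
        }
        start = end + 1;
    }
    return values;
}
```

Calling `splitCell("a; b;;c")` returns the three values `a`, `b` and `c`; an empty cell returns an empty list.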

Basically, the program can be used to import a CSV file that contains data on entities that you want to visualise as a network. For example, the CSV file may have entities in different columns that you want to relate to each other, or you may have multiple entities in a single column that you want to relate to each other (e.g., a co-authorship network). The program allows you to select the appropriate source and target nodes, assign properties to these nodes, indicate the type of relationship between them (directed or undirected) and assign a label to the relationship. After specifying the desired settings, you can export a nodes file and an edges file, which can be imported easily into Gephi (via the data laboratory) or into other software packages.

Currently, the program does not support the creation of dynamic networks directly, which is something Clement’s plugin does support. I have left this feature out because I do not immediately need it myself, although I might still add it in the future. I am also thinking about creating an export-to-gexf function.

The program can be downloaded for Windows and Linux here. The source code is open, and can be found here.