Part of a project proposal I’m currently working on is the network
(re)construction of geographical locations. For a first experiment I used a
publicly available data set provided by Flickr that
contains 100 million photos of which 49 million are geotagged. It can be found
here.
The dataset consists of 10 TSV files, each holding the information (id, geotag,
user, etc.) of 10 million photos.
I used Apache Spark to preprocess these
files and converted them into a GraphX graph.
An edge between two locations exists if at least two people visited both and
the locations are less than 10 degrees apart.
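The edge rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Spark/GraphX code used for the experiment: it assumes a "location" is a (lat, lon) pair that has already been snapped to some grid, and that "degrees apart" means the great-circle angle between two points. The names `angular_distance_deg` and `build_edges` are made up for this example.

```python
import math
from collections import defaultdict
from itertools import combinations


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    # Haversine form, numerically stable for small distances.
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_edges(visits, min_users=2, max_degrees=10.0):
    """visits: iterable of (user, (lat, lon)) pairs, one per geotagged photo.

    Returns {(loc_a, loc_b): number of users who visited both locations}.
    """
    # Collect the distinct locations each user has photographed.
    locations_by_user = defaultdict(set)
    for user, location in visits:
        locations_by_user[user].add(location)

    # Count, for every pair of locations, how many users visited both.
    shared_users = defaultdict(int)
    for locations in locations_by_user.values():
        # Sorting gives every pair a canonical (a, b) order.
        for a, b in combinations(sorted(locations), 2):
            shared_users[(a, b)] += 1

    # Keep only pairs that satisfy both conditions of the edge rule.
    return {
        pair: count
        for pair, count in shared_users.items()
        if count >= min_users and angular_distance_deg(*pair) < max_degrees
    }
```

Note that enumerating all location pairs per user is quadratic in the number of places a user visited, which is one reason a distributed framework is attractive for the full dataset.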
Using Spark, the resulting vertices and edges were written to disk and
transformed into a JSON object holding these values. Using
D3.js (world rendering, top) and Processing.js (acyclic BFS rendering with
Vienna as the starting point, bottom), this JSON object was loaded and the network
drawn on a Miller and an azimuthal projection, respectively.
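As a rough idea of what the exported JSON object might look like, here is a minimal Python sketch. The field names (`vertices`, `edges`, `id`, `lat`, `lon`, `count`, `source`, `target`, `weight`) and the function `graph_to_json` are assumptions chosen for this example; the actual layout of the file is not specified here.

```python
import json


def graph_to_json(vertices, edges):
    """Serialise a location graph to a JSON string.

    vertices: {(lat, lon): photo_count}
    edges:    {((lat, lon), (lat, lon)): weight}
    """
    # Give every location a small integer id so edges can refer to it compactly.
    ordered = sorted(vertices)
    index = {location: i for i, location in enumerate(ordered)}
    return json.dumps({
        "vertices": [
            {"id": index[loc], "lat": loc[0], "lon": loc[1], "count": vertices[loc]}
            for loc in ordered
        ],
        "edges": [
            {"source": index[a], "target": index[b], "weight": weight}
            for (a, b), weight in sorted(edges.items())
        ],
    })
```

Referring to vertices by index rather than repeating coordinates keeps the file smaller, which matters once the graph reaches millions of edges.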
Vertices were drawn as translucent circles whose overlap creates different
color intensities on the map according to the popularity of the location. Edges
were drawn as translucent arcs with weighted line width, so that overlapping arcs
highlight highly connected vertices. The final renderings are based on ~170,000
vertices and over 4 million edges.
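The acyclic BFS rendering mentioned above only needs a tree: each vertex is attached to the vertex from which a breadth-first search first reached it. A minimal Python sketch of that idea follows. It is not the Processing.js code behind the rendering, and the name `bfs_tree` and the string vertex labels in the usage example are purely illustrative.

```python
from collections import deque


def bfs_tree(edges, start):
    """Return {vertex: parent} for every vertex reachable from start.

    edges: iterable of undirected (a, b) pairs. The start vertex maps to None.
    """
    # Build an adjacency list from the undirected edge list.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in parent:  # first visit wins, so no cycles are kept
                parent[nxt] = current
                queue.append(nxt)
    return parent
```

For example, `bfs_tree([("Vienna", "Budapest"), ("Vienna", "Prague"), ("Budapest", "Prague")], "Vienna")` attaches both Budapest and Prague directly to Vienna and drops the Budapest to Prague edge, since it would close a cycle.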
Do you have a question or did you find an issue?
Then please head over to GitHub and open an
issue!