Processing geo information in Wikipedia articles

by Till Nagel.

This tutorial explains how to look up and geocode places in a Wikipedia article and visualize them with the help of Processing. It gives an overview of related topics, shows how to display place markers on a map, and provides multiple Processing examples.

This brief article and the examples were created for the GeoWiki workshop I gave at the Interface Design department of FH Potsdam.

Placemaker

A gazetteer is a geographical dictionary or directory, a reference for information about places and place names.

Yahoo Placemaker is a “geoparsing web service” that enriches content with geographic metadata by extracting places from unstructured text. It finds, identifies, and disambiguates place names in textual content and returns each place with its geo-location and additional information, such as its type (e.g. country or town). It recognizes different names for the same place (e.g. “New York” and “NYC”), as well as multilingual references (e.g. “München” and “Munich”).

YQL

To use Placemaker, you either need an App ID to query the Placemaker API directly, or you can go through YQL. YQL is a query language for retrieving and manipulating data from various web services, which makes it convenient for mashups. It simplifies access to diverse APIs by unifying and connecting their interfaces. YQL is influenced by SQL but diverges from it in that it provides specialized methods to query, filter, and join data across web services.

The YQL console is very helpful for testing YQL queries live. Here you can experiment with all the web services, execute queries, and directly retrieve the results.

YQL Console

You can enter YQL into the text field (1); after executing the query, you will see the original results as XML or JSON (2). On the right side (3) you find all enabled web services you can integrate. For many purposes the data category may be of interest; there you’ll find methods to access data from the web, such as HTML pages or RSS feeds.

Using Placemaker via YQL

Let’s take a look at a simple place extraction query: The YQL below utilizes the Placemaker API to analyze the given text in documentContent.

SELECT * FROM geo.placemaker
	WHERE documentContent = "Walter Gropius was born in Berlin."
	AND documentType="text/plain"


This query returns an XML result with all recognized places. In this example, “Berlin” is the only place found in the text snippet.

<place xmlns="http://wherein.yahooapis.com/v1/schema">
	<woeId>638242</woeId>
	<type>Town</type>
	<name><![CDATA[Berlin, Berlin, DE]]></name>
	<centroid>
		<latitude>52.5161</latitude>
		<longitude>13.377</longitude>
	</centroid>
</place>

The returned place consists of the geo-position as well as further data elements. The woeId is a unique identifier for that place, while the type indicates which kind of physical place this data object describes. The name holds the canonical English name.
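To make the field description more concrete, here is a minimal sketch in plain Java (the language Processing is built on) that reads the fields of the place element shown above by tag name, using the DOM parser from the Java standard library. The class names PlaceParser and Place are hypothetical helpers made up for this illustration; they are not part of Placemaker or Processing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PlaceParser {

  // Hypothetical holder for the fields of one <place> element.
  static class Place {
    final long woeId;
    final String type;
    final String name;
    final float lat;
    final float lng;

    Place(long woeId, String type, String name, float lat, float lng) {
      this.woeId = woeId;
      this.type = type;
      this.name = name;
      this.lat = lat;
      this.lng = lng;
    }
  }

  // Trimmed text of the first descendant element with the given tag name.
  static String text(Element parent, String tag) {
    return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
  }

  // Parses an XML string whose root element is a single <place>.
  static Place parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Element place = doc.getDocumentElement();
      return new Place(
          Long.parseLong(text(place, "woeId")),
          text(place, "type"),
          text(place, "name"),
          Float.parseFloat(text(place, "latitude")),
          Float.parseFloat(text(place, "longitude")));
    } catch (Exception e) {
      throw new IllegalArgumentException("Invalid place XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<place xmlns=\"http://wherein.yahooapis.com/v1/schema\">"
        + "<woeId>638242</woeId><type>Town</type>"
        + "<name><![CDATA[Berlin, Berlin, DE]]></name>"
        + "<centroid><latitude>52.5161</latitude>"
        + "<longitude>13.377</longitude></centroid></place>";
    Place p = parse(xml);
    System.out.println(p.woeId + " " + p.type + " " + p.name
        + " (" + p.lat + ", " + p.lng + ")");
  }
}
```

Looking fields up by tag name keeps the code independent of the order in which the child elements appear.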

Extracting places from web pages

With YQL you can not only access various web services, but also query data from different sources. The following query returns the complete HTML body of a given page.

SELECT * FROM html
	WHERE url="http://en.wikipedia.org/wiki/Walter_Gropius"

As we want to extract places from Wikipedia articles we need to combine the content of the Wikipedia web page with Placemaker:

SELECT * FROM geo.placemaker
	WHERE documentURL="http://en.wikipedia.org/wiki/Walter_Gropius"
	AND documentType="text/html"

This returns all places found in the article, along with their geo-positions. A small excerpt of the result looks like this:

<matches>
	<match>
		<place xmlns="http://wherein.yahooapis.com/v1/schema">
			<woeId>2515601</woeId>
			<type>Town</type>
			<name><![CDATA[Wayland, MA, US]]></name>
			<centroid>
				<latitude>42.3632</latitude>
				<longitude>-71.3604</longitude>
			</centroid>
		</place>
		...
	</match>
	<match>
		<place xmlns="http://wherein.yahooapis.com/v1/schema">
			<woeId>654369</woeId>
			<type>Suburb</type>
			<name><![CDATA[Gropius-Stadt, Berlin, Berlin, DE]]></name>
			<centroid>
				<latitude>52.4288</latitude>
				<longitude>13.4529</longitude>
			</centroid>
		</place>
		...
	</match>
	...
</matches>

So, now that we have got access to the places mentioned in a Wikipedia article, let’s use them in a Processing sketch.

Reading XML in Processing

To read and use the geo-positions, we need to send the YQL query and parse the resulting XML. In Processing this is very easy to do: simply create a new XMLElement with the URL of the web service to use. After that, walk through the returned XML to read the result values of the called API method. (XMLElement belongs to Processing 1.x; Processing 2.0 and later provide the XML class and loadXML() instead.)
See the paragraph on XML reading in the RSS tutorial for further information.

Call YQL and count place matches

Let’s create a new XMLElement and provide a URL in the constructor, and count and print the number of found places. The restUrl used in the example is the REST service with a YQL query, where the parameters are URL encoded, e.g.
http://query.yahooapis.com/v1/public/yql?q=SELECT%20*%20FROM%20geo.placemaker%0A%09%09WHERE%20documentURL%3D%22http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FWalter_Gropius%22%0A%09%09AND%20documentType%3D%22text%2Fhtml%22&format=xml
Simply copy it from the “REST query” text field of the YQL console.
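If you would rather build the URL in code than copy it from the console, the encoding step can be sketched in plain Java as follows. The endpoint and the format=xml parameter are taken from the URL above; the class name YqlRestUrl and the method build are hypothetical names for this illustration. URLEncoder writes spaces as +, so the sketch replaces them with %20 to match the form shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class YqlRestUrl {

  // Public YQL endpoint, as it appears in the REST URL above.
  static final String ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

  // URL-encodes a YQL query and appends it as the q parameter.
  static String build(String yql) {
    try {
      // URLEncoder produces form encoding: spaces become '+'.
      // A literal '+' in the query is already escaped as %2B,
      // so replacing '+' with %20 afterwards is safe.
      String encoded = URLEncoder.encode(yql, "UTF-8").replace("+", "%20");
      return ENDPOINT + "?q=" + encoded + "&format=xml";
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    String yql = "SELECT * FROM geo.placemaker"
        + " WHERE documentURL=\"http://en.wikipedia.org/wiki/Walter_Gropius\""
        + " AND documentType=\"text/html\"";
    System.out.println(build(yql));
  }
}
```

The resulting string can then be used as the restUrl in the sketch.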

Once we have the xmlResponse, we walk through the XML structure down to the elements we want to use.

XMLElement xmlResponse = new XMLElement(this, restUrl);
XMLElement[] placeXMLElements = xmlResponse.getChildren("results/matches/match/place");
println("Found " + placeXMLElements.length + " places");

In this example we get the place of every match with xmlResponse.getChildren(path). The path parameter specifies which elements to return as an array. The hierarchical, XPath-like path results/matches/match/place selects all place elements, starting from the children of the root element.
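To illustrate what such a path selection does, the following plain-Java sketch follows a slash-separated path one level at a time over a DOM tree. It is not how Processing implements getChildren(path); PathSelect and its methods are hypothetical names, and the query root element in the sample is an assumption about the shape of the response.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class PathSelect {

  // Direct child elements of parent that carry the given tag name.
  static List<Element> childrenNamed(Element parent, String tag) {
    List<Element> result = new ArrayList<>();
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
        result.add((Element) n);
      }
    }
    return result;
  }

  // Follows the path segment by segment, starting at the children of root.
  static List<Element> select(Element root, String path) {
    List<Element> current = new ArrayList<>();
    current.add(root);
    for (String segment : path.split("/")) {
      List<Element> next = new ArrayList<>();
      for (Element e : current) {
        next.addAll(childrenNamed(e, segment));
      }
      current = next;
    }
    return current;
  }

  // Parses an XML string and returns its root element.
  static Element parseRoot(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
    } catch (Exception e) {
      throw new IllegalArgumentException("Invalid XML", e);
    }
  }

  public static void main(String[] args) {
    // Shortened response; the <query> root element is an assumption.
    String xml = "<query><results><matches>"
        + "<match><place><name>Wayland, MA, US</name></place></match>"
        + "<match><place><name>Gropius-Stadt, Berlin, Berlin, DE</name></place></match>"
        + "</matches></results></query>";
    List<Element> places = select(parseRoot(xml), "results/matches/match/place");
    System.out.println("Found " + places.size() + " places");
  }
}
```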

Read name and geo position

We can iterate over all the places and access their names and geo-positions. See the section on the Place XML structure for a more detailed field description. In the following example, the elements are accessed via the getChild(index) method, so the code depends on the order of the child elements: name is the third child of place (index 2) and centroid the fourth (index 3).
Note that the latitude and longitude values need to be converted to float before use.

for (int i = 0; i < placeXMLElements.length; i++) {
	// Children of place: 0 = woeId, 1 = type, 2 = name, 3 = centroid
	String name = placeXMLElements[i].getChild(2).getContent();
	// Children of centroid: 0 = latitude, 1 = longitude
	float lat = Float.parseFloat(placeXMLElements[i].getChild(3).getChild(0).getContent());
	float lng = Float.parseFloat(placeXMLElements[i].getChild(3).getChild(1).getContent());

	println(i + ". " + name + " (" + lat + ", " + lng + ")");
}

This example prints all names and positions of the places in the XML.

Visualizing places

To draw the geo-positions, given as latitude and longitude, onto the Processing canvas, they have to be projected onto a planar surface. There are various map projections for different uses, each with specific advantages and disadvantages. You can find a neat Projection reference with an overview of map projections and their applications over at Radical Cartography.

Different map projections: Mercator, Goode Homolosine, Gall-Peters (left to right)

Draw positions in Processing

The equirectangular projection is a very simple map projection, used mostly for thematic maps and geovisualizations. The geo-coordinates can be mapped directly onto the Cartesian coordinate system: longitude maps linearly to x and latitude to y.

float x = map(longitude, -180, 180, 0, width);
float y = map(latitude, 90, -90, 0, height);

Now you can visually represent every place by using whatever Processing drawing methods you like.
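As a worked example, the two map() calls above can be reproduced in plain Java. The sketch below reimplements the linear remapping that Processing's map() function performs and projects the Berlin centroid from the Placemaker result onto an assumed 800 × 400 pixel canvas. The class name Equirectangular and the method project are hypothetical names for this illustration.

```java
class Equirectangular {

  // Linear remapping of value from [start1, stop1] to [start2, stop2],
  // the same calculation Processing's map() function performs.
  static float map(float value, float start1, float stop1,
                   float start2, float stop2) {
    return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
  }

  // Projects a geo-position onto a canvas of the given size.
  // Returns {x, y} in pixels, with the origin in the top-left corner.
  static float[] project(float lat, float lng, float width, float height) {
    float x = map(lng, -180, 180, 0, width);
    // Latitude runs from 90 (top) to -90 (bottom), hence the flipped range.
    float y = map(lat, 90, -90, 0, height);
    return new float[] { x, y };
  }

  public static void main(String[] args) {
    // Berlin centroid from the Placemaker result, on an 800 x 400 canvas.
    float[] berlin = project(52.5161f, 13.377f, 800, 400);
    System.out.println("x = " + berlin[0] + ", y = " + berlin[1]);
  }
}
```

With these values Berlin lands at roughly x ≈ 429.7 and y ≈ 83.3, slightly right of the horizontal center and in the upper quarter of the canvas. A canvas with a 2:1 aspect ratio keeps degrees of longitude and latitude at the same scale.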

Background world map

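Displaying the positions on a blank canvas alone may not be sufficient to understand the geospatial relations. One of the simpler approaches is to display a map in the background. In Processing, just load a map image and draw it before drawing the place markers. For the markers to line up, the image has to use the same projection as the markers (here equirectangular) and cover the whole canvas.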

PlaceMarker class

Below is a small class for the place markers. It can be used as a basis for your own geo visualizations.

class PlaceMarker {

  String name;
  float lat;
  float lng;

  PlaceMarker(String name, float lat, float lng) {
    this.name = name;
    this.lat = lat;
    this.lng = lng;
  }

  void display() {
    // Equirectangular projection
    float x = map(lng, -180, 180, 0, width);
    float y = map(lat, 90, -90, 0, height);

    noStroke();
    fill(255, 0, 0, 50);
    ellipse(x, y, 15, 15);
    fill(255, 0, 0, 200);
    ellipse(x, y, 5, 5);
  }

  String toString() {
    return name + " (" + lat + ", " + lng + ")";
  }

}

5 comments on ‘Processing geo information in Wikipedia articles’

  1. [...] Geodata and more with Yahoo Query Language. March 2, 2012, Mag. Thomas Koberger. This is what we have always wanted: combining geodata with arbitrary search terms, and much more. A service by Yahoo, namely YQL (Yahoo Query Language), makes it possible. It is a query language modeled on the syntax of SQL that can by now query a great many databases. A good English-language explanation of YQL is available from Till Nagel. [...]

  2. [...] this projection, Till Nagel has written his own PlaceMarker class, which I modified slightly for my project. [...]

  3. mcmoe says:

    Hi Till,

    the yql query to parse locations from a website does not return any matches:
    SELECT * FROM geo.placemaker
    WHERE documentURL="http://en.wikipedia.org/wiki/Walter_Gropius"
    AND documentType="text/html"

    Any idea why?

    The queries on text work just fine though.

    I tried searching the web – nothing much on the topic. Could it be that it is now part of the Yahoo BOSS premium service?

    Great work by the way, on this and your other processing (and non-processing :) ) articles!

    Thanks,
    Moe

  4. PlaceMaker became PlaceSpotter in 2012. The Yahoo! BOSS APIs (including PlaceSpotter) were permanently discontinued on March 31, 2016. If anyone is looking for a replacement or alternative, we recently launched Geoparser.io — it’s free to use during the public beta period and you can request an API key through the website: https://geoparser.io/
