In November 2009 I had the wonderful opportunity of taking part in Visualizar ’09, a workshop on data visualization organized in Madrid, Spain, by Medialab Prado.
This year’s idea was to work with data produced by governments and public institutions. My proposal was to visualize the gigantic (and fascinating) body of data on US patent registrations over the last few decades. It was selected, and a few weeks later I was flying over the ocean for my first visit to Madrid in almost two decades.
The workshop ran over two intense and very productive weeks, helped by the support of some excellent teachers: Ben Cerveny, Aaron Koblin, Manuel Lima, Andres Ortiz and Santiago Ortiz. There were 10 projects participating: both the ideas and the groups that carried them as far as they could were very diverse, and I think the exchange was enriching for everyone. You can see a summary of the projects at the Medialab Prado site, or in this post at Information Aesthetics.
The experience took place not only at the facilities of Medialab Prado, but also during lunch time and every night at the countless charming little cafes and restaurants of Madrid, where many new friendships were born. Networking is without a doubt one of the great things about participating in events like this. The Internet does not beat physical presence when it comes to (really) getting to know somebody new.
My project had the invaluable assistance of two collaborators, Javier Tardáguila and Alberto Labarga, both of them Spanish. We managed to develop three visualizations over the two-week span. There are more ideas: patent registrations are incredibly rich material, a source of very detailed data on the process of technological innovation in the USA and the world at large, and on its ties to universities, companies, and other institutions involved in that process. Still, I’m happy we reached the point of offering different perspectives on the data, and I expect to have the chance to carry the project further in the near future.

The first visualization is a classic streamgraph displaying the number of patent registrations over time. It is intended as a first approach to aggregated numbers that might reveal broad trends over time. A dropdown menu lets you choose how the graph is segmented: by technological category, country of origin, or US state. Hovering with the mouse shows the specific number of registrations for a given year and category, and clicking brings up a list of the 10 most cited patents among them.
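To make the aggregation behind such a graph concrete, here is a minimal Python sketch. It is not the project's actual code: the sample records, the field layout and the function names `stream_layers` and `top_cited` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample records (not real patent data):
# (patent_id, grant_year, category, citations_received)
PATENTS = [
    ("P1", 1990, "Chemical", 12),
    ("P2", 1990, "Computers", 40),
    ("P3", 1991, "Computers", 7),
    ("P4", 1991, "Computers", 25),
    ("P5", 1991, "Chemical", 3),
]


def stream_layers(patents):
    """Count registrations per category and year: one graph layer per category."""
    layers = defaultdict(Counter)
    for _pid, year, category, _cites in patents:
        layers[category][year] += 1
    return layers


def top_cited(patents, year, category, n=10):
    """Return the n most cited patents for one year and category (the click action)."""
    hits = [p for p in patents if p[1] == year and p[2] == category]
    return sorted(hits, key=lambda p: p[3], reverse=True)[:n]


layers = stream_layers(PATENTS)
print(layers["Computers"][1991])
print([p[0] for p in top_cited(PATENTS, 1991, "Computers")])
```

Segmenting by country or US state instead would only change which field is used as the layer key.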

The second visualization dives into one of the most interesting features of patent registrations: the citations of prior art that every registrant is required to declare. Citations give us an explicit network graph that we can navigate to track influences among inventions, and an implicit measure of relevance, since the most cited patents tend to correspond to the most important inventions, commercially and scientifically.
Here, each patent is represented by a circle, and both citing and cited patents are shown on a timeline. You can click on any of them to unfold its network, and thus navigate the graph as far as you wish.
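The citation network described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the link list is made up, and `build_graph`, `unfold` and `most_cited` are hypothetical names, not part of the project.

```python
from collections import defaultdict

# Hypothetical citation links: (citing_patent, cited_patent)
CITATIONS = [("C", "A"), ("C", "B"), ("D", "A"), ("E", "C"), ("E", "A")]


def build_graph(links):
    """Index the links in both directions so either side can be looked up."""
    cites = defaultdict(set)     # patent -> prior art it cites
    cited_by = defaultdict(set)  # patent -> later patents citing it
    for citing, cited in links:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def unfold(patent, cites, cited_by):
    """Neighbours revealed when a circle is clicked: (cited, citing) patents."""
    return sorted(cites.get(patent, ())), sorted(cited_by.get(patent, ()))


def most_cited(cited_by, n=10):
    """Rank patents by number of incoming citations, ties broken by id."""
    ranked = sorted(cited_by.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(patent, len(citers)) for patent, citers in ranked[:n]]


cites, cited_by = build_graph(CITATIONS)
print(unfold("C", cites, cited_by))
print(most_cited(cited_by, n=2))
```

Calling `unfold` again on any returned patent is what lets you keep navigating the graph step by step.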

The third visualization uses text analysis tools to extract the five most relevant words from each patent, and uses them to build a conceptual map of the invention universe. The time dimension is dropped here, to focus instead on the most frequent words and their relative distance in an abstract “thought space”. The map is pre-calculated and static, but you can use Google Maps-like tools to zoom and pan around it. Clicking any of the words brings up a list of the 10 most cited patents associated with it.
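One common way to pick the "most relevant" words of a text is TF-IDF scoring. The sketch below assumes that technique purely for illustration; the text analysis tools actually used are not described here, and the stopword list, the sample texts and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, far from complete)
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_words(docs, k=5):
    """Return the k highest-scoring TF-IDF words for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word appears
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    n_docs = len(docs)
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            w: (count / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w, count in tf.items()
        }
        # Highest score first, alphabetical order on ties
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


docs = ["laser laser cutting of metal", "metal alloy casting"]
print(top_words(docs, k=2))
```

Words that appear in every document score zero, which is why a term shared by all patents would never make it onto the map under this scheme.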
The main issue we had to deal with during development, and the one I learned most from technically, was database optimization. Even with a body of data that for the moment only extends to 1999, we are dealing with more than 3 million patents and 16 million citation links. That makes for huge tables, and for queries that might take full minutes to complete if you are not careful. Huge, at least, compared to anything I have done until now. I suppose this would be a bad joke for Google, but hey, I don’t have their server farms.
That’s the reason why this is still not online: the database still needs some massaging and a few wrinkles in the visualizations need to be ironed out. But I hope it will be available soon for anyone to play with. Watch out for the update.
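As a small illustration of the kind of database tuning mentioned above, the sketch below shows how an index changes the plan for a "how many times is this patent cited?" query. Everything in it is an assumption: it uses SQLite through Python's standard `sqlite3` module, and the table, column and index names are made up, since the project's real database engine and schema are not described here.

```python
import sqlite3

# In-memory database with a hypothetical citations table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citations (citing INTEGER, cited INTEGER)")
conn.executemany(
    "INSERT INTO citations VALUES (?, ?)",
    [(3, 1), (3, 2), (4, 1), (5, 3), (5, 1)],
)

QUERY = "SELECT COUNT(*) FROM citations WHERE cited = ?"


def plan(connection, sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index, SQLite has to scan the whole table
plan_before = plan(conn, QUERY, (1,))

# With an index on the filtered column, it can search the index instead
conn.execute("CREATE INDEX idx_citations_cited ON citations (cited)")
plan_after = plan(conn, QUERY, (1,))

times_cited = conn.execute(QUERY, (1,)).fetchone()[0]
print(plan_before)
print(plan_after)
print(times_cited)
```

On a five-row table the difference is invisible, but on millions of citation links a full scan per lookup is the sort of thing that turns a query into a wait of minutes.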