Open Sourcing The UN Data API

Link to repository: https://github.com/3scale/un_data_api
Link to wiki: https://github.com/3scale/un_data_api/wiki
Sign up for the API: https://www.undata-api.org/

We are excited to announce that we have open-sourced our UN Data API.

We built this API for a few reasons. First, we wanted to make the data more easily accessible: the UN has a lot of interesting and useful data, and we thought developers should be able to access it (read on for more detail on what’s included in the data). It was also an opportunity to provide an example implementation of a RESTful API for those interested in learning more. We hope that putting the code out there will help others run a similar service, and that they will give us feedback on how to improve it.

We have written a detailed wiki on how to set up the API and get it running yourself. It also includes directions on how to use the giant XML parser we built into the project to parse databases.

So why did we choose data.un.org?

The UN Data site has a large database aggregating data from a multitude of important organizations throughout the world, including the World Health Organization, the International Telecommunication Union, and the United Nations Statistics Division. The problem is that you could only access the data by downloading files in XML, CSV, and a few other formats. That makes analysis difficult: you’d have to download all the files and parse them into your own database first. Additionally, the country names and other information were not consistent between the different organizations.

The data on the UN Data site has great value, but it needed to be easy for developers to get to. Having all the information normalized by country would also eventually make it easier to query across different organizations’ databases. Our first goal was to get as much of the data as possible into our database, and then to make more complex queries available to the user. It was important to us to store the data with a minimal amount of manipulation and duplication.

Methodology

The first step was to choose a framework. I am a Ruby developer, so I opted for Ruby on Rails with the Rails-API gem, which generates the project without the front-end portion.

Next was to decide which database to use. We chose MongoDB. Here’s why:

  1. It has a schema-less design. Most of the source databases have different attributes on their data, which poses an issue when downloading large amounts of data from different databases with widely varying field names. In the long run we did not want to write a new schema each time a new database was added. With Mongo it’s easy to detect additional attributes and add them to a record.
  2. The database would be used mainly for reads; writes happen only when new source databases are added.
  3. It can be optimized to extract large static chunks of data at once.
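
To illustrate the first point, here is a minimal sketch of detecting attributes that have not been seen before as records arrive from different sources. It uses plain Ruby hashes as stand-ins for schema-less documents and makes no MongoDB or MongoMapper calls; the sample records and the new_attributes helper are hypothetical and are not taken from the project.

```ruby
require "set"

# Hypothetical rows from two source databases with different field names.
SAMPLE_RECORDS = [
  { "country" => "France", "year" => 2010, "value" => 81.7 },
  { "country" => "France", "year" => 2010, "value" => 77.3,
    "unit" => "percent", "gender" => "Male" }
].freeze

# Returns the attribute names in `record` that are not yet in `known`.
def new_attributes(known, record)
  record.keys.reject { |key| known.include?(key) }
end

known = Set.new
SAMPLE_RECORDS.each do |record|
  added = new_attributes(known, record)
  known.merge(added) # remember every attribute seen so far
  puts "new attributes: #{added.join(', ')}" unless added.empty?
end
```

With a fixed relational schema, each newly seen attribute would mean a migration; in a document store the record can simply carry the extra keys.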

We also decided to use an ORM, MongoMapper, to help keep the codebase readable and to implement validations.

The API Design

Because it’s important that the API be as easy to use as possible, we decided to go with an intuitive RESTful API design. The main URL format is Organization/Database/Dataset/Country/Records. It serves JSON by default; if you prefer, you can specify XML as the format instead.
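
As a rough sketch of how a client might assemble a path in that format, the snippet below joins the five parts and percent-encodes each one. Everything specific here is an assumption for illustration: the base URL is a placeholder, the literal "records" final segment and the ".xml" suffix for choosing a format are guesses at the convention, and the sample organization, database, and dataset names are made up. The wiki is the place to check the real routes.

```ruby
require "uri"

# Placeholder host, not the API's real address.
BASE_URL = "https://api.example.org"

# Percent-encodes one path segment (spaces become %20 rather than "+").
def escape_segment(segment)
  URI.encode_www_form_component(segment).gsub("+", "%20")
end

# Builds Organization/Database/Dataset/Country/records, with an optional
# format suffix (assumed convention, e.g. "xml").
def records_path(organization, database, dataset, country, format: nil)
  segments = [organization, database, dataset, country, "records"]
  path = "/" + segments.map { |s| escape_segment(s) }.join("/")
  format ? "#{path}.#{format}" : path
end

puts BASE_URL + records_path("WHO", "Life expectancy", "Both sexes", "France")
puts BASE_URL + records_path("WHO", "Life expectancy", "Both sexes", "France", format: "xml")
```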

The hardest part of the project was normalizing all of the country names. In the end we had to make a decision on how to present all of that information. To solve the problem, we matched up the countries across databases and added an attribute to each record that stores the name of the country as it was presented in its original database.
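
The idea can be sketched roughly as follows, assuming a simple lookup table from variant names to one shared name. The alias table, the attribute name original_country_name, and the normalize_country helper are all hypothetical; the project's actual mapping and field names may differ.

```ruby
# Hypothetical alias table: lowercased variant name => shared name.
COUNTRY_ALIASES = {
  "united states of america" => "United States",
  "usa" => "United States",
  "viet nam" => "Vietnam"
}.freeze

# Returns a copy of the record with a normalized "country" value and the
# source spelling preserved under "original_country_name".
def normalize_country(record)
  original = record.fetch("country")
  key = original.strip.downcase
  shared_name = COUNTRY_ALIASES.fetch(key, original.strip)
  record.merge("country" => shared_name, "original_country_name" => original)
end

p normalize_country({ "country" => "Viet Nam", "year" => 2010 })
```

Keeping the original spelling on the record means nothing from the source is lost, while queries across organizations can rely on the shared name.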

Implementing 3SCALE

This API does all of its authentication using 3SCALE’s API Management system. 3SCALE offers services to authenticate, monitor, and rate limit your API. There are two main options for implementing the API management system. The first is a plugin, a language-specific code package that you add directly to your code. The second is a proxy. The UN Data API uses the plugin option.

Check out the gem for the 3SCALE plugin.
Check out the UN Data API wiki to see how we implemented 3scale.

Check out our project! https://github.com/3scale/un_data_api  We hope you find it as interesting as we did, and as full of possibility. Thank you!