Announcing the NYPL Digital Collections API
The New York Public Library is pleased to announce the release of its Digital Collections API (application programming interface). This tool allows software developers both inside and outside the library to write programs that search our digital collections, process the descriptions of each object, and find links to the relevant pages on the NYPL Digital Gallery. We are very excited to see what the brilliant developers who use our digital library will create. In the following post, Doug Reside, Digital Curator for the Performing Arts, reflects on the importance of APIs in our age of digital information.
It is now April, when, as both Chaucer and T.S. Eliot observed, small roots shoot up from the ground signaling new beginnings. Twenty years ago this month, the European Organization for Nuclear Research (known commonly by its French-based acronym, CERN) decided to make the technology that powers the World Wide Web free for anyone to use — a move towards openness that led, in just two decades, to an explosion of innovation and unprecedented access to information.
Ten years later, in April of 2003, a government-sponsored project to map the entirety of the human genome was declared complete. This monumental accomplishment was the result of a worldwide collaboration among researchers who contributed their data to a common pool. Although some private companies attempted to patent their own contributions, President Bill Clinton declared in 2000 that the project would "continue its longstanding practice of making all of its sequencing data available to public and privately funded researchers worldwide at no cost." The potential, yet unimagined, uses for this data were felt to be too important to be stalled by limiting innovation to a few companies. The small green shoots of innovations in medicine and biotechnology are even now beginning to emerge from the seeds of this decision.
In theory, the free and open standards of the web should allow data sources like the Human Genome Project to be easily combined with others and enable new discoveries, but in the early days of the Internet many important data sources remained isolated from each other. What T.S. Eliot wrote in The Waste Land nearly a century ago is an apt description of the situation:
What are the roots that clutch, what branches grow
Out of this stony rubbish? Son of man,
You cannot say, or guess, for you know only
A heap of broken images[...]
To make sense of these scattered pockets of data, some programmers designed APIs (application programming interfaces) to make their information more usable. An API is a set of commands that computer programmers expose to the world to allow other programmers to perform an action on their systems (often to retrieve data). Programmers use APIs to take scattered data sets (a heap of broken images?) and combine them to create new knowledge. For instance, if you've ever seen a webpage that plotted listings (such as job openings or real estate) on a Google Map, the programmers probably used the Google Maps API.
Today, in the spirit of the seedlings of openness that sprouted in past Aprils, I am very pleased to announce the first release of the New York Public Library Digital Collections API. This API, built by developers in our IT Group, allows computers to search our digital library and get back information about the objects along with links to the relevant Digital Gallery page. Of course, as a human, you can already do that using the Digital Gallery itself, but you can only perform one search at a time. If you wanted to make a chart of, say, the most commonly occurring words in the titles of the Mid-Manhattan Picture Collection, it would take a while. Now that the API makes this data available to computer programs, though, it wouldn't take a great deal of coding to generate such a chart (I'll leave that as a challenge to you hackers out there... post your solutions in comments).
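For anyone who wants a head start on that challenge, here is a minimal Python sketch of the counting step only. It assumes the titles have already been retrieved from the API and collected into a list of strings; this post does not describe the API's endpoints or response format, so the retrieval step is left out. The function name, the stop-word list, and the sample titles are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not part of the API); adjust to taste.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on", "at", "to", "for", "with", "by"}


def most_common_title_words(titles, top_n=10):
    """Return the top_n most frequent words across titles as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and treat runs of letters/apostrophes as words.
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample titles standing in for real API results.
    sample_titles = [
        "View of the Brooklyn Bridge",
        "Brooklyn Bridge at night",
        "Bridge over the Harlem River",
    ]
    # Print a crude text bar chart, one row per word.
    for word, count in most_common_title_words(sample_titles, top_n=3):
        print(f"{word:<10} {'#' * count} ({count})")
```

Swapping the sample list for titles pulled from a real search would be the remaining work, and the text bars could be replaced with any charting library you prefer.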