Google Books and its metadata
From "Google Books: A Metadata Train Wreck":
My presentation focussed on GB’s metadata — a feature absolutely necessary to doing most serious scholarly work with the corpus. It’s well and good to use the corpus just for finding information on a topic — entering some key words and barrelling in sideways. (That’s what «googling» means, isn’t it?) But for scholars looking for a particular edition of Leaves of Grass, say, it doesn’t do a lot of good just to enter «I contain multitudes» in the search box and hope for the best. Ditto for someone who wants to look at early-19th century French editions of Le Contrat Social, or to linguists, historians or literary scholars trying to trace the development of words or constructions: Can we observe the way happiness replaced felicity in the seventeenth century, as Keith Thomas suggests? When did «the United States are» start to lose ground to «the United States is»? How did the use of propaganda rise and fall by decade over the course of the twentieth century? And so on for all the questions that have made Google Books such an exciting prospect for all of us wordinistas and wordastri. But to answer those questions you need good metadata. And Google’s are a train wreck: a mish-mash wrapped in a muddle wrapped in a mess.