This blog posting, “Google Books: What’s Not to Like,” on the American Historical Association’s website is quite intriguing. Earlier this year I posted about one blogger’s extremely positive experience finding information via Google Books. This one is quite different. And because diversity of thought is so important in forming one’s own opinions, I think it is worth reading through this post and comparing it to the one that I wrote about earlier.
The author of this post, Robert Townsend, describes his recent work with Google Books, which he did not find a pleasant experience. Here’s a great synopsis from the blog post:
“Over the past three months I spent a fair amount of time on the site as part of a research project on the early history of the profession, and from a researcher’s point of view I have to say the results were deeply disconcerting. Yes, the site offers up a number of hard-to-find works from the early 20th century with instant access to the text. And yes, for some books it offers a useful keyword search function for finding a reference that might not be in the index. But my experience suggests the project is falling far short of its central promise of exposing the literature of the world, and is instead piling mistake upon mistake with little evidence of basic quality control. The problems I encountered fit into three broad categories—the quality of the scans is decidedly mixed, the information about the books (the “metadata” in info-speak) is often erroneous, and the public domain is curiously restricted.”
His critique seems like quite a good one. This is something that I’ve never given proper thought to. What if the scans are not of good quality? Who is checking them to make sure they are readable? What about pages getting mixed up and scanned out of order? It’s great to hear stories of Google scanning 30,000 books in a week, but that means there could be countless errors like the ones Townsend describes in just those 30,000 books. Nobody really knows where Google is doing the scanning, since the company keeps it secret, and nobody outside probably knows who is working with the books either. I wonder if it is similar to what I saw at the ACRL conference? With minimal human interaction with the book, I can see how this could be a problem. But if they’re scanning as much as has been claimed, it makes sense that there wouldn’t be a human component to scanning a book other than setting it in place.
His second problem with Google Books is also a very valid point. However, I think the problem he’s finding, metadata that doesn’t match the actual document, is a problem not just for Google Books but for libraries as well. True, it doesn’t make a whole lot of sense that the year should be wrong, but when journals switch titles back and forth (a great example is Atlantic Monthly, which switches its name to and from Atlantic what seems like every 10 years), it becomes difficult to know what the actual title is.
The last point he makes, about public domain books, is definitely a problem. It is absolutely ridiculous that Google wouldn’t put books printed by the government on its website without any trepidation. But for some reason they want to respect the copyright on those? It doesn’t make a bit of sense. I suppose the only saving grace is that much of what the government publishes right now is automatically put on the web, so you can just go to the GPO website.
Finally, Townsend is right that it doesn’t make a whole lot of sense to do this digitization if you’re just creating more problems by doing it. So maybe Google will see this and correct these problems before they become even bigger and more prevalent.