Thursday, October 4, 2007

More on Google & Microsoft Book Searches

I have been doing some more experimenting with both Google Book Search and Microsoft Live Book Search. I continue to be amazed at the speed with which these collections are growing, though neither Google nor Microsoft is publishing any statistics on the actual size of its database, which makes evaluation difficult. (I have emailed Google about this lack of statistical data -- if they send me any information back, I'll let you know.)

I have, though, added both Google Book Search and Microsoft Live Book Search to our list of available library databases, and I am adding them to my information literacy class to give students some tips on how to make use of these databases. The extent of the available resources in these two collections makes them too valuable to ignore.

As an experiment, I searched both Google and Microsoft for 30 titles from our collection that were published before 1923 and are therefore in the public domain. From this highly unscientific sample, I found the following:

Available full text in Google: 10
Available full text in Microsoft: 4
Limited access in Google: 10
Not available in Google: 10
Not available in Microsoft: 26

(Of the 10 that had limited access in Google, 7 were recent reprints--therefore under copyright--but 3 were the original, pre-1923 texts and should have been available full text.)

This got me wondering about more recent books, so I searched for another 30 titles from our new book list, picking titles that were published at least a year ago in case brand-new items had not yet had time to find their way into the databases:

None were available full text (not too surprising)
None were available in any form in Microsoft
Limited access in Google: 7
Listed in Google, but no access: 15
Not in Google: 8

"Listed in Google, but no access" is what Google calls "No preview available". You can find the title by searching by title or author, but the contents of the book are not available even for keyword searching, much less previewing. I had not been aware that "No preview available" meant that the book contents are not searchable, but my own efforts to search for keywords and phrases from some of our "No preview available" titles confirmed this. In spite of their We-have-the-right-to-scan-any-book assertions, Google for whatever reason is not providing any access (beyond a title listing) to a certain percentage of their scanned database.
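For anyone who wants to turn the tallies above into percentages, here is a minimal Python sketch. The function name `shares` and the dictionary labels are my own shorthand, not anything provided by Google or Microsoft, and with only 30 titles per sample the percentages are rough indicators at best.

```python
def shares(tally):
    """Convert a dict of counts into percentages of the sample, rounded to one decimal."""
    total = sum(tally.values())
    return {label: round(100 * count / total, 1) for label, count in tally.items()}


# Counts copied from the two 30-title samples in this post.
# The labels are informal shorthand for the categories listed above.
pre_1923_google = {"full text": 10, "limited access": 10, "not available": 10}
pre_1923_microsoft = {"full text": 4, "not available": 26}
new_books_google = {"limited access": 7, "listed, no access": 15, "not in Google": 8}

samples = [
    ("Pre-1923 titles, Google", pre_1923_google),
    ("Pre-1923 titles, Microsoft", pre_1923_microsoft),
    ("New book list, Google", new_books_google),
]

for name, tally in samples:
    print(name, shares(tally))
```

Keeping each sample as its own dictionary makes it easy to rerun the numbers if the same titles are searched again later and the counts change.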

[Excursus: I find searching in Microsoft Live Book Search to be frustrating. There is no advanced search to allow searches by title or author; putting phrases from the title in quotes seems to be the most efficient way to search Microsoft. The response time is also much slower than Google's, especially when scrolling through the results. I also found it odd that I kept encountering listings in Microsoft that said "Book Removed -- This book is no longer available" with no information to identify what the original book had been. If it's not available, why bother to provide a "Book Removed" listing?]

One practical question comes to my mind after this experiment: There have been rumblings that Northern Seminary may sell our current campus and relocate. This has necessitated a lot of work on my part to evaluate our collection and design possible scenarios for what to hypothetically do with the library collection should it theoretically be relocated to any number of putative sites. Most of these possible scenarios involve off-site storage for a certain percentage of the collection. Identifying titles available in Google Books appears to be one way to select volumes for storage. Dare one download and archive one's own copy of the PDF file and discard the book altogether? What would you do, hypothetically?

1 comment:

Matt said...

FYI, Google announced in May that it would be including metadata from over 20 union catalogs in their search results. So not everything in Google books has been scanned. See:

As to your hypothetical, I'm slowly starting to believe that an ebook that is a viable alternative to print is coming and I think Google will be a big player.