Documenting Picasa

Providing documentation on Picasa and Picasa Web Albums - photo organization software and services from Google.

Saturday, August 12, 2006

How did the search get lost?

Google knows a thing or two about search, so you would think that they would provide a search feature in whatever new services they introduce. Why then does Picasa Web Albums not have a search facility? Whilst the rollout of Web Albums is kept to a slow pace by the invitation process, the number of albums is climbing, and if the service is to have a future, people are going to want to find images that other people have placed in their albums.

At the moment the only presence Picasa Web Albums has in Google search results is revealed by doing a search for "site:picasaweb.google.com", which shows around 30,000 results. (Such figures are rarely accurate, but I'll ignore that for the moment.)

Since Google is not indexing the picasaweb site (its robots.txt file excludes all indexing spiders), what this shows is that there are links from other webpages that point at around 30,000 different galleries or albums hosted at picasaweb. Note that the results do not show any text snippets - which is a further indication that Google only includes the items in its index because there are incoming links, and that the page itself has not been (and will not be) spidered.
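To make the robots.txt point concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents below are an assumption (a blanket exclusion of the kind described above, not a copy of the real file), and the account and album names in the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a blanket exclusion for every crawler. The actual file
# served by picasaweb.google.com is not quoted in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical album URL, used only for illustration.
album_url = "http://picasaweb.google.com/exampleaccount/ExampleAlbum"
print(may_crawl("Googlebot", album_url))  # prints False
```

A well-behaved spider that gets False here never fetches the page, so the page text, captions and image titles never reach its index.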

If the page contents are not in Google's index, then the only information that can be indexed for these pages consists of the page URL, and the text of the link pointing at the page. Thus, the obvious search strategy of using "site:picasaweb.google.com {queryword}" has very limited success.

In fact, this is so little information that even words which represent hugely popular photographic subjects such as "wedding" or "vacation" provide fewer than 10 matching results. A little more investigation came up with the following as being the top search words for finding albums:
  • photo (391 results)
  • picasa (316 results)
  • here (252 results)
  • album (233 results)
  • photos (179 results)
  • my (166 results)
  • pictures (130 results)
  • gallery (124 results)
  • the (93 results)
  • 2006 (70 results)
  • fotos (62 results)
  • click (58 results)
  • wedding (6 results)
  • vacation (5 results)
  • baby (1 result)
So, the general result is that there is no way to reliably search for Picasa Web Albums - all the top results are simply words that are regularly used in link text, but which convey little information about the target contents - including a big contribution from that most useless of links "Click Here".

The other aspect of the information available for searching is the URL, which consists of up to 3 parts:
  • the fixed part "picasaweb.google.com"
  • the account name
  • possibly the album name
The account name may be a Gmail account name, or may be an alternative name chosen so as not to expose the Gmail account name. As such, these names follow the Gmail naming conventions - which is that they contain at least 6 characters, and that by-and-large they are not to be found in a dictionary (Google seemed to remove most dictionary words from the Gmail namespace, to cut down on email spam via dictionary attacks). This generally means that the account name offers little clue to the photos you may find within its albums - beyond the fact that the name (somehow) relates to the photographer, and so may be relevant if you already know of the photographer.

That leaves the final (optional) component, the album name. Many albums have multiple word descriptive titles, which are used to construct the album name part of the URL by removing spaces from the title. This results in a lot of album names being (from an indexing perspective) nonsense names, since the compound text is not being indexed under its component parts.
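A small sketch shows why the run-together album name defeats a word search. The album title and account name are invented, and splitting the URL at non-alphanumeric characters is only my assumption about how an indexer might break a URL into words, not a description of what Google actually does.

```python
import re
from urllib.parse import urlparse


def album_name_from_title(title):
    """Build the album part of the URL by removing spaces from the title."""
    return "".join(title.split())


def url_tokens(url):
    """Split a URL into lowercase tokens at non-alphanumeric characters.

    ASSUMPTION: a simplified stand-in for a real indexer's tokenizer.
    """
    parsed = urlparse(url)
    text = parsed.netloc + parsed.path
    return {tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok}


title = "Our Wedding Day"    # hypothetical album title
account = "exampleaccount"   # hypothetical account name
url = f"http://picasaweb.google.com/{account}/{album_name_from_title(title)}"

tokens = url_tokens(url)
print(url)                        # http://picasaweb.google.com/exampleaccount/OurWeddingDay
print("wedding" in tokens)        # False: the word is buried in the compound
print("ourweddingday" in tokens)  # True: only the compound is indexable
```

Under this assumption, a query for "wedding" can only match the album if somebody happened to use that word in the text of a link pointing at it.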

In conclusion, Google needs to introduce an internal search facility for Picasa Web Albums as soon as possible - external search systems cannot substitute, since they cannot get at the data necessary to build an index.
