Documenting Picasa

Providing documentation on Picasa and Picasa Web Albums - photo organization software and services from Google.

Tuesday, March 20, 2007

Picasa Web Albums learns to count

Picasa Web Albums has finally learnt to count.  When it was first released, the community search feature of Picasa Web Albums was unable to count accurately - any search you did on a popular term returned a number of matches that was way up in the many millions.

These figures were frankly unbelievable, and the fact that the number varied wildly from search to search indicated that these figures were not to be trusted. 

However, Google seems to have fixed this problem. Searches now return reasonable numbers for most queries, of the order of thousands of results rather than many millions.

For the first time, this allows us to estimate the size of Picasa Web Albums by seeing how many results it has for a number of popular terms.

My searches returned the following result counts (search term, then number of matches):

the 1128279
and 1075975
in 823335
of 782688
a 600900
trip 550325
my 427027
at 391763
photos 340944
on 317398
new 307207
wedding 288286
for 281272
2 268007
1 238600
de 217689
3 203786
party 195887
by 172345
march 157793
family 157555
album 156220
birthday 153792
10 147692
park 143186
me 140743
house 133573
san 133416
city 125609
christmas 125569
la 125095
home 115748
4 108715
london 104605
st 103432
friends 99035
ca 94977
beach 91139
summer 78804
york 78482

I've included the top 40 words that I found, from a selection of around 120 likely high-scoring words.

So, the top two results each give just over a million photos.  We can double-check those numbers by using the - (subtraction) operator to see how many photos match one word of a pair but not the other.  This gives the following results:

the -and 679780 + 1075975 = 1755755
and -the 616847 + 1128279 = 1745126

the -in 740250 + 823335 = 1563585
in -the 416885 + 1128279 = 1545164

the -of 686066 + 782688 = 1468754
of -the 333947 + 1128279 = 1462226

the -a 860211 + 600900 = 1461111
a -the 309792 + 1128279 = 1438071

trip -wedding 552249 + 288286 = 840535
wedding -trip 287401 + 550325 = 837726

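In all cases I've taken the result of a search that subtracts a word, then added the count for that word on its own.  The sum estimates the number of photos matching either word, so both orderings of a pair should give the same total.  The fact that the pairs of totals here are approximately equal gives me greater confidence in the individual numbers themselves.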
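The cross-check above can be reproduced with a few lines of Python. This is only a sketch: the figures are copied from the lists in this post, while the names `counts`, `excluding`, `union_estimate` and `relative_gap` are made up for illustration.

```python
# Result counts copied from the post. All names here are illustrative only.
counts = {
    "the": 1128279, "and": 1075975, "in": 823335, "of": 782688,
    "a": 600900, "trip": 550325, "wedding": 288286,
}

# excluding[(x, y)] is the reported result count for the query "x -y".
excluding = {
    ("the", "and"): 679780, ("and", "the"): 616847,
    ("the", "in"): 740250, ("in", "the"): 416885,
    ("the", "of"): 686066, ("of", "the"): 333947,
    ("the", "a"): 860211, ("a", "the"): 309792,
    ("trip", "wedding"): 552249, ("wedding", "trip"): 287401,
}


def union_estimate(x, y):
    """Photos matching x or y, estimated as count("x -y") + count(y)."""
    return excluding[(x, y)] + counts[y]


def relative_gap(x, y):
    """Fractional disagreement between the two ways of estimating the union."""
    first, second = union_estimate(x, y), union_estimate(y, x)
    return abs(first - second) / max(first, second)


pairs = [("the", "and"), ("the", "in"), ("the", "of"),
         ("the", "a"), ("trip", "wedding")]
for x, y in pairs:
    print(f"{x}/{y}: {union_estimate(x, y)} vs {union_estimate(y, x)} "
          f"(gap {relative_gap(x, y):.2%})")
```

On these figures the two estimates for each pair differ by less than 2%, which is the "approximately equal" result described above.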

Note that the top value here gives us a minimum estimate of the total size of Picasa Web Albums - there are something over 1.7 million searchable photos that include "and" or "the" in their descriptions or album names.  Of course that vastly undercounts the number of searchable photos, since I'd suppose that the vast majority of photos in Picasa Web Albums are actually unlabelled, or placed in fairly generically named albums which don't involve either of these words.  I'd guess that this puts our figure out by an order of magnitude - so let's guess that there are 17 million publicly searchable photos.  I also suspect that there are as many unsearchable photos as there are searchable ones - many users are making use of non-public albums, and even when they do have public albums they have not necessarily elected to add them to the community photos search.  So from these I'm guessing that Picasa Web Albums is hosting around 34 million photos.
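To make the chain of guesses explicit, here is the same back-of-envelope sum as a small Python sketch. The two multipliers are the guesses made in the paragraph above, not measured values, and the function and parameter names are hypothetical.

```python
def estimate_total(labelled_searchable,
                   unlabelled_factor=10,    # guess: labelled photos are ~1/10 of searchable ones
                   unsearchable_factor=2):  # guess: as many unsearchable photos as searchable
    """Scale the measured lower bound up to a guessed total photo count."""
    searchable = labelled_searchable * unlabelled_factor
    total = searchable * unsearchable_factor
    return searchable, total


# Lower bound from the "the"/"and" cross-check, rounded down to 1.7 million.
searchable, total = estimate_total(1_700_000)
print(f"searchable: {searchable:,}  total: {total:,}")
```

Changing either multiplier shows how sensitive the final figure is to these guesses: the 1.7 million is the only number here that comes from the search results.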

For comparison, something over 425 million photos have been uploaded to Flickr - making it currently over 12 times the size of Picasa Web Albums.
