August 31, 2004

Searchable Photos: Part IV - XHTML Questions

The full “Searchable Photos” thread now includes:

  1. How To Make Photos More Searchable (Part I)
  2. Searchable Photos: Part II
  3. Searchable Photos: Part III
  4. Searchable Photos: Part IV

I have finally gotten my Web Photos Pro product to a place where I feel like I can breathe for a bit. The last three or four months have been an endless cycle of new features, bug fixes, and beta releases. So much so that I’ve felt a bit like Judy in Punch and Judy (ergo the photo that adorns this entry) – though I suppose it could also be due to the broken rib I got last week while mountain biking, the rib that’s been making me yelp with pain whenever I sit, stand, lie down, roll over, or rotate in just about any direction.

With this latest release, 1.0b12 (beta 12), out the door, I feel like I can get off the treadmill for a minute and do a couple of things that don’t involve programming – things like pay the bills, clean up my office floor, write documentation for Web Photos Pro, write some entries for this weblog, and get the PhotoRSS web site fleshed out. And oh yes, spend some time with the family.

Never one to do first things first, I decided to spend some time on the PhotoRSS web site. But I got stuck pretty quickly because I wasn’t sure how to go about extending XHTML. I know it can be done, but I’m not sure of all the details, and I don’t know if it’s possible to do it the way I’m thinking about. I looked at the docs, and I looked at various examples, and I found my eyes glazing over. Very quickly I came to the conclusion that it’s just like doing my taxes – it’s a heck of a lot quicker, and a lot less painful, to call in the pros. So here I am, calling all you XHTML pros.

Before My Questions: Some Background
(If you’d rather skip straight to the chase, you can jump down to “The Four XHTML Questions” below.)

Those of you who have been following this blog know that I’m very interested in making my photos more searchable. There are various ways that this can be done, and I’ve started to write up three ideas on the PhotoRSS web site.

Of the three ideas, the first two are relatively straightforward. The first involves creating a photo-centric XML namespace for RSS. I’ve done something similar before, when I was working with some interesting characters at Wired Digital. One of my skunkworks projects involved adding RSS 2.0 output to Lycos News and Wired News searches. One of the things I wanted to support was the ability to page through the RSS search results, and to do that I defined an XML namespace, xmlns:WiredNewsSearchResults, which provided things like the number of results, next and previous links, etc.
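To make the namespace idea concrete, here is a minimal sketch in Python of an RSS 2.0 channel that carries paging data in its own namespace. The namespace URI and the element names (totalResults, next, previous) are assumptions made up for illustration; they are not the names the original Wired News feed used.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
SEARCH_NS = "http://example.com/ns/WiredNewsSearchResults"
ET.register_namespace("WiredNewsSearchResults", SEARCH_NS)


def build_search_feed(query, total_results, next_url, previous_url):
    """Return an RSS 2.0 document (as a string) that carries paging metadata."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for " + query
    # The namespaced elements sit beside the standard RSS ones, so readers
    # that do not know the namespace can simply ignore them.
    ET.SubElement(channel, "{%s}totalResults" % SEARCH_NS).text = str(total_results)
    ET.SubElement(channel, "{%s}next" % SEARCH_NS).text = next_url
    ET.SubElement(channel, "{%s}previous" % SEARCH_NS).text = previous_url
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_search_feed(
    "Howard Dean",
    42,
    "http://example.com/search?page=3",
    "http://example.com/search?page=1",
)
print(feed_xml)
```

A photo-centric namespace would work the same way, with elements for things like title, location and keywords in place of the paging elements.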

The second proposal involves creating a new RSS-like format for photo syndication. While at Wired Digital I did a lot of work with XML. Not only did I architect, design and implement the first all-XML-transport site at Lycos (all communication between the front-ends and eight different back-ends used with Lycos News was via XML), but I was instrumental in getting Lycos to change all of its middle and back-ends to serve up XML, so that data could be reused anywhere on the network. No longer was adding a stock ticker to a web page a major project involving cross-group coordination and development; instead it became a simple 15-minute exercise to get and format some XML data.

But it’s the third proposal, the one about defining an XHTML tag, that’s giving me trouble.

What Do I Want?
I want search engines to stop having to guess what my photos are about; I want them to know what they’re about. When I search for “Sunset over Daymer Bay” on Google I want to get back this and this, not this.

The way things are today, the best any search engine can do is look at the text and HTML near an image, and from it hazard a guess as to what the image is about. When you look at this photo search, I’m not sure whether we should be pleased by how many correct pictures were found, or disappointed at how many unrelated pictures there are. Whatever your thoughts, I think we can all agree that not only would it be nice if photo search results were more accurate, but it doesn’t seem as though it should be a particularly hard problem to solve.

Two Different Ways To Make Photos More Searchable
To make photos more searchable the information about the photo – the title, description, keywords, location, etc. – must be made explicit. And there are only two ways that this can be done: 1) you can embed the data in the photo, or 2) you can bind the data to the photo’s <img> tag.

Many cameras already embed data in your photos, using the EXIF portion of the JPEG header. The big benefit of EXIF is that the data stays with the photo – link the photo to a web page and it’s there, email the photo to me and the EXIF data gets emailed too. But there are so many downsides – the data is invisible to the human eye, the photo has to be opened to see if it has embedded data, EXIF data is not as structured as it needs to be, the format has some severe limitations, most resizing programs do not keep EXIF data intact, EXIF editing is difficult to do, and the data cannot be context dependent – that it’s pretty obvious that EXIF is not the right way to solve this problem (at least not in the foreseeable future).
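The point that a photo has to be opened before you know whether it carries embedded data can be sketched in a few lines of Python. This is a rough illustration, not a full EXIF reader: it walks the JPEG header segments and reports whether an APP1 segment starting with the “Exif” identifier is present. The function name has_exif is made up for this example.

```python
import struct


def has_exif(jpeg_bytes):
    """Return True if the JPEG data has an APP1 segment identified as EXIF."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not positioned on a marker; give up
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            return False
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xE1 and jpeg_bytes[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False


# Tiny synthetic headers (not real photos), just to exercise the function.
with_exif = b"\xff\xd8\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00" + b"\xff\xd9"
without_exif = b"\xff\xd8\xff\xe0" + struct.pack(">H", 4) + b"JF" + b"\xff\xd9"
print(has_exif(with_exif), has_exif(without_exif))
```

Even this simple check means fetching and reading part of every image file, which is exactly the cost a search engine would rather avoid.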

This leaves us with only one way to make photos more searchable – by finding a way to bind a photo’s data to its <img> tag. And this problem, how best to bind photo data to the HTML <img> tag, leads me to ask the following four questions about XHTML.

Continued at PhotoRSS.org »

Posted by: Frank @ 12:40 pm — Filed under: Comments (0)

March 5, 2004

Searchable Photos: Part III

Last October I wrote a piece entitled How To Make Photos More Searchable. I wrote it because I wanted to, well, make my photos more searchable, both locally, and from search engines such as Google.

But I also wrote it because I was frustrated by how hard it was to get photos up on my weblog. First I’d download from my camera to iPhoto. Next I’d copy the images I wanted to use to a temporary folder, and then open them in Photoshop where I’d resize and make thumbnails. After that I’d upload the various files via FTP, and finally I’d add some <img> tags to the weblog entry. All this could easily take 30 minutes or more, and it made me quite grumpy (and still does).

In November I wrote a followup piece entitled Searchable Photos: Part II, where I described my camera, computer and weblog setups, and a wish list of things I’d like to have to make my photo to weblog life easier. The wishes included reducing the number of steps it takes to get a photo from camera to web, creating thumbnails automatically, displaying photos on my weblog entries as well as in album format, setting and displaying GeoURL and EXIF information, and of course making them more searchable.

Since early January I’ve been working hard on a product that I think will satisfy most of those wishes in the first version, and all of those wishes in followup versions. And I’m happy to say that the product is just about ready for release.

It’s a two-part product. The first part is a photo album program called “Photo Album” (until such time as I can think of a better name) that runs on both Mac and PC. With it you can build photo albums, view the photos, and add information to each photo, such as Title, Description, Date, Location, Keywords and GeoURL. And you can even see, and add to, the EXIF information stored in the photo’s JPEG header (well, you’ll be able to as soon as I finish the EXIF parsing code).

The second part of the product is PHP-based software, called “Photo Album Server”, that displays your photo albums as dynamically generated web pages. In addition to being able to display your albums and photos in customizable templates (and in various thumbnail and full image sizes), it has search capability, can display EXIF data, and lets you create public or private photo albums (the private albums are password protected). And don’t worry, if you don’t have a server that runs PHP, “Photo Album” (the Mac/PC program) can create great looking photo album web pages that can be displayed on any web server.

But that’s enough tease for now; I’ve got to get back to the coding grind.

I’ll be adding more information about the products and the release schedule over the next couple of days, so come back and visit us soon.

And if you want to be a beta tester, drop me a line.

p.s. I have a new weblog devoted to the product over at http://photoalbum.backtalk.com/

Posted by: Frank @ 4:19 pm — Filed under: Comments (0)

November 5, 2003

Searchable Photos: Part II

When I was in Italy I ate alone at lunch and dinner. There’s nothing worse than having to look like you’re enjoying dining alone (with nothing more interesting to do than look at your fellow diners all evening), so I would either bring a book to read, or a pen and paper to make lists.

One evening in Florence I made a list of features I wanted in photo gallery software that I could use with this weblog. When I came home I took a second look at that feature list in the context of the comments I received on How To Make Photos More Searchable.

Here’s what I came up with.

Step 1: My Situation

  1. Camera: I have two cameras, an EOS-10D and a Sony Cyber-shot DSC-P72. I had an EOS film camera for many years, so making the step to using a digital back with my existing lens was a no-brainer once one with enough features became available. I like to put the Sony in my pocket when the EOS is too bulky for where we’re going, e.g. hiking with the kids.
  2. Photo Software: I use iPhoto on an iBook to store my photos. I use Photoshop Elements 2.0 to resize the images for the web and create two sizes of thumbnails. I use ftp to move the variously sized images to my web server.
  3. Weblog: I use MovableType. I tried a couple of other packages, but for no particular reason, other than it was good enough, decided on MovableType. If I had it to do over again I might use PHPNuke, but it’s not worth the hassle of starting all over again at this point.
  4. Servers: I have a server sitting at a colo in San Jose that serves this site. I have an identical MovableType setup on my home PC so I can write entries locally without being connected to the net. I do this partly because I only get 33.6K connectivity here in Cornwall, but also so I can hack the site and not impact my server.
  5. Programming: I program in most anything – except Perl. I’ve always disliked command lines and shell scripts, and Perl reminds me of a big shell script. On the other hand, the pallet of stuff we sent from the States that’s arriving later this week has a Perl book in one of the boxes, so I may dig into it a bit more next week (and I’ll let you know if I change my tune).

    My current favorite programming tools are PHP and MySQL, as well as the excellent Zend Studio development environment. I used ASP a lot at Wired Digital (aka Terra/Lycos), but I prefer PHP’s syntax, features and capabilities. For database work I find MySQL easier to install and use than SQL Server; and of course there is the cost issue to consider as well.

  6. Photo To Weblog Process: The process of getting photos from my camera onto my weblog looks like this:

    1. Use camera to take photos
    2. Connect camera to iBook
    3. Import photos into iPhoto
    4. Drag photos I want to use on the weblog from iPhoto into a /temporary directory in the Finder. A copy of each dragged photo is created in the /temporary directory.
    5. Use Photoshop Elements’ batch capability to create 650-pixel-wide large images, 150-pixel-wide thumbnails and 60-pixel-wide mini-thumbnails.
    6. Use FTP to move large, thumbnail and mini-thumbnail images to separate directories on this web server
    7. Write weblog entry. Insert photo by placing an <img> tag in the HTML.
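The resizing in step 5 comes down to simple arithmetic, which is part of why it ought to be automated. As a rough Python sketch, here is how the three target widths translate into pixel dimensions while keeping the aspect ratio. The function name, the size labels and the rule of never scaling up are assumptions for illustration, and the 3072 x 2048 source size is only an example.

```python
# Target widths from the process above; the labels are made up for this sketch.
TARGET_WIDTHS = {"large": 650, "thumbnail": 150, "mini": 60}


def scaled_sizes(width, height, targets=TARGET_WIDTHS):
    """Return {label: (new_width, new_height)}, preserving the aspect ratio."""
    sizes = {}
    for label, target_width in targets.items():
        # Assumption: never scale a small image up past its original width.
        new_width = min(target_width, width)
        new_height = max(1, round(height * new_width / width))
        sizes[label] = (new_width, new_height)
    return sizes


# Example with an assumed 3072 x 2048 source image.
for label, size in scaled_sizes(3072, 2048).items():
    print(label, size)
```

The actual resampling would still need an image library or a tool on the server; this only works out the sizes to ask for.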

Step 2: My Wish List

The wish list I’ve come up with is a direct result of the camera and weblog setup I have, and my desire to do more with geoUrl and GPS information.

  1. Simplify my photo to weblog process by reducing the number of steps it takes to get a photo from the camera to the web server.
  2. Add geoUrl or GPS info to my photos.
  3. Display photos (as pinpoints or mini-thumbnails) on a map using the embedded geoUrl or GPS info.
  4. Create thumbnails automatically on the web server rather than by hand using Photoshop Elements. (It would be nice if I could create thumbnails that were as small (filesize) as those created by Photoshop Elements’ “Save For Web” feature.)
  5. Make it easy to add thumbnails, mini-thumbnails or mini-thumbnail galleries to a weblog entry (must be even easier than using an <img> tag).
  6. Display photos in gallery format.
  7. Display photos in multiple large format sizes as is done at photo.net.
  8. Expose EXIF, geoUrl and GPS info when the photo is displayed.

Step 3: Next Steps

Do, or figure out how to do, the following:

  1. Add geoUrl or GPS data to EXIF headers of photos in iPhoto. I guess I need something like RoboGeo running on my Macintosh. And oh yes, a GPS receiver.
  2. FTP photos from iPhoto to my web server.
  3. Create thumbnails on my web server (probably using iMagick)
  4. Create a simple PHP-based API that I can use to display thumbnails and photos on my weblog entries.
  5. Write a simple, PHP-based, template driven, photo gallery viewer.
  6. Read and display EXIF data using PHP.

I’ve already started working on some of these items. My plan is to release anything I do as open source. I’ll post updates as things become available.


Online Research

As part of writing this entry I visited all of the sites suggested by comment writers. I also looked on the web for photo gallery software, or anything else that was somewhat related, e.g. EXIF and GPS. I found a number of interesting sites, all of which I’ve listed here.

General Searches

PHP Gallery Software

Other Gallery Software

GPS Hardware and Software

Mapping Software

Anyone know of a web site on which I can create a map with multiple waypoints? How does Blog Mapper do it?

Other Bits and Bobs

Posted by: Frank @ 11:06 pm — Filed under: Comments (0)

October 1, 2003

How To Make Photos More Searchable (Part I)
Scripting News had two items of particular interest to me yesterday. One was a link to Pheed.Com where they describe RSS photo feeds. The other was a link to WorldKit, a flexible mapping application that I’m going to load on my server later tonight.

I’ve been taking a lot of photographs on this journey, and will continue to do so wherever we live in Europe. I’ve been wanting to share the photographs in new and interesting ways (as opposed to just decoration on a weblog entry). For example, I’m thinking about walking a portion of the coastal walk, and would like to show the photographs on a map, without having to do a lot of extra work beyond specifying where the photos were taken. The WorldKit application may just let me do that – I’ll report back later.

I’d also like to make my photos more searchable. Unfortunately there’s no standard way to describe photo information such as title, caption, location, exposure, equipment, etc., in either the <img> tag, or in the RSS feed described by pheed.com, and so there’s no easy way for search engines such as Google to know where and when my photographs were taken, or what they’re about.

I’ve been playing around with search, photos and RSS for a couple of years now. About two years ago, I added RSS support to Lycos News search (search for Howard Dean to see an example). This RSS feed returns photos as well as news stories, but as it was built with RSS 0.91, there was no way to specify what was a photo and what was a news story.

But enough of the introduction, what I’d really like to do is propose two things. First, that the <img> tag be enhanced to support photographic parameters. Second, that the Pheed.Com photo RSS not use the Dublin Core, but rather specify a new “Photo Core” that is specific to photographs.

In my dreams the Photo Core would have GeoURL information describing the photo’s location, as well as exposure and camera information such as is captured on Photo.Net.

If you peek at the source of the image on this entry you can see what an enhanced <img> tag might look like. The RSS feed is left as an exercise for the reader.
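For a sense of how a search engine might consume such a tag, here is a small Python sketch that pulls the attributes off every <img> element in a page. The attribute names (phototitle, photolocation, photokeywords) and the sample markup are hypothetical, invented for this example; they are not the attributes used on this entry, nor any existing standard.

```python
from html.parser import HTMLParser


class ImgMetadataParser(HTMLParser):
    """Collect the attributes of every <img> tag as a dictionary."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "img":
            self.images.append(dict(attrs))


# Hypothetical markup; the photo* attributes are assumptions for illustration.
sample_html = (
    '<p><img src="daymer.jpg" alt="Sunset" '
    'phototitle="Sunset over Daymer Bay" '
    'photolocation="Daymer Bay, Cornwall" '
    'photokeywords="sunset, beach" /></p>'
)

parser = ImgMetadataParser()
parser.feed(sample_html)
for image in parser.images:
    print(image["src"], "-", image.get("phototitle"))
```

With the data sitting in the tag itself, a crawler gets the title, location and keywords without ever downloading the image.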

Posted by: Frank @ 4:02 pm — Filed under: Comments (0)