I was wondering how, from a logic perspective, web bots are able to pick out the main content picture from a page's HTML.

For example, Google News is very good at this: it picks out the picture from the article the news story comes from.

Example 2: Facebook also does a pretty good job when you post a link and it puts a picture next to it. It is almost always the most appropriate image.

My thoughts on this, so others can bounce ideas off them... mostly I've come up with ways that don't work.

- It's not always the biggest picture (header pictures can be very large in area)

- You can't rely on someone naming a div 'content', so you can't narrow it down that way

- The only one I came up with: a lot of background images are inserted through CSS, while the image I'm looking for should be an image tag. This helps increase the odds but still doesn't guarantee the right image. I'm not saying a guarantee is possible, but Google News seems to do very well.
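
The last idea above could be sketched roughly like this in Python, using only the standard library's `html.parser`. The names `ImgCollector` and `candidate_images` are made up for illustration. The sketch only narrows the candidates down to `<img>` tags, so anything set as a CSS background never makes the list; it does not try to decide which candidate is the right one.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the src of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Only real <img> tags count; CSS backgrounds live in style
        # attributes or stylesheets and are never seen here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


def candidate_images(html):
    """Return the src values of all <img> tags found in the HTML string."""
    parser = ImgCollector()
    parser.feed(html)
    return parser.images
```

Calling `candidate_images(page_source)` on a downloaded page gives the list of image URLs to choose from. Picking the best one from that list still needs some other rule.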

 

Thanks for the help

Facebook's way of doing things is quite slick.

(For anyone who doesn't know: when sending a message, all you do is paste the URL of the article into the message window, and an AJAXy thing inserts a brief article preview as well as an image.)

It lets you pick from any of the images on the page (aside from background ones). I've not noticed it pick the exact image yet (as I've only tried it once), but my guess would be it's just analysing the page flow a bit, excluding things like 'logo.gif' (and other common element names), and picking an image close to a header tag, provided the page is formatted well enough.
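
To make that guess concrete, here is a rough Python sketch of it. This is only my reading of the heuristic, not how Facebook actually does it. The skip list (`logo`, `banner`, `spacer`, `icon`), the choice of `h1` to `h3` as "header tags", and the names `HeadlineImageFinder` and `guess_content_image` are all assumptions for illustration. "Close to a header tag" is simplified to "the first acceptable image that appears after any heading".

```python
import re
from html.parser import HTMLParser

# Assumed list of filename fragments that usually mean "not the content image".
SKIP_PATTERN = re.compile(r"logo|banner|spacer|icon", re.IGNORECASE)
HEADING_TAGS = {"h1", "h2", "h3"}


class HeadlineImageFinder(HTMLParser):
    """Remembers the first non-skipped <img> that follows a heading tag."""

    def __init__(self):
        super().__init__()
        self.seen_heading = False
        self.best = None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.seen_heading = True
        elif tag == "img" and self.seen_heading and self.best is None:
            src = dict(attrs).get("src") or ""
            # Ignore images whose URL looks like site furniture.
            if src and not SKIP_PATTERN.search(src):
                self.best = src


def guess_content_image(html):
    """Return the guessed content image src, or None if nothing qualifies."""
    finder = HeadlineImageFinder()
    finder.feed(html)
    return finder.best
```

On a page with no heading at all this returns `None`, which matches the caveat that the page has to be formatted reasonably well for the guess to work.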
