
Tables for layout



Well, not using tables for tabular data is just as wrong as using tables for non-tabular data. If you use CSS+divs to emulate tables then your page will be broken without the presentational layer. Dependencies must only exist in the way that a higher layer depends on a lower one (e.g. CSS depends on HTML because the CSS is describing the HTML, and JS depends on CSS and/or HTML because it might change those based on certain circumstances). Not the other way around. Otherwise you will have accessibility and SEO issues. This page, for instance, works perfectly even though it only uses the content layer (yes, I know browsers have built-in default styles).



Well, not using tables for tabular data is just as wrong as using tables for non-tabular data. If you use CSS+divs to emulate tables then your page will be broken without the presentational layer.

 

Yes, I agree in that regard. But your browser's CSS capabilities are either enabled or disabled. I am going to go out on a limb here and 'assume' (and yes, I know what they say about that) that most users have CSS enabled. Especially in this day and age, where more and more sites are starting to use reduced markup in conjunction with CSS, or just using CSS at all, even in conjunction with tables (since CSS support is integrated into all major up-to-date browsers [to varying degrees]), it is a relatively safe bet that data presented via divs / CSS would not be problematic for most people viewing it. Disabling CSS in your browser risks breaking the presentation of the entire page, let alone any 'tabular divs' using CSS as a replacement for tables. So IMO, it's a moot point (based on the assumption, of course, that we are building a tabular data presentation system from the ground up.. not being handed a table to begin with.. in that case, yes, perhaps just sprucing up the table with CSS would do OK). However, by going the div / CSS route, we once again have cleaner, leaner markup for tabular data than leaving it table based.

 

I understand that tables were introduced for just that purpose (tabular data).. but if I were to build such a system from scratch, I would personally side with divs / CSS for the reasons I mention above.

 

EDIT - I am going to assume that what you mean is that if the CSS layer is not enabled, tables as tabular data are still intact and readable to the person viewing them? In that sense, yes, CSS-based data would go to hell in a handbasket in a hurry ;)


Are you telling me that you would replace the following piece of markup with divs and CSS and call it cleaner (thus better)?

<table summary="A list of the PHP Freaks users and information about them.">
<caption>PHP Freaks User List</caption>
<thead>
	<tr>
		<th>User ID:</th>
		<th>Username:</th>
		<th>Email:</th>
		<th>Title:</th>
	</tr>
</thead>
<tbody>
	<tr>
		<td>1</td>
		<td>jdoe</td>
		<td><a href="mailto:john.doe@example.com">john.doe@example.com</a></td>
		<td>Member</td>
	</tr>
	<tr>
		<td>2</td>
		<td>Daniel0</td>
		<td><a href="mailto:daniel.egeberg@phpfreaks.com">daniel.egeberg@phpfreaks.com</a></td>
		<td>Administrator</td>
	</tr>
</tbody>
</table>

 

It is tabular data and it belongs in a table.

 

This clearly defines the structure of the content in a semantic manner, which enables visitors who cannot make use of the content's visual presentation to still receive it as meaningful, understandable content. Those visitors could be blind people using a screenreader, or they could be search robots indexing your site. Anyone turning that into divs and making it look like a table doesn't really know what they are doing. They are just jumping on the "tables=evil; divs=good" bandwagon without any real knowledge of how it works. You are also forgetting something, screenreaders and search engines cannot see all your fancy styles. A screenreader will not say: "Red text with a background image of a bear: Lorem ipsum dolor sit amet, consectetur adipiscing elit."

 

It doesn't really matter whether most people use a browser that has CSS enabled. Your page should work regardless. I might find myself with a broken desktop environment and therefore need to use a browser like Links from the terminal. The thing is that it's really easy to make things work with only the content layer, so I fail to see why somebody would not do it like that, except out of ignorance.

 

CSS is not and has never been a replacement for tables. Tables, on the contrary, have been used as a replacement for CSS, but they should never have been. Today, when the tide is turning and people are starting to use CSS for layout, you hear all those people trying to convert legitimate tables into something else, which is really stupid, actually.

 

A benefit of using tables is, for instance, that if you use <thead> then the table header will be repeated on each page when you print it (the same goes for <tfoot>, should you ever find a need for that). Screenreaders will also read the content aloud in a manner that lets you understand it is a table of data. Incidentally, the same is true if you misuse tables, which will just confuse the heck out of their users.
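
For what it's worth, here's a minimal sketch of how that fits together (the caption and figures are invented for illustration). Per the HTML 4.01 spec, <tfoot> is placed before <tbody> even though it renders at the bottom, so the browser can lay out the footer before all of the rows have arrived:

<table summary="Monthly donations (made-up numbers).">
<caption>Donations</caption>
<thead>
	<tr><th>Month</th><th>Amount</th></tr>
</thead>
<tfoot>
	<tr><td>Total</td><td>$150</td></tr>
</tfoot>
<tbody>
	<tr><td>January</td><td>$100</td></tr>
	<tr><td>February</td><td>$50</td></tr>
</tbody>
</table>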

 

Finally, I'd like to see your table->css+div conversion of the above table as well as notes on why it is cleaner and better. There must also be no accessibility drawbacks and it must behave like a table.


As I attempt to construct it via divs / CSS, I begin to see the inherent pitfalls. So I concede on that front. Indeed, the built-in table behaviour that automatically bolds text contained within <th> tags, for example, offers a 'short-cut' of sorts; achieving the equivalent with divs would require additional lines of CSS. So I see your point. For tabular presentation, tables are better (after all, they were designed for such purposes initially.. I just figured CSS would play out well on this front.. and it could.. but with a bit more effort, and at the loss of semantic structure).
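
Just to show what I ran into (a rough sketch; the class names are invented for illustration), the div version needs a pile of CSS just to claw back what the table gives you by default, and older IE versions don't even support display: table:

<div class="users">
	<div class="row head">
		<div class="cell">User ID:</div>
		<div class="cell">Username:</div>
	</div>
	<div class="row">
		<div class="cell">1</div>
		<div class="cell">jdoe</div>
	</div>
</div>

/* re-creating table behaviour by hand */
.users       { display: table; }
.users .row  { display: table-row; }
.users .cell { display: table-cell; }
.head .cell  { font-weight: bold; text-align: center; } /* what <th> gave us for free */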

 

You are also forgetting something, screenreaders and search engines cannot see all your fancy styles. A screenreader will not say: "Red text with a background image of a bear: Lorem ipsum dolor sit amet, consectetur adipiscing elit."

 

Well, as we already established, we know spiders do not take CSS into account (and to be quite frank, who cares? Spiders are after content, not style). But you bring up an interesting point about screenreaders. I have not read up on the subject, let alone used one. So this is a plus, I suppose, for those who require it with regards to tabular data.. but this would not be a good reason in itself for building entire pages in tables IMO.

 

CSS is not and has never been a replacement of tables. Tables, on the contrary, have been a replacement for CSS but should never have been that.

Agreed. But to be fair, tables were around before CSS, which as we know was built for the purpose of separating presentation from content and driving the presentation aspect.. whereas tables intertwine both when used for page layouts (which, as we also know, was not the intention of tables.. some designers started playing around and figured out other ways to utilise them. It was a good idea at the time I suppose, since with no CSS and no separation of content and presentation, tables opened many doors on that front. IMO there's no longer a need for tables in that regard [but I now retract my opinion about tables for tabular data.. they're useful, it turns out...]).


the built-in table behaviour that automatically bolds text contained within <th> tags, for example, offers a 'short-cut' of sorts; achieving the equivalent with divs would require additional lines of CSS.

 

It's not so much about "short-cuts". You can make the header look any way you want (bold, another color, italic, or perhaps larger text). Most browsers just think that boldface, centered text is a good way to visually say "this is a header" and that works quite well. If you wish to change that visual appearance, however, then you can use CSS. In fact, those default styles are set using CSS.
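
For example (just a sketch), you can restyle the header cells however you like while the <th> semantics stay put:

th {
	font-weight: normal;   /* undo the default bold */
	font-style: italic;
	text-align: left;      /* undo the default centering */
	background: #eee;
}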

 

It's more or less the same as the difference between <i> and <em>. If you think that you have a better way of visually representing emphasized text then you can change it using CSS, but the tag will still indicate emphasis. <i>, on the contrary, would be ludicrous to change, because it only signifies that the text, for whatever reason, should be italic, and there is no other way to represent italicized text than setting it in italics. <var> will often be rendered in italics as well, but it's still different from both <em> and <i> because <var> represents a variable. <cite> is in italics as well, but it's also semantically different from the three tags before it.
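
A quick, made-up illustration of the difference; the emphasis can be restyled without losing its meaning, while <i> says nothing beyond "italic":

<p><cite>Some Book Title</cite> explains that the variable <var>x</var> must
<em>never</em> be zero; the Latin phrase <i>et al.</i> is simply set in italics.</p>

em { font-style: normal; font-weight: bold; } /* emphasis, presented differently */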

 

Well, as we already established, we know spiders do not take CSS into account (and to be quite frank, who cares? Spiders are after content, not style).

 

Right, but spiders can better rank data if it carries a semantic meaning due to the fact that natural language processing is freaking difficult. Therefore, you should use HTML, not CSS, to express that. A table carries a semantic meaning, a div does not. This means that a div should not replace a table for representing tabular data. Of course, you can make a series of divs look like a table and its related tags, but that isn't the point. You can also make a table look like your divs+CSS (table based layouts).

 

but this would not be a good reason in itself for building entire pages in tables IMO.

 

I never said you should, unless of course all content on the page is tabular. I said that tabular data should always be a table.

 

Agreed. But to be fair, tables were around before CSS, which as we know was built for the purpose of separating presentation from content and driving the presentation aspect.. whereas tables intertwine both when used for page layouts (which, as we also know, was not the intention of tables.. some designers started playing around and figured out other ways to utilise them. It was a good idea at the time I suppose, since with no CSS and no separation of content and presentation, tables opened many doors on that front. IMO there's no longer a need for tables in that regard [but I now retract my opinion about tables for tabular data.. they're useful, it turns out...]).

 

CSS did exist, but browser support sucked. I still don't think that misusing HTML was a viable option. Browser vendors should have focused on CSS instead. Browser vendors also started implementing their own proprietary tags, such as the infamous <blink> tag; some of these later made it into the specifications because their usage became widespread. It was simply a matter of bad decisions in the past because the focus was put in the wrong place.

 

 

You are not writing essays in MS Excel and setting up spreadsheets in MS Word either. Despite the fact that you could do it, it is wrong. It's about using the right tools for the right job.

 

This entire thing hasn't really anything to do with getting things look in a particular way, but in using proper semantic markup. The HTML5 draft takes this even further with more semantic markup elements such as <header>, <nav>, <section>, and <footer>.
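
For instance, a rough sketch using those draft elements (the content and links are just placeholders):

<header>
	<h1>My Site</h1>
	<nav>
		<a href="/forums/">Forums</a>
		<a href="/tutorials/">Tutorials</a>
	</nav>
</header>
<section>
	<p>The main content lives here instead of in an anonymous <div>.</p>
</section>
<footer>
	<p>Copyright notice, contact details, and so on.</p>
</footer>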

 

 

Also, I lied when I alleged that CSS doesn't carry semantics, because it can in a way do that. You could do <span class="whisper">something</span> and then .whisper{volume:soft}. The client is then free to use that or not. A regular visual browser such as Firefox would ignore it, but a screenreader would make the sound, well, softer. There are loads of other aural CSS properties, but their usage isn't very widespread. However, it is still only the content that's described in the HTML and then the presentation in CSS.
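
Something like this, for example (a sketch; volume and voice-family are CSS 2 aural properties, but support is sparse):

<p>He leaned in and said <span class="whisper">meet me at midnight</span>.</p>

.whisper {
	volume: soft;          /* aural: spoken more quietly */
	voice-family: female;  /* aural: pick a generic voice */
	font-style: italic;    /* visual browsers just see italics */
}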


 

I seriously can't believe you are actually arguing those points.  Crawlers ranking siteB lower than siteA because of a few more lines of code?  Does google sit there and time how long it takes the crawler to go forth and come back with info, first one back is the winner?  Next thing you're gonna tell me is that I'll have better SEO the closer my server is to one of google's. 

 

And seriously, you're arguing that a crawler bot has an "easier" time ignoring a <div> tag than a <table> tag?  Do you know anything about regex at all?  I mean, I see you helping in the regex forum, so it boggles my mind why you would argue this. 

 

I'm trying really hard not to smack you in the face with a 10 pound fish.  These are the sort of arguments I expect to hear from con artists trying to sell snake oil to non-programmers.  Don't make me paper cut you.

 

Wow.. the unwarranted hostility. I think a little more civility is in order here.

 

For starters, why the regex talk? Did I mention anything about regex? If you are suggesting that this is the ultimate in search engine algorithms, I think it's rather more complex than that.

 

The point of the argument is making it 'easier' for the spider.. Just how much more efficient is it? Who knows (amongst us, I mean). I tend to lean towards believing that less code makes things easier, yes. Sorry this upsets you. The common sense logic is that the more stuff it has to crawl / sort through.. the more 'work' on its end. Simple as that. Now if we are to compare a small, simple, clean table-based site vs something equivalent in CSS, I suspect the differences would be small. Very small... Just how deep the rabbit hole goes on much larger / messier sites, well, this is where it might have more of an impact.

 

I think the link you provided is (with all due respect) utter garbage. The author(s) clearly start making claims that distort everything in their favor. It would seriously require a large amount of my time and quite the post to start going point by point to counter that site's stupidity, so I won't even bother. I'll refrain from going any further. It was a good laugh though.

 

In either case, to avoid a developing flame war, you can stick to (and support) your tables.. no harm, no foul. I'll stick to my CSS. But don't get all bent out of shape over someone else's opinion just because it doesn't match yours. We all post our thoughts [and / or links] (hopefully without the insults of face slapping and other such adolescent immaturities) and discuss things in a more civil manner. We don't have to agree with each other. But such talk is not necessary.

 

You're supposed to be one of the 'better ones'.

 

 

Common sense logic is that either the program works or it doesn't. Making a webpage with one tag over another, or one way over another, makes it harder for the people programming the crawlers to program them.  But once they program them, the bots work as they are programmed, and do not care one whit whether it's one tag or another, unless that's how they were programmed.  You're arguing that crawlers are actually programmed to take that sort of thing into consideration, when they aren't.  I'm not arguing that it's not "more work" for the bot; sure, it may take longer to process, but to claim that longer processing time == lower SEO ranking is not true.

 

And regex is 99% of what a crawler is.  That's what crawlers do: find information based on patterns and return that information to wherever, to be processed by something else.  A crawler's function is to filter information.

 

And I'm sorry, my previous post did come off kind of harsh, and I apologize for that.  But I maintain that those are 'snake oil' reasons. 

 

And I'm not 100% pro tables at all.  But I get the impression that people aren't being very objective about CSS.  CSS is great for maintaining sites.  From a larger perspective, MVC is great for maintaining sites.  But there is a higher learning curve associated with it.  If someone contracts me to go in and fix some bug, I'm going to have to spend time figuring out how everything is set up.  Sure, in theory, if I know the framework they used, and they actually stuck to it, then it shouldn't be a problem.  But there are lots of frameworks, and even home brewed frameworks galore.  So I have to sort out just where everything was put, before I can even get down to business.

 

With things like tables, or from a larger scale, methods that don't separate model/view/controller, you have more of a WYSIWYG setup.  Sure, the script might "look" messy, but I know that the flow of the program is all right there, straight forward, and if I change something here, it's going to affect what happens right here.  So if I get contracted to go in and fix a bug, I'm going to figure out the setup and find the bug a whole lot quicker.

 

So what is better?  That's the point: it depends on the situation.  If I were hired by a company to indefinitely maintain their site, I would rather have css, or MVC design in general.  But if I'm being asked by random mom and pop shops to fix some random bug or add some random feature, I'd rather have WYSIWYG setups.


And regex is 99% of what a crawler is.  That's what crawlers do: find information based on patterns and return that information to wherever, to be processed by something else.  A crawler's function is to filter information.

 

I'm quite sure that the majority of interpreters do not use regular expressions for parsing and interpreting code. I'm not an expert on this though, so I'll return in some two years where I am hopefully studying computer science and have aced the compilers/interpreters course. I also very much doubt they pay the dudes over at Google a lot of money for just writing a couple of regular expressions. I'd assume that the algorithm for determining search relevancy is much more sophisticated than just figuring out which words there are on a page.


And regex is 99% of what a crawler is.  That's what crawlers do: find information based on patterns and return that information to wherever, to be processed by something else.  A crawler's function is to filter information.

 

I'm quite sure that the majority of interpreters do not use regular expressions for parsing and interpreting code. I'm not an expert on this though, so I'll return in some two years where I am hopefully studying computer science and have aced the compilers/interpreters course. I also very much doubt they pay the dudes over at Google a lot of money for just writing a couple of regular expressions. I'd assume that the algorithm for determining search relevancy is much more sophisticated than just figuring out which words there are on a page.

 

Virtually anything that involves parsing and interpreting, be it content or code or whatever, operates on the principle behind regular expressions.  Even doing something like this:

 

$x = substr($string,0,1);

// or 

if (substr($string,0,1) == 'a') {
  // do something
}

 

is regexing. 

 

Now, whether the fine folks at google are using something like the pcre library or whether they brewed up their own library, is another story.  But the principle is the same.  Crawler takes a page and breaks it down, looking for specific things.  That's regex.


Parsing a string doesn't make it regex. Regex can be used for that, but string parsing and regular expressions are not interchangeable things. Regular expressions have to be parsed as well, but if all parsing constitutes the use of regular expressions then you are sort of in an infinite loop.

 

No computer knows what to do with ^fj(?=[a-z]{9})\d+$ by itself; it needs to be parsed itself.


Parsing a string doesn't make it regex. Regex can be used for that, but string parsing and regular expressions are not interchangeable things. Regular expressions have to be parsed as well, but if all parsing constitutes the use of regular expressions then you are sort of in an infinite loop.

 

No computer knows what to do with ^fj(?=[a-z]{9})\d+$ by itself; it needs to be parsed itself.

 

When someone mentions 'regex' most people think of it in limited terms.  That is, they think of just parsing through content with things like the pcre library.  But using regular expressions is the core principle of programming in general.  No computer knows what to do with anything except for straight machine language.  But since writing in machine language sucks for us, we use higher level languages to act as a middle man.  That middle man parses and interprets the code with its own regex functions.  Nested inside that might be some arbitrary function/system (like pcre) that uses regex to interpret something else.  Things like the pcre library are just one instance of regex.  But regex is a concept, not a single instance of something.  So, anything that involves interpreting or parsing is regex.  How one thing goes about parsing or interpreting, and for what reason, and what the subject (content) is, is irrelevant.  It is the principle of pattern recognition which causes it to be a form of regexing.

 

 


Yeah, I know about the various abstraction levels there are down to machine code, but I believe you are using a non-standard definition of the term "regex", CV. Most everybody else refers to regex as what is described in this article. It's true that PCRE is a particular implementation of regex and that other dialects exist (e.g. POSIX), but I've never seen anybody but you refer to regex as being "string parsing" and everything else that term encompasses.


Yeah, I know about the various abstraction levels there are down to machine code, but I believe you are using a non-standard definition of the term "regex", CV. Most everybody else refers to regex as what is described in this article. It's true that PCRE is a particular implementation of regex and that other dialects exist (e.g. POSIX), but I've never seen anybody but you refer to regex as being "string parsing" and everything else that term encompasses.

 

Don't confuse "most referenced" with "standard."  It is "most referenced" by things like pcre to the programming community, because that's the level we work on.  It would indeed be safe to assume that when a programmer is talking about regex, they are talking about things like pcre libraries.  But when you move beyond the program itself, into what the program is being used for, what is impacted because of it (SEO, for instance), we move beyond the confines of programming. 

 

As said, things like the pcre library are what most people think of as far as regex goes, but it is indeed a principle, the principle used with computers and programming in general.  Right down to the hard wiring of computers.  It all boils down to "on" or "off."  Circuit boards are made up of capacitors and transistors and all that junk, laid out in specific patterns, and computers are hardwired to physically do something based off those patterns.  Machine code interprets higher level code to send specific patterns to the hardware.  Higher level languages send specific patterns to be parsed and interpreted by the machine code.  As you move up to higher levels, the subject and pattern become more abstract to the computer, but less abstract to us.

 

What do you think a programming language does with the code you feed it?  The code you write is a pattern.  It parses and interprets that pattern, as a whole.  It recognizes a certain sub pattern and hands it off to a certain function that uses its own code pattern to filter through that pattern.  Regex is the concept of pattern recognition.  It is used in a fractal sort of way, a bunch of filters boiling down to one thing: on, or off. 

 

 


I know that, but that still doesn't make it regular expressions. Not everything that's a pattern is a regex. You don't call a sentence in English a regular expression either just because there are certain patterns in the language.

 

Also, the way a word is usually used has to be the standard definition. You can't just make up your own definitions and claim you are still correct. I can't call a dog a table, a fork a bird, or grass a car. If I do that then people won't know what I am talking about because I use the words in a non-standard manner.


I know that, but that still doesn't make it regular expressions. Not everything that's a pattern is a regex. You don't call a sentence in English a regular expression either just because there are certain patterns in the language.

 

Also, the way a word is usually used has to be the standard definition. You can't just make up your own definitions and claim you are still correct. I can't call a dog a table, a fork a bird, or grass a car. If I do that then people won't know what I am talking about because I use the words in a non-standard manner.

 

Again, you're failing to understand that the programming community is just one circle of mentality.  As I said before (I think you missed the edit), yes, it is the "standard" within the programming community.  We can safely assume that when someone is talking about regex on a site like phpfreaks.com, that they are talking about pcre or posix or something similar.  But in other communities, that's not the case.  I will concede that our community is probably the only one that conveniently shortens it to "regex," and maybe that's why you continue to only think inside this box. 

 

A pattern is a regular expression.  They are synonyms. 

 

A sentence is a pattern.  We use words as building blocks and put them in certain orders, tenses, etc... in order to convey an idea or intent.  I read the sentence, and my brain interprets it, based on a set of rules (part of speech, grammar rules, etc..). 

 

 


It may or may not be true that other fields refer to regular expressions as something other than what they mean in programming/computer science, but using X's definition in a Y context doesn't really make sense and will only cause confusion.

 

You can't just take a word, give it a new meaning and say that everybody else fails to understand its true meaning (the meaning you gave it). As I said before, I have never heard anyone refer to something as a regular expression without it being in a programming sense. Patterns? Yes, but not regular expressions, and I still do not agree that a pattern and a regex are the same thing. A regex is a pattern, but a pattern isn't necessarily a regex, in the same way that a dog is an animal, but an animal isn't necessarily a dog.


I wasn't attempting to take the context X reference and apply it to context Y, merely pointing out that it is a general principle that can be applied to many situations, and that in principle, it goes beyond context Y.

 

Why do you not think all patterns are regexes?  I see your animal > dog analogy, but the more appropriate application to that analogy is

 

'dog' is to 'pattern' as 'animal' is to 'pattern recognition.'

 

A regular expression is a pattern, and a pattern is a regular expression.  Moving beyond single application into broader classification is pattern recognition.


Why do you not think all patterns are regexes?

 

Because regular expressions are a particular type of pattern, like a dog is a particular type of animal. You can express what I mean using PHP, actually:

 

class Pattern {}
class Regex extends Pattern {}
// and e.g.: class PCRE extends Regex {}
//           class POSIX extends Regex {}

$pattern = new Pattern();
$regex = new Regex();

var_dump($regex instanceof Regex); // true
var_dump($regex instanceof Pattern); // true
var_dump($pattern instanceof Pattern); // true
var_dump($pattern instanceof Regex); // false

 

I'm talking about the same kind of taxonomy as you also use in biology to classify organisms (life>domain>kingdom, etc.)

 

I believe there must be a widespread consensus that pattern and regex are synonymous for it to be true.

 

Seriously though, would you call the following a regex? It's a pattern after all (white square followed by black square four times, new line, repeat eight times and reverse for each new line).

[attached image: an 8x8 chessboard pattern]


So many responses since I last checked in.. rightie-oh. Onwards march:

 

Agreed. But to be fair, tables were around before CSS, which as we know was built for the purpose of separating presentation from content and driving the presentation aspect.. whereas tables intertwine both when used for page layouts (which, as we also know, was not the intention of tables.. some designers started playing around and figured out other ways to utilise them. It was a good idea at the time I suppose, since with no CSS and no separation of content and presentation, tables opened many doors on that front. IMO there's no longer a need for tables in that regard [but I now retract my opinion about tables for tabular data.. they're useful, it turns out...]).

 

CSS did exist, but browser support sucked. I still don't think that misusing HTML was a viable option. Browser vendors should have focused on CSS instead. Browser vendors also started implementing their own proprietary tags, such as the infamous <blink> tag; some of these later made it into the specifications because their usage became widespread. It was simply a matter of bad decisions in the past because the focus was put in the wrong place.

 

Dan, I'm referring to the point in time when CSS did not exist, but tables did. CSS was first published in December of 1996 (and was then embraced by HTML 4 [1997, I think]), while tables have been around since HTML 3 (the last sentence in this link is what caught my eye).

 

So when developers were stuck with (the then current) HTML 3 and tables, there wasn't much room for creativity... until people started discovering the incorrect use of tables and using them for structure / presentation. So this is what I meant by 'at the time, it was a good idea' (read: a creative idea that opened some design doors, so-to-speak, at the expense of misusing the markup, unfortunately). But I do agree that when CSS finally did make its way onto the stage, browser developers should have jumped on (and aggressively endorsed) CSS when it first appeared alongside HTML 4. If they had done so, there would be far fewer table-based layouts than there are today. (On a side note, I admittedly hadn't realized that CSS is that old.. I figured perhaps somewhere in the neighbourhood of the year 2000, give or take a year.)

 

Also, I lied when I alleged that CSS doesn't carry semantics, because it can in a way do that. You could do <span class="whisper">something</span> and then .whisper{volume:soft}. The client is then free to use that or not. A regular visual browser such as Firefox would ignore it, but a screenreader would make the sound, well, softer. There are loads of other aural CSS properties, but their usage isn't very widespread.

 

Well, again, I wouldn't know the difference in that regard, as I have never researched (let alone used) a screenreader. Just as a side question out of pure curiosity, which version of CSS are these properties found in? (CSS 1, CSS 2, CSS 2.1...)


Yes, I would call the 'checkerboard' a regex, the same as I would call it a pattern.  Now, if I were to look at it in a broader sense of using black and white boxes to make up a checkerboard, that would be pattern recognition.  Or I could use the checkerboard as a whole, as a pattern in a larger context.

 

life>domain>kingdom

 

kingdom is a sub-pattern of domain, which in turn is a sub-pattern of life.  But each one is indeed a pattern, or regex. We use pattern recognition in general to make those patterns or regexes.


Well, again, I wouldn't know the difference in that regard, as I have never researched (let alone used) a screenreader. Just as a side question out of pure curiosity, which version of CSS are these properties found in? (CSS 1, CSS 2, CSS 2.1...)

 

I couldn't find anything about it in the CSS1 specs, but it appears in CSS2: http://www.w3.org/TR/CSS2/aural.html

 

Yes, I would call the 'checkerboard' a regex, the same as I would call it a pattern.  Now, if I were to look at it in a broader sense of using black and white boxes to make up a checkerboard, that would be pattern recognition.  Or I could use the checkerboard as a whole, as a pattern in a larger context.

 

life>domain>kingdom

 

kingdom is a sub-pattern of domain, which in turn is a sub-pattern of life.  But each one is indeed a pattern, or regex. We use pattern recognition in general to make those patterns or regexes.

 

I believe you are pretty much alone with that interpretation. If you can find somebody else who defines things the way you do then I'd like to see it though.

 

As far as I can tell, the consensus is that regular expressions are patterns written in a formal language that describes a text string. E.g. ^a\d+[a-z]$ matches a string that starts with an a, followed by at least one digit, and ends with any lowercase letter a through z. You cannot express a checkerboard in that way. You cannot express the traits an organism must have in order to classify it as a mammal in that way. You can express text strings.


Common sense logic is that either the program works or it doesn't. Making a webpage with one tag over another, or one way over another, makes it harder for the people programming the crawlers to program them.  But once they program them, the bots work as they are programmed, and do not care one whit whether it's one tag or another, unless that's how they were programmed.  You're arguing that crawlers are actually programmed to take that sort of thing into consideration, when they aren't.  I'm not arguing that it's not "more work" for the bot; sure, it may take longer to process, but to claim that longer processing time == lower SEO ranking is not true.

 

I think there is a misunderstanding here.. when I throw around phrases like "gives a spider more work", admittedly, that is perhaps not the wisest choice of words... As you pointed out.. programs just do the execution, regardless of how much stuff they have to go through. When I mention more work, I mean the program (spider) has more stuff to 'process'. So the more processing it has to do, well, the spider will not plow through as quickly as it would on a site with less bloat. There's no disputing that the spider will still get through massive, terribly structured sites (as that is what they are programmed to do). Just that the more there is, the "more work" on its part (again, not to be taken literally.. think in terms of processing instead.. it will still process as it was built to, it just has to process more vs less content.. the speed of processing is the same, I assume.. it simply has to go through more stuff to index the page). So from what I am gathering in the book I am in the middle of reading, this aspect does have an effect on rankings [to some degree.. see below].

 

To extend the SEO point here for a moment.. spiders crawling through a site is only one aspect of, say, Google's methods of site / page ranking.. it goes deeper than merely indexing content. There are many variables at play here. How long has a site been up and running? (this affects ranking to some degree). How many reputable inbound links does this site have? (this too has an effect on rankings). Does this site employ black hat techniques? (boy, this has a potentially detrimental effect on rankings). The reason for me bringing up the points I did was with respect to CSS (read: reduced (x)html markup) and the effect on ranking strictly from spider indexing (and there is much more to it than simply less code bloat.. we won't even bother getting into researching the top keywords searched and how to implement them into a site successfully, or making use of <hx> tags with images as backgrounds (as used in, say, a header graphic) and text-indenting the actual text within those tags offscreen so that spiders see the text for indexing while we the visitors see an image in its place). There is a plethora of variables at play.. so clean, reduced markup and the 'amount of work' a spider goes through are not in themselves a deciding factor. They should be taken into consideration in the 'grander scheme of things' to help improve your site's rankings.
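
That <hx> image-replacement trick looks roughly like this (a sketch; the heading text, image path and sizes are made up, and overdoing this sort of thing can itself be flagged as spammy):

<h1 id="site-title">My Site Name</h1>

#site-title {
	background: url(/images/header-logo.png) no-repeat;  /* hypothetical header graphic */
	width: 400px;
	height: 80px;
	text-indent: -9999px;  /* pushes the real text offscreen; spiders still index it */
	overflow: hidden;
}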

 

And regex is 99% of what a crawler is.  That's what crawlers do: find information based on patterns and return that information to wherever, to be processed by something else.  A crawler's function is to filter information.

 

Really? If this is so, by that logic anyone who is extremely knowledgeable in regex (and when I say regex, I don't mean from a PCRE standpoint.. I have been following your discussions with Dan with regards to the definition of regex, and I know you don't in this case mean PCRE) can become a Google slayer?

I am not disputing that there is regex involved.. Hell, I won't even dispute that there is a good chunk of it.. but I would issue the challenge here for you to please provide concrete, indisputable evidence that, say, Google's search engine is indeed 99% regex. Sorry. I don't buy it. There is a damn good reason why search engine algorithms such as Google's are a very tightly guarded secret. If it were in fact 99% regex, then there is no way the remaining 1% is what makes Google so damn good.

 

So again, please post references in the form of indisputable evidence that backs up what you claim. Otherwise, I dare say there is more to it than simply regex. Otherwise, many other very talented regex coders would have slain Google by now (and no, that other 1% wouldn't cut it).

 

And I'm sorry, my previous post did come off kind of harsh, and I apologize for that.  But I maintain that those are 'snake oil' reasons.

 

Well, let me simply state that there was no need for the tone of your initial response to me. CV, I have not once disrespected you. I would expect the same in return. I can understand you not agreeing with the points I have made (and it is your given right to do so), but you could certainly have argued your point with the same impact without the insults. Something to consider in future postings. So, apology accepted. It is now water under the bridge.


Yes, I think there is a misunderstanding.  I am not asserting that one's site ranking on a Google search is solely based on regex.  I know that much, much more goes into it.  Anything from keyword/content relevance to "presence" on other sites (which is why link exchange type systems were so popular at one point in time, to the point of exploitation), to straight up paying for a premium spot goes into being ranked.  In short, everything you mentioned, and more, goes into it.

 

But we're not talking about that. We're talking about crawler bots specifically.  Their job is to filter info, nothing more.  Some other part of the google code/system/whatever decides what to do with the info returned.  So yes, a crawler bot is indeed 99% regex, but the extent of google is not just the crawler bot.

 

As far as becoming a google slayer with regex: I guess it depends on what side of the fence you're on.  From a site's perspective, no you can't.  If you were to use '?at' as a keyword on your site (in a meta tag or as content or whatever), google is not going to bring up your site if someone types in 'cat' or 'bat' (at least, I don't think it will...one would think that would have been something already exploited by now..and fixed). 

 

But as someone who uses google to search, google does offer very limited regexing for your google searching pleasure. For instance, I can go to google.com and type in (cat|bat) and it will return results for both.  Interestingly, if I type in (google|yahoo) yahoo comes in first! LOL.

 

 

 

 


You are probably not going to see Google's algorithm anyways, and if you do, then you're likely under an NDA and won't be able to tell about it in details. They keep it secret so other people cannot copy them and so people won't exploit in-depth knowledge to get a higher position. We need to rely on observations from the outside and make guesses about how it works.


So yes, a crawler bot is indeed 99% regex

 

Again, proof please..

I can, by example, equally counter-claim that "a crawler bot is indeed 87% regex, and 13% additional trade-secret algorithms which not many people know of". That doesn't make my claim correct, despite me stating that indeed it is this or that.

Provide reputable links..evidence.

 

For instance, I can go to google.com and type in (cat|bat) and it will return results for both.  Interestingly, if I type in (google|yahoo) yahoo comes in first! LOL.

 

That is interesting.. the immediate thought that came to mind was Nondeterministic Finite Automaton [NFA] vs Deterministic Finite Automaton [DFA] engines (the closest thing I can think of is the simple test, near the bottom of page 146 of the Mastering Regular Expressions book, for seeing which engine type your regex flavour uses).. but I don't think that would be the case here.. No matter which order I tried Yahoo and Google in, Yahoo indeed came first. Odd.

 

I also just realised how off base this thread has become (going from CSS vs tables for layouts to spiders, regex and algorithms...) Funny (or not) how things can quickly digress.

