Which to use? Curly quotes, straight quotes, <q> tags or HTML entities?


Fluoresce

Quotation marks are confusing me.

 

What do you guys use when it comes to quotation marks?

 

In HTML attributes and throughout the bodies of my web pages, I use the HTML entity (&quot;). For example:

<a href="" title="Read &quot;Article Name&quot;">
<p>In his new book, he says: &quot;This is a quote.&quot;</p>

I thought that this was best practice.

 

However, today, I read that it's perfectly safe to use straight quotes (") in the body, and that I should use the HTML entity only in HTML attributes.

 

Is that correct?

 

But what if I want to use curly quotes in the body instead of straight quotes? Should I always use the HTML entities for curly quotes (&ldquo; and &rdquo;), or can I also safely use the characters (“”)?

 

I heard that straight quotes are safe in all browsers, even if you don't specify the character set of your web pages, but that curly quotes are only safe if you specify the character set or if you use the HTML entities.

 

Is that true?

 

And what about the <q> tag? Apparently, it's compatible with all browsers but they treat it differently.

Edited by Fluoresce

Those are two different problems.

 

Double quotes must be escaped in some contexts because they have a special meaning in HTML. A literal double quote within a double-quoted attribute is ambiguous: it could be either a literal character or the attribute delimiter. The HTML parser always treats it as the delimiter, so the attribute value ends early. However, there's no such ambiguity within the content of an element or within a single-quoted attribute, so in those cases, you can safely use a literal double quote.
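
To make the attribute case concrete, here is a minimal sketch. It uses Python purely for illustration (an assumption, since the thread itself only shows HTML and PHP) and relies on the standard library's html.escape. The helper name build_link is hypothetical.

```python
from html import escape

def build_link(href: str, title: str) -> str:
    """Build an <a> start tag whose attribute values are safe inside double quotes."""
    # quote=True also turns " into &quot; (and ' into &#x27;), so the value
    # cannot close the attribute early.
    return '<a href="{}" title="{}">'.format(
        escape(href, quote=True), escape(title, quote=True)
    )

tag = build_link("", 'Read "Article Name"')
print(tag)

# In element content a straight double quote is unambiguous, so it stays literal.
paragraph = '<p>In his new book, he says: "This is a quote."</p>'
print(paragraph)
```

Only the attribute value goes through the escaping step; the paragraph text is left untouched.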

 

Whether or not you can use literal curly quotes within your document is an entirely unrelated issue. The problem is this: An HTML document by itself is just a sequence of bytes. To map those bytes to actual characters, the browser must know which encoding was used. By far the best solution is to tell the browser explicitly. This is done with the charset parameter of the Content-Type header and/or a meta element within the document itself.

 

If you explicitly specify the encoding, then you can safely use any character from the underlying character set (as long as it doesn't have a special meaning in HTML, of course). You do not have to encode the characters with HTML entities, and you shouldn't.

 

If you do not specify the encoding, then it's a very different story. The browser has to fall back on its default encoding, which is probably something like ISO 8859-1 or Windows-1252. Now the only literal characters you can safely use are those from the ASCII set. A literal curly quote will almost certainly be misinterpreted. What you can do, however, is use HTML entities to represent Unicode characters within ASCII. This will also yield the right characters. But it does come at a price: The resulting document is bigger (the entity &ldquo; takes 7 bytes, the same character in UTF-8 just 3), and it's much more effort on your part, because you have to encode every non-ASCII character.
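
The byte counts and the misinterpretation can be checked with a short sketch. Python is used only as a neutral illustration (an assumption; it does not appear in the thread), and Windows-1252 stands in for whatever fallback encoding a given browser actually picks.

```python
curly = "\u201c"  # LEFT DOUBLE QUOTATION MARK

utf8_bytes = curly.encode("utf-8")        # the literal character as UTF-8
entity_bytes = "&ldquo;".encode("ascii")  # the named entity, pure ASCII
print(len(utf8_bytes), len(entity_bytes))

# The same UTF-8 bytes read with a Windows-1252 fallback: each byte becomes
# its own (wrong) character instead of one curly quote.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

The second print shows the familiar three-character garbage that appears when a UTF-8 page is served without a declared encoding.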

 

So do declare the encoding, and do use literal curly quotes.


Thank you, Jacques1!

 

I've learnt quite a bit off you in the past couple of days. ;D

 

Please tell me if I've got this right . . .

 

As long as I have this meta tag:

<meta http-equiv="content-type" content="text/xml; charset=utf-8" />

on my web pages, I can have literal curly quotes (which I will produce like this: Alt+0147 and Alt+0148) in the bodies of my web pages.

 

In fact, I can have any UTF-8 character.

 

They will all be compatible in all browsers. It doesn't matter what browser they're using and what web fonts they have installed. No users will see strange characters.

 

I do not have to use the curly quote HTML entities (&ldquo; and &rdquo;).

 

Is that all correct?


Well, “all browsers” is a very broad term. Let's say: all modern mainstream browsers. I'm sure somewhere there's an obscure web client which doesn't understand UTF-8. But all browsers you'll encounter on a standard website do, and UTF-8 is in fact the official recommendation of the W3C.

 

Note that you should also declare the encoding in a Content-Type header. Using a meta element usually leads to the expected result, but it's somewhat paradoxical: The browser has to understand the document so that it can get the information which is necessary to understand the document. This only works under certain circumstances: The meta element itself must be ASCII-encoded, and it must be within the first 1,024 bytes of the document. There are no such issues with the Content-Type header. Actually, the meta element should only be used as a backup in case the user views the document offline (in which case the HTTP headers are not available).
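
As a rough way to check the 1,024-byte condition on a generated page, one could use something like the following sketch. It is written in Python for illustration (an assumption), the function name charset_declared_early is hypothetical, and the regular expression is a deliberate simplification, not the browser's actual scanning algorithm.

```python
import re

PRESCAN_LIMIT = 1024  # the byte window mentioned in the post above

def charset_declared_early(document: bytes) -> bool:
    """Rough check: is a meta charset declaration inside the first 1,024 bytes?"""
    head = document[:PRESCAN_LIMIT]
    # Simplified pattern: a <meta ...> tag that contains "charset=".
    return re.search(rb"<meta[^>]*charset\s*=", head, re.IGNORECASE) is not None

early = b'<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8" />'
# Same markup, but pushed back by a 2,000-byte comment.
late = b"<html><head><!--" + b"x" * 2000 + b"-->" + early[12:]

print(charset_declared_early(early), charset_declared_early(late))
```

The check has to run on the HTML the server actually sends, not on the source file with its PHP code.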

 

Why do you use text/xml as the content type, by the way?

 

The font is yet another story. Not all fonts include all of the more than 100,000 Unicode characters, so combining a very exotic character with a very exotic font can theoretically lead to problems. But this isn't going to happen for a simple curly quote.

 

So if we leave aside all the obscure edge cases, then, yes, you can safely use any character you want.

Edited by Jacques1

One additional thing to mention: when you create the document, you need to ensure that your editor saves it as UTF-8 as well. If you create and save your document in something like Windows-1252 but tell the browser that it's UTF-8, you'll still have issues, because the bytes of the literal curly quote will not be valid UTF-8.

 

This is a somewhat common issue for people who are new to character encoding. They configure their page with the meta tag and/or header but neglect to ensure that their editor actually saves the page as UTF-8 in the first place.
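
The mismatch can be reproduced in a few lines. This is only a sketch in Python (an assumption; the thread does not use it): the text is encoded the way a Windows-1252 editor would save it and then decoded the way a UTF-8 declaration promises.

```python
text = "He said: \u201cThis is a quote.\u201d"

# What an editor set to Windows-1252 writes to disk.
saved_by_editor = text.encode("cp1252")

# What a reader that trusts "charset=utf-8" tries to do with those bytes.
try:
    saved_by_editor.decode("utf-8")
    mismatch = False
except UnicodeDecodeError:
    mismatch = True
print(mismatch)

# A lenient decoder substitutes U+FFFD for each invalid byte instead of failing.
shown = saved_by_editor.decode("utf-8", errors="replace")
print(shown)
```

The ASCII part of the sentence survives unchanged; only the two curly quotes break.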


"Why do you use text/xml as the content type, by the way?"

 

I use the content type text/xml because my doctype is XHTML 1.0 Strict. Is that wrong?

 

This is what my doctype and <head> element look like. Please tell me if you can see any problems.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="EN" dir="ltr" xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="content-type" content="text/xml; charset=utf-8" />
        <meta name="robots" content="index, follow" />
        <link rel="stylesheet" type="text/css" href="/mystyle.css" />
        <link rel="shortcut icon" href="/images/favicon.ico" type="image/x-icon" />
        <meta name="description" content="" />
        <meta name="keywords" content="" />
        <title></title>
    </head>        
    <body>

"The meta element itself must be ASCII-encoded, and it must be within the first 1,024 bytes of the document."

 

Sorry, I don't understand what you mean when you say that it has to be ASCII-encoded. The charset attribute is set to utf-8.

 

Some of my pages have lots of PHP at the top. Does this mean that my content-type declaration might be outside of the first 1,024 bytes, or does PHP not count?

 

"Note that you should also declare the encoding in a Content-Type header."

 

Do you mean that I should put something like the following at the top of the page?

<?php header('Content-Type: text/xml'); ?>

"I use the content type text/xml because my doctype is XHTML 1.0 Strict. Is that wrong?"

 

 

Using XHTML is not wrong, but I don't see the point. Unless you have a specific reason for why you need it, just go with plain HTML. There are a lot of great new features in HTML5, and you can still use XHTML syntax if you like.

 

 

 

"The meta element itself must be ASCII-encoded, and it must be within the first 1,024 bytes of the document."

 

"Sorry, I don't understand what you mean when you say that it has to be ASCII-encoded. The charset attribute is set to utf-8."

 

What I mean is that the <meta> tag itself as it stands in the HTML source code must be ASCII-encoded. This is automatically the case for UTF-8, because UTF-8 is a superset of ASCII.
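
A quick sketch of why this works, using Python only for illustration (an assumption). The meta tag is copied from the thread. Its bytes are identical in ASCII, UTF-8 and Windows-1252, so the browser can read it before it knows the encoding. UTF-16 is included as a counterexample of an encoding that is not ASCII-compatible.

```python
meta = '<meta http-equiv="content-type" content="text/xml; charset=utf-8" />'

ascii_bytes = meta.encode("ascii")

# ASCII-compatible encodings produce exactly the same bytes for this tag.
same_in_ascii_compatible = ascii_bytes == meta.encode("utf-8") == meta.encode("cp1252")

# UTF-16 uses two bytes per character here, so the byte sequence differs.
differs_in_utf16 = ascii_bytes != meta.encode("utf-16-le")

print(same_in_ascii_compatible, differs_in_utf16)
```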

 

 

 

"Some of my pages have lots of PHP at the top. Does this mean that my content-type declaration might be outside of the first 1,024 bytes, or does PHP not count?"

 

The browser doesn't know your PHP code. It only sees the resulting HTML document.

 

 

 

"Do you mean that I should put something like the following at the top of the page?"

<?php header('Content-Type: text/xml'); ?>

 

You forgot the character encoding. But, yes, that's one way of setting the HTTP header. You can also do it with the webserver itself.


"One additional thing to mention: when you create the document, you need to ensure that your editor saves it as UTF-8 as well. If you create and save your document in something like Windows-1252 but tell the browser that it's UTF-8, you'll still have issues, because the bytes of the literal curly quote will not be valid UTF-8."

"This is a somewhat common issue for people who are new to character encoding. They configure their page with the meta tag and/or header but neglect to ensure that their editor actually saves the page as UTF-8 in the first place."

 

That's confused me a bit. :confused:

 

Let's say that I write something in MS Word and then I copy and paste it into my web page. Are the characters on that page now going to be Windows-1252?


"No, you get the encoding which you've set in your editor."

 

I've just checked, and the editor that I use (Aptana Studio 2) has cp1252 set as the default encoding.

 

What kind of trouble am I in, then? :confused:

 

I assume that I should re-save all of my web pages as UTF-8 and then re-upload them. Is that correct?

 

Is there anything else that I should do?

 

My setup looks like this:

 

1) I've got this meta tag on all of my pages, at the top of the head element:

<meta http-equiv="content-type" content="text/xml; charset=utf-8" />

2) All of my MySQL connections include this:

mysql_set_charset("UTF8", $connection);

3) My database tables are set to utf8_general_ci.

 

I still haven't specified UTF-8 headers. I want to do it on the server instead of using the header() function. Do I do it in my php.ini file or in my .htaccess file?

 

Note that, when I check my headers, the Content-Type header just says "text/html". Shouldn't it also say UTF-8?

 

And what about my CSS files? I heard that I have to encode them in UTF-8 as well.

 

This is a very confusing subject for me. :shrug: I appreciate your help very much.


CP-1252 and UTF-8 work the same way for ASCII characters, so you only have to convert the files which contain non-ASCII characters. It's probably enough to open the website in your browser and do a quick check for broken characters.
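
If checking by eye is not enough, the same idea can be sketched in code. This is a Python illustration (an assumption), and both helper names are hypothetical. It assumes the input really is Windows-1252: a file that is already UTF-8 and contains non-ASCII characters would also be flagged, and converting it again would corrupt it.

```python
def needs_conversion(raw: bytes) -> bool:
    """Pure-ASCII files are byte-for-byte identical in CP-1252 and UTF-8."""
    return not raw.isascii()

def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are known to be Windows-1252 as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")

plain = b"<p>Hello</p>"
quoted = "<p>\u201cHello\u201d</p>".encode("cp1252")

print(needs_conversion(plain), needs_conversion(quoted))
converted = cp1252_to_utf8(quoted)
print(converted.decode("utf-8"))
```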

 

 

 

"I still haven't specified UTF-8 headers. I want to do it on the server instead of using the header() function. Do I do it in my php.ini file or in my .htaccess file?"

 

In the Apache configuration or an .htaccess file, for example with the AddDefaultCharset directive. Check the Apache manual for the details.

 

 

 

"Note that, when I check my headers, the Content-Type header just says "text/html". Shouldn't it also say UTF-8?"

 

Preferably, yes. But it won't say that unless you add the charset parameter to the Content-Type header.
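
To see the difference between the two header values, here is a small sketch in Python (an assumption; any HTTP inspection tool would do). It uses the standard library's email.message.Message to parse a Content-Type value. The function name header_charset is hypothetical.

```python
from email.message import Message

def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased charset, or None

without_charset = header_charset("text/html")
with_charset = header_charset("text/html; charset=UTF-8")
print(without_charset, with_charset)
```

The first value is what the server in the question currently sends; the second is what it should send once the charset has been configured.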

 

 

 

"And what about my CSS files? I heard that I have to encode them in UTF-8 as well."

 

Yes. But since CSS files usually only contain ASCII characters (except maybe for comments), it makes no difference.


Okay, here's what I've done:

1) Encoded all of my pages in UTF-8 without BOM using Notepad++.

2) Added this to my .htaccess file:

AddDefaultCharset UTF-8

3) Changed the default charset in my php.ini file to:

default_charset = "utf-8"

4) Specified my MySQL connections with:

mysql_set_charset("UTF8", $connection);

5) Set my databases to utf8_general_ci.

6) Used this meta tag:

<meta http-equiv="content-type" content="text/xml; charset=utf-8" />

7) Added this to my external style sheet:

@charset "utf-8";

I'm guessing I've gone overboard. However, it all seems to work, so I'm happy. :happy-04:

 

I thank you guys—especially Jacques1—for your assistance.
