rakaz

about standards, webdesign, usability and open source

CSS selector bugs: Case sensitivity

The previous version of the CSS selector test contained a test case for determining whether the value of an attribute selector is compared in a case-insensitive way. We only tested the align attribute, which should be treated in a case-insensitive way. But there are many other attributes as well, and a lot of them should be compared in a case-sensitive way.

When a CSS attribute selector is evaluated, the browser compares the value of an attribute of the element with the value specified in the selector. If they match, the browser applies the style rules of that selector to the element.

There are six ways an attribute value can be compared:

  • [att=val]: is it equal to the specified value,
  • [att~=val]: is it a space-separated list that contains the specified value,
  • [att|=val]: is it a hyphen-separated list of which the first word is equal to the specified value,
  • [att^=val]: does it begin with the specified value,
  • [att*=val]: does it contain the specified value,
  • [att$=val]: does it end with the specified value.
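To make the six comparisons concrete, here is a minimal Python sketch of how a selector engine might evaluate them with a switch between case-sensitive and case-insensitive matching. It is not how any particular browser implements this; the function name attribute_matches and its case_sensitive flag are hypothetical, and the case-insensitive path simply lower-cases both values, which is a simplification.

```python
def attribute_matches(operator: str, actual: str, expected: str,
                      case_sensitive: bool = True) -> bool:
    """Evaluate one attribute-selector comparison (hypothetical helper)."""
    if not case_sensitive:
        # Simplification: fold both sides with str.lower() so "A" equals "a".
        actual, expected = actual.lower(), expected.lower()
    if operator == "=":   # exactly equal
        return actual == expected
    if operator == "~=":  # whitespace-separated list contains the value
        # An empty value or one containing spaces can never equal a single word.
        return expected in actual.split()
    if operator == "|=":  # equal, or starts with the value followed by "-"
        return actual == expected or actual.startswith(expected + "-")
    # Selectors Level 3: an empty value never matches for ^=, *= and $=.
    if operator == "^=":  # begins with
        return expected != "" and actual.startswith(expected)
    if operator == "*=":  # contains
        return expected != "" and expected in actual
    if operator == "$=":  # ends with
        return expected != "" and actual.endswith(expected)
    raise ValueError(f"unknown operator: {operator}")


print(attribute_matches("=", "Left", "left"))                        # False
print(attribute_matches("=", "Left", "left", case_sensitive=False))  # True
print(attribute_matches("|=", "en-US", "en"))                        # True
```

The whole question of this article is which value the case_sensitive flag should get for a given attribute.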

What does the CSS specification say?

The problem is that each of these comparisons can be performed in two different ways: case-sensitive and case-insensitive. The CSS standard does not define which one should be used. Instead, the specification explicitly tells us that case-sensitivity is determined by the document language, in our case HTML. So we need to look at the HTML standard to find out which one applies.

The case-sensitivity of attribute names and values in selectors depends on the document language.

What does the HTML standard say about this?

The HTML standard makes it quite difficult for us. There is no single correct way. Each attribute has its own rules. So basically we need to make a list of which attributes should be handled in which way.

Each attribute definition includes information about the case-sensitivity of its values.

Fortunately, these rules fall into a couple of categories:

Category Description
CS The value is case-sensitive (i.e., user agents interpret “a” and “A” differently).
CI The value is case-insensitive (i.e., user agents interpret “a” and “A” as the same).
CN The value is not subject to case changes, e.g., because it is a number or a character from the document character set.
CA The element or attribute definition itself gives case information.
CT Consult the type definition for details about case-sensitivity.

The first two categories are clear: CS should be treated in a case-sensitive way, CI in a case-insensitive way. The third, CN, is also clear: it can be treated either way, because it does not matter. The fourth and fifth, CA and CT, are bigger problems: for those we have to look at the attribute definition or at the type of the attribute.

The only attribute that uses the CA category is the value attribute of the input element. We do not know how each value should be treated, because that depends on the purpose of the input element. If the input is used for entering a phone number, the value could be considered case-neutral, but for most other information it is important that it is treated in a case-sensitive way. Because we do not know the purpose of the input element, we need to treat everything in a case-sensitive way.

All attributes that use the CT category have one of the following three types: script, uri or uri list. A list should be treated the same way as the type it consists of, so we only need to look at the script and uri types:

URIs in general are case-sensitive. There may be URIs, or parts of URIs, where case doesn’t matter (e.g., machine names), but identifying these may not be easy. Users should always consider that URIs are case-sensitive (to be on the safe side).

So URIs should be treated in a case-sensitive way, just to be on the safe side. But what about scripts?

The case-sensitivity of script data depends on the scripting language.

The case-sensitivity of the script type is determined by the language of the script. We do not know which language a particular script uses, but we do know that JavaScript is case-sensitive. So, just like the uri type, we can consider this type to be case-sensitive, again just to be sure.

The CT category should be treated in the same way as the CS category.
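The reasoning so far boils down to a small decision table. The sketch below is one hypothetical way to encode it in Python; the Category enum and the compare_case_sensitively function are illustrative names, not part of the HTML specification. Only CI leads to a case-insensitive comparison; every other category ends up case-sensitive.

```python
from enum import Enum


class Category(Enum):
    """Case categories used by the HTML 4 attribute definitions."""
    CS = "case-sensitive"
    CI = "case-insensitive"
    CN = "case-neutral"
    CA = "case defined by the element or attribute definition"
    CT = "case defined by the type definition"


def compare_case_sensitively(category: Category) -> bool:
    """Hypothetical helper: should values in this category be compared case-sensitively?"""
    # CS: case-sensitive by definition.
    # CN: numbers or characters without case, so either mode gives the same result.
    # CA: only input's value attribute; its purpose is unknown, so stay on the safe side.
    # CT: script, uri and uri list are all treated as case-sensitive.
    # Only CI is compared case-insensitively.
    return category is not Category.CI


print([c.name for c in Category if not compare_case_sensitively(c)])  # ['CI']
```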

Some more problems

Unfortunately there are a couple of problems in the HTML specification. The category of an attribute depends on the element to which it belongs. For example, the name attribute should be treated in a case-sensitive way if it belongs to an a element, but in a case-insensitive way if it belongs to an input element. Luckily this only applies to a limited set of attributes: type, name, value, width and height.

The width and height attributes generally belong to the CN category, so it does not matter how they are treated. This changes when they belong to the applet element: in that case they are listed in the CI category. This is probably a bug in the specification, because the type is the same in both cases, length, and the definition of the length type specifically tells us:

Length values are case-neutral.

So we can treat the width and height attributes as case-neutral, regardless of the element to which they belong.

The value attribute can be in the CS, CA or even the CN category. We already determined that attributes in the CA category should be treated in a case-sensitive way, and we know that it does not matter how an attribute in the CN category is treated. So we can safely treat all value attributes in a case-sensitive way.

The name attribute is one that we cannot solve. For some elements, such as the a element, it should be treated in a case-sensitive way. Other elements, such as the input element, expect it to be treated in a case-insensitive way. There is no way around it: we need to look at the element to know how to treat this attribute.

The type attribute is an even bigger problem. Not only does this attribute depend on the element it belongs to, in one case it even depends on the parent of that element. Consider the following possibilities: the object element defines the type attribute as a content type, which should be evaluated in a case-insensitive way. The same applies to the input element and the ul element; both are case-insensitive. The type attribute of the ol element, however, is case-sensitive, because the case of the value directly influences how the list is displayed: type="a" numbers the items with lower-case letters, type="A" with upper-case letters.

The li element makes it even worse. If the li element is a child of a ul element, its type attribute can contain the same values as the type attribute of the ul element, which makes the type attribute of the li element case-insensitive. However, if it is a child of an ol element, it can only have the same values as the type attribute of the ol element, which makes it case-sensitive. There is simply no way to make this easier.
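Because name and type depend on the element, and for li even on the parent element, the element has to become part of the lookup. Below is a hypothetical Python sketch using the per-element rules from the HTML specification as summarised in this article; the dictionaries and the function name are illustrative. Any combination that is not listed, such as the name attribute of param, falls back to case-sensitive, the safe choice used throughout this article.

```python
from typing import Optional

# element -> True if its name attribute is case-sensitive (HTML 4 rules as summarised here)
NAME_IS_CASE_SENSITIVE = {
    "meta": True, "a": True, "applet": True,
    "img": False, "map": False, "frame": False, "iframe": False,
    "form": False, "input": False, "button": False,
    "select": False, "textarea": False,
}

# element -> True if its type attribute is case-sensitive (li is handled separately)
TYPE_IS_CASE_SENSITIVE = {
    "ol": True,
    "input": False, "button": False, "ul": False, "a": False,
    "link": False, "object": False, "param": False, "style": False,
}


def name_or_type_is_case_sensitive(element: str, attribute: str,
                                   parent: Optional[str] = None) -> bool:
    """Hypothetical helper for the two attributes whose rules depend on the element."""
    element = element.lower()
    if attribute == "name":
        table = NAME_IS_CASE_SENSITIVE
    elif attribute == "type":
        if element == "li":
            # li borrows its type values from the parent list: ul is case-insensitive,
            # ol (or an unknown parent) is treated as case-sensitive.
            return (parent or "").lower() != "ul"
        table = TYPE_IS_CASE_SENSITIVE
    else:
        raise ValueError("only name and type depend on the element")
    # Unlisted combinations fall back to case-sensitive.
    return table.get(element, True)


print(name_or_type_is_case_sensitive("li", "type", parent="ol"))  # True
print(name_or_type_is_case_sensitive("li", "type", parent="ul"))  # False
```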

The last problem that we need to solve is a couple of attributes that were simply forgotten in the HTML specification. Nowhere in the specification is there a definition of how they should be treated:

  • param:value
  • param:name
  • img:align
  • object:align
  • applet:align

Because a couple of other elements do define a behaviour for the align attribute, we can safely assume that these elements should behave in the same way. The other elements that use the align attribute treat it in a case-insensitive way, so we should do the same for the img, object and applet elements.

The problem of the value attribute is also simple to solve. We already found that in all other cases it should be treated in a case-sensitive way, so we should do the same for the param element.

The name attribute is a bit more difficult. There is no single way to treat this attribute: some elements treat it in a case-sensitive way, others in a case-insensitive way. We do know that the name of a param element depends on external factors: it is up to the browser plugin to determine how it should be handled. And once again, because we do not know how it should be treated, we should be on the safe side and treat it in a case-sensitive way.

Our list of attributes

Case-sensitive: title, id, class, content, scheme, datetime, summary, headers, abbr, standby, code, object, alt, label, prompt, for, value, profile, background, cite, href, src, longdesc, usemap, classid, codebase, data, archive, action, onload, onunload, onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onfocus, onblur, onkeypress, onkeydown, onkeyup, onsubmit, onreset, onselect, onchange

Case-sensitive on specific elements (element:attribute):

param:name
meta:name
a:name
applet:name
ol:type
li:type (if parent element is ol)

Case-insensitive: lang, dir, http-equiv, text, link, vlink, alink, compact, align, frame, rules, valign, scope, axis, nowrap, hreflang, rel, rev, charset, codetype, declare, valuetype, shape, nohref, media, bgcolor, clear, color, face, noshade, noresize, scrolling, target, method, enctype, accept-charset, accept, checked, multiple, selected, disabled, readonly, language, defer

Case-insensitive on specific elements (element:attribute):

img:name
map:name
frame:name
iframe:name
form:name
input:name
button:name
select:name
textarea:name
input:type
button:type
ul:type
li:type (if parent element is ul)
a:type
link:type
object:type
param:type
style:type

Case-neutral: version, width, start, border, cellspacing, cellpadding, char, charoff, span, rowspan, colspan, height, coords, hspace, vspace, style, size, rows, cols, frameborder, marginwidth, marginheight, maxlength, tabindex, accesskey

What about XHTML?

Up till now we’ve only talked about HTML and the HTML specification. Although XHTML looks like HTML, it is a different language. Because case-sensitivity is determined by the document language, we need to take a completely fresh look at all the attributes. What does the XHTML specification say about case-sensitivity?

HTML 4 and XHTML both have some attributes that have pre-defined and limited sets of values (e.g. the type attribute of the input element). In SGML and XML, these are called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of TEXT was equivalent to a value of text. Under XML, the interpretation of these values is case-sensitive, and in XHTML 1 all of these values are defined in lower-case.

So if an attribute only accepts values from a predefined list (an enumerated attribute), it should be treated in a case-sensitive way in XHTML, whereas in HTML these values could be treated in a case-insensitive way. The same applies to boolean attributes. In XHTML these must contain their only allowed value, for example checked="checked", which the DTD defines as a case-sensitive string.

The XHTML specification does not indicate that any other attributes should be treated differently, so we can compare all other attributes the same way as we did for HTML. The table therefore only changes a little for XHTML documents.

Case-sensitive: title, id, class, content, scheme, datetime, summary, headers, abbr, standby, code, object, alt, label, prompt, for, value, profile, background, cite, href, src, longdesc, usemap, classid, codebase, data, archive, action, onload, onunload, onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onfocus, onblur, onkeypress, onkeydown, onkeyup, onsubmit, onreset, onselect, onchange, dir, compact, align, scope, nowrap, frame, rules, valign, declare, valuetype, shape, nohref, clear, noshade, noresize, scrolling, method, checked, multiple, selected, disabled, readonly, defer

Case-sensitive on specific elements (element:attribute):

param:name
meta:name
a:name
applet:name
ul:type
ol:type
li:type
input:type
button:type

Case-insensitive: http-equiv, text, link, vlink, alink, lang, axis, hreflang, rel, rev, charset, codetype, media, bgcolor, color, face, target, enctype, accept-charset, accept, language

Case-insensitive on specific elements (element:attribute):

img:name
map:name
frame:name
iframe:name
form:name
input:name
button:name
select:name
textarea:name
a:type
link:type
object:type
param:type
style:type

Case-neutral: version, width, start, border, cellspacing, cellpadding, char, charoff, span, rowspan, colspan, height, coords, hspace, vspace, style, size, rows, cols, frameborder, marginwidth, marginheight, maxlength, tabindex, accesskey

A workable solution

All these rules and exceptions do not make a workable solution for browser developers. We need something simpler: two simple lists that determine how an attribute should be treated, one for HTML and one for XHTML. Luckily, a workable solution isn’t that difficult to distill from our present lists. The two problematic attributes are name and type. By looking at how these attributes are used in the real world, we can easily classify them as case-sensitive or case-insensitive.

First is the name attribute. On form controls, this attribute determines the name of the variable that is created on the server after the form is submitted. Because most server-side scripting languages are case-sensitive, the real-world usage of this attribute is case-sensitive. It is also used on the form element, where it is deprecated and replaced by the case-sensitive id attribute. In the real world it should also not cause any problems to treat this attribute in a case-sensitive way for the frame and iframe elements. Conclusion: just treat this attribute in a case-sensitive way.

The type attribute can also be solved easily. Because the type attribute is deprecated on the ul, ol and li elements, we can simply ignore these elements. When XHTML is used, the type attribute of the button and input elements is also easy to deal with in the real world: XHTML demands that you specify their values in lower case. If the XHTML document is valid, it does not matter whether the value is compared in a case-insensitive way, even though that is strictly speaking not strict enough. In the real world you are not going to run into any problems. So simply treat the type attribute in a case-insensitive way.

So now we have our simple solution. Based on the information above we can produce a list of attributes that should be treated in a case-insensitive way. All other attributes can be treated in a case-sensitive way, even the case-neutral ones.

Case-insensitive attributes for HTML documents: lang, dir, http-equiv, text, link, vlink, alink, compact, align, frame, rules, valign, scope, axis, nowrap, hreflang, rel, rev, charset, codetype, declare, valuetype, shape, nohref, media, bgcolor, clear, color, face, noshade, noresize, scrolling, target, method, enctype, accept-charset, accept, checked, multiple, selected, disabled, readonly, language, defer, type

Case-insensitive attributes for XHTML documents: http-equiv, text, link, vlink, alink, lang, axis, hreflang, rel, rev, charset, codetype, media, bgcolor, color, face, target, enctype, accept-charset, accept, language, type
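Put together, the workable solution is a simple set lookup. The Python sketch below is one possible encoding under a few assumptions: the XHTML set is derived from the HTML set by removing the enumerated and boolean attributes that XHTML 1 defines in lower case, attribute names are lower-cased only for HTML (XML names are case-sensitive), and the case-insensitive comparison uses a plain str.lower(). The names value_is_case_insensitive and equals_selector_matches are hypothetical.

```python
# Case-insensitive attributes in HTML documents (the list above).
HTML_CASE_INSENSITIVE = frozenset("""
    lang dir http-equiv text link vlink alink compact align frame rules
    valign scope axis nowrap hreflang rel rev charset codetype declare
    valuetype shape nohref media bgcolor clear color face noshade noresize
    scrolling target method enctype accept-charset accept checked multiple
    selected disabled readonly language defer type
""".split())

# Enumerated and boolean attributes: XHTML 1 defines their values in lower case,
# so they move to the case-sensitive side in XHTML documents.
XHTML_ENUMERATED_OR_BOOLEAN = frozenset("""
    dir compact align frame rules valign scope nowrap declare valuetype
    shape nohref clear noshade noresize scrolling method checked multiple
    selected disabled readonly defer
""".split())

XHTML_CASE_INSENSITIVE = HTML_CASE_INSENSITIVE - XHTML_ENUMERATED_OR_BOOLEAN


def value_is_case_insensitive(attribute: str, is_xhtml: bool) -> bool:
    """Hypothetical helper: should this attribute's value be compared case-insensitively?"""
    # HTML attribute names are case-insensitive; XHTML (XML) names are not.
    name = attribute if is_xhtml else attribute.lower()
    table = XHTML_CASE_INSENSITIVE if is_xhtml else HTML_CASE_INSENSITIVE
    return name in table


def equals_selector_matches(attribute: str, actual: str, expected: str,
                            is_xhtml: bool) -> bool:
    """Evaluate [attribute=expected] against an attribute value (sketch)."""
    if value_is_case_insensitive(attribute, is_xhtml):
        return actual.lower() == expected.lower()
    return actual == expected


print(equals_selector_matches("align", "LEFT", "left", is_xhtml=False))  # True
print(equals_selector_matches("align", "LEFT", "left", is_xhtml=True))   # False
```

Deriving the XHTML set from the HTML set keeps the two lists from drifting apart: the only difference between them is the group of enumerated and boolean attributes.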

2 Responses to “CSS selector bugs: Case sensitivity”

  1. zcorpan wrote on July 28th, 2007 at 11:36 am

    According to HTML5, all attribute values are case-sensitive as far as Selectors is concerned, both for the text/html and XML serializations.

  2. Diego Perini wrote on February 12th, 2009 at 1:00 am

    Rakaz,
    I thought you don’t mind if I used your research in my project…hope to be implementing it correctly. It is MIT Licensed:

    http://github.com/dperini/nwevents

    see in “src/nwmatcher.js” this is my selector engine. Stable mirrored in Google Code SVN.

    I have inserted a comment with references to your blog post.

    Thank you for the efforts and the time you have dedicated to this.

    Diego