CSS selector bugs: Case sensitivity
Monday, 16 October 2006
The previous version of the CSS selector test contained a test case that determined whether the value of an attribute selector was compared in a case-insensitive way. We only tested the align attribute, which should be treated in a case-insensitive way. But there are many other attributes whose values should be compared in a case-sensitive way.
When a CSS attribute selector is evaluated, the browser compares the value of an element's attribute with the value specified in the selector. If they match, the style rules are applied to the element that the attribute belongs to.
There are six ways in which an attribute value can be compared:
- is it equal to the specified value ([att=val]),
- is it a space-separated list that contains the specified value ([att~=val]),
- is it a hyphen-separated list whose first word is equal to the specified value ([att|=val]),
- does it begin with the specified value ([att^=val]),
- does it contain the specified value ([att*=val]),
- does it end with the specified value ([att$=val]).
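To make the six comparisons concrete, here is a minimal Python sketch of an attribute-selector matcher. The function name attribute_matches and its case_sensitive flag are hypothetical, not any browser's API, and the case-insensitive branch assumes ASCII-only case folding as a simplification.

```python
import string

# ASCII-only case folding (an assumption: non-ASCII letters are left alone).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def attribute_matches(op: str, actual: str, expected: str,
                      case_sensitive: bool = True) -> bool:
    """Hypothetical helper: evaluate one attribute-selector comparison.

    op is one of "=", "~=", "|=", "^=", "*=", "$=".
    """
    if not case_sensitive:
        actual = actual.translate(_ASCII_LOWER)
        expected = expected.translate(_ASCII_LOWER)

    if op == "=":    # [att=val]: exact match
        return actual == expected
    if op == "~=":   # [att~=val]: one word of a whitespace-separated list
        return expected in actual.split()
    if op == "|=":   # [att|=val]: exactly val, or val followed by "-"
        return actual == expected or actual.startswith(expected + "-")
    if op == "^=":   # [att^=val]: prefix (an empty val never matches)
        return expected != "" and actual.startswith(expected)
    if op == "*=":   # [att*=val]: substring (an empty val never matches)
        return expected != "" and expected in actual
    if op == "$=":   # [att$=val]: suffix (an empty val never matches)
        return expected != "" and actual.endswith(expected)
    raise ValueError(f"unknown operator: {op}")


print(attribute_matches("=", "Left", "left"))                        # False
print(attribute_matches("=", "Left", "left", case_sensitive=False))  # True
```

The only difference between the two modes is the folding step at the top; which mode applies to a given attribute is exactly the question the rest of this article tries to answer.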
What does the CSS specification say?
The problem is that each of these comparisons can be done in two different ways: case-sensitive and case-insensitive. The CSS standard does not define which one should be used. Instead, the specification tells us explicitly that case-sensitivity is determined by the document language, in our case HTML. So we should look at the HTML standard to find out which one to use.
The case-sensitivity of attribute names and values in selectors depends on the document language.
What does the HTML standard say about this?
The HTML standard makes it quite difficult for us. There is no single correct way: each attribute has its own rules. So we basically need to make a list of which attributes should be handled in which way.
Each attribute definition includes information about the case-sensitivity of its values.
Fortunately, the attributes fall into a small number of categories:
Category | Description |
---|---|
CS | The value is case-sensitive (i.e., user agents interpret “a” and “A” differently). |
CI | The value is case-insensitive (i.e., user agents interpret “a” and “A” as the same). |
CN | The value is not subject to case changes, e.g., because it is a number or a character from the document character set. |
CA | The element or attribute definition itself gives case information. |
CT | Consult the type definition for details about case-sensitivity. |
The first and second categories are clear: the first should be treated in a case-sensitive way, the second in a case-insensitive way. The third is also clear: it can be treated either way, because it should not matter. The fourth and fifth are bigger problems, because we need to look at the attribute definition or at the type of the attribute.
The only attribute that uses the CA category is the value attribute of the input element. We do not know how each value should be treated, because that depends on the purpose of the input element. If the input is used for entering a phone number, the value could be considered case-neutral, but for most other information it is important that it is treated in a case-sensitive way. Because we do not know the purpose of the input element, we need to treat everything in a case-sensitive way.
All attributes in the CT category use one of the following three types: script, uri or uri list. A list should be treated the same way as the type it consists of, so we only need to look at the script and uri types:
URIs in general are case-sensitive. There may be URIs, or parts of URIs, where case doesn’t matter (e.g., machine names), but identifying these may not be easy. Users should always consider that URIs are case-sensitive (to be on the safe side).
So URIs should be treated in a case-sensitive way, just to be on the safe side. But what about scripts?
The case-sensitivity of script data depends on the scripting language.
The case-sensitivity of the script type is determined by the language of the script. We do not know which language a script uses, but we do know that JavaScript is case-sensitive. So, just like the uri type, we can consider this type case-sensitive, again just to be safe.
The CT category should be treated in the same way as the CS category.
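The conclusions so far can be summarised as a small decision function. This is a Python sketch; the Category enum simply mirrors the HTML 4 category codes from the table above, and the function name is hypothetical.

```python
from enum import Enum


class Category(Enum):
    CS = "case-sensitive"
    CI = "case-insensitive"
    CN = "case-neutral"
    CA = "attribute definition gives case information"
    CT = "consult the type definition"


def compare_case_sensitively(category: Category) -> bool:
    """Return True if values in this category should be compared case-sensitively."""
    if category is Category.CI:
        return False
    # CS: case-sensitive by definition.
    # CA: only input/value uses it; its purpose is unknown, so play safe.
    # CT: script, uri and uri list types; all treated as case-sensitive.
    # CN: case never matters, so either answer is correct.
    return True


for cat in Category:
    print(cat.name, compare_case_sensitively(cat))
```

Everything except CI ends up case-sensitive; CN lands in the case-sensitive branch only because the choice cannot change the result.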
Some more problems
Unfortunately, there are a couple of problems in the HTML specification. The category of an attribute depends on the element it belongs to. For example, the name attribute should be treated in a case-sensitive way if it belongs to an a element, but in a case-insensitive way if it belongs to an input element. Luckily, this only applies to a limited set of attributes: type, name, value, width and height.
The width and height attributes generally belong to the CN category, so it does not matter how they are treated. This changes when the width and height attributes belong to the applet element: in that case they belong to the CI category. This is probably a bug in the specification, because the type is the same in both cases: length. And the length type specifically tells us:
Length values are case-neutral.
So we can treat the width and height attributes as case-neutral, regardless of the element they belong to.
The value attribute can be in the CS, the CA or even the CN category. We already determined that attributes in the CA category should be treated in a case-sensitive way, and we know that it does not matter how an attribute in the CN category is treated. So we can safely treat all value attributes in a case-sensitive way.
The name attribute is one that we cannot solve. For some elements, such as the a element, it should be treated in a case-sensitive way. Other elements, such as the input element, expect it to be treated in a case-insensitive way. There is no way around it: we need to look at the element to know how to treat this attribute.
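Because the answer for name depends on the element, a lookup has to be keyed on the element/attribute pair rather than on the attribute alone. The sketch below fills in only the a and input cases mentioned here; the table name ELEMENT_OVERRIDES and the case-sensitive fallback for unlisted elements are assumptions for illustration.

```python
from typing import Dict, Tuple

# (element, attribute) -> True if case-sensitive.
# Only the pairs discussed above; a real table would need every element.
ELEMENT_OVERRIDES: Dict[Tuple[str, str], bool] = {
    ("a", "name"): True,       # CS for a
    ("input", "name"): False,  # CI for input
}


def name_is_case_sensitive(element: str) -> bool:
    # Hypothetical fallback: unknown elements are treated case-sensitively,
    # following the "safe side" reasoning used elsewhere in this article.
    return ELEMENT_OVERRIDES.get((element.lower(), "name"), True)


print(name_is_case_sensitive("A"))      # True
print(name_is_case_sensitive("input"))  # False
```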
The type attribute is an even bigger problem. Not only does this attribute depend on the element it belongs to, but in one case it even depends on the parent of that element. Consider the following possibilities: the object element defines the type attribute as a content type, which should be evaluated in a case-insensitive way. The same applies to the input element and the ul element: both are case-insensitive. For the ol element, however, the type attribute is case-sensitive, because the case of the value directly influences how the list is displayed (a and A, or i and I, are different numbering styles).
The li element makes it even worse. If the li element is a child of a ul element, its type attribute can contain the same values as the type attribute of the ul element, which makes the type attribute of the li element case-insensitive. However, if it is a child of an ol element, it can only have the same values as the type attribute of the ol element, which makes the type attribute of the li element case-sensitive. There is simply no way to make this any easier.
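The rules for type can be written as a function that also receives the parent element. This Python sketch covers only the elements named above; the function name and the case-sensitive fallback for other elements are assumptions.

```python
from typing import Optional


def type_is_case_sensitive(element: str, parent: Optional[str] = None) -> bool:
    """Return True if the type attribute of `element` is case-sensitive."""
    element = element.lower()
    if element == "ol":
        # 1, a, A, i, I are different numbering styles.
        return True
    if element == "li":
        # li borrows the allowed values of its parent list:
        # case-sensitive under ol, otherwise treated as case-insensitive.
        return parent is not None and parent.lower() == "ol"
    if element in ("object", "input", "ul"):
        # Content type (object) or keyword values (input, ul).
        return False
    # Hypothetical fallback for elements not discussed above.
    return True


print(type_is_case_sensitive("li", parent="ol"))  # True
print(type_is_case_sensitive("li", parent="ul"))  # False
```

Passing the parent explicitly is the design choice here: a plain attribute-name lookup cannot express the li case at all.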
The last problem we need to solve is a couple of attributes that were simply forgotten in the HTML specification. Nowhere does the specification define how they should be treated:
param > value
param > name
img > align
object > align
applet > align
Because a couple of other elements do define a behaviour for the align attribute, we can safely assume that these should behave in the same way. The other elements that use the align attribute treat it in a case-insensitive way, so we should do the same for the img, object and applet elements.
The problem of the value attribute is also simple to solve. We already discovered that in all other cases it should be treated in a case-sensitive way, so we should do the same for the param element.
The name attribute is a bit more difficult. There is no single way to treat this attribute: some elements treat it in a case-sensitive way, others in a case-insensitive way. We do know that the param element depends on external factors: it is up to the browser plugin to determine how it should be handled. And once again, because we do not know how it should be treated, we should be on the safe side and treat it in a case-sensitive way.
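The decisions for the forgotten attributes can be recorded as explicit overrides, so they are not silently lost. A minimal sketch, assuming a lookup keyed on (element, attribute) pairs; the names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Decisions for attributes without case information in the HTML 4 spec.
# True = case-sensitive, False = case-insensitive.
UNSPECIFIED_DECISIONS: Dict[Tuple[str, str], bool] = {
    ("param", "value"): True,   # every other value attribute is case-sensitive
    ("param", "name"): True,    # plugin-defined, so stay on the safe side
    ("img", "align"): False,    # align is case-insensitive elsewhere
    ("object", "align"): False,
    ("applet", "align"): False,
}


def decision_for(element: str, attribute: str) -> Optional[bool]:
    """Return the recorded decision, or None if the pair is not listed."""
    return UNSPECIFIED_DECISIONS.get((element.lower(), attribute.lower()))


print(decision_for("IMG", "align"))   # False
print(decision_for("param", "name"))  # True
```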
Our list of attributes
Treatment | Attributes |
---|---|
Case-sensitive | title, id, class, content, scheme, datetime, summary, headers, abbr, standby, code, object, alt, label, prompt, for, value, profile, background, cite, href, src, longdesc, usemap, classid, codebase, data, archive, action, onload, onunload, onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onfocus, onblur, onkeypress, onkeydown, onkeyup, onsubmit, onreset, onselect, onchange, param:name |
Case-insensitive | lang, dir, http-equiv, text, link, vlink, alink, compact, align, frame, rules, valign, scope, axis, nowrap, hreflang, rel, rev, charset, codetype, declare, valuetype, shape, nohref, media, bgcolor, clear, color, face, noshade, noresize, scrolling, target, method, enctype, accept-charset, accept, checked, multiple, selected, disabled, readonly, language, defer, img:name |
Case-neutral | version, width, start, border, cellspacing, cellpadding, char, charoff, span, rowspan, colspan, height, coords, hspace, vspace, style, size, rows, cols, frameborder, marginwidth, marginheight, maxlength, tabindex, accesskey |
What about XHTML?
Up till now we’ve only talked about HTML and the HTML specification. Although XHTML looks like HTML, it is a different language. Because case-sensitivity is determined by the document language, we need to take a completely fresh look at all the attributes. What does the XHTML specification say about case-sensitivity?
HTML 4 and XHTML both have some attributes that have pre-defined and limited sets of values (e.g. the type attribute of the input element). In SGML and XML, these are called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of TEXT was equivalent to a value of text. Under XML, the interpretation of these values is case-sensitive, and in XHTML 1 all of these values are defined in lower-case.
If an attribute has a predefined, limited set of values (an enumerated attribute), it should be treated in a case-sensitive way in XHTML, whereas in HTML these values could be treated in a case-insensitive way. The same applies to boolean attributes: in XHTML their value must be the attribute name itself (for example checked="checked"), which the DTD defines as a case-sensitive string.
The XHTML specification does not indicate that any other attributes should be treated differently, so all the other attributes can be compared the same way as HTML attributes. As a result, the table only changes a little for XHTML documents.
Treatment | Attributes |
---|---|
Case-sensitive | title, id, class, content, scheme, datetime, summary, headers, abbr, standby, code, object, alt, label, prompt, for, value, profile, background, cite, href, src, longdesc, usemap, classid, codebase, data, archive, action, onload, onunload, onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout, onfocus, onblur, onkeypress, onkeydown, onkeyup, onsubmit, onreset, onselect, onchange, dir, compact, align, scope, nowrap, frame, rules, valign, declare, valuetype, shape, nohref, clear, noshade, noresize, scrolling, method, checked, multiple, selected, disabled, readonly, defer, param:name |
Case-insensitive | http-equiv, text, link, vlink, alink, lang, axis, hreflang, rel, rev, charset, codetype, media, bgcolor, color, face, target, enctype, accept-charset, accept, language, img:name |
Case-neutral | version, width, start, border, cellspacing, cellpadding, char, charoff, span, rowspan, colspan, height, coords, hspace, vspace, style, size, rows, cols, frameborder, marginwidth, marginheight, maxlength, tabindex, accesskey |
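The difference between the HTML and XHTML tables can be checked mechanically: start from the HTML case-insensitive attributes and move every enumerated or boolean attribute to the case-sensitive side. This Python sketch uses the attribute lists from this article; the constant names are hypothetical.

```python
# Case-insensitive attributes for HTML (from the HTML table).
HTML_CI = set("""
lang dir http-equiv text link vlink alink compact align frame rules valign
scope axis nowrap hreflang rel rev charset codetype declare valuetype shape
nohref media bgcolor clear color face noshade noresize scrolling target
method enctype accept-charset accept checked multiple selected disabled
readonly language defer
""".split())

# Enumerated attributes (fixed keyword values) and boolean attributes.
# Under XHTML their values are case-sensitive.
ENUMERATED_OR_BOOLEAN = set("""
dir compact align frame rules valign scope nowrap declare valuetype shape
nohref clear noshade noresize scrolling method checked multiple selected
disabled readonly defer
""".split())

# What remains case-insensitive in XHTML.
XHTML_CI = HTML_CI - ENUMERATED_OR_BOOLEAN

print(sorted(XHTML_CI))
```

The result is exactly the case-insensitive row of the XHTML table, which confirms that enumerated and boolean attributes are the only ones that move.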
A workable solution
All these rules and exceptions do not make a workable solution for browser developers. We need something simpler: two simple lists that determine how an attribute should be treated, one for HTML and one for XHTML. Luckily, a workable solution isn’t that difficult to distill from our present lists. The two problematic attributes are name and type. By looking at how these attributes are used in the real world, we can easily classify them as case-sensitive or case-insensitive.
First is the name attribute. On form controls, this attribute determines the name of the variable that is created on the server after the form is submitted. Because most server-side scripting languages are case-sensitive, the real-world usage of this attribute is case-sensitive. The attribute can also be used on a form element, where it is deprecated and replaced by the case-sensitive id attribute. In the real world it should also not cause any problems to treat this attribute in a case-sensitive way for the frame and iframe elements. Conclusion: just treat this attribute in a case-sensitive way.
The type attribute can also be solved easily. Because the type attribute is deprecated on the ul, ol and li elements, we can simply ignore these elements. When XHTML is used, the type attribute on the button and input elements is also easy to solve in the real world: XHTML demands that its value is specified in lower case on both elements. If the XHTML document is valid, it does not matter that the value is compared in a case-insensitive way, even though that is, strictly speaking, not narrow enough. In the real world you are not going to run into any problems. So simply treat the type attribute in a case-insensitive way.
So now we have our simple solution. Based on the information above we can produce a list of attributes that should be treated in a case-insensitive way. All other attributes can be treated in a case-sensitive way – even the neutral attributes.
Document type | Case-insensitive attributes |
---|---|
HTML | lang, dir, http-equiv, text, link, vlink, alink, compact, align, frame, rules, valign, scope, axis, nowrap, hreflang, rel, rev, charset, codetype, declare, valuetype, shape, nohref, media, bgcolor, clear, color, face, noshade, noresize, scrolling, target, method, enctype, accept-charset, accept, checked, multiple, selected, disabled, readonly, language, defer, type |
XHTML | http-equiv, text, link, vlink, alink, lang, axis, hreflang, rel, rev, charset, codetype, media, bgcolor, color, face, target, enctype, accept-charset, accept, language, type |
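The two lists translate directly into a lookup. In this Python sketch, every attribute not in the list for the document type is compared case-sensitively, as described above; the function names are hypothetical, is_xhtml is assumed to be known from how the document was parsed, and str.lower is used for brevity.

```python
HTML_CI_ATTRIBUTES = frozenset("""
lang dir http-equiv text link vlink alink compact align frame rules valign
scope axis nowrap hreflang rel rev charset codetype declare valuetype shape
nohref media bgcolor clear color face noshade noresize scrolling target
method enctype accept-charset accept checked multiple selected disabled
readonly language defer type
""".split())

XHTML_CI_ATTRIBUTES = frozenset("""
http-equiv text link vlink alink lang axis hreflang rel rev charset codetype
media bgcolor color face target enctype accept-charset accept language type
""".split())


def is_case_insensitive(attribute: str, is_xhtml: bool) -> bool:
    """Return True if the attribute's value should be compared case-insensitively."""
    names = XHTML_CI_ATTRIBUTES if is_xhtml else HTML_CI_ATTRIBUTES
    # Attribute names are case-insensitive in HTML but not in XHTML.
    key = attribute if is_xhtml else attribute.lower()
    return key in names


def values_equal(attribute: str, a: str, b: str, is_xhtml: bool = False) -> bool:
    """Equality comparison ([att=val]) using the lists above."""
    if is_case_insensitive(attribute, is_xhtml):
        return a.lower() == b.lower()
    return a == b


print(values_equal("align", "LEFT", "left"))                 # True
print(values_equal("align", "LEFT", "left", is_xhtml=True))  # False
print(values_equal("href", "A.html", "a.html"))              # False
```

Keeping only the case-insensitive names in a set means the default, case-sensitive comparison needs no table at all, which is the point of the simplified solution.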
According to HTML5, all attribute values are case-sensitive as far as Selectors is concerned, both for the text/html and XML serializations.
Rakaz,
I thought you don’t mind if I used your research in my project…hope to be implementing it correctly. It is MIT Licensed:
http://github.com/dperini/nwevents
see in “src/nwmatcher.js” this is my selector engine. Stable mirrored in Google Code SVN.
I have inserted a comment with references to your blog post.
Thank you for the efforts and the time you have dedicated to this.
Diego