MAMA: Markup report, part 4: Forms, tables, and plug-ins, oh my!
Introduction
In this week's overview we wrap up MAMA's look at markup by covering its most complex structures: forms, tables, and plug-ins. These topics take Web pages from a simple series of text, links, images, and lists to an entirely different level. Forms greatly expand user interaction possibilities. Tables generate axial relationships, which authors have creatively distorted for their most popular (and questionable) use: creating pixel-perfect, grid-based layouts. Plug-ins afford extensibility beyond HTML's stock capabilities. Without any of these features, HTML would be a barren, unexciting markup language. For a deeper look at these areas and more, the following MAMA article topics are also available this week:
Forms
Aside from hyperlinks, forms are the main way in which users interact with the Web. Among their varied critical uses, forms allow people to find things with search engines, publish their thoughts with blogging systems, and buy things on e-commerce sites. Forms in general are very popular: the FORM element was found on just under one-third of all pages analyzed.
Elements used in forms
The popularity of the main types of form elements varies widely, and sometimes
surprisingly. For example, almost every FORM has an
INPUT, but relatively few make use of
TEXTAREA. Such variations may be due to a number of
factors, including inherent biases in MAMA's current URL set (a majority of MAMA's
URLs are Surface/Home
pages, which rarely have forms on them, apart from the increasingly-popular search field). The intended use of a Web page often dictates the types of elements
used, including form elements.
| ELEMENT | Frequency | ELEMENT | Frequency |
|---|---|---|---|
| FORM | 1,040,771 | TEXTAREA | 36,410 |
| INPUT | 1,008,545 | FIELDSET | 31,673 |
| SELECT | 285,362 | LEGEND | 18,269 |
| OPTION | 281,923 | BUTTON | 11,455 |
| LABEL | 159,631 | OPTGROUP | 5,348 |
The FORM element
We will start our look at form elements with their main container element,
FORM. It was detected in 1,040,771 of MAMA's URLs.
Notice that the Action attribute is used on most of these
pages (94.0%); it specifies what to do with the information the form is collecting.
This attribute is required, so the dominance here is understandable. The
Method attribute is only slightly less popular than the
Action attribute (89.4% of all forms usage).
| Form Attribute | Frequency |
|---|---|
| Action | 977,934 |
| Method | 930,343 |
| Name | 570,643 |
| Id | 266,886 |
| Target | 199,085 |
The Method attribute
Approximately 70% of pages that specify an explicit HTTP Method use the
"post" method, while ~46% use the "get"
method. This would indicate a clear authoring preference for the
"post" method, but there are a few factors to
consider. About 15% of the pages specifying the Method
attribute use multiple forms on the page that mix both
"post" and "get" methods.
There are 110,428 URLs that used the FORM element
with no Method attribute; "get"
is the implied default value in such cases. This brings the relative preferences
for Method amongst all FORM
usages much closer: 62.2% for "post" and 51.6% for
an explicit or implied "get" value.
| Method value | Frequency |
|---|---|
| post | 647,234 |
| get | 426,192 |
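The percentages quoted in this section can be reproduced from the table counts alone. The sketch below is only a check of that arithmetic. The variable names are ours, and it assumes that every explicit Method value is either "post" or "get", so that the overlap between the two can be derived by inclusion-exclusion.

```python
# Counts taken from the tables in this section (numbers of URLs).
FORM_URLS = 1_040_771    # URLs containing a FORM element
METHOD_URLS = 930_343    # URLs where a FORM has an explicit Method
POST_URLS = 647_234      # URLs with Method="post"
GET_URLS = 426_192       # URLs with Method="get"


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# URLs with FORM but no Method attribute; "get" is implied for these.
no_method = FORM_URLS - METHOD_URLS

# Inclusion-exclusion: URLs counted under both "post" and "get".
# Assumption: every explicit Method value is either "post" or "get".
mixed = POST_URLS + GET_URLS - METHOD_URLS

explicit_post = pct(POST_URLS, METHOD_URLS)         # share of explicit-Method URLs
explicit_get = pct(GET_URLS, METHOD_URLS)
mixed_share = pct(mixed, METHOD_URLS)
overall_post = pct(POST_URLS, FORM_URLS)            # share of all FORM URLs
overall_get = pct(GET_URLS + no_method, FORM_URLS)  # explicit or implied "get"

print(explicit_post, explicit_get, mixed_share, overall_post, overall_get)
```

Because a single URL can contain several forms, the "post" and "get" shares are allowed to add up to more than 100%.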
The INPUT element
This popular element is used in 96.9% of all documents using forms. With the element's functionality being as overloaded as it is, this popularity is both understandable and expected. Some of its attributes are also very popular.
| ELEMENT/Attribute | Frequency | ELEMENT/Attribute | Frequency |
|---|---|---|---|
| INPUT | 1,008,545 | Maxlength | 329,415 |
| Type | 1,005,152 | Alt | 213,924 |
| Name | 990,058 | Border | 172,843 |
| Value | 947,403 | Checked | 135,049 |
| Size | 656,354 | Width | 120,420 |
| Src | 335,990 | Height | 119,902 |
The Type attribute
Many of the attributes for the INPUT element are only
applicable to specific Type attribute values, so we must
examine this attribute's values first.
| Attribute value | Frequency | Attribute value | Frequency |
|---|---|---|---|
| text | 806,926 | radio | 159,626 |
| hidden | 733,126 | empty | 110,971 |
| submit | 568,445 | checkbox | 81,260 |
| image | 337,286 | button | 71,031 |
| password | 167,098 | reset | 17,417 |
We can now look more deeply at the various uses of the messy INPUT element:
- The "empty" value indicates that an INPUT element did not have a Type value at all. In such situations, a widget is interpreted as Type="Text". In all, 79,050 URLs used INPUT elements where none of them specified a Type attribute.
- In the early days of forms, "Submit" buttons were usually paired with a "Reset" button, but today, that seems to be passé. By comparison, "Reset" is rarely encountered now.
- The "Submit" and "Image" types: Because "Image" is a type of submittal, and each will often be used to the exclusion of the other, looking at their combined totals shows that submittal is the most popular function of forms (more popular than "Text"). This is actually an expected result.
- The Type="Image" related attributes: Width and Hspace (horizontal dimensions) have just a slight edge over Height and Vspace (vertical dimensions), just like they do with the IMG element.
- The exclusive choice widget, Type="Radio", is twice as popular as the multi-choice Type="Checkbox" widget.
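To make the "empty" bucket concrete, here is a minimal sketch of how Type values could be tallied for a single document using Python's standard html.parser module. It is not MAMA's actual code: the class and function names are hypothetical, and it counts individual INPUT elements, whereas the table above counts URLs.

```python
from collections import Counter
from html.parser import HTMLParser


class InputTypeCounter(HTMLParser):
    """Tally the Type values of INPUT elements in one document."""

    def __init__(self):
        super().__init__()
        self.types = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "input":
            return
        value = dict(attrs).get("type")
        # A missing, valueless, or blank Type is recorded as "empty".
        key = value.strip().lower() if value else ""
        self.types[key or "empty"] += 1


def effective_type(recorded: str) -> str:
    """Map the "empty" bucket to the widget type it is interpreted as."""
    return "text" if recorded == "empty" else recorded


sample = """
<form action="/search">
  <input name="q">
  <input type="TEXT" name="city">
  <input type="hidden" name="lang" value="en">
  <input type="image" src="go.gif" alt="Go">
</form>
"""

counter = InputTypeCounter()
counter.feed(sample)
counter.close()
print(counter.types)
```

In this sample the first INPUT lands in the "empty" bucket even though it behaves as a text field, which is why "empty" is reported separately from "text" in the table.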
Tables
Tables have a bad reputation among the markup purists in the development community,
because many authors often use them solely for Web page layout. Tables
generally increase the complexity of documents and can make them more difficult
to maintain. Authors do not really see these factors as significant drawbacks, though,
judging by the overwhelming popularity of layout tables in the MAMA result set. In practice, the use of presentational
tables by authors is what makes the main table-related elements some of the most
popular sub-elements of BODY, after the A
and IMG elements. The most frequently occurring of
these is the TABLE element, found in 2,894,184 of MAMA's
URLs (82.5%). Authors have a definite preference for the table elements they use.
Almost every table uses the TABLE, TR
and TD elements. All of the other elements are used rarely
by comparison. CAPTION, COL,
THEAD, COLGROUP, and
TFOOT are all used in less than 1% of
TABLE occurrences.
| ELEMENT | Frequency | ELEMENT | Frequency |
|---|---|---|---|
| TABLE | 2,894,184 | CAPTION | 23,306 |
| TD | 2,891,972 | COL | 21,775 |
| TR | 2,891,205 | THEAD | 21,474 |
| TBODY | 364,542 | COLGROUP | 12,225 |
| TH | 148,344 | TFOOT | 3,947 |
Attributes of the TABLE element
This wrapper element for table structures is (naturally) the most popular element
of its type. It ranks #8 overall in element popularity, used in 82.47% of all
MAMA's URLs. Many attributes were detected for this element, only some of which
are in the standards. A few of these attributes are VERY popular
with authors: Border, Width,
Cellpadding, and Cellspacing
are each used in ~90% of all URLs that use tables. Usage of other attributes, like
Rules and Frame, barely
registers; they are used in less than 0.5% of all TABLE
cases.
| Attribute | Frequency | Attribute | Frequency |
|---|---|---|---|
| Border | 2,691,899 | Height | 1,220,050 |
| Width | 2,637,117 | Bgcolor | 893,573 |
| Cellpadding | 2,585,020 | Bordercolor | 417,650 |
| Cellspacing | 2,578,416 | Background | 281,209 |
| Align | 1,226,047 | Valign | 87,291 |
The TD and TH elements
These two elements are grouped together because they mostly share the same
attributes and have very similar usage. But their usage rates could not be more
different. The most popular table sub-element is TD (detected in 2,891,972 URLs), and it is the 9th most popular element overall (used in 82.4%
of all URLs in MAMA and 99.9% of all URLs using the TABLE
element). The TH sub-element, on the other hand, is used in only 5.1%
of URLs using the TABLE element. Because of the
inherent attribute overlap between TD and
TH, it can be interesting to compare attribute usage
rates between the two elements. Percentages of the total element usage are
also provided to help cross-comparisons.
| TD Attribute | Frequency | % of Element | TH Attribute | Frequency | % of Element |
|---|---|---|---|---|---|
| TD | 2,891,972 | -- | TH | 148,344 | -- |
| Width | 2,324,752 | 80.4% | Valign | 46,799 | 31.6% |
| Valign | 2,189,287 | 75.7% | Width | 45,709 | 30.8% |
| Align | 1,977,367 | 68.4% | Colspan | 38,587 | 26.0% |
| Colspan | 1,711,437 | 59.2% | Align | 35,710 | 24.1% |
| Height | 1,672,129 | 57.8% | Scope | 30,111 | 20.3% |
| Bgcolor | 1,306,542 | 45.2% | Height | 28,195 | 19.0% |
| Rowspan | 901,303 | 31.2% | Bgcolor | 22,406 | 15.1% |
| Background | 714,706 | 24.7% | Nowrap | 10,469 | 7.1% |
| Nowrap | 353,572 | 12.2% | Rowspan | 6,324 | 4.3% |
How deeply are tables nested?
One of the features requested for MAMA was the ability to detect deeply-nested
tables. Such structures can be excellent stress tests for a browser. In theory,
every TABLE open tag should have a corresponding closing
tag. As MAMA traversed a document, any TABLE open tags
added 1 to the current depth counter. A closing TABLE
tag would subtract 1 from the depth counter. When the depth counter hit a new high
score for the document, that value became the new "maximum table depth". This rather
simplistic system yielded a number for a document's "maximum table nesting depth"—it does not necessarily mean that the open and closing tags are properly nested;
that is another issue entirely. The average nesting depth when tables were used was 2.77.
The maximum nesting depth discovered was an astounding 745 deep at
http://www.artsforeveryone.com/.
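The depth counter described above is simple enough to sketch in a few lines. The version below is an illustration rather than MAMA's implementation: it uses Python's standard html.parser as the tokenizer, and the names TableDepthCounter and max_table_depth are ours.

```python
from html.parser import HTMLParser


class TableDepthCounter(HTMLParser):
    """Track the running and maximum TABLE depth of one document."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            # A new high score becomes the maximum table depth.
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "table":
            # No well-formedness check: the counter can even go negative
            # when a document contains stray closing tags.
            self.depth -= 1


def max_table_depth(html: str) -> int:
    """Return the highest value the TABLE depth counter reaches."""
    parser = TableDepthCounter()
    parser.feed(html)
    parser.close()
    return parser.max_depth


nested = (
    "<table><tr><td>"
    "<table><tr><td><table></table></td></tr></table>"
    "</td></tr></table>"
    "<table></table>"
)
print(max_table_depth(nested))
```

As in the description, unclosed TABLE tags simply keep raising the counter, so a badly broken document can report a depth far greater than anything a browser would actually render as nested.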
Plug-ins
The Web has multiple elements to handle plug-ins because of simple evolution.
At first, there was no standardized way to use plug-ins, so solutions arose
haphazardly—APPLET, EMBED,
and PARAM. The standards process produced a cohesive
solution in the OBJECT element, but authoring inertia
seems to indicate that APPLET and EMBED
are not going anywhere. Rather than the OBJECT element
being used instead of EMBED, the majority of OBJECT tags are used
in conjunction with EMBED elements.
In all, 503,783 URLs use both EMBED and OBJECT
elements (94.5% of all OBJECT and 92.3% of all
EMBED instances).
| ELEMENT | Frequency |
|---|---|
| PARAM | 576,702 |
| OBJECT | 533,343 |
| EMBED | 545,734 |
| APPLET | 52,160 |
Flash usage
MAMA tried to discover evidence of Flash usage in every document it analyzed. It had to resort to looking for a number of different factors, as authors can use Flash in many ways. Its use was detected by satisfying one or more of the following components:
- Any PARAM element that contained the substrings ".swf" or "flash"
- Any MIME types containing the substring "flash" from getting any EMBED[Src] or OBJECT[Data] URLs
- Any scripting component containing the substring "flash" or ".swf"
Using these criteria, 1,176,227 URLs were found to be using Flash. This is a
MUCH higher result than one would expect by looking solely at
the EMBED and OBJECT elements.
This means that either some aspect(s) of MAMA's detection mechanism are too
relaxed, or that some part of the analysis is flagging a lot of positive matches
that EMBED or OBJECT detection
alone does not catch. If any part of the above detection is suspect, it is likely
to be the scripting detection of Flash (due to the simplistic nature of its
substring search). Judging by anecdotal evidence seen over the years, the number
is probably pretty accurate; scripting is frequently given the duty of dynamically
generating plug-in markup.
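The three Flash criteria can be sketched as a small detector to show how loose the substring tests are. This is a hypothetical reconstruction, not MAMA's code. It assumes that "contained" means a match in any PARAM attribute value, that only inline script text is searched, and that the caller supplies a mime_types mapping from URL to MIME type in place of actually fetching each EMBED[Src] or OBJECT[Data] URL.

```python
from html.parser import HTMLParser

FLASH_MARKERS = (".swf", "flash")


def has_marker(text, markers=FLASH_MARKERS):
    """Case-insensitive substring test; None counts as no match."""
    lowered = (text or "").lower()
    return any(marker in lowered for marker in markers)


class FlashDetector(HTMLParser):
    """Collect which of the three criteria a document satisfies."""

    def __init__(self, mime_types=None):
        super().__init__()
        # Hypothetical stand-in for fetching each plug-in URL:
        # maps URL -> MIME type reported by the server.
        self.mime_types = mime_types or {}
        self.reasons = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Criterion 1: a PARAM attribute value mentions ".swf" or "flash".
        if tag == "param" and any(has_marker(v) for v in attrs.values()):
            self.reasons.add("param")
        # Criterion 2: the MIME type behind EMBED[Src] or OBJECT[Data].
        url = {"embed": attrs.get("src"), "object": attrs.get("data")}.get(tag)
        if url and has_marker(self.mime_types.get(url), ("flash",)):
            self.reasons.add("mime")
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Criterion 3: inline script text mentions ".swf" or "flash".
        if self._in_script and has_marker(data):
            self.reasons.add("script")


page = (
    '<object data="movie.bin"><param name="movie" value="intro.SWF"></object>'
    '<script>var banner = "banner.swf";</script>'
)
detector = FlashDetector(mime_types={"movie.bin": "application/x-shockwave-flash"})
detector.feed(page)
detector.close()
print(sorted(detector.reasons))
```

A page counts as using Flash when reasons is non-empty. Note that the script criterion fires on any occurrence of the substring, including one in a comment or an unrelated variable name, which is the weakness discussed above.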
Java usage
As with Flash, there were a number of methods MAMA used to detect Java usage. The following criteria were used to judge whether or not Java was being used in a URL and resulted in the detection of 53,688 matches:
- Any usage of the APPLET element
- Any PARAM element that contained the substrings ".class" or "java"
- Any MIME types containing the substring "java" from getting any OBJECT[Data] URLs
- Any scripting component containing the substring "application/java-vm"
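The Java criteria above reduce to a single boolean test once the relevant pieces of a page have been extracted. The sketch below assumes that extraction has already happened. PageFeatures is a hypothetical container invented for this illustration, not a structure MAMA is known to use.

```python
from dataclasses import dataclass, field


@dataclass
class PageFeatures:
    """Hypothetical summary of what a crawler extracted from one URL."""
    elements: set = field(default_factory=set)              # lowercase tag names
    param_values: list = field(default_factory=list)        # PARAM attribute values
    object_mime_types: list = field(default_factory=list)   # from OBJECT[Data] URLs
    scripts: list = field(default_factory=list)             # script source text


def contains_any(texts, needles):
    """True if any needle occurs (case-insensitively) in any text."""
    return any(needle in text.lower() for text in texts for needle in needles)


def uses_java(page: PageFeatures) -> bool:
    """Apply the four criteria; one match is enough."""
    return (
        "applet" in page.elements
        or contains_any(page.param_values, (".class", "java"))
        or contains_any(page.object_mime_types, ("java",))
        or contains_any(page.scripts, ("application/java-vm",))
    )


print(uses_java(PageFeatures(param_values=["Clock.class"])))
```

One consequence of plain substring matching is that a PARAM value such as "javascript:void(0)" also satisfies the "java" test, so the non-APPLET criteria should be read as upper bounds.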
Conclusion
Now that we have spent several weeks looking intensely at HTML's many markup topics (and rightly so), we will next be turning our attention to other important Web page technologies that are vital to address in any examination of the Web. Next week we will look at the details of CSS usage: the whos, whats, wheres, whens, whys, and hows of the way CSS is used.
This article is licensed under a Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.