Friday, January 2, 2009

XHTML Tutorial


What Is XHTML?

  • XHTML stands for EXtensible HyperText Markup Language
  • XHTML aims to replace HTML
  • XHTML is almost identical to HTML 4.01
  • XHTML is a stricter and cleaner version of HTML
  • XHTML is HTML defined as an XML application
  • XHTML is a W3C Recommendation

What You Should Already Know

Before you continue you should have a basic understanding of the following:

  • HTML and the basics of building web pages

If you want to study HTML first, please read our HTML tutorial.


XHTML is a combination of HTML and XML (EXtensible Markup Language).

XHTML consists of all the elements in HTML 4.01 combined with the syntax of XML.


Why XHTML?

We have reached a point where many pages on the WWW contain "bad" HTML.

The following HTML code will work fine if you view it in a browser, even if it does not follow the HTML rules:

<html>
<head>
<title>This is bad HTML</title>
<body>
<h1>Bad HTML
</body>

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.

XML was designed to describe data and HTML was designed to display data.

Today's market consists of different browser technologies: some browsers run on computers, and some run on mobile phones and handheld devices. The latter often do not have the resources or power to interpret a "bad" markup language.

Therefore, by combining the strengths of HTML and XML, we got a markup language that is useful now and in the future: XHTML.

XHTML pages can be read by all XML-enabled devices. While waiting for the rest of the world to upgrade to XML-capable browsers, XHTML gives you the opportunity to write "well-formed" documents now that work in all browsers and are backward compatible.


You can prepare yourself for XHTML by starting to write strict HTML.


How To Get Ready For XHTML

XHTML is not very different from HTML 4.01, so bringing your code up to the 4.01 standard is a good start. Our complete HTML 4.01 reference can help you with that.

In addition, you should start now to write your HTML code in lowercase letters, and never get into the bad habit of skipping end tags like </p>.

Happy coding!


The Most Important Differences:

  • XHTML elements must be properly nested
  • XHTML documents must be well-formed
  • Tag names must be in lowercase
  • All XHTML elements must be closed

Elements Must Be Properly Nested

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XHTML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

Note: A common mistake in nested lists is to forget that the inside list must be within an li element, like this:

<ul>
  <li>Coffee</li>
  <li>Tea
    <ul>
      <li>Black tea</li>
      <li>Green tea</li>
    </ul>
  <li>Milk</li>
</ul>

This is correct:

<ul>
  <li>Coffee</li>
  <li>Tea
    <ul>
      <li>Black tea</li>
      <li>Green tea</li>
    </ul>
  </li>
  <li>Milk</li>
</ul>

Notice that we have inserted a </li> tag after the </ul> tag in the "correct" code example.


          Documents Must Be Well-formed

          All XHTML elements must be nested within the <html> root element. All other elements can have sub (children) elements. Sub elements must be in pairs and correctly nested within their parent element. The basic document structure is:

          <html>
          <head> ... </head>
          <body> ... </body>
          </html>


          Tag Names Must Be In Lower Case

          This is because XHTML documents are XML applications. XML is case-sensitive. Tags like <br> and <BR> are interpreted as different tags.

          This is wrong:

          <BODY>
          <P>This is a paragraph</P>
          </BODY>

          This is correct:

          <body>
          <p>This is a paragraph</p>
          </body>


          All XHTML Elements Must Be Closed

          Non-empty elements must have an end tag.

          This is wrong:

          <p>This is a paragraph
          <p>This is another paragraph

          This is correct:

          <p>This is a paragraph</p>
          <p>This is another paragraph</p>


          Empty Elements Must Also Be Closed

          Empty elements must either have an end tag or the start tag must end with />.

          This is wrong:

          This is a break<br>
          Here comes a horizontal rule:<hr>
          Here's an image <img src="happy.gif" alt="Happy face">

          This is correct:

          This is a break<br />
          Here comes a horizontal rule:<hr />
          Here's an image <img src="happy.gif" alt="Happy face" />


          IMPORTANT Compatibility Note:

          To make your XHTML compatible with today's browsers, you should add an extra space before the "/" symbol like this: <br />, and this: <hr />.


          Writing XHTML demands a clean HTML syntax.


          Some More XHTML Syntax Rules:

          • Attribute names must be in lower case
          • Attribute values must be quoted
          • Attribute minimization is forbidden
          • The id attribute replaces the name attribute
          • The XHTML DTD defines mandatory elements

          Attribute Names Must Be In Lower Case

          This is wrong:

          This is correct:
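As a minimal illustration of both forms (the table element and its width value are assumptions chosen only for this example, and only the start tag is shown):

```html
<!-- Wrong: the attribute name is in upper case -->
<table WIDTH="100%">

<!-- Correct: the attribute name is in lower case -->
<table width="100%">
```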


          Attribute Values Must Be Quoted

          This is wrong:

          This is correct:
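A short sketch of the two forms, again using a table start tag as an assumed example:

```html
<!-- Wrong: the attribute value is not quoted -->
<table width=100%>

<!-- Correct: the attribute value is enclosed in quotes -->
<table width="100%">
```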


          Attribute Minimization Is Forbidden

          This is wrong:

          This is correct:
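A small sketch of what this rule means in practice (the input elements below are illustrative assumptions, not taken from a real page):

```html
<!-- Wrong: the attributes are minimized (name only, no value) -->
<input type="checkbox" checked>
<input type="text" readonly>

<!-- Correct: each attribute is written out with its own name as the value -->
<input type="checkbox" checked="checked" />
<input type="text" readonly="readonly" />
```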

          Here is a list of the minimized attributes in HTML and how they should be written in XHTML:

          HTML         XHTML
          compact      compact="compact"
          checked      checked="checked"
          declare      declare="declare"
          readonly     readonly="readonly"
          disabled     disabled="disabled"
          selected     selected="selected"
          defer        defer="defer"
          ismap        ismap="ismap"
          nohref       nohref="nohref"
          noshade      noshade="noshade"
          nowrap       nowrap="nowrap"
          multiple     multiple="multiple"
          noresize     noresize="noresize"


          The id Attribute Replaces The name Attribute

          HTML 4.01 defines a name attribute for the elements a, applet, frame, iframe, img, and map. In XHTML the name attribute is deprecated. Use id instead.

          This is wrong:

          This is correct:
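A minimal sketch of the two forms (the file name picture.gif and the identifier picture1 are hypothetical):

```html
<!-- Wrong: the element is identified with the deprecated name attribute -->
<img src="picture.gif" alt="A picture" name="picture1" />

<!-- Correct: the element is identified with the id attribute -->
<img src="picture.gif" alt="A picture" id="picture1" />
```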

          Note: To interoperate with older browsers for a while, you should use both name and id, with identical attribute values, like this:
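A sketch of that interoperable form, using the same hypothetical file name and identifier for both attributes:

```html
<!-- id and name carry identical values so that older browsers still find the element -->
<img src="picture.gif" alt="A picture" id="picture1" name="picture1" />
```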

          IMPORTANT Compatibility Note:

          To make your XHTML compatible with today's browsers, you should add an extra space before the "/" symbol.


          The Lang Attribute

          The lang attribute applies to almost every XHTML element. It specifies the language of the content within an element.

          If you use the lang attribute in an element, you must add the xml:lang attribute, like this:

          <div lang="no" xml:lang="no">Heia Norge!</div>


          Mandatory XHTML Elements

          All XHTML documents must have a DOCTYPE declaration. The html, head and body elements must be present, and the title must be present inside the head element.

          This is a minimum XHTML document template:

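          <!DOCTYPE Doctype goes here>
          <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <title>Title goes here</title>
          </head>
          <body>
          Body text goes here
          </body>
          </html>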

          Note: The DOCTYPE declaration is not a part of the XHTML document itself. It is not an XHTML element, and it should not have a closing tag.

          Note: The xmlns attribute inside the <html> tag is required in XHTML. However, the validator on w3.org does not complain when this attribute is missing in an XHTML document. This is because "xmlns=http://www.w3.org/1999/xhtml" is a fixed value and will be added to the <html> tag even if you do not include it.

          You will learn more about the XHTML document type definition in the next chapter.


          The XHTML standard defines three Document Type Definitions.

          The most common is the XHTML Transitional.


          The <!DOCTYPE> Is Mandatory

          An XHTML document consists of three main parts:

          • the DOCTYPE
          • the Head
          • the Body

          The basic document structure is:

          <!DOCTYPE ...>
          <html>
          <head>
          <title>... </title>
          </head>
          <body> ... </body>
          </html>

          The DOCTYPE declaration should always be the first line in an XHTML document.


          An XHTML Example

          This is a simple (minimal) XHTML document:

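          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <title>simple document</title>
          </head>
          <body>
          <p>a simple paragraph</p>
          </body>
          </html>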

          The DOCTYPE declaration defines the document type:

          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

          The rest of the document looks like HTML:

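          <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <title>simple document</title>
          </head>
          <body>
          <p>a simple paragraph</p>
          </body>
          </html>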


          The 3 Document Type Definitions

          • A DTD (Document Type Definition) specifies the syntax of a markup language.
          • DTDs are used by SGML applications such as HTML, and by XML applications such as XHTML, to specify rules that apply to the markup of documents of a particular type, including a set of element and entity declarations.
          • XHTML is specified in an XML document type definition, or "DTD".
          • An XHTML DTD describes in precise, computer-readable language, the allowed syntax and grammar of XHTML markup.

          There are currently 3 XHTML document types:

          • STRICT
          • TRANSITIONAL
          • FRAMESET

          XHTML 1.0 specifies three XML document types that correspond to three DTDs: Strict, Transitional, and Frameset.

          XHTML 1.0 Strict

          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

          Use this when you want really clean markup, free of presentational clutter. Use this together with Cascading Style Sheets.

          XHTML 1.0 Transitional

          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

          Use this when you need to take advantage of HTML's presentational features and when you want to support browsers that don't understand Cascading Style Sheets.

          XHTML 1.0 Frameset

          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

          Use this when you want to use HTML Frames to partition the browser window into two or more frames.
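To make the Frameset case more concrete, here is a minimal sketch of a frameset document. The file names menu.html and content.html and the column widths are hypothetical, chosen only for illustration:

```html
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Frameset example</title>
</head>
<!-- The frameset element takes the place of the body element -->
<frameset cols="25%,75%">
  <!-- frame is an empty element, so it is closed with " />" -->
  <frame src="menu.html" />
  <frame src="content.html" />
  <!-- Fallback content for browsers that cannot display frames -->
  <noframes>
    <body>
    <p>This page uses frames.</p>
    </body>
  </noframes>
</frameset>
</html>
```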


          How W3Schools Was Converted To XHTML

          W3Schools was converted from HTML to XHTML on the weekend of 18 and 19 December 1999 by Hege Refsnes and Ståle Refsnes.

          To convert a Web site from HTML to XHTML, you should be familiar with the XHTML syntax rules of the previous chapters. The following steps were executed (in the order listed below):


          A DOCTYPE Definition Was Added

          The following DOCTYPE declaration was added as the first line of every page:

          <!DOCTYPE html PUBLIC
          "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

          Note that we used the transitional DTD. We could have chosen the strict DTD, but found it a little too "strict", and a little too hard to conform to.


          A Note About The DOCTYPE

          Your pages must have a DOCTYPE declaration if you want them to validate as correct XHTML.

          Be aware, however, that newer browsers (like Internet Explorer 6) might treat your document differently depending on the declaration. If the browser reads a document with a DOCTYPE, it might assume the document is "correct" and render it more strictly, so malformed XHTML might break or display differently than it would without a DOCTYPE.


          Lower Case Tag And Attribute Names

          Since XHTML is case-sensitive and only accepts lowercase tag and attribute names, a general search and replace was run to replace all uppercase tags with lowercase tags. The same was done for attribute names. We have always tried to use lowercase names on our web site, so the replace did not produce many real substitutions.


          All Attributes Were Quoted

          Since the W3C XHTML 1.0 Recommendation states that all attribute values must be quoted, every page on the site was checked to see that attribute values were properly quoted. This was a time-consuming job, and we will surely never again forget to put quotes around our attribute values.


          Empty Tags: <hr>, <br> and <img>

          Unclosed empty tags are not allowed in XHTML. The <hr> and <br> tags should be replaced with <hr /> and <br />.

          This produced a problem with Netscape, which misinterpreted the <br/> tag. We don't know why, but changing it to <br /> worked fine. After that discovery, a general search and replace function was executed to swap the tags.

          A few other tags (like the <img> tag) were suffering from the same problem as above. We decided not to close the <img> tags with </img>, but with /> at the end of the tag. This was done manually.


          The Web Site Was Validated

          After that, all pages were validated against the official W3C DTD using the W3C XHTML validator. A few more errors were found and edited manually. The most common error was missing </li> tags in lists.

          Should we have used a converting tool? Well, we could have used TIDY.

          Dave Raggett's HTML TIDY is a free utility for cleaning up HTML code. It also works great on the hard-to-read markup generated by specialized HTML editors and conversion tools, and it can help you identify where you need to pay further attention on making your pages more accessible to people with disabilities.

          The reason why we didn't use Tidy? We knew about XHTML when we started writing this web site. We knew that we had to use lowercase tag names and that we had to quote our attributes. So when the time came (to do the conversion), we simply had to test our pages against the W3C XHTML validator and correct the few mistakes. AND - we have learned a lot about writing "tidy" HTML code.


          An XHTML document is validated against a Document Type Definition.


          Validate XHTML With A DTD

          An XHTML document is validated against a Document Type Definition (DTD). Before an XHTML file can be properly validated, a correct DTD must be added as the first line of the file.

          The Strict DTD includes elements and attributes that have not been deprecated or do not appear in framesets:

          <!DOCTYPE html PUBLIC
          "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

          The Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes:

          <!DOCTYPE html PUBLIC
          "-//W3C//DTD XHTML 1.0 Transitional//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

          The Frameset DTD includes everything in the transitional DTD plus frames as well:

          <!DOCTYPE html PUBLIC
          "-//W3C//DTD XHTML 1.0 Frameset//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

          This is a simple XHTML document:

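          <!DOCTYPE html
          PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
          <html xmlns="http://www.w3.org/1999/xhtml">
          <head>
          <title>simple document</title>
          </head>
          <body>
          <p>a simple paragraph</p>
          </body>
          </html>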


          Test Your XHTML With The W3C Validator

          To test a page, enter its address in the W3C Markup Validation Service.

          The XHTML modularization model defines the modules of XHTML.


          Why XHTML Modularization?

          XHTML is a simple, but large language. XHTML contains most of the functionality a web developer will need.

          For some purposes XHTML is too large and complex, and for other purposes it is much too simple.

          By splitting XHTML into modules, the W3C (World Wide Web Consortium) has created small and well-defined sets of XHTML elements that can be used separately for simple devices as well as combined with other XML standards into larger and more complex applications.

          With modular XHTML, product and application designers can:

          • Choose the elements to be supported by a device using standard XHTML building blocks.
          • Add extensions to XHTML, using XML, without breaking the XHTML standard.
          • Simplify XHTML for devices like hand held computers, mobile phones, TV, and home appliances.
          • Extend XHTML for complex applications by adding new XML functionality (like MathML, SVG, Voice and Multimedia).
          • Define XHTML profiles like XHTML Basic (a subset of XHTML for mobile devices).
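As a rough sketch of what extending XHTML with another XML vocabulary can look like, the fragment below embeds a small MathML formula in an XHTML page. The formula and the page content are illustrative assumptions, and the DOCTYPE is left out on purpose because the right one depends on the profile in use:

```html
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XHTML with MathML</title>
</head>
<body>
<p>The area of a circle:</p>
<!-- The xmlns attribute switches this subtree to the MathML vocabulary -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi>A</mi>
  <mo>=</mo>
  <mi>&#960;</mi>
  <msup><mi>r</mi><mn>2</mn></msup>
</math>
</body>
</html>
```

Because both vocabularies are XML, each element belongs to exactly one namespace, which is what lets the two be mixed in one document.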

          XHTML Modules

          W3C has split the definition of XHTML into 28 modules:

          Module name                   Description
          Applet Module                 Defines the deprecated* applet element.
          Base Module                   Defines the base element.
          Basic Forms Module            Defines the basic forms elements.
          Basic Tables Module           Defines the basic table elements.
          Bi-directional Text Module    Defines the bdo element.
          Client Image Map Module       Defines browser side image map elements.
          Edit Module                   Defines the editing elements del and ins.
          Forms Module                  Defines all elements used in forms.
          Frames Module                 Defines the frameset elements.
          Hypertext Module              Defines the a element.
          Iframe Module                 Defines the iframe element.
          Image Module                  Defines the img element.
          Intrinsic Events Module       Defines event attributes like onblur and onchange.
          Legacy Module                 Defines deprecated* elements and attributes.
          Link Module                   Defines the link element.
          List Module                   Defines the list elements ol, li, ul, dd, dt, and dl.
          Metainformation Module        Defines the meta element.
          Name Identification Module    Defines the deprecated* name attribute.
          Object Module                 Defines the object and param elements.
          Presentation Module           Defines presentation elements like b and i.
          Scripting Module              Defines the script and noscript elements.
          Server Image Map Module       Defines server side image map elements.
          Structure Module              Defines the elements html, head, title and body.
          Style Attribute Module        Defines the style attribute.
          Style Sheet Module            Defines the style element.
          Tables Module                 Defines the elements used in tables.
          Target Module                 Defines the target attribute.
          Text Module                   Defines text container elements like p and h1.

          * Deprecated elements should not be used in XHTML.


          XHTML tags can have attributes. The special attributes for each tag are listed under each tag description. The attributes listed here are the core and language attributes that are standard for all tags (with a few exceptions).


          Core Attributes

          Not valid in base, head, html, meta, param, script, style, and title elements.

          Attribute   Value                      Description
          class       class_rule or style_rule   The class of the element
          id          id_name                    A unique id for the element
          style       style_definition           An inline style definition
          title       tooltip_text               A text to display in a tool tip
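A short sketch of the four core attributes on one element (the id, class name, and style values are assumptions made up for this example):

```html
<!-- id must be unique in the document; class may be shared by many elements -->
<p id="intro" class="note" style="color: navy;" title="Shown as a tool tip">
This paragraph uses all four core attributes.
</p>
```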


          Language Attributes

          Not valid in base, br, frame, frameset, hr, iframe, param, and script elements.

          Attribute   Value           Description
          dir         ltr | rtl       Sets the text direction
          lang        language_code   Sets the language code
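A small sketch of the language attributes, using sample sentences chosen only for illustration. Each lang attribute is paired with xml:lang:

```html
<!-- French text, written left to right -->
<p lang="fr" xml:lang="fr" dir="ltr">Bonjour le monde</p>

<!-- Hebrew text, written right to left -->
<p lang="he" xml:lang="he" dir="rtl">שלום עולם</p>
```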


          Keyboard Attributes

          Attribute   Value       Description
          accesskey   character   Sets a keyboard shortcut to access an element
          tabindex    number      Sets the tab order of an element
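A minimal sketch of the keyboard attributes (the link target, field name, and shortcut key are hypothetical):

```html
<!-- accesskey="h" lets the user reach the link with a browser-specific key combination -->
<a href="index.html" accesskey="h" tabindex="1">Home</a>

<!-- tabindex="2" places this field second in the tab order -->
<input type="text" name="query" tabindex="2" />
```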


          New to HTML 4.0 was the ability to let HTML events trigger actions in the browser, like starting a JavaScript when a user clicks on an HTML element. Below is a list of attributes that can be inserted into HTML tags to define event actions.

          If you want to learn more about programming with these events, you should study our JavaScript tutorial and our DHTML tutorial.


          Window Events

          Only valid in body and frameset elements

          Attribute   Value    Description
          onload      script   Script to be run when a document loads
          onunload    script   Script to be run when a document unloads
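A minimal sketch of a window event on the body element (the message text is an assumption for the example). An onunload handler would be written the same way:

```html
<!-- The script in onload runs once the document has finished loading -->
<body onload="alert('The page has loaded');">
<p>Page content goes here.</p>
</body>
```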


          Form Element Events

          Only valid in form elements.

          Attribute   Value    Description
          onchange    script   Script to be run when the element changes
          onsubmit    script   Script to be run when the form is submitted
          onreset     script   Script to be run when the form is reset
          onselect    script   Script to be run when the element is selected
          onblur      script   Script to be run when the element loses focus
          onfocus     script   Script to be run when the element gets focus
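A sketch of several form events in one small form. The action URL, the field name query, and the scripts are illustrative assumptions:

```html
<!-- onsubmit returns false, which cancels the submission, when the field is empty -->
<form action="search.html" method="get"
      onsubmit="return this.query.value != '';"
      onreset="alert('The form was reset');">
  <p>
    <!-- onfocus selects the existing text; onchange upper-cases the new value -->
    <input type="text" name="query"
           onfocus="this.select();"
           onchange="this.value = this.value.toUpperCase();" />
    <input type="submit" value="Search" />
    <input type="reset" value="Clear" />
  </p>
</form>
```

Note that the event attribute names are written in lowercase and their values are quoted, as the XHTML syntax rules require.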


          Keyboard Events

          Not valid in base, bdo, br, frame, frameset, head, html, iframe, meta, param, script, style, and title elements.

          Attribute    Value    Description
          onkeydown    script   What to do when key is pressed
          onkeypress   script   What to do when key is pressed and released
          onkeyup      script   What to do when key is released
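A minimal sketch of a keyboard event (the field name code is hypothetical):

```html
<!-- Each time a key is released, the field content is converted to upper case -->
<input type="text" name="code"
       onkeyup="this.value = this.value.toUpperCase();" />
```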


          Mouse Events

          Not valid in base, bdo, br, frame, frameset, head, html, iframe, meta, param, script, style, and title elements.

          Attribute     Value    Description
          onclick       script   What to do on a mouse click
          ondblclick    script   What to do on a mouse doubleclick
          onmousedown   script   What to do when mouse button is pressed
          onmousemove   script   What to do when mouse pointer moves
          onmouseover   script   What to do when mouse pointer moves over an element
          onmouseout    script   What to do when mouse pointer moves out of an element
          onmouseup     script   What to do when mouse button is released
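A short sketch of three mouse events on a paragraph (the style changes are assumptions chosen to make the effect visible):

```html
<!-- Clicking turns the text red; hovering makes it bold until the pointer leaves -->
<p onclick="this.style.color = 'red';"
   onmouseover="this.style.fontWeight = 'bold';"
   onmouseout="this.style.fontWeight = 'normal';">
Click this text or move the mouse pointer over it.
</p>
```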


          XHTML Summary

          This tutorial has taught you how to create stricter and cleaner HTML pages.

          You have learned that all XHTML elements must be properly nested, XHTML documents must be well-formed, all tag names must be in lowercase, and that all XHTML elements must be closed.

          You have also learned that all XHTML documents must have a DOCTYPE declaration, and that the html, head, title, and body elements must be present.

          For more information on XHTML, please look at our XHTML reference.

