XHTML

What It Is

Language for writing web pages

Appearance: color, italics

Semantics: headings, lists, etc.

Relationships: links

Many good WYSIWYG editors available

Many people still insist on formatting by hand

Have to understand structure and rules to manipulate it using programs

In The Beginning

SGML

Standard Generalized Markup Language

Developed 1969-86 by Charles Goldfarb and others at IBM

Add information to medical and legal documents so that computers could process them

Large and complex (500-page specification)

And Then

HTML

HyperText Markup Language

First developed by Tim Berners-Lee in 1989

(Much) simpler than SGML: anyone could write it

Pragmatic: mixed appearance, semantics, and relationships

Inextensible: authors could not define new markup rules

Today

XML

eXtensible Markup Language

First version approved in 1998

More complex than HTML (but still simpler than SGML)

Authors can define new markup rules

XHTML

Re-definition of HTML using XML's rules

Enforces rules that HTML authors often ignored

Elements and Tags

XHTML document is a tree of elements

Show elements with tags

Tags enclosed in angle brackets <>

Opening tag <tag> must be matched by closing tag </tag>

Everything in between is called the element's content

Tags must be closed in reverse order of being opened

<a><b>content</b></a> is legal

<a><b>content</a></b> is not
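
The nesting rule can be checked mechanically. A minimal sketch, assuming Python's standard xml.etree.ElementTree parser stands in for an XHTML checker (it tests XML well-formedness only, not XHTML validity); is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b>content</b></a>"))  # tags closed in reverse order
print(is_well_formed("<a><b>content</a></b>"))  # tags closed in the wrong order
```

The first call reports True and the second False, because the parser rejects a closing tag that does not match the most recently opened one.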

A Simple Web Page

<html>
<body>
<h1>Page Title</h1>
<p>
First <em>paragraph</em>.
</p>
<p>Second <b>paragraph</b>.</p>
</body>
</html>

Appearance

Page Title

First paragraph.

Second paragraph.

Tags Used (And Their Close Kin)

html        HTML page (must be outermost)
body        Body of page (visible content)
h1          Level-1 heading
h2, h3, h4  Sub-heading, sub-sub-heading, etc.
p           Paragraph
em          Emphasized (usually shown in italics)
b           Bold
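
Each of these tags becomes one node in the page's element tree. As a rough illustration, the simple page can be parsed and its tree printed with indentation; outline is a hypothetical helper, and the parser is Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

SIMPLE_PAGE = """<html>
<body>
<h1>Page Title</h1>
<p>
First <em>paragraph</em>.
</p>
<p>Second <b>paragraph</b>.</p>
</body>
</html>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(SIMPLE_PAGE))))
```

The output shows html at the root, body beneath it, and h1 and the two p elements as children of body, with em and b nested one level further down.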

Page Format

Every page has exactly one html element

The root element of the document

May contain one head element

Page title, author, version, etc.

Not displayed: used by search engines, etc.

Must contain one body element

Put text directly into page

Refer indirectly to images, sound, video, etc.
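
These layout rules are simple enough to check with a short program. The sketch below assumes pages without an XML namespace declaration, like the examples in these slides; check_page_format is a hypothetical name, not part of any library:

```python
import xml.etree.ElementTree as ET


def check_page_format(markup):
    """Return a list of problems; an empty list means the layout is as described."""
    root = ET.fromstring(markup)
    problems = []
    if root.tag != "html":
        problems.append("root element is not html")
    if len(root.findall("head")) > 1:
        problems.append("more than one head element")
    if len(root.findall("body")) != 1:
        problems.append("expected exactly one body element")
    return problems


print(check_page_format("<html><body><p>text</p></body></html>"))
print(check_page_format("<html><head/></html>"))
```

A real validator checks far more than this (for example, which elements may appear inside which), so treat it only as an illustration of the three rules above.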

A Slightly Larger Page

<html>
<head>
<title>A Slightly Larger Page</title>
</head>
<body> <!-- displayed content starts here -->
<h1>A Slightly Larger Page</h1>
<p align="center">First paragraph.</p>
<p>Second paragraph.</p>
<address>Greg Wilson<br/>Room 2002</address>
</body>
</html>

Larger Appearance

A Slightly Larger Page

First paragraph.

Second paragraph.

Greg Wilson
Room 2002

Short Forms and Comments

Elements with no content can be written <x/> instead of <x></x>

<br/> is a line break

<hr/> is a horizontal rule

Comments like this: <!-- a comment -->
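
To a program, the short and long forms are the same element, and comments are not content. A small sketch using Python's standard xml.etree.ElementTree, whose default parser discards comments:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<br/>")
long_form = ET.fromstring("<br></br>")

# Both spellings give an element with the same tag, no children, and no text.
print(short_form.tag, len(short_form), short_form.text)
print(long_form.tag, len(long_form), long_form.text)

# The comment is dropped during parsing, so it never appears in the text.
para = ET.fromstring("<p>before<!-- a comment -->after</p>")
visible_text = "".join(para.itertext())
print(visible_text)
```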

Attributes

<p align="center"> creates a centered paragraph

align="center" is an attribute

Element may have any number of attributes...

...but each attribute name may appear only once

Attributes must have values, which must be quoted

HTML didn't insist, but XML and XHTML do

<tag color=blue> is illegal (unquoted)

<tag invisible> is also illegal (no value)
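
An XML parser enforces these attribute rules. The following sketch uses Python's standard xml.etree.ElementTree; parses is a hypothetical helper, and closing tags have been added to the illegal examples so that the attribute is the only error in each:

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if the markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


para = ET.fromstring('<p align="center">First paragraph.</p>')
print(para.attrib)        # attributes are exposed as a dictionary
print(para.get("align"))  # look up one attribute by name

print(parses("<tag color=blue></tag>"))            # unquoted value
print(parses("<tag invisible></tag>"))             # no value
print(parses('<tag size="1" size="2"></tag>'))     # repeated attribute name
```

All three illegal forms are rejected, while the quoted attribute is available through attrib and get.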

Special Characters

Use escape sequences for special characters

Don't need to escape " in normal text

But must escape it inside a double-quoted attribute value

&lt;     <    &gt;      >
&quot;   "    &amp;     &
&yen;    ¥    &copy;    ©
&micro;  µ    &plusmn;  ±
&nbsp;   non-breaking space
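
Programs that generate pages should do this escaping automatically rather than by hand. A minimal sketch using the escape and unescape functions from Python's standard html module; the sample string is made up for illustration:

```python
from html import escape, unescape

raw = 'if a < b & c > d, say "yes"'

# escape() replaces &, <, >, and (by default) quote characters.
escaped = escape(raw)
print(escaped)

# unescape() turns named escape sequences back into characters.
print(unescape(escaped))
print(unescape("&yen; &copy; &micro; &plusmn;"))
```

Note that escape handles & before the other characters, so text that is already escaped gets escaped a second time if it is passed through again.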

Images

Use img element to link to image

src attribute specifies image file

alt attribute useful for indexing, and for the visually impaired

<html>
<body>
<h1>Coyote</h1>
<img src="../img/people/coyote.gif" alt="Coyote"/>
</body>
</html>

Image Appearance

Coyote

[image: Coyote]
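
Because src and alt are ordinary attributes, a program can list every image on a page and spot those with no alt text. A sketch with Python's standard xml.etree.ElementTree; list_images is a hypothetical helper, and the img element is written in the self-closing form because an XML parser requires every element to be closed:

```python
import xml.etree.ElementTree as ET

IMAGE_PAGE = """<html>
<body>
<h1>Coyote</h1>
<img src="../img/people/coyote.gif" alt="Coyote"/>
</body>
</html>"""


def list_images(markup):
    """Return (src, alt) for every img element; alt is None when it is missing."""
    root = ET.fromstring(markup)
    return [(img.get("src"), img.get("alt")) for img in root.iter("img")]


for src, alt in list_images(IMAGE_PAGE):
    print(src, "->", alt if alt is not None else "(no alt text)")
```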

Lists

Use <ol> for ordered list

List items marked with <li>

Use <ul> for unordered list

List items also marked with <li>

Use <dl> for definition list (dictionary)

Definition terms marked with <dt>

Definition data marked with <dd>

List Example

<ul>
  <li> First unordered </li>
  <li> Second unordered </li>
</ul>
<ol>
  <li> First ordered </li>
  <li> Second ordered </li>
</ol>
<dl>
  <dt> Term </dt>
  <dd> and definition </dd>
</dl>

List Appearance

  • First unordered
  • Second unordered
  1. First ordered
  2. Second ordered
Term
and definition
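
Lists are often generated from data rather than typed by hand. As one possible approach, the sketch below builds ul and ol elements with Python's standard xml.etree.ElementTree; make_list is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def make_list(kind, items):
    """Build a ul or ol element from strings and return it as markup."""
    if kind not in ("ul", "ol"):
        raise ValueError("kind must be 'ul' or 'ol'")
    element = ET.Element(kind)
    for item in items:
        # Each entry becomes one li child; special characters are escaped.
        ET.SubElement(element, "li").text = item
    return ET.tostring(element, encoding="unicode")


print(make_list("ul", ["First unordered", "Second unordered"]))
print(make_list("ol", ["First ordered", "Second ordered"]))
```

Building the tree and serializing it, instead of concatenating strings, guarantees that every li is closed and that characters such as < in the items are escaped.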

Tables

Three main elements (with many options):

<table> is entire table

<tr> is table row

<td> is table data (a single cell within a row)

Often abused for general layout

HTML doesn't provide better mechanisms

Frames are not better

Table Example

<table cellspacing="2" cellpadding="2" border="2">
<tr>
  <td>upper left</td>
  <td>upper right</td>
</tr>
<tr>
  <td>lower left</td>
  <td>lower right</td>
</tr>
</table>

Table Appearance

upper left    upper right
lower left    lower right
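
The row-and-cell structure maps directly onto a list of lists. A minimal sketch, assuming a table with only tr and td children as in the example; table_rows is a hypothetical helper built on Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

TABLE = """<table cellspacing="2" cellpadding="2" border="2">
<tr>
  <td>upper left</td>
  <td>upper right</td>
</tr>
<tr>
  <td>lower left</td>
  <td>lower right</td>
</tr>
</table>"""


def table_rows(markup):
    """Return the table as a list of rows, each a list of cell texts."""
    table = ET.fromstring(markup)
    return [[cell.text for cell in row.findall("td")]
            for row in table.findall("tr")]


for row in table_rows(TABLE):
    print(row)
```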

Links

Links are what make it hypertext

http://www.ddj.com/index.html has:

Protocol: HTTP

Host: www.ddj.com

Path: /index.html

Protocol is how to communicate

Host is who to talk to

Path is what to get

Interpreted by host machine
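
The same three parts can be pulled out of a URL with the urlsplit function from Python's standard urllib.parse module, shown here as a short sketch:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.ddj.com/index.html")

print("Protocol:", parts.scheme)
print("Host:    ", parts.netloc)
print("Path:    ", parts.path)
```

urlsplit only takes the string apart; it does not contact the host or check that the path exists.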

Putting Links in Pages

<html>
<body>
<p>
<a href="http://www.ddj.com">DDJ</a>
<br/>
<a href="simple_page.html">Simple Page</a>
<br/>
<a href="link_page.html">This Page</a>
</p>
</body>
</html>

Link Appearance

DDJ
Simple Page
This Page

Links in Detail

DDJ link only has host name

Up to web server to decide how to handle request

Typically return a default page like index.html

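Second and third links don't specify protocol or host

Resolved relative to the URL of the page that contains them

Fetched from the local machine only if the page itself was loaded from there

http://localhost/... instead names a web server running on the local machine

Usually best to use relative paths within a site (so that directory trees can be moved around)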
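
How a relative link turns into a full address can be sketched with urljoin from Python's standard urllib.parse module. The two base URLs below are hypothetical locations for the page containing the links, not addresses taken from these slides:

```python
from urllib.parse import urljoin

# Hypothetical locations of the page that contains the links.
WEB_BASE = "http://www.example.org/lectures/link_page.html"
FILE_BASE = "file:///home/user/lectures/link_page.html"

# A relative link takes its protocol, host, and directory from the base.
print(urljoin(WEB_BASE, "simple_page.html"))
print(urljoin(FILE_BASE, "simple_page.html"))

# An absolute link ignores the base entirely.
print(urljoin(WEB_BASE, "http://www.ddj.com"))

# ".." climbs one directory, as in the image example's src attribute.
print(urljoin(WEB_BASE, "../img/people/coyote.gif"))
```

Moving the whole directory tree changes only the base, which is why pages that use relative links keep working after the move.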

Anchors

Use <a name="text"> to name an element

Link to it with <a href="wherever#text">

Yes, they should have used a different element...

<a href="#refs">References</a>
...other text...
<h1><a name="refs">References</a></h1>
<p>Look <a href="pdb.html#history">
here</a> for the history of PDB.</p>
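
A program can separate the page part of a link from the anchor part, and can list the anchors a page defines. The sketch below uses urldefrag from Python's standard urllib.parse module and xml.etree.ElementTree; the body wrapper and both helper names are assumptions added so that the fragment parses as a single tree:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

FRAGMENT = """<body>
<a href="#refs">References</a>
<h1><a name="refs">References</a></h1>
<p>Look <a href="pdb.html#history">here</a> for the history of PDB.</p>
</body>"""


def named_anchors(root):
    """Return the names of all anchors defined in the tree."""
    return [a.get("name") for a in root.iter("a") if a.get("name")]


def link_targets(root):
    """Return (page, anchor) for every link; page is '' for same-page links."""
    return [tuple(urldefrag(a.get("href")))
            for a in root.iter("a") if a.get("href")]


root = ET.fromstring(FRAGMENT)
print(named_anchors(root))
print(link_targets(root))
```

Comparing the two results is one way to find same-page links that point at anchors which do not exist.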

$Id: xhtml.html,v 1.1 2007/01/02 01:57:50 reid Exp $