What It Is
Language for writing web pages
Appearance: color, italics
Semantics: headings, lists, etc.
Relationships: links
Many good WYSIWYG editors available
Many people still insist on formatting by hand
Have to understand structure and rules to manipulate it using programs
In The Beginning
SGML
Standard Generalized Markup Language
Developed 1969-86 by Charles Goldfarb and others at IBM
Add information to medical and legal documents so that computers could process them
Large and complex (500-page specification)
And Then
HTML
HyperText Markup Language
First developed by Tim Berners-Lee in 1989
(Much) simpler than SGML: anyone could write it
Pragmatic: mixed appearance, semantics, and relationships
Inextensible: authors could not define new markup rules
Today
XML
eXtensible Markup Language
First version approved in 1998
More complex than HTML (but still simpler than SGML)
Authors can define new markup rules
XHTML
Re-definition of HTML using XML's rules
Enforces rules that HTML authors often ignored
Elements and Tags
XHTML document is a tree of elements
Show elements with tags
Tags enclosed in angle brackets <>
Opening tag <tag> must be matched by closing tag </tag>
Everything in between is called the element's content
Tags must be closed in reverse order of being opened
<a><b>content</b></a> is legal
<a><b>content</a></b> is not
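The nesting rule can be checked mechanically. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser is an acceptable stand-in for an XHTML checker (it tests well-formedness only, and `is_well_formed` is a name made up for this example):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Tags closed in reverse order of being opened
print(is_well_formed("<a><b>content</b></a>"))
# Tags closed in the wrong order
print(is_well_formed("<a><b>content</a></b>"))
```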
A Simple Web Page
<html>
  <body>
    <h1>Page Title</h1>
    <p>
      First <em>paragraph</em>.
    </p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>
Appearance
Page Title
First paragraph.
Second paragraph.
Tags Used (And Their Close Kin)
| html | HTML page (must be outermost) |
| body | Body of page (visible content) |
| h1 | Level-1 heading |
| h2, h3, h4 | Sub-heading, sub-sub-heading, etc. |
| p | Paragraph |
| em | Emphasized (italics) |
| b | Bold |
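Because the page is a tree of elements, a program can walk it. A rough sketch using Python's standard `xml.etree.ElementTree`; the `outline` helper is hypothetical, and the page text is the simple page from the earlier slide:

```python
import xml.etree.ElementTree as ET

SIMPLE_PAGE = """<html>
  <body>
    <h1>Page Title</h1>
    <p>First <em>paragraph</em>.</p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(SIMPLE_PAGE))
print("\n".join(tree_lines))
```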
Page Format
Every page has exactly one html element
The root element of the document
May contain one head element
Page title, author, version, etc.
Not displayed: used by search engines, etc.
Must contain one body element
Put text directly into page
Refer indirectly to images, sound, video, etc.
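The page format rules are simple enough to check in a few lines. A sketch only, assuming pages without XML namespaces (as in these slides); `check_page_format` and its messages are invented for illustration:

```python
import xml.etree.ElementTree as ET

PAGE = """<html>
  <head><title>A Slightly Larger Page</title></head>
  <body><h1>A Slightly Larger Page</h1><p>First paragraph.</p></body>
</html>"""


def check_page_format(markup):
    """Return a list of problems with the page's top-level structure."""
    root = ET.fromstring(markup)
    problems = []
    if root.tag != "html":
        problems.append("root element is not html")
    if len(root.findall("head")) > 1:
        problems.append("more than one head element")
    if len(root.findall("body")) != 1:
        problems.append("expected exactly one body element")
    return problems


print(check_page_format(PAGE))                    # no problems expected
print(check_page_format("<html><head/></html>"))  # body is missing
```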
A Slightly Larger Page
<html>
  <head>
    <title>A Slightly Larger Page</title>
  </head>
  <body>
    <!-- displayed content starts here -->
    <h1>A Slightly Larger Page</h1>
    <p align="center">First paragraph.</p>
    <p>Second paragraph.</p>
    <address>Greg Wilson<br/>Room 2002</address>
  </body>
</html>
Larger Appearance
A Slightly Larger Page
First paragraph.
Second paragraph.
Greg Wilson
Room 2002
Short Forms and Comments
Elements with no content can be written <x/> instead of <x></x>
<br/> is a line break
<hr/> is a horizontal rule
Comments like this: <!-- a comment -->
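To a parser the short and long forms are the same element, and comments are not part of the content. A small sketch with Python's standard `xml.etree.ElementTree`, whose default tree builder discards comments:

```python
import xml.etree.ElementTree as ET

# <br/> and <br></br> describe the same empty element
short_form = ET.fromstring("<p>one<br/>two</p>")
long_form = ET.fromstring("<p>one<br></br>two</p>")
same_tree = ET.tostring(short_form) == ET.tostring(long_form)
print(same_tree)

# The comment is dropped; only the text remains
commented = ET.fromstring("<p><!-- a comment -->visible text</p>")
print(commented.text)
print(len(commented))  # number of child elements
```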
Attributes
<p align="center"> creates a centered paragraph
align="center" is an attribute
Element may have any number of attributes...
...but each must be unique
Attributes must have values, which must be quoted
HTML didn't insist, but XML and XHTML do
<tag color=blue> is illegal (unquoted)
<tag invisible> is also illegal (no value)
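An XML parser enforces these attribute rules. A minimal sketch, again assuming Python's standard `xml.etree.ElementTree` as the parser; `parses` is a helper invented for this example:

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if the markup is accepted by the XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Attributes arrive as a dictionary of names and (unquoted) values
para = ET.fromstring('<p align="center">First paragraph.</p>')
print(para.attrib)

print(parses('<tag color="blue"/>'))  # quoted value: accepted
print(parses('<tag color=blue/>'))    # unquoted value: rejected
print(parses('<tag invisible/>'))     # no value: rejected
print(parses('<tag a="1" a="2"/>'))   # repeated attribute: rejected
```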
Special Characters
Use escape sequences for special characters
Don't need to escape " in normal text
Although technically you should
| &lt; | < | &gt; | > |
| &quot; | " | &amp; | & |
| &yen; | ¥ | &copy; | © |
| &micro; | µ | &plusmn; | ± |
| &nbsp; | non-breaking space | | |
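Programs rarely write escape sequences by hand. A sketch using Python's standard `html` module, whose `escape` and `unescape` functions understand the HTML names in the table above (the sample strings are made up):

```python
import html

# Replace <, >, & and " with their escape sequences
escaped = html.escape('if x < y & y > z: print("ok")')
print(escaped)

# Turn named escape sequences back into characters
restored = html.unescape("&yen; &copy; &micro; &plusmn;")
print(restored)

# A non-breaking space is a distinct character from an ordinary space
nbsp = html.unescape("&nbsp;")
print(nbsp == " ")
```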
Images
Use img element to link to image
src attribute specifies image file
alt attribute useful for indexing, and for the visually impaired
<html>
  <body>
    <h1>Coyote</h1>
    <img src="../img/people/coyote.gif" alt="Coyote"/>
  </body>
</html>
Image Appearance
Coyote
[image: coyote.gif]
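Since alt matters for indexing and accessibility, it is worth checking for. A rough sketch with Python's standard `xml.etree.ElementTree`; the second image and the `images_missing_alt` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

PAGE = """<html>
  <body>
    <h1>Coyote</h1>
    <img src="../img/people/coyote.gif" alt="Coyote"/>
    <img src="../img/people/unlabelled.gif"/>
  </body>
</html>"""


def images_missing_alt(markup):
    """Return the src of every img element with no (or empty) alt text."""
    root = ET.fromstring(markup)
    return [img.get("src") for img in root.iter("img") if not img.get("alt")]


missing = images_missing_alt(PAGE)
print(missing)
```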
Lists
Use <ol> for ordered list
List items marked with <li>
Use <ul> for unordered list
List items also marked with <li>
Use <dl> for definition list (dictionary)
Definition terms marked with <dt>
Definition data marked with <dd>
List Example
<ul>
  <li> First unordered </li>
  <li> Second unordered </li>
</ul>
<ol>
  <li> First ordered </li>
  <li> Second ordered </li>
</ol>
<dl>
  <dt> Term </dt>
  <dd> and definition </dd>
</dl>
List Appearance
- First unordered
- Second unordered
1. First ordered
2. Second ordered
Term
    and definition
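Lists are a natural thing to generate from program data. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; `make_list` is an illustrative name, not part of any library:

```python
import xml.etree.ElementTree as ET


def make_list(items, ordered=False):
    """Build an <ol> or <ul> element with one <li> per item; return markup."""
    root = ET.Element("ol" if ordered else "ul")
    for item in items:
        ET.SubElement(root, "li").text = item
    return ET.tostring(root, encoding="unicode")


unordered = make_list(["First unordered", "Second unordered"])
ordered = make_list(["First ordered", "Second ordered"], ordered=True)
print(unordered)
print(ordered)
```

Building elements rather than gluing strings means the library closes every tag and escapes special characters in the item text.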
Tables
Three main elements (with many options):
<table> is entire table
<tr> is table row
<td> is table data (a single cell within a row)
Often abused for general layout
HTML doesn't provide better mechanisms
Frames are not better
Table Example
<table cellspacing="2" cellpadding="2" border="2">
  <tr>
    <td>upper left</td>
    <td>upper right</td>
  </tr>
  <tr>
    <td>lower left</td>
    <td>lower right</td>
  </tr>
</table>
| upper left | upper right |
| lower left | lower right |
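Going the other way, a table's rows and cells map directly onto a list of lists. A sketch with Python's standard `xml.etree.ElementTree`, assuming a plain table with only tr and td children (`table_rows` is a made-up helper):

```python
import xml.etree.ElementTree as ET

TABLE = """<table cellspacing="2" cellpadding="2" border="2">
  <tr><td>upper left</td><td>upper right</td></tr>
  <tr><td>lower left</td><td>lower right</td></tr>
</table>"""


def table_rows(markup):
    """Return the table as a list of rows, each a list of cell texts."""
    table = ET.fromstring(markup)
    return [[td.text for td in tr.findall("td")] for tr in table.findall("tr")]


rows = table_rows(TABLE)
print(rows)
```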
Links
Links are what makes it hypertext
http://www.ddj.com/index.html has:
Protocol: HTTP
Host: www.ddj.com
Path: /index.html
Protocol is how to communicate
Host is who to talk to
Path is what to get
Interpreted by host machine
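The same breakdown can be done in code. A small sketch using `urlsplit` from Python's standard `urllib.parse`; note that Python calls the protocol the "scheme" and the host part the "netloc":

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.ddj.com/index.html")
print(parts.scheme)  # protocol: how to communicate
print(parts.netloc)  # host: who to talk to
print(parts.path)    # path: what to get
```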
Putting Links in Pages
<html>
  <body>
    <p>
      <a href="http://www.ddj.com">DDJ</a>
      <br/>
      <a href="simple_page.html">Simple Page</a>
      <br/>
      <a href="link_page.html">This Page</a>
    </p>
  </body>
</html>
Link Appearance
Links in Detail
DDJ link only has host name
Up to web server to decide how to handle request
Typically return a default page like index.html
Second and third links don't specify protocol or host
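Resolved relative to the current page's location (same protocol, host, and directory)
Fetched from the local machine only if the current page was loaded from there (e.g., via http://localhost/...)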
Usually best to use relative paths for links within a site (so that directory trees can be moved around)
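How a browser fills in the missing protocol and host can be sketched with `urljoin` from Python's standard `urllib.parse`. The base address below is hypothetical: it stands for wherever the page containing the links was loaded from.

```python
from urllib.parse import urljoin

# Hypothetical location of the page that contains the three links
BASE = "http://example.org/lectures/link_page.html"

HREFS = ["http://www.ddj.com", "simple_page.html", "link_page.html"]

# Absolute links are kept as-is; relative ones borrow from the base
resolved = {href: urljoin(BASE, href) for href in HREFS}
for href, target in resolved.items():
    print(href, "->", target)
```

Moving the whole directory changes only the base, which is why relative links keep working.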
Anchors
Use <a name="text"> to name an element
Link to it with <a href="wherever#text">
Yes, they should have used a different element...
<a href="#refs">References</a>
...other text...
<h1><a name="refs">References</a></h1>
<p>Look <a href="pdb.html#history">here</a> for the history of PDB.</p>
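A link into the current page only works if the named anchor exists, which a program can verify. A rough sketch with Python's standard `urllib.parse.urldefrag` and `xml.etree.ElementTree`; the `#missing` link and the `dangling_local_links` helper are invented for illustration:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

PAGE = """<body>
  <a href="#refs">References</a>
  <a href="#missing">Nowhere</a>
  <a href="pdb.html#history">here</a>
  <h1><a name="refs">References</a></h1>
</body>"""


def dangling_local_links(markup):
    """Return hrefs that point at an anchor name not defined in this page."""
    root = ET.fromstring(markup)
    names = {a.get("name") for a in root.iter("a") if a.get("name")}
    dangling = []
    for a in root.iter("a"):
        href = a.get("href")
        if href is None:
            continue
        page, fragment = urldefrag(href)  # split "wherever#text"
        # Only links into this same page can be checked here
        if page == "" and fragment and fragment not in names:
            dangling.append(href)
    return dangling


print(tuple(urldefrag("pdb.html#history")))
print(dangling_local_links(PAGE))
```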
$Id: xhtml.html,v 1.1 2007/01/02 01:57:50 reid Exp $