Gordon Fletcher & Anita Greenhill
This proposal originally appeared in the November 1995 issue of the Australian Library Journal
An early version also appeared in the November 1995 issue of ASLIB Proceedings
The rapid growth of the Internet has outstripped conventions for citing material from that source. Distinguishing material as a [computer file] does not provide sufficient information about the platform necessary for reading it. The URL provides useful information, but augmenting it with other details such as author and date not only provides a meaningful citation but also, through its similarity to conventional bibliographic notation, lends a greater degree of legitimacy in academic discourse. The article considers information derivable from the URL and from HTML documents (including non-displayed source text) in order to derive bibliography and in-line text citations for various kinds of material. The conventions proposed are applicable to Gopher, FTP, Usenet News, journals distributed by listservers, and email.
Despite the rapid growth of the Internet during 1994 and 1995, no adequate or consistent method of referencing material from this source has developed. Failure to address this issue will result in Internet resources not being awarded full recognition within academic discourse. Unless corrected, the significance of this oversight will be exacerbated as more academic journals become available on-line and more computer-literate students enter tertiary study. Furthermore, the status of researchers who have published in this medium will be affected and universities may deprive themselves of the staff best equipped to meet the challenges of the electronic age.
The methods used to reference material gained from the Internet should echo existing referencing styles. This consistency would improve the readability of references to Internet-based resources and would not single out the material solely because of its contemporary distribution medium. The work of Li and Crane (1993) currently represents the only consistent system for referencing electronic resources. Its publication before the wide adoption of HTML prevented it from developing as concise an approach to Internet resources as might otherwise have been possible.
Although there is a large variety of referencing systems available, the solution proposed here is consistent with the Guide to Referencing (Dow 1995) and the Australian Government Publishing Service's Style Manual (1994).
This article proposes the development of a consistent bibliographical referencing method that emerges from available information in Internet-based file formats including the Hyper-text Mark-up Language (HTML). If adopted, it will avoid the necessity for inclusion of the [computer file] label that has become a de facto and inadequate solution to a complex problem. As an early contribution to the field of Internet citations, some of the proposals offer options for consideration rather than being prescriptive.
The Problem
Distinguishing material as a [computer file] has limited utility: it acknowledges only that additional tools are needed to display the material. This generic label ignores the variety of computer platforms and file formats that currently exist. Few of these formats are interchangeable, either with other formats or with the specific readers used to decode the files. Distinguishing material as a [computer file] is therefore not a sufficiently informative pointer for referencing computer-based material. An effort should be made to provide enough information in the author's bibliography for a different researcher to retrieve the same material. This can only be achieved by incorporating additional details more thoroughly into the body of the bibliographic entry. Among the considerations for later retrieval of a file are its physical and symbolic location (roughly equivalent to the city and publisher entries in a conventional reference), the file name (the title) and the file format.
Currently the conventions established by the World Wide Web (WWW) for accessing a variety of file formats globally provide the most concise and informative system for incorporating Internet-based resources into referenced works. This system is based upon the Uniform Resource Locator (URL). In graphical WWW browsers, activating the Show Location option reveals the URL of each document being accessed. The generic format of a URL is:
file_format://computer.type_of_computer.country_code/file_directory/file_name
Thus
http://www.gu.edu.au/gwis/cinemedia/CineMedia.home.html
refers to
- a Hyper-text document, retrieved with the HyperText Transfer Protocol (http)
- based at the Griffith University WWW server (www.gu)
- which is an educational institution (edu)
- physically located in Australia (au).
- The file itself is nested within two directories (/gwis/cinemedia)
- and is identified by the name CineMedia.home.html
- the extension .html also indicates a Hyper-text document format.
The file name is case sensitive, thus cinemedia.HOME.HTML does not point to the same file as CineMedia.home.html. Researchers using this system for their computer-based material must be aware of this when referencing files.
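The breakdown of the Griffith University URL can be reproduced mechanically. The following is a minimal Python sketch using the standard library's urllib.parse; the function name describe_url is our own, and it assumes (as the list does) that the last label of the host name is a country code, which will not hold for hosts such as www.bob.com.

```python
from posixpath import basename, dirname
from urllib.parse import urlsplit

def describe_url(url):
    """Break a URL into the parts listed above.

    Assumes the last label of the host name is a country code; for
    hosts such as www.bob.com it will simply be 'com'.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # urlsplit lower-cases the host name
    return {
        "scheme": parts.scheme,           # e.g. 'http'
        "host": host,                     # e.g. 'www.gu.edu.au'
        "country_code": host.split(".")[-1],
        "directories": [d for d in dirname(parts.path).split("/") if d],
        "file_name": basename(parts.path),  # case is kept exactly as written
    }

info = describe_url("http://www.gu.edu.au/gwis/cinemedia/CineMedia.home.html")
print(info["directories"], info["file_name"])
# ['gwis', 'cinemedia'] CineMedia.home.html

# A change of case in the file name points at a different file.
other = describe_url("http://www.gu.edu.au/gwis/cinemedia/cinemedia.HOME.HTML")
print(other["file_name"] == info["file_name"])
# False
```

Note that the file name is compared exactly, so the case-sensitivity warning above carries straight through to any tool that works with the reference.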
HTML file format
The World Wide Web's Hyper-text Markup Language (HTML) is becoming the file format most commonly used for on-line academic journals. This is a result of the rapid growth of the World Wide Web and the ease of use of its graphical user interface. The basis of the WWW is the Uniform Resource Locator. The uniformity of this file identifier commends it as the basis for bibliographic referencing of WWW documents. Fortunately, alongside their complex machine names and URLs, documents written in HTML also contain a simplified 'real world' title field. This information appears in the title bar of the WWW browser's viewing window when the site is accessed. All HTML documents can be assumed to have a file format, a title and a URL by virtue of their existence as WWW documents. The 'voluntary' or added aspects of the document may include the author's name, an institution (in place of publisher details) and a form of date.
Documents with authors present no problem and, fortunately, as with most publishing and public expression, the anonymous HTML author is relatively rare. Documents with no personal author will often have an institutional body referred to within the document itself. This may take the form of a link to another WWW site or a direct reference to the institution. Sites often contain a Hyper-text link at the base of the files that allows the reader to email the site's author. Using the author's email address in place of their actual name provides a unique identifier that can be reused meaningfully. Failing these possibilities (and this situation would be relatively rare), the HTML file contains information that is not necessarily displayed directly by the HTML browser. Most browsers allow the user to read the document's source code, usually through a series of commands such as View + Source. As HTML is plain text with readable, attached layout tags, it is possible to read the information in the <HEAD> section of the document to obtain an author's name or institution. Unfortunately the <HEAD> tag in HTML is now optional, reducing the value of this last option for the future.
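Reading the <HEAD> section need not be done by eye through View + Source. As a sketch, the following Python uses the standard library's html.parser to collect the <title> text and an author. It assumes the author has been recorded in a <meta name="author"> tag; that convention is our assumption rather than part of the proposal, and the sample document is hypothetical.

```python
from html.parser import HTMLParser

class HeadInfoParser(HTMLParser):
    """Collect the <title> text and any <meta name="author"> value.

    The meta 'author' convention is an assumption; many documents lack it.
    """
    def __init__(self):
        super().__init__()
        self.title = ""
        self.author = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.author = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical source text, as it might appear under View + Source.
source = """<html><head><title>Humanities HUB</title>
<meta name="author" content="Greenhill, Anita"></head>
<body>...</body></html>"""

parser = HeadInfoParser()
parser.feed(source)
print(parser.title.strip(), "|", parser.author)
# Humanities HUB | Greenhill, Anita
```

When no such tag exists, author stays None and the researcher falls back on the institutional or email options described above.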
Consideration must be given to the fact that there are at least two types of HTML documents on the Internet. The most common type is the collection. These sites have no content in the sense of readable or research material, as they are simply a collection of pointers to other sites compiled by the document's author. The value of the better collections is their systematic grouping of sites for perusal. These collections are the most likely to be anonymous but the least likely to be referenced academically. The less common content-provider sites contain electronic journals or on-line data. On these sites there is a high probability that an author or an institution will be referred to in the text.
Identifying the publisher of HTML material may become increasingly difficult as a result of the commercialisation of the Internet. This process has allowed research groups and units to maintain a distinct server, often with a commercial (.com) suffix, while still maintaining an association with their original institution. The first preference when referencing a publisher would be to include the name of the institution most acknowledged for its assistance in the HTML document.
An educational (.edu) site is, however, more likely to be the source of the HTML file. The site referred to by the following URL contains institutional information:
http://www.gu.edu.au/gwis/hub/hub.home.html
This is, however, rather cryptic. There are Internet resources available that allow this site name to be searched for, returning a real world name as the result. This form of retrieval depends on the site being registered with one of these indexes. This additional research requires detailed knowledge of the Internet's available resources - a condition that should, perhaps, preface any situation in which Internet resources will be referenced.
In attempting to make references to Internet material retrievable at a later date, the difference between citing hub.home.html and Humanities HUB (the title of the above example) is significant. The full URL allows a re-connection if it was correctly noted in the bibliography. The title of the resource can be used in a network search using one of the better search engines, such as Lycos, Yahoo, AltaVista or Webcrawler. Attempting this form of retrieval assumes that the site is registered with one of these indexes. A potential advantage of using the document's title rather than its URL is that the document remains findable after the file is moved. Although most sites place a pointer to a file's new location when it is moved, these pointers are usually maintained for only a short period. The moved file could, however, be retrieved by a network search using the title as the keyword(s). The problem of providing ongoing retrievability can be hedged by including both sets of details in a reference.
Alteration of files by the author could be likened to the creation of a new edition at the expense of an earlier one. This situation would hopefully be relatively rare for academically orientated material. However, when referencing on-line daily newspapers this loss of referred material cannot be avoided. In these situations researchers would be advised to maintain a personal archive of Internet material. Students submitting coursework who choose to reference material from the Internet should maintain an archive of the material they refer to.
Internet resources should be used judiciously. Referencing an on-line article, with its additional complexities, should not be attempted if a printed version can be obtained. This helps reduce the complexity that occurs in bibliographies that refer to non-traditional media. Encouraging the referencing of paper versions does not devalue their electronic counterparts. On-line journals can be considered as working papers that allow the author to rapidly identify articles of interest and relevance. The scarcer paper copy could then be obtained and used as a referencing copy, allowing the researcher to provide full referencing and page numbering details. However, this will become less viable with strained library resources and an increasing number of journals becoming available solely on-line.
The best methods for referencing HTML files can be derived from the referencing styles used for monographs. The author remains as the basis of a bibliography's order. The year of the original uploading could be included if it is acknowledged in the document itself. The title of the reference equates with the title that appears at the top of the document's window. The publisher details would be the name of the institution where the file was maintained - if it is ascertainable from the document. The place of publication would be entirely replaced by the
document's URL. Providing the full URL provides a level of redundancy in the entry
that allows the cross-checking of provided
references. This method also enables references to be made to those sites with no
apparent publisher. Using an article from
the C-Theory site as an example, the complete bibliographic entry would be:
Brenner, Anita 1995, The Murder Trial: Genre or Event-Scene?, C-Theory,
http://english-server.hss.cmu.edu/ctheory/e-murder_trial.html
It would be associated with in-line references
that appear as (Brenner 1995). This
example is unusual as it relates to an ejournal that does not currently maintain volume
or numbering details. For that reason we
suggest treating the work as a separate entity with the title in italics and the 'journal title' as added information. If C-Theory
changed this policy the new reference would
place the article in quotation marks, italicise the title of the ejournal, and include
volume and number details.
A minimal bibliographic entry would
appear as:
The Murder Trial: Genre or Event-Scene?
~1995, http://english-server.hss.cmu.edu/ctheory/e-murder_trial.html
The corresponding in-line references
would be (The Murder Trial: Genre or
Event-Scene? ~1995) which still provides
some information for other researchers. This
type of reference would occur when only
the title and the URL were available to the
researcher.
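The ordering of both the full and the minimal entry, and of their in-line counterparts, is regular enough to express as a small function. The Python below is a sketch under our own assumptions (the function names are hypothetical, and the year is passed as text so that '~1995' can be used); plain text cannot carry the italics discussed above, so the title is left unmarked.

```python
def bibliography_entry(title, url, year, author=None, publisher=None):
    """Assemble an entry in the monograph-derived order described above.

    `year` is a string so that '~1995' can be passed for changeable documents.
    No final full stop is added, so the URL is not followed by stray punctuation.
    """
    if author:
        parts = [f"{author} {year}", title]
    else:
        parts = [f"{title} {year}"]   # minimal entry: the title leads
    if publisher:
        parts.append(publisher)
    parts.append(url)                 # the URL replaces the place of publication
    return ", ".join(parts)

def inline_reference(year, author=None, title=None):
    """In-line form: the author's surname if known, otherwise the title."""
    return f"({author or title} {year})"

URL = "http://english-server.hss.cmu.edu/ctheory/e-murder_trial.html"
TITLE = "The Murder Trial: Genre or Event-Scene?"

print(bibliography_entry(TITLE, URL, "1995", author="Brenner, Anita", publisher="C-Theory"))
print(inline_reference("1995", author="Brenner"))
print(bibliography_entry(TITLE, URL, "~1995"))
print(inline_reference("~1995", title=TITLE))
```

The first two lines reproduce the full Brenner entry and its in-line form; the last two reproduce the minimal entry, where only the title and URL are known.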
As a slight deviation from conventional
referencing, the final full stop in the bibliographic entry should be omitted to
avoid URL addressing confusion. To avoid
additional confusion the URL should only
be split with a space or soft return after
the forward slashes and not between words
of the location.
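The splitting rule can be checked mechanically. The following Python sketch (our own illustration, not part of the proposal) breaks a URL into lines of at most a given width, allowing a break only after a run of forward slashes so that 'http://' stays whole and no word of the location is divided. A piece longer than the width is kept intact rather than cut.

```python
import re

def split_url(url, width):
    """Break a URL into lines of at most `width` characters where possible.

    Breaks fall only after a run of forward slashes; a piece longer than
    `width` is left whole rather than divided inside a word.
    """
    pieces = re.findall(r"[^/]*/+|[^/]+$", url)
    lines, line = [], ""
    for piece in pieces:
        if line and len(line) + len(piece) > width:
            lines.append(line)
            line = ""
        line += piece
    if line:
        lines.append(line)
    return lines

for line in split_url("http://english-server.hss.cmu.edu/ctheory/e-murder_trial.html", 40):
    print(line)
# http://english-server.hss.cmu.edu/
# ctheory/e-murder_trial.html
```

Joining the lines with no separator restores the original URL, which is the property that protects later retrieval.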
This system of referencing does not recognise those documents with an ejournal
affiliation. As files can be reflected (duplicated) at different sites with different URLs, recognising the institution as the publisher may not always be useful.
However where periodical-style information is evident or available through Hyper-text
links it should be acknowledged conventionally. As an example:
Foucault, Michel 1995, 'Madness, the
Absence of Work', excerpts, tr. P. Stasny
& D. Stengel, Critical Inquiry, vol.21, no.2,
Winter, http://www.uchicago.edu:80/u.scholarly/CritInq/v21n2.foucault.html
The institution/publisher acknowledgment is replaced by an ejournal affiliation.
For those ejournals that maintain conventional volume numbering, this
information could be included after the journal's name. The URL should still be required regardless of the amount of other bibliographic details available. This insistence upon a URL provides a recognisable label that replaces the [computer file] tag and
provides a consistent format for researchers to use.
There are a number of possibilities for
dealing with updated sites. We suggest
avoiding the no date (n.d.) or circa (c.) tags
used for hardcopy items: although there
may be uncertainty as to their exact date,
hardcopy texts are not subject to change
as may be the case with electronic documents. There are a number of mathematical
symbols that could be used to indicate the
year (or more precise date) the file was
accessed. These are the 'less than', 'less than
or equal to', and 'approximately equal
to' symbols, which would be used thus: <1995,
≤1995, or ≈1995. However, we suggest the
most appropriate symbol for undated
Internet sites, and particularly WWW documents, is the tilde (~). This symbol is already
in common use as part of many URLs to
indicate the directory it immediately precedes is a personal directory and could be
extended to recognise that, used against a
date, the tilde recognises the referenced
material may be changeable.
We hope that the requirement for these
conventions to deal with undated sites will
be short-lived. Authors of academic documents on the Internet should become more
aware of the need for researchers to adequately reference their material and should
begin providing full bibliographic data as
part of the document's header. The format
of referencing applied to monographs allows
these welcome additions to be included without the need for a new approach to
referencing HTML documents. Similarly,
extensions to the basic referencing details
of a monograph such as different editions
or different publishers (WWW locations) can
be readily accommodated.
Page numbers and pages themselves are
non-existent in HTML files. Documents contained within a single file are often referred
to in Internet jargon as a page. Longer documents are generally broken down by their
chapters into a series of individual HTML
files or pages. This current and established
Internet practice allows some level of pointers to be developed with the use of a
monograph-derived referencing system. As a fictitious example:
Collins, Jane ~1995, My life as an AOLer
- Introduction, Institute for Internet
Studies, http://www.bob.com/~jcollins/aoler1.html
Collins, Jane ~1995a, My life as an AOLer
- Hackerhood, Institute for Internet
Studies, http://www.bob.com/~jcollins/aoler2.html
In text this could be referenced as (Collins
~1995) and (Collins ~1995a).
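Assigning the letters by hand becomes error-prone once a site is split into many files. A small Python sketch of the labelling used in the Collins example follows; the function is hypothetical, it follows that example in leaving the first work with the bare year, and it assumes the works are already in bibliography order.

```python
from collections import Counter

def year_labels(works):
    """Label same-author, same-year works as in the example above: the first
    keeps the bare year, later ones take 'a', 'b', and so on.

    `works` is a list of (surname, year) pairs in bibliography order.
    Assumes no more than 27 works share an author and year.
    """
    seen = Counter()
    labels = []
    for surname, year in works:
        count = seen[(surname, year)]
        suffix = "" if count == 0 else chr(ord("a") + count - 1)
        labels.append(f"({surname} {year}{suffix})")
        seen[(surname, year)] += 1
    return labels

print(year_labels([("Collins", "~1995"), ("Collins", "~1995")]))
# ['(Collins ~1995)', '(Collins ~1995a)']
```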
Although this is messy as it replaces page
numbers with references to multiple, same-year publications, no simple solution appears to exist. A more sophisticated approach,
requiring a greater awareness of how HTML
files are constructed, would be to use the
filename in place of the page number. For
example (Collins ~1995: aoler1.html) and
(Collins ~1995: aoler2.html) would refer to
the same bibliographic references as above.
There are additional sophistications that
could be developed with a greater awareness of how HTML files are coded. The <a
name> tags in HTML subdivide a single document into many smaller pieces. The
purpose of these tags is to allow a user to
click on a word or symbol at the top of the
document and be immediately moved to
where the <a name> tag is situated. A
reference to a sentence preceded by the <a
name="modem"> tag in the introduction document of the reference above could be
referenced in this manner:
(Collins ~1995: #modem) or (Collins
~1995: aoler1.html#modem)
This level of detail approximates to individual page references in a conventional
book. The major problem with this method
is the complexity in ascertaining if these
codes exist within the document. Not all
HTML documents use the <a name> tag.
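Ascertaining whether these codes exist, and then building the pinpoint reference, can both be automated. The Python below is a sketch using the standard library's html.parser and urllib.parse; the function names and the sample page are hypothetical, and the Collins URL is the fictitious one used above.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlsplit

class AnchorNameParser(HTMLParser):
    """Collect the values of <a name="..."> tags in document order."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.names.append(value)

def anchor_names(html_source):
    """Return the anchor names found; an empty list means none are available."""
    parser = AnchorNameParser()
    parser.feed(html_source)
    parser.close()
    return parser.names

def pinpoint_reference(surname, year, url, anchor_only=False):
    """Use the URL's file name (and any #anchor) in place of a page number."""
    parts = urlsplit(url)
    anchor = f"#{parts.fragment}" if parts.fragment else ""
    locator = anchor if (anchor_only and anchor) else basename(parts.path) + anchor
    return f"({surname} {year}: {locator})"

page = '<h2><a name="modem">Buying a modem</a></h2><p>...</p>'
print(anchor_names(page))
# ['modem']
url = "http://www.bob.com/~jcollins/aoler1.html#modem"
print(pinpoint_reference("Collins", "~1995", url))
# (Collins ~1995: aoler1.html#modem)
print(pinpoint_reference("Collins", "~1995", url, anchor_only=True))
# (Collins ~1995: #modem)
```

An empty list from anchor_names signals that the researcher must fall back on the file name alone.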
A solution to the lack of page numbers
for Internet-based resources might be to
establish a style sheet for printing HTML
documents. A preset top, bottom, left and
right margin coupled with a defined font
and point size, presented as a publicly available style guide, would allow researchers
to match the print settings
of their WWW browsers to these parameters, always with the understanding that
the source code has not been altered in any way.
The resultant pages could be referenced as
(Collins ~1995:[31]) with the square brackets acknowledging the variable nature of
the numbering. Other solutions that could
be adopted include numbering paragraphs;
counting fixed numbers of lines, say 25, as
a single page; or having authors include a
number as an integral part of the document
between each 'page'.
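Of these alternatives, counting a fixed number of lines as a page is the easiest to apply consistently. The Python sketch below (a hypothetical helper, not part of the proposal) returns a square-bracketed notional page for the first line containing a phrase, using 25 lines per page as suggested. It assumes everyone breaks the text into lines the same way, which holds for plain text files but not for HTML reflowed by different browsers.

```python
def bracketed_page(text, phrase, lines_per_page=25):
    """Return a notional page such as '[2]' for the first line containing
    `phrase`, counting every `lines_per_page` lines as one page.

    Returns None when the phrase does not occur.
    """
    for index, line in enumerate(text.splitlines()):
        if phrase in line:
            return f"[{index // lines_per_page + 1}]"
    return None

sample = "\n".join(f"line {n}" for n in range(60))
print(bracketed_page(sample, "line 30"))
# [2]
```

The square brackets carry the same warning as in (Collins ~1995:[31]): the number depends on the counting convention, not on a printed original.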
Another problem in using the Internet
for academic referencing is that researchers
must be aware of the location of the document they are viewing at any given time.
There is a growing tendency for WWW
browsers to have a default configuration
in which the Show Location command is turned
off. While it is just a matter of clicking it
on, a degree of awareness and training is
necessary.
There are a number of requirements for
consistent referencing of HTML files. The
adherence to existing conventions is important. The use of a style guide for printed
HTML files provides a means for referencing
larger documents. Authors of HTML files
should be encouraged to include bibliographic material within their files, with a
minimum request that this material is contained within comment tags or the <HEAD>.
This information could be coded as an actual bibliographic reference for ease of use
and access. As a non-displayed comment
the additions would simply be of the form:
<!-- Greenhill, Anita & Fletcher, Gordon 1997, Humanities HUB, Faculty of
Humanities, Griffith University, http://www.gu.edu.au/gwis/hub/hub.home.html -->
A similar line could be included in the
displayed document. This would allow the
printed document to be easily re-accessed
on-line. The inclusion of a document's URL
on the printed hardcopy of an HTML file
is not something that is currently done automatically by the printing command of
WWW browsers; however, this is likely to
become an available option. Among the proposed features of HTML+, the next version
of the HTML language, is the tag, <PRINTOUT>, which, if it were supported by WWW
browsers, would provide exactly this solution.
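Because the comment form is never displayed, it is worth confirming that the record can be written into the source and read back intact. The following Python sketch does both with the standard library's html.parser; the helper names are hypothetical, and rejecting '--' inside the entry is our conservative choice, since older HTML rules do not allow it within a comment.

```python
from html.parser import HTMLParser

def citation_comment(entry):
    """Wrap a bibliographic entry in a non-displayed HTML comment.

    '--' is rejected as a conservative choice, since older HTML rules do
    not allow it inside a comment.
    """
    if "--" in entry:
        raise ValueError("'--' cannot appear inside an HTML comment")
    return f"<!-- {entry} -->"

class CommentCollector(HTMLParser):
    """Gather the text of every comment in a document's source."""
    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())

entry = ("Greenhill, Anita & Fletcher, Gordon 1997, Humanities HUB, Faculty of "
         "Humanities, Griffith University, http://www.gu.edu.au/gwis/hub/hub.home.html")
page = f"<html><head>{citation_comment(entry)}<title>Humanities HUB</title></head></html>"

collector = CommentCollector()
collector.feed(page)
print(collector.comments[0] == entry)
# True
```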
Gopher
The referencing scheme outlined for
HTML files is equally applicable to the other Internet resources that can be accessed via a graphical WWW browser. Gopher services, as the text-based predecessors to the
WWW, represent substantial investments
of time that are not readily transferred to
WWW-based HTML files. Fortunately this
is not an obstacle to accessing the large
number of resources available through
GopherSpace. Many universities still maintain gopher servers that are accessed with software other
than WWW browsers. This software hides
the server, directory and filename information from the user. It can be retrieved
but may require a level of skill beyond that
necessary for day-to-day use of the Internet.
Researchers and students are strongly
advised to use a consistent interface to the
Internet for both ease of use and regular
referencing methods.
The URL used to access gopher servers
via a WWW browser is similar to a Hyper-text URL but is prefaced by the gopher://
tag. For example, the Marx and Engels archive can be accessed through the URL:
gopher://csf.colorado.edu/11/psn/Marx
These URLs are usually less self-explanatory and longer than those used by the WWW
but remain as sensitive to misspelling, upper
and lower case conflicts, and misplaced punctuation. However the format used to
reference HTML files is equally applicable. Essentially the gopher URL should be
used instead of the place of publication, with
the remainder of the bibliographic entry
treated as a reference to a monograph with
as much detail being provided as possible.
The relative age of GopherSpace does,
however, present problems in accessing full
bibliographic data. Gopher sites operate at
a more institutional level than the WWW.
While WWW pages have readily identifiable individual authors within the
overarching framework of the institutional server, the gopher site and the provider
institution have a more closely integrated
relationship. The WWW could be said to
encourage page authors while GopherSpace
harbours anonymous programmers.
A solution could be to ascribe authorship
of apparently anonymous gopher sites to
the smallest identifiable institutional unit
- often a computer science department. Thus
a reference for a gopher site may appear
as:
Library Services ~1995, Internet User
Glossary, North Carolina State University,
gopher://dewey.lib.ncsu.edu:70/7waissrc%3A/.wais/Internet-user-glossary
The utility of this author ascription is
debatable. But with the decline in gopher
services, there should be less need to provide references for these materials. The
majority of new users on the Internet prefer the more graphical WWW user interface.
This filtering effect ensures that some
degree of experience and skill is developed
in referencing and accessing Internet
resources before the student or researcher
finds it necessary to reference gopher sites.
Time may also reduce the need for gopher
references as the information contained on
gopher servers is transferred to WWW sites.
The example of the World Factbook also
raises questions about the distinction between actual academic authorship
and the digitising and preparation of material for electronic distribution. Although
there are skills involved in both processes,
academic works require the acknowledgement of the academic author of the work.
Acknowledgment of an individual responsible for the digitising could be included
after the title of the work. This echoes the
style for acknowledging editors and translators where the original author remains
of paramount importance. There would
appear, however, to be little utility in acknowledging an institutional body in
this role when it is recognised as the electronic publisher in the reference and usually implied as such in the URL given for the
document.
File Transfer Protocol (FTP)
File Transfer Protocol (FTP) is used to
download software or text from a site
remote to the user. If the user is accessing
FTP through a WWW browser the text is
displayed 'raw' with little or no formatting.
These files can normally be attributed to
individual people with all the appropriate
referencing details. FTP is the earliest type
of Internet publishing and, when it was (and
occasionally still is) used, the material was
a digitised version of conventionally published material. If the material cannot be
accessed in the printed edition the URL is,
once again, recognisable, eg
ftp://nysernet.org/pub/resources/guides/bigdummy.txt
The resultant reference is also recognisable:
Gaffin, Adam 1994, EFF's Guide to the
Internet v2.3, Electronic Frontier
Foundation, ftp://nysernet.org/pub/resources/guides/bigdummy.txt
Those who access FTP via software other than a WWW browser can easily convert
their reference to a standard URL by adding
the tag ftp:// to the front of the server, directory and filename details that are needed
to access the material.
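The conversion just described is a matter of string assembly. A minimal Python sketch follows; the function name is hypothetical, and it tolerates the stray leading or trailing slashes that different FTP clients display.

```python
def ftp_url(server, directory, filename):
    """Turn the server, directory and file name shown by an FTP client
    into an ftp:// URL, tolerating stray leading or trailing slashes."""
    parts = [p.strip("/") for p in (directory, filename)]
    path = "/".join(p for p in parts if p)
    return f"ftp://{server.strip('/')}/{path}"

print(ftp_url("nysernet.org", "/pub/resources/guides", "bigdummy.txt"))
# ftp://nysernet.org/pub/resources/guides/bigdummy.txt
```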
FTP documents again reinforce the need
for a printout style sheet that specifies a
series of standard margins, fonts and point
sizes while acknowledging that no formatting changes are conducted on the
document. This would allow a square bracketed page number to give a general page
guide for in-line referencing.
Usenet News
Usenet News can be accessed in a number of different ways. Currently there seems
to be no clear preference for one particular software reader. Some graphical WWW
browsers can access news. The hierarchical nature of the news systems, and the
distributed nature of the material, prohibit
a conventional URL system from being used.
News is physically held on each subscribed
server. This results in totally meaningless
URLs for later reference. A URL that points
to the Griffith University server is not useable by someone accessing news from
another university.
This different distribution method lends
itself towards a more periodical-orientated
style of referencing. There is usually some
form of authorship acknowledged, although
newsgroups do not exclude the possibility
of pseudonyms being used.
News items usually have a header that
approximates a title. The newsgroup, itself,
takes the role of the journal. The full date
of the original posting is available and can
be used in the way that volume and number are used in conventional journals. This
provides a potentially useful reference. As
an example:
Graham, Adrian 1995, 'Fishing in
Mauritius', alt.fishing, 29 July.
Unless research was being conducted
specifically on computer-mediated communication, the stranger postings with
unusual names and titles - such as the following - would simply not appear as
references:
AOhell 1995,'Hi all...', alt.slack, 20 Feb.
Another problem with references to
Usenet News is the temporary nature of
the postings. Not all newsgroup postings
are archived, and the references of today
become aether tomorrow. Although a number of the major newsgroups are archived,
finding them and determining specific references may require far more effort than
the final piece of information is really
worth. These considerations are really only
a major concern to a researcher wishing to
consult items in an existing bibliography.
To access this material it might be easier
to contact the author of the article or
research directly. This emphasises the
importance of students, authors and
researchers maintaining a private archive
if they have to use these transient materials.
Journals distributed by listservers
Listservers provide the closest the
Internet has to a hand-delivered journal.
The listserver of a specific journal posts a
full copy of the journal to each subscriber's
email address every time a new issue is
completed. Fortunately, the header of an
email contains most of the referencing
information needed for constructing a journal-like bibliographic entry. The order and
type of information will vary from listserver
to listserver but generally the author of the
specific article will be acknowledged as a
proper name or sometimes as an email
address. The title and journal name are covered by the subject and originator sections
of the header. The year will always be the
same year as the year of receipt at the email
address. The date will be contained within the header - when this does not occur
the month of receipt can be used in the reference. Many listservers issue material more
than once a month, but the month, used in combination with the author and title, will provide a
unique identifier. Some listservers and
newsgroups are quite closely linked and
knowing these combinations will assist the
researcher in providing full bibliographic
material.
Email
Email is the personal communication of
the Internet. Referencing email should
be undertaken with the same judiciousness
that is used with all personal communication. Personal communications are
acknowledged only by in-line references and
not in bibliographies. In-line references simply acknowledge the interlocutor and the
date with an annotation. Email provides
similar information. For example:
(Bloodaxe, Eric 1995, email, 24 July)
If the person's name is unclear, the section of the email address in front of the @ symbol could be used. In that case
the reference would become:
(E.Bloodaxe 1995, email, 24 July)
However, some institutions and commercial service providers prefer to use a
numeric system when allocating personal
email identities. A minimal email communication conducted with someone connected
to one of these sites verges upon incomprehensibility, eg:
(S450062 1995, email, 24 July)
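The choice between the sender's name and the part of the address before the @ can be made directly from the From line. A minimal Python sketch follows, using email.utils.parseaddr from the standard library; the function name and the example.net addresses are hypothetical, and splitting at the last space to find the surname is an assumption.

```python
from email.utils import parseaddr

def email_reference(from_header, year, day_month):
    """In-line reference for personal email: the sender's name if given,
    otherwise the part of the address before the '@'."""
    name, address = parseaddr(from_header)
    if name and " " in name:
        given, surname = name.rsplit(" ", 1)  # assumes the surname comes last
        who = f"{surname}, {given}"
    else:
        who = name or address.split("@")[0]
    return f"({who} {year}, email, {day_month})"

print(email_reference("Eric Bloodaxe <E.Bloodaxe@example.net>", 1995, "24 July"))
# (Bloodaxe, Eric 1995, email, 24 July)
print(email_reference("E.Bloodaxe@example.net", 1995, "24 July"))
# (E.Bloodaxe 1995, email, 24 July)
```

A numeric identity such as S450062 passes through unchanged, which is exactly the incomprehensibility noted above; no formatting can recover a name the address does not contain.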
Conclusion
The methods we have outlined rely heavily on the phenomenon of the World Wide
Web and its method of accessing Internet
resources. This provides backward compatibility with earlier resources, as well as
a standard on which to base future developments.
Currently, papers and articles published
on the Internet are not recognised as having a legitimate place in academic publishing
resumes. Drawing documents on the
Internet more closely in line with conventional material - from a referencing point
of view - may assist in bringing about a
change in policy. This more progressive position will be assisted by an increasingly
computer-literate student body desirous of
incorporating Internet material into their
work. Encouraging this, through the provision of referencing guidelines and tolerance
towards the use of these documents, will provide a richer base of material to draw
upon than is sometimes available through conventional library resources. When funding for items such as paper periodicals is reduced, the ability to access and reference
hypertext versions of journals and other
materials becomes important. This academic advantage is increased by the nature
of the medium, in that it provides an infinite number of copies in contrast to a single paper copy in a library. Refusal to recognise electronic publishing has the potential to lower the public profile of a university and impede its staff development.
References
Dow, Lesley 1995, Guide to Referencing, Faculty of Humanities, Griffith University, Brisbane.
Li, Xia & Nancy Crane 1993, Electronic Style - a guide to citing electronic information, Meckler, Westport CT.
Style Manual - for authors, editors and printers 1994, 5th edition, Australian Government Publishing Service, Canberra.
See Also
WWW Virtual Library - Online Referencing
Digitised - April 1997
URL: http://www.spaceless.com/WWWVL/refs.html