Chapter 3: Research Methodology

Taxonomic Approaches

The use of taxonomies to order and interpret data from a variety of sources has not recently been a popular approach in the development of social science interpretations. The major exceptions to this methodological aversion are found in archaeology, material culture studies and museology. The continued use of taxonomies as a methodological device in these fields is primarily a result of the type of evidence available to their researchers. A central focus of researchers in these disciplines is to seek evidence of cultural activity and traits through recovered artefacts. Methods, such as taxonomic analysis, that offer systematic treatment of artefactual material enable ready comparison of cultural activity at a variety of levels, from the personal through to the social and political (Csikszentmihalyi & Halton 1994, 146).

In many specialised fields of academic enquiry - but particularly museology - a significant proportion of a researcher’s efforts goes towards developing this systematic examination of the objects of their research. In some cases this ‘mechanical’ work is recognised for the time and skill required to complete it, and such projects can become the standard reference in the field or laboratory for other researchers who share an interest in similar objects of enquiry. While this respect is deserved, the taxonomy is not an end in itself within a sustained and focused research project. The taxonomy is a means by which the cultural influences on a class of artefacts can be understood, compared and discussed in a common and meaningful way. In this way the classification scheme - and the taxonomy developed from it - is a tool to aid the analysis, interpretation and understanding of cultural situations. A systematic mechanism for comparison is the rationale for utilising a taxonomic approach in this thesis.

The configuration of a taxonomy for any given perceived class of artefacts is not a determinate process. The emphases and elements of an artefact that a taxonomy utilises to differentiate individual items and find similarities reflect the purpose and aims of the researcher. As the argument of Chapter 2 suggests, artefacts do not possess any inherent qualities that should or must be included in a classificatory schema. Some qualities of an artefact, however, will present themselves as more useful or plausible than others for a stated research project. For example, in analysing pottery from a particular region, qualities such as the thickness of the walls, the width of the neck, the height and circumference of the pot and the decorations or surface treatments on the item can all contribute to an understanding of the provenance, use and manufacture of the artefact (Gibson 2002). Other aspects of a pot may be itemised. In studies of food consumption, for example, the presence of pollen or other organic material provides telling detail regarding the usage of an item (Gibson 2002, 22). In contrast, other tools, including those of more complex manufacture and items described as high technology, have tended to be categorised by their function and their manufacturer’s stated purpose rather than by their form. An example of this is the differentiation of computers based on their operating system - whether Macintosh, Unix or Windows - rather than on any specifically physical or stylistic difference. While technical criteria are the specifications used to retail computers, day-to-day conversations regarding computers generally use descriptions such as a “Unix machine”, “Windows computer” or, simply, a “Macintosh”. These different focal points for defining a class of artefacts reflect the varying criteria used in their examination.

Taxonomies that consider and represent the tools and technologies of contemporary cultures may have other limitations. De Certeau (1988, xviii), in discussing the personal trajectories of everyday life, claims that

statistical investigation remains virtually ignorant of these trajectories, since it is satisfied with classifying, calculating and putting into tables the “lexical” units which compose them but to which they cannot be reduced, and with doing this in reference to its own categories and taxonomies.

Historical relativity limits the utility of taxonomies across time for the examination of other technologies, even if they share a similar purpose or function. A taxonomic analysis of the means of transport that focuses upon mechanical details may be sufficiently generalised to accommodate a broad range of artefacts over an extensive historical period. In contrast, a taxonomy that classifies contemporary forms of cars may need to be regularly updated as manufacturers add new features and older features are retired or fall out of favour with designers and consumers. Other restrictions on the utility of a taxonomy could involve locational and variable ‘properties’. The Universal Decimal Classification scheme, discussed in further detail in the next section, at least tacitly recognises these restrictions and attempts to systematise the range of variations with the use of coded representations for the specific properties of an item.

Taxonomies are by no means a perfect representation of the groups of artefacts they purport to document. Rather, they satisfy the need of a researcher to comprehend a complex set of interrelated and variable factors recognised between individual items within a generalised collection (Csikszentmihalyi & Halton 1994, 146). Examining the Web benefits from the generalising and systematising aspects of taxonomies, which consolidate the vast array of data that can be collected from ‘virtual’ research sites and represent this data in a systematic and meaningful manner. Even when taxonomic classification overcomes the sheer volume of data available from the Web, the ‘virtual’ presents its own set of peculiar problems for comprehending the many properties of its artefacts. Awareness of these initial limitations of the taxonomic approach accentuates the need to understand those properties that can be identified, disentangled and interpreted.

Utilising Taxonomies for the Web

The Web offers a range of services that extend human capabilities, and understanding their meaning in a cultural context can be undertaken in a range of ways. This section presents the “Top 500 Search Terms” newsletter as one of these services. The newsletter is in many respects a distillation of one of the central services offered through and for the Web: search engines. In considering some of the history and structure of the newsletter, and of search engines more generally, this section also considers the possibility of applying a taxonomic approach to this data.

The need for services such as the ‘Top 500 Search Terms’ newsletter, and the development of more advertising-oriented search engines such as Overture.com, reflects a range of issues that have impacted on the Web during its first ten years of development. The most immediate of these are the commercial imperatives that affect any business - it must eventually produce positive revenue. Many of the original Web-based search engines began as student projects or hobbies that offered navigational assistance through the rapidly expanding Web (www.search-marketing.info/search-engine-history/). As the Web itself has no inherent indexing system or reference points, it became clear very early in its history that a system was necessary that delivered, through the Web itself, a friendly, fast and relatively simple entry point to other Web sites. Without the union catalogue of Web sites that is the search engine, the Web’s advantages over other information technologies such as Gopher could only be slight. Search engines are - for most users of the Web - still a major starting point for navigating the Web (Pew Internet 2002; Pew Internet 2002a). However, few, if any, of these projects commenced with a clear business orientation or business plan. In many cases media companies speculatively bought the intellectual property of these developments and their often famous domain names. The case of Altavista is a clear example of this situation. The technology first passed from its original owner, Digital Equipment Corporation (better known as DEC), which had developed the search engine to showcase its technology, to the Compaq Corporation when it purchased DEC. Compaq then sold the search engine to CMGi, an Internet advertising company responsible for many of the ‘banner ads’ on large corporate Web sites.
In a further twist that reflects the shifting and sometimes tenuous economics of ‘eCommerce’, CMGi sold Altavista to Overture.com in February 2003 (http://www.websearchworkshop.co.uk/altavista-history.htm). Subsequently, the original Web directory, Yahoo, expanded its search technology holdings by buying Overture.com and its properties.

This potted history of Altavista also mirrors more general trends in the development of information technology. The sale of DEC to Compaq - the first company to produce a “100% compatible” PC - confirms the rise of the personal computer over mainframes such as those manufactured by DEC (Cringely 1996, 171). CMGi’s purchase reveals the rise to dominance of software companies and the speculative models of eCommerce. The sale to Overture.com confirms this dominance of software and the Web while also reflecting a more restrained and refined form of eCommerce.

The shifting terrain of the search engine ‘business’ is also found with another of the early search engines, the World Wide Web Worm, which eventually formed the original ‘free’ core of Overture.com/Goto.com’s database of pay-per-click advertisers. The ability to produce revenue from these search engines was, and still largely is, relatively limited. An increasing number of pay-for-placement search engines focus upon the revenue from the advertisers in their databases (Wakeford 2000, 33), although developers of ‘conventional’ search engines continue to experiment with other potential revenue models. The most tried of these alternative options, subscription services and user pays-per-view, have tended to be rejected by consumers. The continued suspension of the NorthernLight engine (which was funded primarily by paid searches for commercial information) is one example of this consumer resistance.

The most commonly cited, twinned and related, causes for the failure of the ‘user-pays’ model are the resistance that Web-based consumers have to paying for any form of information and the seeming abundance of duplicated information that can currently be found on the Web (Dvorak 2002). This abundance of information may be illusory, as the free offerings of unauthoritative Web sites cannot be directly compared to commercial information brokers such as online research databases like Silverplatter or academic information networks such as Athena. However, it is the perception held by users of the Web regarding this abundance of information, and not a quantifiable comparison, that currently prevents the acceptance of pay-per-view. The most successful exponents of the pay-per-view information-brokering strategy tend to offer information that is unique or not readily gathered elsewhere (for example ovid.com and launch.yahoo.com). The NorthernLight search engine offered a free Web search service that included, among its results, links to relevant documents that were available upon the payment of a fee varying from a dollar to many hundreds of dollars. It is significant to note that NorthernLight has been offline since August 2003 and was until recently promising a new “Business Search” for March 2004. The short message on the Web site (northernlight.com) emphasises that they are ‘the tiny search engine company’ but offers no public indication of why their search engine is now unavailable.

With pay-for-placement, search engines attempt to recoup their investment by directly charging the Web sites that want to be listed. The result of this development is that the majority of listings seen on the most prominent search result pages of these pay-per-click sites tend to be commercially oriented. In other words, those sites who pay for a listing are generally only those who have something tangible to sell to their potential visitors. This strategy is potentially self-defeating, as the forms of artefacts being sought by users of the Web are most readily located at the larger, more expansive, older and free search engines, not at the pay-per-placement search engines. The limitations of this business approach are indicated in the collated “Top 500 Search Terms” newsletter, which suggests there is a sustained and broad interest in all types of things that are free. Recognition of the difficulty associated with producing profit from search engines at a corporate level is evident in the 14th of July 2003 press release (www.corporate-ir.net/ireye/ir_site.zhtml?ticker=OVER&script=410&layout=0&item_id=430830) announcing Yahoo.com’s acquisition of Overture.com as a wholly owned subsidiary. The commercial environment is so complex that in the press release great effort is made not to describe each company’s major asset as ‘simply’ a search engine.

Classificatory Schema and Universal Decimal Classification

The Universal Decimal Classification scheme shapes the taxonomic approach of this work. This general system of classification orders the interpretation of the data obtained from the “Top 500 Search Terms” newsletters. While many works exist that have utilised taxonomies of some form, few have required the services of a scheme with such an expansive scope and the capability to incorporate individual items from a variety of provenances and of many forms. The only schemes currently in existence that are tested and have proven longevity are associated with the management of libraries. The three major schemes are the Library of Congress Classification, the Dewey Decimal Classification and the Universal Decimal Classification. Universal Decimal Classification is the least popular of the three and its use is primarily confined to the UK and Europe. Each scheme covers the broadest expanse of human experience and understanding, but of these it is the Universal Decimal Classification scheme that has most regularly been applied to Web-based problems of categorisation (for example sosig.ac.uk and udcc.org/research.htm). The rationale for this particular scheme varies between the Web sites that make use of it; however, the inherent features of UDC - including its capacity to recognise items of different forms, its regularised approach to classification and its ‘universal’ scope - offer advantages over the older Dewey Decimal scheme or the less universal perspective of the Library of Congress scheme. Both these latter approaches are also more obviously targeted towards the problems of conventional library management.

My rationale for the use of what is primarily a ‘library’-based and consequently generalised approach also precludes the use of other classificatory schemes, which are crafted for more specific solutions. For example, the Standard Industrial Classification (www.softshare.com/tables/sic/), commonly used in national census taking, is appropriate for taxonomies of the division of labour but offers little utility beyond this task. This narrow usage is further restricted by similar schemes used at a national level, such as the Australia and New Zealand or United States codes. The “Topic Maps Published Subjects” (www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj) list offers more possibilities as a general list, and its Web-based mission for an authoritative web of information commends it in many ways to this work. However, its purpose is generally centred on the presentation of information and reflects an encyclopedic treatment of, and rationale for, Web sites and the Web itself. While this clearly commends its use for the Web, it does preclude other types of artefacts that are not so readily classified within the simplistic perspective of being a “Web site”.

The UDC uses a decimal system to divide human experience and understanding into ten separate classes, of which nine are currently utilised. The main classes are represented by a single digit and describe the broadest levels of the scheme. Each class is then divided into a further ten sub-classes, and these divisions are also represented with a single digit. For example, within the ‘Generalia’ class, represented by a 0, can be found ‘Librarianship’, which is represented as 2. These digits are combined to unambiguously assign the UDC code 02 to Librarianship. A further decimal division is then possible to create a three-digit code. The code for “Church Music”, for example, is 783. In the UDC a three-digit class number can be directly read as a sub-division of the class 78, which represents ‘Music’, and the main class 7, which incorporates ‘The Arts, Entertainment and Sport’. Having multiple sub-divisions means that UDC allocations are not read as conventional numbers. There is a significant difference between 007 and 7, and a somewhat more subtle difference between 7 and 700 - the latter number is for this reason not commonly used in the current classificatory schema.
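This hierarchical reading of a UDC number lends itself to simple mechanical treatment. The following sketch is illustrative only - the class labels are limited to the examples given in this chapter, and any real application would draw on the full UDC tables:

```python
# Illustrative sketch: reading a UDC class number as a hierarchy.
# Labels here are only the examples cited in this chapter.
UDC_LABELS = {
    "0": "Generalia",
    "02": "Librarianship",
    "7": "The Arts, Entertainment and Sport",
    "78": "Music",
    "783": "Church Music",
}

def udc_hierarchy(code):
    """Expand a UDC number into its chain of parent classes.

    Every leading prefix of the digit string is itself a class,
    so '783' is read as 7 -> 78 -> 783, never as the number 783.
    """
    return [code[:i] for i in range(1, len(code) + 1)]

def describe(code):
    return ["%s: %s" % (c, UDC_LABELS.get(c, "(unlabelled)"))
            for c in udc_hierarchy(code)]

print(describe("783"))
```

Reading codes as digit strings rather than numbers also makes plain why 007 and 7 are entirely different allocations: their prefix chains share nothing beyond notation.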

In addition to these main classes the UDC offers a series of auxiliary tables. These tables provide additional information and clarification to the major classification. The auxiliaries are one of the strengths of the UDC, giving it the capacity to accurately classify an item in ways that are not solely defined by its position within human knowledge and experience. It is possible to ascribe a range of second-level classifications: a language, a region or location, a time, or a material. The auxiliaries also enable the introduction of non-standard notations to reflect everyday usage and understandings. The properties and form auxiliaries potentially enable the use of this system for the classification of material culture artefacts in, at least, a general sense. The stated purpose of the UDC (British Standards Institution 1999, 5) does not include the intention to provide a scheme for museologists confronted with large collections of similar objects, particularly as specialist work generally exists in such fields, but this option does exist. The properties auxiliary covers areas such as “properties of existence”, “properties of structure”, “properties of shape” and “properties of arrangement”, which enable an artefact to be positioned irrespective of its purpose. Another auxiliary of significance for material-culture-oriented studies enables the categorisation of “persons and personal characteristics”. The granularity of this auxiliary recognises the human influence upon artefacts and their relationship to cultural activity directly within the scheme. With the flexibility found in these wide-ranging perspectives the UDC is provisioned with a broad scope and persistence.

The other clear advantage of UDC in relation to the Web, and to the systematic analysis of large datasets, is that, while its initial design predates modern computer systems, it is compatible with contemporary digital data manipulation tools. Its use of ‘regular’ punctuation symbols and of a numeric code that embeds the structure of the scheme’s logic lends itself to the programmatic analysis of data and the rapid parsing of electronic data into a systematic coded form. Nuanced use of the auxiliaries, meanwhile, preserves some of the context of the collected data.

The full power and scale of the UDC has only been partly realised in this thesis. The UDC’s extensibility enables a systematic classification of Web sites in a manner that extends beyond a single, brutal level of classification - the undifferentiated Web site. Similarly, the UDC enables the preservation of some of the context of the data utilised in this thesis that may otherwise have been lost in subsequent interpretation and analysis.

Data gathered from the “Top 500 Search Terms” newsletter is, at one level, ‘simply’ a systematic and columnar list of ranked words and the raw counts from which their ranking is derived. The relatively straightforward format of these lists enables the ready, but somewhat rudimentary, classification of keywords into major and minor UDC classes. The first two sets of data from the uncensored lists were used to test the initial process of classification before it was applied to the 16 months of data from the 14th of September 2001. These first two weeks of unedited data were discarded from the eventual analysis in order to utilise the standardised format subsequently received, which comprises a ‘surge’ list and a ‘consistent performer’ list of keywords. The initial examination of these 500 keywords nonetheless allowed the initial classification to become a base for the consistent classification of later data, by ensuring that keywords repeated in subsequent lists were treated in the same manner.

The data from the keyword lists received between March and December 2000 is not considered directly in the analysis or interpretation chapters of this thesis, as it was received as edited lists with the ‘adult’ terms removed by the list’s developers. As later data reveals, ‘adult’ terms and concepts are a significant presence on these lists, and the determination of what is, and what is not, an ‘adult’ term can often be a personal and arbitrary delineation. During the classification process it became clear how readily assumptions regarding the intended meaning of a term can influence later interpretations. It is also clear from these early edited lists that while specific ‘adult’ terms were removed, the tallying of specific ‘adult’ entertainment Web sites was still included. The earlier lists are, however, considered less formally in Chapter 5 in the development of the discussion regarding taxonomies and these research sites.

The data that was initially classified was treated systematically. Specific Web site addresses were classified together, so that ‘www.name.com’, ‘name.com’ and, often, ‘name’ are considered as a single artefact. Similarly, where the name of a Web site was not among these possible address variations, such entries were also classified together; this subtle distinction appears to be most commonly associated with pornographic Web sites. This grouping of classificatory entries does not occur when the name, by itself, is generic or has a broader interpretation. As there is no way of determining the context of the search for a specific word, it is inappropriate to automatically assume that the intention of the search was to specifically locate the Web site that utilises that term. Probably the most common examples of situations where the address and the isolated term were not assumed to reflect the same intention are words such as ‘mp3’ and ‘netscape’. In most cases the classification of these keywords remains sufficiently close to that of the Web site of the same name to reflect the likely association that exists.
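The grouping of address variants described above can be sketched as a simple normalisation step. This is an illustrative sketch only - the generic-term list and the exact rules of the author’s own software are assumptions for demonstration:

```python
# Illustrative sketch: collapsing address variants of the same Web site.
# 'www.name.com', 'name.com' and the bare 'name' become one artefact,
# except where the bare name is generic (e.g. 'mp3', 'netscape'),
# in which case the term is kept distinct from the site of that name.
GENERIC_NAMES = {"mp3", "netscape"}  # examples drawn from this chapter

def normalise(term):
    t = term.strip().lower()
    is_address = t.startswith("www.") or t.endswith(".com")
    if t.startswith("www."):
        t = t[len("www."):]
    if t.endswith(".com"):
        t = t[:-len(".com")]
    # A bare generic word cannot be assumed to target the site of that name.
    if t in GENERIC_NAMES and not is_address:
        return t + " (term)"
    return t

print(normalise("www.name.com"))  # 'name'
print(normalise("mp3"))           # 'mp3 (term)'
```

The point of the sketch is the asymmetry it encodes: an explicit address unambiguously targets a site, while a bare generic word is left as a term in its own right.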

The classification of the received data also groups together upper- and lower-case variations of the same keywords and ‘slight’ misspellings of the main term where the intention of the misspelling is unambiguous. The first and most immediate examples of this type of variation in the examined data were the range of variations for Osama bin Laden, e.g. ‘bin Laden’, ‘Osama Bin Ladin’, ‘bin ladin’ and ‘Usama Bin Laden’, and for Nostradamus, e.g. ‘Nostradomus’, ‘Nostrodamus’, ‘nostradomas’, ‘nostradomus’ and ‘Nostradamas’. Terms such as ‘Osama Bin Laden Photo’ and ‘Nostradomus Prophecies’ are classified separately but have a close proximity to the main entry, reflecting an appropriate distinction in the intention of the original search while making a clear and close association.

As the data examined involved over 26,000 individual entries, software was created by the author of this thesis to systematically examine each term and assign it a consistent classification if one had already been allocated, or to prompt for a new classification where one did not exist. In the latter case the programme presented a selection of existing classifications that were programmatically considered to be close to the term under consideration. Programmatic support for maintaining classificatory consistency included phonetic comparison using the Double Metaphone algorithm. This comparative analysis was achieved with a publicly available programming class called DoubleMetaphone (Phillips 2000). The Double Metaphone algorithm utilises a series of standardised “rules” that reduce any English word to a short phonetic representation of up to four characters. The intention of this encoding is that like-sounding words should share the same representation. Similar terms that met the criteria for being added to the same classificatory space as an existing term could then be identified in this algorithmic manner. After these newly encountered terms had been classified within the main classes of the UDC, optional auxiliary table information was also added to the keyword’s entry. It is stressed that the software developed supported the task of classification and did not in any way automatically ascribe or assume classificatory assignments for the data. The application provides a level of data management that enabled the initial parsing of the data to be consistent and compatible with the aims of this research. The software produced a modified view of the data that ranked the search terms by the UDC’s main class counts. The software also calculated the cumulative raw count of total searches performed for each week’s list. Cumulative counts allow for the creation and use of percentiles, so that any given class or set of classes has comparable meaning across a time series.
The revised lists and the additional information attached to each keyword provided enough consistent information to create the graphical representations of the data in Section 5.2 and to develop the argument of this work made through Chapters 5 and 6.
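The principle of the phonetic matching used by this software can be demonstrated with Soundex, a much simpler relative of Double Metaphone (whose actual rules are considerably more elaborate). The sketch below only illustrates how like-sounding terms can be made to share a code so that existing classifications can be suggested for a new term:

```python
# Illustrative sketch of phonetic matching using Soundex, a simpler
# stand-in for the Double Metaphone algorithm used by the actual software.
SOUNDEX_CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
                 **dict.fromkeys("dt", "3"), "l": "4",
                 **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(word):
    """Encode a word so that like-sounding words share a four-character key."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # 'h' and 'w' do not break a run of duplicates
            prev = code
    return (result + "000")[:4]

def suggest(term, classified):
    """Suggest already-classified terms that sound like a new term."""
    key = soundex(term)
    return [t for t in classified if soundex(t) == key]

# The Nostradamus misspellings from the data share a single phonetic key.
print(soundex("Nostradamus") == soundex("Nostradomus"))  # True
```

A human classifier still makes the final decision; the phonetic key only narrows the candidates presented at the prompt, mirroring the supporting (rather than automatic) role described above.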

The use of the auxiliary tables of the UDC was extended in this parsing of the data in order to provide a clearer representation of the artefacts desired and sought by a particular search. The auxiliaries were used to indicate the ‘property of the desire’ and the ‘property of the term’. Not all the terms in the lists could be ascribed these identifications, as a result of a term’s own ambiguity or generality.

These search terms only provide the ‘noun’ for understanding a broad range of artefacts. What is absent from most of the most popular search terms is any form of ‘verb’ that, in combination, would define a clearer, more focused set of desired or sought-after artefacts and assist in explaining what is intended to be ‘done’ with these particular artefacts. Such a limitation is a clear self-critique of the classificatory approach employed by this research: particularly in light of its anthropological orientation, the provenance of these desired artefacts is unclear. The additional details that provenance provides would allow more specific interpretations to be made regarding the desires expressed in individual search terms. Chapter 7 considers how the nuances of more precise data regarding provenance would allow further research to expand and clarify the arguments of the current thesis.

In contrast to the lack of provenance imparted by the collected data, the classificatory schema enables the treatment of large-scale data across a period of time in a systematic manner. The use of the UDC scheme, including its auxiliary tables, enables individual classes to be extracted and compared in a way that indicates relationships that have not been arbitrarily imposed on the data by the researcher but reflect cultural associations that the UDC itself captures in its classificatory scheme. One use of examining these relationships is explored in Section 5.3, where what are described as the cultural contexts of the cultural complex are examined. This approach proves to be particularly useful for recognising how the changeability in the desire for specific celebrities underlies a consistent interest in these categories of ‘artefacts’.

Sociolinguistic Issues

While this work is not primarily focussed on a sociolinguistic examination of the “virtual”, a series of broadly sociolinguistic issues do impact on the methodology of this work and the data being considered. In turn these issues also affect the forms of interpretation that are derived from this data. The heavy predominance of what appear to be “adult” concepts within the lists of the “Top 500 Search Terms” newsletters and the consistently performing keywords most clearly evidences this cause for concern. One of the key hierarchical “head words” for the classification of the adult material found in the data is the term ‘sex’. It is regularly in the top ten of consistently requested keywords, and in the raw data received from the newsletter it often appears in first place. However, this position cannot be dismissed solely as final evidence for the claim that the Web is the most popular contemporary medium for distributing pornography. Mehta & Plaza (1997, 65) make the unsupported suggestion that “pornographic images on the Internet tend to represent the dominant interests of network users”. While an indeterminate percentage of requests must reflect this interest, the variety of other more specific and less ambiguous terms, coupled with the names of specific pornography Web sites that appear in the list of consistently performing terms, also suggests that those seeking pornography do so with a clear goal in mind. The issue of context in the case of this term and others is not directly resolved by the use of a classificatory scheme such as Universal Decimal Classification. Porter (1997, xi) makes the claim with respect to the Internet that “there are no vocal intonations, no signatures, no gestures or embraces. There are words, but they often seem words stripped of context, words desperately burdened by the lack of the other familiar markers of identity in the strange, ethereal realm.” There is no means of determining whether the term ‘sex’ reflects an interest in human or non-human biology, is a request for educational material, or is to be read as a noun or a verb. The term ‘sex’ by itself reveals many of the limitations of current search engine technology. Such a degree of fluidity associated with a single term also reflects some of the limitations of this research approach and the danger of ‘reading’ interpretations derived from it in isolation from other forms of cultural analysis and interpretation.

Other less common terms scattered through the lists present potentially similar levels of obscurity. The difficulty is not solely the plurality of meanings associated with a single term but also the use of the same term in different industries and environments, or the cultural differences associated with its use. Examples include the terms ‘cloning’, ‘free’ and ‘quotes’. While the issue of cloning is a consistent performer in mass media reports, due, in part, to the varied range of emotive issues that it offers, it is also the name for an aspect of object-oriented programming methodologies that is among the last to be grasped by students of software engineering. The predominance of technical information available to students on the Web suggests that it is possible for these two separate meanings to compete for importance within the collection of search terms being examined by this work. Ultimately, without supporting evidence, there is no clear means of discerning the artefact being sought (or indeed those found by the people who originally generated the search).

The predominance of the term ‘free’ within the lists also presents a classificatory difficulty. Considered on its own it provides no context for classification, as the term modifies an unassociated and unstated noun. Nor is the term an artifact of splitting a longer phrase, since a series of ‘free’ terms with a noun attached also appear in the list. The search for “free” things includes ‘free sex’, ‘free porn’, ‘free clip art’, ‘free games’, ‘free download music’, ‘free fonts’, ‘free greeting cards’ and many more. The term ‘free’ in isolation can only be classified with difficulty, and then only within an auxiliary table of the Universal Decimal Classification scheme as a generic modifier.

The term ‘quotes’ also reveals the ambiguity of many commonly searched-for terms found in the data. Stock market quotes and famous quotes are both popular services offered at many Web sites. The popularity of both types of ‘quotes’ is revealed by the presence of many more specific keywords within the lists that are linked to this ambiguous usage. For example, the search terms ‘famous quotes’, ‘stock quotes’ and ‘love quotes’ all figure in the gathered lists. In this case, because the possible intentions behind the term represent only two major possibilities, the more generic term is split between the two classificatory positions of “stock quotes” and “famous quotes”. The ability to do this reveals another advantage of the UDC schema. By enabling the researcher to join or associate two classes together symbolically, the scheme can represent more complex (or ambiguous) classes of artefacts. Conceptual linkages are shown with the appropriate class numbers joined together by a colon (:) or a plus (+). For example, “PI Day” is classified as 083:398.3 to indicate its connection to a mathematical principle as well as a form of celebration or festivity.
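
This joining mechanism can be expressed as a short illustrative sketch. The helper below is invented for illustration (it is not part of the classification process used in this thesis); the class numbers are those given above.

```python
# Illustrative sketch of UDC compound notation: class numbers are joined
# with ':' to relate two concepts, or '+' to aggregate them.

def join_udc(*class_numbers: str, connector: str = ":") -> str:
    """Join UDC class numbers into a compound notation."""
    if connector not in (":", "+"):
        raise ValueError("UDC compound notation uses ':' or '+'")
    return connector.join(class_numbers)

# "PI Day" joins a class for a mathematical principle to one for
# celebration or festivity:
pi_day = join_udc("083", "398.3")  # -> "083:398.3"
```

The connector itself carries the classificatory judgement: a colon asserts a relationship between the concepts, while a plus merely aggregates them.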

A final curiosity is the fortunately rare situation where the “fame” of a domain name such as Altavista or Yahoo is used to capture people who misspell the name. In most cases these imitators are short-lived, remain dormant or are legally pursued by the owner of the more “famous” domain. This hijacking of misspelled famous names is common and can be readily tested by typing these names into a Web browser and substituting some or all of the vowels. Sites such as Yehaa.com, Yehoo.com, Atlavista.com and others have all attempted to capture an audience in this way. Only one site that blurs the intention of the original search regularly appears in the lists of search terms: the security and hacking site ‘Astalavista.com’. While the possibility of mistyping ‘Altavista’ in this way appears remote, the popularity of the Arnold Schwarzenegger character in the film Terminator has meant that Altavista is often mispronounced as “Astalavista” in the mistaken belief that this is the name of the major search engine. It is also possible that a search for ‘Astalavista’ is actually an attempt to track down the source of this “famous quote”. The level of ambiguity associated with a single term links this situation to other curiosities of this data. The popularity of the Astalavista site also means that it is a site often referred to by other people looking for the type of resources it provides. Astalavista presents itself as a “security” site offering the software tools to defeat “black hat” hackers. Its brief is broad and extends to offering links to software hacks and cracks that specifically defeat many of the restrictions placed in commercial software or shareware. In short, while the term ‘Astalavista’ is classified as a Web site related to security, it encompasses a much broader range of topics than a single designation suggests.
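
The vowel-substitution test described above can be sketched programmatically. The helper below is hypothetical, offered only to make the mechanism concrete; it enumerates every variant of a name with a single vowel swapped.

```python
# Hypothetical helper illustrating the vowel-substitution test: generate
# every variant of a domain name with one vowel replaced by another.

def vowel_variants(name: str, vowels: str = "aeiou") -> set[str]:
    """Return all single-vowel-substitution variants of a name."""
    variants = set()
    for i, ch in enumerate(name):
        if ch in vowels:
            for v in vowels:
                if v != ch:
                    variants.add(name[:i] + v + name[i + 1:])
    return variants

# 'yehoo' is among the variants of 'yahoo'; 'atlavista', by contrast,
# is a transposition rather than a vowel substitution.
```

Note that a transposed misspelling such as Atlavista.com, or an elaboration such as ‘Astalavista’, falls outside this simple substitution pattern, which is consistent with the cultural rather than typographic explanation offered above.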

The gathered data shows that the meaning of individual terms is more dependent upon a broad set of intersecting cultural conditions and events than a specific Web-based provenance. Awareness of this situation is significant because while this thesis is oriented around a discussion of popular Web search terms it is not “about” the Web. It is about how contemporary culture finds expression through search terms.

These anomalies and complexities in the classification do not overwhelm or obscure the data to a degree that would make systematic classification nonsensical. In many cases the classification process reveals the presence of a cultural and logical relationship between individual keywords that goes some way towards accounting for the possibility of multiple interpretations. It is a curious but unsurprising parallel that one of the most common complaints levelled at all search engines is the lack of relevance the presented results have to the original query (Baeza 1998; Grewal 1999). Yet, given the ambiguity of these most popular terms and, by implication, the returned search results, it is surprising that in these circumstances search engines can provide any relevant results at all. What the person composing the search wants and what they type are not the same thing. Some search engines now recognise that a single simplistic interaction, in which the search query is typed in and the ‘right’ results are then displayed, is neither appropriate nor possible. One solution is to provide a focussing option, such as the “More results like this…” button, that enables the search engine to respond to the initial request in a more focussed and, for the person searching, appropriate way. Metacrawler and (formerly) Northern Light both utilised grouping strategies to offer a greater degree of relevance. By examining the results returned by a search, an internal logic is revealed through the other terms associated with each result. Using groups (or folders) hopefully offers a combination of concepts and desires parallel to those intended by the person conducting the search.
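
The grouping strategy described above can be caricatured in a few lines. The data and function name below are invented for illustration (they do not reproduce Metacrawler’s or Northern Light’s actual systems): each result is represented simply by the set of terms associated with it, and results are then folded into groups by shared term.

```python
from collections import defaultdict

def group_by_shared_term(results: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group result titles under each term associated with them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title, terms in results.items():
        for term in sorted(terms):
            groups[term].append(title)
    return dict(groups)

# For an ambiguous query such as 'quotes', the co-occurring terms
# separate the financial sense from the literary one.
results = {
    "NASDAQ ticker": {"stock", "quotes"},
    "Wisdom archive": {"famous", "quotes"},
    "Broker portal": {"stock", "quotes"},
}
groups = group_by_shared_term(results)
```

Here the ‘stock’ and ‘famous’ folders disambiguate what the bare term ‘quotes’ cannot, paralleling the split classification of ‘quotes’ described earlier.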

The data from each of the “Top 500 Search Terms” newsletters presents these socio-linguistic difficulties because of the process being applied to it. The classification process is, in effect, taking the individual elements of data and placing them in a more specific context than when they are received as lists of keywords. A person navigating the hierarchy of a classification tree - such as one based on the UDC - to locate Web sites that focus on a particular topic would have fewer concerns regarding relevancy but would sacrifice speed and a potentially short trail of mouse clicks to get to appropriate Web sites. A similar contrast exists between the all-encompassing scope of a search engine and the smaller directed navigation paths of directories such as Yahoo or the Open Directory Project.

‘Virtual’ Ethnographies/Ethnographies conducted in the ‘Virtual’

Ethnographies necessarily and inevitably present a bounded impression of cultural life. Selecting boundaries for the observation of cultural life that are too narrow, however, may describe a ‘sub-culture’ that stands as a metonym for broader cultural situations. Similarly, too expansive a boundary will produce descriptions of the most homogenised level of cultural constructions. While neither of these extreme perspectives is ‘better’ or closer to an imagined ‘absolute truth’, each, in isolation, represents a ‘whole’ culture in deeply contrasting ways. Miller & Slater (2000, 21), in arguing for a traditional ethnographic approach in their own research, express concern about the increasingly broad definition of this research method, to the extent that “the idea of an Internet ethnography has come to mean almost entirely the study of online ‘community’ and relationships - the ethnography of cyberspace.” In this section I consider the received corpus of knowledge found in anthropology and other disciplines regarding the application of ethnographic methods. These earlier methods are currently being tacitly, rather than critically, discarded in the name of conducting ‘virtual ethnography’ (cf. Miller and Slater 2000). The concern is that virtual ethnography is being applied indiscriminately as a result of the relatively brief history of the electronically mediated ‘virtual’ and the immediate convenience and communication the ‘virtual’ provides to those people most likely, and most able, to critically review its significance - such as academics and IT professionals. It is these groups, in a very specific - possibly subcultural - way, that have largely represented the ‘us’ of cyberculture. Analysing cultural formations that are so ‘close’ to those of the researcher is a notoriously - in anthropology, at least - fraught task (Mengham 2001). 
However, while this factor is clearly a significant influence on any critical discussion of cyberculture, the more expansive definition of ‘us’ argued for by this thesis incorporates a cultural milieu that is closely intertwined with the current experiences of globalisation and pervasive, commercially motivated technological determinism. The ‘virtual’ is consequently part of a ‘broader’, more vaguely defined ‘mainstream’ culture, and is therefore only loosely articulated, or identified with, by its participants.

Much of the available literature concerning cyberculture and the ‘virtual’ follows a ‘boundary-first’ approach (Vasseleu 1997, 47; cf. Turkle 1996, 21; Green 1997, 59; Mitra 2000, 679). This defines the parameters for examining cyberculture within a claimed boundary between ‘virtual’ and physical actuality. In other words, much of the research that has been conducted into the ‘virtual’ almost arbitrarily delineates the field of study as being ‘all that is virtual’. The ‘virtual ethnography’ then sets out to describe a series of situations and interrelationships that, in the most extreme cases, treat the ‘virtual’ world as a distinct but single culture in itself (Hine 2000, 21). While these works generally acknowledge the ‘virtual’ as a significant location for contemporary cultural formations they do not identify or perceive different ‘parts’ of cyberspace, nor do they consider the possibility that some ‘virtual’ artefacts are not de facto integral parts of cyberculture.

There is a clear historical analogy here to the increasing subtlety with which anthropology has discussed indigenous Australians (e.g. Langton 2003). Rough coverall descriptions have given way to more sensitive and acute readings that, as a consequence, provide for more insightful understanding of specific cultural arrangements. These more sophisticated readings have developed with the increased contributions of the ‘researched’ in a role that is antithetical to that of the researcher. In a subtle reversal, less refined readings of cyberspace mirror an historically specific moment in the earliest development of cyberculture - an environment in which even the most experienced researcher could only be an inexperienced ‘researched’ - a ‘newbie’ (Wakeford 2000, 33). Cyberspace and the ‘virtual’, however - as wholly humanly defined constructions - are increasingly articulated as a consequence of this research as well as of their increasingly broad popularity and use. Contrariwise, the experience of the ‘virtual’ cannot be reduced, or theorised, as simply ‘just another’ aspect of everyday actuality. The presence of cyberspace as a viable location for human interaction, communication and transaction is distinct from, but related to, previous forms of spatial arrangement and introduces a different combination of experiences, information and influences to ‘our’ everyday lives. In this light, I do not offer any ‘new’ methodological approach to the perennial problem of interpreting culture (or cyberculture). Perhaps more practically, this thesis - by eschewing the automatic use of ethnography - presents a greater level of subtlety and particularism in the interpretation of cultural phenomena observed in ‘virtual’ locations.

Understanding cyberculture as the culture of ‘us’ presents a wide-ranging and extensive landscape of phenomena that cannot be considered in toto or in detail. The scale of the Internet and the amount of human activity conducted on it reduces the capacity of ethnographic research in cyberculture to specific practices, events or sub-cultural groups that can be distinctly identified. Arguably, no methodology with an ethnographic inspiration has ever had the capacity, despite claims to the contrary, to present an holistic perspective of cultural phenomena in anything except abbreviated terms.

Ethnographic methodologies are de facto seen as an identifying hallmark of anthropological research. For this reason alone, it is not surprising that anthropological research regarding cyberculture and the ‘virtual’ has employed this approach in a variety of interesting and innovative ways (Hine 2000, 21). Anthropology, as the study of culture, is well suited to take up the study of cyberculture (cf. Hakken 1999). Its founding anthro-centrism, its sensitivities to locality and spatiality, as well as its capacity for particularism, make for an intellectual tradition that is accustomed to interpreting the ‘strangeness’ found in the ‘virtual’ situations of cyberculture. However, to successfully undertake these tasks it is necessary to avoid reifying specific technologies or describing a technology as concomitant with a cultural phenomenon. Celebratory presentations are a tendency found in the earliest texts of cyberculture (Rushkoff 1994; Dyson et al 1994; Rheingold 1995; Turkle 1996; Barlow 1996) and one that is perpetuated through current mainstream journalism. Cyberculture is not, however, confined to computer-mediated - but socially constructed - cyberspaces. Cyberculture is multi-located and multi-sited in an ethnographic sense (Marcus 1995). Wakeford (2000, 39) takes this observation further by claiming that, “we constantly construct the Web as we conduct our research, rather than researching something that is already ‘out there’.” Cyberculture, as a whole, can only be described in the broadest, most vaguely inclusive terms - in the same ‘meaningful’ manner that ‘Western culture’ or ‘modern culture’ might be used. In this sense ‘cyberculture’ might be better described as a mainstream culture. A ‘virtual’ ethnography that arbitrarily delineates its field of study at the border between virtual and ‘real’ may only be telling half of a complex story.

A relatively straightforward example of the limitations of considering only the virtual aspects of a cultural phenomenon is found in the development of e-commerce. An increasingly common trend among ‘virtual stores’ that are managed from the United States is their refusal to ship outside the 48 continental US states. The development of these policies is unrelated to the popularity of ‘virtual malls’ and online shopping; it stems from more ‘physical’ fulfilment problems such as the cost of shipping, credit card fraud and the inability to meet demand. An examination of only the virtual aspects of e-commerce - an admittedly increasingly unlikely approach - would be confronted with the incongruous juxtaposition of the popularity of online shopping with restrictive shipping policies. E-commerce must also deal with the ethereal nature of money (Cohen 1997, 233; Zelizer 1997, 11; Williams 1998). An artefact that is already simultaneously ‘virtual’ and real, money does not clearly manifest itself but must be represented through artefacts that stand for it, for example: coins, credit cards and e-gold(.com). Other situations take the complex provenance of ‘money’ even further. The auctioning of characters and ‘tools’ for use in various online role-playing games (but particularly EverQuest) conducted through the ebay.com auction house emphasises the ‘reality’ of the virtual (Reynolds 2003). Participants choose to spend money on goods and even ‘experiences’ that are only useful or ‘meaningful’ in the context of a single online gaming ‘world’. Similarly, the rudimentary discussion of ‘hits’ to a web page often raises more questions than can possibly be answered from within an artificially narrowed ‘virtual’ perspective. Analysing these statistics reveals little of the contexts, pathways or rationale that generated any particular hit to a web page. 
This concern is addressed, in part, by the focus of this thesis on the search engine terms that are responsible for many Web site ‘hits’. There is no guarantee that a hit can be equated with a human viewing of a page, only that the hit was the consequence of some remote, but possibly deferred, human activity - such as a ‘web robot’ (see Section 5.1). ‘Hits’ in this way span ‘virtual’ and physical actuality in ways that associate both virtual and physical action.
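
The ‘web robot’ caveat can be made concrete with a minimal, hypothetical heuristic of the kind commonly applied to server logs: flag a hit as robot-generated when its user-agent field contains a telltale token. The token list below is an illustrative sample, not an exhaustive or authoritative one.

```python
# A minimal, hypothetical heuristic: flag hits whose user-agent string
# suggests a web robot. Absence of a token is no guarantee of a human
# viewer, which is precisely the interpretive limit discussed above.

ROBOT_TOKENS = ("bot", "crawler", "spider", "slurp")  # illustrative sample

def is_robot_hit(user_agent: str) -> bool:
    """Return True when the user-agent string matches a robot token."""
    ua = user_agent.lower()
    return any(token in ua for token in ROBOT_TOKENS)
```

Even a filter of this kind only narrows the problem: a hit that survives it still records remote, possibly deferred, human activity rather than a certain human viewing.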

Any investigation of cultural phenomena must avoid the temptation to reify the technology of electronically mediated locations to the detriment of exploring the intricacies and vagaries of particular cultural activity that these technologies imply and represent. Few traditional ethnographies were, or are, satisfied to conclude with the discovery of artefacts of human activity. The ‘classic’, and rapidly aging, works of cyberculture such as Rheingold’s (1995) Virtual Community and Benedikt’s (1993) Cyberspace: First Steps, despite being criticised for their evangelical positions, all point to the historical continuity between the contemporary cultural formations including cyberculture and those cultural formations that have both preceded and informed them. Historical continuity is an aspect of cultural research and must be considered alongside spatial continuities. The meanings of cultural activity do not conveniently exist within the arbitrary boundaries of ‘different’ locations (Marcus 1995).

Tangible evidence for the extent and form of cyberculture pervades our daily lives through various artefacts and interactions. Broadcast media have assumed the “look and feel” of the “interfaces” to cyberspaces. The Australian television programmes Burke’s Backyard and Totally Wild and the BBC’s children’s channels, for example, have utilised navigation bars to provide a ‘menu’ to each episode, yet all still utilise a linear broadcast medium. News and current affairs shows gain feedback about their stories through web-based discussion sites. Bands and entertainers get audience responses about their recent shows through fan sites and, sometimes, modify their subsequent performances. Many independent bands utilise the MP3 or Ogg Vorbis format to distribute music to fans. Major web search engines advertise on the sides of buses. Grocery items, including packaged foodstuffs and even fast foods, carry web addresses on their packaging. Many Australian and international institutions of higher education have, have had or are pursuing “virtual” and flexible teaching and learning options. The rapid uptake of digital and ‘next generation’ cellular phones provides us with an ever-present, ‘nearly imminent’ form of cyberspace that intercedes in our daily lives on buses and trains and in lifts and movie theatres. These developments all serve to increasingly blur the already arbitrary boundaries of physical and ‘virtual’ experience. ‘Our’ culture and mainstream culture is cyberculture irrespective of our personal connection - or its lack - to the Internet or other cyberspaces.

The technologies that enable the presence of ‘virtual’ locations and the cultural practices that these spaces engender, in turn, exert influence on cultural activities conducted ‘outside’ cyberspace. Most significantly, it is these influences that blur the boundaries of cyberculture from other presumably distinct cultures. Ethnographies traditionally looked towards a complex of arrangements, practices and artefacts in order to identify distinct cultures. Language, cultural practices and geographic proximity all help to define a cultural boundary. Despite these demarcations, debate still occurs regarding the classificatory accuracy of any ethnography (Davies 1999, 14-5 & 156). The boundaries of cyberculture are equally ‘fuzzy’ in their capacity to be readily defined and examined.

A number of factors can be identified as contributing to both the definition and the fuzziness of cyberculture. Globalisation and the colonising impact of computer-based technologies are among the most significant. In this environment, traditional markers of cultural difference such as language boundaries are less meaningful as a form of written (or, more accurately, ‘typed’) ‘net’ pidgin - based on English, various keyboard symbols and other languages - finds its way into emails, SMS texts, message boards and chat rooms as well as the text of web sites. Increasingly, aspects of this ‘net’ pidgin encroach upon other forms of communication that are ostensibly unrelated to any specific cyberspace but are embedded in the wider experiences of cyberculture. Software manuals and goods bought via the internet or elsewhere all reveal that while the English alphabet is a commonly used set of symbols, the meanings ascribed to particular combinations of symbols can be highly transitory. Email written by university students to their lecturers often reveals a degree of comfort with “SMS English” as their primary written dialect - including concluding messages with “love xxx”.

The capacity to communicate generalised meanings, rather than the ability to articulate a specific formal language, has become the criterion for participation within cyberculture. A low literacy requirement for participation is perhaps one advantage of a primarily textual web-as-cyberspace. Written text can more readily be deciphered by speakers of other languages through online translation tools such as babelfish(.altavista.com), which can approximate a level of mutual comprehension. As with language, while geographic separation has not been effaced by globalisation, its significance as a basis for defining cultural difference has been altered. The increasing contraction of physical distance - for example, through cheap international flights and increased emigration - has contributed to the homogenisation of cultural practices and is further reinforced by the way in which computer technology restricts the acceptable range of cultural practice to a known subset of possibilities.

The use of particular software technologies, such as Microsoft Word to compose documents or Internet Explorer to surf the Web has an impact on the forms of understanding and cultural practices that can be enacted. A Web page that will not render in Internet Explorer cannot be seen by a majority of Web users.

Navigating through a non-linear text such as a Web page is a culturally specific skill, yet it is a skill that is assumed to be possessed - to some degree - by all users of the Web. A document composed with Word can only be written and presented in a finite number of ways - without the most laborious editing - because the software is primarily designed for correspondence compatible with North American business sensibilities. The definition and articulation of cyberspace does, however, increasingly implicate software as an aspect of hegemonic power and mainstream culture. While we are not all ‘geeks’, and do not necessarily spend inordinate amounts of time online - effectively living online - ‘we’ still feel the impact of cyberspace. The distinction between contemporary mainstream culture and cyberculture becomes increasingly blurred with the transformation of household material culture into ‘wired’ objects. LG, for example, offers the consumer market a fridge that incorporates a flat-screen computer in its componentry (TechDigest 2003). It is claimed that the fridge will contact the service department if it is in need of repair. However, this expensive (over $US 6000) device is a minimal implementation of earlier claims that similar devices would monitor their ‘input’ and ‘output’ and automatically order groceries from a local online supermarket once the contents of the fridge were depleted. Cyberspace, or at least the means with which to interface with it, is to be found in the most unexpected places and, in the case of the wired fridge at least, gives a more tangible anthro-centric meaning to the logging of ‘hits’ on a personal fridge-based web server.

Irrespective of these developments, cyberspace is also defined as being analogous to, and an extension of, broadcast and mass media (cf. Holmes 1997a; Hardey 2002). However, this definition is a restrictive and potentially misdirected one. The phenomenon of cyberspace is not solely a new communications strategy but rather a complex of cultural practices that occupy multiple virtual and physical locations. Ethnographic perspectives on these locations are still largely focussed upon the use of a particular ‘site’ (Hine 2000; Greenhill et al 2002) (and often a MUD or a MOO) and the observed interactions between its participants. The focus of web-based ethnographies raises questions regarding the boundaries of the ethnography, the boundaries of the culture being examined, and whether these two delimitations should be parallel. These are boundaries imposed and defined by the researcher, in reference to a particular cultural landscape, which demarcate the flow of people, artefacts and communication between sites.

Examining cultural phenomena in this locationally bound manner also raises the possibility, for this thesis, that its research concerns the sub-cultural practices of a much broader, less definite, culture. As a consequence, the discussions and conclusions concerning a sub-culture of mainstream and ‘cyber’-culture may be better understood in a metonymic relationship to this broader phenomenon. The specific sub-cultural differences that are observed and discerned are spectacular anomalies against the wider backdrop of a ‘whole’ culture. Reducing the extremes of practices observed in cyberculture to the realm of the sub-cultural superficially appears to efface traditional distinctions and reduce their significance to the status of museum curiosities. The homogenising effect of globalisation reduces difference within cyberculture as readily as it does in economics, capitalism and politics. It is, however, difference within the practices of cyberculture and other locations of cultural practice that indicates resistance and persistence. Cultural groups who have embraced cyberspace as a means of survival must chart a path between the political and cultural action needed to maintain a separate identity and the pervasive influence of an increasingly mainstream cyberculture that promotes an almost robotic uniformity in the form of standards extending beyond ‘simply’ technological definitions (w3.org/TR/#Recommendations). In this context, it is also necessary to consider what defines cultural survival in the age of cyber- and ‘virtual’ culture. Persistence of cultural identity represents a series of political decisions regarding the form of association with mainstream culture, the technologies of cyberspace and one’s own culture.

Described simplistically, the work of the ethnographer in ‘virtual’ locations can sound like a commendably focussed and politically astute piece of research. However, this would have us understand the research location as a cultural formation ‘in itself’ or, more problematically, as an entire culture. What is omitted in this form of research is not everything external to a particular culture but the ‘rest’ of the culture itself, irrespective of whether the culture in question is one resisting the mainstream or a mainstream culture itself. An oversight of this type potentially obscures the rationale for the research itself. To draw upon a media analogy, analysis of Web ‘culture’ is of the same order of abstraction as attempting to wholly or accurately describe amorphous “Australian” culture through an examination of Neighbours episodes, or interpreting “English” culture as a mirror of Coronation Street. While the television programme is a seminal representation of this culture, it provides a condensed, abbreviated and privileged vision of the culture that inspires it. This problematises what is meant by ‘ethnography’, as analyses of Neighbours have been completed under the rubric of media and cultural studies (Wober & Fazal 1994; McKee 2001). Similarly, studying the ‘site’ of the ‘online’ household fridge may give renewed impetus to the garbage studies of the 1960s and 1970s.

These conceptual and competing concerns do not necessarily place urgency on the manner in which ethnographies are conducted. The range of research activities that could be completed under the title of ethnography was already vast before the 1980s and the advent of the ‘virtual’ as a research location. More important, particularly with the arrival of ‘virtual’ ethnography, is the need to re-examine the intended object of this methodology’s examination. The close associations between the discipline of anthropology, ethnographic methodology and the interpretation of culture can obscure specific intention with an indefinite and ill-defined impetus to merely catalogue the visible traces of cultures and cultural practices. Examining the communicative transactions between the participants of a “chat space” or a MOO would appear to exhibit the worst excesses of the modernist imperative to obtain information for information’s sake. Knowing everything that occurred over a finite period in a specific location may not reveal much about the culture or the communities ostensibly being examined.

An analogy can be found here with the anthropological examination of the earliest forms of written culture. While occasional pieces provide broad illumination of the cultural practices of the period, much preserved written history records land tenure and other legal contracts (Diringer 1962; Michalowski 1996; Fleming 1998). These documents concern only aspects of the culture they describe, and broader cultural meanings are available only through informed inference. Similarly, in the case of the ‘chat space’ or the MOO, what is actually being inferred is contemporaneously present beyond the virtual ‘pale’ - in the physical actuality of ‘real life’. ‘Logging’ the use of an online coffee machine (cl.cam.ac.uk/coffee/coffee.html) is certainly possible, but understanding the ways in which the consumption of coffee, and the associated office or university culture, may have altered would probably provide a more bountiful ethnographic work. A work with this focus would, however, still remain a study of an aspect of mainstream and ‘virtual’ culture.

The common inspiration for ethnographies appears to be remarkably consistent: to ‘know’ a culture or community. Much ‘virtual’ ethnography, perhaps as a result of the early booster literature (Dyson et al 1994; Rheingold 1994; Rushkoff 1994; Barlow 1996), not only starts with this inspiration but also with the assumption that a ‘consensual community’ is both pre-existent and present in the spaces being examined (Rutter 2001; Jarvenpaa et al 2001; Oxendine 2003). For some works, there is also an assumption that these communities of association are independent constructions that have, in effect, developed spontaneously in cyberspace. This may actually be possible for groups such as ICQ users or Peer-to-Peer (P2P) users (and particularly users of the BitTorrent system). ‘Spontaneous’ associations such as these are implausible, however, when most online groups define themselves in terms of a relationship that was initiated, and is defined, beyond the ‘virtual’ - for example, Metallica fans, Jungian psychologists or Welsh nationalists. There is also a need to disentangle the use of particular software as the basis for the conventional classification of common cultural practice, such as is implied by the discussion of an ‘iPod community’ or an MSN group. This is not a new tendency in theorised examinations. The working of bronze and iron has been used as a meta-description of historical periods with shared common technological traits but separate cultural identities (Bahm 1992, 71 & 230; see Gibson 2002, 109-116). The individual cultures are delineated as distinctive because of the different methods they employed for working the metal, the different artefacts they crafted from it and the manner in which these artefacts were then utilised. This analogy to software is problematic, however, as software is not a ‘raw’ material or alloy from which artefacts are crafted. 
Software is itself an artefact (or even a complex of artefacts) that represents a condensation of human labour as well as the meanings laid onto it by its designers and its users.

The problem of defining the limits of a culture is not, however, inherent in the locations of ‘cyberspace’ but a consequence of studying ‘ourselves’. ‘Ourselves’ and ‘us’ within cyberculture or mainstream culture are among the most fraught of definitional research categories. They encompass the rhetoric and impact of ‘globalisation’ as well as the difficulties of defining any group of people for the purpose of research. The classification of what does or does not constitute a community in contemporary culture is also itself a rich source of debate. These are not new problems brought by the presence of the ‘virtual’. Rather, the presence of cyberspace within contemporary academic discourses accelerates ‘our’ own awareness of the potential meaninglessness of these terms and the definitions that underpin them.

It is both curious and reassuring that cyberculture and the ‘virtual’ are discussed within an anthropological context. Other disciplines also stake claims to the analysis of cyberspace and its associated cultural conditions (specifically cyberspace in its manifestation as the World Wide Web). Brown-Syed (1999) outlines an information systems approach...

The overall aim of the current research is to identify header and text elements [of a web page] which will assist researchers in determining quickly the relevance, authenticity, and value of digital artifacts, as well as information about the credentials of their creators.

Beyond the veil of cyberspace this research would probably be received with a degree of caution. It is a phenomenon of cyberspace and cyberculture research that researchers hear alarm bells when their artefacts acquire the status of being digital.

Brown-Syed (1999) continues…

We contend that the absence of editorial control, coupled with the alarming tendency toward deliberate skewing of Web search results by profiteers, will stress network bandwidth resources, contribute to a crisis of confidence in the Internet, and if unchecked, may potentially have far reaching consequences for the growth of scientific and scholarly knowledge, and for its dissemination to the public.

A related example, one that responds to Brown-Syed’s concerns and should be familiar to anyone involved in tertiary education, is the exaggerated fear surrounding the Internet and plagiarism. Increasingly, however, it appears that the power of Web search engines makes it easier to locate plagiarised work but not necessarily any easier for students to plagiarise. Often the task of verifying a blatant case is simple - a matter of typing a select phrase from an essay into a search field and perusing the resultant matches. In some cases tangible evidence against the perpetrator is only two or three clicks away and can be located in exactly the same manner that the student originally located ‘their’ work.

The example of “web-based” plagiarism highlights the absence of ‘serious’ efforts at contextualisation in many works that examine cyberculture and cyberspace. Context is not bound by the arbitrary delineation of a location, including the ‘virtual’, yet this is treated as a bounding parameter for much research - including the research used in this thesis. In spite of these methodological determinations, ‘real-time’ chat sessions such as those on ICQ include obligatory introductions that generally attempt to locate the participants geographically - not necessarily to ensure commonality or to reaffirm communality but to reassure each participant of the relatively safe physical and conceptual distance from one another. This distance provides the confidence to alternate rapidly between being a ‘Norwegian Cindy Crawford’ or ‘short and solid’, as one person variously insisted to me in a single ICQ conversation - although admittedly these descriptions may have been intended to imply the same physical features. The postmodern penchant for playfulness is clearly an aspect of contemporary mainstream culture enacted in the ‘virtual’.

The development of a contextual understanding (what might even be argued to be an ethnographic understanding) can too readily be deferred to the construction of a specifically located description of a particular place. In my own experience of managing a chat space, specifically locating the ‘ethnographic’ would appear to be plagued with difficulty, if not completely impossible. Many of the participants do not confine themselves to any particular ‘topic-based’ chat. They literally wander between the ‘theology’, ‘anthropology’ and ‘architecture’ topics. As a consequence of one request, the generically labelled ‘music’ space has also become part of this repertoire. The regulars in each of these spaces often conduct their conversations in an almost random manner, so that a linear description of any one location does not encapsulate the entirety of the conversations that any of the regular participants have had - as they may have added comments and discussion elsewhere at the same time.

However, there is still a need to ‘own’ a space. One message appeared in the music chat space...

um, looks like everyone has a page except me... eva has her art page, roch has his theology, firpo has his philosophy and d, mira, and calaf have their music page but.…(music) What About Me.… sniff..

by: g (transitory posting on www.spaceless.com/chat/music.html)

This comment, and my fleeting but long-term exposure to this group of people, increasingly suggests that these particular spaces are better understood as a form of cultural intersection - a meeting point at various moments in individuals’ everyday lives. However, the current form that the discussion takes in these chat spaces does not deny the possibility of the construction of some form of community, or at the very least a sense of affiliation. There have been individual efforts by participants to solidify a community around this particular space. The most successful effort was dubbed ‘the blob’ by some regular participants of the architecture chat space. One participant packaged up some of his work (some sketches and plans) and mailed the material to another participant. The next recipient of ‘the blob’ added material and then posted it onwards to yet another regular. The intention was to add new participants and, once ‘the blob’ had completed a full circuit, the original ‘blobee’ would remove their first round of material and add something new. Unfortunately, to the best of everyone’s knowledge, ‘the blob’ mysteriously disappeared somewhere on its way from Jamaica to Australia.

Although these events are clearly setbacks, this collection of spaces appears to have been successful in bringing together people who share interests extending beyond their capacity to use a computer, and the Web site has helped to maintain those interests. The success of the Web site appears to be at least partly related to its relatively non-commercial form - the participants are not attempting to stand in the middle of the electronic equivalent of a ‘theme park’ or ‘shopping mall’, places where these forms of conversation are both rare and subsidiary to the primary purpose of the space. Yet ‘stickiness’ - the e-commerce term for a web site that retains users - is the very quality desired by the online malls and portals such as Yahoo, Lycos and Netcenter. Just as in other techno-social spaces, the visitor is in a wholly built environment constituted for a narrowly defined purpose. Unlike the electronic shopping mall, however, the more conventional environment of the mall is prepared eventually to concede a ‘human’ need to exit - albeit potentially for other techno-social spaces, including the car (Miller 2001; Dant & Martin 2001).

Elsewhere on the web somewhat more concerted efforts are being undertaken to build virtual communities without a commercial impetus. One of the theology chat space regulars pointed me to the bishopric of Partenia (www.partenia.org) whose Web site describes the circumstance for the creation of this electronic community...

In January 1995 Jacques Gaillot abruptly received his resignation [his notice] from his office at Evreux. In a rather surrealist way, this eviction was transformed in an appointment at an ancient and fictitious see, Partenia in Algeria. This made him a kind of virtual Bishop of which his potential parishioners were spread all over the planet... A year later, he decided to take the institution at its word, he opened a web site to dialogue with every body in the world. It was immediately successful: thousands of Internet users from all over France, Canada, Australia and dozens of countries, laymen or clerics, Christians or non Christians, for or against, conversed on many various subjects.

Partenia clearly has the potential to develop some form of communality. The means to undertake this may be electronically mediated, but the inspiration for the community is founded upon much more traditional forms of association that are familiar to anthropologists and others.

Watching the ebb and flow of the chat space I administer increasingly convinces me that virtual ethnographies should not represent a reification of provenance. Emphasising the reification of place reflects a particular technical moment in the development of mainstream culture and in the ‘virtual’ becoming part of ‘our’ culture. Peter Taylor (Taylor, Lopez & Quadrelli 1996), who was closely involved with the development of flexible learning at Griffith University, often claimed that ‘flexible’ would soon disappear from his position title because this would become the way all teaching and learning would be conducted. I make a similar claim for virtual ethnographies and cyberculture. Those who conduct ethnographies that include and attempt to comprehend virtual spaces are conducting the anthropology of ‘us’ and facing all the problems that self-reflexive methodologies introduce. We seek the ‘other’ of ‘virtual’ culture and find a mirror.

Synthesis - ‘Virtual’ Taxonomies

In this work I have drawn together the various advantages that a taxonomically oriented and interpretative methodology offers. The approach utilised here is shaped by the cautions that must be applied to ethnographic methods in the context of the available data, by the theoretical orientation of this work, and by the focus implied by the primary research question. The awareness and perspective for which ethnographic approaches are academically respected do, however, underlie this work in its focus upon examining and, to an extent, revealing contemporary cultural conditions based on available evidence. The aim of the research and the methodology employed is not to create a definitive classification of the Web in its entirety, or even of a small piece of it, but rather to provide a systematic means to interpret and understand the exchange practices conducted on the Web, the ‘types’ of artefacts exchanged and their relationship to contemporary cultural practices more generally.

The examination of virtual artefacts is not a case of classification for classification’s sake. It is an attempt to understand the contemporary phenomenon of the “virtual” in a more sophisticated and subtle manner than simply reducing everything observed to ‘just’ being a collection of Web sites. An analogous situation to such a coarse examination would be an effort to understand the impact of printing technologies on contemporary cultural practices by considering all artefacts produced by applying ink to a flat surface to be books. Such a claim would impede any serious examination and would require ignoring the relationship of newspapers, pamphlets, posters and even tickets to cultural practice. A survey of Melanesian pottery groups would similarly offer little insight if every item was simply described as a “pot” (Lauer 1974). Ultimately, more subtle efforts at taxonomic classification offer insight into a particular culture when used in conjunction with works that utilise other research methods, including ethnography. The variety of ethnographically oriented work available relating to the Web and “virtual” cultures provides sufficient range and depth to complement the current work (Bukatman 1995; Adams 1997; Fisher 1997; Seidler 1998; Hakken 1999; Dean 1999; Mitra 1999; Miller & Slater 2000; Galusky 2001; Michaelson & Pohl 2001; Crang 2002; Watt et al. 2002; Parrish n.d.). Most of these existing works similarly avoid the danger of reducing a Web site to being “just” a Web site by examining a single site as a nexus of interrelated cultural practices.

The process used to examine the data gathered for this research is determined - in part - by the nature of the data itself. However, there is a general series of stages that was systematically applied.

  1. Initial data collection

The advantage of studying the “virtual” is the relative ease with which comprehensive collections of longitudinal data can be gathered. In the case of the “Top 500 Search Terms” the data was received as a weekly newsletter. My access and relation to the source of the research data, and the relative ease with which the data could be obtained, solved the ‘problem’ of data gathering.

With sufficient access and a suitably automated method for gathering the data, the study of “virtual” sites is particularly suited to longitudinal studies. There are minimal budgetary constraints, and the main danger at this stage of the research is that more data can be gathered than can realistically be deciphered in a timely manner. Any concern about being overcome by the volume of generated data is significantly outweighed by the possibility for studies of “virtual” phenomena to become the site for piloting and testing larger scale or wider ranging studies, including those intended for corporeal environments (Tomas 1991, 31; Tyler 2002, 204). The potential offered by this ‘social laboratory’ is only realised, however, on the premise that the “virtual” is part of, and not distinct from, contemporary cultural practices. I pursue this idea as a central claim throughout this thesis.

  2. Selection of a classificatory schema

The broad scope of the data gathered from the “Top 500 Search Terms” newsletter necessitated a classificatory schema broad enough to encompass the full scope of human experience and knowledge. Few existing schemes provide this scope and, within the context of material culture studies and archaeological examinations of artefacts, none exists. The Universal Decimal Classification (UDC) scheme was selected as it provides the scope and flexibility to cover the range of artefacts sought through Web-based search engines.

A scheme designed for library management tasks has its limitations and is by no means a definitive solution to the classification of cultural practices. Examinations of more specific aspects of cultural practices conducted in the “virtual” may be capable of utilising other schemes. However, it is also possible, in the tradition of archaeologists and ethnographers in the field, that more subtle, nuanced and appropriate schemes could be developed for this purpose.

Other advantages of the Universal Decimal Classification are its consistency, its recognition as a British Standard and its control by an international body, the UDC Consortium. While the scheme is used less widely in a library context worldwide than the Dewey or Library of Congress schemes, it does have institutional adherents who are familiar with the scheme and its underlying concepts.

  3. Classification of the data items

Each data item was examined individually through the interface of the classification software to determine a primary classification and, where appropriate, additional auxiliary detail. In a small number of cases the use of more than one auxiliary was required. Where a classification had already been made for a particular term it was reapplied. In situations where no classification had been ascribed, the term was first examined for similarities with existing classifications. The term was also examined phonetically for possible matches. If no matches could be found programmatically in this manner, a broader survey of already classified terms was made to ensure consistency. In circumstances where no similar, already classified term could be found, I examined the term in its own right and assigned classification data.
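The matching stages described above might be sketched as follows. This is an illustrative reconstruction, not the actual software developed for the research: the `classified` and `phonetic_index` dictionaries are assumed data structures, and the phonetic comparison is shown with a simplified Soundex rather than whatever phonetic algorithm the original software used.

```python
def soundex(term: str) -> str:
    """Simplified Soundex code (first letter plus three digits); vowels,
    'h', 'w' and 'y' all break runs of repeated consonant codes here,
    which deviates slightly from the classic algorithm."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    letters = [ch for ch in term.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = [codes.get(ch, "") for ch in letters]
    out, prev = [letters[0].upper()], digits[0]
    for d in digits[1:]:
        if d and d != prev:
            out.append(d)
        prev = d
    return ("".join(out) + "000")[:4]

def classify(term, classified, phonetic_index):
    """Reapply an existing UDC classification for a search term: try an
    exact match first, then a phonetic match against already classified
    terms. Returning None signals the fall-through to the broader survey
    and, ultimately, manual classification described in the text."""
    if term in classified:                # term already classified: reapply
        return classified[term]
    key = soundex(term)
    if key in phonetic_index:             # phonetically similar known term
        return classified[phonetic_index[key]]
    return None                           # no programmatic match found
```

For example, a misspelt term such as ‘musik’ would inherit the classification already assigned to ‘music’ via the shared phonetic code, mirroring the reuse of earlier classification decisions described above.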

The initial list utilised for classification was derived from the Universal Decimal Classification Pocket Edition (1999). The 1999 edition of the scheme is a small edition, intended primarily for students of classification and those unfamiliar with the UDC. However, this edition proved sufficient for the research task as, unlike library classification, the aim of the classification process was to group the desire for ‘like’ artefacts together and not to separate individual items in order to produce a unique shelf order.

These classifications and the phonetic representation of the term were then stored with the term itself. The outcome of this process, while time-consuming, expanded and more clearly systematised the information associated with each data item.

My own knowledge and focus heavily influenced the process of classification, and the classification decisions made at this stage influenced subsequent interpretations. Classification itself is not an obvious or flawless process. It requires knowledge of the items being classified and the context in which they will be used. In library contexts, classification tends to aim for simplicity and the ease of locating items. Overly detailed classifications are less flexible and can confuse their intended audience. In museological contexts the classification of items is complemented by accessioning numbers and management schemes that are disentangled from the specific details or knowledge of a particular item. In the context of a less definite desire for, and seeking of, artefacts it is important to avoid making the classification of a specific item stand as a proxy for that item, or making it the focus of the purpose for examining the item in the first place. The purpose of classifying the data in this manner is to understand the types of artefacts being sought and exchanged and the areas of human experience and knowledge that are sought through Web-based search engines.

  4. Collation/Collection of the classified material

With the classification of the data it is possible to produce an overview of individual weeks of data collection or of general trends within specific cultural activities across the Web. The collection and collation process was also handled programmatically, by software I developed for this thesis.

The data from the “Top 500 Search Terms” newsletter included the total raw count of searches reported each week, the terms included in each week, a summary of the classificatory allocation across the nine major divisions of the UDC and the range in the raw count between the highest and lowest terms included in each list. The range and cross-referencing of data provides a summary of the whole sample for each week of data collection.
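The collation step might be sketched as follows, assuming each classified term is held as a (term, raw count, UDC notation) tuple; the terms, counts and notations below are illustrative, not drawn from the actual dataset:

```python
from collections import Counter

# One record per search term in a weekly "Top 500" list:
# (term, raw_count, udc_notation) - all values here are invented examples.
week = [
    ("mp3", 4120, "004.4"),        # computing and software
    ("football", 2310, "796.3"),   # sport
    ("horoscope", 1890, "133.5"),  # astrology and the paranormal
]

def weekly_summary(records):
    """Collate one week of classified terms: the total raw count, the
    range between the most and least searched terms on the list, and
    the distribution of searches across the major UDC divisions
    (taken here as the first digit of the notation)."""
    counts = [count for _, count, _ in records]
    divisions = Counter()
    for _, count, notation in records:
        divisions[notation[0]] += count
    return {
        "total": sum(counts),
        "range": max(counts) - min(counts),
        "by_division": dict(divisions),
    }
```

The same summary structure can then be produced for every week of collection, allowing the cross-referencing between weeks described above.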

  5. Textual/Graphical interpretation and representation of the taxa

The final stage of this process is the graphical and textual representation of the data, and its interpretation through the classification scheme, in a coherent and comprehensible manner. The software developed for this research incorporated the capacity to represent weekly data graphically in the form of a bar graph. The resultant weekly datasets can be compared visually while also revealing trends and changes that occurred during the period of research. Any form of graphical representation offers a higher level of legibility than the collected raw data itself (Tufte 1997). The use of graphs also offers a visually clear means to represent the entirety of the UDC scheme without requiring intimate knowledge of its design rationale or the individual components of its classes. The systematic structure of the UDC scheme also offers a meaningful way of ‘zooming’ into parts of the represented data while retaining the same level of graphical representation and comprehension. A dynamic version of this representation, which is beyond the scope of the software currently developed for this research, could in fact provide a means of visually and unambiguously navigating the Web.
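A minimal sketch of such a bar-graph representation, rendered here as text rather than with whatever graphics facility the original software used, and taking the per-division totals from the collation stage as input:

```python
def bar_graph(divisions, width=40):
    """Render per-division search totals as a horizontal text bar graph,
    one bar per UDC main class, scaled so the largest bar is `width`
    characters wide."""
    peak = max(divisions.values())
    lines = []
    for udc_class in sorted(divisions):
        count = divisions[udc_class]
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"UDC {udc_class} | {bar} {count}")
    return "\n".join(lines)
```

Because the UDC main classes are fixed, the same axes can be reused week after week, which is what makes the weekly graphs directly comparable; ‘zooming’ into a class would simply mean grouping on a longer notation prefix before graphing.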

These graphical representations complement the textual representation of the data that is found in, and occupies the attention of, the latter chapters of this thesis. An advantage of the ‘virtual’ provenance of the accumulated data is the relative ease with which it can be directly returned to in order to highlight specific points and anomalies. Immediate access and a consistent form of reporting offers a level of flexibility and provides a direct means for verifying the veracity of claims made in the interpretation of the data.

The textual interpretation of the data is the rationale for the research process and consequently takes a particular focus out of a myriad of possibilities. Its attention and purpose are neither neutral nor apolitical and it services the agenda posited by the initial research question. In my research that agenda includes the promotion of attention to the artefact and a tacit questioning of the contemporary claims made regarding the Web and specific aspects of it such as eCommerce. Critical engagement with the research data is done within a systematic and structured framework that services the original research question. As the research question is the outcome of decisions made by the researcher the political agenda of the research is complementary to these stated aims and does not impede their presentation. The conclusions that are ultimately reached through my research are the culmination and intersection of all these influences.

I have not failed. I've just found 10,000 ways that won't work. — Edison