concordancer

A concordancer takes a text and counts how often each word occurs. This concordancer can also produce a list of sentences that use a selected word in context. You can show the word frequencies as a list or as a word cloud (in the vein of del.icio.us). As part of my research into the Web, classification and meaning extraction, the concordancer will also list key phrases in the document (using a Yahoo service) and will eventually offer a UDC classification for the page, as well as extracting the references and quotes on the page (mainly for academic texts). I am also working on letting you save the results under a unique identifier. A somewhat understated option is the ability to extract a random selection of sentences that show how a word or phrase might be used in context (the _real_ purpose of concordancers).
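The two core operations described above, counting words and pulling out sentences that use a word in context, can be sketched roughly as follows. This is a minimal illustration, not the code behind this site: the function names `word_frequencies` and `sentences_with` are made up for the example, and the word and sentence splitting is deliberately naive.

```python
import random
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences in a text."""
    # Naive tokenizer: runs of letters, optionally with one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


def sentences_with(text, word, limit=3, seed=None):
    """Return up to `limit` randomly chosen sentences that contain `word`."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = [s for s in sentences if pattern.search(s)]
    rng = random.Random(seed)
    return rng.sample(hits, min(limit, len(hits)))
```

Calling `word_frequencies(text).most_common(20)` would give the data for a count table, and the same counts could be scaled to font sizes for a word cloud.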

The system could potentially be used to compare documents, with possible applications in research and the detection of plagiarism.
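One simple way such a comparison might work is to measure how many short word sequences two documents share. The sketch below is an assumption about a possible approach, not something this system does today; `word_tuples` and `tuple_overlap` are hypothetical names, and the score is the Jaccard overlap of the two sets of word tuples.

```python
import re


def word_tuples(text, n=3):
    """Return the set of all runs of n consecutive words in the text."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def tuple_overlap(text_a, text_b, n=3):
    """Jaccard overlap of word tuples: 0.0 = nothing shared, 1.0 = identical sets."""
    a = word_tuples(text_a, n)
    b = word_tuples(text_b, n)
    if not a or not b:
        # One of the texts is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(a & b) / len(a | b)
```

A high score between two supposedly independent texts would only be a hint that they deserve a closer look, not proof of copying.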

U Penn's online library is a good source of links to new online books in a useful format. However, text files over 250k, and text clouds that show every word occurring more than once, tend to overload the server, so for the moment concordancing the Bible is tricky. Using a count table, restricting the shown words to those with more than 10 occurrences and removing the 1000 most common words all help performance (or at least prevent things stopping cold).
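The two filters mentioned above (a minimum number of occurrences and a common-word list) amount to something like the following sketch. `COMMON_WORDS` is a tiny stand-in for the real list of the 1000 most common English words, and `filter_counts` is a name invented for this example.

```python
from collections import Counter

# Tiny stand-in for the list of the 1000 most common English words.
COMMON_WORDS = {"the", "a", "and", "of", "to", "in"}


def filter_counts(counts, min_count=10, stopwords=COMMON_WORDS):
    """Keep words that occur more than `min_count` times and are not stopwords."""
    return {
        word: n
        for word, n in counts.items()
        if n > min_count and word not in stopwords
    }


counts = Counter({"the": 50, "whale": 12, "ship": 10, "sea": 3})
shown = filter_counts(counts)  # only "whale" passes both filters
```

Filtering before rendering keeps the number of displayed words small, which is where most of the saving for large texts would come from.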

Finally, the system can concordance web pages, text files and inputted text, whether they are online, on your local filesystem or cut and pasted into the textarea. I am also working on allowing PDF, RTF and Word documents to be concordanced, but proprietary file formats are a very large pain.

concordance a web page
upload a file

count table / text cloud
alphabetical sort / frequency sort
Check to remove the 1000 most common (English) words
Only show words that appear more than [n] time(s)
Make concordance link words to: Wikipedia / dictionary / nothing
Show Yahoo phrases and/or calculate duplicate word tuples
Show indicative word usage of [word] with [n] sentences
Show document statistics: yes / no
Show Google keywords
Attempt to extract quotes (primarily for academic texts)
Attempt to extract references (primarily for academic texts)
Suggest a UDC classification for this page (in progress)