A concordancer takes a text and counts how often each word occurs. This concordancer can also list the sentences that use a selected word, so you can see it in context. Word frequencies can be shown as a list or as a word cloud (in the vein of [...]). As part of my research into the Web, classification and meaning extraction, the concordancer also lists key phrases in the document (using a Yahoo service). Eventually it will offer a UDC classification for the page and extract the references and quotes on it (mainly for academic texts). I am also working on letting you save the results under a unique identifier. A somewhat understated option is the ability to extract a random selection of sentences that show how a word or phrase is used in context (the _real_ purpose of concordancers).
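The two core operations described above, counting words and pulling out sentences that use a chosen word, can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this site: the function names, the naive sentence splitter, and the `sample_size` and `seed` parameters are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Assumption: a "word" is a run of letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def word_frequencies(text):
    """Count how often each word occurs, ignoring case."""
    return Counter(w.lower() for w in WORD_RE.findall(text))


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with(text, word, sample_size=None, seed=None):
    """Return sentences containing `word`, optionally a random sample of them."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = [s for s in split_sentences(text) if pattern.search(s)]
    if sample_size is not None and sample_size < len(hits):
        # A seeded generator keeps the random selection repeatable.
        hits = random.Random(seed).sample(hits, sample_size)
    return hits


text = "The cat sat. The dog barked at the cat! A bird sang."
freq = word_frequencies(text)
print(freq.most_common(2))
print(sentences_with(text, "cat"))
```

A real concordancer would need a more careful sentence splitter (abbreviations such as "Dr." break this one), but the shape of the two lookups stays the same.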

One potential use of this system is comparing documents, with applications in research and in the detection of plagiarism.
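How such a comparison would work is not settled here. One common approach, shown purely as an assumption, is to treat each document's word counts as a vector and measure the cosine of the angle between them; the function names below are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count runs of letters and apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count mappings, from 0.0 to 1.0."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc_a = word_counts("the whale surfaced near the ship")
doc_b = word_counts("the ship chased the whale")
print(round(cosine_similarity(doc_a, doc_b), 3))
```

A score near 1.0 means the two documents use much the same vocabulary in similar proportions. That alone does not prove plagiarism, since word order is ignored, but it can flag pairs worth reading side by side.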

U Penn's online library is a good source of links to new online books in a useful format. However, text files over 250k, and word clouds that show every word occurring more than once, tend to overload the server, so for the moment concordancing the Bible is tricky. Using a count table, restricting the displayed words to those with more than 10 occurrences, and removing the 1000 most common words all help performance (or at least prevent things from stopping cold).
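The filtering described above can be sketched as a single pass over the word counts. The function name, the `threshold` parameter, and the tiny stand-in stop list are assumptions for illustration; the real list of the 1000 most common words is not reproduced here.

```python
from collections import Counter

# Stand-in for the list of the 1000 most common words (assumption).
COMMON_WORDS = frozenset({"the", "and", "of", "to", "a", "in"})


def filter_counts(counts, threshold=10, stop_words=COMMON_WORDS):
    """Keep words occurring more than `threshold` times that are not stop words."""
    return {
        word: n
        for word, n in counts.items()
        if n > threshold and word not in stop_words
    }


counts = Counter({"the": 500, "whale": 42, "harpoon": 11, "ship": 10})
shown = filter_counts(counts)
print(shown)
```

Because the very common words and the long tail of rare words are both dropped, far fewer entries have to be rendered, which is where the saving comes from on large texts.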

Finally, the system can concordance web pages, text files and inputted text, whether the source is online, on your local filesystem, or cut and pasted into the textarea. I am also working on allowing PDF, RTF and Word documents to be concordanced, but proprietary file formats are a very large pain.

