Issue 3

Summer 2000

Electronic Medievalia


(Some months ago on the MEDTEXT-L mailing list, discussion centered on the use of computers in the humanities, particularly in medieval studies. The subject is of personal interest to me, as it should be to any serious scholar in the field. As a result of the discussion on MEDTEXT-L, and of private discussions with others, I decided it might be interesting to explore some aspects of how the WWW, and computers in general, are affecting the way we work in the field of Medieval Studies. Medievalists were among the first in the Humanities to use computistical tools, email lists, and the web to further our work, but should we now ask how this has affected the way we think? To this end I invited a guest columnist for this forum of Heroic Age.

Dr. Jim Marchand is the founder of MEDTEXT-L and a man of wide learning, a former professor at several universities over his long career, including Harvard, UC Berkeley, and Cornell, to name a few. Dr. Marchand has also written previously on the issues of computers and the internet in the humanities. So I asked him to give us a few words from his vantage point on computers, the Web, the humanities, and the effects on human thought. -L.J. Swain, Electronic Medievalia Editor.)


Humanities and the Web

From Whence We Came

By Jim Marchand


Each advance in technique and technology brings with it the advantages it is devised to bring, but inevitably it also brings with it disadvantages, some of them not noticed at first.

Most of the activity we medievalists indulge in has to do with information storage and retrieval. At present our information is stored in the form of words, in books, in libraries. The automatization of this storage and retrieval activity over the centuries has brought with it advantages and disadvantages. The invention of the codex got us somewhat out of the bind that the scroll had put us in with its necessarily sequential access, and the invention of schemes such as the Eusebian Canons and Euthalian matter allowed us better retrieval of the Bible text. The later development of chapter/verse numbering and the rise of Bible concordances relieved the burden on the memory. Some say it brought about the decline of memory and mnemonic training, so necessary for the medieval author.

Another theme which we have seen over and over is availability. With the advent of the printing press, books became more available, libraries arose, and with libraries the various systems of cataloguing, such as Dewey, LOC, Harris. With these systems came the necessity of compartmentalizing and labeling knowledge. We were at the mercy of the cataloguer. Our retrieval of knowledge was hampered at every turn of the way by his ineptitude, which was inevitable, and by the well-known inadequacy of the various attempts to compartmentalize knowledge, such as the LOC Headings. If you were in a library which put Old Prussian under German, for example, you might never get to the Old Prussian books. If you were in a library which did not analyze series, you might never find Schwyzer's Griechische Grammatik. Or, if you used a bibliography where Danish slash o was ignored, so that Møller became Mller, you might never find the author you were looking for. In addition, the rise of large libraries owned by individual scholar(s) led to the welcome vulgarization of scholarship. The selection of the books to be put in the library, the availability of books to individual scholar(s), all of these were at times quite haphazard. One was at the mercy also of the publisher, the editor, the "gate-keepers of science." Not everything which is good gets published, and not everything which is published is good.

The first use of the 'computer' in textual studies, so far as I know, was the count of Gothic letters done by Martin Joos in the early '40s, using punched cards. The invention of the digital computer brought with it the possibility of concordances, word-counts, frequency counts, rudimentary statistics and the beginnings of corpus linguistics with the work of such people as Zipf and Yule. Father Busa gave us the first computer-generated concordance, and there was for a time a blossoming of Key Word In Context (KWIC) concordances, with their well-known problems of useless context and failure to disambiguate homographs. The late '50s and early '60s also brought with them grand schemes such as machine translation and statistical studies. Many of these seemed aimed at turning over the scholar's job to the computer, a trend one sees over and over. Attribution by computer was and is a particularly vicious example; one still hears "The computer has demonstrated that Paul did not write ..." One must realize that the computer can tell us nothing; it is only an aid for the scholar. It is as if one said: "The card-file has revealed ..." The transformational-generative movement of the late '50s and early '60s, with its insistence on the yes-no nature of each bifurcation, was patently calqued on the flow-charts of the computer scientists: one might go down in a chart but not up (the irreversibility of the arrow). We were already seeing the deep influence of computer-think.
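For readers who have never seen one, the idea behind a KWIC concordance can be sketched in a few lines. What follows is only an illustrative sketch in Python, not a reconstruction of any historical program; the function name kwic, the window of three words, and the sample sentences are all assumptions made for the example. Note that the verb "rose" and the noun "rose" land in the same list, which is exactly the homograph problem mentioned above.

```python
import re

def kwic(text, keyword, width=3):
    """Return one line per hit: `width` words of context on each side."""
    # Crude tokenization: lowercase, keep runs of word characters only.
    words = re.findall(r"\w+", text.lower())
    lines = []
    for i, w in enumerate(words):
        if w == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{w}] {right}")
    return lines

# Hypothetical sample text; "rose" occurs as both verb and noun.
sample = ("The knights rose at dawn. She laid a rose upon the tomb. "
          "A rose is a rose.")

for line in kwic(sample, "rose"):
    print(line)
```

The program matches letter sequences and nothing more; deciding which "rose" is which remains the scholar's job.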

Miniaturization continued: the discovery of the transistor and its deployment in the electronics industry led, in about the late '70s, to the personal computer, the diskette as a storage device, and finally the hard disk: first 10 MB, then 20 MB, then larger and larger, add-ons, etc., a true revolution. The scholar could now get out from under the tyranny of the mainframe and its keepers and could do his own programming (in BASIC, for example), and there arose a host of cottage industry programs for humanities data processing. The development of OCR (Kurzweil), at first available only at larger institutions, offered the scholar the ability to input large corpora and to manipulate them. Programming languages proliferated, and one has only to look at Kernighan and Plauger's still useful set of programs for humanities data manipulation to see how far we were, even in the early eighties. The font problem, which may seem to some to be insignificant, but which is of prime importance to the medievalist, was solved: with programs like FancyFont, you could make any character you wanted, and WYSIWYG was just around the corner. The advantages were enormous, and the disadvantages few, or so we thought. I was able, for example, to correct several of the impressions of Gundolf as to Herder and Goethe's preference for words, though I am not sure that raw statistics should be cited against Gundolf's superb intuitions. Meanwhile, incredible things became available. Just for example: formerly, to calculate planetary positions, one had to go to Bryant Tuckerman, Planetary, Lunar, and Solar Positions, 2 vols. (Philadelphia: The American Philosophical Society, 1962, 1964), and make all kinds of calculations based on each chart. Now, with such programs as The Dance of the Planets, one could look up and see the night sky of Munich, December 1, 1200, with Wolfram von Eschenbach.
And yet, Heinz von Foerster still warned us against taking computer metaphors too seriously and allowing them to change our ways of thinking.

Our next revolution was the invention and deployment of the internet. The scholar was now able to get in touch with everybody by e-mail, to download huge corpora by ftp or gopher, even to download and unzip tools for handling these corpora. OCR became available on the desktop, and people scanned in everything, unfortunately much of it worthless, but who decided? Lists arose, where one could talk to one's colleagues, ask questions, get information, send out information, etc. A posting on a list such as MEDTEXT-L immediately reaches more people than one could hope for with an article. This brought with it many things often seen as disadvantages, of course, particularly the vulgarization again of scholarship. People who shouldn't be there, some of them not even academics, were now included in "our" group. This is, of course, IMHO a huge advantage for our field in actuality. Speaking of disadvantages, Gore Vidal spoke out strongly against the use of the word processor in writing, pointing out how it affects our processes of composition and does away with any study of the prenatal versions of the work of art, since Heine would presumably have erased each of his failed attempts. One could, however, with a CD-ROM reader and judicious choice, have a huge library of materials available and do real medieval scholarship sitting out under a tree. The OED, Migne, Cetedoc, a dictionary of 12 languages, a database of 150 computer journals, Gibbon, Roget's Thesaurus, Stith Thompson's Motif Index, etc. etc. ready at hand, and all under twenty pounds. A library larger than some I have worked at, and of course involving some outlay of money. But then everything has a downside.

This brings me to the WWW, the purpose of this rambling column. Miniaturization continues, and it seems quite likely that by the time you read this, small computers will be widely available with wireless contact with the internet, the ability to transform voice into writing, and writing into voice, etc. So that perhaps my ideal 20 lb. pack will be reduced to less than 5 lbs.

All of the trends which were present in previous computer revolutions are here in spades.

1. There is no standardization or even compatibility; one cannot be sure that what one sees on one's own screen will appear on other screens, and this is particularly vicious in the case of medieval studies. It is to be hoped that everyone will adopt Unicode in the near future; this would enable us to write in any known writing system and to transmit this writing to anyone else on the WWW.
2. Skill levels differ. Some colleagues will be able to program, some only to program in JAVA or PERL, some not to program at all, and this can increase frustration and limit the ability to communicate.
3. Equipment differs. Some will still be using old 3028 computers with little memory, etc.
4. Operating systems differ. Some will still be using DOS, some Windows 3.1, some Windows 2000, some UNIX, many LINUX etc. All of these lead to frustration and inability to communicate. Just this week, two of the lists I belong to had pleas registered to use only ASCII.
5. The most frequently asked question one sees is of the form "Where can I find ...?" Browsers help, especially such browsers as the Golden-Globe site of Tennessee Bob Peckham, where you can find almost anything medieval (http://www.utm.edu/departments/french/french.html), but there is that old problem of terminology.
6. The problem with terminology is that of lumping and splitting, already seen by Socrates: if you assign something to a category, you are removing it from another category; if you attribute a work to x, you are taking it from y. The computer has brought us those huge repositories of concepts and terms, such as the Library of Congress Subject Headings, which have grown by leaps and bounds in the computer age, and we now have the capability to search huge databases, but there is always a downside: which computer bases and how? What to do with conflicts with native taxonomies? How to face the daunting task of learning each of the upfront engines? And, most important, how do we avoid the thought control which occurs when I make you use my categories?
7. We have enormous capabilities in such areas as data control and statistics, especially of huge corpora, but we must learn that we cannot quantify until we quantize; we must "discrete the continuum" before we can use our computers to count and help us to analyze.
8. One of the downsides one does not always notice is the loss of serendipity. Browsing through a library shelf or reading through an encyclopedia often brings discovery with it; browsing with a browser is not the same thing.
9. Huge corpora, like the Patrologia Latina, are now widely available, and with them we have a tool one could scarcely have thought of a few decades ago. Much of the laborious source-checking of yesteryear is now gone, but, again, so is the serendipity of reading Migne. You do not have to know anything much to search Migne, but there is always a cost. Notice also that we are at the mercy of the market, of whoever decides to put something on. RAM grows larger and larger, far outstripping the growth of storage, since RAM is needed for games. I can, for example, use the OED without any trouble, and it is a great blessing, but Grimm is not yet on the web. One can obtain on CD-ROM Lutz Roerich's outstanding Das grosse Lexikon der sprichwoertlichen Redensarten. All kinds of small encyclopedias are available, but who would not prefer the Handwoerterbuch des deutschen Aberglaubens?
10. One of the things everyone complains about is the lack of readability involved with computer texts. You can download Virgil, but it would be hard to think of carrying a laptop or a palmtop out under a tree and reading him using it. Of course, there are attempts to remedy this, and one hopes they are successful, but at the moment they have not been.
11. There are great programs to do anything one wants, but the problem is to know about them, obtain them, and install them. The learning curve is often so steep that one just gives up, or hires a second party with all the attendant evils. Those thick books which used to be shipped with the software are no longer.
12. As always, there are the haves and the have-nots. Some do not have connections to a university, some are too poor to buy one of those monsters, many are ignorant of what is available, and the skill levels of users represent a real separation of the haves from the have-nots.

Enough of this jeremiad, let us look at how the WWW has changed our world of scholarship, is changing it, ought to change it.

1. Availability is the first keyword. One cannot read everything one can download, and people who could not have thought of doing academic work can do it now; the downside of availability can easily be overcome by a scanner, for which the OCR software costs more than the hardware. Availability has become and is becoming so much less of a problem now, and it will obviously increase as time goes on. The digital library is upon us.
2. As browsers get more efficient and one has fewer upfront engines (one thinks of the CD-ROMs of the Hispanic Seminary of Medieval Studies, for example), we are able to search huge corpora, and speed is no longer a problem. The individual scholar can expect in the near future terabytes of storage, with ever faster GREP utilities to interrogate them. Someone is going to have to develop methodologies for dealing with these new types of knowledge.
3. Publishing and the dissemination of knowledge and research results have already changed. One used to speak of peer review, of vanity presses, of the Selbstverlag and the Samizdat, and the impediments to progress represented by the "Gatekeepers of Science." This is changing, will have to change; at present, those who hold the reins of power have not come to grips with the fact that someone will now have to read and not just count the scholarship of a colleague, and that one cannot delegate this responsibility to "peer review." There are signs that this is coming about.
4. I cannot end without again pointing to a downside: the evanescence of sites. There is nothing more troubling than having one's favorite website disappear, with all those things one might have downloaded. Many of the early lists did not have archives, and the scholarship, some of it quite valuable, deposited there is lost.

We are getting there, no longer slowly but surely. I think the most important thing we need to remember is not to let the machine dominate us, to avoid thinking like a computer, to keep HumThink alive.


