Since we submitted our first pre-proposal for the Perseus Project in September 1985, we have received generous support from many sources. These include major support from the Annenberg/CPB Projects (which invested $2.5 million, allowing the project to begin planning and developing collections on classical Greece in 1987) and the Digital Library Initiative Phase 2 (which provided $2.8 million in 1998 and allowed us to explore the issues of digital libraries for the humanities in general). The National Endowment for the Humanities, the National Science Foundation, the Institute for Museum and Library Services, the Fund for the Improvement of Postsecondary Education, the Department of Education, the Mellon Foundation, and the National Endowment for the Arts have all provided generous support. We also gratefully thank private individuals who have supported our research over the years. For an overview of the research that we conducted, see here.
For an overview of the research that we are currently pursuing, see here. The following list gives the currently active grants that support that research, in the order in which they were funded.
-
The Dynamic Lexicon: The National Endowment for the Humanities: (2008: $284,999). This project involves creating new reference works for Greek and Latin from a large collection of texts and structured knowledge sources (such as treebanks) within the cyberinfrastructure of a digital library. Built on the technologies of parallel text analysis (including word sense induction and disambiguation) and automatic syntactic parsing, these reference works will allow us to present the possible senses for any Greek or Latin word while also providing syntactic information and statistical data about its use in any collection of texts or any subset of that collection: not simply, for example, how oratio is used in all of Latin literature, but how it is used within the works of Cicero alone (where it means "oration" or, more generally, the power of oratory) or the works of Jerome (where it means "prayer"), including quantified measures of its syntactic usage. These methods will also let users search a text not only by word form, but also by word sense, syntactic subcategorization and selectional preference.
-
We are building a workflow that leads from page image to actionable data. Humanists need access to the earliest phases of processing: we need to be able to define the page layouts of editions and commentaries and to recognize languages such as classical Greek, for which general-purpose optical character recognition (OCR) engines provide little support. An application programming interface (API) that provides access to search or other services does little good if the crucial data has already been lost.
This project will result in three basic deliverables. First, we will produce a testbed of image books with editions, commentaries and translations of the major classical authors that survive from antiquity, often in multiple editions. We will make this testbed available as a part of the Open Content Alliance (OCA), where it will be freely available. Second, we will provide documentation and evaluate methods for each stage of the workflow. Third, we will provide the code and data sets that we produce under a Creative Commons license. Data sets in this case will include the textual data that we have been able to extract, with automatically added markup. This markup will include automatically suggested corrections as well as the original OCR output (allowing for flexible searching).
-
The Ancient Greek Treebank Project: Alpheios Project: (2008: $865,290). This project will enable us to create a treebank - a large collection of syntactically parsed sentences - for ca. one million words of Ancient Greek texts. Treebanks are fundamental datasets that not only provide reading support for students of Classical texts (for example, noting the subject of the sentence and which adjectives modify which nouns), but also supply the basic quantitative data on which to build larger linguistic and general philological arguments (see our call for *research opportunities*). The majority of the texts will consist of Homer, the tragedians and Plato, with selections from several other Classical authors as well. This work complements our ongoing work on creating a Latin treebank.
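As a rough illustration of the kind of information a treebank records, the sketch below stores one syntactically parsed sentence as a list of tokens, each pointing to the word that governs it, and then answers the two reading-support questions mentioned above: what is the subject, and which adjectives modify which nouns. Everything here is an assumption made for illustration: the sentence is invented, and the `Token` class, its field names, the part-of-speech tags and the relation labels (`SBJ`, `ATR`, `PRED`) are hypothetical and are not taken from the project's actual annotation scheme or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a parsed sentence (hypothetical, simplified schema)."""
    id: int        # 1-based position in the sentence
    form: str      # surface word form
    pos: str       # coarse part of speech (illustrative tag set)
    head: int      # id of the governing token; 0 marks the root
    relation: str  # dependency label (illustrative label set)


# Invented example sentence: "the good man writes".
SENTENCE = [
    Token(1, "ὁ", "article", 3, "ATR"),
    Token(2, "ἀγαθὸς", "adjective", 3, "ATR"),
    Token(3, "ἀνὴρ", "noun", 4, "SBJ"),
    Token(4, "γράφει", "verb", 0, "PRED"),
]


def subjects(tokens):
    """Return (subject form, governing word form) pairs."""
    by_id = {t.id: t for t in tokens}
    return [
        (t.form, by_id[t.head].form)
        for t in tokens
        if t.relation == "SBJ" and t.head in by_id
    ]


def adjective_modifiers(tokens):
    """Map each noun form to the adjectives attached to it."""
    by_id = {t.id: t for t in tokens}
    result = {}
    for t in tokens:
        head = by_id.get(t.head)
        if t.pos == "adjective" and head is not None and head.pos == "noun":
            result.setdefault(head.form, []).append(t.form)
    return result


if __name__ == "__main__":
    print(subjects(SENTENCE))
    print(adjective_modifiers(SENTENCE))
```

Counting such relations across many sentences, rather than one, is what would turn this reading aid into the quantitative data described above; the sketch makes no claim about how the project itself stores or queries its trees.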