Details of Grant 

EPSRC Reference: EP/E035698/1
Title: Accurate and Efficient Parsing of Biomedical Text
Principal Investigator: Clark, Dr S
Other Investigators:
Researcher Co-Investigators:
Project Partners:
Department: Computer Science
Organisation: University of Oxford
Scheme: First Grant Scheme
Starts: 08 October 2007
Ends: 07 June 2010
Value (£): 211,031
EPSRC Research Topic Classifications:
Artificial Intelligence
Bioinformatics
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:  
Summary on Grant Application Form
Natural Language Processing is a branch of Artificial Intelligence concerned with using computers to automatically process and understand natural languages. Natural language refers to languages such as English, French, German, etc., rather than artificial computer programming languages. There are a number of reasons why this will be an important technology in the 21st century. First, computers are gaining increasing importance in our society, and being able to communicate with them in a natural way, using spoken and written language, will become more desirable. Second, we are producing very large amounts of online electronic information; we require tools which can automatically process this information: to summarise it, to answer questions about it, to translate it, and to find relevant documents within it. The staggering rise of Google demonstrates the importance of this kind of technology.

The proposed research concerns the processing of a particular kind of text, namely the scientific articles produced by the biological research community. Biology produces an enormous number of new articles each year, far too many for any one individual to keep up to date with. Automatic computer tools are required which can process this information. For example, a biologist might want to know whether there is a paper on the Web answering a particular question about some gene.

Sophisticated text processing, such as translating a document from one language to another, summarising documents, or answering questions, requires sophisticated language processing tools. A very useful tool for these kinds of tasks is a parser, which automatically determines the grammatical structure of a sentence and how the words in the sentence are related. For example, it would determine the verbs in the sentence, and how the nouns are related to the verbs. This information is needed if a computer is to be able to understand the text.

The Natural Language Processing community now has very good parsing technology. However, the existing parsers are good at analysing certain kinds of text, such as newspapers, but not so good at other kinds of text, such as biology research papers. The reason is that the parsers have learned about language from linguistic resources created by humans, and the resources are based on newspaper text. Creating these resources from scratch for biology would take too long, and so the proposed research will investigate ways in which parsers tuned for newspaper text can be ported to handle biological text.
Key Findings
A parser is a computer program that automatically determines the grammatical structure of a sentence, which can then be used to represent the meaning of the sentence in a way which is amenable to further analysis by the computer. We found that an existing parser, which performs well for newspaper text, can be adapted to work well for text found in biomedical research papers. Automatic analysis of such papers is vital to help researchers in the biological sciences deal with the deluge of information available. We also found that there are some grammatical constructions, vital for representing meaning, on which all state-of-the-art parsers perform badly, and we argue that automatic evaluation metrics which measure how well such parsers are performing should take these constructions into account.

Potential use in non-academic contexts
See above.
Impacts
No information has been submitted for this grant.
Sectors submitted by the Researcher
Information & Communication Technologies
Project URL:  
Further Information:  
Organisation Website: http://www.ox.ac.uk