This dissertation centers on the question of whether syntactic differences between languages can be detected automatically, and if so, how. Given the enormous number of natural languages and dialects, the very high level of variation they exhibit between one another, and the technically infinite number of possible sentences per language or dialect, systematic manual comparison is a hugely daunting task. The field would therefore benefit significantly from the (partial) automation of the process, as it would increase the scale, speed, systematicity and reproducibility of research. Over the course of five chapters it is shown through case studies involving English, Dutch, German, Czech and Hungarian that correct hypotheses on syntactic differences between languages can be generated automatically from parallel corpora through the use of the minimum description length principle, counting mismatches between part-of-speech pattern occurrences, word alignment, and mapping annotation from an annotated language onto an unannotated language. The tools developed for the purposes of this research work well and can aid a linguist significantly in their search for differences or similarities, but do not replace the human researcher.
Simulating human language understanding on the computer is a great challenge. One way to approach it is to represent natural language meanings in logic, and to use logical provers to determine what does and does not follow from a text. Which logic is best to use and how natural language meanings are best represented in it are far from trivial questions. This thesis focuses on semantic representation in deep parsing. It describes the Delilah parser and generator for Dutch, which computes semantic representations for sentences, discussing several issues and proposing some further improvements to the system. A style of logical form is developed that is optimized for inference in two main ways. The first is the implementation of event semantics for verbs and nominalizations, together with underlying states for intersective adjectives and their corresponding abstract nouns. This makes many entailments follow straightforwardly. The second is the introduction of Flat Logical Form as an alternative to first-order logic representations. In Flat Logical Form, crucial information on quantification, monotonicity, and embedding is annotated locally on the variables of the formula, where it does not complicate the formula's structure. Both moves make the representations rich in information and at the same time easy to process for purposes of automated reasoning. Such automated reasoning with access to detailed semantic information is expected to contribute to the retrieval of free narrative text.