About the exploration of data mining techniques using structured features for information extraction

Jungermann, Felix

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Morik, Katharina	-
dc.contributor.author	Jungermann, Felix	-
dc.date.accessioned	2012-06-26T13:31:03Z	-
dc.date.available	2012-06-26T13:31:03Z	-
dc.date.issued	2012-06-26	-
dc.identifier.uri	http://hdl.handle.net/2003/29487	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-4813	-
dc.description.abstract	The World Wide Web is a huge source of information, and the amount of information available on it grows every day. It is impossible to handle this amount of information by hand, so special techniques have to be used to deliver smaller excerpts of information that become manageable. Unfortunately, such techniques, search engines for instance, deliver only a certain view of the information's original appearance. The delivered information is present in various types of files such as websites, text documents, video clips and audio files. Extracting relevant and interesting pieces of information from these files is very complex and time-consuming. This work analyzes special techniques that allow for the automatic extraction of interesting informational units. Such techniques are based on Machine Learning methods. In contrast to traditional Machine Learning tasks, processing text documents in this context requires specific techniques: the structure of the natural language contained in text documents imposes constraints that the Machine Learning method should respect. These constraints, and the specially tuned methods that respect them, are another important aspect of this work. After defining all the Machine Learning formalisms used in this work, I present multiple Machine Learning approaches applicable to the field of Information Extraction. I describe the historical development from the first approaches to Information Extraction, via Named Entity Recognition, to Relation Extraction. I present the possibilities of using linguistic resources to create feature sets for Information Extraction. I show how Relation Extraction is formally defined, and I additionally show which kinds of Machine Learning methods are used for Relation Extraction.
I focus on Relation Extraction techniques that benefit on the one hand from minimum optimization and on the other hand from efficient data structures. Most of the experiments and implementations described in this work were done using RapidMiner, an open-source framework for Data Mining. To apply this framework to Information Extraction tasks, I developed an extension called the Information Extraction Plugin, which is described in detail. Finally, I present applications that explicitly benefit from the collaboration of Data Mining and Information Extraction.	en
dc.language.iso	en	de
dc.subject	data mining	en
dc.subject	information extraction	en
dc.subject	machine learning	en
dc.subject	named entity recognition	en
dc.subject	rapid miner	en
dc.subject	relation extraction	en
dc.subject	structured features	en
dc.subject	tree kernel	en
dc.subject.ddc	004	-
dc.title	About the exploration of data mining techniques using structured features for information extraction	en
dc.type	Text	de
dc.contributor.referee	Jannach, Dietmar	-
dc.date.accepted	2012-06-05	-
dc.type.publicationtype	doctoralThesis	de
dcterms.accessRights	open access	-
Appears in Collections:	LS 08 Künstliche Intelligenz (Artificial Intelligence)

Files in This Item:

File	Description	Size	Format
Dissertation.pdf	DNB	4.58 MB	Adobe PDF

This item is protected by original copyright