Enterprise Information Integration

On Discovering Links Using Genetic Programming

  • Authors: Cimmino, Andrea; Corchuelo, Rafael
  • Publisher: Dykinson
  • ISBN: 9788413247748
  • eISBN (PDF): 9788413247748
  • Place of publication: Madrid, Spain
  • Year of publication: 2020
  • Pages: 114

Both established and emerging businesses rely heavily on data, particularly those that wish to become game changers. The biggest current source of data is the Web, where there is a large amount of sparse data. To realise the vision of a unified view over these data, resources in different data sources that refer to the same real-world entities must be linked. Link discovery is a trending task that aims at finding link rules, which specify whether such links must be established or not. There are currently many proposals in the literature to produce these links, especially ones based on meta-heuristics. Unfortunately, devising proposals based on meta-heuristics is not a trivial task, which has led to a lack of comparison between some well-established proposals. Furthermore, it has been shown that link rules fall short when resources that refer to different real-world entities are very similar, or vice versa. In this dissertation, we introduce several proposals to address these gaps in the literature. On the one hand, we introduce Eva4LD, a generic framework to build genetic programming proposals for link discovery, which are a kind of meta-heuristic proposal. Our framework also makes it possible to implement many proposals from the literature and to compare their results fairly. On the other hand, we introduce Teide, which applies link rules effectively, significantly increasing their precision without significantly reducing their recall. Unfortunately, Teide does not learn link rules, and applying all the provided link rules is computationally expensive. For this reason, we introduce Sorbas, which learns what we call contextual link rules.

  • Cover
  • Title page
  • Copyright page
  • Contents
  • Acknowledgements
  • Abstract
  • Resumen
  • 1 Introduction
    • 1.1 Research context
    • 1.2 Related work
      • 1.2.1 Link discovery in relational databases
      • 1.2.2 Link discovery in the Web of Data
      • 1.2.3 Ontology matching methodologies
      • 1.2.4 Genetic programming based proposals
      • 1.2.5 Discussion
    • 1.3 Research rationale
    • 1.4 Summary of contributions
    • 1.5 Collaborations
    • 1.6 Structure of this dissertation
  • 2 Eva4LD: A Genetic Framework
    • 2.1 Introduction
    • 2.2 Preliminaries
    • 2.3 Template
      • 2.3.1 Variation point: CREATE
      • 2.3.2 Variation point: SELECT
      • 2.3.3 Variation point: CROSSOVER
      • 2.3.4 Variation point: MUTATE
      • 2.3.5 Variation point: REPLACE
      • 2.3.6 Variation point: STOP
      • 2.3.7 Variation point: EVALUATE
    • 2.4 Implementations
      • 2.4.1 CREATE implementations
      • 2.4.2 SELECT implementations
      • 2.4.3 CROSSOVER implementations
      • 2.4.4 MUTATE implementations
      • 2.4.5 REPLACE implementations
      • 2.4.6 STOP implementation
      • 2.4.7 EVALUATE implementations
    • 2.5 Experimental analysis
      • 2.5.1 Experimental environment
      • 2.5.2 Experimental results
      • 2.5.3 Statistical analysis
    • 2.6 Summary
  • 3 Teide: Bootstrapping Link Rules
    • 3.1 Introduction
    • 3.2 Bootstrapping process
      • 3.2.1 Filtering links
      • 3.2.2 Computing neighbours similarity
      • 3.2.3 Selecting links
    • 3.3 Experimental analysis
      • 3.3.1 Experimental environment
      • 3.3.2 Experimental results
      • 3.3.3 Statistical analysis
    • 3.4 Conclusions
  • 4 Sorbas: Learning Context-Aware Link Rules
    • 4.1 Introduction
    • 4.2 Learning process
      • 4.2.1 Computing correspondences
      • 4.2.2 Computing similarity
      • 4.2.3 Illustration
    • 4.3 Experimental analysis
      • 4.3.1 Experimental environment
      • 4.3.2 Experimental results
      • 4.3.3 Statistical analysis
    • 4.4 Conclusions
  • 5 Conclusions
  • A: Experimental Environment
    • A.1 Computing facility
    • A.2 Linking scenarios
    • A.3 Genetic programming setups
  • B: Running Examples
    • B.1 Researchers
    • B.2 Researchers with context
  • Bibliography
