Web-page structure and text monitoring with a search-engine robot

Computer Systems and Software
Authors:
Abstract:

An algorithm for dynamic monitoring of Web-page structure and text is developed. The algorithm is implemented as a search-engine robot. Changes in document structure are estimated as a tree-edit distance. A vector model is used to estimate changes in the text. The semantic hierarchy obtained from the HTML source code is not an efficient tool when the structure changes significantly.
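
The two measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say which tree-edit distance or which term weighting is used, so the code below makes its own assumptions: a simplified top-down (Selkow-style) edit distance over the HTML tag tree with unit costs, and cosine similarity over raw term-frequency vectors. All names (TreeBuilder, tree_distance, page_change, and so on) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds a (tag, children) tree and collects lower-cased words."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]
        self.words = []

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Assumption: all character data counts as text (scripts included).
        self.words.extend(re.findall(r"\w+", data.lower()))


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root, builder.words


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(child) for child in node[1])


def tree_distance(a, b):
    """Top-down edit distance: relabel costs 1, inserting or deleting
    a child costs the size of its whole subtree."""
    relabel = 0 if a[0] == b[0] else 1
    ca, cb = a[1], b[1]
    # d[i][j]: cost of turning the first i children of a into the first j of b.
    d = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        d[i][0] = d[i - 1][0] + size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        d[0][j] = d[0][j - 1] + size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ca[i - 1]),                       # delete
                d[i][j - 1] + size(cb[j - 1]),                       # insert
                d[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),  # match
            )
    return relabel + d[-1][-1]


def cosine(words_a, words_b):
    """Cosine similarity of two term-frequency vectors."""
    va, vb = Counter(words_a), Counter(words_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())
                     * sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def page_change(old_html, new_html):
    """Returns (structural distance, text distance) between two snapshots."""
    old_tree, old_words = parse(old_html)
    new_tree, new_words = parse(new_html)
    return (tree_distance(old_tree, new_tree),
            1.0 - cosine(old_words, new_words))
```

A robot could call page_change on the stored and the freshly fetched copy of a page and compare each value with a threshold. For example, appending one empty-attribute paragraph element to the body of an otherwise unchanged page gives a structural distance of 1 under these unit costs. A general tree-edit distance (for instance the Zhang-Shasha algorithm) allows more edit operations than this top-down variant and may report smaller distances for the same pair of pages.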