A model for detecting the author's copy within a cluster of similar web documents

Information and Signal Processing

This article considers the problem of identifying the author's copy within a cluster of similar web documents. A method is proposed for determining the original web document based on a calculated authorship score, the topical completeness of the whole web resource, the citation (quotability) principle, and additional estimates of the web resource's value. The result is a general formula with calculated coefficients that makes it possible to identify the original within a cluster of duplicates.
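
The abstract does not reproduce the formula itself, so the following is only a minimal sketch of how such a combined score could be applied, assuming a simple weighted sum. The class `DocumentScores`, the functions `originality_score` and `pick_original`, the weight values, and the example URLs are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class DocumentScores:
    """Hypothetical per-document factor scores, each assumed to lie in [0, 1]."""
    url: str
    authorship: float    # evidence that the page carries the author's copy
    completeness: float  # topical completeness of the hosting web resource
    quotability: float   # how strongly other resources cite this copy
    extra_value: float   # any additional estimate of the resource's value


# Illustrative weights only; the article's calculated coefficients are not given here.
WEIGHTS = {
    "authorship": 0.4,
    "completeness": 0.2,
    "quotability": 0.3,
    "extra_value": 0.1,
}


def originality_score(doc: DocumentScores) -> float:
    """Combine the factor scores as a weighted sum (an assumed form)."""
    return sum(weight * getattr(doc, name) for name, weight in WEIGHTS.items())


def pick_original(cluster: list[DocumentScores]) -> DocumentScores:
    """Return the document with the highest combined score in the cluster."""
    if not cluster:
        raise ValueError("cluster must contain at least one document")
    return max(cluster, key=originality_score)


# A made-up cluster of three near-duplicate pages.
cluster = [
    DocumentScores("https://example.org/a", 0.9, 0.7, 0.6, 0.5),
    DocumentScores("https://example.org/b", 0.2, 0.8, 0.9, 0.4),
    DocumentScores("https://example.org/c", 0.1, 0.3, 0.2, 0.9),
]
best = pick_original(cluster)
print(best.url)
```

A weighted sum is chosen here only because it is the simplest way to merge several estimates into one ranking value; the formula in the article may take a different form.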