Development of software applications in the Smart-Campus system

Date

2018

Abstract

One of the pressing tasks of the modern information society is finding documents that are partially or completely similar to each other. The ability to measure similarity between documents can improve content-based document clustering, help search engines remove redundant results, identify copyright infringement, and filter search and mail spam. The sheer volume of data makes the direct approach (pairwise comparison of document texts) practically impossible in reasonable time, so faster document comparison algorithms are required. At the same time, algorithms that work well for one formulation of the problem are inapplicable or give poor results for others. This master's thesis describes an application that searches for similarities between documents. Documents can be in pdf, doc, docx, jpeg, png or jpg format and must be written in Russian, English or Kazakh. If the document is a jpeg, png or jpg image, the application extracts the text from the image using the OpenCV library and then searches the database. Using this application, users can add documents to the application database and check documents for duplicates. For example, teachers can add student coursework, theses and scientific articles to the database and then check other students' work for duplicates.
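
The abstract does not say which comparison algorithm the application uses. As an illustration only, the duplicate check it describes could be sketched with word shingles and Jaccard similarity, a common technique for this kind of task. The function names `shingles`, `jaccard` and `find_similar`, the shingle size `k` and the `threshold` value are all hypothetical and are not taken from the thesis.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text.

    Assumption: words are lowercased runs of letters/digits; Python's \\w is
    Unicode-aware for str, so Russian and Kazakh text is handled too.
    """
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_similar(query, database, threshold=0.5, k=3):
    """Return (doc_id, score) pairs with score >= threshold, best match first.

    `database` is a hypothetical in-memory dict {doc_id: text}; a real
    application would read the texts from its document store.
    """
    q = shingles(query, k)
    results = []
    for doc_id, text in database.items():
        score = jaccard(q, shingles(text, k))
        if score >= threshold:
            results.append((doc_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

This sketch still compares the query with every stored document, so it illustrates the similarity measure rather than the speed-up the abstract calls for.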

Keywords

courseworks, theses, scientific articles, database, students
