TEXT BASED DOCUMENT SIMILARITY MEASURE

dc.contributor.author: Shnibekov Zh.
dc.date.accessioned: 2023-03-31T09:19:34Z
dc.date.available: 2023-03-31T09:19:34Z
dc.date.issued: 2013
dc.description.abstract: Do you have a shortage of data? Not very likely. A consequence of the pervasive use of computers is that most data originate in digital form. If we trade a stock, write a book, or buy a product online, these events take place electronically. Since so many transactions that were once on paper are now in digital form, lots of “big” data are available for further analysis. The concept of data mining, finding valuable patterns in data, is an obvious response to the collection and storage of large volumes of data. Data mining is no longer an emerging technology awaiting further development. Although its application is far from universal, the techniques of data mining are highly developed and, for some forms of analysis, are entering a mature phase. We would like to say “Give us data and we will find the patterns.” Unfortunately, data-mining methods expect a highly structured format for data, necessitating extensive data preparation. Either we have to transform the original data, or the data must be supplied in a highly structured format. Data-mining methods learn from samples of past experience. If we speak to specialists in predictive data mining, their data will be in numerical form. These people are the “numbers guys.” The “text miners” do not expect an orderly series of numbers. They are happy to look at collections of documents, where the contents are readable and their meaning is obvious. This is our first distinction between data and text mining: numbers versus text. That does not mean the two are entirely separate concepts. Both learn from samples of past examples. The composition of the examples is very different, yet many of the learning methods are similar. That is because text is first processed and transformed into a numerical representation.
dc.identifier.citation: TEXT BASED DOCUMENT SIMILARITY MEASURE, Shnibekov Zhasulan, 2013
dc.identifier.uri: https://repository.sdu.edu.kz/handle/123456789/269
dc.publisher: Suleyman Demirel University
dc.subject: Text document
dc.subject: data analysis methods
dc.title: TEXT BASED DOCUMENT SIMILARITY MEASURE
dc.type: Article
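The abstract's closing point is that text is processed and transformed into a numerical representation before learning methods are applied, and the record's title concerns a document similarity measure. The abstract does not say which representation or measure the work uses, so the following is only a minimal sketch under stated assumptions: each document becomes a bag-of-words vector of raw term counts, and two documents are compared with cosine similarity. The function names `tokenize`, `term_vector`, and `cosine_similarity` and the sample sentences are hypothetical illustrations, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vector(text: str) -> Counter:
    # Bag-of-words representation (assumed): term -> raw count in the document.
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared terms only; absent terms contribute zero.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Illustrative sentences only; not data from the paper.
    docs = [
        "Data mining finds patterns in numerical data.",
        "Text mining finds patterns in collections of documents.",
        "We trade a stock online.",
    ]
    vectors = [term_vector(d) for d in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(i, j, round(cosine_similarity(vectors[i], vectors[j]), 3))
```

Cosine similarity ignores document length, so a long and a short document on the same topic can still score highly; weighting schemes such as TF-IDF are a common refinement, but whether the paper uses one is not stated in this record.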

Files

Original bundle
Name: Shnibekov Zhasulan.pdf
Size: 325.92 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission