TEXT BASED DOCUMENT SIMILARITY MEASURE

Shnibekov Zh.

dc.contributor.author	Shnibekov Zh.
dc.date.accessioned	2023-03-31T09:19:34Z
dc.date.available	2023-03-31T09:19:34Z
dc.date.issued	2013
dc.description.abstract	Do you have a shortage of data? Not very likely. A consequence of the pervasive use of computers is that most data originate in digital form. If we trade a stock or write a book or buy a product online, these events evolve electronically. Since so many paper transactions are now in paperless digital form, lots of “big” data are available for further analysis. The concept of data mining, finding valuable patterns in data, is an obvious response to the collection and storage of large volumes of data. Data mining is no longer an emerging technology awaiting further development. Although its application is far from universal, the techniques of data mining are highly developed and for some forms of analysis are entering a mature phase. We would like to say “Give us data and we will find the patterns.” Unfortunately, data-mining methods expect a highly structured format for data, necessitating extensive data preparation. Either we have to transform the original data, or the data are supplied in a highly structured format. Data-mining methods learn from samples of past experience. If we speak to specialists in predictive data mining, their data will be in numerical form. These people are the “numbers guys.” The “text miners” do not expect an orderly series of numbers. They are happy to look at collections of documents, where the contents are readable and their meaning is obvious. This is our first distinction between data and text mining: numbers versus text. That doesn’t mean that these are two distinct concepts. Both are based on samples of past examples. The composition of the examples is very different, yet many of the learning methods are similar. That’s because the text will be processed and transformed into a numerical representation.
dc.identifier.citation	TEXT BASED DOCUMENT SIMILARITY MEASURE, Shnibekov Zhasulan, 2013
dc.identifier.uri	https://repository.sdu.edu.kz/handle/123456789/269
dc.publisher	Suleyman Demirel University
dc.subject	Text document
dc.subject	data analysis methods
dc.title	TEXT BASED DOCUMENT SIMILARITY MEASURE
dc.type	Article

Files

Original bundle

Name: Shnibekov Zhasulan.pdf
Size: 325.92 KB
Format: Adobe Portable Document Format

Collections

3. Articles and Papers
