Randomized and approximation algorithms
Date
2022
Abstract
As the number of websites grows, so does the volume of data they hold. Automated applications that can crawl pages, extract the essential information, process it, and store it in a user-friendly format are in high demand; such programs are called web scrapers. Their popularity compels online service owners to take additional measures to reduce the share of bots in their traffic: if the service's bot protection identifies a user as a bot, the server blocks that user entirely. The primary objective of this master's thesis is to build a Node.js parser using the Selenium tool. To replicate human activity, the parser additionally employs randomized algorithms. The thesis examines server-side protection in detail: four common indicators by which a server identifies a user as a parser, and the methods servers use to prevent bots. We analyze how using the Selenium tool and introducing randomized algorithms help bypass blocking by the server. To obtain the results, we parse five subcategories of the chosen website and evaluate the stability of the software against four parameters: the number of pages parsed, the amount of data processed, the total number of errors and blocks, and the rate of data parsing per unit of time. For a qualitative analysis, we compare these indicators with the same parser running without randomized algorithms.
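As an illustration of the randomized-delay idea mentioned in the abstract, the sketch below shows how a Node.js scraper built on the selenium-webdriver package might insert human-like random pauses between page loads. The browser choice, the URL, and the CSS selector are placeholders assumed for illustration; they are not the thesis's actual configuration.

// Minimal sketch (assumed setup): random, human-like pauses between page loads
// using the selenium-webdriver npm package.
const { Builder, By } = require('selenium-webdriver');

// Return a random integer delay in [min, max] milliseconds.
function randomDelay(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

async function scrapeCategory(urls) {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    for (const url of urls) {
      await driver.get(url);
      // Wait 2-7 seconds, roughly as a human reader might, before extracting data.
      await driver.sleep(randomDelay(2000, 7000));
      // Placeholder selector: collect and print the text of matched elements.
      const items = await driver.findElements(By.css('.product-title'));
      for (const item of items) {
        console.log(await item.getText());
      }
    }
  } finally {
    await driver.quit();
  }
}

scrapeCategory(['https://example.com/category/page/1']); // placeholder URL

Varying the pause length (rather than sleeping a fixed interval) is what makes the request timing less regular and therefore harder for timing-based bot detection to flag.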
Keywords
web scrapers, applications, bots, Internet service traffic, Web Services Protection, algorithms