Project Description

Our Universal Web Crawler crawls more than 40 web portals of different trading companies and loads data about their products into a database. The information for each product includes its code, model, description, all technical features listed on the site, the price (if it is shown on the site), the unit of measure for purchased items, etc.

ASP.NET is used to build the interface; the programming language is C# (.NET Framework 3.5). The interface allows setting all crawling parameters, scheduling, and running the crawler, and shows information about the crawler's last run. Detailed logging is available for monitoring the process: you can determine where the crawler behaved incorrectly and rectify the situation.

Our database contains information about several million products, and this number grows every day. You can view summary data for any category, manufacturer, and period, as well as detailed information about the goods. SQL Server 2008 R2 Enterprise is used to store and process the data. There are two databases: the first serves Online Transaction Processing, and the second is a Data Warehouse. Data is transferred to the Data Warehouse by transactional replication.

SQL Server Agent jobs are used extensively for purposes such as database maintenance, filling summary tables, and notifications about the current state of different processes. CLR stored procedures and functions handle tasks that cannot be implemented in Transact-SQL, for example applying regular expressions or downloading data from external sources (the Internet or local networks). SQL Server Reporting Services generates reports on both the current state of the system and macro-level activity: different product categories, manufacturers and sellers, price changes over any period, price comparisons across companies, analysis of price-index dynamics, etc. Multidimensional structures and data-mining models in SQL Server Analysis Services are used to identify the main pricing trends for different product categories, manufacturers, and sellers. SQL Server Integration Services packages are used heavily for database maintenance and other tasks, such as uploading data to an FTP server.
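The per-product extraction step described above can be sketched in a few lines. The production system is C#/.NET; this is a minimal Python sketch, and the field names and HTML patterns are purely illustrative (real pages need per-site rules):

```python
import re

def extract_product(html: str) -> dict:
    """Pull basic product fields out of one product page.
    The patterns below are illustrative, not real site markup."""
    patterns = {
        "code":  r'<span class="sku">([^<]+)</span>',
        "model": r'<h1 class="model">([^<]+)</h1>',
        "price": r'<span class="price">([\d.]+)</span>',
    }
    product = {}
    for field, pat in patterns.items():
        m = re.search(pat, html)
        product[field] = m.group(1) if m else None  # price may be absent
    return product

page = ('<h1 class="model">X200</h1>'
        '<span class="sku">AB-17</span>'
        '<span class="price">199.99</span>')
print(extract_product(page))  # {'code': 'AB-17', 'model': 'X200', 'price': '199.99'}
```

Each extracted record would then be written to the OLTP database and reach the Data Warehouse through replication.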
Universal web crawling system
Confidential Information
SSA LTD Databases Portfolio
www.ssa-outsourcing.com
[Architecture diagram: N web sites and portals feed M Crawling Core instances (XPath, XML, regular expressions, proxy servers, anonymizers, etc.); the instances write to a database served by Reporting Services, with admin panels and log viewers attached.]
Tools / Technologies
ASP.NET for building the interface.
Programming language: C# (.NET Framework 3.5).
SQL Server 2008 R2 Enterprise for data storage and processing.
Techniques and Approaches
Transforming malformed HTML markup into a well-structured XML document using SgmlReader
Crawling all of a website's pages, or a given subset, to retrieve the necessary data
Entering data into required form fields (e-mail, ZIP code, login/password, etc.) when needed
Managing the scanning of Internet information resources
Automatically initializing the processes that execute the crawlers' work
Configuring and managing the crawlers through the user interface, and providing different types of reports based on the crawling results
Selecting the necessary information from a given Internet information resource
Handling HTML frames
Handling complex AJAX constructions
Handling custom JavaScript constructions
Handling exception pages such as 404 (page not found), "Site under reconstruction", etc.
Escaping blacklists
Using anonymizers
Using custom proxy servers
Using regular expressions and XPath to extract textual and graphical information
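The first technique, recovering structure from malformed HTML, can be illustrated with a tolerant parser. The production system uses SgmlReader on .NET; this Python sketch plays the same role with the standard-library `html.parser`, and the sample markup is invented:

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Tolerant parser: recovers table-cell text even from malformed
    HTML (here, missing </td> closing tags), the role SgmlReader
    plays on the .NET side before XPath queries are applied."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

broken = "<table><tr><td>Code: 123<td>Price: 9.99</tr>"  # </td> tags missing
p = CellCollector()
p.feed(broken)
print(p.cells)  # ['Code: 123', 'Price: 9.99']
```

Once the markup has been normalized into a tree, XPath (or, as here, event callbacks) can select fields reliably.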
Duration: More than 5 years (ongoing)
Team Size: PM, 1 .NET developer, 1 database developer
Cooperation Model: Dedicated Team
Customer’s Feedback
“Excellent provider. Very happy with their service, professionalism, and support. Highly recommended.”
Nathan Krol, Stanley, USA
Project Description

The system organizes a fully automated cycle of placing bets on any sport on the betfair.com site. Here the system's functionality is described using horse racing as an example.

HRBS carries out all the work related to finding and fetching data on all races in Great Britain and Ireland for the next few days. This is made possible by so-called crawlers, or search robots.
The system fetches both race cards and statistical information about horses, trainers, and jockeys. The pedigree and current real odds of every runner, the going conditions, and the courses are also taken into account.

HRBS includes a parser that prepares all the statistical information for further decision-making by the predictor.

The predictor evaluates a few tens of parameters for every runner and builds an internal rating for each of them, which allows generating predictions both for winners and for runners that are most likely to lose.
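The rating-and-ranking idea behind the predictor can be sketched as follows. This is a minimal Python illustration, not the production C# predictor: the parameter names, weights, and runner data are all invented, and the real system evaluates a few tens of parameters rather than three:

```python
# Hypothetical weights over three of the "tens of parameters"
WEIGHTS = {"recent_form": 0.5, "jockey_win_rate": 0.3, "going_suitability": 0.2}

def rating(runner: dict) -> float:
    """Internal rating: weighted sum of normalized (0..1) parameters."""
    return sum(WEIGHTS[k] * runner[k] for k in WEIGHTS)

def predict(runners: dict) -> tuple:
    """Rank the field by rating; return (likely winner, likely loser)."""
    ranked = sorted(runners, key=lambda name: rating(runners[name]), reverse=True)
    return ranked[0], ranked[-1]

field = {
    "Aldaniti":   {"recent_form": 0.9, "jockey_win_rate": 0.6, "going_suitability": 0.8},
    "Devon Loch": {"recent_form": 0.4, "jockey_win_rate": 0.7, "going_suitability": 0.3},
    "Red Rum":    {"recent_form": 0.8, "jockey_win_rate": 0.9, "going_suitability": 0.9},
}
print(predict(field))  # ('Red Rum', 'Devon Loch')
```

Both ends of the ranking are useful on an exchange: the top pick for backing, the bottom pick for laying.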
Once the predictions are generated, the Free Betfair API can be used to place bets automatically on betfair.com. There is a great number of strategies for generating predictions and making bets; each strategy is implemented as a so-called Betfair bot. The Bots controller ensures the stable operation of the bots, monitors the bet-making process, and warns about faults and errors.

The bots are highly configurable: for each of them the user can set stop loss and stop win, limits on the number of runners, race types, distance, going conditions, etc.
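A bot's configuration check might look like the following sketch. This is an illustrative Python fragment, not the production bot: the field names and thresholds are assumptions, and the real bots expose many more settings (distance, going conditions, etc.):

```python
from dataclasses import dataclass

@dataclass
class BotConfig:
    stop_loss: float    # stop betting once session losses reach this amount
    stop_win: float     # stop betting once session profit reaches this amount
    max_runners: int    # skip races with larger fields
    race_types: set     # e.g. {"flat", "hurdle"}

def should_bet(cfg: BotConfig, profit: float, race: dict) -> bool:
    """Decide whether the bot may place a bet in this race."""
    if profit <= -cfg.stop_loss or profit >= cfg.stop_win:
        return False                      # session limit reached
    if race["runners"] > cfg.max_runners:
        return False                      # field too large
    return race["type"] in cfg.race_types

cfg = BotConfig(stop_loss=50.0, stop_win=100.0, max_runners=12, race_types={"flat"})
print(should_bet(cfg, profit=10.0, race={"runners": 8, "type": "flat"}))   # True
print(should_bet(cfg, profit=-60.0, race={"runners": 8, "type": "flat"}))  # False
```

The Bots controller would evaluate checks like these before each race and halt any bot whose limits are hit.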
Horse racing betting system
[Architecture diagram: crawlers (horse pedigree, racing cards, statistics, results / RSS reader) fetch data from web sources and Betfair.com into the database; a data analyser and results predictor drive N Betfair bots through the Betfair API, overseen by a Bots controller, a Web monitor, and a crawling control panel.]
The system also contains a component that checks race results in real time and returns the result of each stake: won or lost.

The work of the whole system is logged and can be monitored in real time with the Web monitor, which collects statistics on the performance of each bot and also calculates the profit on each of them. HRBS includes a web service for connecting to other applications such as Secret Horse, Horse Reminder, etc.
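The per-bot profit calculation the Web monitor performs can be sketched for simple back bets. This Python fragment is illustrative only (the data is invented, and exchange commission, which the real monitor would account for, is ignored):

```python
def bot_profit(bets) -> float:
    """Profit over settled back bets:
    (odds - 1) * stake on a win, -stake on a loss. Commission ignored."""
    profit = 0.0
    for stake, odds, won in bets:
        profit += (odds - 1.0) * stake if won else -stake
    return profit

# (stake, decimal odds, won?) for one bot's settled bets
bets = [(10.0, 3.5, True), (10.0, 2.0, False), (5.0, 4.0, False)]
print(bot_profit(bets))  # 10.0
```

Aggregating this per bot gives the profit figures the monitor reports, making it easy to compare strategies.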
Tools / Technologies: C++, C#.NET, ASP.NET, MS SQL Server 2008, DotNetNuke
Duration: 2 years
Team Size: PM, database developer, ASP.NET developer, QA engineer
Cooperation Model: Time and Materials
Customer’s Feedback
"We have been very pleased with SSA. They listen to our requirements and provide solutions that meet our needs. We have been very impressed
with their technical knowledge, attention to detail and ability to deliver on schedule."
Company name under NDA