Luminati Provides Web-Transparency Web Scraping Proxy Management Workshop

Source: luminati.io/static/London-Workshop.pdf

Luminati developed a global P2P network of 35M+ consumers willing to help

Consumers opt in to the network in return for free use of a partner's application

How do we get users' active consent?

How does it work?

We use a peer’s IP address only when a device meets 3 conditions:

Businesses can now view the web as these 35M global consumers see it

Luminati Proxy Networks Available

Crawling Network Architecture

Luminati Proxy Manager

Robot Detection

ROBOT DETECTION: Techniques for Bot Detection

● IP reputation

● Browser headers and cookies

● Device fingerprints

● User behaviour and history

● IP leaks

ROBOT DETECTION: IP Reputation

● Type

● Request rate

● Account association

● Blacklisted IPs

● Inconsistencies

ROBOT DETECTION: Browser Fingerprints

● User uniqueness on the web

● Users become more unique as the entropy level increases
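
The link between entropy and uniqueness can be illustrated with a little arithmetic. The sketch below is not from the workshop: it assumes the fingerprint attributes are statistically independent (rarely true in practice), and the attribute names and frequencies are made up for illustration.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Bits of identifying information revealed by an attribute value
    that is shared by the given fraction of users."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def combined_bits(probabilities) -> float:
    """Total bits, assuming the attributes are independent (an assumption)."""
    return sum(surprisal_bits(p) for p in probabilities)


# Hypothetical fractions of users sharing each attribute value.
example = {
    "user_agent": 1 / 128,   # 7 bits
    "screen_size": 1 / 16,   # 4 bits
    "timezone": 1 / 8,       # 3 bits
}

total = combined_bits(example.values())
print(f"{total:.1f} bits, about 1 in {2 ** total:,.0f} users")
# prints "14.0 bits, about 1 in 16,384 users"
```

Each extra bit halves the group of users who share the same fingerprint, which is why a handful of moderately rare attributes is enough to single out a browser.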

ROBOT DETECTION: Browser Fingerprint Examples

ROBOT DETECTION: User Agent Uniqueness

Desktop <> Mobile

Android <> iOS

ROBOT DETECTION: Audio Fingerprints

AudioContext properties:

ROBOT DETECTION

Image from http://getwallpapers.com

Symptoms: blocked <> cloaked <> reCAPTCHA
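
A crawler has to recognise these three symptoms before it can react to them. The following is a rough sketch only: the status codes, the "recaptcha" substring check and the `expected_marker` parameter are all illustrative assumptions, not rules given in the workshop, and real sites need site-specific checks.

```python
def classify(status: int, body: str, expected_marker: str) -> str:
    """Label a response as 'recaptcha', 'blocked', 'cloaked', 'ok' or 'other'.

    expected_marker is a hypothetical string that a genuine page is known
    to contain (for example a price label); its absence in a 200 response
    is treated as a hint that the content may be cloaked.
    """
    text = body.lower()
    if "recaptcha" in text:
        return "recaptcha"
    if status in (403, 429):  # assumed to signal an outright block
        return "blocked"
    if status == 200:
        return "ok" if expected_marker.lower() in text else "cloaked"
    return "other"
```

For example, `classify(200, "<html>Welcome</html>", "price")` returns `"cloaked"` under these assumptions, because the page loaded but the expected content is missing.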

ROBOT DETECTION: How to Prevent Getting Blocked or Cloaked

● Request rate

● Country and city discovery

● Managing headers and fingerprints

● HTTP protocol version (e.g. HTTP/2)

● Persistence
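
The request-rate item above can be sketched as a small throttle that spaces requests out with a randomised gap. This is a minimal illustration, not Luminati's implementation; the class name, parameters and delay values are assumptions.

```python
import random
import time


class Throttle:
    """Enforce a minimum, randomly jittered gap between requests."""

    def __init__(self, min_delay, jitter, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay    # shortest allowed gap, in seconds
        self.jitter = jitter          # extra random gap of 0..jitter seconds
        self._clock = clock           # injectable so the class can be tested
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed; return the time slept."""
        now = self._clock()
        pause = max(0.0, self._next_allowed - now)
        if pause:
            self._sleep(pause)
        gap = self.min_delay + random.uniform(0.0, self.jitter)
        self._next_allowed = max(now, self._next_allowed) + gap
        return pause
```

Calling `throttle.wait()` before every request keeps the rate below one request per `min_delay` seconds, and the jitter avoids the perfectly regular timing that a rate-based detector could pick up.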

ROBOT DETECTION: How to Overcome Common Blockades

● Use different IPs, geos and networks

○ Waterfall routing

● Auto retry and banning IPs

○ Optimize IP cooling period

● New IP and fingerprints

○ On an error code, reCAPTCHA or cloaked response
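
Auto retry, IP banning and a cooling period can be combined roughly as follows. This is a sketch under stated assumptions: `CoolingPool`, `fetch_with_retry`, the `fetch` and `is_blocked` callbacks and the attempt limit are hypothetical names and values chosen for illustration, not part of any Luminati API.

```python
import time


class CoolingPool:
    """Rotate over IPs, resting any IP that was just blocked."""

    def __init__(self, ips, cooling_seconds, clock=time.monotonic):
        self._ips = list(ips)
        self._cooling = cooling_seconds
        self._clock = clock           # injectable so the class can be tested
        self._banned_until = {}
        self._cursor = 0

    def ban(self, ip):
        """Take an IP out of rotation for the cooling period."""
        self._banned_until[ip] = self._clock() + self._cooling

    def next_ip(self):
        """Return the next IP that is not cooling down, or None."""
        now = self._clock()
        for _ in range(len(self._ips)):
            ip = self._ips[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._ips)
            if self._banned_until.get(ip, 0.0) <= now:
                return ip
        return None


def fetch_with_retry(pool, fetch, is_blocked, max_attempts=3):
    """Try up to max_attempts IPs; ban each IP whose response looks blocked."""
    for _ in range(max_attempts):
        ip = pool.next_ip()
        if ip is None:
            break
        response = fetch(ip)
        if not is_blocked(response):
            return response
        pool.ban(ip)
    return None
```

Tuning `cooling_seconds` is the "optimize IP cooling period" step: too short and a rested IP is blocked again at once, too long and the usable pool shrinks.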

ROBOT DETECTION: Waterfall Routing

Target Website
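
Waterfall routing means trying the target through one network and falling through to the next only when the previous one fails. A minimal sketch, assuming the caller supplies an ordered list of network tiers and a `fetch` callback that returns `None` on failure; the tier names used in the usage note are illustrative assumptions, since the original diagram did not survive extraction.

```python
from typing import Callable, Optional, Sequence, Tuple


def waterfall(
    tiers: Sequence[str],
    fetch: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each network tier in order and return (tier, body) for the
    first tier that yields a body, or (None, None) if every tier fails."""
    for tier in tiers:
        body = fetch(tier)
        if body is not None:
            return tier, body
    return None, None
```

With tiers ordered as, say, `["datacenter", "residential", "mobile"]`, most requests would be served by the first network and only the blocked ones would spill over to the later ones.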

ROBOT DETECTION: Luminati’s Unblocker

curl --proxy <username>:<password>@unblock.zproxy.lum-superproxy.io:22225 https://example.com

Just make a simple request and let us handle the rest!

Automatic Retry: Automatically retries request upon a failed response

Network Rotation: Route through multiple networks automatically (waterfall)

Manages Headers: Automatic header management based on site requirements

Manages Cookies: IP priming and cookie management based on overall request load

Country Discovery: Chooses the right country IP based on your request or target site

Detection and Matching: Ensures the response is of the right content type
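
The same request can be issued from a script. The sketch below uses only Python's standard library and the host and port from the curl command above; the helper names are hypothetical, and the `http://` proxy scheme is an assumption based on curl's default for proxies given without a scheme.

```python
import urllib.parse
import urllib.request

PROXY_HOST = "unblock.zproxy.lum-superproxy.io"
PROXY_PORT = 22225


def proxy_url(username: str, password: str) -> str:
    """Build the proxy URL, percent-encoding the credentials."""
    user = urllib.parse.quote(username, safe="")
    pwd = urllib.parse.quote(password, safe="")
    return f"http://{user}:{pwd}@{PROXY_HOST}:{PROXY_PORT}"


def make_opener(username: str, password: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    proxy = proxy_url(username, password)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, username: str, password: str) -> int:
    """Fetch a URL through the proxy and return the HTTP status code."""
    opener = make_opener(username, password)
    with opener.open(url, timeout=30) as response:
        return response.status


# Example call (needs real credentials and network access):
# fetch_status("https://example.com", "<username>", "<password>")
```

Percent-encoding matters when a password contains characters such as `@` or `:`, which would otherwise be read as part of the URL structure.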

Proxy Manager URL: http://zagent1745.luminati.io:22999