PartTimeHackerman/ProxyScraper


Simple public proxy scraper.

Download


Step 1. Add the JitPack repository to your build file

Add it in your root settings.gradle at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url 'https://jitpack.io' }
		}
	}

Add it in your settings.gradle.kts at the end of repositories:

	dependencyResolutionManagement {
		repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
		repositories {
			mavenCentral()
			maven { url = uri("https://jitpack.io") }
		}
	}

Add to pom.xml

	<repositories>
		<repository>
		    <id>jitpack.io</id>
		    <url>https://jitpack.io</url>
		</repository>
	</repositories>

Add it in your build.sbt at the end of resolvers:

	resolvers += "jitpack" at "https://jitpack.io"

Add it in your project.clj at the end of repositories:

	:repositories [["jitpack" "https://jitpack.io"]]

Step 2. Add the dependency

Add it in your build.gradle:

	dependencies {
		implementation 'com.github.PartTimeHackerman:ProxyScraper:'
	}

Add it in your build.gradle.kts:

	dependencies {
		implementation("com.github.PartTimeHackerman:ProxyScraper:")
	}

Add it in your pom.xml:

	<dependency>
	    <groupId>com.github.PartTimeHackerman</groupId>
	    <artifactId>ProxyScraper</artifactId>
	    <version></version>
	</dependency>

Add it in your build.sbt:

	libraryDependencies += "com.github.PartTimeHackerman" % "ProxyScraper" % ""

Add it in your project.clj:

	:dependencies [[com.github.PartTimeHackerman/ProxyScraper ""]]
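Putting the two steps together, a minimal Gradle setup might look like the sketch below. The version is not specified anywhere above, so `<version>` is a placeholder for a release tag or commit hash from the repository, not an actual value:

```groovy
// settings.gradle — register JitPack as a dependency repository (Step 1)
dependencyResolutionManagement {
	repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
	repositories {
		mavenCentral()
		maven { url 'https://jitpack.io' }
	}
}

// build.gradle — declare the dependency (Step 2); replace <version>
// with a real tag or commit hash — it is a placeholder, not a value
dependencies {
	implementation 'com.github.PartTimeHackerman:ProxyScraper:<version>'
}
```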

Readme


ProxyScraper

ProxyScraper is a fairly simple public proxy scraper that supports:

  • gathering sites that contain proxies,
  • scraping proxies from those sites,
  • checking whether a proxy is working, what anonymity level it provides, and how fast it responds.

Gathering new sites (from Google):

SitesScraper sitesScraper = new SitesScraper();
sitesScraper.scrapeSites();
List<Site> gathered = sitesScraper.getSites();

Gathering sites (from provided sites):

Integer depth = 2;
Integer threads = 20;
List<Site> sitesList = new ArrayList<>(Collections.singleton(new Site("http://sample.xyz", ScrapeType.UNCHECKED)));
LinkGatherConcurrent linkGather = new LinkGatherConcurrent(depth, new Pool(threads, 0L));
List<Site> gathered = linkGather.gatherSites(sitesList);

Scraping sites:

Integer threads = 20;
Integer timeout = 10000; // in millis.
Integer limit = 0; // no limit
Boolean check = true; // check proxies right after scraping
Integer browsers = 5; // number of PhantomJS browsers
Integer ocrs = 1; // number of Tesseract OCRs

Scraper scraper = new Scraper(threads, timeout, limit, check, browsers, ocrs);
List<Site> sitesList = new ArrayList<>(Collections.singleton(new Site("http://sample.xyz", ScrapeType.UNCHECKED)));
List<Proxy> proxies = scraper.getProxyScraper().scrapeList(sitesList);

Checking proxies:

...
List<Proxy> proxyList = new ArrayList<>(Collections.singleton(new Proxy("192.168.1.1", 8080)));
Scraper scraper = new Scraper(threads, timeout, limit, check, browsers, ocrs);
scraper.getProxyChecker().checkProxies(proxyList);

Before running, you must add phantomjs.exe to your executable path or set the path to it explicitly, e.g. Browser.setPhantomJsPath("C:/some/path/phantomjs.exe");

You can also use the branch that already has PhantomJS bundled.

There's also a branch with a view made in JavaFX.