Step 1. Add the JitPack repository to your build file
Add it in your root settings.gradle at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}
Or, for the Kotlin DSL, add it in your settings.gradle.kts at the end of repositories:
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}
Or add it to your pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Or add it in your build.sbt at the end of resolvers:
resolvers += "jitpack" at "https://jitpack.io"
Or add it in your project.clj at the end of repositories:
:repositories [["jitpack" "https://jitpack.io"]]
Step 2. Add the dependency (replace Tag with a release tag or commit hash)
Gradle (Groovy DSL):
dependencies {
    implementation 'com.github.PartTimeHackerman:ProxyScraper:Tag'
}
Gradle (Kotlin DSL):
dependencies {
    implementation("com.github.PartTimeHackerman:ProxyScraper:Tag")
}
Maven:
<dependency>
    <groupId>com.github.PartTimeHackerman</groupId>
    <artifactId>ProxyScraper</artifactId>
    <version>Tag</version>
</dependency>
sbt:
libraryDependencies += "com.github.PartTimeHackerman" % "ProxyScraper" % "Tag"
Leiningen:
:dependencies [[com.github.PartTimeHackerman/ProxyScraper "Tag"]]
ProxyScraper is a fairly simple public proxy scraper that supports the following:
Gathering new sites (from Google):
SitesScraper sitesScraper = new SitesScraper();
sitesScraper.scrapeSites();
List<Site> gathered = sitesScraper.getSites();
Gathering sites by following links from provided seed sites:
Integer depth = 2;
Integer threads = 20;
List<Site> sitesList = new ArrayList<>(Collections.singleton(new Site("http://sample.xyz", ScrapeType.UNCHECKED)));
LinkGatherConcurrent linkGather = new LinkGatherConcurrent(depth, new Pool(threads, 0L));
List<Site> gathered = linkGather.gatherSites(sitesList);
Scraping sites:
Integer threads = 20;
Integer timeout = 10000; // in millis.
Integer limit = 0; // no limit
Boolean check = true; // check proxies right after scraping
Integer browsers = 5; // number of PhantomJS browsers
Integer ocrs = 1; // number of Tesseract OCR instances
Scraper scraper = new Scraper(threads, timeout, limit, check, browsers, ocrs);
List<Site> sitesList = new ArrayList<>(Collections.singleton(new Site("http://sample.xyz", ScrapeType.UNCHECKED)));
List<Proxy> proxies = scraper.getProxyScraper().scrapeList(sitesList);
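If you just want to inspect the result, the returned list can be printed directly; a minimal sketch (assuming Proxy has a readable toString()):
System.out.println("Scraped " + proxies.size() + " proxies");
proxies.forEach(System.out::println); // relies on Proxy.toString()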
Checking proxies:
// ... same variables as in the scraping example above (threads, timeout, limit, check, browsers, ocrs)
List<Proxy> proxyList = new ArrayList<>(Collections.singleton(new Proxy("192.168.1.1", 8080)));
Scraper scraper = new Scraper(threads, timeout, limit, check, browsers, ocrs);
scraper.getProxyChecker().checkProxies(proxyList);
Before running, you must add phantomjs.exe to your executable path (PATH) or set the path to it explicitly, e.g. Browser.setPhantomJsPath("C:/some/path/phantomjs.exe");
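Putting it together, here is a minimal end-to-end sketch based only on the calls shown above. It assumes the Scraper constructor takes the same six arguments as in the scraping example; imports for the ProxyScraper classes (Browser, SitesScraper, Site, Scraper, Proxy) are omitted because their package depends on the version you pull in.
import java.util.List;

public class ProxyScraperExample {
    public static void main(String[] args) {
        // Point the library at a local PhantomJS binary (see the note above).
        Browser.setPhantomJsPath("C:/some/path/phantomjs.exe");

        // Gather candidate proxy-list sites from Google.
        SitesScraper sitesScraper = new SitesScraper();
        sitesScraper.scrapeSites();
        List<Site> sites = sitesScraper.getSites();

        // Scrape proxies from the gathered sites (check = false: verify them separately below).
        Scraper scraper = new Scraper(20, 10000, 0, false, 5, 1);
        List<Proxy> proxies = scraper.getProxyScraper().scrapeList(sites);

        // Verify the scraped proxies.
        scraper.getProxyChecker().checkProxies(proxies);

        System.out.println("Got " + proxies.size() + " proxies");
    }
}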
You can also use the branch that already has PhantomJS bundled.
There is also a branch with a view made in JavaFX.