Crawler Frameworks: Microsoft Playwright

峰峰火火
2024-03-22

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast.

Link to Github

Bundled browser versions, each available on Linux, macOS, and Windows:

  • Chromium 124.0.6367.8
  • WebKit 17.4
  • Firefox 123.0

Headless execution is supported for all browsers on all platforms. Check out system requirements for details.

Looking for Playwright for Python, .NET, or Java?

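Playwright simulates a user operating a browser. This suits pages that lazy-load their images, and cases where constructing the underlying network request URLs by hand is too complicated. Driving the page in a browser replaces those hand-built requests and avoids calling the API directly, which is the kind of traffic that tends to get blocked.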

Headless mode lets the task run in the background, without opening a browser window.

Java example: scraping Baidu Images

import cn.hutool.core.io.FileUtil;
import com.microsoft.playwright.*;

import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TestWeb {
    public static void main(String[] args) {
        // Deduplicated image addresses collected across all scroll steps
        Set<String> imageUrls = new HashSet<>();
        try (Playwright playwright = Playwright.create()) {
            // Headless mode: no browser window is shown
            Browser browser = playwright.chromium().launch(new BrowserType.LaunchOptions()
                    .setHeadless(true));
            BrowserContext context = browser.newContext();
            Page page = context.newPage();

            page.navigate("https://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&dyTabStr=MCwzLDEsMiw1LDYsNCw4LDcsOQ%3D%3D&word=%E7%81%AB%E5%BD%B1%E5%BF%8D%E8%80%85");
            // Scroll down 10 times, collecting the image addresses present before each scroll
            for (int i = 0; i < 10; i++) {
                List<ElementHandle> elementHandles = page.querySelectorAll("img.main_img");
                for (ElementHandle handle : elementHandles) {
                    String src = handle.getAttribute("src");
                    // getAttribute returns null when the attribute is missing
                    if (src != null) {
                        imageUrls.add(src);
                    }
                }
                page.evaluate("window.scrollBy(0, 1500)");
                // Give lazy-loaded images time to arrive; 1000 ms is a guess, tune as needed
                page.waitForTimeout(1000);
            }
            for (String s : imageUrls) {
                // Skip inline base64 placeholders, which are not real image addresses
                if (s.contains("base64")) {
                    continue;
                }
                // Append one <img> tag per line to images.txt
                FileUtil.appendString("<img src=\"" + s + "\">" + "\n", Paths.get("images.txt").toFile(), StandardCharsets.UTF_8);
            }
            browser.close();
        }
    }
}
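
Two parts of the example above do not need a browser at all, so they can be pulled out and checked separately. The sketch below uses only the JDK (Java 11 or later). The class name ImageLinks and its three methods are hypothetical helpers, not part of Playwright or Hutool. It shows how the word= value in the search URL is produced from a keyword, and one possible way to filter and render the collected src values. Note that it drops anything starting with "data:" rather than anything containing "base64", which is a slightly stricter variation on the check in the original code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of Playwright or Hutool.
final class ImageLinks {

    // Percent-encodes a search keyword as UTF-8 for use as a query value.
    // URLEncoder uses form encoding, so a space becomes '+'.
    static String encodeKeyword(String keyword) {
        return URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    // Keeps only usable image addresses: drops nulls, blanks and inline
    // data: URIs, and removes duplicates while preserving first-seen order.
    static Set<String> usableSources(Collection<String> rawSrcValues) {
        Set<String> result = new LinkedHashSet<>();
        for (String src : rawSrcValues) {
            if (src == null || src.isBlank() || src.startsWith("data:")) {
                continue;
            }
            result.add(src);
        }
        return result;
    }

    // Renders one <img> tag per line, each line ending with a newline.
    static String toImgTags(Collection<String> sources) {
        return sources.stream()
                .map(s -> "<img src=\"" + s + "\">\n")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only; the keyword is 火影忍者.
        System.out.println(encodeKeyword("\u706B\u5F71\u5FCD\u8005"));

        // Made-up sample values standing in for scraped src attributes.
        List<String> raw = Arrays.asList(
                "https://example.com/a.jpg",
                null,
                "data:image/jpeg;base64,AAAA",
                "https://example.com/a.jpg",
                "https://example.com/b.jpg");
        System.out.print(toImgTags(usableSources(raw)));
    }
}
```

Encoding the keyword 火影忍者 this way yields the same %E7%81%AB%E5%BD%B1%E5%BF%8D%E8%80%85 string that appears after word= in the page.navigate call, so a different search only needs a different keyword. The filtered output could be written with java.nio.file.Files instead of FileUtil if the Hutool dependency is not wanted.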