SEO Performance Analyzer: AI-Powered SEO Insights Through Sophisticated Web Scraping
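What I Built
Meet SEO Performance Analyzer, a comprehensive SEO analysis platform that combines sophisticated web scraping with AI-powered insights. It helps SEO professionals and content creators optimize their websites.
Key Features: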
Demo
**Live Demo**: SEO Performance Analyzer
**Source Code**: GitHub Repository
Screenshots
How I Used Bright Data
1. Sophisticated Web Scraping with Scraping Browser
The tool uses Bright Data's Scraping Browser to handle complex, JavaScript-heavy websites:
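```python
# lighthouse.py
import os
from urllib.parse import quote

from selenium.webdriver import ActionChains, ChromeOptions, Remote
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Scraping Browser WebDriver endpoint (assumption: supplied via an environment variable)
SBR_WEBDRIVER = os.environ["SBR_WEBDRIVER"]


def get_lighthouse(target_url: str) -> str:
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, 'goog', 'chrome')
    driver = Remote(sbr_connection, options=ChromeOptions())
    try:
        # Navigate to PageSpeed Insights (percent-encode the target URL)
        encoded_url = f"https://pagespeed.web.dev/analysis?url={quote(target_url, safe='')}"
        driver.get(encoded_url)

        # Challenge 1: Wait for dynamic content loading
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "lh-report"))
        )
        # Capture the current (mobile) report text so we can detect when it changes
        report_text = driver.find_element(By.CLASS_NAME, "lh-report").text

        # Challenge 2: Handle tab switching for desktop analysis
        desktop_tab = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.ID, "desktop_tab"))
        )
        actions = ActionChains(driver)
        actions.move_to_element(desktop_tab).click().perform()

        # Challenge 3: Verify report content changed
        WebDriverWait(driver, 20).until(
            lambda d: d.find_element(By.CLASS_NAME, "lh-report").text != report_text
        )
        # Return the desktop report text (the original return value is not shown)
        return driver.find_element(By.CLASS_NAME, "lh-report").text
    finally:
        # Always release the remote browser session
        driver.quit()
```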
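`WebDriverWait(driver, timeout).until(condition)` calls the condition repeatedly until it returns a truthy value or the timeout expires; Challenge 3 relies on this to notice that the report text has changed after the tab switch. The minimal sketch below reproduces that polling idea in plain Python so the "wait until the content differs" logic can be exercised without a browser. `wait_until`, `text_changed`, and `make_fake_report` are hypothetical names used only for illustration; the 0.5-second default interval mirrors Selenium's default poll frequency, and the sketch omits Selenium's handling of ignored exceptions such as `NoSuchElementException`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float, interval: float = 0.5) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def text_changed(read_text: Callable[[], str], old_text: str) -> Callable[[], Optional[str]]:
    """Build a condition that returns the new text once it differs from old_text."""
    def condition() -> Optional[str]:
        current = read_text()
        return current if current != old_text else None
    return condition


def make_fake_report() -> Callable[[], str]:
    """Hypothetical stand-in for the report element: its text changes on the third read."""
    reads = {"count": 0}

    def read_text() -> str:
        reads["count"] += 1
        return "mobile report" if reads["count"] < 3 else "desktop report"

    return read_text


read_report = make_fake_report()
old_text = read_report()  # captured before the simulated tab switch
new_text = wait_until(text_changed(read_report, old_text), timeout=5, interval=0.01)
print(new_text)  # desktop report
```

Capturing the old text *before* triggering the change is the important design point: comparing against a value read afterwards could race with the page update and never observe a difference.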
**Challenges Overcome**:
2. Web Unlocker for Competitor Analysis
Bright Data's Web Unlocker provides reliable access to competitors' content:
```python
# compare_pages.py - Competitor Content Access
from typing import Optional, Tuple

import requests
from bs4 import BeautifulSoup

# get_api_key() is the project's own helper for reading secrets (not shown here)


def fetch_html_content(url: str) -> Tuple[str, Optional[str]]:
    try:
        # Ensure the URL has a proper scheme
        if not url.startswith(('http://', 'https://')):
            url = 'https://' + url

        # Bright Data API configuration
        api_url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
        }
        payload = {
            "zone": "web_unlocker1",
            "url": url,
            "format": "raw",
        }

        # Make request to Bright Data API (timeout so a stalled request cannot hang forever)
        response = requests.post(api_url, json=payload, headers=headers, timeout=60)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            # Keep only headings and paragraphs for content comparison
            tags = soup.find_all(['h1', 'h2', 'h3', 'p'])
            collected_html = ''.join(str(tag) for tag in tags)
            return url, collected_html
        print(f"Bright Data returned HTTP {response.status_code} for {url}")
    except Exception as e:
        print(f"Error fetching HTML content from {url}: {e}")
    # Signal failure with None content (non-200 response or exception)
    return url, None
```
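The Web Unlocker call boils down to a JSON body with `zone`, `url`, and `format` posted to `https://api.brightdata.com/request` with a bearer token. A minimal sketch, assuming you want to unit-test that request shape without spending API credits, factors the URL normalization and request construction into pure functions. `normalize_url` and `build_unlocker_request` are hypothetical helpers, not part of the project or of any Bright Data SDK; the zone defaults to the `web_unlocker1` zone used above.

```python
from typing import Any, Dict, Tuple

BRIGHTDATA_REQUEST_URL = "https://api.brightdata.com/request"


def normalize_url(url: str) -> str:
    """Prepend https:// when the URL has no http(s) scheme, as fetch_html_content does."""
    if not url.startswith(("http://", "https://")):
        return "https://" + url
    return url


def build_unlocker_request(
    url: str, api_key: str, zone: str = "web_unlocker1"
) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (endpoint, headers, JSON payload) for a Bright Data /request call."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {"zone": zone, "url": normalize_url(url), "format": "raw"}
    return BRIGHTDATA_REQUEST_URL, headers, payload


endpoint, headers, payload = build_unlocker_request("example.com", api_key="test-key")
print(payload["url"])  # https://example.com
```

With this split, the network-facing function only needs to pass `endpoint`, `headers`, and `payload` to `requests.post(...)` and parse the response, while the request shape can be asserted in ordinary tests.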
3. SERP API for Competitor Discovery
Bright Data's SERP API is integrated to identify the top-ranking competitor:
```python
# compare_pages.py - Competitor Discovery
from typing import Optional
from urllib.parse import quote_plus

import requests
import streamlit as st
from bs4 import BeautifulSoup

# get_api_key() is the project's own helper for reading secrets (not shown here)


def get_top_competitor(keyword: str, our_domain: str) -> Optional[str]:
    try:
        url = "https://api.brightdata.com/request"
        # Challenge: Get real-time SERP results and find relevant competitor
        payload = {
            "zone": "serp_api1",
            "url": f"https://www.google.com/search?q={quote_plus(keyword)}",
            "format": "raw",
        }
        headers = {
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
            "Content-Type": "application/json",
        }
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        if response.status_code == 200:
            # Parse search results with BeautifulSoup
            # (Google's "g" result class can change without notice)
            soup = BeautifulSoup(response.text, 'html.parser')
            all_data = soup.find_all("div", {"class": "g"})

            # Find the first HTTPS result that isn't our own domain
            for result in all_data:
                anchor = result.find('a')
                link = anchor.get('href') if anchor else None
                if link and link.startswith('https://') and our_domain not in link:
                    return link
    except Exception as e:
        st.error(f"Error finding competitor: {str(e)}")
    return None
```
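The selection loop skips any link that contains `our_domain` as a substring. A substring test can misfire: with `our_domain = "shop.com"`, a competitor at `myshop.com` would be skipped too. A minimal sketch of an alternative, assuming the result links have already been pulled out of the SERP HTML and that `our_domain` is a bare domain such as `shop.com`, compares hostnames with `urllib.parse` instead. `is_same_site` and `pick_competitor` are hypothetical helper names.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def is_same_site(link: str, our_domain: str) -> bool:
    """True if the link's host is our_domain or one of its subdomains."""
    host = (urlparse(link).hostname or "").lower()
    domain = our_domain.lower().removeprefix("www.")
    return host == domain or host.endswith("." + domain)


def pick_competitor(links: Iterable[Optional[str]], our_domain: str) -> Optional[str]:
    """Return the first HTTPS link that is not hosted on our own site."""
    for link in links:
        if link and link.startswith("https://") and not is_same_site(link, our_domain):
            return link
    return None


serp_links = [
    None,  # a result without an anchor
    "/search?q=related",  # relative Google link
    "https://www.shop.com/pricing",  # our own site (subdomain)
    "https://myshop.com/pricing",  # competitor whose name contains ours
]
print(pick_competitor(serp_links, "shop.com"))  # https://myshop.com/pricing
```

Matching on the parsed hostname (exact match or a `.`-prefixed suffix) keeps subdomains of your own site out while still letting look-alike competitor domains through.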
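AI Integration Pipeline
Tech Stack
Additional Prompt Qualifications
This project qualifies for two prompts:
Team Submission
This submission was created by Kenan Can.
Thanks for reviewing my submission! Let's harness the power of web scraping and AI to make SEO analysis smarter.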