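SEO Performance Analyzer: AI-Powered SEO Insights Through Sophisticated Web Scraping
What I Built
Meet the SEO Performance Analyzer, a comprehensive SEO analysis platform that combines sophisticated web scraping with AI-powered insights. It helps SEO professionals and content creators optimize their websites.
Key Features: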
Demo
**Live Demo**: SEO Performance Analyzer
**Source Code**: GitHub repository
Screenshots
How I Used Bright Data
1. Scraping Browser for Complex Web Scraping
The tool uses Bright Data's Scraping Browser to handle complex, JavaScript-heavy websites:
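```python
# lighthouse.py
import os
from urllib.parse import quote

from selenium.webdriver import ChromeOptions, Remote
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Scraping Browser WebDriver endpoint (assumed here to come from an environment variable)
SBR_WEBDRIVER = os.environ["SBR_WEBDRIVER"]


def get_lighthouse(target_url: str) -> str:
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, 'goog', 'chrome')
    driver = Remote(sbr_connection, options=ChromeOptions())
    try:
        # Navigate to PageSpeed Insights; percent-encode the target so its own query string stays intact
        encoded_url = f"https://pagespeed.web.dev/analysis?url={quote(target_url, safe='')}"
        driver.get(encoded_url)
        # Challenge 1: Wait for dynamic content loading
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "lh-report"))
        )
        # Remember the initially displayed report so we can detect when it changes
        report_text = driver.find_element(By.CLASS_NAME, "lh-report").text
        # Challenge 2: Handle tab switching for desktop analysis
        desktop_tab = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.ID, "desktop_tab"))
        )
        ActionChains(driver).move_to_element(desktop_tab).click().perform()
        # Challenge 3: Verify report content changed
        WebDriverWait(driver, 20).until(
            lambda d: d.find_element(By.CLASS_NAME, "lh-report").text != report_text
        )
        # The full tool goes on to parse the desktop report; this excerpt just returns its text
        return driver.find_element(By.CLASS_NAME, "lh-report").text
    finally:
        # Always release the remote browser session
        driver.quit()
```

**Challenges overcome**: waiting for PageSpeed Insights to finish rendering its dynamic report, switching to the desktop tab, and confirming that the report content actually changed before reading it.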
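The Challenge 3 wait passes an inline lambda to `WebDriverWait.until()`, which calls its argument with the driver on every poll until the result is truthy. If the same check is needed in more than one place, it can be packaged as a small callable in the style of Selenium's `expected_conditions`. A minimal sketch; the name `text_changed` is hypothetical and not part of Selenium or of this project:

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class text_changed:
    """Hypothetical wait condition: truthy once the element's text differs from old_text.

    WebDriverWait.until() calls this with the driver on every poll and stops
    as soon as it returns something truthy.
    """

    locator: Tuple[str, str]  # e.g. (By.CLASS_NAME, "lh-report")
    old_text: str

    def __call__(self, driver: Any):
        # Look the element up fresh on each poll, since the report may be re-rendered
        element = driver.find_element(*self.locator)
        # Return the element itself (truthy) so the caller can use it directly
        return element if element.text != self.old_text else False
```

Usage, with the locator from the code above: `WebDriverWait(driver, 20).until(text_changed((By.CLASS_NAME, "lh-report"), report_text))`. Returning the element instead of `True` lets the caller read the new report text without a second lookup.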
2. Web Unlocker for Competitor Analysis
Bright Data's Web Unlocker provides reliable access to competitor content:
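```python
# compare_pages.py - Competitor Content Access
from typing import Optional, Tuple

import requests
from bs4 import BeautifulSoup

# get_api_key() is the project's own key-loading helper (not shown in this post)


def fetch_html_content(url: str) -> Tuple[str, Optional[str]]:
    """Fetch a page via Web Unlocker; return (url, h1-h3/p markup) or (url, None)."""
    # Ensure the URL has a proper scheme
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url
    try:
        # Bright Data API configuration
        api_url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
        }
        payload = {
            "zone": "web_unlocker1",
            "url": url,
            "format": "raw",
        }
        # Make request to Bright Data API (the timeout keeps a stuck request from hanging the app)
        response = requests.post(api_url, json=payload, headers=headers, timeout=60)
        if response.status_code == 200:
            soup = BeautifulSoup(response.text, 'html.parser')
            tags = soup.find_all(['h1', 'h2', 'h3', 'p'])
            return url, ''.join(str(tag) for tag in tags)
        print(f"Web Unlocker returned HTTP {response.status_code} for {url}")
    except Exception as e:
        print(f"Error fetching HTML content from {url}: {e}")
    # Every failure path returns a tuple, so callers can always unpack the result
    return url, None
```

3. SERP API for Competitor Discovery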
Bright Data's SERP API is used to identify the top-ranking competitor for a keyword:
```python
# compare_pages.py - Competitor Discovery
from typing import Optional
from urllib.parse import quote_plus

import requests
import streamlit as st
from bs4 import BeautifulSoup

# get_api_key() is the project's own key-loading helper (not shown in this post)


def get_top_competitor(keyword: str, our_domain: str) -> Optional[str]:
    try:
        url = "https://api.brightdata.com/request"
        # Challenge: Get real-time SERP results and find relevant competitor
        payload = {
            "zone": "serp_api1",
            "url": f"https://www.google.com/search?q={quote_plus(keyword)}",
            "format": "raw",
        }
        headers = {
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
            "Content-Type": "application/json",
        }
        response = requests.post(url, json=payload, headers=headers, timeout=60)
        if response.status_code == 200:
            # Parse organic results; Google's "g" class is page markup and can change
            soup = BeautifulSoup(response.text, 'html.parser')
            for result in soup.find_all("div", {"class": "g"}):
                anchor = result.find('a')
                link = anchor.get('href') if anchor else None
                # First HTTPS result that does not point at our own domain
                if link and link.startswith('https') and our_domain not in link:
                    return link
    except Exception as e:
        st.error(f"Error finding competitor: {e}")
    return None
```

AI Integration Pipeline
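Presumably the AI step works from the `h1`/`h2`/`h3`/`p` markup that `fetch_html_content` returns. If that markup is flattened to text first (for example, to keep a prompt short), an outline that preserves heading levels keeps headings distinguishable from body copy. A minimal sketch under that assumption, using the standard library's `html.parser` in place of BeautifulSoup so it has no dependencies; the function `html_to_outline` and its `#`-prefix format are hypothetical, not the project's actual prompt format:

```python
from html.parser import HTMLParser
from typing import List, Optional

# Tags kept by fetch_html_content's find_all(['h1', 'h2', 'h3', 'p'])
KEPT_TAGS = {"h1", "h2", "h3", "p"}


class _OutlineParser(HTMLParser):
    """Collects the text of h1-h3 and p elements, one line per element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.lines: List[str] = []
        self._current: Optional[str] = None  # kept tag we are inside, if any
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in KEPT_TAGS:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is collected too
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                # Headings get Markdown-style "#" prefixes; paragraphs stay plain
                prefix = "" if tag == "p" else "#" * int(tag[1]) + " "
                self.lines.append(prefix + text)
            self._current = None


def html_to_outline(collected_html: str) -> str:
    parser = _OutlineParser()
    parser.feed(collected_html)
    parser.close()
    return "\n".join(parser.lines)
```

For example, `html_to_outline("<h1>Title</h1><p>Hello <b>world</b></p>")` yields `# Title` followed by `Hello world` on the next line. Empty paragraphs are dropped, which keeps boilerplate spacing out of the model input.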
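Tech Stack
Additional Prompt Qualifications
This project qualifies for two of the challenge prompts.
Team Submissions
This submission was created by Kenan Can.
Thanks for reviewing my submission! Let's use the power of web scraping and AI to make SEO analysis smarter.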