The blog has been running for over half a year now. I did some SEO work early on, but it didn't seem to help much: Google and Bing indexed almost nothing beyond the homepage. Checking the crawler logs confirmed it — crawl frequency was extremely low, the bots basically never came. So it was time to start submitting URLs proactively.

The blog is built with Hexo; the source lives on GitHub, and each push triggers a Cloudflare Pages deployment. Based on that workflow, my first thought was to find a plugin that could submit URLs automatically. After searching around, the two mentioned most often are:

  • hexo-seo-submit

  • hexo-submit-urls-to-search-engine

After setting them up per their docs and running a test, neither plugin actually submitted anything. My guess is that both hook into `hexo d` (deploy), but my pipeline never runs `hexo d`: on the Cloudflare side it simply runs `hexo g` and deploys the `public` directory. So I needed another approach, and decided to submit directly from GitHub Actions. This turns out to be the better method anyway: with a plugin, submitting to Google from mainland China requires a proxy (Google isn't directly reachable), which is a hassle, and I don't have an overseas server — GitHub's runners sidestep all of that. Enough preamble, here's the tutorial.


Getting the required keys and configuration

Bing

  1. Register and sign in to the new Bing Webmaster Tools
  2. Add your site
  3. In the site management page, go to Settings → API access → API key, and note down the API key (a quick sanity check for it is sketched below)
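Before wiring the key into a workflow, you can sanity-check it with a one-off request. A minimal sketch, reusing the SubmitUrlBatch endpoint and payload shape from the script later in this post — `https://www.example.com` is a placeholder, and `YOUR_BING_API_KEY` stands in for the key you just noted down:

import requests

# One-off sanity check for the Bing API key (placeholder site and key).
api_url = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlbatch"
payload = {
    "siteUrl": "https://www.example.com",            # your verified site
    "urlList": ["https://www.example.com/about/"],   # any page on that site
}
resp = requests.post(
    api_url,
    params={"apikey": "YOUR_BING_API_KEY"},
    headers={"Content-Type": "application/json; charset=utf-8"},
    json=payload,
)
# A 200 response means Bing accepted the submission; anything else usually
# means a bad key, an unverified site, or an exhausted daily quota.
print(resp.status_code, resp.text)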

Google

See the official guide: Indexing API Quickstart | Google Search Central | Google for Developers

I won't repeat the steps here. The important artifact is the exported JSON service-account key file — keep it safe.
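If you want to confirm the exported key file is usable before setting up the workflow, here is a minimal sketch using the same oauth2client library as the script later in this post; it assumes the file is saved as service-account.json:

from oauth2client.service_account import ServiceAccountCredentials

# Load the exported service-account key and print its identity.
SCOPES = ["https://www.googleapis.com/auth/indexing"]
creds = ServiceAccountCredentials.from_json_keyfile_name(
    "service-account.json", scopes=SCOPES)
# This prints the client_email field from the JSON file. Remember to add
# that account as an owner of your property in Search Console, otherwise
# Indexing API calls will come back with 403 errors.
print("Service account:", creds.service_account_email)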

GitHub Actions configuration

In the blog source directory, open the `.github` directory (create it if it doesn't exist), create a `workflows` folder inside it, and then create `bing-submission.yml` and `google-indexing.yml` in that folder. The structure looks like this:

  • repo root
    • .github
      • workflows
        • bing-submission.yml
        • google-indexing.yml

The contents of bing-submission.yml:

name: Bing URL Submission

on:
  schedule:
    # Run daily at 00:00 UTC (08:00 Beijing time)
    - cron: '0 0 * * *'
  # Allow manual triggering
  workflow_dispatch:

jobs:
  submit-urls:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install requests

      - name: Run Bing submission script
        env:
          BING_API_KEY: ${{ secrets.BING_API_KEY }}
        run: python bing_submission.py

The contents of google-indexing.yml:

name: Submit URLs to Google Indexing API

on:
  workflow_dispatch: # allow manual triggering
  schedule:
    # Adjust the schedule as needed, e.g. daily at 12:00 UTC
    - cron: '0 12 * * *'

env:
  CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }}

jobs:
  submit-urls:
    name: Submit URLs to Google Indexing API
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install google-api-python-client oauth2client requests

      - name: Create credentials file
        run: echo "$CREDENTIALS_JSON" > service-account.json

      - name: Run Google Indexing Script
        run: python google_indexing.py # make sure this matches the actual script name
        env:
          # additional environment variables can be set here if needed
          SITEMAP_URL: 'https://www.flyday.top/sitemap.xml'

      - name: Upload results as artifact
        uses: actions/upload-artifact@v4
        with:
          name: submission-results
          path: submission_results.txt
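One mismatch worth flagging: this workflow uploads `submission_results.txt` and exports `SITEMAP_URL`, but the script shown later neither writes that file nor reads that variable, so the artifact step will only log a warning. If you want the artifact to carry real data, here is a minimal sketch of a results writer you could call at the end of `publish()` — the function name `write_results` is my own, not from the original script:

# Hypothetical helper for google_indexing.py: persist the submission
# results so the workflow's upload-artifact step has a file to collect.
def write_results(successful, selected, path="submission_results.txt"):
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"submitted {len(successful)}/{len(selected)} URLs\n")
        for url in successful:
            f.write(url + "\n")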

Notice that the workflows reference two environment variables — these are the keys. We inject them as repository secrets rather than writing them into files, to keep them out of the repo.

Go to your GitHub repository, click Settings at the top, find Actions under Secrets and variables in the left sidebar, click New repository secret, and add the following two entries:

Name                        Value
BING_API_KEY                the Bing API key obtained earlier
GOOGLE_CREDENTIALS_JSON     the full contents of the exported JSON key file

With that, the Actions configuration is done.

Submission script configuration

Create the following two scripts directly in the repository root:

  • bing_submission.py
  • google_indexing.py

The contents of bing_submission.py:

import requests
import re
import random
import os
from typing import List

# Configuration -- sensitive values are read from environment variables
sitemap_url = "https://www.xx.top/sitemap.xml"  # replace with your own sitemap
bing_api_url = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlbatch"
apikey = os.getenv("BING_API_KEY")  # read from the environment
site_url = "https://www.xx.top"  # replace with your own site
n = 10  # number of URLs randomly selected per run

def fetch_sitemap_urls(sitemap_url: str) -> List[str]:
    """Extract the URLs from the sitemap."""
    response = requests.get(sitemap_url)
    response.raise_for_status()
    sitemap_content = response.text
    urls = re.findall(r"<loc>(.*?)</loc>", sitemap_content)
    return urls

def submit_urls_to_bing(api_url: str, apikey: str, site_url: str, url_list: List[str]):
    """Submit URLs to the Bing API."""
    headers = {
        "Content-Type": "application/json; charset=utf-8",
    }
    payload = {
        "siteUrl": site_url,
        "urlList": url_list
    }
    params = {
        "apikey": apikey
    }
    response = requests.post(api_url, headers=headers, params=params, json=payload)
    response.raise_for_status()
    return response.json()

def main():
    try:
        # Validate the API key
        if not apikey:
            print("Error: BING_API_KEY environment variable is not set")
            return

        # Fetch all URLs from the sitemap
        print("Fetching URLs from sitemap...")
        all_urls = fetch_sitemap_urls(sitemap_url)
        print(f"Fetched {len(all_urls)} URLs from sitemap.")

        if not all_urls:
            print("No URLs found in sitemap.")
            return

        # Randomly pick n URLs (or all of them if fewer than n are available)
        urls_to_submit = random.sample(all_urls, min(n, len(all_urls)))
        print(f"Randomly selected {len(urls_to_submit)} URLs for submission.")

        # Print the URLs being submitted (for debugging)
        for url in urls_to_submit:
            print(f" - {url}")

        # Submit to Bing
        print("Submitting URLs to Bing...")
        response = submit_urls_to_bing(bing_api_url, apikey, site_url, urls_to_submit)
        print(f"URLs submitted successfully: {response}")

    except Exception as e:
        print(f"An error occurred: {e}")
        # Re-raise so the Action exits non-zero and the run is marked failed
        raise

if __name__ == "__main__":
    main()
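To smoke-test the script locally before committing, you can inject the key into the environment and call `main()` directly — a sketch under the assumption that you run it from the repo root, with a placeholder key:

import os

# bing_submission.py reads BING_API_KEY at import time, so the variable
# must be set before the import below.
os.environ["BING_API_KEY"] = "YOUR_BING_API_KEY"  # placeholder, not a real key

import bing_submission  # the script above, sitting in the repo root
bing_submission.main()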

The contents of google_indexing.py:

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
import requests
import xml.etree.ElementTree as ET
import time
import random

def get_urls_from_sitemap(sitemap_url):
    """Fetch all URLs from a sitemap."""
    try:
        print(f"Fetching sitemap: {sitemap_url}")
        response = requests.get(sitemap_url, timeout=30)
        response.raise_for_status()

        # Parse the XML
        root = ET.fromstring(response.content)

        # Handle the sitemap namespace
        namespaces = {'ns': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

        urls = []
        for url_element in root.findall('ns:url', namespaces):
            loc = url_element.find('ns:loc', namespaces)
            if loc is not None and loc.text:
                urls.append(loc.text.strip())

        print(f"Fetched {len(urls)} URLs from sitemap")
        return urls

    except requests.exceptions.RequestException as e:
        print(f"Network request failed: {e}")
        return []
    except ET.ParseError as e:
        print(f"XML parsing failed: {e}")
        return []
    except Exception as e:
        print(f"Failed to parse sitemap: {e}")
        return []

def publish_batch(urls_batch, credentials_file):
    """Process one batch of URLs."""
    successful = []

    if not urls_batch:
        return successful

    requests_dict = {url: "URL_UPDATED" for url in urls_batch}

    SCOPES = ["https://www.googleapis.com/auth/indexing"]

    try:
        credentials = ServiceAccountCredentials.from_json_keyfile_name(credentials_file, scopes=SCOPES)
        service = build('indexing', 'v3', credentials=credentials)

        def index_api_callback(request_id, response, exception):
            if exception is not None:
                print(f'Request failed - ID: {request_id}, error: {exception}')
            else:
                successful_url = response['urlNotificationMetadata']['url']
                successful.append(successful_url)
                print(f'Submitted successfully: {successful_url}')

        batch = service.new_batch_http_request(callback=index_api_callback)

        for url, api_type in requests_dict.items():
            batch.add(service.urlNotifications().publish(
                body={"url": url, "type": api_type}))

        print("Executing batch submission...")
        batch.execute()

    except Exception as e:
        print(f"API call failed: {e}")

    return successful

def publish():
    """Main entry: fetch URLs from the sitemap(s) and randomly submit up to 200 to the Google Indexing API."""
    # Configuration
    sitemap_urls = [
        'https://www.xxx.top/sitemap.xml',  # replace with your own
        # more sitemaps can be added here
    ]

    credentials_file = 'service-account.json'  # path to your JSON key file
    max_urls_to_submit = 200  # maximum number of URLs to submit
    batch_size = 100  # URLs per batch
    delay_between_batches = 2  # delay between batches (seconds)

    # Collect all URLs
    all_urls = []
    for sitemap_url in sitemap_urls:
        urls = get_urls_from_sitemap(sitemap_url)
        all_urls.extend(urls)

    if not all_urls:
        print("No URLs retrieved from sitemap, exiting")
        return

    print(f"Collected {len(all_urls)} URLs in total")

    # Randomly select URLs
    if len(all_urls) > max_urls_to_submit:
        selected_urls = random.sample(all_urls, max_urls_to_submit)
        print(f"Randomly selected {len(selected_urls)} URLs for submission")
    else:
        selected_urls = all_urls
        print(f"Fewer than {max_urls_to_submit} URLs available; submitting all {len(selected_urls)}")

    # Process in batches
    all_successful = []
    total_batches = (len(selected_urls) + batch_size - 1) // batch_size

    print(f"Processing in {total_batches} batch(es), up to {batch_size} URLs each")

    for i in range(0, len(selected_urls), batch_size):
        batch_num = i // batch_size + 1
        batch_urls = selected_urls[i:i + batch_size]

        print(f"\n=== Batch {batch_num}/{total_batches} ===")
        print(f"This batch contains {len(batch_urls)} URLs")

        successful = publish_batch(batch_urls, credentials_file)
        all_successful.extend(successful)

        print(f"Successfully submitted {len(successful)} URLs in this batch")

        # Sleep between batches (except after the last one)
        if batch_num < total_batches and delay_between_batches > 0:
            print(f"Waiting {delay_between_batches} seconds before the next batch...")
            time.sleep(delay_between_batches)

    # Final statistics
    print(f"\n=== Done ===")
    print(f"Total submitted successfully: {len(all_successful)}/{len(selected_urls)} URLs")

    if len(all_successful) < len(selected_urls):
        failed_count = len(selected_urls) - len(all_successful)
        print(f"Failed: {failed_count} URLs")

if __name__ == "__main__":
    print("Fetching URLs from sitemap and randomly submitting up to 200 to the Google Indexing API...")

    publish()

    print("Finished")

Finally, commit all the changes to the repository. The workflows will run automatically every day; for the first run you can trigger them manually to test. Go to the repository, click Actions at the top, and under All workflows in the sidebar you'll see the two new workflows:

  • Bing URL Submission

  • Submit URLs to Google Indexing API

Pick either one and click Run workflow to run it.

That's the whole setup. You can of course tweak the submission logic in the scripts to suit yourself. Two things to watch:

  • Bing enforces a daily submission quota; for most sites it seems to be only 10 URLs per day (absurdly low), which is why the script above submits only `n = 10` URLs per run
  • The Google Indexing API free quota is 200 requests per day; don't exceed it — apparently going over incurs charges

Oddly enough, even though the site is barely indexed, traffic is decent. I haven't dug into the traffic sources in detail yet:

Top countries/regions by traffic

Country/Region    Traffic
Netherlands       2,004
United States     1,488
China             995
Korea, South      450
Japan             419