Bing allows up to 10,000 URL submissions per day, so for a webmaster chasing search-engine traffic the obvious goal is to use as much of that daily quota as possible.

Version 1: every day, pull 10,000 rows from the database, build the URLs, and submit them 500 at a time (Bing's per-request limit), paging forward with a fixed limit and offset. This is simple and guarantees no URL is ever submitted twice, but it always submits the oldest data first. With several million rows in the database, the automated submission alone would take about a year to get through everything.

Version 2: submit yesterday's newly updated data first, then spend whatever quota is left submitting the old data in sequential order.
First, here is the JSON submission example from the official documentation:
POST /webmaster/api.svc/json/SubmitUrlBatch?apikey=sampleapikeyedecc1ea4ae341cc8b6 HTTP/1.1
Content-Type: application/json; charset=utf-8
Host: ssl.bing.com

{
    "siteUrl": "http://example.com",
    "urlList": [
        "http://example.com/url1", "http://example.com/url2"
    ]
}

A successful call returns:

HTTP/1.1 200 OK
Content-Length: 10
Content-Type: application/json; charset=utf-8

{"d":null}
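The same request is easy to reproduce with requests. Here is a minimal sketch using the documentation's placeholder site and sample key rather than real values:

import requests

# placeholder endpoint values taken from the documentation example above
api_url = ("https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlBatch"
           "?apikey=sampleapikeyedecc1ea4ae341cc8b6")
payload = {
    "siteUrl": "http://example.com",
    "urlList": ["http://example.com/url1", "http://example.com/url2"],
}
r = requests.post(api_url, json=payload)
print(r.status_code, r.text)  # expect 200 and {"d":null} on success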
The whole script is written in a plain procedural style.

Import the required libraries:
import pymysql
import requests
import datetime
import math
Define the site host registered in the Bing Webmaster backend and the generated API key:
my_host = "domain.com"
api_key = "api_key"
Pagination settings:
# fetch 500 rows per query, advancing the offset by 500 each time
limit_amount = 500
offset_amount = 500
Query today's remaining link-submission quota. This function comes from Jaynet:
# check today's remaining quota before submitting anything
def search_bing_quota():
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Host": "ssl.bing.com",
    }
    quota_url = f"https://ssl.bing.com/webmaster/api.svc/json/GetUrlSubmissionQuota?siteUrl=http://{my_host}&apikey={api_key}"
    try:
        quota_res = requests.get(quota_url, headers=headers)
        quota_json = quota_res.json()
        if quota_res.status_code == 200:
            print("Search bing quota Success")
            DailyQuota = quota_json['d']['DailyQuota']
            MonthlyQuota = quota_json['d']['MonthlyQuota']
            print(f"Daily quota:{DailyQuota}, Monthly quota:{MonthlyQuota}.")
            return int(DailyQuota)
        else:
            print("Search bing quota Failed")
            print(quota_json)
    except Exception as err:
        print("Search bing quota Failed")
        print(err)
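For orientation, a successful GetUrlSubmissionQuota response is shaped roughly like this (the function above relies on the d.DailyQuota and d.MonthlyQuota fields; the numbers here are illustrative only):

{
    "d": {
        "DailyQuota": 10000,
        "MonthlyQuota": 50000
    }
}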
The backend URL-submission endpoint, plus the prefixes each link is assembled from:

# Bing SubmitUrlBatch endpoint
url = f'https://ssl.bing.com/webmaster/api.svc/json/SubmitUrlBatch?apikey={api_key}'
headers = {'content-type': 'application/json'}
# prefixes the URLs are assembled from
data_host = f"https://{my_host}"
data_prefix_url = f"https://www.{my_host}/number/"
Connect to the database and count the rows updated yesterday:

# first, fetch the rows updated yesterday
conn = pymysql.connect(host='rds', user='zxf', password='password', database='database')
table_name = "table_name"
cursor = conn.cursor()

def getYesterday():
    today = datetime.date.today()
    oneday = datetime.timedelta(days=1)
    yesterday = today - oneday
    return yesterday

yesterday_number_count_sql = f"SELECT number FROM `{table_name}` WHERE `up_date` LIKE '{getYesterday()}%'"
cursor.execute(yesterday_number_count_sql)
res = cursor.fetchall()
print(f"Rows updated yesterday: {len(res)}")
The WHERE `up_date` LIKE '{getYesterday()}%' clause is there so the query can take advantage of the index on the update-time column: a prefix pattern such as LIKE '2020-03-01%' matches every row whose update date starts with 2020-03-01 while still being servable from the index.
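To make the interpolation concrete, a quick standalone illustration with a pinned date (the table name `t` is just a stand-in):

import datetime

today = datetime.date(2020, 3, 2)  # pretend "today" for the demo
yesterday = today - datetime.timedelta(days=1)
print(f"SELECT number FROM `t` WHERE `up_date` LIKE '{yesterday}%'")
# -> SELECT number FROM `t` WHERE `up_date` LIKE '2020-03-01%'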
Next, provided yesterday's count is non-zero, the script distinguishes three cases: more than 500 but fewer than 10,000 rows; 10,000 rows or more; and 500 rows or fewer.
if len(res) != 0:
    # more than 500 rows (but under 10,000) updated yesterday: submit in segments
    if 10000 > len(res) > 500:
        fre_times = len(res) / 500
        fre_times_ceil = math.ceil(fre_times)  # e.g. 9.044 -> 10 batches
        for i in range(fre_times_ceil):
            yesterday_url_list = []  # reset the list for every batch
            number_search_sql = f"SELECT number FROM `{table_name}` WHERE `up_date` LIKE '{getYesterday()}%' LIMIT {limit_amount} OFFSET {offset_amount*i}"
            cursor.execute(number_search_sql)
            res = cursor.fetchall()
            for number in res:
                yesterday_url_list.append(data_prefix_url + number[0])
            # assemble the JSON payload
            data = {
                'siteUrl': data_host,
                "urlList": yesterday_url_list
            }
            # the POST must happen inside this loop, otherwise the list
            # will already have been reset by the next iteration
            r = requests.post(url, json=data, headers=headers)
            print(r.text)  # {"d":null} means success
    elif len(res) >= 10000:
        # 10,000 rows or more: the daily quota caps us at 20 batches of 500
        for i in range(20):
            yesterday_url_list = []  # reset the list for every batch
            number_search_sql = f"SELECT number FROM `{table_name}` WHERE `up_date` LIKE '{getYesterday()}%' LIMIT {limit_amount} OFFSET {offset_amount*i}"
            cursor.execute(number_search_sql)
            res = cursor.fetchall()
            for number in res:
                yesterday_url_list.append(data_prefix_url + number[0])
            data = {
                'siteUrl': data_host,
                "urlList": yesterday_url_list
            }
            r = requests.post(url, json=data, headers=headers)
    else:  # 500 rows or fewer: a single submission is enough
        yesterday_url_list = []
        number_search_sql = f"SELECT number FROM `{table_name}` WHERE `up_date` LIKE '{getYesterday()}%' LIMIT {limit_amount}"
        cursor.execute(number_search_sql)
        res = cursor.fetchall()
        for number in res:
            yesterday_url_list.append(data_prefix_url + number[0])
        data = {
            'siteUrl': data_host,
            "urlList": yesterday_url_list
        }
        r = requests.post(url, json=data, headers=headers)
else:
    print("Nothing was updated yesterday!")
Next, query the remaining submission quota and convert it into a number of offset steps:

# query the quota again; dividing by 500 gives the number of batches
# we can still submit, and anything under a full batch is discarded
quota_left = search_bing_quota() or 0  # treat a failed quota lookup as 0
remain_times = int(quota_left / 500)
print(f"Remaining submission batches today: {remain_times}")
If less than a full batch of 500 remains, it is simply discarded rather than submitted.

# if no full batches remain, there is nothing left to do
if remain_times != 0:
    # read the offset the previous run stopped at
    with open("last_time_offset_index.txt", 'r', encoding="utf-8") as fp:
        x = fp.read().split(",")
        right_index = int(x[1])
    print(f"right_index: {right_index}")
    # advance through as many offsets as we have batches left
    offset_range = range(right_index, right_index + remain_times)
    try:
        for i in offset_range:
            urlList = []  # list the URLs are assembled into
            sql_offset = f"""SELECT number FROM {table_name} LIMIT {limit_amount} OFFSET {offset_amount*i}"""
            cursor.execute(sql_offset)
            res = cursor.fetchall()
            for number in res:
                urlList.append(data_prefix_url + number[0])
            # assemble the JSON payload
            data = {
                'siteUrl': data_host,
                "urlList": urlList
            }
            # as with the GET request, r is the response object
            r = requests.post(url, json=data, headers=headers)
    except Exception as err:
        print("something wrong!")
        print(err)
    finally:
        # close the cursor and the connection
        cursor.close()
        conn.close()
    # record this run's offsets for the next run to read
    with open("last_time_offset_index.txt", 'w', encoding="utf-8") as fp:
        fp.write(f"{right_index},{right_index + remain_times}")
Since the script is meant to run unattended on a server, it has to remember how far the offset got each day. My approach is a small file, last_time_offset_index.txt, holding the previous day's index pair, e.g. 79,89. After every run the script rewrites this pair so the next run knows where to resume; with 79,89 in the file, the next run starts paging from offset index 89.

There is still plenty that could be improved: wrapping the logic into functions or an object-oriented design; the >= 10,000 branch never closes the cursor or the database connection (I know the data well enough to say that case is very unlikely to occur); the offset file really only needs to store a single number; and the leftover sub-500 quota need not be wasted. For lack of time, I'll stop here.
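As a rough sketch of the first two of those improvements (a shared submission helper and a single-number offset file), assuming the url, headers, and data_host globals defined above; the helper names are my own:

def submit_batch(url_list):
    # submit one batch of at most 500 URLs; returns the response text
    data = {'siteUrl': data_host, "urlList": url_list}
    r = requests.post(url, json=data, headers=headers)
    return r.text  # {"d":null} means success

def read_offset(path="last_time_offset_index.txt"):
    # a single integer is enough to know where to resume
    with open(path, 'r', encoding="utf-8") as fp:
        return int(fp.read().strip())

def write_offset(index, path="last_time_offset_index.txt"):
    with open(path, 'w', encoding="utf-8") as fp:
        fp.write(str(index))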
References:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.bing.webmaster.api.interfaces.iwebmasterapi.submiturlbatch?view=bing-webmaster-dotnet
https://docs.microsoft.com/en-us/bingwebmaster/getting-started