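I've recently been learning piano and often need sheet music, but most scores online are either low-resolution or require a VIP membership for the clear version. So I looked into one sheet-music site and wrote a scraper that fetches the scores normally downloadable only by VIP members and assembles them into a PDF.
You can try it out at http://123.206.217.190:8888
The Python 3.x code below runs directly on Windows; on Linux, make the changes noted in the comments.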
#coding:utf-8
import requests
from bs4 import BeautifulSoup
import os
import sys
import io
from PIL import Image
from reportlab.lib.pagesizes import A4, landscape
from reportlab.pdfgen import canvas
import time
import random
sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8')
# Enter the URL of the score you want from tan8 (弹琴吧)
# Convert the URL to its mobile version
req = requests.Session()
url = ""
# url = "http://www.tan8.com/yuepu-58546.html"
imgdir = "tmpimgtan8/"
state = True
while state:
    url = input("Enter the tan8 piano score URL:\n")
    if url.find("-m.html") == -1:
        url = url.replace(".html", "-m.html")
    if url.find("-m.html") == -1:
        print("Please enter a valid URL")
    else:
        state = False
# Temporary directory for the downloaded page images
if not os.path.exists(imgdir):
    os.mkdir(imgdir)
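# Fetch the page and parse out the MP3 and image URLs
# Save the MP3 and the images
resp = req.get(url)
soup = BeautifulSoup(resp.text, "lxml")
# On Windows the Chinese page title can be used as the file name
title = soup.find_all("title")[0].text.replace(" ", "").replace("/", "")
# On Linux, use the random number below as the file name instead
# title = str(int(random.random()*8999)+1000)
mp3 = soup.find_all("source")[0]["src"]
mreq = req.get(mp3)
print(title)
# The with block closes the file, so no explicit close() is needed
with open(title + ".mp3", "wb") as f:
    f.write(mreq.content)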
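picul = soup.find_all("ul", {"class": "swiper-wrapper"})[0]
images = picul.find_all("img")
for i in images:
    imgurl = req.get(i['src'])
    # Name each file after the last two dot-separated parts of its URL
    with open(imgdir + ".".join(i['src'].split(".")[-2:]), "wb") as f:
        f.write(imgurl.content)
# os.listdir() returns names in arbitrary order; sort so the page order is deterministic
files = sorted(os.listdir(imgdir))
if "Thumbs.db" in files:
    files.remove("Thumbs.db")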
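# Combine the images into a PDF
f_pdf = title + ".pdf"
(w, h) = landscape(A4)
# Swapping the landscape dimensions back gives a portrait A4 page
c = canvas.Canvas(f_pdf, pagesize=(h, w))
for file in files:
    # Stretch each image over the full page, one image per page
    c.drawImage(imgdir + file, 0, 0, h, w)
    c.showPage()
    os.remove(imgdir + file)
c.save()
try:
    os.rmdir("tmpimgtan8")
except OSError:
    print("Please delete tmpimgtan8 manually")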
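The URL handling at the top of the script can also be pulled out into a small function, which makes it easy to check without touching the network. This is only a sketch: the name `to_mobile_url` is hypothetical, and it assumes that every valid score URL ends in `.html` and that the mobile page is the same path with `-m` before the extension, as in the script above. Unlike the script, it only rewrites the suffix rather than every occurrence of `.html`.

```python
def to_mobile_url(url):
    """Return the '-m.html' form of a score URL, or None if it does not look like one.

    Hypothetical helper: assumes score pages end in '.html' and that the
    mobile version lives at the same path with '-m' before the extension.
    """
    url = url.strip()
    if url.endswith("-m.html"):
        # Already the mobile form; leave it untouched
        return url
    if url.endswith(".html"):
        # Rewrite only the trailing extension
        return url[:-len(".html")] + "-m.html"
    return None


print(to_mobile_url("http://www.tan8.com/yuepu-58546.html"))
```

The input loop could then keep asking until the function returns something other than `None`.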
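One thing to watch when building the PDF is page order: the images are read back with `os.listdir()`, which makes no ordering promise, and a plain alphabetical sort would put `10.png` before `2.png`. Below is a minimal sketch of a natural sort, assuming the saved file names contain a numeric page index. That naming is an assumption, not something confirmed about the site, and `natural_key` and the sample names are made up for illustration.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as integers, so '10.png' follows '2.png'."""
    # re.split with a capturing group alternates text and digit chunks,
    # always starting with a (possibly empty) text chunk.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


# Sample names for illustration only; real names depend on the image URLs
pages = ["10.png", "2.png", "1.png", "Thumbs.db"]
ordered = sorted((p for p in pages if p != "Thumbs.db"), key=natural_key)
print(ordered)
```

In the script, the same key could be passed to `sorted()` when listing `imgdir`.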
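Along the way, I also wrote a small web service around this.
You can find it under tan8/ at https://github.com/webgjc/blog.
Copyright notice: this is an original article. Please credit the source and author when reposting, do not use it commercially, and comply with the CC BY-NC-SA 4.0 license.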