最近学钢琴,也经常会用到曲谱,但网上大多数曲谱不清晰,或者清晰的要vip。因此研究下某曲谱网站,进行爬取vip才能下载的曲谱并组合为pdf。
可以在http://123.206.217.190:8888试用效果
下面的是python3.x代码,在window可直接本地运行,在linux做一些注释中的修改。
#coding:utf-8 import requests from bs4 import BeautifulSoup import os import sys import io from PIL import Image from reportlab.lib.pagesizes import A4, landscape from reportlab.pdfgen import canvas import time import random sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf-8') #输入弹琴吧所需琴谱的网址 #把网址变成手机访问的网址 req = requests.Session() url="" #url = "http://www.tan8.com/yuepu-58546.html" state=True while state: url = input("输入弹琴吧钢琴曲网址:\n") if url.find("-m.html")==-1: url = url.replace(".html","-m.html") imgdir = "tmpimgtan8/" if url.find("-m.html")==-1: print("请输入正确网址") else: imgdir = "tmpimgtan8/" state=False if not os.path.exists(imgdir): os.mkdir("tmpimgtan8") #爬下来解析出mp3,图片地址 #保存MP3,图片 resp = req.get(url) soup=BeautifulSoup(resp.text,"lxml") #windows可以用这个中文名做文件名 title = soup.find_all("title")[0].text.replace(" ","").replace("/","") #linux用下面的随机数做文件名 #title = str(int(random.random()*8999)+1000) mp3 = soup.find_all("source")[0]["src"] mreq = req.get(mp3) print(title) with open(title+".mp3","wb") as f: f.write(mreq.content) f.close() picul = soup.find_all("ul",{"class":"swiper-wrapper"})[0] images = picul.find_all("img") for i in images: imgurl = req.get(i['src']) with open(imgdir+".".join(i['src'].split(".")[-2:]),"wb") as f: f.write(imgurl.content) f.close() files=os.listdir(imgdir) if "Thumbs.db" in files: files.remove("Thumbs.db") #把图片连接成pdf f_pdf = title+".pdf" (w, h) = landscape(A4) c = canvas.Canvas(f_pdf, pagesize = (h,w)) for file in files: c.drawImage(imgdir+file,0,0,h,w) c.showPage() os.remove(imgdir+file) c.save() try: os.rmdir("tmpimgtan8") except: print("请手动删除 tmpimgtan8")
同时,还顺手写了个web服务的代码。
可以到https://github.com/webgjc/blog的tan8/查看。
版权声明:本文为原创文章,转载请注明出处和作者,不得用于商业用途,请遵守
CC BY-NC-SA 4.0协议。
赞赏一下