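Synology NAS Tutorial (11): Building a Personal Library with Docker (calibre-web)
For a better reading experience, visit the author's personal blog (勤奋的凯尔森同学) at http://www.huerpu.cc:7000
calibre-web is an excellent personal library application. Like Synology's Video Station, it can use a metadata scraper to organize your collection, and you can browse and read the books in it.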
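1. Download the calibre-web image
In the Synology Docker package, search for calibre and download the second result. I chose this one because it includes e-book format conversion.
Then wait for the image to finish downloading.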
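2. Configure the calibre-web startup parameters
On the Synology, create two folders, config and books, and mount them to /calibre-web/config and /books inside the container. If anything is unclear, the image description explains it. Both folders must be readable and writable by everyone. If that is awkward to do in the Synology interface, connect to the Synology over SSH and run chmod -R 777 on them.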
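If you would rather script this step, here is a minimal Python sketch of the same idea: create the two folders and open their permissions recursively, roughly what `chmod -R 777` does. The function name `prepare_library_dirs` is made up for this example, and the demo runs against a temporary directory instead of a real shared folder.

```python
import os
import tempfile
from pathlib import Path


def prepare_library_dirs(base, names=("config", "books"), mode=0o777):
    """Create the folders under `base` and open their permissions recursively."""
    base = Path(base)
    created = []
    for name in names:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)
        # Walk the tree so existing files and subfolders get the same mode,
        # similar to what `chmod -R 777` does.
        for root, _dirs, files in os.walk(folder):
            os.chmod(root, mode)
            for entry in files:
                os.chmod(os.path.join(root, entry), mode)
        created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the real shared folder (assumption).
    with tempfile.TemporaryDirectory() as demo_base:
        for folder in prepare_library_dirs(demo_base):
            print(folder, oct(folder.stat().st_mode & 0o777))
```

On the NAS you would pass the path of your own shared folder instead of the temporary directory.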
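Port mapping:
In the container's environment variables, add PUID and PGID. They are the user ID and group ID that the container runs as, which decides what it may do with the mounted folders. Here they correspond to the admin login account, so you can simply copy the values from the screenshot.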
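If you are unsure which numbers to enter, they can be looked up from the account name. The sketch below assumes the image follows the common convention where PUID and PGID are the numeric user ID and primary group ID of a local account. `puid_pgid` is a hypothetical helper name, and the demo queries `root` only because that account exists on practically every Linux system; on the NAS you would pass your own account name, such as `admin`.

```python
import pwd


def puid_pgid(username):
    """Return (PUID, PGID) for a local account: its uid and primary gid."""
    entry = pwd.getpwnam(username)  # raises KeyError if the account does not exist
    return entry.pw_uid, entry.pw_gid


if __name__ == "__main__":
    uid, gid = puid_pgid("root")
    print(f"PUID={uid}")
    print(f"PGID={gid}")
```

The `pwd` module is only available on Unix-like systems, so run this over SSH on the NAS, not on a Windows PC.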
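Copy an initial metadata.db file into the config folder created above. Mine is e-book/library/config.
If you don't know where to get metadata.db, install calibre on Windows 11. During setup it asks you to choose a library folder, and metadata.db is created inside that folder.
Once everything is in place, start the container. The default username and password are admin/admin123.
Database configuration: if you followed my steps and copied metadata.db into the config folder, enter /calibre-web/config here.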
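Before pointing calibre-web at the folder, it can help to confirm that the copied file really is a Calibre library database. The sketch below opens metadata.db read-only and checks for the `books` table that a Calibre library contains. `looks_like_calibre_library` is a hypothetical helper, and having a `books` table is a quick sanity check, not a full validation.

```python
import sqlite3
from pathlib import Path


def looks_like_calibre_library(folder):
    """Return True if `folder` holds a metadata.db containing a `books` table."""
    db_path = Path(folder) / "metadata.db"
    if not db_path.is_file():
        return False
    try:
        # Open read-only so the check never modifies the library.
        conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    except sqlite3.Error:
        return False
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'books'"
        ).fetchone()
        return row is not None
    except sqlite3.DatabaseError:
        return False  # the file exists but is not a valid SQLite database
    finally:
        conn.close()


if __name__ == "__main__":
    # Path as seen from inside the container, taken from the step above.
    print(looks_like_calibre_library("/calibre-web/config"))
```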
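Next, do some basic configuration of calibre-web: click the settings button in the upper-right corner, open the feature configuration page, and tick the option that enables uploads.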
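3. Configure the Douban metadata scraper
In the Synology Docker package, open a terminal in the calibre-web container and locate the scholar.py file. We will add the Douban .py file in the same directory.
Run the following commands: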
# Go up one directory
cd ../
# Look for scholar.py
find . -name scholar.py
# Change to the directory that contains scholar.py
cd ./app/cps/metadata_provider/
# Create a new file named Newdouban.py
vi Newdouban.py
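If `find` is not available in the container, the same lookup can be done from Python, which calibre-web already depends on. This is a small sketch and `find_provider_dir` is a made-up helper name. It returns the directory containing the first scholar.py it finds, which is where Newdouban.py should go.

```python
from pathlib import Path


def find_provider_dir(root, marker="scholar.py"):
    """Return the first directory under `root` that contains `marker`, or None."""
    for path in sorted(Path(root).rglob(marker)):
        if path.is_file():
            return path.parent
    return None


if __name__ == "__main__":
    # Search from the current directory, like `find . -name scholar.py`.
    print(find_provider_dir("."))
```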
The content of Newdouban.py is as follows. The file was copied from the original calibre image repository:
import re
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse, unquote
from lxml import etree
from functools import lru_cache
from cps.services.Metadata import Metadata

DOUBAN_SEARCH_JSON_URL = "https://www.douban.com/j/search"
DOUBAN_BOOK_CAT = "1001"
DOUBAN_BOOK_CACHE_SIZE = 500  # maximum number of cached book lookups
DOUBAN_CONCURRENCY_SIZE = 5  # number of search results fetched per query
DEFAULT_HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36'
}
PROVIDER_NAME = "New Douban Books"
PROVIDER_ID = "new_douban"

class NewDouban(Metadata):
    __name__ = PROVIDER_NAME
    __id__ = PROVIDER_ID

    def __init__(self):
        self.searcher = DoubanBookSearcher()
        super().__init__()

    def search(self, query, generic_cover=""):
        if self.active:
            return self.searcher.search_books(query)

class DoubanBookSearcher:
    def __init__(self):
        self.book_loader = DoubanBookLoader()
        self.thread_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix='douban_async')

    def calc_url(self, href):
        # The search result links are redirects; the real book URL is in the "url" parameter.
        query = urlparse(href).query
        params = {item.split('=')[0]: item.split('=')[1] for item in query.split('&')}
        url = unquote(params['url'])
        return url

    def load_book_urls(self, query):
        url = DOUBAN_SEARCH_JSON_URL
        params = {"start": 0, "cat": DOUBAN_BOOK_CAT, "q": query}
        res = requests.get(url, params, headers=DEFAULT_HEADERS)
        book_urls = []
        if res.status_code in [200, 201]:
            book_list_content = res.json()
            for item in book_list_content['items'][0:DOUBAN_CONCURRENCY_SIZE]:  # only the first few results, 5 by default
                html = etree.HTML(item)
                a = html.xpath('//a[@class="nbg"]')
                if len(a):
                    href = a[0].attrib['href']
                    parsed = self.calc_url(href)
                    book_urls.append(parsed)
        return book_urls

    def search_books(self, query):
        book_urls = self.load_book_urls(query)
        books = []
        # Fetch the detail pages in parallel and collect them as they finish.
        futures = [self.thread_pool.submit(self.book_loader.load_book, book_url) for book_url in book_urls]
        for future in as_completed(futures):
            book = future.result()
            if book is not None:
                books.append(book)
        return books

class DoubanBookLoader:
    def __init__(self):
        self.book_parser = DoubanBookHtmlParser()

    @lru_cache(maxsize=DOUBAN_BOOK_CACHE_SIZE)
    def load_book(self, url):
        book = None
        start_time = time.time()
        res = requests.get(url, headers=DEFAULT_HEADERS)
        if res.status_code in [200, 201]:
            # Log message: "book downloaded successfully from {url}, took {n} ms"
            print("下载书籍:{}成功,耗时{:.0f}ms".format(url, (time.time() - start_time) * 1000))
            book_detail_content = res.content
            book = self.book_parser.parse_book(url, book_detail_content.decode("utf8"))
        return book

class DoubanBookHtmlParser:
    def __init__(self):
        self.id_pattern = re.compile(".*/subject/(\\d+)/?")

    def parse_book(self, url, book_content):
        book = {}
        html = etree.HTML(book_content)
        title_element = html.xpath("//span[@property='v:itemreviewed']")
        book['title'] = self.get_text(title_element)
        share_element = html.xpath("//a[@data-url]")
        if len(share_element):
            url = share_element[0].attrib['data-url']
        book['url'] = url
        id_match = self.id_pattern.match(url)
        if id_match:
            book['id'] = id_match.group(1)
        img_element = html.xpath("//a[@class='nbg']")
        if len(img_element):
            cover = img_element[0].attrib['href']
            if not cover or cover.endswith('update_image'):
                book['cover'] = ''
            else:
                book['cover'] = cover
        rating_element = html.xpath("//strong[@property='v:average']")
        book['rating'] = self.get_rating(rating_element)
        elements = html.xpath("//span[@class='pl']")
        book['authors'] = []
        book['publisher'] = ''
        # The labels below are matched against the Chinese text of the Douban page.
        for element in elements:
            text = self.get_text(element)
            if text.startswith("作者"):  # author
                book['authors'].extend([self.get_text(author_element) for author_element in element.findall("..//a")])
            elif text.startswith("译者"):  # translator
                book['authors'].extend([self.get_text(author_element) for author_element in element.findall("..//a")])
            elif text.startswith("出版社"):  # publisher
                book['publisher'] = self.get_tail(element)
            elif text.startswith("出版年"):  # publication date
                book['publishedDate'] = self.get_tail(element)
            elif text.startswith("丛书"):  # series
                book['series'] = self.get_text(element.getnext())
        summary_element = html.xpath("//div[@id='link-report']//div[@class='intro']")
        book['description'] = ''
        if len(summary_element):
            book['description'] = etree.tostring(summary_element[-1], encoding="utf8").decode("utf8").strip()
        tag_elements = html.xpath("//a[contains(@class, 'tag')]")
        if len(tag_elements):
            book['tags'] = [tag_element.text.strip() for tag_element in tag_elements]
        book['source'] = {
            "id": PROVIDER_ID,
            "description": PROVIDER_NAME,
            "link": "https://book.douban.com/"
        }
        return book

    def get_rating(self, rating_element):
        # Douban rates out of 10; halve it to get a 5-point scale.
        return float(self.get_text(rating_element, '0')) / 2

    def get_text(self, element, default_str=''):
        text = default_str
        if len(element) and element[0].text:
            text = element[0].text.strip()
        elif isinstance(element, etree._Element) and element.text:
            text = element.text.strip()
        return text if text else default_str

    def get_tail(self, element, default_str=''):
        text = default_str
        if isinstance(element, etree._Element) and element.tail:
            text = element.tail.strip()
        return text if text else default_str
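One detail of the file above is worth a closer look: calc_url pulls the real book address out of a redirect link by splitting the query string by hand, which raises an IndexError if any parameter has no `=`. A possible alternative, shown here only as a sketch, is to let the standard library parse the query. `extract_target_url` is a hypothetical name, and the sample link, including its subject id, is made up for the demonstration.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_url(href):
    """Return the decoded `url` query parameter of a redirect link, or None."""
    params = parse_qs(urlparse(href).query)  # parse_qs also undoes the percent-encoding
    values = params.get("url")
    return values[0] if values else None


if __name__ == "__main__":
    # Hypothetical redirect link; the subject id is invented for the demo.
    sample = (
        "https://www.douban.com/link2/?url="
        "https%3A%2F%2Fbook.douban.com%2Fsubject%2F1234567%2F&query=python"
    )
    print(extract_target_url(sample))  # https://book.douban.com/subject/1234567/
```

Returning None for a link without a `url` parameter lets the caller skip that result instead of failing the whole search.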
The older provider file was douban.py, but it currently seems to have some problems:
import requests
from cps.services.Metadata import Metadata


class Douban(Metadata):
    __name__ = "Douban Books"
    __id__ = "douban"
    doubanUrl = "http://YOUR_NAS_IP:8085"
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36'
    }

    def search(self, query, generic_cover=""):
        if self.active:
            val = list()
            result = requests.get(self.doubanUrl + "/v2/book/search?q=" + query.replace(" ", "+"), headers=self.headers)
            for r in result.json()['books']:
                v = dict()
                v['id'] = r['id']
                v['title'] = r['title']
                v['authors'] = r.get('authors', [])
                v['description'] = r.get('summary', "")
                v['publisher'] = r.get('publisher', "")
                v['publishedDate'] = r.get('pubdate', "")
                v['tags'] = [tag.get('name', '') for tag in r.get('tags', [])]
                rating = r['rating'].get('average', '0')
                if not rating:
                    rating = '0'
                v['rating'] = float(rating) / 2
                if r.get('image'):
                    v['cover'] = r.get('image')
                else:
                    v['cover'] = generic_cover
                v['source'] = {
                    "id": self.__id__,
                    "description": self.__name__,
                    "link": "https://book.douban.com/"
                }
                v['url'] = "https://book.douban.com/subject/" + r['id']
                val.append(v)
            return val
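After pasting the Newdouban.py content into vi, press Esc, type :wq to save and exit, and restart the container.
The Douban scraper we added should now show up. I took the screenshot after everything was set up, so it already shows book covers:
4. Forward the calibre-web port on the router for external access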
Name | Protocol | External port | Internal IP address | Internal port |
---|---|---|---|---|
library-calibre | TCP | 8089 | 192.168.31.19 | 8089 |
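The name can be anything. Choose TCP as the protocol. The external port is the port used from the internet, the internal IP address is the Synology's LAN address, and the internal port is the port the container exposes on the host. To keep things simple, I set them all to 8089.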
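To check that the forwarding rule works, you can test whether the port accepts TCP connections. The following is a minimal sketch; `port_is_open` is a hypothetical helper name. The demo uses the LAN address and port from the table above, so to test from outside, run it on a different network and pass your public IP address or domain name.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # LAN address and port taken from the forwarding table above.
    print(port_is_open("192.168.31.19", 8089))
```

A True result only shows that something is listening on that port. Open the address in a browser to confirm that it is calibre-web.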
After that, you can reach your library from outside your home network.