MHT file image extraction tool (Chrome/Blink compatible)

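I previously wrote an MHT file parser, but every file it handled at the time had been generated by IE; I had never tested files saved by Chrome. Today I saw this feedback on GitHub: https://github.com/obaby/mht-image-extractor/issues/1. Files saved by QQ Browser could not be extracted, and files saved by Chrome crashed the tool outright. After downloading the attached files and parsing them, I found that their format differs from IE's. The file header now looks like this: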

From: <Saved by Blink>
Snapshot-Content-Location: https://mp.weixin.qq.com/s?__biz=MzU1NzQ3MTg5OQ==&mid=2247483652&idx=1&sn=a16979f8b088cb60fb63f210536d5288&chksm=fc3400f0cb4389e698a5a3ce1bf6a6ab3ff6f547bb4db409893850b0c502053d1fea40f70fda&sessionid=0&scene=126&subscene=0&clicktime=1599463540&enterid=1599463540&ascene=3&devicetype=android-28&version=27001237&nettype=ctnet&abtest_cookie=AAACAA%3D%3D&lang=zh_CN&exportkey=AUPVIV8Yt1hvPJ2dYKFWhvM%3D&pass_ticket=eTzcuEu%2BGavsf30E3HDErOhtb18ThPDhge008pIBzY7AFq0IuG1LUgojTpufwqUZ&wx_header=1
Subject: =?utf-8?Q?=E6=B1=89=E6=9C=8D=E4=B8=A8=E5=BD=BC=E5=B2=B8=E8=8A=B1=E5=BC=80?=
Date: Sun, 20 Sep 2020 00:50:44 -0000
MIME-Version: 1.0
Content-Type: multipart/related;
    type="text/html";
    boundary="----MultipartBoundary--Bx5ubV1DnfL8hvvsySfZL6MQeLa58tWkfwrQGpothO----"

Files saved by IE, by contrast, have a header in this format:

Content-Type: multipart/related; start=op.mhtml.1267442701515.fe60c16c115c15f9@169.254.195.209; boundary=----------pMKI1vNl6U7UKeGzbfNTyN
Content-Location: http://a.10xjw.com/feizhuliu/89905.html
Subject: =?utf-8?Q?=E8=B6=85=E7=BE=8E=E4=B8=9D=E6=8E=A7=E5=A7=90=E5=A6=B9=E8=8A=B1=E7=A7=92=E6=9D=80=E4=BD=A0=E6=B2=A1=E9=97=AE=E9=A2=98[26P]-=2037kxw.com=20-=20=E4=B8=AD=E5=9B=BD=E6=9C=80=E5=A4=A7=E7=9A=84=E8=89=B2=E6=83=85=E5=88=86=E4=BA=AB=E7=BD=91=E7=AB=99?=
MIME-Version: 1.0
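Both headers describe a MIME multipart/related message. They differ mainly in which headers appear: the Chrome/Blink file has Snapshot-Content-Location and a quoted, folded boundary, while the IE file has start= and an unquoted boundary. A minimal sketch, assuming Python 3's standard email package rather than the tool's actual parser, shows that a MIME-aware parser can read either layout, decode the RFC 2047 Subject, and pull out the image parts. The function names here are hypothetical.

```python
import email
from email import policy
from email.message import EmailMessage
from typing import List, Tuple


def _parse(mht_bytes: bytes) -> EmailMessage:
    # Chrome/Blink and IE MHT files are both MIME multipart/related messages,
    # so the stdlib parser copes with either header layout and boundary style.
    return email.message_from_bytes(mht_bytes, policy=policy.default)


def page_subject(mht_bytes: bytes) -> str:
    """Return the Subject header with =?utf-8?Q?...?= encoded words decoded."""
    return str(_parse(mht_bytes).get("Subject", ""))


def extract_images(mht_bytes: bytes) -> List[Tuple[str, bytes]]:
    """Return (Content-Location, raw bytes) for every image/* part."""
    images = []
    for part in _parse(mht_bytes).walk():
        if part.get_content_maintype() == "image":
            location = str(part.get("Content-Location", ""))
            # decode=True undoes base64 or quoted-printable transfer encoding.
            data = part.get_payload(decode=True)
            if data:
                images.append((location, data))
    return images
```

To extract from a real file, read it in binary mode (`open(path, "rb").read()`) and write each returned byte string to disk; deriving file names from Content-Location is left out of this sketch.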

Porn Data Analyze: Installing Spark

Spark uses Python 2 by default. To make it use Python 3 instead, edit .bashrc and add the following lines:

# anaconda
export ANACONDA_HOME=/home/dbuser/anaconda3
export PATH=$ANACONDA_HOME/bin:$PATH
# spark
export PYSPARK_PYTHON=/home/dbuser/anaconda3/bin/python3

Then restart pyspark and it will be running Python 3. The python under anaconda had also been version 2.
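PySpark reads the interpreter for its Python workers from the PYSPARK_PYTHON environment variable, and the driver and workers need to run matching Python versions. A minimal sketch for confirming that the .bashrc change took effect runs the configured interpreter and compares its major version with the current one. The helper names and the fallback to plain "python" when the variable is unset are assumptions for illustration, not PySpark's own code.

```python
import os
import subprocess
import sys
from typing import Tuple


def pyspark_python_major(fallback: str = "python") -> str:
    """Return the major version of the interpreter named by PYSPARK_PYTHON.

    The fallback used when the variable is unset is an assumption of this sketch.
    """
    exe = os.environ.get("PYSPARK_PYTHON", fallback)
    result = subprocess.run(
        [exe, "-c", "import sys; print(sys.version_info[0])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def driver_and_worker_majors() -> Tuple[str, str]:
    """Return (this interpreter's major version, PYSPARK_PYTHON's major version)."""
    return str(sys.version_info[0]), pyspark_python_major()
```

Run it from the same shell after `source ~/.bashrc`; if the two values differ, the export did not take effect in that environment.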


ModuleNotFoundError: No module named 'cryptography.hazmat.bindings._padding'

Running the script on Ubuntu produced the following error:

ubuntu@ip-172-31-11-253:~/nineuu_spider/Maomi$ python3 maomi.py
Traceback (most recent call last):
  File "maomi.py", line 8, in <module>
    from cryptography.hazmat.primitives import padding
  File "/usr/lib/python3/dist-packages/cryptography/hazmat/primitives/padding.py", line 13, in <module>
    from cryptography.hazmat.bindings._padding import lib
ModuleNotFoundError: No module named 'cryptography.hazmat.bindings._padding'

However, the cryptography module is in fact already installed:

To fix this error, simply install paramiko:

pip3 install paramiko
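The traceback shows cryptography being loaded from /usr/lib/python3/dist-packages, i.e. the system-wide copy. When a package is "installed" but its submodules fail to import, it helps to see which copy Python actually resolves. A minimal sketch using importlib.util.find_spec; the helper name module_origin is hypothetical.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if it can't be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # For dotted names, find_spec imports the parent packages first; a
        # missing parent surfaces here as ImportError (ModuleNotFoundError).
        return None
    return spec.origin if spec is not None else None
```

For example, `module_origin("cryptography")` before and after `pip3 install paramiko` shows whether the import now resolves to a different installation.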

OSX pip3 install mysqlclient

While working from home I moved my work environment to a Mac, and installing mysqlclient during setup failed with the following error:

(venv_home_mini) obaby@Obabys-Mac-mini taichigameserver % pip3 install mysqlclient
Collecting mysqlclient
  Using cached https://files.pythonhosted.org/packages/d0/97/7326248ac8d5049968bf4ec708a5d3d4806e412a42e74160d7f266a3e03a/mysqlclient-1.4.6.tar.gz
    Complete output from command python setup.py egg_info:
    /bin/sh: mysql_config: command not found
    /bin/sh: mariadb_config: command not found
    /bin/sh: mysql_config: command not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/gf/qbbv4crd5m9f1vkz5066dvyw0000gn/T/pip-install-m5gv0sbr/mysqlclient/setup.py", line 16, in <module>
        metadata, options = get_config()
      File "/private/var/folders/gf/qbbv4crd5m9f1vkz5066dvyw0000gn/T/pip-install-m5gv0sbr/mysqlclient/setup_posix.py", line 61, in get_config
        libs = mysql_config("libs")
      File "/private/var/folders/gf/qbbv4crd5m9f1vkz5066dvyw0000gn/T/pip-install-m5gv0sbr/mysqlclient/setup_posix.py", line 29, in mysql_config
        raise EnvironmentError("%s not found" % (_mysql_config_path,))
    OSError: mysql_config not found
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/gf/qbbv4crd5m9f1vkz5066dvyw0000gn/T/pip-install-m5gv0sbr/mysqlclient/

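Searching online suggested installing mysql-connector-c, but the error persisted after installing it. I then tried installing MySQL itself, which solved the problem. The real cause is that the mysql_config command cannot be found: if mysql_config runs successfully when executed directly in the terminal, creating a symlink to it is enough to fix the problem.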

(venv_home_mini) obaby@Obabys-Mac-mini taichigameserver % mysql_config
Usage: /usr/local/bin/mysql_config [OPTIONS]
Compiler: AppleClang 11.0.0.11000033
Options:
        --cflags         [-I/usr/local/Cellar/mysql/8.0.19/include/mysql ]
        --cxxflags       [-I/usr/local/Cellar/mysql/8.0.19/include/mysql ]
        --include        [-I/usr/local/Cellar/mysql/8.0.19/include/mysql]
        --libs           [-L/usr/local/Cellar/mysql/8.0.19/lib -lmysqlclient -lssl -lcrypto]
        --libs_r         [-L/usr/local/Cellar/mysql/8.0.19/lib -lmysqlclient -lssl -lcrypto]
        --plugindir      [/usr/local/Cellar/mysql/8.0.19/lib/plugin]
        --socket         [/tmp/mysql.sock]
        --port           [0]
        --version        [8.0.19]
        --variable=VAR   VAR is one of:
                pkgincludedir [/usr/local/Cellar/mysql/8.0.19/include/mysql]
                pkglibdir     [/usr/local/Cellar/mysql/8.0.19/lib]
                plugindir     [/usr/local/Cellar/mysql/8.0.19/lib/plugin]
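The failure comes down to a PATH lookup: the build script shells out to mysql_config, and per the log above it also tries mariadb_config. A minimal sketch, assuming you want to check this yourself before rebuilding, uses shutil.which for the lookup and os.symlink for the symlink fix. The helper names and the target directory (for example one already on PATH, such as /usr/local/bin) are assumptions, not part of mysqlclient.

```python
import os
import shutil
from typing import Iterable, Optional


def find_config_tool(
    candidates: Iterable[str] = ("mysql_config", "mariadb_config"),
) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def link_into_dir(src: str, bin_dir: str) -> str:
    """Create bin_dir/<basename of src> as a symlink to src unless it already exists."""
    dest = os.path.join(bin_dir, os.path.basename(src))
    if not os.path.lexists(dest):
        os.symlink(src, dest)
    return dest
```

If `find_config_tool()` returns None but you know where mysql_config lives, `link_into_dir(<that path>, <a directory on PATH>)` reproduces the symlink workaround; writing to system directories may require elevated permissions.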

To install MySQL:

brew install mysql

If Homebrew is not installed, install it with the following command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

ubuntu 18.04 pip3 install mysqlclient

I'm not sure whether this is an Alibaba Cloud issue or an Ubuntu issue, but today installing mysqlclient reported:

/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
collect2: error: ld returned 1 exit status
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

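Searching online turned up no reports of this exact error, so I changed tack and searched for the linker message itself, /usr/bin/ld: cannot find -lssl, and found the solution in this article: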

https://blog.51cto.com/eminzhang/1285705
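The flag -lssl makes ld look for an unversioned libssl.so (or the static libssl.a) in its library search path; a versioned runtime file such as libssl.so.1.1 on its own does not satisfy it. On Debian/Ubuntu the unversioned development files normally come from the libssl-dev package. A minimal sketch that checks for those files; the directory list is an assumption for a typical x86_64 Ubuntu system and is not ld's complete search path.

```python
import os
from typing import Iterable, List

# Assumed search directories for x86_64 Ubuntu; ld also searches any -L paths
# and its own built-in defaults, which this sketch does not query.
DEFAULT_LIB_DIRS = (
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib",
    "/lib/x86_64-linux-gnu",
    "/lib",
)


def missing_link_libs(
    libs: Iterable[str], dirs: Iterable[str] = DEFAULT_LIB_DIRS
) -> List[str]:
    """Return each name in libs for which neither lib<name>.so nor lib<name>.a exists in dirs."""
    dirs = list(dirs)
    missing = []
    for name in libs:
        candidates = (f"lib{name}.so", f"lib{name}.a")
        if not any(os.path.exists(os.path.join(d, c)) for d in dirs for c in candidates):
            missing.append(name)
    return missing
```

`missing_link_libs(["ssl", "crypto"])` returning both names matches the two "cannot find" lines in the error above.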


Korean beauty model crawler

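I'm hopelessly hooked on pictures of beautiful women 😆, and I often search for them, download them, and enjoy them later at my leisure. Mostly I use them as desktop and phone wallpapers, and I usually find them through Baidu or Bing. Relatively few images work as wallpapers straight away, mostly because their dimensions don't fit. But even when an image can't be used directly, I can resize it in Photoshop, or just enjoy it as it is. 🙂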

I once scraped some pictures from mzitu.com, but when I visited recently the site turned out to be down ~~
