Install Scrapy on OSX [SYN:gejoin.com]

2016/01/09 by GIGI WANG

Note: this post is synced from my site [SYN:gejoin.com]

If all goes well, one command does the job:
sudo pip install Scrapy
On OS X, though, a few extra packages have to be installed or upgraded, including:

cssselect, queuelib, six, w3lib, lxml, Twisted, characteristic, pyasn1, pyasn1-modules, service-identity

Fortunately, pip or easy_install can install these for you automatically, but one of them causes trouble:

Found existing installation: six 1.4.1
DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
Uninstalling six-1.4.1:

…

OSError: [Errno 1] Operation not permitted: '/tmp/pip-qeBchm-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'

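Clearly the upgrade of six failed. Running, with sudo,
pip install six --upgrade or pip uninstall six
gives the same error, whatever the user and whatever the privileges. Installing from a downloaded package would presumably fail the same way. My first conclusion was that this is all pip's fault, although the path in the error sits under /System, which OS X 10.11 guards with System Integrity Protection even against root, so pip is probably not the real culprit.
Fine, let's try easy_install instead.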

sudo easy_install --upgrade six
Searching for six
Reading https://pypi.python.org/simple/six/
Best match: six 1.10.0
Downloading https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55
Processing six-1.10.0.tar.gz

Installed /Library/Python/2.7/site-packages/six-1.10.0-py2.7.egg
Processing dependencies for six
Finished processing dependencies for six

Now carry on and install Scrapy with easy_install:

sudo easy_install Scrapy

It looked like it was going smoothly...
But then...

In file included from src/lxml/lxml.etree.c:323:
src/lxml/includes/etree_defs.h:14:10: fatal error: 'libxml/xmlversion.h' file not found
#include "libxml/xmlversion.h"
^
1 error generated.
Compile failed: command 'cc' failed with exit status 1
/tmp/easy_install-U7v3Lb/lxml-3.5.0/temp/xmlXPathInitxO27oS.c:1:10: fatal error: 'libxml/xpath.h' file not found
#include "libxml/xpath.h"
^
1 error generated.

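It looks like the libxml2 headers have to be installed separately; installing the Xcode command line tools provides them: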

xcode-select --install

After that, the installation completes without trouble:

sudo pip install Scrapy
OR
sudo easy_install Scrapy
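
To confirm which of the dependencies listed earlier are actually visible to the interpreter, and from which directory each would be loaded, something like the following can help. It is only a sketch: it assumes Python 3 (the post itself used the system Python 2.7), and the import names in SCRAPY_DEPS are my assumption of how the pip package names map to module names.

```python
import importlib.util

# Import names assumed for the packages listed in the post. A package's pip
# name and its import name can differ (service-identity -> service_identity).
SCRAPY_DEPS = [
    "cssselect", "queuelib", "six", "w3lib", "lxml", "twisted",
    "characteristic", "pyasn1", "pyasn1_modules", "service_identity",
]


def find_modules(names):
    """Map each top-level module name to the file it would load from, or None."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)  # locates the module without importing it
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    for name, origin in find_modules(SCRAPY_DEPS + ["scrapy"]).items():
        print("%-18s %s" % (name, origin or "MISSING"))
```

A six that still resolves to a path under /System/Library/Frameworks/... would suggest the old bundled copy is being picked up ahead of the upgraded one.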

Fedora 20 is officially released: what does it bring?

2013/12/18 by GIGI WANG

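Fedora 20 was officially released on December 17, 2013. As we all know, Fedora is a Linux-based operating system and a community project sponsored by Red Hat. The four foundations of freedom, features, friends, and first are the core values of the Fedora community. After more than ten years of development, Fedora has become one of the best-known distributions.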

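So what does Fedora 20 bring? According to the official Fedora 20 release notes, there are updates aimed at system administrators, desktop users, developers, and hobbyists with specific interests.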

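Most of these are upgrades to applications and tools, plus additional application resources.
For developers, there is the new Developer Assistant, Perl is upgraded to 5.18, python-setuptools is updated, and so are GCC and some web development resources.
For desktop users, the desktop looks flashy but, honestly, there is no major update; having so many Linux desktops is already of doubtful value. Administration has changed little. As for 3D printing, I don't really understand it, and the more I say the more likely I am to get it wrong.
More to be added...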

Reading and writing Excel with Python

2013/07/12 by GIGI WANG

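Weekly, daily, and monthly reports are the bane of every IT grunt: after a busy day you still get chased for the weekly report... Luckily, our weekly report data is exported from a project management platform (similar to Mantis). A predecessor had already written a Perl script that filters the exported Excel data and generates an Excel document in the required format.

Everyone is lazy... so the job of exporting the weekly report, plus maintaining the code, was handed to me.
The whole procedure I have to follow is: log in to the site (username/password) -> list page -> enter filter conditions -> export as || -> process the exported file with Perl -> submit the weekly report -> ...
If the manual steps before "export as" could be skipped too, it really would be a one-step job. Since I'm not yet familiar with Perl, I rewrote all the steps in Python. So let's get straight to it:
Analysis: the modules are logging in to the website, downloading the file, and exporting to Excel.
The first two are simple, and there is plenty of material online:
1. Login. Because the download happens after logging in, we need to imitate a browser and use cookies. Here is my test demo:
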
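# Python 2 (in Python 3 these modules are http.cookiejar, urllib.parse and urllib.request)
import cookielib
import urllib
import urllib2

def get_srcfile(begindt,fridaydt):
    # Keep cookies between requests so the login session is reused
    cj= cookielib.CookieJar()
    opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent','Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
    # Alternative set of form fields; not used by the request below
    params = urllib.urlencode({'username':'username',
                               'password':'passwd',
                               'Cookies_Time': 1,
                               'IsLogin':True})
    #  with cookies
    login_page='http://www.xxx.com/login'
    login_data = urllib.urlencode({u"username":u'wangzhe2',u"password":u'wangzhe2'})
    # Passing data makes this a POST of the login form
    opener.open(login_page,login_data)
    # ......
    opener.close()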
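
For readers on Python 3, the same idea can be sketched as below. This is not the code I ran at the time; the function names (make_session, encode_login, login) are made up for illustration, and the URL is the same placeholder as in the demo above.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://www.xxx.com/login"  # placeholder URL, as in the demo above


def make_session():
    """Return (opener, cookie_jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible; MSIE 7.0; Windows NT 5.1)")]
    return opener, jar


def encode_login(username, password):
    """URL-encode the form fields; urllib.request expects POST data as bytes."""
    fields = {"username": username, "password": password}
    return urllib.parse.urlencode(fields).encode("utf-8")


def login(opener, username, password, url=LOGIN_URL):
    """POST the credentials; later opener.open() calls reuse the session cookie."""
    with opener.open(url, encode_login(username, password)) as response:
        return response.getcode()
```

The important part is that every later request goes through the same opener, so the session cookie set at login is sent along with the export request.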

2. Downloading the file:
To keep this short, see: http://outofmemory.cn/code-snippet/83/sanzhong-Python-xiazai-url-save-file-code
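
In case that link goes away, here is a minimal Python 3 sketch of the download step. The helper name download and its opener parameter are my own; pass the cookie-aware opener used for the login so the request carries the session.

```python
import shutil
import urllib.request


def download(url, dest_path, opener=None):
    """Stream the response body of `url` into `dest_path` and return the path."""
    # Use the logged-in opener when given, otherwise a plain urlopen.
    open_url = opener.open if opener is not None else urllib.request.urlopen
    with open_url(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return dest_path
```

Copying in chunks keeps memory use flat even when the exported file is large.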

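3. Read the downloaded file and write it into the generated Excel file
The xlwt, xlrd, and xlutils packages have to be mentioned here.
xlrd: reads Excel files; as the name suggests, xls read. URL: https://pypi.python.org/pypi/xlrd
xlwt: writes Excel files (xls write) and can control the formatting of cells. URL: https://pypi.python.org/pypi/xlwt
xlutils: xlrd and xlwt make it easy to read and to generate Excel files, but modifying an Excel file that already exists is a problem. xlutils (which depends on xlrd and xlwt) provides functions for copying the contents of an Excel file and modifying the file.
Download: https://pypi.python.org/pypi/xlutils
For details see the documentation: http://www.python-excel.org/ http://pythonhosted.org/xlutils/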

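I did hit a problem in practice: I need to work with Office 2007+ xlsx files, and I need Excel's data validation and drop-down lists (Data validation and drop down lists). So the problem is obvious: these packages won't do.
I had to pick another approach, and Google came up with the best answer: XlsxWriter!
https://pypi.python.org/pypi/XlsxWriter
From its description, XlsxWriter can create xlsx files: it writes text, numbers, formulas, dates, and other formats to cells, and supports cell formatting, charts, merged cells, and so on. Most importantly, it meets the need for data validation and drop-down lists.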

XlsxWriter is a Python module for creating Excel XLSX files.
XlsxWriter supports the following features:

100% compatible Excel XLSX files.
Write text, numbers, formulas, dates to cells.
Write hyperlinks to cells.
Full cell formatting.
Multiple worksheets.
Charts.
Page setup methods for printing.
Merged cells.
Defined names.
Autofilters.
Data validation and drop down lists.
Conditional formatting.
Worksheet PNG/JPEG images.
Rich multi-format strings.
Cell comments.
Document properties.
Worksheet cell protection.
Freeze and split worksheet panes.
Worksheet Tables.
Sparklines.
Outlines and Grouping.
Memory optimisation mode for writing large files.
Standard libraries only.
Python 2.6, 2.7, 3.1, 3.2 and 3.3 support.
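
As a rough illustration of the "Data validation and drop down lists" feature, the sketch below writes a small sheet whose second column only accepts values from a fixed list. The sheet name, the column headings, and the status values are invented for the example; they are not the real weekly report format.

```python
import xlsxwriter  # third-party: pip install XlsxWriter


def write_report(path, rows, statuses):
    """Write (title, status) rows and restrict the status column to `statuses`."""
    workbook = xlsxwriter.Workbook(path)
    sheet = workbook.add_worksheet("report")
    sheet.write_row(0, 0, ["Task", "Status"])  # hypothetical header row
    for i, (title, status) in enumerate(rows, start=1):
        sheet.write(i, 0, title)
        sheet.write(i, 1, status)
    if rows:
        # Range is (first_row, first_col, last_row, last_col), zero-based;
        # 'list' validation gives the cells an in-cell drop-down in Excel.
        sheet.data_validation(1, 1, len(rows), 1,
                              {"validate": "list", "source": list(statuses)})
    workbook.close()  # the file is only written out on close()
    return path
```

Calling write_report("weekly.xlsx", [("Fix login bug", "open")], ["open", "resolved", "closed"]) would produce a file in which B2 offers those three choices.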

Problems compiling against the Python/C API on Linux

2012/07/03 by GIGI WANG

When compiling a program that uses the Python/C API on Linux, I ran into an error like this:

 undefined reference to `Py_Initialize'

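Of course, on Windows with an IDE you may never see this. But what about Linux or Unix, where you write the Makefile yourself?
My guess was that some libraries were missing, and Google does turn up the answer. This post gives the cause and the fix, with a little extra chatter for those who have only just met the Python/C API. It is not a so-called technical article, just a reference for the many people stepping through Python's door who hit this kind of problem. I'm a novice too, of course...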

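Installing Python on Linux: current distributions usually ship with Python already, but for reasons such as the version you may need to install it yourself, in which case I recommend building from source:

Download the version you need from http://python.org/, unpack it, and cd into the unpacked python(X.X.X) directory.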

# ./configure
#  make
#  make install

That completes the build and install. Type python in a terminal:

Python 2.7.3 (default, Jul  3 2012, 18:01:45) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

OK, now let's try the Python/C API. There are plenty of examples online; take one of the simplest and save it as main.c:

//This is A sample.
#include "Python.h"

int main()
{
        Py_Initialize();
        printf("This is a C-Python Program.\n");
        PyRun_SimpleString("print(\"Hello,Python\")");
        Py_Finalize();
        return 0;
}

Write a Makefile. Python was installed to the default locations, so:

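ALL	= ./tc
CC	= gcc
RM	= rm -f
INCL	= /usr/local/include/python2.7
# Header search path, used when main.c is compiled to main.o
CFLAGS	= -I$(INCL)
# libpython first, then the system libraries it depends on
LIBS	= -lpython2.7 -lpthread -lm -ldl -lutil

OBJ	= main.o
all:$(ALL)

./tc : $(OBJ)
	$(CC) $(OBJ) -L/usr/local/lib $(LIBS) -o $@

clean:
	$(RM) $(OBJ)
	$(RM) $(ALL)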

Right, build and run:

This is a C-Python Program.
Hello,Python

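And the problem from the start of the post? I haven't forgotten it. It happens when libraries are left out at link time: -lpython2.7 itself and the libraries it depends on: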

-lpthread -lm -ldl -lutil

Don't forget them. Also, mind the path to Python.h, with a capital P!
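
Rather than typing the include path and library list by hand, you can ask the interpreter what it was built with. The sketch below uses the standard sysconfig module; the helper name embed_flags is my own, and the exact values differ per system (some of the config variables are empty on non-Unix platforms). It gives roughly the information that python-config --cflags and python-config --ldflags print.

```python
import sysconfig


def embed_flags():
    """Return compiler and linker flags for embedding this interpreter."""
    var = sysconfig.get_config_var
    include_dir = sysconfig.get_paths()["include"]  # directory holding Python.h
    ldflags = []
    if var("LIBDIR"):
        ldflags.append("-L" + var("LIBDIR"))
    # LDVERSION is the suffix of the libpython name; fall back to VERSION.
    ldflags.append("-lpython" + (var("LDVERSION") or var("VERSION") or ""))
    # System libraries that libpython itself was linked against.
    ldflags += (var("LIBS") or "").split() + (var("SYSLIBS") or "").split()
    return {"cflags": "-I" + include_dir, "ldflags": " ".join(ldflags)}


if __name__ == "__main__":
    flags = embed_flags()
    print("CFLAGS :", flags["cflags"])
    print("LDFLAGS:", flags["ldflags"])
```

Run it with the same interpreter you intend to embed, then paste the two lines into the Makefile's INCL/CFLAGS and LIBS settings.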

Gigi is here - actually, I'd rather not say

Tag: Python
