<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.3" -->
<rss version="0.92">
<channel>
	<title>Dryice Liu's Blog</title>
	<link>http://dryice.name/blog</link>
	<description></description>
	<lastBuildDate>Tue, 31 Aug 2010 12:01:11 +0000</lastBuildDate>
	<docs>http://backend.userland.com/rss092</docs>
	<language>en</language>
	
	<item>
		<title>feedparser.text content type</title>
		<description>I need to change this line from


true_encoding = http_encoding or 'us-ascii'


to


true_encoding = http_encoding or xml_encoding or 'us-ascii'


for buggy sites that don't follow the standard: they serve a text/* content type without a charset, and declare their encoding only in the XML file itself.
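
A minimal sketch of that fallback order. The helper name pick_encoding and its default parameter are my own for illustration; feedparser does this inline and has no function by that name.

```python
def pick_encoding(http_encoding, xml_encoding, default='us-ascii'):
    """Return the first non-empty encoding, checked in priority order.

    pick_encoding is a hypothetical helper, not part of feedparser.
    """
    # Prefer the HTTP charset, then the XML declaration, then the default.
    return http_encoding or xml_encoding or default


# A text/* response with no charset: the XML declaration decides.
print(pick_encoding(None, 'gb2312'))
# An explicit HTTP charset still takes priority.
print(pick_encoding('utf-8', 'gb2312'))
# Neither source names an encoding, so the default applies.
print(pick_encoding(None, None))
```

Because the chain uses "or", an empty string is treated the same as a missing value.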

 </description>
		<link>http://dryice.name/blog/python/feedparsertext-content-type/</link>
			</item>
	<item>
		<title>feedparser.whitespace</title>
		<description>According to the XML spec (http://www.w3.org/TR/REC-xml/#NT-EncodingDecl), whitespace is allowed between the equals sign and the quotes of the encoding declaration. Here is a simple patch:


--- /usr/ports/textproc/py-feedparser/work/feedparser/feedparser.py.old	Sat Jul  2 16:17:11 2005
+++ /usr/ports/textproc/py-feedparser/work/feedparser/feedparser.py	Sat Jul  2 16:18:25 2005
@@ -2101,7 +2101,7 @@
else:
# ASCII-compatible
pass
-        xml_encoding_match = re.compile('^&#60;\?.*encoding=[\'"](.*?)[\'"].*\?&#62;').match(xml_data)
+      ...</description>
		<link>http://dryice.name/blog/python/feedparserwhitespace/</link>
			</item>
	<item>
		<title>feedparser.encoding</title>
		<description>It looks like feedparser was written for Python 2.3. As of Python 2.4, CJKCodecs is included in the official release. So the line


import cjkcodecs.aliases


should be changed to


import encodings.aliases
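
If the same file has to run on both versions, a guarded import is one alternative to a straight replacement. This is a sketch under that assumption, not what the patch above does; has_codec is a hypothetical helper used only to check the result.

```python
import codecs

try:
    # Python 2.3 with the separate CJKCodecs package installed.
    import cjkcodecs.aliases
except ImportError:
    # Python 2.4 and later: the CJK codecs ship with the standard library.
    import encodings.aliases


def has_codec(name):
    """Return True if the interpreter can find a codec called name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Check that a CJK codec is available after the import.
print(has_codec('gb2312'))
```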


 </description>
		<link>http://dryice.name/blog/python/feedparserencoding/</link>
			</item>
</channel>
</rss>

