My Feed is Private KTHXBYE
Posted by ark
This post has been superseded by my post on feedFixer.


Did you know that even if your site is completely blocked via robots.txt, once you publish a feed and someone subscribes to it in Bloglines or another web-based reader, all the content in that feed is considered fair game? (At least to Bloglines it is; see this excellent explanation from FeedBurner.) But don't worry, help is at hand: there are at least TWO standards for blocking people from indexing your RSS content. The leader appears to be Bloglines' own, http://www.bloglines.com/about/specs/fac-1.0, and there's also a W3C one that no one links to, so I'm guessing it's dead in the water: http://www.w3.org/TR/access-control/. Bloglines really shows why the W3C is generally unusable. The W3C doc is completely unreadable and has no simple examples: old and busted. The Bloglines doc is small and simple, with easy examples to follow: new hotness.

Sadly, Blogger.com, which I use to manage my blogs, doesn't let me turn this on for my feeds, so I had to write something that does it for me. Since I was changing my feeds anyway, I figured I might as well start using feedburner.com too, to get better stats on how many people are reading my blogs.

Hopefully this change will go completely unnoticed. I'm redirecting the old feed URLs via .htaccess with a RedirectPermanent, so I think feed readers should update their subscriptions and stop hitting the old URLs soon.

I wrote a small program to fix a feed and make it private. I present to you feedFixer: when you run it, it reads a feed from one file and writes the same feed (but with the private flags in it) to another file.
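For the curious, the tagging step could look roughly like the sketch below. This is not feedFixer's actual source, just an illustration of the idea. It assumes the "private flag" is the `<access:restriction relationship="deny"/>` element from the Bloglines spec linked above, placed under `<channel>` for RSS 2.0 or directly under `<feed>` for Atom, and it assumes an Atom 1.0 namespace for pretty output. The name `make_private` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace. An older Atom 0.3 feed would still be
# processed, it would just be serialized with generated prefixes.
ATOM_NS = "http://www.w3.org/2005/Atom"
# Namespace URI of the Bloglines Feed Access Control spec.
FAC_NS = "http://www.bloglines.com/about/specs/fac-1.0"

# Keep Atom as the default namespace and use "access:" for the FAC element.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("access", FAC_NS)


def make_private(feed_xml):
    """Return feed_xml with a deny restriction added (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0 keeps feed-level elements under <channel>; Atom uses the root.
    target = root.find("channel") if root.tag == "rss" else root
    if target is None:
        raise ValueError("RSS feed has no <channel> element")
    tag = "{%s}restriction" % FAC_NS
    # Only add the element once, so re-running on a fixed feed is harmless.
    if target.find(tag) is None:
        target.insert(0, ET.Element(tag, {"relationship": "deny"}))
    return ET.tostring(root, encoding="unicode")
```

Calling `make_private(open("atom.public.xml").read())` and writing the result to the private file would give the one-file-in, one-file-out behaviour described above. Readers that don't know the `access` namespace should simply ignore the extra element.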

I like how it's configured: you have a file, ~/.feedFixerrc, that is plain Python, and this file gets evaluated when feedFixer is run. An example file looks like this:

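# -*- Python -*-
# Note: os is not imported here; presumably feedFixer supplies it when it
# evaluates this file.
global BASE_DIR, FEEDS

# Base directory for the feed paths listed below.
BASE_DIR = os.path.expanduser("~/html")
# One (source, destination) pair per feed.
FEEDS = (("blog/atom.public.xml", "blog/atom.private.xml"),
)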
You can add many feeds. It won't run if the destination file is already newer than the source file. I just run it from a crontab every 10 minutes with this:

1-51/10 * * * * feedFixer >feedfixer-log.txt 2>&1
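
Here is a minimal sketch of how an rc file like the one above could be evaluated, and how the "skip feeds that are already up to date" check might work. It is an illustration, not feedFixer's real code. It assumes make-style semantics (a feed is rebuilt only when its destination is missing or older than its source), that the paths in FEEDS are relative to BASE_DIR, and that `os` is handed to the rc file by the loader. The function names `load_config`, `needs_update` and `stale_feeds` are hypothetical.

```python
import os


def load_config(path):
    """Evaluate a plain-Python rc file and return (BASE_DIR, FEEDS)."""
    # Assumption: the rc file uses os without importing it, so supply it here.
    namespace = {"os": os}
    with open(path) as handle:
        exec(compile(handle.read(), path, "exec"), namespace)
    return namespace["BASE_DIR"], namespace["FEEDS"]


def needs_update(source, destination):
    """True when destination is missing or older than source."""
    if not os.path.exists(destination):
        return True
    return os.path.getmtime(source) > os.path.getmtime(destination)


def stale_feeds(base_dir, feeds):
    """Yield absolute (source, destination) pairs that need regenerating."""
    for source, destination in feeds:
        src = os.path.join(base_dir, source)
        dst = os.path.join(base_dir, destination)
        if needs_update(src, dst):
            yield src, dst
```

With a check like this, a cron run every 10 minutes is cheap: most runs compare a couple of modification times and exit without rewriting anything.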
Strangely enough, this blog is the one whose content I actually want indexed and searchable. It's my other, personal blogs (me, baby, house) that I don't want indexed.
