Using the WordPress REST API plugin, you can easily get a JSON payload containing data from your blog.
If your blog is served over SSL, you will likely need to use Python 3, as it includes many relevant bug fixes.
First, request one page of posts from the API:
import urllib3
# Page 18 of the posts listing, 10 posts per page
url = 'https://www.garysieling.com/blog/wp-json/wp/v2/posts?per_page=10&page=18'
http = urllib3.PoolManager(10)
response = http.request('GET', url)
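The `per_page` and `page` query parameters control pagination, so other pages can be fetched by changing them. The helper below builds such a URL. It is a minimal sketch: the function name `posts_url` is hypothetical, and the 100-item cap reflects the WordPress REST API's documented maximum for `per_page`.

```python
from urllib.parse import urlencode

def posts_url(site, per_page=10, page=1):
    """Build a /wp-json/wp/v2/posts URL for one page of results (hypothetical helper)."""
    # The WordPress REST API rejects per_page values above 100
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    query = urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"
```

For example, `posts_url('https://www.garysieling.com/blog', 10, 18)` reproduces the URL used above.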
Then parse it as JSON:
import json
jsonData = response.data.decode('utf-8')
posts = json.loads(jsonData)
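If a request fails (for example, when `page` is past the last page), WordPress answers with an error status and a JSON object instead of a list of posts. The following sketch is a more defensive version of the parsing step. `parse_posts` is a hypothetical name; it would be called as `parse_posts(response.status, response.data)` with the urllib3 response.

```python
import json

def parse_posts(status, body):
    """Decode a posts response body, raising if the request did not succeed."""
    if status != 200:
        raise RuntimeError(f"request failed with HTTP {status}")
    posts = json.loads(body.decode("utf-8"))
    # A successful posts request returns a JSON array; error bodies are objects
    if not isinstance(posts, list):
        raise ValueError("expected a list of posts")
    return posts
```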
From there, the blog post text is readily available:
# posts[1] is the second post on this page
post = posts[1]["content"]["rendered"]
title = posts[1]["title"]["rendered"]
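The `rendered` fields are HTML, so titles can contain entities such as `&#8217;` for a curly apostrophe. A sketch that collects every post on the page and unescapes the titles with the standard library's `html.unescape` might look like this (the helper name is hypothetical):

```python
import html

def titles_and_bodies(posts):
    """Return (plain-text title, HTML body) pairs for every post on the page."""
    pairs = []
    for post in posts:
        # Rendered titles are HTML-escaped, e.g. &#8217; for a curly apostrophe
        title = html.unescape(post["title"]["rendered"])
        pairs.append((title, post["content"]["rendered"]))
    return pairs
```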
Then, you can strip out all the HTML tags (this solution is adapted from a Stack Overflow post):
from html.parser import HTMLParser
class MLStripper(HTMLParser):
    def __init__(self):
        # Initialize the base parser; convert_charrefs=True turns entities like &amp; into characters
        super().__init__(convert_charrefs=True)
        self.fed = []
    def handle_data(self, d):
        # Called for each run of text between tags
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)
def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any buffered trailing text
    return s.get_data()
clean = strip_tags(post)
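One limitation of this stripper is that it keeps the contents of `<script>` and `<style>` elements, which would then end up in the summarizer's input. The sketch below is a variant that skips those elements. The class and function names are hypothetical, and it still joins adjacent blocks without inserting a space.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```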
Unfortunately, the summarization library, PyTeaser, does not support Python 3. There is a patch for this, and you can install it directly from GitHub, like so:
pip install https://github.com/voneiden/PyTeaser/archive/py3.zip
Once it is installed, you can get a summary of the selected post ("Lessons Learned from 0 to 40,000 Readers"):
from pyteaser import Summarize
" ".join(Summarize(title, clean))
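PyTeaser is an extractive summarizer: it selects existing sentences rather than writing new ones, roughly by scoring each sentence on features such as overlap with the title and keyword frequency. The toy version below illustrates that idea. It is a hypothetical sketch with a deliberately simple scoring rule and naive sentence splitting, not PyTeaser's actual formula.

```python
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def toy_summarize(title, text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive split on sentence-ending punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Treat words longer than three letters as candidate keywords
    freq = Counter(w for s in sentences for w in tokenize(s) if len(w) > 3)
    title_words = set(tokenize(title))

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        title_hits = sum(1 for w in words if w in title_words)
        keyword_density = sum(freq[w] for w in words if len(w) > 3) / len(words)
        return title_hits + keyword_density

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Used like PyTeaser above, this would be `" ".join(toy_summarize(title, clean, n=5))`.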
This results in the following text, which gives a decent summary of the article:
'Since then, a bit over 40,000 people have read articles I’ve written, not a huge number in the grand scheme of things, but enough to draw a few lessons. Posts I’ve made received more votes, even though they are self posts, because they are at least relevant. In practice, I’ve written on wider subjects – anything within “full stack web development” is fair game, trying to focus on new, or popular tech – Scala, DevOps (Vagrant/Chef/Virtualization), Hadoop, R, and scraping. It’s the only thing I’ve written that seems to have received significant attention on Google+ (19 events on G+, 24 on Twitter). I’ve written several articles which have been posted to Twitter by 20+ people.'