Jane Austen, line by line every 15 minutes. The image you see is by William Blake of Mrs. Q. (Mrs. Harriet Quentin), an 1820 engraving based upon a portrait by Francois Huet Villiers from the British Museum.
Pure Austen
Wednesday, 01 January 2020 11:39 Hrs
❝ Jane Austen, line by line every 15 minutes.

In 2017 I needed a challenge. From this germ of an idea @pureausten was born.

Did The Bard really wear an earring?

Inspiration

I looked to "The Bard" for inspiration. [0] The twitter feed @iam_shakespear live tweets the entire works of Shakespeare, one line at a time, every ten minutes. All one hundred and twenty thousand lines of the Bard. Impressive.

Limitations

The problems with tweeting a book on twitter are numerous. The core challenges: find a collection of books that an audience might read; break them down into chunks of one hundred and twenty characters; then tweet at regular intervals. How do I tweet without a keyboard? [1]

Objectives

The key objectives I want to achieve through this art-enabled technology are brevity [2] and culture. Culture isn't something you can easily create or enforce with technology, and everybody deserves the right to read in peace. The Internet can be a hostile environment for people, so using ^social programming^ to create a positive culture is a priority.

Secondary objectives I will try to achieve:

  • Be regular and reliable
  • Be free from annoying distractions.
  • Have zero tolerance of abuse, jerks and dickheads [3]
  • Regular messaging of updates or technical problems
  • Transmit an author's entire catalogue
  • Reply in third person impersonal [4]
  • Limit following readers
  • Have fun

Who will read @pureausten if it's not fun?

The Boring Technical Bits

❝ Skip this section if you don't like computers.

Prior to starting, I needed to consider the technical infrastructure: networking, computing and software.

Networking

❝ 24/7 Internet access (Easy)

Access to the Internet is via my NBN router, which is on 24/7. The hardware is wired directly to the router. Networking is independent from both computing and software. If the power drops, I can resume on a back-up system using my laptop and mobile phone.

Computer

❝ A computer that is cheap, uses little power and is reliable (Easy)

I need access to a computer with:

  • Low cost
  • Low power requirement
  • Cheap and reliable data storage
  • Uses open source / free software infrastructure
  • Networking using Cat-5 cable
  • Reliable operation

A thing of beauty and simplicity

My choice in computing is the commodity RaspberryPi (RPi). I chose the RPi 3B+ with the BCM2837B0 chipset. [5],[6]

  • Cost: AUD $50 / GBP £35
  • Version: RPi/3B+
  • Power: Very low power requirements (5V DC, 3 Amps).
  • Processor: Very fast 64-bit quad-core ARM Cortex-A53, running at 1.4GHz.
  • Memory: 1GB RAM.
  • Software infrastructure: Raspbian Operating System.
  • Reliability: Low power requirements, no fan required thus able to run the machine 24 hours a day, seven days a week.
  • Storage: Commodity SD cards.
  • Networking: Gigabit Ethernet (cable), WiFi and Bluetooth (wireless)
  • Hackability: 4 x USB 2.0 ports

The RPi is the bee's knees of computing. Cheap, fast and programmable. Exactly what's required. Did I mention it's cheap?

Software

❝ I need a piece of software that parses books into tweets (Hard)

I need to create software that reads a text file and processes it into lines of fewer than one hundred and twenty characters. I need another piece of software that tweets those lines to twitter. Finally, I need code to act as a timer so that the tweets go out at regular intervals.

The following is required:

1 Reading: Input a book

2 Parsing: Process book into single lines of text

3 Communication: Tweet exactly one line

4 Timing: Tweet one line of text at fixed time intervals

5 When a book finishes, repeat the entire process.

Simple isn't it?

I started work on 17th August 2017, and it took about a week to complete. Apart from some minor adjustments, no changes to the code have been made since 2017.

I found step two the most difficult. Step two reads a single book from a file then shreds (parses) the book, paragraph by paragraph, into lines. Below is a simplified description of the process. A more detailed explanation can be found in Table One. [7]

a) STEP 1: Read book from a text file: 2 python files 1/3Kb

b) STEP 2: Process book paragraph by paragraph into a file of single lines: 7 python files, a shell script 1.5Kb

c) STEP 3: Tweet exactly one line, repeat until end of file: 2 python files, a shell script

d) STEP 4: At a fixed time interval, tweet a single line, one shell script

e) STEP 5: New book setup is a manual process
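
As a rough illustration of how STEP 3 and STEP 4 fit together, here is a minimal Python sketch. It is not the code behind @pureausten, where the timer is a shell script: the names `run`, `post` and `interval_seconds` are invented for the example, and `post` stands in for whatever actually sends a tweet.

```python
import time


def run(lines, post, interval_seconds=15 * 60, sleep=time.sleep):
    """Post each prepared line in order, pausing between posts."""
    sent = 0
    for line in lines:
        post(line)               # hypothetical: whatever actually sends the tweet
        sent += 1
        sleep(interval_seconds)  # wait for the next slot
    return sent


# Dry run: collect lines in a list instead of tweeting, and skip the waiting.
outbox = []
run(["line one", "line two"], post=outbox.append, sleep=lambda seconds: None)
print(outbox)
```

Passing `sleep` in as a parameter means the loop can be tried out without waiting fifteen minutes between lines.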

Formatting a book into a file of single lines, regardless of the line length, is a balancing act between brevity and readability. Most problems with the current software can be found in STEP 2, and this is where I will focus effort in the next major update.

Parsing 101

❝ Parse: intransitive verb. Computers. To analyze or separate (input, for example) into more easily processed components.

When I parse the text of a book, I first break the body of the text into paragraphs. Then I chop up (parse) each paragraph into words and punctuation. Let's take a look at the first two paragraphs of "Pride and Prejudice".

❝ It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

❝ However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters.

Imagine the parser chopping the first paragraph into words and placing them into individual boxes. This is what the result looks like.

['It', 'is', 'a', 'truth', 'universally', 'acknowledged,', 'that', 'a', 'single', 'man', 'in', 'possession', 'of', 'a', 'good', 'fortune,', 'must', 'be', 'in', 'want', 'of', 'a', 'wife.']
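
The two chopping steps can be sketched in a few lines of Python. This is an illustration only, not the @pureausten source: the function names `split_paragraphs` and `split_words` are made up for the example, and it assumes paragraphs are separated by blank lines.

```python
import re


def split_paragraphs(text):
    """Split a book into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_words(paragraph):
    """Chop a paragraph into words; punctuation stays attached to its word."""
    return paragraph.split()


book = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife.\n"
    "\n"
    "A second paragraph would follow here."
)

boxes = split_words(split_paragraphs(book)[0])
print(boxes)
```

Note that punctuation is not parsed separately here: "acknowledged," stays in one box, comma included, just as in the list above.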

The second step is to build a new line of text, one word at a time. Let us try this with the first paragraph, counting the number of characters that make up the sentence at the same time.

Char    Line
Count
-------------------------------------
002     It
005     It is
007     It is a
013     It is a truth
025     It is a truth universally
...     ...
...     boring computing stuff
...     ...
-------------------------------------
117     It is a truth universally acknowledged,
        that a single man in possession of
        a good fortune, must be in want
        of a wife.
-------------------------------------
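
The running count in the table above can be reproduced with a short Python sketch. This is an illustration, not the @pureausten source: `running_counts` is a made-up name, and the spaces between words are counted as characters.

```python
def running_counts(words):
    """Yield (character count, line so far) as each word is appended."""
    line = ""
    for word in words:
        line = word if not line else line + " " + word
        yield len(line), line


first = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

for count, line in running_counts(first.split()):
    print(f"{count:03d}\t{line}")
```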

Do you recognise this line?

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

Notice the line is less than one hundred and twenty characters in length? We can use this line as it is. Let's repeat the process with the second paragraph.

['However', 'little', 'known', 'the', 'feelings', 'or', 'views', 'of', 'such', 'a', 'man', 'may', 'be', 'on', 'his', 'first', 'entering', 'a', 'neighbourhood,', 'this', 'truth', 'is', 'so', 'well', 'fixed', 'in', 'the', 'minds', 'of', 'the', 'surrounding', 'families,', 'that', 'he', 'is', 'considered', 'as', 'the', 'rightful', 'property', 'of', 'some', 'one', 'or', 'other', 'of', 'their', 'daughters.']

Look at all these boxes. Do you think this paragraph has more than one hundred and twenty characters in it? The character count for the second paragraph is two hundred and thirteen, not counting spaces. [8] When I wrote the software for @pureausten, the maximum length of a line was one hundred and twenty characters, so if I naively break the paragraph at one hundred and twenty characters I get:

❝ However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the

I can do better. What if I look at the punctuation as well? If I parse the punctuation I will be able to clip the sentence at a punctuation mark, resulting in greater readability and a character count of less than one hundred and twenty.

❝ However little known the feelings or views of such a man may be on his first entering a neighbourhood,
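
One way to clip at punctuation, sketched in Python. This is an assumption about how such a step could work, not the actual @pureausten parser: `next_line`, `LIMIT` and `PUNCTUATION` are names made up for the example, and the character count here includes spaces.

```python
LIMIT = 120             # maximum characters per line, spaces included
PUNCTUATION = ",.;:!?"  # marks treated as safe places to clip


def next_line(words, limit=LIMIT):
    """Return (line, remaining words), preferring to clip at punctuation."""
    fitted, length = 0, 0
    for word in words:
        extra = len(word) + (1 if fitted else 0)  # +1 for the joining space
        if length + extra > limit:
            break
        length += extra
        fitted += 1
    if fitted == 0:
        fitted = 1  # a single over-long word still gets a line of its own
    cut = fitted
    if fitted < len(words):  # the paragraph did not fit: back up to punctuation
        for i in range(fitted, 0, -1):
            if words[i - 1][-1] in PUNCTUATION:
                cut = i
                break
    return " ".join(words[:cut]), words[cut:]


second = (
    "However little known the feelings or views of such a man may be on his "
    "first entering a neighbourhood, this truth is so well fixed in the minds "
    "of the surrounding families, that he is considered as the rightful "
    "property of some one or other of their daughters."
)

lines, words = [], second.split()
while words:
    line, words = next_line(words)
    lines.append(line)

for line in lines:
    print(len(line), line)
```

On the second paragraph this gives three lines, the first ending at "neighbourhood,". If no punctuation is found inside the limit, the sketch falls back to the plain word boundary.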

This is a simplistic example of the kinds of problems you face trying to parse large passages of text into readable tweets. Now you might say, "Why can't we just have 240 characters?" Technically you can. In my experience you get longer, harder to read passages of text and still run into problems. [9]

Launch

❝ How did I come up with the handle?

I've been a twitter user for a long time, [10] so I know what makes an easy-to-use handle. I wanted the author's name somewhere in the handle and I also wanted some way to distinguish my service from other Austen-focused sites.

A twitter handle like @SupercalifragilisticexpialidociousAusten is less effective than a short pithy one. I originally tried @QuintessentialAusten. Can you imagine typing that name every time you wanted to read Austen? I can. I'd curse every time I tried to spell ^Quintessential^. I settled on Pure.

@pureausten was launched on 23rd July, 2017. [11]

Authors

❝ Austen wasn't my first pick.

I didn't start with authors. My first effort involved live tweeting an ancestor going to war in WW1 in the Dardanelles. [12] Austen wasn't my first pick. When I started looking for authors, Kipling, London and Verne came to mind first. I'm a fan of JFK, [13] so I started tweeting historic speeches of John F. Kennedy. I chose Austen on a whim as I started the first run of Kipling and War of the Worlds.

Now I was running four sites, one manually, the others on automatic:

  • @SapperRenshaw: live tweeting an ancestor going to war in WW1.
  • A defunct Jules Verne site for Science Fiction.
  • @JFKSpoken: historic speeches of JFK.
  • @PureAusten: the novels of Jane Austen.

It was chaos, so I pruned Verne, stopped JFK and concentrated on @pureausten.

Future

❝ Free, readable and fun

I want to spend my future with @pureausten improving the quality of the text being tweeted.

Bug Fix One

What will the bug fixes do? Well, maybe I'll fix the section of code that chops a line at a period. At present it treats titles such as "Mr." as a terminator instead of continuing with "Mr. Darcy ..." as you would expect. Here is an example:

“I did not thoroughly understand what you were telling your brother,” cried Emma, “about your friend Mr.
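
One possible shape for this fix, as a Python sketch: keep a list of titles and refuse to treat them as the end of a sentence. The list `TITLES` and the function `ends_sentence` are hypothetical, not taken from the @pureausten code.

```python
# Hypothetical list of titles that end in a period but do not end a sentence.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Capt.", "Col.", "Rev.", "St."}


def ends_sentence(word):
    """True if the word looks like the end of a sentence rather than a title."""
    bare = word.strip("\"'“”‘’")  # ignore surrounding quotation marks
    return bare.endswith((".", "!", "?")) and bare not in TITLES


for word in ["Mr.", "friend", "wife.", "Darcy.”"]:
    print(word, ends_sentence(word))
```

A fixed list will miss titles it has never seen, so it would need extending as new books are added.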

Bug Fix Two

Here is another small but annoying example: stray spaces in a line of text.

the shocking rudeness of Mr. Darcy .

Can you see it? There is a space before the period. Annoying, trivial and fixable. Have you seen the "Pure Austen Technical Difficulties Test-Pattern" before?
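
The stray space could be tidied with a couple of regular expressions. A minimal Python sketch, assuming the only cleanup wanted is collapsing repeated spaces and removing a space before punctuation; `tidy_spaces` is a made-up name.

```python
import re


def tidy_spaces(line):
    """Collapse repeated spaces and remove any space before punctuation."""
    line = re.sub(r"\s+", " ", line).strip()
    return re.sub(r" ([.,;:!?])", r"\1", line)


print(tidy_spaces("the shocking rudeness of Mr. Darcy ."))
# the shocking rudeness of Mr. Darcy.
```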

Bug Fix Three

The Pure Austen Technical Difficulties Test-Pattern.

The Pure Austen "Technical Difficulties Test-Pattern" appears when twitter tells me there is a duplicate tweet. Most times this is reported as an HTTP 403 error. It is annoying because tweeting halts, sometimes for hours, before I notice. Fixing this defect is a priority.
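
A sketch of one way to stop a duplicate from halting the feed: catch the error, note the line, and move on. `DuplicateTweet`, `post_all` and `fake_post` are hypothetical names; how a real twitter client reports the 403 will differ and is not shown here.

```python
class DuplicateTweet(Exception):
    """Hypothetical error standing in for twitter's duplicate (403) response."""


def post_all(lines, post):
    """Post every line; record and skip duplicates instead of halting."""
    skipped = []
    for line in lines:
        try:
            post(line)
        except DuplicateTweet:
            skipped.append(line)  # note it and carry on with the next line
    return skipped


# Stand-in for the real posting call: rejects any line it has seen before.
seen = set()


def fake_post(line):
    if line in seen:
        raise DuplicateTweet(line)
    seen.add(line)


print(post_all(["CHAPTER I.", "It is a truth", "CHAPTER I."], fake_post))
```

Skipping silently would leave holes in the text, so a real fix might alter the duplicate line slightly and retry rather than drop it.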

Questions?

If you have any questions, contact me on twitter at peterrenshaw or email me at quintessential at seldomlogical com.

Reference

[0] William Shakespeare, 'This was long thought to be the only portrait of William Shakespeare that had any claim to have been painted from life, until another possible life portrait, the Cobbe portrait, was revealed in 2009. The portrait is known as the 'Chandos portrait' after a previous owner, James Brydges, 1st Duke of Chandos. It was the first portrait to be acquired by the National Portrait Gallery in 1856. The artist may be by a painter called John Taylor who was an important member of the Painter-Stainers' Company.'

The complete works of William Shakespeare are tweeted from the twitter account @iam_shakespear, which has served as an inspiration and template for @pureausten.

[1] More problems? Where do I find books in a digital form? What about copyright? How difficult is it to break down a book into legible tweets? At the time I wrote the code, I worked to a tweet limit of one hundred and twenty characters. This impacts the choice of computing hardware.

[2] Brev•i•ty n. Concise expression; terseness.

[Last Accessed Wednesday 1st January, 2020]

wordnik.com/words/brevity

[3] Dick•head n. vulgar, colloquial, pejorative, coarse A jerk; a mean or rude person.

[Last Accessed Wednesday 8th January, 2020]

wordnik.com/words/dickhead

[4] This isn't about me, it's about the subject matter.

[5] RaspberryPi. A cheap and reliable piece of RISC computer hardware, utilising the latest system on a chip (SOC) design found in modern mobile phones that are reliable, low cost and have very low power requirements. The new version four RaspberryPi machines are even more powerful, however they consume more power and may require cooling measures. That is one reason I've stuck with earlier releases.

[Last Accessed Monday 6th January 2020]

raspberrypi.org/products/raspberry-pi-4-model-b/specifications

[6] And will not be upgrading any time soon. The latest machine is higher specification but with much higher power and cooling requirements. It is also double the price. You can read more about the RaspberryPi BCM2837B0 chipset / hardware below.

[Last Accessed Monday 8th January 2020]

raspberrypi.org/blog/raspberry-pi-3-model-bplus-sale-now-35

raspberrypi.org/documentation/hardware/raspberrypi/bcm2837b0/README.md

raspberrypi.org/documentation/hardware/raspberrypi/bcm2837/README.md

[7] Table One. A breakdown of the software created to process files for @pureausten, by size, number of lines, number of characters, the function of the file and the computer language used.

  • Size: the size of the source file code in Kilobytes
  • Line: the number of lines in the source code
  • Characters: the number of characters in the source code
  • Function: description of the source code function
  • Type: computer language file extension type
  • py: Python 3.N
  • sh: Bash 3.2

Key explanation: A breakdown of the software that reads a file, processes it, then tweets a line at a regular interval.

==============================================
Size    Line    Char.   Function          Type
----------------------------------------------
STEP 1
Input a book
----------------------------------------------
016     0039    00400   initialise        py
319     0954    10638   process           py

STEP 2
Process a book into single
lines of 120 characters
----------------------------------------------
035     0091    00843   setup             py
285     0801    07455   configure         py
118     0337    03088   prepare book      py
193     0507    05683   punctuation       py
372     1245    11345   extraction line   py
402     1154    12664   parse book        py
118     0337    03078   parse book        py
012     0025    00152   parse book        sh

STEP 3
Tweet exactly one line
----------------------------------------------
022     0054    00535   verify conf       py
057     0178    01692   verify            py
043     0095    00865   tweet manually    sh

STEP 4
At a fixed time interval
tweet a single line
----------------------------------------------
021     0057    00351   timer             sh

TOTAL
Size Kb   Line    Characters   Type
==============================================
2013      5874    58789        14 files
==============================================

[8] Where do you think one hundred and twenty characters stops? The counts below leave out the spaces between words.

Char.	Word
Len
------------------------------
007		However
006		little
005		known
003		the
008		feelings
002		or
005		views
002		of
004		such
001		a
003		man
003		may
002		be
002		on
003		his
005		first
008		entering
001		a
014		neighbourhood,
004		this
005		truth
002		is
002		so
004		well
005		fixed
002		in
003		the
005		minds
002		of
003		the
======== 120 characters ========
011		surrounding
009		families,
004		that
002		he
002		is
010		considered
002		as
003		the
008		rightful
008		property
002		of
004		some
003		one
002		or
005		other
002		of
005		their
010		daughters.
------------------------------
213      TOTAL
------------------------------
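
The totals in the table above can be checked with a few lines of Python. This is only a verification sketch; the variable names are made up for the example.

```python
from itertools import accumulate

second = (
    "However little known the feelings or views of such a man may be on his "
    "first entering a neighbourhood, this truth is so well fixed in the minds "
    "of the surrounding families, that he is considered as the rightful "
    "property of some one or other of their daughters."
)

words = second.split()
lengths = [len(word) for word in words]

without_spaces = sum(lengths)  # the figure used in the table
with_spaces = len(second)      # the same paragraph with its spaces counted

# First word at which the running total (spaces excluded) reaches 120.
crossing = next(
    i for i, total in enumerate(accumulate(lengths), 1) if total >= 120
)

print(without_spaces, with_spaces, crossing, words[crossing - 1])
```

This prints 213, 260, 30 and "the": the thirtieth word is where the running total, spaces excluded, first reaches one hundred and twenty.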

[9] If you are really observant, you will notice that I have not parsed the punctuation. I do this for simplicity of discussion.

[10] A twitter user since November 2006.

[11] From the launch date on 23rd July 2017 to 1st January 2020, the software behind @pureausten has tweeted seventy-six thousand times.

There have been failures, but not many in the software operation. Failure mostly occurs in the interaction with the twitter API. The 403 error (duplicate tweet) is mostly to blame and is something I'm working out how to prevent.

[12] While WW1 started in 1914 and finished in 1918, the AIF paper trail for Alfred continued till 1927. The effort continues. Sapper Renshaw (@sapperrenshaw) traces an ancestor, the first Australian-born Renshaw, Albert Edward RENSHAW, from quitting his day job at his father's engineering firm in Melbourne to his mortal wounding on the shore of ANZAC COVE on what we now know as ANZAC Day. The idea? Live tweet using historical records.

[13] There is something in the cadence of Kennedy's speeches that lends itself to shredding. The sentences contain short but complex ideas, formed one line at a time. It reminds me a lot of Austen.

―~♞~―


bio Another Scrappy Startup ☮ ♥ ♬ ⌨

short <seldomlogical.com/pureausten.html>

contact Peter Renshaw / 🐘 — 🎨
