"Some are born great,
some achieve greatness,
and some have greatness thrust
upon them," quoth William Shakespeare.
Or did he?
Some people question whether Shakespeare
really wrote the works that bear his name,
or whether he even existed at all.
They speculate that Shakespeare
was a pseudonym for another writer,
or a group of writers.
Proposed candidates
for the real Shakespeare
include other famous playwrights,
politicians and even some prominent women.
Could it be true that the greatest writer
in the English language
was as fictional as his plays?
Most Shakespeare scholars
dismiss these theories
based on historical
and biographical evidence.
But there is another way to test
whether Shakespeare's famous lines
were actually written by someone else.
Linguistics, the study of language,
can tell us a great deal about the way
we speak and write
by examining syntax, grammar,
semantics and vocabulary.
And in the late 1800s,
a Polish philosopher
named Wincenty Lutosławski
formalized a method known as stylometry,
applying this knowledge to investigate
questions of literary authorship.
So how does stylometry work?
The idea is that each writer's style
has certain characteristics
that remain fairly uniform
across individual works.
Examples of characteristics include
average sentence length,
the arrangement of words,
and even the number of occurrences
of a particular word.
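As a rough illustration, the sketch below computes two of the characteristics just mentioned, average sentence length and how often particular marker words occur, for a single text. The helper name `style_features`, the marker-word list, the regex-based sentence and word splitting, and the choice to report rates per 1,000 words are all assumptions for this example, not a standard stylometric toolkit.

```python
import re
from collections import Counter

def style_features(text, marker_words=("thee", "thou", "the", "and")):
    """Hypothetical helper: a few simple stylometric features for one text."""
    # Naive sentence split on runs of . ! ? (an assumption; real tools do better)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens, keeping apostrophes inside words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    features = {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0
    }
    for w in marker_words:
        # Occurrences per 1,000 words, so texts of different lengths compare fairly
        features[f"rate_{w}"] = 1000 * counts[w] / len(words) if words else 0.0
    return features

print(style_features("I love thee. Thee I love, and thou lovest me!"))
```

Counting rates rather than raw totals is a design choice here: a long play would otherwise look "more Shakespearean" simply by containing more words.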
Let's look at the use of the word "thee"
and visualize it as a dimension, or axis.
Each of Shakespeare's works
can be placed on that axis,
like a data point, based on the number
of occurrences of that word.
In statistics, how tightly
these points cluster
is measured by the variance,
which gives us an expected range for our data.
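To make the single-axis idea concrete, here is a minimal sketch that treats each work's rate of "thee" as one data point and computes the mean, the variance, and an "expected range". Defining that range as the mean plus or minus 2 standard deviations is an assumption made for illustration, and the rates below are made-up toy numbers, not real counts from Shakespeare's plays.

```python
import statistics

def expected_range(rates, k=2.0):
    """Mean, population variance, and a hypothetical 'expected range'
    (mean +/- k standard deviations) for one stylometric axis."""
    mean = statistics.mean(rates)
    variance = statistics.pvariance(rates, mu=mean)
    spread = k * variance ** 0.5
    return mean, variance, (mean - spread, mean + spread)

# Made-up "thee" rates per 1,000 words for eight hypothetical works
toy_rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(expected_range(toy_rates))  # (5.0, 4.0, (1.0, 9.0))
```

A smaller variance means the works sit tightly together on this axis, so the expected range is narrower and an outside text has to match more closely to fall inside it.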
But, this is only a single characteristic
in a very high-dimensional space.
With a dimensionality-reduction technique
called Principal Component Analysis,
we can reduce that space
to a few principal components
that together capture most of the variance
in Shakespeare's works.
We can then test the works
of our candidates
against those principal components.
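The reduction step can be sketched in plain Python. Each row is one work and each column one stylometric feature; `fit_pca` and `project` are hypothetical helper names. The eigenvectors of the covariance matrix are found by power iteration with deflation, a simple stand-in for the eigendecomposition a numerical library would perform, and the features are not standardized first, which real analyses usually do.

```python
def fit_pca(rows, k, iters=200):
    """Hypothetical pure-Python PCA: returns feature means and the top-k
    (eigenvalue, unit eigenvector) pairs of the sample covariance matrix,
    found by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Fixed, normalized, non-uniform starting vector
        v = [1.0 + 0.1 * i for i in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along direction v
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append((lam, v))
        # Deflate: remove this component's variance before finding the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components

def project(row, means, components):
    """Coordinates of one work along each principal component."""
    return [sum((x - m) * vj for x, m, vj in zip(row, means, v))
            for _, v in components]
```

For real corpora with many features, a library routine such as scikit-learn's `PCA` would normally replace this hand-rolled version; the sketch only shows what the reduction computes.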
For example,
if enough works of Francis Bacon
fall within the Shakespearean variance,
that would be pretty strong evidence
that Francis Bacon and Shakespeare
are actually the same person.
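The "enough works fall within the variance" test can be sketched as follows. It assumes each work has already been reduced to principal-component scores, treats the Shakespearean range on each component as the mean plus or minus k population standard deviations (k = 2 here), and counts a candidate work as matching only if it lands inside that range on every component. The box-shaped region, the helper names, and the 80% threshold in `looks_shakespearean` are hypothetical choices for illustration; actual studies set their own criteria.

```python
import statistics

def fraction_within(reference_scores, candidate_scores, k=2.0):
    """Fraction of candidate works lying within mean +/- k standard deviations
    of the reference works on every principal component."""
    dims = len(reference_scores[0])
    bounds = []
    for j in range(dims):
        column = [s[j] for s in reference_scores]
        m = statistics.mean(column)
        sd = statistics.pstdev(column)
        bounds.append((m - k * sd, m + k * sd))
    inside = sum(
        all(lo <= s[j] <= hi for j, (lo, hi) in enumerate(bounds))
        for s in candidate_scores
    )
    return inside / len(candidate_scores)

def looks_shakespearean(reference_scores, candidate_scores, threshold=0.8):
    """Hypothetical decision rule: 'enough' candidate works fall inside the range."""
    return fraction_within(reference_scores, candidate_scores) >= threshold

# Toy component scores, not real data
reference = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (0.0, 0.0)]
candidates = [(0.5, 0.5), (3.0, 0.0), (0.0, -1.2)]
print(fraction_within(reference, candidates))  # 0.6666666666666666
```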
What did the results show?
Well, the stylometrists who carried
this out have concluded
that Shakespeare is none other
than Shakespeare.
The Bard is the Bard.
The pretenders' works just don't match up
with Shakespeare's signature style.
However, our intrepid
statisticians did find
some compelling evidence
of collaborations.
For instance, one recent study concluded
that Shakespeare worked with playwright
Christopher Marlowe on "Henry VI,"
parts one and two.
Shakespeare's identity is only one of
the many problems stylometry can help solve.
It can help us determine
when a work was written,
whether an ancient text is a forgery,
whether a student has committed plagiarism,
or whether that email you just received
is high priority or spam.
So does the timeless poetry
of Shakespeare's lines
just boil down to numbers and statistics?
Not quite.
Stylometric analysis may reveal what makes
Shakespeare's works structurally distinct,
but it cannot capture the beauty of
the sentiments and emotions they express,
or why they affect us the way they do.
At least, not yet.