PUBLISHER'S NOTE: It has been called a "parlour game" and a whodunit, and it would perhaps make an interesting board game - or even a reality TV show. (Any suggestions for moderator?) The mystery: Who is the author of the bombshell anonymous op-ed in the New York Times? As a former member of the Toronto Star's editorial board - and an op-ed writer - I am fascinated by this story - and by all of the pseudo-linguistic analysis clogging the media. (Who can believe the sheer power of a 750-word piece?) The Toronto Star hits the subject head-on in an Associated Press article published on September 6, 2018. I was caught by the claim of one of the interviewees that “The science is very good... It’s not quite DNA. It’s actually considered by some scientists to be the second-most accurate form of forensic identification we have, because it is so good.” So I did some basic research and came up with a very thorough analysis in the New Yorker (July 23, 2012) quoting "the pioneer of forensic linguistics" as saying: “I won’t claim that we have anything remotely like DNA in this work... but we are a whole lot better than a lot of the crazy schemes that cops are being taught.” Oh, oh! (That's not so comforting!) Apparently betting shops are now in the 'who's the op-ed author' game. I'm not into placing any bets on the basis of what has all the smell of a not-quite-ripened 'science.' (Besides, I'm not really a betting man!) Beware!
Harold Levy: Publisher, The Charles Smith Blog.
-----------------------------------------------------------
PASSAGE OF THE DAY: (Toronto Star story): "Experts use a combination of language use, statistics and computer science to help figure out who wrote documents that are anonymous or possibly plagiarized. They’ve even solved crimes and historical mysteries that way. Some call the field forensic linguistics; others call it stylometry or simply doing “author attribution.” The field is suddenly at centre stage after an unidentified “senior administration official” wrote in the Times that he or she was part of a “resistance” movement working from within the administration to curb Trump’s most dangerous impulses. “My phone has been ringing off the hook with requests to do that analysis and I just don’t have the time,” says Duquesne University computer and language scientist Patrick Juola."
-------------------------------------------------------
STORY: "Words on trial: Can professional word sleuths unveil mystery author of White House opinion piece?," by AP reporter Seth Borenstein, published by The Toronto Star on September 6, 2018.
PHOTO CAPTION: "One political scientist estimates there are about 50 people in the Trump administration who could have written an anonymous opinion piece published in the New York Times this week."
GIST: "Language detectives say the key clues to who wrote an anonymous New York Times opinion piece
slamming President Donald Trump may not be the odd and glimmering
“lodestar,” but the itty-bitty words that people usually read right
over: “I,” “of” and “but.” And lodestar? That could be a red herring meant to throw sleuths off track, some experts say. Experts
use a combination of language use, statistics and computer science to
help figure out who wrote documents that are anonymous or possibly
plagiarized. They’ve even solved crimes and historical mysteries that
way. Some call the field forensic linguistics; others call it stylometry
or simply doing “author attribution.” The
field is suddenly at centre stage after an unidentified “senior
administration official” wrote in the Times that he or she was part of a
“resistance” movement working from within the administration to curb
Trump’s most dangerous impulses. “My phone has been ringing off
the hook with requests to do that analysis and I just don’t have the
time,” says Duquesne University computer and language scientist Patrick
Juola. Robert Leonard, a Hofstra University linguistics professor
who has helped solve murders by examining language, says if experts
could get the right number of writing samples from officials whose
identities are known, “an analysis could certainly be done.” One
political scientist figures there are about 50 people in the Trump
administration who fit the Times’ description as a senior administration
official and could be the author. The key would be to look at how they
write, the words they use, what words they put next to each other,
spelling, punctuation and even tenses, experts say. “Language is a
set of choices. What to say, how to say and when to say it,” Juola
said. “And there’s a lot of different options.” One
of the favourite techniques of Juola and other experts is to look at
what are called “function words.” These are words people use all the
time but that are hard to define. Some examples are “of,” “with,” “the,”
“a,” “over” and “and.” “We all use them but we don’t use them in
the same way,” Juola says. “We don’t use them in the same frequency.”
Same goes with apostrophes and other punctuation. For example, do
you say “different from” or “different than?” asks computer science and
data expert Shlomo Argamon of the Illinois Institute of Technology. Women tend to use first- and second-person pronouns more — “I,” “me” and “you” — and more present tense verbs, Argamon said. Men use “the,” “of,” “this” and “that” more often, he said. “You
look for clues and you try to assess the usefulness of those clues,”
Argamon said. But he is less optimistic that the Trump opinion piece
case will be cracked for various reasons, including the New York Times’
editing for style and possible efforts to fool language detectives with
words that someone else likes to use, such as “lodestar.” Mostly, he’s
pessimistic because to do a proper comparison, samples from all suspects
have to be gathered and have to be similar, such as all opinion columns
as opposed to novels, speeches or magazine stories. Rachel
Greenstadt at Drexel University studies when people try to throw off
investigators with words they don’t normally use or purposeful
misspellings. She said her first instinct is that the word “lodestar” —
one Vice-President Mike Pence has used several times — is “a red
herring.” It seems too deliberate. “Most people are still looking
for sound-bite-sized features like lodestar instead of trying to get a
handle on the whole picture,” says Hofstra’s Leonard. Greenstadt
says language analysis “could kind of contribute to the picture” of who
wrote the Times opinion piece, but she adds that “by itself, I’d be
concerned to use it.” Still, with the right conditions, words matter. Juola
testified in about 15 trials and handled even more cases that never
made it to court. His biggest case was in 2013, when a British newspaper
got a tip that the book The Cuckoo’s Calling by Robert Galbraith was really written by Harry Potter author J.K. Rowling. In about an hour, Juola fed two Rowling books, The Cuckoo’s Calling
and six other novels into his computer, analyzed the language patterns
with four different systems and concluded that Rowling did it. A couple of days later, Rowling confessed. It
was far from the first time that language use fingered the real
culprit. The Unabomber’s brother identified him because of his
distinctive writing style. Field pioneers helped find a kidnapper who
used the unique term “devil strip” for the grassy area between the
sidewalk and road. The phrase is only used in parts of Ohio. Even in politics, words are poker tells. In 1996, the novel Primary Colors
about a Clintonesque presidential candidate set Washington abuzz trying
to figure out who was the anonymous author. An analysis by a Vassar
professor and other work pointed to Newsweek’s Joe Klein and he finally
admitted it. But the literary sleuthing goes back to the founding
of the republic. Historians had a hard time figuring out which specific
Federalist Papers were written by Alexander Hamilton and which were by
James Madison. A 1963 statistical analysis figured it out: One of the
many clues came down to usage of the words “while” and “whilst.” Madison
used “whilst”; Hamilton preferred “while.” Juola says experts in
the field can generally tell introverts from extroverts, men from women,
education level, age, location, almost everything but astrological
sign. “The science is very good,” Juola said. “It’s not quite DNA.
It’s actually considered by some scientists to be the second-most
accurate form of forensic identification we have, because it is so
good.”"
https://www.thestar.com/news/world/2018/09/06/can-professional-word-sleuths-unveil-author-of-mystery-white-house-opinion-piece.html
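ILLUSTRATION (not part of the Star story): The "function word" comparison that Juola describes above can be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method used by Juola or any other expert quoted here. The word list (borrowed from the words named in the story), the per-1,000-word rates, the cosine-similarity score and the function names are all hypothetical choices made for illustration. As Argamon cautions, a real analysis needs comparable samples from every candidate, and far more text than this.

```python
from collections import Counter
import math
import re

# Illustrative list only: function words mentioned in the story above.
FUNCTION_WORDS = ["of", "with", "the", "a", "over", "and", "i", "but", "this", "that"]


def function_word_profile(text):
    """Return the rate of each function word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [1000.0 * counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, known_samples):
    """Rank candidate authors by how closely their profile matches the anonymous text.

    known_samples maps a (hypothetical) author name to a writing sample.
    """
    target = function_word_profile(anonymous_text)
    scores = {
        name: cosine_similarity(target, function_word_profile(sample))
        for name, sample in known_samples.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_candidates with the anonymous text and a dictionary of known writing samples returns the candidates ordered from most to least similar. A high score says only that two texts use these small words at similar rates; on its own it identifies no one.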
------------------------------------------------------------------
PASSAGE OF THE DAY: New Yorker article: "Butters said, “Forensic linguistics has not come to a place where we are mature enough to answer a lot of these questions.” Carole Chaski, the executive director of the Institute for Linguistic Evidence and the president of Alias Technology, in Georgetown, Delaware, which markets linguistic software, agrees. Chaski has been working to perfect a computer algorithm that identifies patterns hidden in syntax. With enough linguistic material to work with, she says, she can run the program and draw accurate linguistic conclusions. Her goal is to develop a standard “validated tool” that police, civil investigators, and linguists can turn to when testifying in crucial cases, such as a capital murder trial. “If this is real, these tools should be so reliable that I can automate them and somebody can use them,” she says. Chaski foresees a time when forensic-linguistic “technicians” will do what DNA technicians in crime labs do: “They learn how to run a piece of software or run a Southern blot”—a standard DNA test—“through electrophoresis and then go, ‘Here are my results.’ ” In Chaski’s view, a trail of words can be parsed to reveal its author, but that work is best done quantitatively, through brute computational force, not qualitatively, by subjective scholars. Forensic linguistics, she believes, should not be limited to a few highly credentialled experts who have been approved by the courts to testify. She warned me of the recklessness of an “academic” and an “ex-cop” hanging out a shingle, and said their methodology was “fraught with error.” In the small world of forensic linguistics, it was obvious that she meant Leonard and Fitzgerald."
ARTICLE: "Words on Trial: Can linguists solve crimes that stump the police?" by Jack Hitt, published by The New Yorker on July 23, 2012.
GIST: "The pioneer of
forensic linguistics is widely considered to be Roger Shuy, a retired
Georgetown University professor and the author of such fundamental
textbooks as “Language Crimes: The Use and Abuse of Language Evidence in
the Courtroom.” Shuy is now eighty-one years old and lives in Montana.
When I asked him to describe the origins of forensic linguistics, he
referred me to an Old Testament story. After a confusing battle with the
Ephraimites, the Gileadites were able to identify the enemy by asking
them each to pronounce the Hebrew word “shibboleth.” If they pronounced
the first syllable in the Ephraimic dialect, “sib,” instead of in the
Gilead dialect, “shib,” they were killed. According to Judges 12:6, some
forty-two thousand Ephraimites failed that first linguistic test. The
field’s more recent origins might be traced to an airplane flight in
1979, when Shuy found himself sitting next to a lawyer. By the end of
the flight, Shuy had a recommendation as an expert witness in his first
murder case. Since then, he’s been involved in numerous cases in which
forensic analysis revealed how meaning had been distorted by the process
of writing or recording. In a bribery trial in the nineteen-eighties,
two Nevada brothel commissioners were caught on tape in a crucial
exchange. When they were offered a bribe, one turned to the other and,
according to the police transcript, said, “I would take a bribe,
wouldn’t you?” Shuy analyzed the tape and, on the stand, testified that
the defendant had actually said the opposite: “I wouldn’t take a bribe,
would you?” The tape was scratchy. Moreover, in conversational speech,
the “n’t” of a contraction is barely vocalized. It was hard to hear—or,
rather, easy to hear what the listener was primed to hear. But two facts
were indisputable, Shuy noted: both versions of the sentence had
exactly eight syllables, and the pause fell just before the last two
syllables. Thus, Shuy testified, only one reading of the sentence made
sense: “I wouldn’t take a bribe, would you?” The trial resulted in a
hung jury. Shuy has become famous in his discipline for some of
the field’s finest Holmesian aperçus. Early in his career, the police in
Illinois approached him regarding a notorious kidnapping case; they had
several suspects, and they hoped his reading of the ransom notes might
help narrow down the list of suspects. In each note, the kidnapper
demanded money in a semiliterate rant: “No kops! Come alone!!,” followed
by a terse instruction—“Put it in the green trash kan on the devil
strip at the corner 18th and Carlson.” Shuy studied the letters and then
asked, “Is one of your suspects an educated man born in Akron, Ohio?”
The cops were stunned. There was one who matched that description
perfectly, and when confronted he confessed. As Shuy subsequently
explained, “kop” and “kan” most likely were intentional misspellings by
someone posing as illiterate. And he knew from his research that the
patch of grass between the sidewalk and the street—sometimes known as
the “tree belt,” “tree lawn,” or “sidewalk buffer”—is called the
“devil’s strip” only in Akron, Ohio. In recent years, following
Shuy’s lead, a growing number of linguists have applied their techniques
in criminal cases, such as Chris Coleman’s, and even in major
commercial lawsuits. An upcoming suit
between Apple and Microsoft, slated to go before the Trademark Trial and
Appeal Board, features two stars of the field, Rob Leonard and Ronald
Butters, a retired Duke University linguist. At issue is: What part of
speech is the phrase “app store”? Leonard, siding with Apple, contends
that it is a proper noun, which is to say a trademarked expression that
should be capitalized. Butters’s work upholds Microsoft’s view: the term
consists of two common nouns and is not proprietary at all. Butters
is a past president of the International Association of Forensic
Linguists, which has some two hundred and fifty members. Most of them,
he said, are in the United States, England, and Spain, but interest has
spread to Australia, Japan, and China. Today, one can study forensic
linguistics at several schools, and last year Leonard inaugurated the
first graduate program in forensic linguistics, at Hofstra. For those
earning a master’s degree, the field offers job prospects outside the
courtroom. Immigration and Customs Enforcement hires language detectives
to assist agents in evaluating asylum seekers. In such cases, forensic
linguists interview applicants to verify that their accents and their
use of idiom and slang match those of the country they claim to have
fled. Increasingly in the courtroom, however, forensic linguists
have been asked to weigh in on matters of “author identification”—not to
determine the grammatical significance of certain words but to identify
who said or wrote them. This trend has widened an old schism in the
field. Given the stakes in, say, the Coleman case—a felony murder
potentially involving the death sentence—some linguists hold the view
that Leonard is taking forensic linguists into groundbreaking territory.
Others, including Butters, wonder if he isn’t leading them over a
cliff. When
I visited Leonard one afternoon at Hofstra, he was reviewing a range of
cases: another murder involving the killer’s letters; a libel suit that
turned on a single, ambiguous sound; an attempt to identify a potential
assassin of a prominent politician; and a Whirlpool Corporation lawsuit
involving the meaning of the word “steam.” In a modest office walled
with books, I found Leonard working at a laptop. He was noticeably
kempt, in pressed slacks and a crisp blue button-down shirt—a Sam Spade
of semantics. His hair was surprisingly dark for a man in his sixties;
his eyes were playful and his smile fetching, a little bit show biz. Long
before he emerged as one of the foremost language detectives in the
country, Leonard had achieved a different kind of celebrity. As an
undergraduate at Columbia in the nineteen-sixties, he and his brother
George revolutionized the school’s a-cappella group by having everyone
dress as faux Brooklyn thugs (white T-shirts, greased-back hair) and
sing up-tempo arrangements of such nineteen-fifties doo-wop classics as
“Duke of Earl” and “At the Hop.” They named the group Sha Na Na and
became wildly popular. One of their hits was “Teen Angel,” which Leonard
sang at Woodstock just before Jimi Hendrix, who had invited Sha Na Na,
débuted his version of “The Star-Spangled Banner.” By 1970,
Leonard the heartthrob had to choose between academia and show business.
“All of our good friends were dying of drug overdoses,” he said. “I
just decided to move on.” Leonard finished his undergraduate studies at
Columbia; William Labov, a prominent linguist who had introduced him to
the field, helped him earn a fellowship. Leonard pursued a scholarly
career until 2000, when he heard Shuy give a lecture urging linguists to
apply their training in the real world—especially in the courtroom, as
language detectives. Leonard struck up a professional friendship with
Shuy and has been consulting on cases ever since. As we sat in his
office, Leonard described his recent involvement in the tabloid saga of
Natalee Holloway. In 2005, after graduating from high school in
Alabama, Holloway went with her friends on a chaperoned trip to Aruba
and disappeared. The case remains unsolved. The chief suspect is a young
Dutchman named Joran van der Sloot, who pleaded guilty in 2012 to
charges of murdering a twenty-one-year-old woman in Peru. In Aruba, two
young brothers, Deepak and Satish Kalpoe, were initially arrested (they
and van der Sloot had partied with Holloway the night before she
disappeared), but were released in the first weeks of the investigation.
After being the subjects of a television exposé, the brothers are suing
Dr. Phil McGraw and CBS for defamation. The Kalpoe legal team has hired
Leonard as their expert witness in a lawsuit that could turn on the
pronunciation of a single syllable. The “Dr. Phil” show promoted
the exposé by claiming, “You are going to find out what he”—Deepak—“says
he did with Natalee the night she disappeared.” An announcer adds,
“What he said brought Natalee’s mother to tears.” On the show, viewers
listen to the audio of Deepak being secretly videotaped by a private
investigator named Jamie Skeeters and making an astonishing confession:
SKEETERS: I’m sure she had sex with all of you.
KALPOE: She did. You’d be surprised how simple it was.
Leonard examined the uncut version of the exchange. In it, Kalpoe denies having sex with Holloway. “Simple” refers to the fact that, from his point of view, the evening was uneventful:
SKEETERS: I’m sure she had sex with all of you, and . . . good . . .
KALPOE: No, she didn’t.
SKEETERS: O.K., well, I mean, good. If she did, fine.
KALPOE: You’d be surprised how simple it was that night.
Watching an unedited piece of footage doesn’t require a linguistics expert, but Leonard realized that there were other issues at play. During the covert interview, the microphone generated a great deal of confusing ambient sound. Moreover, the hidden camera captured only the top of Kalpoe’s head, so his face and lips weren’t visible. Amid the muffled noises, and before Kalpoe speaks, there is an odd sound—sha!—which Kalpoe appears to make just before “No.” When I met with Leonard, he had been concentrating on this sound. An expert hired by the opposing counsel was taking the position that the sha! might not be a throat-clearing or some other stray sound, as Leonard contends, but a “voiceless vowel with ‘r’ coloration.” Leonard explained: “Vowels are the most open of sounds, and when you come off a vowel and cease saying it you switch the vocal apparatus to pronounce the next sound.” In some words, like “forth,” the “r” gets full phonetic treatment, but in many words, like “bird” and “sure,” the “r” isn’t fully voiced and instead becomes a shadow of the vowel, just because it’s easier to say that way. If the lawyers for “Dr. Phil” can show that the first word Kalpoe spoke in that sentence was “sure,” and that there is no audible “n’t” at the end of “did,” then the transcript of Kalpoe’s first utterance changes from “No, she didn’t” to its opposite, “Sure, no, she did.” Like all linguists, Leonard starts from the position that meaning is delicately contingent, and that the most common way we compensate for this frailty is “redundancy.” We say the same thing more than once, or in more than one way. In his written report to the court on this case, Leonard notes that the original video of the meeting between Deepak Kalpoe and Skeeters shows Deepak “shaking his head ‘no’ from side to side,” as if to deny the accusations. The program, though, aired only a still photo of Deepak.
The case has yet to go to trial, but when it does, Leonard says, he will argue that there is enough redundancy in the semiotic detritus of these sounds to conclude that Kalpoe’s meaning is clear: he is stating that he did not have sex with Holloway that night. It may be that the changes made to the edited interview were deliberately damaging, but forensic linguists offer another possibility: that a subtle presumption of guilt unconsciously overwhelmed the editing process and inverted the meaning of the exchange. Such inversions, linguists say, happen far more often than we might like to believe. According to Leonard, words serve as catalysts, setting off sparks of potential meaning that the listener organizes into more specific meaning by observing facial expressions, body language, and other redundant cues. We then employ another powerful tool: prior experience and the storehouse of narratives that each of us carries—what linguists call “schema.” To every exchange we bring unconscious scripts; as any given sentence unspools, we readjust the schema to make better sense of what we are hearing. One afternoon at Hofstra, Leonard explained to the twenty students in his introductory course how this works. He wrote a sentence on the board: “John was on his way to school last Friday and was really worried about the math lesson.” He quizzed the students on what they might presume about this story. John is a student, one called out; he is either on a bus or walking. “So we can just close our eyes and imagine John the schoolboy on the bus,” Leonard said. “But are we all imagining John with the same height, the same hair color?” Nothing in the sentence signals any of that information, yet each of us supplies our own variant, which awaits further verbal data for confirmation. Leonard wrote another sentence beneath the first: “Last week, he had been unable to control the class.” Who is John now? “A teacher!” someone shouted. And how is John getting to school?
“A car!” Leonard wrote a third sentence: “It was not fair for the math teacher to leave him in charge.” Instantly, the students revelled in John’s new identity as a janitor or a substitute teacher. Meaning, Leonard noted, is constantly bent by expectation, and can be grossly distorted. Indeed, one of Shuy’s first studies, of the Abscam trials of the nineteen-eighties, reveals just how easily the meaning of linguistic evidence can be twisted by a background assumption of guilt. Abscam was an F.B.I. sting operation in which nine United States congressmen were lured to meetings with a government agent posing as an Arab oil sheikh with “Abdul Enterprises.” The initial meeting was described as a legitimate business deal. At one point, though, the agent playing the sheikh would offer the congressmen an outright bribe. Their conversations were videotaped, and some of the evidence was breathtakingly unambiguous. Representative John Jenrette, of South Carolina, accepted the money cheerfully and chirped on tape, “I’ve got larceny in my blood!”
The sting resulted in seven indictments.
Toward the end came the trial of Senator Harrison (Pete) Williams, of
New Jersey. Shuy listened to those tapes and became convinced that the
Senator was innocent. Whenever the sheikh raised the issue of bribery
or illegality, Williams steered the conversation to legal ground. At one
point, the sheikh put the bribe directly to Williams: “I would like to
give you . . . some money for, for permanent residence.” The first four
words of Williams’s reply were “No, no, no, no.” A prosecution
memo at the time stated that there was no case against Williams, but the
judge, who, in his ruling, decried “the cynicism and hypocrisy of
corrupt public officials,” set it aside; Williams was found guilty and
sentenced to three years in prison. Shuy later noted that, with such
attitudes prevalent, the schema of the “corrupt congressman” overwhelmed
even the plainest facts pointing to Williams’s innocence. After the
trial, the lead juror confessed that had he known all the facts he would
not have found Williams guilty. The Senator was forced to resign his
seat, though he declared his innocence at every opportunity. He was the
first senator in eighty years to go to prison; President Bill Clinton
refused to pardon him. Shortly
after the Unabomber case was cracked, in 1996, forensic linguistics
gained another public boost. Donald Foster, an English professor at
Vassar, employed the most basic forensic technique—tallying word
frequency—to unmask the anonymous author of “Primary Colors,” the
best-selling novel about Clinton’s first Presidential campaign: Joe
Klein. Foster analyzed dozens of pages of writing from several suspects,
including Time’s Walter Shapiro and the former Deputy
Treasury Secretary Roger Altman. He compiled a concordance that showed
how frequently each writer used certain words, compared this information
with a database of word frequency in the novel, and was able to
identify the author.
For a short time, the potential of forensic
linguistics seemed limitless. With enough raw data and computing power, a
trail of words might betray its author as reliably as a set of
fingerprints identifies an individual. Foster, though, was a professor
of literature, not a linguist; he was not trained to use the forensic
methods that Shuy had mastered, such as listening for unconscious
semantic patterns and looking for distinctive phrases or unusual
colloquialisms. Overconfident, Foster went on to identify a suspect in
the JonBenét Ramsey murder case, only to learn that he had already been
cleared by the police. In the days after September 11, 2001, Foster
falsely implicated the bioweapons expert Steven Hatfill as the person
who had sent several anthrax-laden letters around the country; the
accusation wrecked Hatfill’s career and resulted in a settled lawsuit.
Foster then recanted a previous claim linking a 1612 poem of dubious
provenance to Shakespeare; another academic had shown that the analysis
was fatally flawed. Foster has since retreated to his campus in
Poughkeepsie. Foster’s disgrace left most forensic linguists
feeling cautious. Now that Leonard’s work is bringing the field back
into the realm of author identification, some are worried. Ronald
Butters, the Duke linguist, provided expert testimony for the defense at
the Coleman trial and challenged every aspect of Leonard’s testimony as
“linguistically meaningless.” Butters argued that even though certain
linguistic oddities, such as using “U” for “you” or consistently
misplacing the apostrophe in contractions, seemed distinctive, there
weren’t enough examples to be statistically significant. Moreover,
Butters told me, it can be tricky to compare different genres of even a
single person’s writing. Reading, say, a routine office e-mail alongside
rants spray-painted on a wall makes about as much sense as comparing
the prose in one of Wallace Stevens’s insurance riders with the cadences
of his poem “Sunday Morning.” “Really bad linguistic testimony is
when you go to court and say you’re pretty sure that this person wrote
that, and yet you’re comparing apples and oranges,” Butters said.
Leonard argues that he never claims to name a specific author but simply
presents comparative evidence for the jury. Butters, Leonard said, “is a
specialist in trademark cases, so I’m not sure what his experience was
in authorship cases, and they are two quite different applications of
linguistics.” On the stand, Butters admitted that he hadn’t read all
Leonard’s research on the evidence; his challenge was focussed on
Leonard’s methodology and its purported usefulness in the identification
of individual authors. Butters said, “Forensic linguistics has not come
to a place where we are mature enough to answer a lot of these
questions.” Carole Chaski, the executive director of the Institute
for Linguistic Evidence and the president of Alias Technology, in
Georgetown, Delaware, which markets linguistic software, agrees. Chaski
has been working to perfect a computer algorithm that identifies
patterns hidden in syntax. With enough linguistic material to work with,
she says, she can run the program and draw accurate linguistic
conclusions. Her goal is to develop a standard “validated tool” that
police, civil investigators, and linguists can turn to when testifying
in crucial cases, such as a capital murder trial. “If this is real,
these tools should be so reliable that I can automate them and somebody
can use them,” she says. Chaski foresees a time when forensic-linguistic
“technicians” will do what DNA technicians in crime labs do: “They
learn how to run a piece of software or run a Southern blot”—a standard
DNA test—“through electrophoresis and then go, ‘Here are my results.’ ”
In Chaski’s view, a trail of words can be parsed to reveal its author, but that work is best done quantitatively, through brute computational force, not qualitatively, by subjective scholars. Forensic linguistics, she believes, should not be limited to a few highly credentialled experts who have been approved by the courts to testify. She warned me of the recklessness of an “academic” and an “ex-cop” hanging out a shingle, and said their methodology was “fraught with error.” In the small world of forensic linguistics, it was obvious that she meant Leonard and Fitzgerald. Leonard said that Chaski’s computerized approach made him “want to take a nap.” His methods and findings are all transparent, he noted, whereas her algorithm is a proprietary “black box.” He does not believe that computer software can eliminate the need for human interpretation. “Even those algorithms have to be coded by humans,” he said; any good linguist will depend on both quantitative and qualitative analysis. “One thing we have learned about language is that it is a very human form of communication. You have to have human intelligence, human powers of inference, and human encyclopedic knowledge of the world” to make sense of it. At the end of the day, the scientific findings depend on human interpretation, Leonard said. Computers can crunch reams of words, but only people can decide what the words mean. Shuy told me that he, too, initially had doubts about author identification. “That is how I felt until Rob Leonard started working,” he said. “Rob has come up with this competing-hypothesis approach.” In the same way that DNA technicians will report only the statistical likelihood that the killer’s DNA and the DNA found on the murder weapon are the same, Leonard creates a number of opposing hypotheses and presents the evidence in light of them. 
In the Coleman trial, Leonard did not declare that Coleman was the author of the red graffiti and the threatening e-mails; rather, he testified that the language in them “is consistent with” the language in Coleman’s writings. “I don’t know any forensic linguists who will claim that they can find the answer for you,” Shuy said. “Our role is to analyze the data and give it to the triers of the facts, who have to evaluate it or issue the ultimate decision of innocence or guilt. We don’t go that far, and shouldn’t.” Shuy also noted that it was Leonard who popularized a safeguard against comparing unrelated documents, called a Community of Practice filter. For instance, Coleman’s use of “U” for “you” would be of no use in a pool of text messages, but as an unusual abbreviation in an e-mail it becomes another point of data. Recently, Leonard used this technique to question a charge, levelled at a jailed gang member, of murdering a prison guard. Prosecutors had linked the prisoner, Jarvis Masters, to a note that ultimately led to the guard’s murder, based on misspellings such as “has’nt” and “is’nt” and the use of “no” for “know.” But in his research Leonard learned that the way Masters’s gang, the Black Guerrilla Family, disciplined its members was to make them copy propaganda by hand. All the gang members had picked up the oddities pinned on Masters, Leonard determined. “Thus, when we examine the corpus of non-murder documents written by other B.G.F. members,” Leonard said, “we discover the features that may at first seem to educated writers like the prosecution to be randomly incorrect, highly idiosyncratic features were not random at all but systemic features of the B.G.F. community.” On some level, extracting meaning from linguistic evidence is what we all do intuitively every day. Forensic professionals go about the same work, with better tools and a heightened sense of how easily meaning can be misconstrued. 
As one forensic-linguistics firm, Testipro, puts it in its online promotional pitch, the field is “the basis of the entire legal system. Both Judges and Juries are using informal or unconscious FL”—forensic linguistics—“every time they weigh a witness statement or testimony document.” The field is bound to thrive on the ever-growing piles of what Shuy calls “data.” Our embrace of personal media—e-mails, text messages, voice mail, tweets—has created an avalanche of tossed-off language, an evidentiary trail that linguists are getting better and better at following. Shuy believes that forensic linguistics can do for language crimes, such as bribery, blackmail, and extortion, what DNA has done for violent crimes. It could offer a counterweight to the many old-school methods, like lineups and unrecorded police interrogations, that are heavily relied upon despite serious flaws. “I won’t claim that we have anything remotely like DNA in this work,” Shuy said, “but we are a whole lot better than a lot of the crazy schemes that cops are being taught.” Leonard offered a sobering statistic: eighty per cent of people who were later exonerated by DNA evidence had falsely confessed to their alleged crimes. “When I got into this business, I figured if there was an eyewitness or a confession, then case closed, the guy absolutely, one hundred per cent did it. But those are the two shakiest types of evidence, really.” He recalled many cases where a confession on paper turned out to be no confession at all. “The way humans perceive language is according to schemas, which lead to misperceptions as much as perceptions.” In a sense, investigators who try to extract evidence from confessions are acting as linguists, too, albeit poorly trained ones. A few weeks ago, Leonard finished testifying in the retrial of Brian Hummert, a Pennsylvania man charged with strangling his wife. 
After initial suspicions pointed to Hummert, the police received handwritten letters claiming that a serial killer, not Hummert, had committed the murder. Once again, the linguistic evidence was important to the case. The notes bore a resemblance to a series of stalker letters that preceded the killing and to the defendant’s writing. As an expert witness, Leonard testified about Hummert’s prose style, noting the rare use of what he calls “ironic repetition” in constructions such as “She tried to break it off, so I broke her neck.” And all the letters contained a linguistic habit that, Leonard testified, he had found nowhere else: a tendency to use contractions in negative statements (“I can’t”) but not in positive ones (“I am”). The jury was out for forty-five minutes and returned a verdict of guilty."
The entire article can be read at:
https://www.newyorker.com/magazine/2012/07/23/words-on-trial
------------------------------------------------------------