Text Mining and Statistical Analysis of Crash Reports Can Power Predictive Modeling of Distracted Driving Accidents
PHILADELPHIA, PA, March 20, 2012 -- Distracted driving has grabbed headlines recently as a consensus forms around the risk posed by drivers using cell phones. A presentation at the Casualty Actuarial Society’s Ratemaking and Product Management Seminar, held March 19-21, showed how new analytic methods can provide statistical insight into distracted driving.
The presentation reaffirmed a familiar fact: cell phone users are more likely than non-users to be in multicar accidents and rear-end collisions. But the approach behind that conclusion, presented by Philip S. Borba, a principal and senior consultant at the actuarial consulting firm Milliman, differed from the usual methods.
Borba used a federal database of written crash reports to look for word patterns that indicated cell phone use. He then measured how frequently cell phone use was linked to auto accidents. And while his conclusion is not necessarily groundbreaking, the technique may be.
It shows how written reports, such as claims adjusters’ narratives, can be mined for quantitative information. That information can then feed predictive models, a fast-growing field in which complex sets of variables are analyzed to inform underwriting and pricing decisions.
The point of Borba’s study: Modern methods let analysts unlock numerical data where little or none was thought to exist. Actuaries and other analysts then have new sets of numbers to crunch, so they can continue to add value for their employers.
So instead of turning numbers into words - the heart of report writing - Borba’s study showed how to turn words into numbers.
Borba looked at 6,949 accident reports from the National Motor Vehicle Crash Causation Survey, a federal project that analyzed crashes. In the survey, researchers listened to police scanners, then scurried to the sites of auto accidents, “almost like ambulance chasers,” Borba said. Once there, they interviewed everyone at the scene and filed their own standardized report.
Each report captured data in two ways. First, researchers filled out a structured form, assigning a number or code to each of a series of characteristics, such as the date and time of the accident, driver use of medication, the presence of a cell phone and whether a cell phone was used. Then they wrote a narrative describing the accident, usually about 400 words long; the longest ran about 1,200 words.
That gave two ways to measure how often drivers were using cell phones. The first, having a computer count up the checked boxes, is straightforward and a standard analytic tool. The second, having a computer “read” the narratives and draw conclusions, is more cutting-edge.
So the survey offered a good test of how well the second approach works.
The first step, Borba said, is to take a typical phrase describing cell phone use, such as “He was dialing his cell phone,” and break it down into a series of N-grams. (An N-gram is a word or short sequence of consecutive words: a single word is a 1-gram, a two-word phrase a 2-gram, and so on.)
“He was dialing his cell phone” yields quite a few N-grams. Each word of the sentence (“he,” “was,” “dialing” and so on) is its own N-gram. Two-word combinations such as “he was” are N-grams too, and so are three-, four- and five-word combinations. N-grams that add little meaning, such as combinations that include the word “and,” are then dropped.
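The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not Borba’s code: the function name `ngrams`, the lower-casing and the punctuation handling are assumptions, and only “and” is filtered out, following the description above.

```python
import re

def ngrams(text, max_n=5, drop_words=frozenset({"and"})):
    """Return all 1- to max_n-word N-grams in text, shortest first.

    Assumption: text is lower-cased and split on anything that is not a
    letter, digit or apostrophe. Any N-gram containing a word in
    drop_words is discarded.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            if drop_words.isdisjoint(window):
                grams.append(" ".join(window))
    return grams

grams = ngrams("He was dialing his cell phone")
print(len(grams))  # 20: six 1-grams, five 2-grams, four 3-grams, three 4-grams, two 5-grams
```

A six-word sentence already yields 20 N-grams of one to five words, which is why thousands of reports running to hundreds of words each add up to millions.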
Even so, the N-grams pile up quickly: the 6,949 reports generated 13.3 million of them. Harnessing computer power, Borba searched the written reports for signs of cell phone use.
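Once each narrative is broken into N-grams, searching for signs of cell phone use can be as simple as checking whether any indicator phrase appears among them. The sketch below assumes a short, hypothetical indicator list (`INDICATORS`); the article does not say which phrases Borba searched for or how he weighed them.

```python
import re

# Hypothetical indicator phrases, for illustration only.
INDICATORS = frozenset({"cell phone", "cellphone", "texting", "on the phone"})

def word_ngrams(text, max_n=5):
    """Set of all 1- to max_n-word N-grams in a lower-cased narrative."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def mentions_cell_phone(narrative, indicators=INDICATORS):
    """True if any indicator phrase appears as an N-gram of the narrative."""
    return not indicators.isdisjoint(word_ngrams(narrative))

# Two made-up narratives, not taken from the survey.
narratives = [
    "Driver 1 was dialing his cell phone and did not see the light change.",
    "Vehicle 2 slid on the wet pavement and struck the guardrail.",
]
flags = [mentions_cell_phone(text) for text in narratives]
print(flags)                    # [True, False]
print(sum(flags) / len(flags))  # 0.5
```

Matching whole-word N-grams rather than raw substrings keeps an indicator such as “texting” from firing inside a longer word such as “pretexting.”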
He found that about 20% of the narratives mentioned cell phone use. The vast majority of those reports also had the cell phone box checked; a few did not. Fewer still had the box checked but no mention of cell phones in the narrative.
Still, the analysis showed that using N-grams to comb through written reports is an effective way to make numbers out of words.
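Comparing the narratives with the check box amounts to a two-by-two cross-tabulation. A minimal sketch, using made-up flags rather than the study’s data; the function name `agreement_table` is hypothetical:

```python
from collections import Counter

def agreement_table(narrative_flags, box_flags):
    """Count reports in each (narrative mentions phone, box checked) cell."""
    if len(narrative_flags) != len(box_flags):
        raise ValueError("need one narrative flag and one box flag per report")
    return Counter(zip(narrative_flags, box_flags))

# Hypothetical flags for six reports (not figures from the study).
narrative = [True, True, True, False, False, False]
box = [True, True, False, True, False, False]

table = agreement_table(narrative, box)
both = table[(True, True)]        # mentioned in narrative and box checked
text_only = table[(True, False)]  # mentioned, but box not checked
box_only = table[(False, True)]   # box checked, but no mention in narrative
agreement = (both + table[(False, False)]) / len(narrative)
print(both, text_only, box_only, agreement)
```

The off-diagonal cells (`text_only` and `box_only`) hold the reports on which the two data sources disagree, which makes them the natural candidates for a manual read.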
From there, Borba ran three traditional statistical analyses, controlling for variables such as weather and time of day. He found that cell phone use:
- Didn’t increase the likelihood of an injury in an accident.
- Did increase the likelihood of a multi-vehicle accident.
- Did increase the likelihood of a rear-end collision.
The latter two findings make intuitive sense, Borba said. Using a cell phone seems particularly dangerous in traffic: if the distraction causes you to swerve, you are in greater danger when other cars are nearby.
And if a driver looks at his phone for a moment, he might miss the driver in front of him hitting the brakes.
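The article does not say which three statistical methods Borba used. One traditional way to control for a categorical factor such as weather is to split the reports into strata and compute a Mantel-Haenszel pooled odds ratio; a regression model such as logistic regression is a common alternative. The sketch below uses hypothetical counts and a hypothetical function name to show how stratifying can change the picture.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata.

    Each stratum is a tuple (a, b, c, d):
      a = phone users with the outcome    b = phone users without it
      c = non-users with the outcome      d = non-users without it
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ZeroDivisionError("pooled odds ratio is undefined (b * c sums to zero)")
    return num / den

# Hypothetical counts of rear-end collisions (the outcome), split by weather.
strata = {
    "rain": (10, 10, 10, 10),
    "clear": (1, 9, 9, 81),
}
a, b, c, d = (sum(col) for col in zip(*strata.values()))
crude = (a * d) / (b * c)                     # ignores weather entirely
pooled = mantel_haenszel_or(strata.values())  # compares within each weather stratum
print(round(crude, 2), pooled)  # 2.77 1.0
```

In this made-up example, phone use looks risky in the combined table only because the phone users are concentrated in rainy-weather reports, where collisions are more common; within each weather stratum the odds ratio is exactly 1.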
The point of the exercise, though, wasn’t to establish a breakthrough in the link between cell phones and auto accidents. It was to show how an actuary could use the technique to mine data from written reports, such as a claims memo.
Still, Borba said he learned something: “Turn off your cell phone in urban areas.”
Paul D. Anderson, a consulting actuary with Milliman, Inc., moderated the session.
The Casualty Actuarial Society fulfills its mission to advance actuarial science through a focus on research and education. Among its 5,400 members are experts in property/casualty insurance, reinsurance, finance, risk management, and enterprise risk management.
Mike Boa, Director of Communications and Marketing