Actuary Applies Data Mining to Twitter Posts About Major Insurers
PHILADELPHIA, PA, March 21, 2012 -- Usually actuaries focus on insurance rates and loss reserves, not advertisements.
But according to one actuary:
- Jake, the State Farm agent who a jealous housewife was convinced was up to no good with her husband because he was working at 3 a.m., was a big hit.
- The GEICO pig - the one that squeals WHEEE all the way home - is a polarizing porker. People either love him or hate him.
- A lot of people are confused about where NFL quarterback Aaron Rodgers gets his Discount Double-Check.
The actuary, Roosevelt Mosley Jr., a principal at Pinnacle Actuarial Resources, reached these conclusions by tracking Twitter messages mentioning major insurers, loading the information into databases and, like any actuary, crunching the data.
Mosley presented his findings at the Casualty Actuarial Society’s Ratemaking and Product Management Seminar in Philadelphia, held March 19-21. He also incorporated them into a research paper, Social Media Analytics: Data Mining Applied to Insurance Twitter Posts, for which he was a recipient of the 2012 Management Data and Information Prize from the CAS.
He said his work is an example of ways actuaries can help company management get value from the tsunami of information surging through cyberspace. The actuarial skill set - scrubbing data, building and interpreting models, explaining results - is well-suited to the work.
“There’s a wealth of information out there that we are just starting to draw upon,” he said.
Mosley focused on Twitter, a service where subscribers blurt out 140-character blurbs about anything they feel like, to whoever happens to feel like reading them. But his approach could apply just as well, he said, to Facebook, LinkedIn and other social media platforms.
First, he captured the data. His first dataset consisted of all 68,370 tweets including #Allstate posted between July 29, 2010, and Aug. 12, 2011. A second dataset captured 176,694 tweets that mentioned State Farm, Allstate, Geico, Esurance and #Progressive between Jan. 25 and Feb. 12, 2012.
Mosley decided to concentrate on the State Farm, Allstate and Geico tweets. Not many tweets mentioned Esurance, and tweets containing the word progressive often ventured far from insurance, into topics such as progressive politics.
Mosley removed punctuation and most symbols (keeping @ and #, which are codes used in Twitter that are important in understanding a message). Each tweet was its own row of data, and each word was a column. There were other columns of identifying information, like who sent it, when they sent it, and in a few cases, the longitude and latitude of the sender.
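The cleaning and layout steps described above can be sketched in a few lines of Python. This is an illustration only, not Mosley's actual code: the function names clean_tweet and term_matrix are hypothetical, and the choice to lowercase text and drop apostrophes is an assumption not stated in the paper summary.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip punctuation and symbols, keeping @ and #."""
    text = text.lower()
    text = re.sub(r"['’]", "", text)            # assumption: "didn't" becomes "didnt"
    text = re.sub(r"[^a-z0-9@#\s]", " ", text)  # everything else becomes a space
    return text.split()


def term_matrix(tweets):
    """Return (vocab, matrix): one row per tweet, one column per distinct word."""
    rows = [Counter(clean_tweet(t)) for t in tweets]
    vocab = sorted(set().union(*rows))
    matrix = [[row[word] for word in vocab] for row in rows]
    return vocab, matrix


vocab, matrix = term_matrix([
    "@StateFarm Jake is great!",
    "The #GEICO pig, again... pig",
])
```

Each row of matrix holds the word counts for one tweet. Identifying columns such as sender, time and location would sit alongside these word columns in a real dataset.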
He had to correct spelling errors - which are rife in a medium that focuses on fast typing and forces one to squeeze a thought into a strict limit of 140 characters. There are algorithms he could employ to automatically fix most misspellings, but some had to be interpreted individually.
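The release does not say which spelling algorithms Mosley used. As one possible approach, the sketch below uses the fuzzy matcher in Python's standard difflib module to snap a misspelled word to the closest entry in a known word list. The function name correct_word, the sample word list and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import difflib


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the closest known word, or the word unchanged if nothing is close."""
    # Leave known words, @mentions and #hashtags alone.
    if word in vocabulary or word.startswith(("@", "#")):
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


known_words = ["insurance", "allstate", "geico", "discount"]  # hypothetical list
fixed = [correct_word(w, known_words) for w in ["insurence", "alstate", "pig"]]
```

Words with no close match pass through unchanged, which mirrors the point that some misspellings still have to be interpreted individually.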
And he had to create a taxonomy of several hundred keywords that identified what a tweet was about, like the GEICO pig or State Farm’s Jake.
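A keyword taxonomy of this kind can be pictured as a lookup from topic to trigger phrases. The sketch below is a toy version: the three topics and their keywords are invented for illustration and are far smaller than the several hundred keywords Mosley describes, and the name tag_topics is hypothetical.

```python
import re

# Hypothetical miniature taxonomy: topic label -> phrases that signal it.
TAXONOMY = {
    "geico_pig": ["pig", "wheee"],
    "state_farm_jake": ["jake"],
    "discount_double_check": ["discount double check", "aaron rodgers"],
}


def tag_topics(text, taxonomy=TAXONOMY):
    """Return the sorted list of topics whose keywords appear as whole words."""
    normalized = " ".join(re.sub(r"[^a-z0-9@#\s]", " ", text.lower()).split())
    padded = f" {normalized} "  # padding lets us match on word boundaries
    return sorted(
        topic
        for topic, keywords in taxonomy.items()
        if any(f" {keyword} " in padded for keyword in keywords)
    )


topics = tag_topics("Time for the pig to become bacon")
```

Matching on whole words keeps "pigeon" from being tagged as a pig tweet, though a production taxonomy would need more care with plurals and hashtags.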
Using the datasets, Mosley could track how often, for example, Allstate was mentioned each month - usually 4,000 to 6,000 times, though the numbers spiked in spring when the company appeared to be recruiting agents.
Or he could learn that the number of tweets per hour roughly followed the work day, peaking in the early afternoon, trailing off during drive time, then surging again at night.
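Counts by month and by hour of day, like those described above, amount to simple tallies over tweet timestamps. The sketch below assumes each tweet's send time is already available as a Python datetime in a single time zone. The function names and the sample timestamps are made up for illustration and are not drawn from Mosley's data.

```python
from collections import Counter
from datetime import datetime


def counts_by_month(timestamps):
    """Tally tweets per calendar month, keyed as 'YYYY-MM'."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)


def counts_by_hour(timestamps):
    """Tally tweets per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


# Made-up sample timestamps, for illustration only.
sample = [
    datetime(2011, 4, 3, 13, 5),
    datetime(2011, 4, 9, 13, 40),
    datetime(2011, 5, 1, 21, 15),
]
monthly = counts_by_month(sample)
hourly = counts_by_hour(sample)
```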
He found patterns in tweets, learning that State Farm’s Jake is a hit but that its spots featuring Green Bay Packers quarterback Aaron Rodgers were sometimes mistaken for Allstate ads. (Example tweet: “If Aaron Rodgers didn’t spend so much time at the local Allstate office maybe the Packers would have beat the Giants. HAHA”).
GEICO, on the other hand, suffered no crossover. Its ads were distinctive. But the pig crying WHEEE had enemies as well as friends.
“Either the pig was great,” Mosley said, “or it was ‘time for the pig to become bacon.’”
Mosley’s prize-winning research paper was published in the Winter 2012 CAS E-Forum, Volume 2. The E-Forum is an online research publication that can be accessed through the CAS Web Site.
The Management Data and Information Prize is awarded to the author(s) of the best papers submitted in response to a call for data management and data quality research papers conducted by the CAS Committee on Management Data and Information. Papers are judged on the basis of originality of ideas, understandability of complex concepts, contribution to the literature, and thoroughness of ideas expressed. A second paper was also recognized by the Committee with a 2012 prize: Beginner's Roadmap to Working with Driving Behavior Data by Jim Weiss and Jared Smollik.
The Casualty Actuarial Society fulfills its mission to advance actuarial science through a focus on research and education. Among its 5,400 members are experts in property/casualty insurance, reinsurance, finance, risk management, and enterprise risk management.
Mike Boa, Director of Communications and Marketing