A Review of Data Management: Databases and Organization, Fifth Edition
by Richard T. Watson [John Wiley & Sons Inc., 2005, $101.95]
Reviewed by David Hudson
Actuarial work relies on data. As such, ensuring appropriate data quality and availability is the concern of every actuary. The CAS research working party on Data Management and Information Educational Materials was formed to identify key educational resources on data issues for actuaries. The working party is reviewing the literature on the topic, and this review is the second of several that will be published.
The introductory data management text Data Management: Databases and Organization focuses on the core skill of data modeling, using SQL (structured query language) to implement the data models. The text also covers the managerial perspective on data management, database architecture, emerging technologies, and data integrity.
Overall, this text is very well written. The topics are self-contained, although the concepts of data modeling and SQL run throughout, so those sections should not be skipped. Because of the book’s length (approximately 600 pages), actuaries will probably get the most out of it as a reference on particular topics. Watson divides his book into five sections. A brief synopsis of each follows.
In Section 1, “The Managerial Perspective,” Watson defines the concept of organizational memory, which includes not only computers but also people, paper files, manuals, and reports. He also draws distinctions among data, information, and knowledge. According to Watson, “data are raw, unsummarized, and unanalyzed facts,” while “information is data that have been processed into a meaningful form.” Finally, he states that “knowledge is the capacity to use information.” Watson makes the interesting point that these distinctions between data and information are relative: one person’s information is another person’s data.
In Section 2, “Data Modeling and SQL,” Watson treats data modeling and SQL skills as fundamental to data management, and he accordingly devotes approximately half of the book to this topic. The style of this section is very straightforward and should be accessible to any actuary with some exposure to relational databases such as Microsoft Access, SQL Server, or Oracle. Watson addresses in detail the basic building blocks of data modeling: modeling a single entity, one-to-many relationships, many-to-many relationships, one-to-one relationships, and recursive relationships.
The author repeatedly uses the same approach to explain new concepts, which makes the text easy to follow. First, he builds his examples using a standard data modeling diagramming syntax. Second, as each new modeling concept is introduced, he develops a model and then implements it in SQL. This is an effective way to teach both data modeling and SQL, since the two sets of concepts reinforce each other.
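The following minimal sketch illustrates the model-then-SQL approach for two of the building blocks above: a one-to-many relationship and a many-to-many relationship. It is not taken from the book. It uses Python’s built-in sqlite3 module as a stand-in for products such as Access or Oracle. The policy, claim, coverage, and policy_coverage tables are hypothetical.

```python
import sqlite3

# Hypothetical insurance schema (not from the book): a policy has many claims
# (one-to-many); policies and coverages are linked many-to-many through an
# associative table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE policy (
    policy_id   INTEGER PRIMARY KEY,
    holder_name TEXT NOT NULL
);
CREATE TABLE claim (
    claim_id  INTEGER PRIMARY KEY,
    policy_id INTEGER NOT NULL REFERENCES policy(policy_id),  -- the "many" side holds the foreign key
    amount    REAL NOT NULL
);
CREATE TABLE coverage (
    coverage_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
-- Associative entity that resolves the many-to-many relationship
CREATE TABLE policy_coverage (
    policy_id   INTEGER REFERENCES policy(policy_id),
    coverage_id INTEGER REFERENCES coverage(coverage_id),
    PRIMARY KEY (policy_id, coverage_id)
);
""")

conn.executemany("INSERT INTO policy VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO claim VALUES (?, ?, ?)",
                 [(10, 1, 500.0), (11, 1, 250.0), (12, 2, 1000.0)])
conn.executemany("INSERT INTO coverage VALUES (?, ?)", [(100, "Collision"), (101, "Liability")])
conn.executemany("INSERT INTO policy_coverage VALUES (?, ?)", [(1, 100), (1, 101), (2, 101)])

# Traverse the one-to-many relationship: total claims per policyholder
totals = conn.execute("""
    SELECT p.holder_name, SUM(c.amount)
    FROM policy p JOIN claim c ON c.policy_id = p.policy_id
    GROUP BY p.holder_name
    ORDER BY p.holder_name
""").fetchall()
print(totals)  # [('Jones', 1000.0), ('Smith', 750.0)]

# Traverse the many-to-many relationship: policies that carry Liability coverage
liability = conn.execute("""
    SELECT pc.policy_id
    FROM policy_coverage pc JOIN coverage cv ON cv.coverage_id = pc.coverage_id
    WHERE cv.description = 'Liability'
    ORDER BY pc.policy_id
""").fetchall()
print(liability)  # [(1,), (2,)]
```

The design rule is the one a data model diagram makes visible. A one-to-many relationship is carried by a foreign key on the “many” side. A many-to-many relationship needs its own associative table.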
Watson also uses examples from widely used relational database products such as Access and Oracle. The book is not an Access reference, and many advanced SQL features are not supported in Access. Even so, the text conveys the theory behind how a relational database product such as Access should be used. It contains numerous exercises on both data modeling and SQL, and it is a good primer for actuaries who are interested in moving beyond Access.
The author thoroughly illustrates the concept of normalization as a method for increasing the quality of a database design. He goes through the development of six normal forms and describes the issues that these normal forms resolve. This is perhaps a little advanced for most actuaries, but it is interesting reading if one is willing to devote the effort.
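As a small illustration of what normalization resolves, the sketch below shows an update anomaly. It is a hypothetical example, not one of Watson’s, and it again uses Python’s sqlite3. In the flat table, holder_state depends on policy_id rather than on the key claim_id. That is a transitive dependency of the kind third normal form removes. Splitting the data into policy and claim tables means each fact is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized (hypothetical) table: holder_state is repeated on every claim row,
# although it depends on policy_id, not on the key claim_id.
conn.executescript("""
CREATE TABLE claim_flat (
    claim_id     INTEGER PRIMARY KEY,
    policy_id    INTEGER,
    holder_state TEXT,
    amount       REAL
);
INSERT INTO claim_flat VALUES (10, 1, 'OH', 500.0), (11, 1, 'OH', 250.0), (12, 2, 'TX', 1000.0);
""")

# Updating only one of the repeated rows leaves policy 1 with two different states.
conn.execute("UPDATE claim_flat SET holder_state = 'PA' WHERE claim_id = 10")
states = conn.execute(
    "SELECT DISTINCT holder_state FROM claim_flat WHERE policy_id = 1 ORDER BY holder_state"
).fetchall()
print(states)  # [('OH',), ('PA',)]  -- the update anomaly

# Normalized design: the state lives in the table whose key it depends on.
conn.executescript("""
CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, holder_state TEXT);
CREATE TABLE claim (
    claim_id  INTEGER PRIMARY KEY,
    policy_id INTEGER REFERENCES policy(policy_id),
    amount    REAL
);
INSERT INTO policy VALUES (1, 'OH'), (2, 'TX');
INSERT INTO claim VALUES (10, 1, 500.0), (11, 1, 250.0), (12, 2, 1000.0);
""")

# One single-row update now changes the fact everywhere it is used.
conn.execute("UPDATE policy SET holder_state = 'PA' WHERE policy_id = 1")
joined = conn.execute("""
    SELECT DISTINCT p.holder_state
    FROM claim c JOIN policy p ON c.policy_id = p.policy_id
    WHERE c.policy_id = 1
""").fetchall()
print(joined)  # [('PA',)]
```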
Finally, Watson provides an “SQL playbook” that contains 61 sample queries that should handle most of the data manipulation tasks that an actuary may encounter.
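The query below shows the kind of task such a playbook addresses. It is a hypothetical illustration and is not claimed to be one of Watson’s 61 queries. It uses a correlated subquery to find claims that exceed the average claim for their own line of business. The claim table and its data are assumptions made for the example, and it runs through Python’s sqlite3.

```python
import sqlite3

# Hypothetical claims table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, line TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [(1, "Auto", 400.0), (2, "Auto", 1600.0), (3, "Home", 900.0), (4, "Home", 300.0)],
)

# Correlated subquery: the inner AVG is recomputed for each outer row's line
# (Auto average = 1000.0, Home average = 600.0).
large = conn.execute("""
    SELECT c1.claim_id, c1.line, c1.amount
    FROM claim c1
    WHERE c1.amount > (SELECT AVG(c2.amount) FROM claim c2 WHERE c2.line = c1.line)
    ORDER BY c1.claim_id
""").fetchall()
print(large)  # [(2, 'Auto', 1600.0), (3, 'Home', 900.0)]
```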
Section 3, “Database Architectures and Implementations,” deals with the more technical aspects of data management, such as data structures and storage. It also provides a decent background on data processing architectures such as client/server technology. If nothing else, this section and Section 4 define much of the terminology used in IT shops today. This is of great use to actuaries who need to understand the key concepts of various technologies in order to liaise with their IT departments.
Watson devotes a chapter in this section to object-oriented (OO) data management. He does a good job of describing the OO paradigm and then contrasting it with the relational paradigm. Since the relational model is used primarily in data management and the OO model primarily in software engineering, Watson argues that it is important to be able to translate between the two. Among the differences he cites is that the OO paradigm is based on the software engineering principles of coupling, cohesion, and encapsulation, while the relational paradigm is based on the mathematical concepts of set theory.
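The sketch below illustrates what translating between the two paradigms involves. It is a minimal, hand-written object-relational mapping under stated assumptions, not Watson’s method or any particular ORM library. The Policy and Claim classes and the save and load helpers are hypothetical. On the OO side, a policy holds its claims directly and encapsulates behavior such as total_claims. On the relational side, the same data becomes rows in two tables linked by a foreign key.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: int
    amount: float


@dataclass
class Policy:
    policy_id: int
    holder_name: str
    claims: list[Claim] = field(default_factory=list)  # related objects held directly

    def total_claims(self) -> float:
        # Behavior is encapsulated with the data, as in the OO paradigm
        return sum(c.amount for c in self.claims)


def save(conn: sqlite3.Connection, policy: Policy) -> None:
    # OO -> relational: the nested list is flattened into rows linked by policy_id
    conn.execute("INSERT INTO policy VALUES (?, ?)", (policy.policy_id, policy.holder_name))
    conn.executemany("INSERT INTO claim VALUES (?, ?, ?)",
                     [(c.claim_id, policy.policy_id, c.amount) for c in policy.claims])


def load(conn: sqlite3.Connection, policy_id: int) -> Policy:
    # Relational -> OO: a join key is turned back into an object reference
    (holder,) = conn.execute(
        "SELECT holder_name FROM policy WHERE policy_id = ?", (policy_id,)).fetchone()
    rows = conn.execute(
        "SELECT claim_id, amount FROM claim WHERE policy_id = ? ORDER BY claim_id", (policy_id,))
    return Policy(policy_id, holder, [Claim(cid, amt) for cid, amt in rows])


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, holder_name TEXT NOT NULL);
CREATE TABLE claim (
    claim_id  INTEGER PRIMARY KEY,
    policy_id INTEGER REFERENCES policy(policy_id),
    amount    REAL
);
""")

p = Policy(1, "Smith", [Claim(10, 500.0), Claim(11, 250.0)])
save(conn, p)
restored = load(conn, 1)
print(restored == p, restored.total_claims())  # True 750.0
```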
Section 4, “Organizational Memory Technologies,” covers a potpourri of technologies. Watson devotes one chapter in this section to data warehousing, data mining, and the multidimensional database (MDDB), or cube, environment. Given that an MDDB is (arguably) the best storage arrangement for actuarial triangles, this section should be of great interest to actuaries. Unfortunately, it barely scratches the surface of data warehousing and data mining. He also devotes two chapters to the Internet and provides some extensive examples of how to use SQL within Java. Finally, he closes the section with a good treatment of XML (extensible markup language) and its emerging use as a data management standard.
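To show why the cube view suits actuarial triangles, here is a rough sketch that aggregates hypothetical paid-loss transactions along two dimensions, accident year and development age, and then pivots the result into a cumulative triangle. It emulates a two-dimensional slice of a cube with ordinary SQL and Python. It does not use an MDDB product, and the payment table and its figures are illustrative assumptions.

```python
import sqlite3
from collections import defaultdict

# Hypothetical paid-loss transactions (illustrative numbers only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (accident_year INTEGER, payment_year INTEGER, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", [
    (2001, 2001, 100.0), (2001, 2002, 50.0), (2001, 2003, 20.0),
    (2002, 2002, 120.0), (2002, 2003, 60.0),
    (2003, 2003, 130.0),
])

# Aggregate along two dimensions: accident year x development age (1 = first year)
rows = conn.execute("""
    SELECT accident_year, payment_year - accident_year + 1 AS dev_age, SUM(amount)
    FROM payment
    GROUP BY accident_year, payment_year
    ORDER BY accident_year, payment_year
""").fetchall()

# Pivot into a cumulative triangle: one row per accident year, one entry per age.
# Assumes every development age up to the latest is present for each accident year.
triangle = defaultdict(list)
for ay, age, paid in rows:
    prior = triangle[ay][-1] if triangle[ay] else 0.0
    triangle[ay].append(prior + paid)

for ay in sorted(triangle):
    print(ay, triangle[ay])
# 2001 [100.0, 150.0, 170.0]
# 2002 [120.0, 180.0]
# 2003 [130.0]
```

A true MDDB would store these aggregates precomputed along each dimension. Here the GROUP BY computes the same slice on demand.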
The final section, “Managing Organizational Memory,” covers two topics that most actuaries should find of interest: data integrity and data administration. At a time when actuaries are being asked to become advocates for data quality, it is important for them to understand what data quality really means. Watson states that maintaining data integrity involves three goals:
Protecting the existence of the data so it is available whenever it is needed;
Maintaining the quality of the data so that it is accurate, complete, and current; and
Ensuring confidentiality of data so that only those authorized can access it.
He then describes many techniques to achieve these goals.
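One common technique for the second goal (accurate and complete data) is to declare integrity constraints so that the database itself rejects bad rows. The sketch below is a hypothetical example using Python’s sqlite3 and is not necessarily one of the techniques Watson lists. NOT NULL enforces completeness, CHECK rejects impossible values, and the primary key blocks duplicates. It does not address the third goal: confidentiality is usually enforced with GRANT/REVOKE privileges in server products such as Oracle or SQL Server, and SQLite has no such commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claim (
        claim_id      INTEGER PRIMARY KEY,
        accident_date TEXT NOT NULL,                      -- completeness: value required
        amount        REAL NOT NULL CHECK (amount >= 0)   -- accuracy: reject impossible values
    )
""")


def try_insert(row):
    """Attempt an insert; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO claim VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((1, "2004-03-15", 500.0)))  # True
print(try_insert((2, None, 200.0)))          # False: missing accident date
print(try_insert((3, "2004-05-01", -50.0)))  # False: negative amount
print(try_insert((1, "2004-06-01", 75.0)))   # False: duplicate key
```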
The author also covers what he calls the 18 dimensions of data quality. As an example, let’s look at three of the dimensions—Accuracy, Timeliness, and Accessibility—and what conditions Watson sets for high quality (see Table 1).
Meeting these three dimensions, and the other 15 outlined in the book, is an ongoing pursuit rather than a destination. It is worthwhile for actuaries to review all 18 dimensions and see how their organization’s data stack up against each one.
Overall, I would highly recommend Data Management: Databases and Organization to those actuaries who are interested in learning more about the principles and challenges of data management.