Data Management Using Stata: A Practical Handbook, Second Edition


Author: Michael N. Mitchell
Publisher: Stata Press
Copyright: 2020
ISBN-13: 978-1-59718-318-5
Pages: 532; paperback

Michael N. Mitchell’s Data Management Using Stata: A Practical Handbook, Second Edition comprehensively covers data management tasks, from those a beginning statistician would need to those hard-to-verbalize tasks that can confound an experienced user. Mitchell does this all in simple language with illustrative examples.


The book is modular in structure, with modules based on data management tasks rather than on clusters of commands. This format is helpful because it allows readers to find just what they need to solve the problem at hand. To complement this format, the book is written in a style that teaches good data management habits even to sporadic readers and to those who read the chapters out of order.


Throughout the book, Mitchell subtly emphasizes the absolute necessity of reproducibility and an audit trail. Instead of stressing programming esoterica, Mitchell reinforces simple habits and points out the time savings gained by being careful. Mitchell’s experience in UCLA’s Academic Technology Services clearly drives much of his advice.


The second edition adds coverage of features introduced since Stata 10: reading and writing Microsoft Excel files, working with Unicode properly, and using frames. Mitchell also added a chapter showing how to build your own utility programs to simplify and automate routine tasks, easing code maintenance and aiding uniformity across projects.


New users will learn everything they need to import, clean, and prepare data for first analyses in Stata. Even experienced users will learn new tricks and new ways to approach data management problems.


This is a great book, thoroughly recommended for anyone interested in data management using Stata.



Michael Mitchell is a senior statistician working in the area of sleep research as well as on the prevention of child maltreatment with the Children’s Data Network. He is the author of three other Stata Press books: A Visual Guide to Stata Graphics, Interpreting and Visualizing Regression Models Using Stata, and Stata for the Behavioral Sciences.

List of tables
List of figures
1 Introduction
1.1 Using this book
1.2 Overview of this book
1.3 Listing observations in this book
1.4 More online resources
2 Reading and importing data files
2.1 Introduction
2.2 Reading Stata datasets
2.3 Importing Excel spreadsheets
2.4 Importing SAS files
2.4.1 Importing SAS .sas7bdat files
2.4.2 Importing SAS XPORT Version 5 files
2.4.3 Importing SAS XPORT Version 8 files
2.5 Importing SPSS files
2.6 Importing dBase files
2.7 Importing raw data files
2.7.1 Importing comma-separated and tab-separated files
2.7.2 Importing space-separated files
2.7.3 Importing fixed-column files
2.7.4 Importing fixed-column files with multiple lines of raw data per observation
2.8 Common errors when reading and importing files
2.9 Entering data directly into the Stata Data Editor
3 Saving and exporting data files
3.1 Introduction
3.2 Saving Stata datasets
3.3 Exporting Excel files
3.4 Exporting SAS XPORT Version 8 files
3.5 Exporting SAS XPORT Version 5 files
3.6 Exporting dBase files
3.7 Exporting comma-separated and tab-separated files
3.8 Exporting space-separated files
3.9 Exporting Excel files revisited: Creating reports
4 Data cleaning
4.1 Introduction
4.2 Double data entry
4.3 Checking individual variables
4.4 Checking categorical by categorical variables
4.5 Checking categorical by continuous variables
4.6 Checking continuous by continuous variables
4.7 Correcting errors in data
4.8 Identifying duplicates
4.9 Final thoughts on data cleaning
5 Labeling datasets
5.1 Introduction
5.2 Describing datasets
5.3 Labeling variables
5.4 Labeling values
5.5 Labeling utilities
5.6 Labeling variables and values in different languages
5.7 Adding comments to your dataset using notes
5.8 Formatting the display of variables
5.9 Changing the order of variables in a dataset
6 Creating variables
6.1 Introduction
6.2 Creating and changing variables
6.3 Numeric expressions and functions
6.4 String expressions and functions
6.5 Recoding
6.6 Coding missing values
6.7 Dummy variables
6.8 Date variables
6.9 Date-and-time variables
6.10 Computations across variables
6.11 Computations across observations
6.12 More examples using the egen command
6.13 Converting string variables to numeric variables
6.14 Converting numeric variables to string variables
6.15 Renaming and ordering variables
7 Combining datasets
7.1 Introduction
7.2 Appending: Appending datasets
7.3 Appending: Problems
7.4 Merging: One-to-one match merging
7.5 Merging: One-to-many match merging
7.6 Merging: Merging multiple datasets
7.7 Merging: Update merges
7.8 Merging: Additional options when merging datasets
7.9 Merging: Problems merging datasets
7.10 Joining datasets
7.11 Crossing datasets
8 Processing observations across subgroups
8.1 Introduction
8.2 Obtaining separate results for subgroups
8.3 Computing values separately by subgroups
8.4 Computing values within subgroups: Subscripting observations
8.5 Computing values within subgroups: Computations across observations
8.6 Computing values within subgroups: Running sums
8.7 Computing values within subgroups: More examples
8.8 Comparing the by and tsset commands
9 Changing the shape of your data
9.1 Introduction
9.2 Wide and long datasets
9.3 Introduction to reshaping long to wide
9.4 Reshaping long to wide: Problems
9.5 Introduction to reshaping wide to long
9.6 Reshaping wide to long: Problems
9.7 Multilevel datasets
9.8 Collapsing datasets
10 Programming for data management: Part 1
10.1 Introduction
10.2 Tips on long-term goals in data management
10.3 Executing do-files and making log files
10.4 Automating data checking
10.5 Combining do-files
10.6 Introducing Stata macros
10.7 Manipulating Stata macros
10.8 Repeating commands by looping over variables
10.9 Repeating commands by looping over numbers
10.10 Repeating commands by looping over anything
10.11 Accessing results stored from Stata commands
11 Programming for data management: Part 2
11.1 Writing Stata programs for data management
11.2 Program 1: hello
11.3 Where to save your Stata programs
11.4 Program 2: Multilevel counting
11.5 Program 3: Tabulations in list format
11.6 Program 4: Scoring the simple depression scale
11.7 Program 5: Standardizing variables
11.8 Program 6: Checking variable labels
11.9 Program 7: Checking value labels
11.10 Program 8: Customized describe command
11.11 Program 9: Customized summarize command
11.12 Program 10: Checking for unlabeled values
11.13 Tips on debugging Stata programs
11.14 Final thoughts: Writing Stata programs for data management
A Common elements
A.1 Introduction
A.2 Overview of Stata syntax
A.3 Working across groups of observations with by
A.4 Comments
A.5 Data types
A.6 Logical expressions
A.7 Functions
A.8 Subsetting observations with if and in
A.9 Subsetting observations and variables with keep and drop
A.10 Missing values
A.11 Referring to variable lists
A.12 Frames
A.12.1 Frames example 1: Can I interrupt you for a quick question?
A.12.2 Frames example 2: Juggling related tasks
A.12.3 Frames example 3: Checking double data entry