Data Analysis Using Stata, Third Edition

Authors: Ulrich Kohler and Frauke Kreuter
Publisher: Stata Press
Copyright: 2012
ISBN-13: 978-1-59718-110-5
Pages: 497; paperback

Data Analysis Using Stata, Third Edition has been completely revamped to reflect the capabilities of Stata 12. This book will appeal to those just learning statistics and Stata, as well as to the many users who are switching to Stata from other packages. Throughout the book, Kohler and Kreuter show examples using data from the German Socio-Economic Panel, a large survey of households containing demographic, income, employment, and other key information.


Kohler and Kreuter take a hands-on approach, first showing how to use Stata’s graphical interface and then describing Stata’s syntax. The core of the book covers all aspects of social science research, including data manipulation, production of tables and graphs, linear regression analysis, and logistic modeling. The authors describe Stata’s handling of categorical covariates and show how the new margins and marginsplot commands greatly simplify the interpretation of regression and logistic results. An entirely new chapter discusses aspects of statistical inference, including random samples, complex survey samples, nonresponse, and causal inference.


The rest of the book includes chapters on reading text files into Stata, writing programs and do-files, and using Internet resources such as the search command and the SSC archive.


Data Analysis Using Stata, Third Edition has been structured so that it can be used as a self-study course or as a textbook in an introductory data analysis or statistics course. It will appeal to students and academic researchers in all the social sciences.


Ulrich Kohler is a sociologist at the Social Science Research Center Berlin (WZB). Dr. Kohler is an organizer of the German Stata Users Group meetings.


Frauke Kreuter is an associate professor in the Joint Program in Survey Methodology (JPSM) at the University of Maryland–College Park, a professor in the Statistics Department at the Ludwig-Maximilians-University of Munich, and currently head of the Statistical Methods group at the Institute for Employment Research (IAB) in Nuremberg, Germany.


Both authors are associate editors of the Stata Journal. They coauthored a German textbook, Datenanalyse mit Stata, which was the predecessor of this book. They used Data Analysis Using Stata to teach several classes and short courses at the University of Mannheim, the University of Konstanz, the Free University of Berlin, and the University of California–Los Angeles, among others.


List of tables
List of figures
Preface
1 The first time
1.1 Starting Stata 
1.2 Setting up your screen 
1.3 Your first analysis 
1.3.1 Inputting commands 
1.3.2 Files and the working memory 
1.3.3 Loading data 
1.3.4 Variables and observations 
1.3.5 Looking at data 
1.3.6 Interrupting a command and repeating a command 
1.3.7 The variable list 
1.3.8 The in qualifier 
1.3.9 Summary statistics 
1.3.10 The if qualifier 
1.3.11 Defining missing values 
1.3.12 The by prefix 
1.3.13 Command options 
1.3.14 Frequency tables 
1.3.15 Graphs 
1.3.16 Getting help 
1.3.17 Recoding variables 
1.3.18 Variable labels and value labels 
1.3.19 Linear regression 
1.4 Do-files 
1.5 Exiting Stata 
1.6 Exercises 
2 Working with do-files
2.1 From interactive work to working with a do-file 
2.1.1 Alternative 1 
2.1.2 Alternative 2 
2.2 Designing do-files 
2.2.2 Line breaks 
2.2.3 Some crucial commands 
2.3 Organizing your work 
2.4 Exercises 
3 The grammar of Stata
3.1 The elements of Stata commands 
3.1.1 Stata commands 
3.1.2 The variable list 
List of variables: Required or optional 
Abbreviation rules 
Special listings 
3.1.3 Options 
3.1.4 The in qualifier 
3.1.5 The if qualifier 
3.1.6 Expressions 
3.1.7 Lists of numbers 
3.1.8 Using filenames 
3.2 Repeating similar commands 
3.2.1 The by prefix 
3.2.2 The foreach loop 
The types of foreach lists 
Several commands within a foreach loop 
3.2.3 The forvalues loop 
3.3 Weights 
Frequency weights 
Analytic weights 
Sampling weights 
3.4 Exercises 
4 General comments on the statistical commands
4.1 Regular statistical commands 
4.2 Estimation commands 
4.3 Exercises 
5 Creating and changing variables
5.1 The commands generate and replace 
5.1.1 Variable names 
5.1.2 Some examples 
5.1.3 Useful functions 
5.1.4 Changing codes with by, _n, and _N 
5.1.5 Subscripts 
5.2 Specialized recoding commands 
5.2.1 The recode command 
5.2.2 The egen command 
5.3 Recoding string variables 
5.4 Recoding date and time 
5.4.1 Dates 
5.4.2 Time 
5.5 Setting missing values 
5.6 Labels 
5.7 Storage types, or the ghost in the machine 
5.8 Exercises 
6 Creating and changing graphs
6.1 A primer on graph syntax 
6.2 Graph types 
6.2.1 Examples 
6.2.2 Specialized graphs 
6.3 Graph elements 
6.3.1 Appearance of data 
Choice of marker 
Marker colors 
Marker size 
6.3.2 Graph and plot regions 
Graph size 
Plot region 
Scaling the axes 
6.3.3 Information inside the plot region 
Reference lines 
Labeling inside the plot region 
6.3.4 Information outside the plot region 
Labeling the axes 
Tick lines 
Axis titles 
The legend 
Graph titles 
6.4 Multiple graphs 
6.4.1 Overlaying many twoway graphs 
6.4.2 Option by() 
6.4.3 Combining graphs 
6.5 Saving and printing graphs 
6.6 Exercises 
7 Describing and comparing distributions
7.1 Categories: Few or many? 
7.2 Variables with few categories 
7.2.1 Tables 
Frequency tables 
More than one frequency table 
Comparing distributions 
Summary statistics 
More than one contingency table 
7.2.2 Graphs 
Bar charts 
Pie charts 
Dot charts 
7.3 Variables with many categories 
7.3.1 Frequencies of grouped data 
Some remarks on grouping data 
Special techniques for grouping data 
7.3.2 Describing data using statistics 
Important summary statistics 
The summarize command 
The tabstat command 
Comparing distributions using statistics 
7.3.3 Graphs 
Box plots 
Kernel density estimation 
Quantile plot 
Comparing distributions with Q–Q plots 
7.4 Exercises 
8 Statistical inference
8.1 Random samples and sampling distributions 
8.1.1 Random numbers 
8.1.2 Creating fictitious datasets 
8.1.3 Drawing random samples 
8.1.4 The sampling distribution 
8.2 Descriptive inference 
8.2.1 Standard errors for simple random samples 
8.2.2 Standard errors for complex samples 
Typical forms of complex samples 
Sampling distributions for complex samples 
Using Stata’s svy commands 
8.2.3 Standard errors with nonresponse 
Unit nonresponse and poststratification weights 
Item nonresponse and multiple imputation 
8.2.4 Uses of standard errors 
Confidence intervals 
Significance tests 
Two-group mean comparison test 
8.3 Causal inference 
8.3.1 Basic concepts 
Data-generating processes 
Counterfactual concept of causality 
8.3.2 The effect of third-class tickets 
8.3.3 Some problems of causal inference 
8.4 Exercises 
9 Introduction to linear regression
9.1 Simple linear regression 
9.1.1 The basic principle 
9.1.2 Linear regression using Stata 
The table of coefficients 
The table of ANOVA results 
The model fit table 
9.2 Multiple regression 
9.2.1 Multiple regression using Stata 
9.2.2 More computations 
Adjusted R² 
Standardized regression coefficients 
9.2.3 What does “under control” mean? 
9.3 Regression diagnostics 
9.3.1 Violation of E(εᵢ) = 0 
Influential cases 
Omitted variables 
9.3.2 Violation of Var(εᵢ) = σ² 
9.3.3 Violation of Cov(εᵢ, εⱼ) = 0, i ≠ j 
9.4 Model extensions 
9.4.1 Categorical independent variables 
9.4.2 Interaction terms 
9.4.3 Regression models using transformed variables 
Nonlinear relationships 
Eliminating heteroskedasticity 
9.5 Reporting regression results 
9.5.1 Tables of similar regression models 
9.5.2 Plots of coefficients 
9.5.3 Conditional-effects plots 
9.6 Advanced techniques 
9.6.1 Median regression 
9.6.2 Regression models for panel data 
From wide to long format 
Fixed-effects models 
9.6.3 Error-components models 
9.7 Exercises 
10 Regression models for categorical dependent variables
10.1 The linear probability model 
10.2 Basic concepts 
10.2.1 Odds, log odds, and odds ratios 
10.2.2 Excursion: The maximum likelihood principle 
10.3 Logistic regression with Stata 
10.3.1 The coefficient table 
Sign interpretation 
Interpretation with odds ratios 
Probability interpretation 
Average marginal effects 
10.3.2 The iteration block 
10.3.3 The model fit block 
Classification tables 
Pearson chi-squared 
10.4 Logistic regression diagnostics 
10.4.1 Linearity 
10.4.2 Influential cases 
10.5 Likelihood-ratio test 
10.6 Refined models 
10.6.1 Nonlinear relationships 
10.6.2 Interaction effects 
10.7 Advanced techniques 
10.7.1 Probit models 
10.7.2 Multinomial logistic regression 
10.7.3 Models for ordinal data 
10.8 Exercises 
11 Reading and writing data
11.1 The goal: The data matrix 
11.2 Importing machine-readable data 
11.2.1 Reading system files from other packages 
Reading Excel files 
Reading SAS transport files 
Reading other system files 
11.2.2 Reading ASCII text files 
Reading data in spreadsheet format 
Reading data in free format 
Reading data in fixed format 
11.3 Inputting data 
11.3.1 Input data using the Data Editor 
11.3.2 The input command 
11.4 Combining data 
11.4.1 The GSOEP database 
11.4.2 The merge command 
Merge 1:1 matches with rectangular data 
Merge 1:1 matches with nonrectangular data 
Merging more than two files 
Merging m:1 and 1:m matches 
11.4.3 The append command 
11.5 Saving and exporting data 
11.6 Handling large datasets 
11.6.1 Rules for handling the working memory 
11.6.2 Using oversized datasets 
11.7 Exercises 
12 Do-files for advanced users and user-written programs
12.1 Two examples of usage 
12.2 Four programming tools 
12.2.1 Local macros 
Calculating with local macros 
Combining local macros 
Changing local macros 
12.2.2 Do-files 
12.2.3 Programs 
The problem of redefinition 
The problem of naming 
The problem of error checking 
12.2.4 Programs in do-files and ado-files 
12.3 User-written Stata commands 
12.3.1 Sketch of the syntax 
12.3.2 Create a first ado-file 
12.3.3 Parsing variable lists 
12.3.4 Parsing options 
12.3.5 Parsing if and in qualifiers 
12.3.6 Generating an unknown number of variables 
12.3.7 Default values 
12.3.8 Extended macro functions 
12.3.9 Avoiding changes in the dataset 
12.3.10 Help files 
12.4 Exercises 
13 Around Stata
13.1 Resources and information 
13.2 Taking care of Stata 
13.3 Additional procedures 
13.3.1 Stata Journal ado-files 
13.3.2 SSC ado-files 
13.3.3 Other ado-files 
13.4 Exercises