The Workflow of Data Analysis Using Stata
75,600 won
Author: J. Scott Long
Publisher: Stata Press
Copyright: 2009
ISBN-13: 978-1-59718-047-4
Pages: 379; paperback
The Workflow of Data Analysis Using Stata, by J. Scott Long, is an essential productivity tool for data analysts. Aimed at anyone who analyzes data, this book presents an effective strategy for designing and doing data-analytic projects.
In this book, Long presents lessons gained from his experience with numerous academic publications, as a coauthor of the immensely popular Regression Models for Categorical Dependent Variables Using Stata, and as a coauthor of the SPost routines, which are downloaded over 20,000 times a year.
A workflow of data analysis is a process for managing all aspects of data analysis. Planning, documenting, and organizing your work; cleaning the data; creating, renaming, and verifying variables; performing and presenting statistical analyses; producing replicable results; and archiving what you have done are all integral parts of your workflow.
Long shows how to design and implement efficient workflows for both one-person projects and team projects. He guides you toward streamlining your workflow, because a good workflow is essential for replicating your work, and replication is essential for good science.
An efficient workflow reduces the time you spend doing data management and lets you produce datasets that are easier to analyze. When you methodically clean your data and carefully choose names and effective labels for your variables, the time you spend doing statistical and graphical analyses will be more productive and more enjoyable.
After introducing workflows and explaining how a better workflow can make it easier to work with data, Long describes planning, organizing, and documenting your work. He then introduces how to write and debug Stata do-files and how to use local and global macros. Long presents conventions that greatly simplify data analysis—conventions for naming, labeling, documenting, and verifying variables. He also covers cleaning, analyzing, and protecting your data.
While describing effective workflows, Long also introduces the concepts of basic data management using Stata and writing Stata do-files. Using real-world examples, Stata commands, and Stata scripts, Long illustrates effective techniques for managing your data and analyses. If you analyze data, this book is recommended for you.
Comments from readers
You have written the book that I had planned to write someday. But I’m glad I didn’t—your book is much better. Congratulations, this was greatly needed.
Prof. Bill Gardner
The Ohio State University
I will post the announcement of Workflow on my door with the following note: “I’m glad to help anybody who followed at least 25% of the advice Long provides—and brings me their do-files!”
Prof. Alan C. Acock
Oregon State University
I just wanted to send you a thank you for taking the time to write this book. I feel a little like an obsessed fan because I read it for several hours last night, bought 3 copies for my new research team and am presenting our new organization scheme tomorrow. It turns out that we have just finished a first flurry of data collection and hiring and I’ve been scratching my head about how to systematize some aspects. It is a perfect time to superimpose a structure. I’ve used aspects of your plan in my own work (hence my eagerness to adopt) but having this coherent volume is a wonderful and practical resource. I learned a lot from reading this. Thank you!
Elizabeth Gifford, Ph.D.
Research Scientist
Duke University
I just received a knock at my door with my new copy of The Workflow of Data Analysis Using Stata. I immediately ripped off the packaging and began perusing it. Just before the knock, I was attempting to write a program to get Stata to save the r(mean) and r(sd) for two variables following a summarize command to be saved for a ttesti command. After looking at your book for about two minutes, I stumbled upon pages 91–92, where it gave me all the information I need. … I have only had the book about 10 minutes and already it has made my life easier. Thanks much, and I am already looking forward to reading the rest of the book!
Claire M. Kamp Dush, Ph.D.
The Ohio State University
I am a Spanish professor of public economics who is at present enjoying a study-research leave at Melbourne University (Australia). Because of that I have had the time to read your book from cover to cover. I just want to thank you for the incredible work you have done! A book such as this one is a must for anyone trying to make an academic career. Definitely, I will recommend it to my graduate students as soon as I go back to Spain. If I had the chance to reach this book twenty years ago I would have been much more efficient doing my work. Never is it too late! Thanks!
Prof. Jose Felix Sanz-Sanz
Dept. of Applied Economics
Universidad Complutense de Madrid
J. Scott Long is Chancellor’s Professor of Sociology and Statistics and Associate Vice Provost for Research at Indiana University–Bloomington. He has contributed articles to many journals, including American Sociological Review, Social Forces, American Statistician, and Sociological Methods and Research. He was editor of Sociological Methods and Research from 1987 to 1994. Dr. Long has authored or edited seven previous books on statistics, including the highly acclaimed Regression Models for Categorical and Limited Dependent Variables. In 2001, he received the Paul Lazarsfeld Memorial Award for Distinguished Contributions to Sociological Methodology. Each summer at the University of Michigan, he teaches workshops at the Inter-University Consortium for Political and Social Research Summer Program in Quantitative Methods of Social Research.
Table of contents
1.2 Steps in the workflow
1.2.2 Running analysis
1.2.3 Presenting results
1.2.4 Protecting files
1.3.2 Organization
1.3.3 Documentation
1.3.4 Execution
1.4.2 Efficiency
1.4.3 Simplicity
1.4.4 Standardization
1.4.5 Automation
1.4.6 Usability
1.4.7 Scalability
1.6 How the book is organized
2.2 Planning
2.3 Organization
2.3.2 Organizing files and directories
2.3.3 Creating your directory structure
A directory structure for a large, one-person project
Directories for collaborative projects
Special-purpose directories
Remembering what directories contain
Planning your directory structure
Naming files
Batch files
2.4.2 Levels of documentation
2.4.3 Suggestions for writing documentation
A template for research logs
3.1.2 Dialog boxes
3.1.3 Do-files
Use version control
Exclude directory information
Include seeds for random numbers
Use alignment and indentation
Use short lines
Limit your abbreviations
Be consistent
A template for simple do-files
A more complex do-file template
Log file already exists
Incorrect command name
Incorrect variable name
Incorrect option
Missing comma before options
Step 2: Start with a clean slate
Step 3: Try other data
Step 4: Assume everything could be wrong
Step 5: Run the program in steps
Step 6: Exclude parts of the do-file
Step 7: Starting over
Step 8: Sometimes it is not your mistake
3.3.4 Example 2: Debugging unanticipated results
3.3.5 Advanced methods for debugging
3.5 Conclusions
Global macros
Using double quotes when defining macros
Creating long strings
4.1.3 Setting options with locals
The forvalues command
Loop example 2: Creating interaction variables
Loop example 3: Fitting models with alternative measures of education
Loop example 4: Recoding multiple variables the same way
Loop example 5: Creating a macro that holds accumulated information
Loop example 6: Retrieving information returned by Stata
4.3.4 Debugging loops
4.4.2 Recoding data using include files
4.4.3 Caution when using include files
4.5.2 Loading and deleting ado-files
4.5.3 Listing variable names and labels
4.5.4 A general program to change your working directory
4.5.5 Words of caution
4.6.2 help me
5.2 The dual workflow of data management and statistical analysis
5.3 Names, notes, and labels
5.4 Naming do-files
5.4.2 Naming do-files to reproduce statistical analysis
5.4.3 Using master do-files
5.5.2 Datasets for larger projects
5.5.3 Labels and notes for datasets
5.5.4 The datasignature command
Changes datasignature does not detect
5.6.2 Systems for naming variables
Source naming systems
Mnemonic naming systems
5.6.4 Principles for selecting names
Use simple, unambiguous names
Try names before you decide
5.7.3 Principles for variable labels
Test labels before you post the file
5.7.5 Creating variable labels that include the variable name
Removing notes
Searching notes
Step 2: Assigning labels
Why a two-step system?
Removing labels
2) Include the category number
3) Avoid special characters
4) Keeping track of where labels are used
5.9.4 Consistent value labels for missing values
5.9.5 Using loops when assigning value labels
5.10.2 Using label language for short and long labels
Step 2: Archive, clone, and rename
Step 3: Revise variable labels
Step 4: Revise value labels
Step 5: Verify the changes
Step 1b: Try the current names and labels
Step 2b: Create rename commands
Step 2c: Rename variables
Step 3b: Revise variable labels
Step 4b: Create label define commands to edit
Step 4c: Revise labels and add them to dataset
Binary-data formats
Using other statistical packages to export data
Using a data conversion program
Values review of data on family values
Examining high-frequency values
Links among variables
Changes in survey questions
Creating indicators of whether cases are missing
Using extended missing values
Verifying and expanding missing-data codes
Using include files
Verify that new variables are correct
Document new variables
Keep the source variables
The clonevar command
The replace command
6.3.4 Additional commands for creating variables
The egen command
The tabulate, generate() command
6.3.6 Verifying that variables are correct
Listing variables
Plotting continuous variables
Tabulating variables
Constructing variables multiple ways
6.4.4 Internal documentation
6.4.5 Compressing variables
6.4.6 Running diagnostics
Checking for unique ID variables
6.4.8 Saving the file
6.4.9 After a file is saved
Creating binary indicators of positive attitudes
Creating four-category scales of positive attitudes
7.1.2 Planning in the middle
7.1.3 Planning in the small
7.2.2 What belongs in your do-file?
7.3.2 Documenting the provenance of results
7.4.2 Loops for repeated analyses
Loops for alternative model specifications
Saving results from nested regressions
Saving results from different transformations of articles
7.4.5 Include files to load data and select your sample
7.6 Replication
7.6.2 Software and version control
7.6.3 Unknown seed for random numbers
Letting Stata set the seed
Training and confirmation samples
Regression tables with esttab
Font size
Presentations
7.9 Conclusions
8.2 Causes of data loss and issues in recovering a file
8.3 Murphy’s law and rules for copying files
8.4 A workflow for file protection
Part 2: Offline backups
8.6 Conclusions
The working directory
A.3 Customizing Stata
A.3.2 Commands to change preferences
Options that need to be set each session