Setting Up Your Research

While it is possible to simply enter each command into the Command window one by one, it's better to keep track of all your commands in one place and run them from there. Stata allows you to do this in what it calls a "Do File." Also, in many cases you need to edit and clean up the dataset before you can begin serious analysis. This page introduces you to both of these tasks.

Do Files

To open a new Do File, click on the "New Do-file Editor" icon at the top of the screen. It looks like this: 

This brings up a Do File, which looks like this: 

Here you can keep track of all your commands. For example, a Do File with all the commands previously run in this tutorial would look like this: 

To execute a command from a Do File, highlight the command (or commands) you want to execute and click the "Execute (do)" button at the top of the Do File screen, which looks like this: 

This will "do" (thus the name Do File) the commands you've highlighted. 

It's best to get into the habit of working from a Do File. It might not seem obviously useful at first, but if you're working on a large research project such as a seminar paper or a senior thesis, it will be incredibly helpful. It lets you replicate any analysis you have done with the click of a button. And if you find yourself needing assistance, sharing your Do File with one of the ERC assistants will make things much easier.
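As a rough sketch of what such a file might contain, here are the commands used on this page collected in one place. The commands themselves are explained in the next section; the comments are illustrative, and the sketch assumes the tutorial's dataset is already open in Stata.

```stata
* Sketch of a Do File holding the commands from this page
* (V083098X is the party identification variable in the tutorial's dataset)

* Recode the -1 category as missing and store the result in a new variable
recode V083098X (-1 = .), gen(partyid)

* Frequency table for the new variable
tab partyid

* Summary statistics for the new variable
sum partyid
```

Lines that begin with an asterisk are comments, which Stata skips when it runs the file. Adding a short comment above each command makes the Do File much easier to read later.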

Recoding variables

In many cases you will need to recode variables to make them useful. Consider the party identification question from the last section. It had 7 substantive categories, plus an eighth category (coded -1) that was worthless for research purposes. So even though it looks like an 8-category variable at first glance, it's really a 7-category variable in disguise. The extra category distorts the summary statistics: the minimum is -1, even though that category is useless to you, and the mean, standard deviation, etc. are skewed as well.

So what you want to do in this case is recode the party identification variable and get rid of that category so you can focus on the actual 7-category scale. Stata allows you to do this with the "recode" command. In this case, I'm simply going to drop the -1 category and keep all the others the same. I'll name my new variable "partyid." The command to do this is the following: 

recode V083098X (-1 = .), gen(partyid)

Let's walk through the command. As before, I begin with a command ("recode") and follow it with the variable name ("V083098X"). I next specify that I want the -1 category to be treated as missing, which in Stata means coding it as a dot ("."). I do this in parentheses. Now that I've told Stata what I want it to do to the original variable, I have to tell it to store the result in a new variable with a new name. So I type a comma and a space and then use the "gen" option, which generates a new variable, the name of which I place in parentheses afterwards (with no space in between). The original variable is left unchanged.

The output looks like this: 

Recall from the previous section that the -1 category had 40 observations, which is why there are 40 differences between the original variable (V083098X) and the new variable (partyid). Those 40 cases are now missing. 

To see this, use the "tab" command again, but this time on the new variable: 

tab partyid

The output looks like this: 

The new variable only has 7 categories (0, 1, 2, 3, 4, 5, 6), all of which are substantive (remember that they range from 0 = "Strong Democrat" to 6 = "Strong Republican"). 
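Note that "tab" leaves missing values out of the table by default, which is why the 40 recoded cases do not show up when you run "tab partyid". As a minimal sketch, assuming the recode command above has already been run and the original variable had no missing values of its own, the "missing" option and the "count" command offer two ways to confirm they are there:

```stata
* Include missing values as their own row in the frequency table
tab partyid, missing

* Count the observations where partyid is missing
count if missing(partyid)
```

Under those assumptions, both should report 40 missing cases.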

Checking the summary statistics reveals a few differences there as well: 

sum partyid

Compared to the summary statistics for the original variable, the number of observations has dropped from 2,322 to 2,282. The mean is now 2.29667, compared to 2.239879 previously (the 40 observations coded -1 were pulling the average down). The standard deviation is also slightly smaller, and the minimum value is now 0 ("Strong Democrat") rather than -1.
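The change in the mean can be checked by hand. Setting aside 40 observations coded -1 raises the sum of the values by 40 and lowers the number of observations by 40. A minimal sketch using Stata's "display" command as a calculator, with the figures quoted above:

```stata
* Original sum of values = observations * mean = 2322 * 2.239879
* Dropping 40 cases coded -1 adds 40 to that sum and leaves 2282 cases
display (2322 * 2.239879 + 40) / 2282
```

The result should come out at roughly 2.2967, matching the new mean up to rounding.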




Questions, comments, concerns?
Send an email to the Empirical Reasoning Center
Or drop in during the ERC's open hours