Exploring a Dataset
This page demonstrates how to load a dataset into Stata and explore the available variables.
If you double-click a .dta file on a computer with Stata installed, it should open automatically in Stata. Alternatively, you can type the following command:
use "D:\Temp\ICPSR_25383\DS0001\25383-0001-Data.dta"
The command "use" tells Stata to load the dataset you type immediately afterwards. In this case, the public opinion survey I'm using is stored in the file "25383-0001-Data.dta" in the folder D:\Temp\ICPSR_25383\DS0001, so the full path goes inside the quotation marks. If your file is located elsewhere, the path in quotation marks will be different. If you're not sure where your file is located, just double-click it and Stata will open it for you. The screen should look like this:
Notice that you now have a list of variables in the Variables window at the top right. Datasets from the American National Election Studies (ANES) usually include fairly descriptive variable labels in this window, but it still might not be clear exactly what each variable measures. This is where the study's codebook, which documents each variable's question wording and response codes, is useful.
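You can also look up a variable from the Command window instead of scrolling the Variables window. The sketch below uses Stata's own `describe`, `codebook`, and `lookfor` commands on the party identification variable discussed later on this page. Stata's `codebook` command summarizes the data in memory and does not replace the ANES codebook's question wording. The keyword "party" is only an illustration, and what `lookfor` finds depends on how ANES worded its variable labels.

```stata
* Load the survey (path from the example above; change it to match your computer)
use "D:\Temp\ICPSR_25383\DS0001\25383-0001-Data.dta", clear

* Storage type, display format, value-label name, and variable label
describe V083098X

* Stata's built-in summary of the variable: range, number of unique values,
* and, for a variable with few distinct values like this one, a frequency listing
codebook V083098X

* Search variable names and labels for a keyword
lookfor party
```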
If you want to look at the data in spreadsheet form (think of how an Excel spreadsheet looks), you can click on the "Data Editor (Browse)" icon near the top. It looks like this:
This brings up the Data Editor, which looks like this:
You can scroll through to look over your data. When you're done, click the X in the Data Editor's top-right corner to return to the main Stata window.
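If you prefer typing to clicking, the Data Editor can also be opened from the Command window. The sketch below opens it in browse (read-only) mode for a single variable, and also shows `list`, which prints a few rows directly in the Results window. Limiting the view to V083098X and to the first 10 rows is just an illustrative choice.

```stata
* Load the survey (path from the example above; change it to match your computer)
use "D:\Temp\ICPSR_25383\DS0001\25383-0001-Data.dta", clear

* Open the Data Editor in browse (read-only) mode, showing only this variable
browse V083098X

* Or print the first 10 rows in the Results window instead
list V083098X in 1/10
```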
Now that you have a dataset loaded into Stata and (in this case, at least) a wide array of variables at your fingertips, let's pick one to explore in greater detail. I'll use party identification. Variable V083098X is the main partisan identification variable. To display a frequency table of this variable, type the following command:
tab V083098X
The "tab" command (short for "tabulate") tells Stata to create a frequency table of the variable you name immediately afterwards. The result looks like this:
The first category ("-1. INAP, -9 in J1; -8,-9 in J1a; -8,-9") can be set aside. It tells us that 40 people (1.72 percent of all respondents) were not asked the question or did not give a usable answer, so they could not be placed on the scale. The real substance is provided by categories 0-6. In this survey, 580 individuals, or 24.98 percent of all respondents, categorized themselves as Strong Democrat. You can also see how many identified themselves as Weak Democrat, Independent-Democrat, Independent-Independent, Independent-Republican, Weak Republican, and Strong Republican. ("Independent-Democrat" means the person identifies primarily as an Independent but leans Democratic when forced to choose; likewise with "Independent-Republican".)
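Two small variations on this table can be handy. One shows the numeric codes instead of the labels. The other leaves out the -1 category, so the percentages are computed only among respondents who were placed on the 0-6 scale. This sketch assumes, as the table above suggests, that -1 is the only code outside 0-6. The `display` lines reproduce the percentages quoted above by hand: each count is divided by the 2,322 total and multiplied by 100.

```stata
* Load the survey (path from the example above; change it to match your computer)
use "D:\Temp\ICPSR_25383\DS0001\25383-0001-Data.dta", clear

* Show the numeric codes (-1, 0-6) instead of the value labels
tabulate V083098X, nolabel

* Leave out the -1 category, so percentages are based on the 0-6 codes only
tabulate V083098X if V083098X >= 0

* Reproduce the percentages reported in the full table
display 100 * 580 / 2322
display 100 * 40 / 2322
```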
You can also obtain a number of basic summary statistics about the variable by typing the following:
sum V083098X
Just like before, the "sum" command (short for "summarize") tells Stata to summarize the variable named immediately after it. The result looks like this:
This tells you there are 2,322 total observations (i.e., respondents with a value on this variable). The mean is 2.239879 and the standard deviation is 2.031143. The minimum value is -1 (the code ANES uses for the inapplicable and nonresponding respondents described above) and the maximum is 6 (the code for Strong Republican). Because Stata treats -1 as an ordinary number rather than as missing, those 40 respondents are included in the mean and standard deviation, which pulls the mean slightly down.
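If you want summary statistics for the 0-6 codes only, here are two common approaches. You can add an `if` qualifier, or you can use `mvdecode` to convert -1 to Stata's system missing value, which `summarize` skips. The copy variable `pid7` is a hypothetical name chosen for illustration; working on a copy leaves V083098X untouched. Working from the figures above, dropping the forty -1 values leaves 2,282 observations and raises the sum by 40. The mean of the valid codes should therefore come out near 2.30, and the final `display` line does that arithmetic.

```stata
* Load the survey (path from the example above; change it to match your computer)
use "D:\Temp\ICPSR_25383\DS0001\25383-0001-Data.dta", clear

* Option 1: summarize only respondents placed on the 0-6 scale
summarize V083098X if V083098X >= 0

* Option 2: work on a copy (pid7 is a hypothetical name) and recode -1 to
* system missing (.), which summarize and most other commands skip
generate pid7 = V083098X
mvdecode pid7, mv(-1)
summarize pid7, detail

* Back-of-the-envelope check using the full-sample figures:
* remove forty values of -1 from the total, then divide by the 2,282 remaining
display (2322 * 2.239879 + 40) / 2282
```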