Initial Analysis

Now let's move to some real research. Let's say you're interested in the relationship between party identification (the variable we've been using as an example so far) and how frequently one attends religious services. To be a little more specific, let's begin with the hypothesis (which we will test here and in the following sections) that people who attend religious services more frequently will identify more with the Republican Party.

So now, in addition to the party identification variable, let's introduce a new variable. The GSS asks respondents how often they attend religious services. The possible responses are never, less than once a year, once a year, several times a year, once a month, 2-3 times a month, nearly every week, every week, and more than once a week. In order to see the responses, we will run a tabulation on this new variable, "attend".

`tab attend`

You should receive the following output:

The table above shows the number of respondents who selected each response. In the highlighted box, we can see that 140 individuals out of a total of 1,966 (or 7.12% of the total number of respondents) attend religious services more than once a week.
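To double-check a percentage from the table by hand, Stata's `display` command can be used as a calculator. This is a minimal sketch that uses only the two numbers quoted above (140 and 1,966); the `%5.2f` format simply rounds the result to two decimals.

```stata
* Share of respondents who attend more than once a week
display %5.2f 100 * 140 / 1966
```

This should print 7.12, matching the percentage reported in the tabulation.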

Crosstabs

Now let's run a slightly more involved tabulation, this time with two variables. This is called a crosstabulation, or crosstab. You simply tell Stata to tab two variables rather than one. In this case:

`tab partyid2 attend`

This gives the following result:

The categories for the party identification variable (partyid2) run vertically along the left side, while the categories for the attending religious services variable (attend) run horizontally along the top. If you find the intersection of category 0 on partyid2 and category "never" on attend, for example, you will see that 82 individuals fall into this cell: Strong Democrats who never attend religious services. In total, 356 individuals identified as "Strong Democrats" and 465 individuals answered that they never attend religious services. Therefore, of the 356 "Strong Democrats", 82 "never" attend religious services, and of the 465 individuals who "never" attend religious services, 82 identify as "Strong Democrats". You can similarly find any other combination. For instance, if you want to see how many "Strong Republicans" attend religious services every week, simply find the intersection of category 6 on partyid2 and category "every wee" on attend (the label "every week", shortened in the column header). It turns out there are 64 of these people in the sample.

One clear problem with the crosstab above is that it gives you only raw numbers. What you should generally be more interested in is not the number of people but the percentage of respondents who make up a category. You can tell Stata to include percentages by adding options for row and column percentages. In this case, amend the previous command by adding ", row column" to the end, which gives this command:

`tab partyid2 attend, row column`

This gives you a more detailed crosstab where you can find percentages as well as raw frequencies (this is going to take two image files because it's a big output!):

So, what do all these new numbers mean? The top number in each cell is still the raw frequency you got before you added the ", row column" bit. The second number in each cell is now the row percentage (left to right; note how it adds across the row to 100.00 in the right-hand "Total"). The third number in each cell is the column percentage (top to bottom; note how it adds down the column to 100.00 in the bottom "Total"). Therefore, we can conclude that 17.63% of respondents who "never" attend religious services identify as "Strong Democrats" (the column percentage) and 23.03% of respondents who identify as "Strong Democrats" "never" attend religious services (the row percentage).
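If three stacked numbers per cell are hard to read, one option is to request one kind of percentage at a time and suppress the raw counts with the `nofreq` option. The sketch below assumes the same dataset is still loaded. It also re-derives the two percentages discussed above from the raw counts (82, 356 and 465).

```stata
* Row percentages only: each party identification row sums to 100
tab partyid2 attend, row nofreq

* Column percentages only: each attendance column sums to 100
tab partyid2 attend, column nofreq

* Row percentage: Strong Democrats who never attend
display %5.2f 100 * 82 / 356

* Column percentage: never-attenders who are Strong Democrats
display %5.2f 100 * 82 / 465
```

The two `display` lines should print 23.03 and 17.63, the row and column percentages for that cell.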

Also, notice that the total number of observations shown in the crosstabulation of "partyid2" and "attend" is 1,899. This number is smaller than the total number of observations for "attend" and "partyid2" individually, 1,966 and 1,906 respectively. The number of observations decreased because the crosstabulation only includes respondents who have a valid (non-missing) response for both "partyid2" and "attend". We can tell Stata to list the observations that are missing a response for "attend", and so were dropped from the crosstabulation, with the following command:

`list partyid2 attend if missing(attend)`

Here, we can see that seven observations were dropped (we do not count observation "1214" because it does not have a valid entry for partyid2 either). These seven observations account for the difference between the total number of observations in "partyid2" (1,906) and the number used in the crosstabulation above (1,899).
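Rather than counting the listed rows by eye, you can ask Stata to count them or to show missing values directly in the crosstab. This is a short sketch under the same assumptions as above (the dataset is loaded, and the variables are named partyid2 and attend).

```stata
* Count respondents with a valid partyid2 but no answer for attend
count if missing(attend) & !missing(partyid2)

* Repeat the crosstab, keeping missing values as their own category
tab partyid2 attend, missing
```

If the figures above hold, the count should be 7, which is 1,906 minus 1,899.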