Research Results and Data Analysis

(read Cozby Chapter 12)

Scales of measurement revisited

Most independent variables are of the nominal type, identifying group membership with a code number for the group (1, 2, 3, 4, ...). Nominal variables can be developed from continuous interval or ratio variables by setting cutoffs for group membership, such as quartiles or SD scores.

(E.g., AAS codes from 1 SD above mean on integration, assimilation, marginalisation, etc.)
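
As a rough illustration of recoding a continuous variable into group codes, the sketch below assigns codes 1 to 4 by quartile. The function name and the scores are hypothetical, and other cutoff rules (such as 1 SD above the mean) would work the same way.

```python
from statistics import quantiles

def quartile_codes(scores):
    """Recode continuous scores into nominal group codes 1 to 4."""
    q1, q2, q3 = quantiles(scores, n=4)  # three quartile cut points
    codes = []
    for score in scores:
        if score <= q1:
            codes.append(1)
        elif score <= q2:
            codes.append(2)
        elif score <= q3:
            codes.append(3)
        else:
            codes.append(4)
    return codes

# Hypothetical scale scores for eight respondents.
scores = [12, 15, 18, 21, 24, 27, 30, 33]
print(quartile_codes(scores))
```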

Ordinal variables can also be used as independent variables; however, they possess only relative value (rank ordering). Tests such as the Spearman rank correlation or Kendall's tau coefficient, which examines order inversions between pairs, can be performed on them.
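
The idea of counting order inversions between pairs can be sketched as follows. This is the simplest form of Kendall's tau (often called tau-a) and assumes there are no tied ranks; the function name and the rankings are made up for illustration.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no tied scores in either ranking.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # the pair is ordered the same way on both variables
        elif product < 0:
            discordant += 1   # the pair is inverted
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# Two hypothetical judges ranking the same four items.
judge_a = [1, 2, 3, 4]
judge_b = [1, 3, 2, 4]
print(kendall_tau(judge_a, judge_b))
```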

Interval and ratio variables are common as dependent variables and sometimes as independent variables. Likert attitude scales are very common and are usually treated as interval. They offer meaningful summary statistics such as means and standard deviations.

Analysing Research Results

There are numerous types of quantitative analyses that can be done, ranging from frequency counts and comparing group percentages, through correlation of individual scores on two or more variables, to comparing group means and deviations.

Comparing group percentages when both variables are nominal can be done with Chi² analysis. Here, the marginal probabilities of each variable are used to predict the expected cell frequencies in the matrix. Chi² makes use of this information in comparing expected and observed frequencies and in estimating the probability of such results occurring by chance.
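
A minimal sketch of the expected-frequency step, assuming a table of observed counts given as a list of rows. The function name and counts are hypothetical, and the lookup of the probability for the resulting Chi² value is left out.

```python
def chi_square(observed):
    """Return (chi2, expected) for a table of observed cell frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count per cell = row total * column total / grand total.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected

# Hypothetical 2 x 2 table: group (rows) by yes/no response (columns).
observed = [[10, 20],
            [20, 10]]
chi2, expected = chi_square(observed)
print(expected)
print(round(chi2, 2))
```

When observed and expected frequencies match exactly, the statistic is zero; the larger the value, the less likely the pattern is due to chance.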

Correlations of individual scores on two or more (interval/ratio) variables can be made, considering the presence and strength of the relationship between (among) them. If one knows one score and past relationships one can predict the other through regression analysis. Pearson's r is used.

Comparing Group Means (& deviations)

After grouping people or assigning them to treatments, the average scores per group can be compared, along with the amount of deviation of the group members' scores around the group mean.

T-Test and ANOVA can be done for such questions.
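
As one possible illustration, the sketch below computes an independent-groups t statistic from the group means and variances. It assumes the pooled-variance form of the test (equal variances in both groups); the function name and scores are hypothetical, and the significance lookup is omitted.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """t statistic for two independent groups, using pooled variance."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical scores for a control and a treatment group.
control = [1, 2, 3]
treatment = [4, 5, 6]
print(round(independent_t(control, treatment), 3))
```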

Prior to any inferential statistics (hypothesis testing), it is crucial to get the basic summary statistics.

Frequency distributions

Frequency distributions give general information about how popular (common) particular scores are.

Various types of graphs are possible, ranging from pie charts and bar graphs to frequency polygons.

Pie Charts are particularly good for showing proportions or percentage scores.

Bar graphs provide frequency counts for variables. They are particularly good for dichotomous data or discrete nominal / ordinal variables with only a small number of scores. Alternatively, bars can represent small ranges of scores, such as percentiles grouped into ten equal-width bars. Dual colour bars can be used to present two variables on one scale.

Frequency polygons make use of lines connecting dots or crosses that mark the specific or group scores. Dual colours or different line/symbol types can designate two groups across the same scores.

Descriptive Statistics

Descriptive statistics provide researchers with general summary information about the scores or characteristics of the people they are studying.

Measures of Central Tendency

Give information about the centroid (or middle) of the distribution.

Mean or arithmetic average is one way to think about the 'middle' where one adds the scores and divides by the number of scores.

Median is the score that falls in the middle of the collection of scores. With an odd number of scores there will be one clear median, the score with equal numbers of scores above and below it.

For samples with an even number of scores, the median is the average (midpoint) of the two middle scores, which have the same number of scores above and below them.

Percentile scores indicate rank order or percent placement. They represent the percentage of scores that fall below the given score in question.

The mode is the score that occurs most frequently in the distribution. While a single mode is the most frequent score, bimodal, trimodal, and multimodal distributions exist with several 'peaks' of most frequent scores.
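
The measures above can be tried out on a small set of scores. This is a sketch using Python's statistics module; the percentile_rank helper and the scores are hypothetical and only meant to show the definitions at work.

```python
from statistics import mean, median, multimode

def percentile_rank(scores, score):
    """Percentage of scores that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical scores (an even number of them, with two modes).
scores = [2, 3, 3, 5, 7, 8, 8, 10]

print(mean(scores))                # sum of scores divided by the number of scores
print(median(scores))              # midpoint of the two middle scores, 5 and 7
print(multimode(scores))           # every score tied for most frequent
print(percentile_rank(scores, 8))  # percent of scores below 8
```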

Measures of Dispersion

Range is a number that indicates the number of units across which the scores occur. Found by subtracting the lowest score from the highest score.

Variance is a statistic that indicates the average squared distance of the scores from the mean of the collection of scores. It is only appropriate for interval and ratio variables.

Standard Deviation is the square root of the variance and is one of the important quantities used when calculating a t-test of significance.
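
A short sketch of the three measures follows. It assumes the sample form of the variance, which divides by n - 1; dividing by n instead gives the population variance. The function name and scores are hypothetical.

```python
from math import sqrt

def dispersion(scores):
    """Return the range, sample variance, and standard deviation of the scores."""
    n = len(scores)
    m = sum(scores) / n
    squared_deviations = [(s - m) ** 2 for s in scores]
    variance = sum(squared_deviations) / (n - 1)  # assumption: sample estimate
    return {
        "range": max(scores) - min(scores),
        "variance": variance,
        "sd": sqrt(variance),
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with a mean of 5
print(dispersion(scores))
```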

Graphing Relationships

Whether one is interested in the spread of individual scores or in the average scores of groups, it is best to start with graphing the data.

Addition of standard deviations to mean points is often useful in eyeballing the data. The choice of scale magnitude also affects the look of a graph, sometimes producing misleading visual impressions.

Correlation Coefficients (strength of relations)

After understanding the general trend or direction of a relationship between variables one can ask whether or not it is consistent or strong. When two variables increase or decrease together in a direct and consistent fashion across people, they are said to be positively correlated; when one consistently increases as the other decreases, they are negatively correlated.

Pearson product-moment coefficient is most commonly used for this purpose with interval or ratio data. The coefficient 'r' provides information about both the strength and direction of the relationship, ranging from 1.00 to -1.00. A score of '0.00' indicates no correlation, while .657 or -.735 are strong correlations.

r is calculated by knowing the pairs of scores for a sample. As with other data, make a graph (scattergram) where each person is marked as an intersection of the two scores.
Each variable is represented by one axis.
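
The calculation of r from pairs of scores can be sketched with the deviation-score formula. The function name and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores for five people.
hours = [1, 2, 3, 4, 5]
marks = [2, 1, 4, 3, 5]
print(round(pearson_r(hours, marks), 2))
```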

Perfect correlations appear as points falling exactly on a straight diagonal line (at a 45° angle to the axes when both variables are plotted in standard scores). Zero correlations produce a roughly circular cloud of points with no slope.

Restriction of range may be a problem because it can reduce the magnitude of a correlation (if there is one in the population). Thus sampling the full range of potential scores is important to maintain heterogeneity of data.
E.g., When sampling only high GPA students for a programme it is often difficult to find strong correlations or to predict success.

Curvilinear relationships may be present; however, the Pearson product-moment correlation is insensitive to them and is only able to detect linear relationships. Scatter plots will reveal them.
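
To see this insensitivity concretely, the sketch below computes r for a made-up, perfectly U-shaped relationship (y is x squared). The relationship is completely predictable, yet the linear coefficient comes out at zero. The function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]  # a perfect U-shaped (curvilinear) relation
print(pearson_r(x, y))
```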

Effect Size

Is similar to r insofar as it indicates the strength of the relationship between the variables. Effect size is a form of correlation coefficient that estimates the strength of the relationship between independent and dependent variables.
It can range from 0.00 to 1.00; values above .5 are considered strong and values below .3 weak.

Statistical Significance

Is an estimate that is used in decision making about hypotheses, namely the probability that such a result would occur by chance.

Regression Equations

Are used to calculate a person's score on one variable when their score on another one is known. This is used for predicting scores based upon other scores, such as fourth year GPA from first year GPA.

Regression formulae such as Y=a+bX are used to predict Y from a constant and a weighted known score X.
The constant 'a' gives a baseline and the weighting 'b' makes the adjustment from one scale to the other.
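
One common way to obtain 'a' and 'b' is a least-squares fit to past pairs of scores, sketched below. The source does not say how the constants are derived, so the fitting method is an assumption here, and the function names and GPA values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of a and b in Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # weighting (slope)
    a = mean_y - b * mean_x   # baseline constant (intercept)
    return a, b

def predict(a, b, x):
    """Predicted Y for a known score X."""
    return a + b * x

# Hypothetical past records: first year GPA and fourth year GPA.
first_year = [2.0, 2.5, 3.0, 3.5, 4.0]
fourth_year = [2.4, 2.7, 3.0, 3.3, 3.6]

a, b = fit_line(first_year, fourth_year)
print(round(a, 2), round(b, 2))
print(round(predict(a, b, 3.2), 2))  # predicted fourth year GPA for a 3.2 student
```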

When testing the criterion validity of intelligence or aptitude tests, one attempts to predict later grades or occupational success from scores on some screening measure. That is, one predicts the criterion variable from the predictor variable.

Multiple Correlations

It is possible to examine the correlation of many variables at once, not simply two. Multiple Correlation (or Regression) is done when some criterion variable is predicted using a number of predictor variables.
Greater accuracy can be achieved using multiple predictors, such as using college grades, GRE scores, letters of recommendation and work experience to select candidates for graduate school. Each variable is given a weighting, and together they have a baseline constant. Y=a+bX+cW+dZ....
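
Once the weightings are known, applying the formula is simple arithmetic, as in the sketch below. The constant, the weights, and the candidate's scores are all invented for illustration; in practice they would be estimated from past data.

```python
def predict_criterion(constant, weights, predictors):
    """Y = a + bX + cW + dZ ... for any number of predictors."""
    return constant + sum(w * score for w, score in zip(weights, predictors))

# Hypothetical baseline and weightings for grades, test score, and experience.
constant = 0.5
weights = [0.6, 0.002, 0.05]

# Hypothetical candidate: GPA 3.5, test score 600, 2 years of experience.
candidate = [3.5, 600, 2]
print(round(predict_criterion(constant, weights, candidate), 2))
```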

Partial Correlation and tertiary variables

When considering correlations it may well be a third variable that is the source of correlation between two observed variables.
To remove the influence of a 'third' variable that is correlated with each of the two others, a partial correlation is computed between the two principal variables.

Like graphic equalizing a sound track, this technique comes post-production to remove the noise and make the main instruments sound better.
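
A first-order partial correlation can be computed directly from the three pairwise correlations, as sketched here. The function name and the r values are hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the third variable Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical case: X and Y correlate .49, but each correlates .70 with Z.
print(round(partial_r(0.49, 0.70, 0.70), 2))
```

In this made-up case the observed correlation between X and Y is accounted for entirely by their shared relationship with Z, so the partial correlation drops to about zero.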

Structural Modeling
Is a technique carried out to 'propose causal sequences' (not prove them) and test a theory based upon the regression coefficient on the components.

Path analysis is an earlier version (formula) that is used to explore relationships among variables.

Factor Analysis
Is a technique used to examine the relationships among a collection of variables all at once.

Principal Components Analysis attempts to find the hypothetical criterion variable that can be predicted best by groups of variables. Such 'criterion variables' or meta correlations are often found in groups where three or four such factors can be used to reduce the mass of data from many variables.

The variables are said to "load" on the factors insofar as they are correlated with them. Each variable has a loading for each factor that has been derived from the data. The total maximum number of factors is equal to the number of variables.

The fewer the factors, the less variance is accounted for by the solution. One can specify the number of factors desired, or determine them through standard methods of factor selection.

Contemporary intelligence tests are built on the notion of factor analysis where specific traits or abilities load on the main factors or types of intelligence.
