PSY302-300506302
From PsychWiki - A Collaborative Psychology Wiki
TESTS
Two-sample t-test: between (independent)
Definition: Testing the relationship between a categorical independent variable and a continuous independent variable, in which the categorical independent variable is a between-subjects design with two levels.
Example: http://www.nature.com/ijo/journal/v24/n4/full/0801175a.html
Application: Differences between the independent variables of the PWS group and control group were analyzed by the two-sample t-test. Analysis of covariance was used to calculate the difference in ADMR between both groups, defined by the binary variable PWS, adjusted for bone age BMR, FM and gender as the other independent variables in the model.
Two-sample t-test: within (related)
Definition: Testing the relationship between a categorical independent variable and a continuous independent variable, in which the categorical independent variable is a within-subjects design with two levels.
Example: http://www.nature.com/ejcn/journal/v58/n7/full/1601935a.html
Application: Comparisons of the fasting results between the pre and post-intervention periods were performed using Student's paired t-test (two-tailed). Statistical analysis of the results was also performed by repeated-measures analysis of variance (ANOVA) with two within-subjects factors (pre and post-intervention, meal pattern), or with three within-subjects factors (time after the test meal, meal pattern, pre and post-intervention) as appropriate. When ANOVA indicated a significant main effect, the level of significance for pairwise comparisons of each was obtained. Statistical significance was set at P<0.05 for all statistical tests.
One-Way ANOVA test
Definition: Testing the relationship between a categorical independent variable and a continuous independent variable, in which the categorical independent variable is a between-subjects design with three or more levels.
Example: http://www.highbeam.com/doc/1G1-213956671.html
Application: This article discussed some tests that the researchers had done in the previous articles using the One-way ANOVA test. The one-way ANOVA in a randomised design was applied to an optometric experiment designed to compare the effect of the drug Tropicamide on the degree of pupil dilation in a group of control subjects, a group of patients diagnosed with Alzheimer's disease (AD), and a group of patients diagnosed with dementia other than AD. This type of ANOVA was described as a 'fixed effects' model in which the objective was to estimate the differences between the subject groups and these were regarded as 'fixed' or discrete effects to be estimated.
Two-Way ANOVA test
Definition: Testing the relationship between two categorical independent variable and a continuous independent variable.
Application: In this article, each set of 3 measurements was made on two occasions (separated by a minimum of 30 minutes and additional data collection involving 46 further measurements of posture and movement on the same and an additional subject before the thoracic kyphosis measurements were re-measured) by one rater. The reliability of the measurements was analyzed using 2-way ANOVA intraclass correlation coefficients, 95% confidence intervals and standard error of measurement for precision, for a single measurement and the average of 3 measures.
Correlation test
Definition: Testing the relationship between two continuous variables, either of which can be considered the independent or dependent variable.
Example: http://psychcentral.com/news/2010/03/08/women-who-drink-gain-less-weight/11970.html
Application: In this article, researchers found a correlation on women who drink moderately appear to be better at keeping the weight off than those women who don’t drink at all. Where drinking alcohol is your independent variable and the dependent is the weight that the women gain.
Chi-square test
Definition: Testing the relationship between two categorical variables, either of which can be considered the independent or dependent variable.
Example: http://www.biomedcentral.com/1471-2458/10/123/abstract
Application: This was a study that the researchers wanted to find out if improving nutrition knowledge among children may help them to make healthier food choices. They used the chi-square test to help find these results. "Total nutrition knowledge score at follow-up, adjusted for baseline score, deprivation, and school size, was higher in intervention than in control schools (mean difference = 1.1; 95% CI: 0.05 to 2.16; p=0.042). At follow-up, more children in the intervention schools said they 'are currently eating a healthy diet' (39.6%) or 'would try to eat a healthy diet' (35.7%) than in control schools (34.4% and 31.7% respectively; chi-square test p<0.001)."
Concepts
Standard Score
Definition: A score obtained by using the transformation z=(X-(mean))/S
Application: This article used to convert raw scores into the final scores that appear in the World’s Best Universities rankings tables. They collected the data and then they calculate standard scores for each column of data so that they are compatible. That allowed them to combine the data reliably and apply the weightings fairly in the calculation of the overall score.
Confidence Interval
Definition: A range of score values expected to contain the value of mu with a certain level of confidence.
Example: http://abovethelaw.com/2010/03/fantasyscotusnet_testing_the_a.php
Application: They have recorded their outcome statistics and standard majority ratio (SMR). The SMR provides a method to test whether or not users perceive the Court as dominated by conservative ideology. A total of 39% of predictions found that the Supreme Court would reverse, and the confidence interval was 10.25% (at the 95% level), indicating that the results were a decently accurate representation of predictions about the case (and their general incorrectness).
Parametric Test
Definition: A statistical test involving hypotheses that state a relationship about a population parameter.
Example: http://www.accessmylibrary.com/article-1G1-141754783/measuring-security-price-performance.html
Application: The article is called, Measuring security price performance using Chilean daily stock returns: the event study method. This paper examines statistical properties of daily stock returns and how the particular characteristics of these data affect the empirical performance of the short-run events-study methodology when security returns data are drawn from the Chilean stock market. Their findings showed that while symptoms of non-normality in security returns and security abnormal returns persist even at the portfolio level, methods based on the use of parametric tests for samples of 10 or more securities are well specified, at least for a significance level of 5%. The power of the three parametric t-tests was very sensitive to both the sample size and the length of the event period. Although the evidence presented in this study provides an initial basis for selecting between alternative procedures in event studies conducted in the Latin American stock market.
Nonparametric Test
Definition: A statistical test involving hypotheses that do not state the relationship about a population parameter. Also known as a distribution free test.
Example: http://goliath.ecnext.com/coms2/gi_0199-6253985/A-distribution-free-tabular-CUSUM.html
Application: Computing the control limits for the latter two procedures requires an estimate of the variance parameter of the monitored process. That would be, the sum of covariances at all lags. Nevertheless, these CUSUM-based charts are distribution free since one can estimate the variance parameter using a variety of distribution-free techniques that are popular in the simulation literature.
Statistically significant difference
Definition: The observed value of the test statistic falls into a rejection and null hypothesis is rejected.
Example: http://abovethelaw.com/2010/03/fantasyscotusnet_testing_the_a.php
Application: In Briscoe v. Virginia, the Supreme Court reversed the Virginia Supreme Court’s. The Court held that a Defendant does not waive his 6th Amendment rights to the confrontation of forensic analysts who prepared forensic evident used in court by failing to call them as witnesses. 63% of predictions correctly guessed that the Supreme Court would reverse. The results were statistically significant at a 99% confidence level, with a confidence interval of 11.86. For the specific split, only 3 users, 4.6% of total predictions, predicted a unanimous outcome.
Nonsignificant difference
Definition: The observed value of the test statistic does not fall into a rejection region and the null hypothesis is not rejected.
Example: http://abovethelaw.com/2010/03/fantasyscotusnet_testing_the_a.php
Application: In Florida v. Powell, the Court reversed a Florida Supreme Court decision that held that informing a defendant that they had a right to “talk to an attorney” was insufficient to inform them they had a right to have counsel present. Overall, 51% of members predicted that the Court would reverse, but with such a tight margin, the results were not statistically significant. Only three users, approximately 2% of total predictions, predicted the correct split, while only one user predicted the correct votes. In conjunction with the indeterminate nature of the general outcome predictions, the SMRs did not reveal any additional information since all Justices had SMRs that were not significantly different from 1.
Between-subjects Design, with two groups
Definition: An experiment in which two or more groups are created and tested against each other.
Example: http://www.nature.com/npp/journal/v28/n1/full/1300004a.html
Application: In this article they are testing facial expressions of two subject groups by using between-subjects design. The total that they used were, 24 healthy female volunteers between the ages of 21 and 59 years took part in this study. Participants were screened to exclude those with a current or previous history of psychiatric disorder or significant physical illness. All gave their written consent to participate in the study, which was approved by the local ethical committee. Volunteers were randomly allocated to receive citalopram (10 mg, i.v.) or placebo. These two groups were matched in terms of age (mean age: 40.13.6 and 37.33 years) and years of education.
Between-subjects Design, with three or more groups
Definition: An experiment in which three or more groups are created and tested against each other.
Example: http://www.highbeam.com/doc/1G1-14082678.html
Application: The present experiments directly investigate different DRO intervals using a between-subjects design to compare three values of the "R-[S.sup.R]" intervals and extinction (EXT) as response elimination procedures. Three experimental phases were used: acquisition, treatment, and reacquisition. No differences were expected between the groups in the number of responses emitted during the acquisition phase.
Random sampling
Definition: A sampling method in which individuals are selected so that each member of the population has an equal chance of being selected for the sample, and the selection of the member is independent of any other member of the population.
Example: http://www.accessmylibrary.com/article-1G1-141754783/measuring-security-price-performance.html
Application: In the article, Measuring security price performance using Chilean daily stock returns: the event study method, they did an examination that was carried out using a simulation approach analogous to that introduced by Brown and Warner. Unlike a Monte Carlo simulation, where the researcher samples artificially generates values from a specified theoretical probability distribution, the Brown-Warner approach randomly selects event dates and stocks to simulate event studies without assuming a particular distribution of stock returns. The contribution attempted to help test statistic selection, reducing the probability of misspecification when studies involve Latin American equity market securities.
Random Assignment
Definition: A method of assigning subjects to treatment groups so that any individual selected for the experiment has an equal probability of assignment to any of the groups and the assignment of one subject to a group does not affect the assignment of any other individual to that same group.
Example: http://www.accessmylibrary.com/coms2/summary_0286-10700332_ITM
Application: In this study it provided the recommended type of matching trial, with prospective assignment of clients to interactional therapy, based on their levels of sociopathy and global psychopathology. The comparison condition was random assignment to these treatments, to replicate the usual clinical practice of assigning clients to treatment without consideration of matching variables.
Independent Variable
Definition: A variable manipulated in an experiment to determine its effect on the dependent variable.
Example: http://www.nature.com/ijo/journal/v24/n8/full/0801236a.html
Application: The tumor, node, metastases (TNM) cancer staging system is widely accepted by physicians as a predictor of prognosis and as a guide to therapy. The statistical goal of staging cancer patients is to classify them into several stages so that within each stage the responses (survival) are homogeneous and the differences in survival between stages are large. The stages are defined through independent variables, which, in the case of TNM, include tumor histology, primary site, presence or absence of regional node involvement, and distant metastases.
Levels of the Independent Variable
Definition: One value of the independent variable. To be a variable, an independent variable must take on at least two different levels.
Example: http://www.highbeam.com/doc/1G1-126559787.html
Application: In this article the authors applied an exercise test to obtain more information to predict complications of surgery in the treatment of lung cancer. The test involves riding a static bicycle using a medical protocol. There are many independent variables, but three levels of it were the most important: One would be the exercise time in minutes, secondly is the expired volume of air in 1 second (respiratory function (RF)), measured as a percentage of the expected values for sex, age, and height, with values under the 25th percentile in the studied population considered pathological for the purpose of the investigation and thirdly is the oxygen desaturation during the test.
Confounds
Definition: An extraneous variable that is covarying with the independent variable, potentially masking the true effects of the variable on the dependent variable.
Example: http://www.nature.com/oby/journal/v16/n9/full/oby2008271a.html
Application: The question in this study was: Are the effects of depression and obesity related or do they influence C-reactive protein (CRP) levels independently? In the method and procedures, depression was measured using the Beck Depression Inventory (BDI). Confounding variables were age, gender, BMI, waist and hip measures, smoking and alcohol habits, medications, biochemical measures of the metabolic syndrome, and indirect measures of insulin resistance.
Dependent Variable
Definition: The variable in an experiment that depends on the independent variable. In most instances the dependent variable is some measure of a behavior.
Example: http://www.accessmylibrary.com/coms2/summary_0286-6590627_ITM
Application: The teaching assistant was faced with the difficulty of how to analyze the data to address her research hypotheses. During the course of her graduate studies, she had taken some research and statistics courses but her experiences were limited. The thing she did know about her study is: First her research design consisted of two independent variables and one dependent variable. One independent variable involved the repeated measurement of tests; this reminded her of the dependent samples (sometimes referred to as a paired-samples) t-test. The second independent variable involved a comparison of the morning and afternoon class; this comparison reminded her of the independent samples t-test or a one-way analysis of variance (ANOVA). The dependent variable was the actual test scores from the students.
Within-subjects Design, with two groups
Definition: A research design in which one group of subjects is exposed to and measured under each level of an independent variable. In a within-subjects design, each subject receives each treatment condition. Also known as a repeated measures design or a treatment-by-subjects design.
Example: http://www.highbeam.com/doc/1G1-14082678.html
Application: Differential reinforcement of other behavior (DRO), has often been used for the elimination of an operant response. This reinforcement-based response elimination procedure is frequently termed omission training (OT) in the applied setting and differential reinforcement of other behavior (DRO) in the laboratory setting. Very few studies have investigated the effects of varying the response-reinforcement interval on the effectiveness of DRO in eliminating a response. They used a within-subject design to investigate this parameter by varying the number of DRO units that had to be completed per food delivery.
Within-subjects Design, with three or more groups
Definition: A research design in which one group of subjects is exposed to and measured under each level of an independent variable. In a within-subjects design, each subject receives each treatment condition. Also known as a repeated mesures design or as a treatment-by-subjects-design.
Example: http://www.jultrasoundmed.org/cgi/content/abstract/29/3/337
Application: This study's objective was: Coracoid impingement has been recognized as an etiology for anterior shoulder pain; however, no imaging reference standard exists. We used sonography to compare the coracohumeral interval (CHI) in asymptomatic volunteers with the CHI in patients with coracoid impingement. By using repeated measures design they found that the CHI to be significantly narrower in symptomatic shoulders than in asymptomatic volunteers (P < .0001).
Main effect
Definition: The effect of the change in level of one factor in a factorial experiment measured independently of other variables.
Application: In the sorority research finds that the tradition may be more than just a rite of passage for many college women. It holds that rushing sororities may have profoundly negative effects on body image and self esteem, as optimistic pledges are often evaluated predominantly on their outward appearance, putting a great deal of pressure on them to look hot to find a spot. There was a significant main effect of membership, such that new sorority members showed higher levels of self-objectification compared to those who did not rush.
Interaction
Definition: A situation in a factorial design in which the effect of one independent variable depends on the level of the other independent variable with which it is combined.
Example: http://www.nature.com/ijo/journal/v24/n8/full/0801236a.html
Application: The first socio-demographic variable for which we investigated the association with Body mass index (BMI) was sex. There appeared to be significant interaction effects between sex and both occupational level and family income and therefore it was decided to perform sex-specific analyses. BMI at baseline was not related to the subsequent 6-year change in BMI (correlation coefficients in males: r=-0.06, P=0.25, and in females: r=-0.04, P=0.44).
Strength of Effect (Eta squared)
Definition: A measure of how much knowing the level of an independent variable that a subject received reduces the error in predicting the subject's score in the sample tested.
Example: http://goliath.ecnext.com/coms2/gi_0199-5284102/Library-instruction-online-or-in.html
Application: A repeated measures analysis of variance (ANOVA) was conducted to compare the pre and post-test scores of all three groups. There was a significant main effect for time, Wilks' Lambda =.46, F(1,138)= 159.55, p 0.71, and the effect size was small (eta squared =.005) showing that no one group changed more than the other two.
Scatterplot
Definition: A plot of a bivariate distribution in which the X variable is plotted on the horizontal axis and the Y variable is plotted on the vertical axis.
Example: http://bleacherreport.com/articles/361073-bubble-game-of-the-night-georgia-tech-vs-north-carolina
Application: In a sport article they used a scatterplot, of North Carolina's defensive efficiency and field goal percentage. There it shows that there is a correlation between when UNC shooting the ball well and playing better defense.
Positive relationship
Definition: A relationship between two variables in which, as the value of one variable increases the value of the other variable tends to increase also.
Example: http://www.highbeam.com/doc/1G1-140476599.html
Application: This article said that most of the studies have shown a strong positive relationship between inflation and variability of relative prices. In some studies it has been widely cited, supports the view that greater instability of inflation is associated with greater relative price dispersion. Within the literature, inflation has been decomposed into anticipated and unanticipated components to allow for consideration of systematic, persistent processes and differential roles of inflation shocks, respectively. The work of Cukierman and Hercowitz has demonstrated that the unanticipated inflation rate is the most statistically significant explanatory variable for RPV.
Negative relationship
Definition: A relationship between two variables in which, as the value increases, the value of the other variable tends to decrease.
Application: The study finds that the mean score on the Eating Attitudes Test (EAT), which measures eating habits and opinions, for both rushees and non-rushees, was “well below the proposed cut-off score that indicates a clinical level of eating disturbance.” Only women who accepted bids to sororities were included in the rush group, and only those who did not participate in rush at all were include in the non-rush group. The study further finds that “women who participated in sorority rush had higher levels of self-objectification,” adding that there was a negative correlation between body mass index (BMI) and finishing rushing a sorority. That means that for every one point increase in BMI, a rushee was 44 percent more likely to drop out of rush, according to the study.
No relationship
Definition: A zero correlation indicates that there is no relationship between the variables.
Example: http://newswise.com/articles/view/523539/
Application: In this article, a team of scientists from Philadelphia made a study to identify which stages of swallow function are differentially affected by chemoradiation treatment for head and neck cancer, to describe the incidence of long term complications including clinical pneumonia and prolonged feeding tube dependence, and to correlate the clinical variables to the modified barium swallow findings. A Fisher's exact test was used to correlate several clinical variables to modified barium swallow findings. No significant relationship was found between pharyngoesophagram abnormalities and a history of neck dissection, tumor stage, or patient age. Duration from radiotherapy to modified barium swallow also had no significant impact on swallow findings.
Linear relationship
Definition: A relationship between two variables that can be descrbibed by a straight line.
Example: http://www.waterworld.com/index/display/news_display/142290922.html
Application: For the Metropolitan Community College (MCC) project, six mixtures were prepared and samples were produced per ASTM C1688 during placement of the preliminary test panels. Unit weight and air void content for each mixture were measured and plotted, and a linear regression analysis was used to determine the relationship between air void content and unit weight. As one might expect, there is a linear relationship between void content and unit weight of pervious concrete mixtures, with a maximum unit weight (about 150 lb/ft3 [2400 kg/m3]) associated with zero air void content.
Curvilinear relationship
Definition: A graph consisting of, bounded by, or characterized by a curved line.
Example: http://www.highbeam.com/doc/1G1-55315002.html
Application: Using continuous measures of force and function, the authors have described the relationship as linear. There is growing evidence, however, that the relationship is curvilinear. Those studies in which a curvilinear relationship has been described, as well as some studies in which categories of function have been examined, support the concept of a threshold for muscle force for functional activities. Identification of force thresholds for various functional activities has important clinical implications.
Coefficient of Determination
Definition: The value of r squared indicating the common variance of variables X and Y.
Example: http://www.expressindia.com/news/fe/daily/19980918/26155174.html
Application: There has been a lot of talking and rhetoric on the so called correlation between the domestic and global stock markets. The main objective of this study was to explore whether any correlation exists between the Indian and the global stock market indices. The correlation between the BSE 30 and the Hang Seng indices for the same period is +0.27. A positive correlation would indicate that the BSE 30 should rise when the Hang Seng moves up and vice-versa. Considering the fact that 0.27 is a small figure, we could conclude on a miniscule positive correlation here. On squaring the correlation, the coefficient of determination works out to 0.07. This means only 7 per cent of thevariance of the BSE 30 is explained by the variation on the Hang Seng.
Correlation does not equal causation
Definition: Variables can be highly correlated, but the existence of a correlation between variables x and y does not imply one causes the other.
Application: Correlation does not imply causation. It is spurious to suggest the high growth in foreign labour force increased the wages of Singaporeans simply from the correlation between the two variables. Exported-orientated economy is highly leveraged to the global economic environment. The exogenous factor we all know is the higher external demand which resulted in increased demand for labour resources in the Singapore economy. Wage increases would have been higher had it not for the lax foreign labour policy which capped the wages of lower-end and middle-level Singaporean workers.
Extra-Credit (3 points each)
Normal Distribution
definition: A theoretical mathematical of the measured variable into different categories.
application: A normal distribution tells then to move such as them (or larger) should happen 0.13 per cent of the time. They should therefore see them just once in the sample of 894 weeks. But they have been much more common than this. There have been five three standard deviation falls in sterling against the dollar, and eight three standard deviation falls in the trade-weighted index. These are four to seven times more likely than a normal distribution predicts.
Symmetrical
definition: A distribution in which observations equidistant from the central maximum have the same frequency. Also known as symmetric distribution.
example: http://www.highbeam.com/doc/1G1-90220642.html
application: The approach followed in this study differed from previous analyses that they have done in two ways. First, they developed tests of mean reversion that are natural extensions to their model of the standard Dickey-Fuller test. Second, there seems no strong a priori reason to believe that transactions costs will necessarily generate symmetric deviations from purchasing power parity, so that we allow for asymmetric effects in our STAR models. To do so, they find it more convenient to adapt the logistic transition function rather than the exponential function used by previous authors. Although considerations of possible asymmetry have arisen in other applied econometric work, as, for example, in the analysis of Neftci of U.S. unemployment, there are relatively few applications to financial time series.
Asymptotic
definition: A distribution for which the tails of the distribution never touch the X axis.
example: http://www.accessmylibrary.com/article-1G1-168740031/asymptotic-confidence-limits-repairable.html
application: Confidence limits for availability and reliability of the two-unit redundant systems were investigated and examined asymptotic confidence limits for the steadystate availability of a two-unit parallel system with the introduction of preparation time for the service station. Also, it derived a consistent asymptotically normal estimator and an asymptotic confidence interval for the steady-state availability of a two-unit cold standby system in which the failure rate of the unit while online is a constant and the repair time distribution is a two-stage Erlangian.
Continuous
definition: Of or relating to a line or curve that extends without a break or irregularity.
example: http://archopht.ama-assn.org/cgi/content/full/116/2/165
application: A linear regression model was used for the statistical evaluation of the effects of age and type of Usher syndrome on the continuous variables of logMAR visual acuity and visual field area. In an attempt to reflect the changing relationship of the response variables with age, quadratic curves were used to fit the data.
Sampling Error
definition: The amount by which a sample mean differs from the population mean.
example: http://www.gallup.com/poll/126602/Taliban-Increasingly-Unpopular-Pakistan.aspx
application: Results are based on face-to-face interviews in Pakistan with 1,147 adults, aged 15 and older, conducted in Nov. 14 to Dec. 7, 2009, and 1,133 adults, conducted May 1 to June 30, 2009. For results based on the total sample of national adults, one can say with 95% confidence that the maximum margin of sampling error is ±3.7 percentage points.
Null Hypothesis
definition: A statement of a condition that scientist tentatively holds to be true about a population. The null hypothesis is the hypothesis tested by the statistical test.
example: http://www.medscape.com/viewarticle/718270
application: The authors purposely excluded studies that involved an initial, single-blind, lead-in phase, attempting to remove early placebo-responders. However, in doing so they further reduced the potential for identifying drug-placebo differences and increased the possibility of favoring the null hypothesis. Which in this study the null hypothesis is no difference between drug and placebo.
Alternative Hypothesis
definition: A statement of what must be true if the null hypothesis for a statistical test is false.
example:
application:
Significance level
definition: A probability value that provides the criterion for rejecting a null hypothesis in a statistical test.
example: http://content.nejm.org/cgi/content/full/NEJMoa1001286
application: At a significance level of less than 0.05, intensive blood-pressure management did reduce the rate of two closely correlated secondary outcomes, total stroke and nonfatal stroke. Assuming that this finding was real, the number needed to undergo intensive blood-pressure management to prevent one stroke over the course of 5 years was 89.
Two-tailed test
definition: A statistical test using rejection regions in both tails of the sampling distribution of the test statistic. Also called a nondirectional test.
example: http://www.nature.com/hdy/journal/v90/n5/full/6800254a.html
application: The directional heterogeneity test, based here on a nondirectional P-value obtained from a Kruskal–Wallis test (H=4.55, n=131, P=0.103), revealed that the ordering of the means was significant (rsPc=0.87, P<0.02). Similar analyses conducted separately for each cause of extinction showed the same ordering of the means, and were both significant by the directional heterogeneity tests
One-tailed test
definition: A statistical test employing a rejection region in only one tail of the sampling distribution of the test statistic. Also called a directional test.
Degrees of Freedom
definition: The number of scores to vary when calculating a statistic.
Type I Error
definition: The error in statistical decision making that occurs if null hypothesis is rejected when actually it is true of the population.
example: http://www.accessmylibrary.com/article-1G1-114009533/inflation-type-error-rates.html
application: The present study discloses that, for a wide variety of non-normal distributions, especially skewed distributions, the Type I error probabilities of both the t test and the Wilcoxon-Mann-Whitney test are substantially inflated by heterogeneous variances, even when sample sizes are equal. The Type I error rate of the t test performed on ranks replacing the scores (rank-transformed data) is inflated in the same way and always corresponds closely to that of the Wilcoxon-Mann-Whitney test. For many probability densities, the distortion of the significance level is far greater after transformation to ranks and, contrary to known asymptotic properties, the magnitude of the inflation is ma increasing function of sample size.
Type II Error
definition: The error in statistical decision making thatt occurs if null is not rejected when it is false and the alternative hypothesis is true.
example: http://goliath.ecnext.com/coms2/gi_0199-4717191/Multicollinearity-and-measurement-error-in.html
application: "The literature on structural equation models is unclear on whether and when multicollinearity may pose problems in theory testing (Type II errors). Two Monte Carlo simulation experiments show that multicollinearity can cause problems under certain conditions, specifically: (1) when multicollinearity is extreme, Type II error rates are generally unacceptably high (over 80%), (2) when multicollinearity is between 0.6 and 0.8, Type II error rates can be substantial (greater than 50% and frequently above 80%) if composite reliability is weak, explained variance ([R.sup.2]) is low, and sample size is relatively small. However, as reliability improves (0.80 or higher), explained variance [R.sup.2] reaches 0.75, and sample becomes relatively large, Type II error rates become negligible."
Power
definition: The probability of rejecting null hypothesis when null hypothesis is false and alternative hypothesis is true. The power of a statistical test is given by 1 minus beta.
example: http://mbe.oxfordjournals.org/cgi/content/abstract/9/1/138
applications: "The power of the test to detect genetic differentiation in a selectively neutral Wright-Fisher island model depends on both sample size and the rates of migration, mutation, and recombination. It is found that the power of the test is substantial with samples of size 50, when 4Nm less than 10, where N is the subpopulation size and m is the fraction of migrants in each subpopulation each generation. More powerful tests are obtained with genes with recombination than with genes without recombination."
Pairwise comparisons
definition: Statistical comparisons involving two means.
application: The Analytic Hierarchy Process (AHP) is a theory of measurement through pairwise comparisons and relies on the judgements of experts to derive priority scales. It is these scales that measure intangibles in relative terms. The comparisons are made using a scale of absolute judgements that represents, how much more, one element dominates another with respect to a given attribute.
Post-hoc comparisons
definition: Statistical tests that make all possible pairwise comparisons after a statistically significantly F obs has occured for the overall analysis of variance.
example: http://goliath.ecnext.com/coms2/gi_0199-5284102/Library-instruction-online-or-in.html
application: In this study, three groups of students in an English composition course received basic information literacy instruction by completing an online tutorial, attending a presentation by a librarian, or doing both. The between subjects effect for this research was .039 with a small effect size (eta squared =.05). Post-hoc comparisons using the Tukey HSD revealed that the mean score for Group C (M=14.04, SD=1.82) was significantly different from Group B (M=12.79, SD=2.69). The change in scores for students receiving both forms of instruction was significantly different from the group receiving classroom instruction, suggesting that the addition of LiONiL had an influence on the higher test scores in Group C.
◄ Back to Winter 2010 - Inferential Statistics - PSY302 page
