Checking Data Entry

From PsychWiki - A Collaborative Psychology Wiki

  1. People make mistakes. If your data were entered by hand, you need to double-check whether any mistakes were made when transferring them into a dataset (in software such as SPSS, SAS, R, or S).
  2. Computers make mistakes, too. If you collected your data online or through an online system (such as SurveyMonkey or your own hosted site), then the data *should* transfer into the dataset without error, unless the online system was set up incorrectly.
  3. Regardless of how a mistake occurred, it misrepresents your true data. The purpose of conducting research is to discover reality, so incorrectly entered data defeat the purpose of research.
  4. Misrepresenting the data that were collected can substantially affect your findings. A single incorrectly entered number can create an outlier, reduce normality, or even change the conclusions of your study.
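To see how much a single typo can matter, here is a small worked example in Python. The ratings are made-up illustrative values on a hypothetical 1-to-11 scale, not data from any study; the two lists differ only in that one 7 was keyed in as 77.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of numeric scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical ratings on a 1-11 scale (illustrative values only).
correct = [4, 5, 6, 7, 5, 6, 4, 7, 6, 5]
# The same ratings with a single 7 mistyped as 77.
with_typo = [4, 5, 6, 77, 5, 6, 4, 7, 6, 5]

for label, scores in [("correct", correct), ("with typo", with_typo)]:
    mean, sd = summarize(scores)
    print(f"{label:>9}: mean = {mean:.2f}, SD = {sd:.2f}")
```

Here the one typo moves the mean from 5.5 to 12.5, which is higher than any valid rating on the scale, and inflates the standard deviation about twentyfold.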
  1. Not all data entry errors can be prevented, but a few simple procedures can reduce them.
  2. Ask research assistants to put their initials at the end of each line of data they enter. This helps them feel accountable while they are entering the data and also lets you keep track of who entered what. You may discover that one research assistant makes far more errors than another (and should be assigned to a different task when the next data entry project arises).
  3. Make clear to research assistants during training that talking and entering data at the same time is unacceptable. Drop in on them periodically during their data entry time to make sure they're complying.
  4. Tell them ahead of time that accuracy is more important than speed. Some research assistants think you'll be impressed if they finish ahead of schedule (i.e., they overvalue speed, perhaps to the detriment of accuracy).
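Once each row carries the enterer's initials, a quick tally shows whether one research assistant's rows contain more errors than another's. This is a minimal Python sketch, assuming a hypothetical log of rows that were checked against the hard copies, each recorded as (initials, whether an error was found); the initials and results below are made up.

```python
from collections import defaultdict

# Hypothetical spot-check log: (enterer's initials, was an error found in that row?)
checked_rows = [
    ("AB", False), ("AB", False), ("AB", True), ("AB", False),
    ("CD", True), ("CD", True), ("CD", False), ("CD", True),
]

def error_rates(rows):
    """Return {initials: proportion of that person's checked rows containing an error}."""
    checked = defaultdict(int)
    errors = defaultdict(int)
    for initials, has_error in rows:
        checked[initials] += 1
        errors[initials] += int(has_error)
    return {who: errors[who] / checked[who] for who in checked}

for who, rate in sorted(error_rates(checked_rows).items()):
    print(f"{who}: {rate:.0%} of checked rows had errors")
```

With only a handful of checked rows per person, a difference in rates is a reason to look more closely, not proof that one assistant is careless.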
  1. Have two or more people enter the same data independently and look for discrepancies. For example, if you enter the data in Excel or SPSS, have each person enter the data into a separate file, and then compare the files to find any values that differ.
  2. Have one person enter the data, and then double-check it by randomly selecting segments and comparing them against the original records. If the enterer's initials appear at the end of each row of data, you'll also be able to tell whether one research assistant is noticeably less conscientious than the others.
  3. Statistical software (such as SPSS, SAS, R, or S) can produce descriptive statistics, such as frequency tables or minimum and maximum values, that reveal numbers that are out of range or otherwise reflect data entry errors. For example, the SPSS output below for the variable "system1" shows a value of "13", even though the only valid responses were 1 through 11.
[Figure: SPSS output for the variable "system1", showing the out-of-range value "13"]
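The double-entry comparison can also be scripted instead of done by eye. This is a minimal Python sketch, assuming each person's entries were exported to CSV with a `subject` ID column; the file contents and the `system2` variable are hypothetical. It reports every cell where the two entries disagree, and each flagged cell should then be checked against the hard copy.

```python
import csv
import io

def read_entries(text):
    """Parse CSV text into {subject_id: {variable: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["subject"]: row for row in reader}

def find_discrepancies(first, second):
    """List (subject, variable, entry1, entry2) wherever the two entries disagree."""
    problems = []
    for subject in sorted(set(first) | set(second)):
        # A subject missing from one file shows up with None on that side.
        row1 = first.get(subject, {})
        row2 = second.get(subject, {})
        for variable in sorted(set(row1) | set(row2)):
            value1, value2 = row1.get(variable), row2.get(variable)
            if value1 != value2:
                problems.append((subject, variable, value1, value2))
    return problems

# Hypothetical exports from two people entering the same questionnaires.
entry_a = """subject,system1,system2
1,4,7
2,13,5
3,6,2
"""
entry_b = """subject,system1,system2
1,4,7
2,3,5
3,6,8
"""

for subject, variable, v1, v2 in find_discrepancies(read_entries(entry_a), read_entries(entry_b)):
    print(f"subject {subject}, {variable}: first entry {v1!r}, second entry {v2!r}")
```

Values are compared as text, so "7" and "7.0" would be flagged as different; that is deliberate here, since a formatting difference may also be a keying difference worth a look.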

  1. The first step is to identify why the invalid value appears. For example, the output above for the variable "system1" shows a "13". Since 13 is not a valid response, you need to determine why "13" was entered. Did the person entering the data make a mistake? Or did the subject respond with "13" even though the question indicated that only the numbers 1 through 11 were valid?
  2. You can identify the source of the error by looking at the hard copies of the data. For example, find the subject whose response was recorded as "13" by sorting the data by that variable. Then look at that subject's hard copy to see whether the subject actually wrote "13" or gave a different response than the one recorded in the dataset.
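Sorting by the variable works, but in a larger dataset it can be quicker to filter directly for invalid values and list the subject IDs whose hard copies need to be pulled. This is a minimal Python sketch, assuming the data are a list of records with a `subject` ID; the records are hypothetical, and the valid range of 1 to 11 comes from the "system1" example.

```python
# Hypothetical records: one dict per subject (illustrative values only).
records = [
    {"subject": 1, "system1": 4},
    {"subject": 2, "system1": 13},
    {"subject": 3, "system1": 11},
    {"subject": 4, "system1": 0},
]

VALID_MIN, VALID_MAX = 1, 11  # valid responses for system1 are 1 through 11

def out_of_range(rows, variable, low, high):
    """Return rows whose value for `variable` lies outside [low, high], sorted by that value."""
    flagged = [row for row in rows if not low <= row[variable] <= high]
    return sorted(flagged, key=lambda row: row[variable])

for row in out_of_range(records, "system1", VALID_MIN, VALID_MAX):
    print(f"subject {row['subject']}: system1 = {row['system1']} (check the hard copy)")
```

The sketch assumes every value is a number; missing responses would need to be handled separately before the range comparison.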
