Assessing the quality of survey data (Good session!) #ESRA15 #MRX 


Live blogged from #ESRA15 in Reykjavik. Any errors or bad jokes in the notes are my own. As you can see, I managed to find the next building from the six buildings the conference is using. From here on, it’s smooth sailing! Except for the drizzle. Which makes wandering between buildings from session to session a little less fun and a little more like going to a pool. Without the nakedness. 

Session #1 – Data quality in repeated surveys: evidence from a quasi-experimental design, by multiple professors from the University of Rome

  • respondents can refuse to participate in the study, resulting in a series of missing data, but this study had very little missing data, only about 5% [that’s what student respondents do for you; I’d like to see a study with much larger missing rates]
  • questions had an “I do not know” option, and there was only one correct answer
  • 19% of gender/birthday/socioeconomic status answers changed from survey to survey [but we now understand that gender can change, and researchers need to be open to this. And of course, economic status can change in a second]

Session #2 – me! Lots of great questions, thank you everyone!

Session #3 – Processing errors in cross-national surveys

  • we don’t often consider processing errors as part of total survey error
  • found 154 processing errors in the series of studies – illegitimate variable values (such as an education level that makes little sense or an age over 100), misleading variable values, contradictory values, value discrepancies, lack of value labels, cases where you expect a range but get a specific value, and cases where 2 is coded as yes in the software but as no in the survey [a rough sketch of this kind of screening is below this list]
  • age and education were most problematic, followed by schooling
  • lack of labels was the worst problem, followed by illegitimate values, and misleading values
  • is a 22% discrepancy rate out of all variables checked good or bad?
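
For my own notes, here is a rough sketch of what this kind of processing-error screen could look like in Python – the variable names, valid ranges and codebook layout are placeholders I made up, not anything the presenters showed:

```python
import pandas as pd

def screen_variables(df, codebook):
    """Flag a few of the processing-error types mentioned in the talk:
    illegitimate/out-of-range values (e.g. age over 100) and variables
    shipped without value labels. The codebook layout is a placeholder."""
    issues = []
    for var, spec in codebook.items():
        if var not in df.columns:
            issues.append((var, "variable missing from the data file"))
            continue
        # values that are present but fall outside the documented valid set
        bad = df[var].notna() & ~df[var].isin(spec["valid"])
        if bad.any():
            issues.append((var, f"{int(bad.sum())} illegitimate or out-of-range values"))
        if not spec.get("labels"):
            issues.append((var, "no value labels supplied"))
    return pd.DataFrame(issues, columns=["variable", "issue"])

# toy check: an age over 100 and an education code outside the documented 1-8 range
data = pd.DataFrame({"age": [34, 57, 120, 41], "educ": [3, 9, 2, 5]})
codebook = {
    "age":  {"valid": set(range(15, 101)), "labels": {"var": "age in years"}},
    "educ": {"valid": set(range(1, 9)),    "labels": None},
}
print(screen_variables(data, codebook))
```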

Session #4 – how does household composition derived from census data describe or misrepresent different family types

  • the strength of census data is its exhaustiveness; how does census data differ from a smaller survey?
  • the census counts household members, while the family survey describes families and explores people outside the household, such as those living apart; they describe different universes. A boarder may not be measured in the family survey but is counted in the census
  • in 10% of cases, more people are counted in the census, 87% have the same number of people on both surveys
  • census is an accounting tool, not a tool for understanding social life, people do not organize their lives to be measured and captured at one point and one place in time
  • in the census, a family requires at least one adult and at least one child
  • isolated adults living in a household with other people make up 5% of adults in the census, and they are not classified the same way in both surveys
  • there is a problem attributing children to the right people, particularly with single-parent families; single adults are often ‘assigned’ a child from the household
  • a household can include one or two families at most – this gets complicated when adult children are married and perhaps have a kid of their own. A child may be assigned to a grandparent, which is an error
  • isolated adults may live with a partner in the dwelling, some live with their parents, some live with a child (but children move from one household to another); 44% of ‘isolated’ adults live with family members – they aren’t isolated at all
  • previously, couples had to be heterosexual; even though the survey identifies them as a union, the rules split them into isolated adults [that’s depressing. thank you for changing this rule.]
  • the census is more imperfect than the survey; it doesn’t catch subtle transformations in societal life. This calls into question definitions of marginal groups
  • also a problem for young adults who leave home but still have strong ties to their parents’ home – they may claim their own home while their parents may also still claim them as living together
  • [very interesting talk. never really thought about it]

Session #5 – Unexpectedly high number of duplicates in survey data

  • simulated duplicates created greater bias in the regression coefficients when up to 50% of cases were duplicated 2 to 5 times
  • birthday paradox – how many people are needed before two of them are more likely than not to share a birthday – 23. A single duplicate in a dataset is likely [a quick calculation of the birthday figure is below this list]
  • new method – the Hamming diagram – diversity of data for a survey – it looks like a normal curve with some outliers, so I’m thinking Hamming is simply a score like Mahalanobis is for outliers [a rough sketch of a Hamming-style duplicate screen is below this list]
  • found duplicates in 10% of surveys; 14 surveys comprised 80% of total duplicates, with one survey at 33%
  • which case do you delete? Which one is right, if indeed one is right? Always screen your data before starting a substantive analysis.
  • [I’m thinking that ESRA and AAPOR are great places to do your first conference presentation. There are LOTS of newcomers and presentation skills aren’t fabulous, so you won’t feel the same pressure as at other conferences. Of course, you must have really great content because here, content truly is king]
  • [for my first ESRA conference, I’m quite happy with the quality of the content. Now let’s hope for a little sun over the lunch hour while I enjoy Skyr, my new favourite food!]
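
A quick back-of-the-envelope check of the birthday paradox point from this session – this is my own calculation in Python, not the presenter’s:

```python
from math import prod

def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday (365-day year)."""
    return 1 - prod((365 - k) / 365 for k in range(n))

print(round(p_shared_birthday(23), 3))  # ~0.507, i.e. just over a 50% chance
```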
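And a rough sketch of what a Hamming-style duplicate screen might look like – the threshold, the toy data and the function name are invented for illustration and are not the presenters’ method:

```python
import numpy as np
import pandas as pd

def near_duplicate_screen(df, threshold=0.95):
    """Flag pairs of respondents whose answers are identical on a suspiciously
    high share of items (i.e. a very small normalised Hamming distance)."""
    values = df.to_numpy()
    flagged = []
    for i in range(len(values) - 1):
        # share of items on which row i matches each later row
        shares = (values[i] == values[i + 1:]).mean(axis=1)
        for offset, share in enumerate(shares, start=1):
            if share >= threshold:
                flagged.append((df.index[i], df.index[i + offset], share))
    return pd.DataFrame(flagged, columns=["case_a", "case_b", "share_identical"])

# toy example: 200 respondents, 40 items coded 1-5, plus one planted duplicate
rng = np.random.default_rng(15)
survey = pd.DataFrame(rng.integers(1, 6, size=(200, 40)),
                      index=[f"resp_{k}" for k in range(200)])
survey.loc["resp_199"] = survey.loc["resp_0"]
print(near_duplicate_screen(survey))
```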
