Help and standards of good practice in survey data management
On this page we give a succinct overview of good practice in the methodology of data management in social survey research, with links to our own resources and to relevant external sites. We think that this topic is in general unduly neglected in methodological discussion in the social sciences, even though data management choices and activities are a major component of research workload and can have a substantial influence upon the results of any analysis.
Comments are given covering:
- Quick start - help me out
- A methodology of data management
- Documentation for replication
- Matching files
- Recoding variables
- Averaging and scaling variables
- Professional standards
- References cited
The content of this page is mostly written by Paul Lambert, with a few contributions from other researchers from the DAMES Node.

Quick start - help me out!
I want to link together some data files! Try our 'matching files' section
I want to change the categories of some variables! Try our 'recoding variables' section
I want to learn more about using syntax! Good! Our software guide handout introduces the idea of using syntax and includes further links. There are many examples of SPSS and Stata syntax/do files in the downloadable materials from our workshop programme, see especially 'Documentation and workflows for social survey research'. There are also materials further down this page, starting with our section on 'documentation for replication'.
All this stuff is too much for me! Try some introductory materials on survey data analysis and data support, or a training workshop on these topics. The ESDS sites have many useful links, resources and recommendations (with a UK orientation).

Defining 'data management' and a methodology of data management.
- What is 'data management'? In the DAMES Node we use the term 'data management' to refer to tasks which generally involve 'manipulating' or 'enhancing' data in some way. With survey data, this typically means tasks like recoding variables, standardising measures, or linking other data resources (such as aggregate statistics) to the survey record. These are all tasks typically performed by social science researchers for the benefit of their own research analysis. In general, there are usually a lot of plausible data management options open to an analyst, and some degree of selection between possibilities is undertaken.
- I thought 'data management' meant... We're well aware that for many readers the same term is associated with activities which are better thought of as being about 'controlling' data (e.g. tasks involved in archiving and distributing datasets, and research governance, typically performed by data archivists). These sorts of 'data management' issues are less central to the DAMES Node's research. For a good external source on data management in this latter sense, we recommend the UK Data Archive's guidance on 'Creating and Managing Research Data'.
- Who is involved in data management? Data management tasks (in our sense) are commonly performed by researchers as a prelude to conducting their desired analysis - they are sometimes called 'pre-analysis' tasks, for instance (example: a researcher harmonising measures of education across a selection of cross-sectional surveys prior to a pooled analysis in which the new measure will be an explanatory variable). The same sorts of tasks are also very often performed by people who distribute data, and indeed many of the more extensive bodies of information resources on data management tasks can be found at data distribution projects (example: staff at a data archive preparing an indicator of 'highest educational qualification', to be released as a derived variable with a survey, according to data from the multiple response question on which qualification(s) a respondent holds).
- What would a 'methodology of data management' involve?
      We'd argue that a methodology of data management involves systematic attention to the processes of data management, alongside effort in providing tools or services (such as software) to enable researchers to undertake relevant tasks. In general terms, it is desirable to clarify what is being done (for instance, how is a particular recode of educational qualifications calculated); why it is done in a particular way (e.g. what is the source reference for the recode that was used); and what the consequences of approaching it in this or a different manner might be (e.g. a brief sensitivity analysis of the recode used, versus some plausible alternatives). In practical terms, software resources which can support most data management requirements have been available for quite some time, but it isn't always the case that researchers are confident or effective in their use of software for such purposes. For example, we don't always manage to preserve a clear record of the tasks that have been undertaken, and sometimes we simply don't know how to go about conducting a particular type of manipulation.
      For us, the central pillars of a methodology for data management are threefold: (1) embedding 'documentation for replication' as a guiding objective in data management activities; (2) unpacking popular but moderately complex data management tasks (see our resources below intended to support tasks of matching files, recoding, and scaling variables, and to enable sensitivity analysis across different derived measures); and (3) promoting professional standards which include expectations of good scholarship concerning data management tasks.
- Data management matters!
      Arguably, for quite some time the activities of data management haven't generated much academic esteem in the social sciences. Writing in 1986, for example, our colleague Ken Prandy summarised working on related processes as '...often seen as a form of 'good housekeeping' - useful, but essentially supportive of real work' (1986: 137). In one sense this is surprising since data management tasks are a large part of many research endeavours, and data management decisions clearly have the potential to have a major impact upon the results of an analysis. On the other hand, data management can be quite an onerous activity, whilst it is quite commonly the case that the empirical consequences of different data management strategies are not so great (sometimes they are, and sometimes they're not, in our experience!). So why think about the methodology of data management at all? We argue that good practice in data management is important because it is essential to the replication of work; because it certainly can matter in the sense of substantially influencing empirical results in important ways (albeit potentially and not necessarily); because there is no longer a good technological excuse not to undertake relevant data management transformations; and because good practice in data management can act more generally as a prelude to becoming a more fluent and effective data analyst in itself. For us, therefore, good practice in data management is central to good professional standards of scholarly empirical research.
- Show me an example...
      We prepared a technical paper, Lambert and Gayle (2008), in which we illustrate a range of data management operations, their documentation (via Stata and SPSS syntax), and their consequences for the results of analysis. The paper is a short analysis of RAE scores from 2008 (the RAE is a UK exercise in ranking research quality at Higher Education institutions). We think the paper makes a good example because it includes some instances where a data management decision has a major impact on the interpretation of results - for instance standardising ratings by size of submission, size of unit, and relative prestige of subject, all matter substantially. However it also includes examples where the impact of different data management operations is minimal - for instance different numerical averaging rules have little impact on analytical results about rankings. We used publicly available data on UK Higher Education Institutions so that both the analysis and data files can also be made available from our website if you wish to replicate the work (the files are available from here).
As a second example, the graph above depicts the way in which three different measures of the socio-economic structure vary, on average, by the age of adult respondents. We think it makes a good illustration of the relevance of data management activities in two ways: first, it reminds us that there are all sorts of socio-economic indicator measures available to us, of which the three measures shown, of occupational position, educational attainment, and income, are a small subset; second, it shows different behaviour across different measures, here in regard to the relation to age, meaning that the choice of measure could be of considerable impact and accordingly a good analyst should try out many different measures and explore their relation with age (indeed, controlling adequately for age and/or birth cohort differences is often a particularly challenging aspect of data management when dealing with survey data).

Documentation for replication.
Shortcut! Here's our extended pdf handout on this general topic (produced for a workshop)
As described above, we are concerned with a methodology of 'data management' where we are interested in activities which manipulate or enhance data resources for the purpose of research analysis. The documentation of such processes can usefully be thought of as concerning the 'paper trail' which tells us adequately about tasks which have been undertaken. That trail is commonly in the form of software specific command files (e.g. 'do files' in Stata, or 'syntax files' in SPSS), but it could conceivably be in another format - for example, the hand-written laboratory notebook is a classic form of comparable documentation, whilst contemporary lab books often come in the form of a collection of related paper and electronic documents.
The most effective form of documentation can be described as 'documentation for replication'. This refers to the idea that the documentation must be sufficiently clear and detailed that other researchers (or the analyst themselves, at a later date) ought to be able to use the documentation to exactly replicate the data management and analysis undertaken. Dale (2006) and Freese (2007) have both written persuasively on the possibilities and desirability of documentation for replication in the social sciences, and we would argue that it is an essential prerequisite to adopting a scientific approach to social research.
Unfortunately, we'd claim, most social survey research projects do not achieve documentation for replication. It is rare to see published results being accompanied by immediate access to the 'paper trail' which documents what generated them, and indeed replication analyses themselves are very rarely published in social science domains. Part of the problem, we'd assert, lies with the inattention to replication analysis within professional standards in the social sciences (see also below). Most of the problem, though, we think, reflects little more than a lack of awareness and training amongst many researchers over how to undertake their work in a manner that readily generates good documentation.
Actually, in most circumstances, it is reasonably easy to undertake analysis whilst generating documentation of a suitable standard for replication. For most of us working with survey data, the trick is to use a syntactical programming language which, with just a little extra effort in tidying up the file, can readily provide a replicable log of the tasks undertaken. Popular examples include the use of SPSS 'syntax' files or Stata 'do' files (illustrated below). If you're already familiar with such approaches to syntactical programming you won't need to be convinced of their benefits; if you are not, however, suffice to say that whilst such languages may initially seem a little intimidating, they are in fact quite easy to pick up, and are hugely beneficial in the longer term.
The figures above depict respectively a researcher using SPSS and Stata syntax files to run their analysis. We have lots more examples of using syntax (plus onward links to external resources) in our workshop materials and software guide document. If you're not already familiar with the format of syntax files being used for documentation-for-replication, we'd suggest there are two important features to recognise:
- The text written on the syntax files directly invokes commands in the software language, which performs operations on the data and generates outputs. The syntax file might in principle be run in one single operation, but many users work progressively through a file by invoking commands bit by bit (a few lines at a time).
- Good syntax files should be clearly annotated with metadata about the analysis (e.g. author, date, source of data used); carefully organised sequentially; and written in a reasonably generic way (e.g. in the above image, macros are used to define the specific location of the data files on the computer being used, so that they don't need to be re-specified later on).
So, the practical work of documentation for replication in social survey research usually centres on using software effectively in order to achieve an adequately replicable 'paper trail'. The figures we illustrate above depict examples of working in Stata and SPSS through syntax files. In most situations this will support adequate levels of documentation, and Long (2009), for example, gives a very thorough introduction to using Stata effectively for this purpose.
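As a rough illustration of the two features listed above, the following is a minimal sketch in R of how the top of an annotated command file might be laid out. The author, date and folder locations are placeholders invented for the example (they are not taken from our workshop files), and the helper `data_path()` is a hypothetical convenience function, not part of R itself.

```r
## ------------------------------------------------------------------
## Title:   Example analysis script (illustrative sketch only)
## Author:  A. Researcher            (placeholder)
## Date:    YYYY-MM-DD               (placeholder)
## Data:    fileA.dat, isco88_isei.dat (file names used on this page)
## ------------------------------------------------------------------

## 1. Define folder locations once, near the top of the file.
##    tempdir() is used here only so the sketch runs anywhere;
##    in real work you would point these at your own project folders.
data_dir   <- file.path(tempdir(), "data")
output_dir <- file.path(tempdir(), "output")
dir.create(data_dir,   showWarnings = FALSE, recursive = TRUE)
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

## Hypothetical helper: build a full path from the folder defined above
data_path <- function(filename) file.path(data_dir, filename)

## 2. Later commands refer to these objects, never to literal paths,
##    so moving the project only requires editing section 1.
survey_file <- data_path("fileA.dat")
lookup_file <- data_path("isco88_isei.dat")

## 3. Record the software version alongside the results
cat("Run with:", R.version.string, "\n")
```

The same layout carries over to Stata (using macros) or SPSS (using file handles); the point is only that locations and metadata are stated once, at the top, where a later reader can find them.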
As an aside, the task of learning how to use software effectively, for most of us, transmutes into the task of learning to be a good programmer in the relevant software language - most successful data analysts in the social sciences, therefore, are also successful programmers in the language of their choice. At the time of writing, we'd argue that there are few sophisticated activities, in terms of data management and data analysis, that can be achieved without some degree of programming proficiency. One interesting contemporary development here can be seen in the ongoing ESRC-funded 'E-Stat' Node, which is trying to build a software system to better support the combined inputs of three groups of specialists - statisticians, programmers and social scientists - for the benefit of social science research. Amongst other contributions, this project is working on an 'ebook' tool which is intended to provide researchers with a dynamic log book featuring electronic records of all stages of the research undertaken in a coherent and clearly documented manner. The ebook being developed has the capacity to automatically generate and record syntax code, and so does not necessarily rely upon the researcher learning the relevant programming language in order to record a consistent log.
New resource! We've written an extended guide on how to use data analysis software effectively for the purposes of good standards of documentation and data management more generally (originally distributed as part of our workshop of 24/5 November 2010). We recommend it for detailed instructions on how to organise your work with data analysis software packages more effectively. It also has accompanying example command files in different software packages (Stata, SPSS, R, MLwiN and lEM), accessible from the website for our November 2010 workshop.
Relevant workshop! We held a 2-day training workshop on the general topic of documentation for replication in social survey research in November 2010. The materials for that workshop (e.g. presentation slides, lab session handouts and example files, including detailed guides to good practice in preparing documentation) are all available online, and we'd encourage you to look into them.
Other sources: There are many other places where it is possible to find more introductory material and worked examples of using software effectively for documentation for replication. The UCLA statistical computing pages are particularly widely used across the social sciences, and in the UK, the ESDS training sites include many software oriented resources. For our part, we have prepared many example command files, as well as collating further external links, which can be found from the 'Stata support' and 'SPSS support' pages of our earlier project on 'Longitudinal Data Analysis for Social Science Researchers' (these were written in 2008).

Matching files
By 'matching files' we are referring to linking together different electronic datasets in a structured way. In survey data analysis, the files involved are the characteristic 'variable-by-case' matrix which summarises the survey data, and some other relevant data. Matching files then refers to the process of linking components of different (but intentionally related) variable-by-case matrices.
This figure tries to illustrate a typical file matching operation. File A is a social survey microdata file, with a measure of the respondents' occupations ('bjbiscon'). File B is separate aggregate data with information (namely the ISEI codes taken from Ganzeboom and Treiman, 1996) on occupations ('isco88', which is in the same format as 'bjbiscon'). After a file-matching routine is run, File C is generated, which in this example can be thought of as an augmented version of file A (it now has an extra variable, namely 'isei').
Various software packages have routines to support matching files, but their techniques are not widely taught in social science training courses, and in our experience many researchers are not confident in how to link files together. This is unfortunate, because a great many useful data enhancements require some sort of file matching exercise to be undertaken.
(Our discussion here focusses upon approaches to what is sometimes called 'deterministic' file matching, which means linking together data files according to shared and known identifier characteristics. This is in contrast to 'probabilistic file matching', which generally refers to using some statistical algorithm to impute values on one dataset (recipient) on the basis of statistical patterns in another related dataset (donor). There are quite a few current methodological projects in the UK involved in developing probabilistic matching, or 'data fusion', techniques - for instance ADMIN, BIAS, NeISS, and our own project theme on social care data).
Common examples of file matching can be described as 'one-to-many' links (whereby records from the cases of one file are distributed across a number of related records in a second file, such as when distributing aggregate level summaries to individual cases); 'one-to-one' links (whereby records from different files are linked on a shared identifier value, such as when linking individual responses from different years in a longitudinal study); and appending data files (where different datasets are added to the same file but individual cases between the datasets are not explicitly linked).
In a recent workshop we gave extended examples of matching files using Stata (see the materials from our training workshop of August 2009), and in an earlier project we developed illustrative examples in both SPSS and Stata covering a range of file matching operations (see the online materials for the project on 'Longitudinal Data Analysis for Social Science Researchers').
Since file matching is such an important data management practice we give below illustrations of some of the most important mechanisms for matching data, across the three software packages Stata, SPSS and R. We do not know of too many other online guides to file matching using these packages, although the ATS Statistical Computing (UCLA) webpages are one notable exception (see their guides for SPSS, Stata and R). Our examples below are brief and rather superficial, however - take a look at our workshop programme (see above) for more extended examples of matching files!
Example 1: A one-to-many match merge operation using one shared variable. In this example, we have a microdata file from a survey (fileA) and we want to link in aggregate data about occupations from an external file (isco88_isei). The merge below links in cases from the isco88_isei file whenever there is suitable data on fileA to allow that. After the match all records from fileA are retained, but the procedure discards any remaining data from isco88_isei which does not successfully match to the microdata file.
SPSS syntax example:
    get file="isco88_isei.sav".
    rename variables isco88=bjbiscon.
    sort cases by bjbiscon.
    save outfile="temp.sav".
    get file="fileA.sav".
    sort cases by bjbiscon.
    match files file=* /table="temp.sav" /by=bjbiscon.
    save outfile="fileC.sav".
Stata syntax example:
    use isco88_isei.dta, clear
    rename isco88 BJBISCON
    sort BJBISCON
    save temp.dta, replace
    use fileA.dta, clear
    sort BJBISCON
    /* older merge syntax; from Stata 11 the equivalent is: merge m:1 BJBISCON using temp.dta */
    merge BJBISCON using temp.dta
    tab _merge                      /* _merge is generated automatically */
    keep if _merge==1 | _merge==3   /* _merge used to keep fileA cases only */
    drop _merge
    save fileC.dta, replace
R syntax example:
    fileA <- read.table("fileA.dat", header=T)
    isco88_isei <- read.table("isco88_isei.dat", header=T)
    fileC <- merge(fileA, isco88_isei, by.x="BJBISCON", by.y="isco88",
                   all.x=T, all.y=F, sort=F, suffixes = c(".x",".y") )
    write.table(fileC, file="fileC.dat", col.names=T, row.names=F)
[Image representing one-to-many matching]
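The example above depends on the downloadable files. For readers who want to see the behaviour of a one-to-many match without them, here is a minimal self-contained sketch in R. The two small data frames are stand-ins that we have invented for illustration: the occupation codes and the 'isei' scores in them are made up and are not the published ISEI values.

```r
# Toy stand-in for fileA: survey microdata, several respondents per occupation
fileA <- data.frame(
  id       = 1:5,
  BJBISCON = c(11, 12, 11, 99, 13)   # code 99 has no entry in the lookup file
)

# Toy stand-in for isco88_isei: exactly one row per occupation code
# (codes and scores are invented for this sketch)
isco88_isei <- data.frame(
  isco88 = c(11, 12, 13, 14),        # code 14 is held by no respondent
  isei   = c(70, 50, 40, 30)
)

# Keep every survey case (all.x = TRUE), drop unused lookup rows (all.y = FALSE)
fileC <- merge(fileA, isco88_isei,
               by.x = "BJBISCON", by.y = "isco88",
               all.x = TRUE, all.y = FALSE)

# merge() may reorder rows, so restore the original case order
fileC <- fileC[order(fileC$id), ]
print(fileC)
```

In this sketch respondents 1 and 3 share an occupation code and so receive the same score, respondent 4 is retained with a missing score, and the unused lookup row (code 14) is discarded, which mirrors the behaviour described for Example 1.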
Example 2: A one-to-one match merge operation using one shared variable. The example shown here is of merging cases from two different data files which pertain (potentially) to the same respondents. Respondents are uniquely indicated by the 'id' variable, and the combined file shows, side-by-side, respondents' values on the two different files. In practice, not all respondents are on both files. In the illustrative figure, the _merge variable indicates coverage: _merge=3 means that the respondent does have values linked from both datasets; _merge=1 or _merge=2 means that they only have values from the A or O dataset respectively.
SPSS syntax example:
    get file="waveOsubset.sav".
    sort cases by id.
    save outfile="m1.sav".
    get file="waveAsubset.sav".
    sort cases by id.
    match files file=* /in=merge1 /file="m1.sav" /in=merge2 /by=id.
    * merge1 and merge2 are used to indicate presence on each file.
    crosstabs tables=merge1 by merge2.
Stata syntax example:
    use waveOsubset.dta, clear
    sort id
    save temp.dta, replace
    use waveAsubset.dta, clear
    sort id
    merge id using temp.dta
    tab _merge    /* _merge generated automatically */
    drop _merge
R syntax example:
    waveA <- read.table("waveAsubset.dat", header=T)
    waveO <- read.table("waveOsubset.dat", header=T)
    waveAO <- merge(waveA, waveO, by.x="id", by.y="id",
                    all.x=T, all.y=T, sort=F, suffixes = c(".x",".y") )
    write.table(waveAO, file="waveAOcomb.dat", col.names=T, row.names=F)
[Image representing one-to-one matching]
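Unlike Stata, R's merge() does not create a coverage indicator of its own. One possible way to reproduce the information carried by _merge is sketched below with two small invented data frames; the variable names 'incomeA', 'incomeO', 'inA', 'inO' and 'merge_code' are our own choices for the illustration and do not appear in the example files.

```r
# Toy stand-ins for the two wave files (values invented for this sketch)
waveA <- data.frame(id = c(1, 2, 3), incomeA = c(100, 200, 300))
waveO <- data.frame(id = c(2, 3, 4), incomeO = c(210, 310, 410))

# Flag presence on each file before merging
waveA$inA <- TRUE
waveO$inO <- TRUE

# Full one-to-one merge: keep respondents found on either file
waveAO <- merge(waveA, waveO, by = "id", all = TRUE)

# Respondents missing from a file have NA in that file's flag
waveAO$inA[is.na(waveAO$inA)] <- FALSE
waveAO$inO[is.na(waveAO$inO)] <- FALSE

# Coverage code in the style of Stata's _merge:
# 1 = on A only, 2 = on O only, 3 = on both files
waveAO$merge_code <- ifelse(waveAO$inA & waveAO$inO, 3,
                            ifelse(waveAO$inA, 1, 2))

# Tabulate coverage, as 'tab _merge' does in the Stata example
print(table(waveAO$merge_code))
```

Setting the flags before the merge, and not afterwards, is what makes this work: a missing flag after the merge can only mean that the respondent was absent from that file.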
Example 3: Appending multiple related files. **Under construction - inputs to follow**
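Pending the full example, the following is only a minimal sketch in R of what appending means in the sense described earlier on this page: cases from two files are stacked into one file, without any case-to-case linking. The data frames and the 'source' variable are invented for the illustration.

```r
# Two toy files with the same variables but different cases (invented values)
file1 <- data.frame(id = c(1, 2), age = c(34, 51))
file2 <- data.frame(id = c(3, 4), age = c(29, 62))

# Record where each case came from before stacking
file1$source <- "file1"
file2$source <- "file2"

# rbind() on data frames matches columns by name, so check the names agree
stopifnot(setequal(names(file1), names(file2)))

# Stack the cases: the result has the rows of both files, one after the other
combined <- rbind(file1, file2)
print(combined)
```

Keeping a variable such as 'source' is a design choice, not a requirement, but it makes it possible to recover the original files from the combined one later on.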
Further notes on the syntax examples.
    To simplify the example we haven't included the paths of the files within the syntax. The above syntax will work if you set the 'file handle' in SPSS, or use 'cd' in Stata or 'setwd()' in R, to ensure that the software is pointing to the folder where you've downloaded the example files. Alternatively, put the full path into the file call, or use macros to define paths for the relevant files (see our software guide on this issue).
    Common practical problems with match-merge operations are when the storage formats of the key linking variables are not compatible (some data processing may be necessary, such as converting a string format into a numeric format); and when data files have a linking variable in a 'one' file which is not in fact unique for each case (that is, in a one-to-many link or a one-to-one link, there should only be one distinct row for every unique combination of the key linking variable(s) for the 'one' part of the link; and anything else will give you some form of error, though the appearance will vary between software packages - to address this you may need to subset your data to ensure only one-case-per-value, see our section on aggregating data for examples).
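The two problems just described can be checked for before a merge is attempted. The sketch below, in R with invented data, shows one possible way of doing so. Note that, as far as we are aware, R's merge() does not raise an error when the 'one' file has duplicated key values but silently returns extra rows, so an explicit check is worthwhile. Keeping only the first row per key, as done here, is just one arbitrary resolution chosen for the illustration.

```r
# Toy lookup file with two deliberate problems (values invented):
# the key is stored as text, and key "12" appears twice
lookup <- data.frame(isco88 = c("11", "12", "12", "13"),
                     isei   = c(70, 50, 55, 40),
                     stringsAsFactors = FALSE)
survey <- data.frame(id = 1:3, BJBISCON = c(11, 12, 13))

# Problem 1: incompatible storage formats of the linking variable
print(class(lookup$isco88))     # character
print(class(survey$BJBISCON))   # numeric
lookup$isco88 <- as.numeric(lookup$isco88)

# Problem 2: the key is not unique on the 'one' side of the link
dup_keys <- unique(lookup$isco88[duplicated(lookup$isco88)])
print(dup_keys)                 # the offending key value(s)

# One possible fix (an assumption of this sketch): keep the first row per key
lookup_unique <- lookup[!duplicated(lookup$isco88), ]
stopifnot(anyDuplicated(lookup_unique$isco88) == 0)

# The merge now returns exactly one row per survey case
merged <- merge(survey, lookup_unique,
                by.x = "BJBISCON", by.y = "isco88", all.x = TRUE)
print(merged)
```

In real data the right way to resolve duplicated keys depends on why they are there, so the duplicated rows should be inspected before any are dropped.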
       **DAMES ONLINE TOOL FOR MATCHING DATA FILES**
If you don't fancy writing out the necessary software syntax to merge two related data files, we also have an online service which will perform this task for you if you submit the two data files and provide information on the files that are to be matched.
[IN PREPARATION]

Recoding variables.
Text to follow
       **DAMES ONLINE TOOL FOR RECODING VARIABLES**
[IN PREPARATION]

Averaging and scaling variables.
Text to follow
       **DAMES ONLINE TOOL FOR AVERAGING AND SCALING VARIABLES**
[IN PREPARATION]

Professional standards.
After spending some time thinking about the processes and practice of data management in social survey research, we've come to develop fairly firm views on good and bad habits in this research domain. We present below some intentionally pejorative notes on what we think is required for more effective application of a scientific approach to social survey research.
The model of science that we adopt is influenced by Steuer (2003). We take as central tenets of a scientific approach to our field the principle that empirical survey research should be cumulative (i.e. motivated by, building upon and learning from previous endeavours); and that it should be open to cross-examination and further evaluation, by being explicitly recorded and its processes exposed to others. Of course, these are by no means agreed upon definitions for the term 'science' or social science research, but we think they are useful principles to aspire to in survey research, because we would claim that they are common features of many of the most productive examples of published social survey research (e.g. Townsend, 1979; Breen, 2004), whilst they are frequently absent in more problematic survey-based studies (cf. Huff, 1954).
It is all very well for us to claim that research would be better conducted by adopting certain principles, such as of documentation for replication and so forth. We may claim the above, and others may claim differently, and it may not be so obvious if either is always right or who should have influence. We can however appeal to the idea of professional standards in order to make a case that there could usefully be agreed upon standards for academic social survey research. The presence of a code of conduct that is both agreed upon and enforced is commonly seen as a defining feature of professionalism (e.g. Prandy, 1965), and we think that this principle would be a useful device for setting quality standards linked to empirical survey research.
To elaborate the analogy for the example of survey research, the specification of professional standards can be made in order to ensure that relevant work is conducted to a suitable standard, whereas it is argued that using qualification standards alone, and the membership of associations, are not adequate criteria for professionalism. In social survey research, we could point to skill requirements (e.g. the ability to coax a good graph out of a software package), and publication criteria (e.g. the ability to successfully disseminate analytical results via publications), as broad parallels to qualification standards and association membership. Our claim is that neither of these proves sufficient to guarantee that productive scientific work is conducted in our domain - researchers can be both qualified, and successful in publishing their work, but this doesn't ensure a desirable standard of scientific investigation is being undertaken. If such standards are desirable, it is necessary to express more clearly the methodological expectations relevant to such approaches in the context of survey research.
A codification of professional standards in survey research could therefore be used as a device for enforcing more rigorous scientific standards in survey analysis projects. That codification cannot, of course, be printed on quality paper and henceforth disseminated across the research community to immediate effect. It could, however, gradually filter through to a variety of relevant gatekeeping organisations, such as journal and research grant refereeing criteria, higher examining standards, appointment criteria and so forth, to the long term benefit of social science research standards. The content of the code of conduct, in turn, is obviously vital, and the critical claim of our argument is that better quality research (more cumulative and replicable) is only likely to be enabled if the relevance of data management is recognised, and prominent within, a professional code of conduct designed to promote better standards of survey research.
Our claim hinges upon the terms and definitions of data management given above. There, we sought to show that a range of activities involved in the organisation and preparation of data are potentially very consequential to the results of analysis, and that in general terms there was much room for imporvement in the way that that range of activities is typically undertaken. Elaborating on these issues, we claim that the following points ought in our view to be embedded within our professional expectations as social science researchers in the domain of social survey analysis:
- Documentation for replication: It should be a standard expectation that survey-based research projects should supply suitable documentary materials that others could use to accurately replicate their analysis. This would require description of data used, and description of data management operations and analytical techniques. The former can often be achieved within a published output, but in most circumstances that full details of the latter cannot, and instead require provision of adequately documented syntax files
- Expectations of review and replication in data and definitions: Researchers are faced with many data management choices such as over variable standardisation and operationalisation options (comparability between studies is often hindered by lack of consistency in the choices made between studies). We assert that it should be a standard scholarly expectation that a new analysis finds out and considers the definitions that were used in other studies, and cites them accordingly. Subsequently, the analysis should by default use the same measures as previously, unless a convincing argument can be made for changing instruments. Setting a citation requirement for all such choices, and penalising work which does not provide such scholarship, would be a reasonable strategy to enforce such expectations
- Valuation of replication analysis: Replication analyses specifically (i.e. those studies which directly replicate a previous study subject to a relatively small variations in definition or scope) help to build research traditions cumulatively. Nevertheless contemporary esteem criteria, such as publication criteria, may appear to under-value replication studies in comparison to alternative studies which apparently present new or original work. Accordingly esteem criteria should be explicitly revised in order to better recognise the scientific value of replication.
- Expectations of sensitivity analysis: Sensitivity analysis refers in broad terms to conducting many different analyses with different permutations of data in order to check that results are consistent across otpions, and to identify consequential differences if not. This is highly relevant to data management operations in survey research, such as the example of choosing between multiple candidate measurement operationalisations for a particular concept. Accordingly, it should be a standard expectation of analytical studies that senstivity analysis is conducted and/or cited whenever a range of different data management operations are available .
- Expectation of fluency in both data management and analysis: Since so many different devices for preparing and analysing survey data are available, good choices are only likely to be made if the analysts themselves are sufficiently fluent across a range of options that they can reasonably evaluate them and choose between them. An adequate professional research standard requires, therefore, that the range of important data management choices (such as are highlighted in the sections above) be widely known across the research community, and that their omission from consideration be regarded as unscholarly.
- Mechanisms to reward work which follows good practice in data management, and to penalise work which does not: The various expectations of good practice sketched above are all feasible, but they often carry a significant time cost in both initial training and subsequent extended scholarly endeavour. Accordingly, it is not easy to achieve these standards, and less scrupulous analyses which cut various corners are naturally left at an advantage in producing results more quickly. This can generate an inconsistency in the relationship between rewards and standards, which might only be addressed if those in positions to allocate rewards recognise and value the importance and challenges of efforts devoted to data management work.
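To make the sensitivity analysis expectation in the list above a little more concrete, here is a minimal sketch in Python (chosen purely for illustration; the same logic could be written in SPSS or Stata syntax). It repeats one simple analysis, the gap in mean income between 'higher' and 'lower' education groups, under three alternative recodes of an education variable. The records, the cut-off points and all of the names used are hypothetical assumptions made up for this example, not values or methods taken from any real survey.

```python
from statistics import mean

# Hypothetical toy records: (years_of_education, income). Illustrative values only.
RECORDS = [
    (9, 14000), (10, 15500), (11, 15000), (12, 18000), (12, 17500),
    (13, 21000), (14, 20500), (15, 24000), (16, 30000), (18, 34000),
]

# Alternative operationalisations of "higher education" (hypothetical cut-offs).
OPERATIONALISATIONS = {
    "12+ years": lambda years: years >= 12,
    "13+ years": lambda years: years >= 13,
    "16+ years": lambda years: years >= 16,
}


def income_gap(records, is_high):
    """Mean income of the 'high' group minus mean income of the 'low' group."""
    high = [income for years, income in records if is_high(years)]
    low = [income for years, income in records if not is_high(years)]
    return mean(high) - mean(low)


def sensitivity_analysis(records, operationalisations):
    """Repeat the same analysis under each candidate recode."""
    return {
        name: income_gap(records, rule)
        for name, rule in operationalisations.items()
    }


def same_sign(results):
    """True if every operationalisation gives a gap in the same direction."""
    values = list(results.values())
    return all(v > 0 for v in values) or all(v < 0 for v in values)


results = sensitivity_analysis(RECORDS, OPERATIONALISATIONS)
for name, gap in results.items():
    print(f"{name}: income gap = {gap:.0f}")
print("Direction consistent across options:", same_sign(results))
```

Keeping the candidate recodes together in one named structure means that the full set of options considered is itself documented in the syntax, so the comparison can be reported or cited alongside the main result. A real study would usually compare model estimates rather than a simple difference in means.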
Implicit in many of the above points is the claim that much contemporary research does not adhere to adequate professional standards. For example, many survey analysis studies select variables on fairly ad hoc grounds, and do not engage with previous studies, nor use sensitivity analysis, to inform their choice; many studies do not provide clear documentation of the syntax files used to generate their results, nor demonstrate that their methods were selected from a fluent knowledge of feasible alternatives. Our rather tough position, therefore, is that scientific standards would be improved if criteria existed that allowed researchers and reviewers alike to identify such weaknesses, and to aspire to higher standards.
We've included these components of professional standards for social survey research because they have emerged as important methodological principles during our consideration of data management as a methodology. The above points might not be perfectly defined or comprehensive, but we believe they reflect the most important ways in which data management is relevant to the survey research process. We think that the points above are not widely recognised or acted upon within the existing research environment, and so we see the major challenge for contemporary social survey research as persuading the wider research community to accept these or similar professional standards, and to adequately enforce them.

References cited.
- Breen, R. (Ed.). (2004). Social Mobility in Europe. Oxford: Oxford University Press.
- Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
- Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.
- Huff, D. (1954). How to Lie with Statistics. London: Gollancz.
- Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
- Lambert, P. S., & Gayle, V. (2008). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling, University of Stirling: Technical Paper 2008-3 of the Data Management through e-Social Science Research Node (www.dames.org.uk).
- Prandy, K. (1965). Professional Employees. London: Faber and Faber.
- Prandy, K. (1986). Similarities of Life-style and the Occupations of Women. In R. Crompton & M. Mann (Eds.), Gender and Stratification (pp. 137-153). Cambridge: Polity Press.
- Steuer, M. (2003). The Scientific Study of Society. Boston: Kluwer Academic.
- Townsend, P. (1979). Poverty in the United Kingdom: A survey of household resources and standard of living. London: Allen Lane.
- University of Essex, & Institute for Social and Economic Research. (2010). British Household Panel Survey: Waves 1-18, 1991-2009 [computer file], 7th Edition. Colchester, Essex: UK Data Archive [distributor], July 2010, SN: 5151.










