Project Themes

The work of DAMES involves developing four groups of social science provisions. These comprise work on:

alongside three more specialist topics in social science research:

All of these projects are concerned, in different ways, with tasks of 'data management'. This term is sometimes used for different purposes, but in the DAMES Node we use it to refer to activities, typically undertaken by social researchers themselves, concerned with manipulating data. Examples include checking data for inconsistencies (‘cleaning data’); linking together different data files; and coding or recoding measures (‘operationalising variables’). Such tasks are a substantial and often challenging part of social research.
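As a rough illustration of the three kinds of task named above (cleaning, linking and recoding), the following is a minimal Python sketch. It is not a DAMES service: the records, field names, plausible age range and age bands are all invented for the example.

```python
# Hypothetical survey records; every field name and value is invented.
respondents = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},   # implausible age, to be removed by cleaning
    {"id": 3, "age": 61},
]
# A second hypothetical file, keyed by the same respondent id.
households = {1: {"region": "North"}, 3: {"region": "South"}}


def clean(records):
    """Cleaning: keep only records whose age lies in an assumed plausible range."""
    return [r for r in records if 0 <= r["age"] <= 120]


def link(records, lookup):
    """Linking: attach fields from the second file, matching on the shared id."""
    return [{**r, **lookup.get(r["id"], {"region": None})} for r in records]


def recode_age(age):
    """Recoding: collapse age in years into broad (assumed) bands."""
    if age < 40:
        return "under 40"
    if age < 60:
        return "40-59"
    return "60 and over"


linked = link(clean(respondents), households)
prepared = [{**r, "age_band": recode_age(r["age"])} for r in linked]
```

Each step is a small function so that the sequence (clean, then link, then recode) stays visible and each part can be checked separately.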

These four social science themes run alongside four groups of computer science research activities which focus on using e-Science approaches to deliver these provisions. Administratively, the Node therefore has 8 research themes, as follows:

Social science themes:
1.1) Grid Enabled Specialist Data Environments
1.2) Data Resources for Micro-Simulation on Social Care Data
1.3) Linking e-Health and Social Science Databases
1.4) Training and Interfaces for Management of Complex Survey Data

Computer science themes:
2.1) Description, discovery and use of data and services through use of metadata and data abstraction
2.2) Techniques to handle data from multiple sources
2.3) Workflow modelling for social science
2.4) Security driven data management

Below we provide some further details on the 8 research themes of DAMES.


Meanings of the terms 'data management'.



Theme 1.1: Grid Enabled Specialist Data Environments

This theme (which we abbreviate as 'GESDE') is in many ways a direct extension to a previous project which involved most of the same researchers, called 'Grid Enabled Occupational Data Environment' (GEODE). That project began in October 2005 (supported by an ESRC Small Grant in e-Social Science). The GEODE services and website are now updated as part of the DAMES Node, with services available from http://www.geode.stir.ac.uk/.

The GESDE theme deals with specialist social science data on occupations, on educational qualifications, and on ethnicity. Our basic argument is that each area has a substantial body of specialist social science data - for example, data on how to use occupations to classify people into a social class scheme. Many datasets on these topics are potentially of use to a wide group of social scientists. However, it is frequently the case that some degree of specialist knowledge - and manual effort! - is required before a researcher can effectively access and exploit relevant specialist data. Therefore, in DAMES, we are interested in facilitating access to, and the distribution of, such specialist data.
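To make the occupations example concrete, the sketch below shows the general idea of using a lookup table to assign a social class category to an occupation code. It is only an illustration: the codes and class labels are invented and are not taken from any real classification or from the GEODE services.

```python
# Hypothetical lookup table: codes and class labels are invented for illustration.
OCCUPATION_TO_CLASS = {
    "1001": "Class I",
    "2002": "Class II",
    "3003": "Class III",
}


def assign_class(occupation_code, lookup=OCCUPATION_TO_CLASS):
    """Return the class for an occupation code, or None if the code is unknown."""
    return lookup.get(occupation_code.strip())


codes = ["1001", " 3003 ", "9999"]
classes = [assign_class(code) for code in codes]
```

Unknown codes are returned as None rather than guessed, since in practice an unmatched code is usually a sign that specialist knowledge or manual checking is needed.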

The precise way we provide services in these areas varies slightly for each topic, reflecting slightly different types of data and different user requirements. We published a description of the requirements and objectives of the GESDE services as a technical paper of the Node in Lambert et al. (2008).

The lead in Theme 1.1 is taken by Paul Lambert and Vernon Gayle (see Personnel). Paul Lambert's research background is particularly focussed upon data on occupations and on ethnicity; Vernon Gayle has worked for many years on projects concerned with data on educational qualifications.

Theme 1.2: Data Resources for Micro-Simulation on Social Care Data

Theme 1.2 is concerned with linking together data from different sources which is relevant to the analysis of social care needs in the UK - with a particular focus upon analysis through 'micro-simulation modelling'.

Theme 1.2 has a very specific focus which will contribute to research on social care needs and demographic trends - for instance in terms of the costing of social service interventions for older populations, and their effectiveness. It is also hoped that the model for data linkage developed in this theme will be instructive to a wider range of application areas.

A significant activity in this Theme is finding approaches to effectively link together different types of data which can inform the same analyses. This includes:

The lead in Theme 1.2 is taken by Alison Bowes, David Bell and Alison Dawson (see Personnel). David Bell (Dept Economics) led a recent project on micro-simulation (see OPERA). Alison Bowes and Alison Dawson (Dept Applied Social Science) have backgrounds in the collection, review and analysis of different forms of social care data.

Theme 1.3: Linking e-Health and Social Science Databases

This theme focussed upon the topic of health inequalities and how social science data can be utilised in conjunction with e-Health data across a wide spectrum of clinical, biomedical and health related fields. At issue here is the existence of 'large and complex' datasets both with a social science focus (e.g. large scale social surveys) and a medical focus (e.g. morbidity records). Many of these datasets are not readily accessible to other researchers and tend largely to be analysed in isolation. Our interest in the DAMES Node lies in the processing of such data; its secure storage and access; and the possibilities for data linkage or enhancement between resources.

There is a great deal of e-Health data relevant to this sort of enquiry. There are also several other relevant ongoing research activities in the UK which are exploring similar issues, including the Scottish Health Informatics Programme, the Secure Data Service, and the Methodbox/Obesity e-lab project, with all of which we sought to build complementary services. Our activities covered:

The lead in Theme 1.3 is taken by Margaret Maxwell and Nadine Dougall (representing social and health research communities) and Richard Sinnott (representing Computer Science) (see Personnel). Margaret Maxwell and Nadine Dougall work in health inequalities research with particular interests in mental health outcomes. Richard Sinnott has interests in data storage and access infrastructures with bio-medical applications, and has projects in this area in UK applications, with his colleagues at the National e-Science Centre at the University of Glasgow, and in Australia, in his recent appointment as director of eResearch at the University of Melbourne.

Theme 1.4: Training and Interfaces for Management of Complex Survey Data

This theme involves programmes of training activities, and the development of generic services, for data management of complex social survey data. Information on our workshops, and links to resources and materials that we've made available, are on our 'Workshops and capacity building resources' page.

Theme 1.4 is more wide ranging than the other social science themes. Its generic provisions are designed to complement and generalise the provisions developed under themes 1.1, 1.2 and 1.3.

Theme 1.4 is led by Paul Lambert and Vernon Gayle (see Personnel). Whenever possible, activities and services from this theme have been coordinated with other major UK-led capacity building activities, such as programmes within the ESRC NCRM and RDI initiatives, and in particular with other programmes covering quantitative methods and e-Social Science. At various points we've coordinated, for instance, with the projects Applied Quantitative Methods Network (AQMeN); the National e-Infrastructure for Social Simulation; the Lancaster-Warwick-Stirling NCRM Node; the training project Longitudinal Data Analysis for Social Science Researchers; and the training project Scottish Social Survey Network.

Theme 2.1: Description, discovery and use of data and services through use of metadata and data abstraction

This theme will address the challenges involved in providing easy but secure access to distributed heterogeneous data resources. An important consideration is to develop approaches which are compatible with the standards adopted by major social science data providers, such as the UK Data Archive.

The theme has specified work-packages covering metadata support; data abstraction; semantically-based data discovery; and data usability.

The lead in Theme 2.1 is taken by Ken Turner, Jesse Blum and Guy Warner (see Personnel). Slides from an introductory talk on metadata which were prepared by Jesse Blum for a social science audience are available here (ppt).

Theme 2.2: Techniques to handle data from multiple sources

This research theme arises from the observation that social science datasets are often distributed, disaggregated and uncoordinated. Specialist data such as that examined in Themes 1.1, 1.2 and 1.3 is often stored in differing formats, with differing metadata descriptions, requiring differing access techniques. Where such datasets hold related data, the social science researcher faces considerable challenges in extracting the information that they require from different sources, and merging the data into a uniform body. This theme investigates ameliorating this problem through providing grid services that "virtually fuse" disparate data sources in order to answer research questions through uniform query processing.
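The idea of "virtually fusing" sources can be pictured with a small Python sketch. This is only an in-memory illustration under our own assumptions, not the grid services developed in this theme: the two sources, their field names and the shared schema are all invented. Each source gets an adapter that presents its records in one common schema, so a single query can run across both without first merging them into one file.

```python
import csv
import io
import json

# Two hypothetical sources holding related data in different formats.
CSV_SOURCE = "person_id,income\n1,21000\n2,34000\n"
JSON_SOURCE = '[{"pid": 2, "earnings": 35000}, {"pid": 3, "earnings": 18000}]'


def read_csv_source(text):
    """Adapter: yield CSV rows in the shared schema (id, income, source)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["person_id"]), "income": int(row["income"]), "source": "csv"}


def read_json_source(text):
    """Adapter: yield JSON items in the same shared schema."""
    for item in json.loads(text):
        yield {"id": item["pid"], "income": item["earnings"], "source": "json"}


SOURCES = [(read_csv_source, CSV_SOURCE), (read_json_source, JSON_SOURCE)]


def query(predicate, sources=SOURCES):
    """Apply one predicate uniformly across every source, reading each in place."""
    for adapter, text in sources:
        for record in adapter(text):
            if predicate(record):
                yield record


high_income = list(query(lambda r: r["income"] > 30000))
```

Keeping the "source" field in each result means a researcher can still see where a record came from, which matters when two sources disagree about the same person.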

Work-packages within this theme concern 'data abstraction techniques', 'data fusion techniques', and 'query processing'. The lead in Theme 2.2 is taken by Simon Jones and Guy Warner (see Personnel).

Theme 2.3: Workflow modelling for social science

'Workflows' are often examined within an e-Science framework as an approach which allows for the recording of patterns and processes within research activities, and building services in response to those patterns. In this theme, work will look at describing and supporting workflow models appropriate to social science tasks of data management. Examples might include the sequential steps involved in the manipulation then analysis of social survey datasets; or the progression from data access, to linking together datasets, to undertaking analyses on the new linked dataset.
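The second example above (data access, then linking, then analysis of the linked dataset) can be sketched as an ordered list of named steps. This is a minimal illustration under our own assumptions, not the workflow models developed in this theme; the datasets, step names and the analysis are invented.

```python
def access(_):
    """Step 1: obtain two hypothetical datasets (invented values)."""
    return {
        "survey": [{"id": 1, "score": 4}, {"id": 2, "score": 7}],
        "admin": {1: "urban", 2: "rural"},
    }


def link(data):
    """Step 2: link the two datasets on their shared id."""
    return [{**row, "area": data["admin"][row["id"]]} for row in data["survey"]]


def analyse(linked):
    """Step 3: compute the mean score per area on the new linked dataset."""
    scores = {}
    for row in linked:
        scores.setdefault(row["area"], []).append(row["score"])
    return {area: sum(values) / len(values) for area, values in scores.items()}


WORKFLOW = [("access", access), ("link", link), ("analyse", analyse)]


def run(workflow, value=None):
    """Run the steps in order, logging each step name so the process is recorded."""
    log = []
    for name, step in workflow:
        value = step(value)
        log.append(name)
    return value, log


result, log = run(WORKFLOW)
```

Describing the workflow as data (a list of named steps) rather than as one long script is what makes it possible to record the process and to reuse or reorder steps later.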

The lead in Theme 2.3 is taken by Ken Turner, Simon Jones, Larry Tan and Guy Warner (see Personnel).

Theme 2.4: Security driven data management

This work will build on Themes 2.1, 2.2 and 2.3 and focus on supporting social science scenarios requiring finer grained security. This will leverage a range of technology-oriented projects at NeSC Glasgow such as SPAM-GP, VPman, DyVOSE and GLASS, combined with a range of clinical, epidemiological and geographic information system research projects at NeSC Glasgow such as VOTES, SFHS and EuroDSD, where linkage with social data could greatly benefit research capacity.

Ultimately our focus in this theme is to draw the results of the other themes together and demonstrate the added value offered by the DAMES infrastructure. The key areas of added value we will focus on are far richer linkage of data resources and services for the social science community, and the usability of the DAMES infrastructure for accessing and sharing data resources.

Theme 2.4 is led by Richard Sinnott, John Watt and Susan McCafferty (see Personnel).

Last update: 6/JUN/2010