GEMDE: Grid Enabled ethnic Minority Data Environment
This page describes the 'GEMDE' service (Grid Enabled ethnic Minority Data Environment). It is one of three related provisions developed in the DAMES Node under the umbrella term 'GESDE' (Grid Enabled Specialist Data Environments; introduction to GESDE).
GEMDE is a resource for supporting quantitative data broadly related to the analysis of ethnicity for social science research (spanning topics such as ethnic identity, nationality and national origins, religion, immigration, and language). It boils down to a means of providing access to organised, easy-to-use statistical resources on the topic. We refer to these resources as either 'MUGs' or 'MIRs' (we'll explain these terms below).
This page describes GEMDE and features instructions on using its services. The resource itself is hosted in an online 'portal' environment: Go straight to the GEMDE portal (needs a login). We also have a help page with more details on using the service.



News
Occasional updates on GEMDE or selected other relevant activities are posted here...
External seminar, 11th March 2011, Manchester
Promoting methodological innovation and capacity building in research on ethnicity
An ESRC-funded NCRM Network for Methodological Innovation
Researching ethnicity: what, why and how?
Friday 11 March 2011, Manchester Conference Centre, Sackville Street, Manchester
The conference highlights some of the methodological issues identified during the previous workshops and offers an opportunity for reflection and discussion. We welcome participants from across all sectors. There is no charge for attendance and we can offer assistance with travel if needed.
We also welcome offers of posters related to the topics of this conference. The conference facilities are flexible on the size of the posters that can be presented. However, posters that are around A1 size (approx. 59cm x 84cm) are preferred. If you would like to present a poster, please submit a short abstract describing the proposed content to gillian.meadows@manchester.ac.uk.
To view the programme and book a place for this event, please go to http://www.methods.manchester.ac.uk/events/2011-03-11/

BACKGROUND
There are several reasons why it can be difficult to undertake statistical analysis of concepts related to ethnicity in social survey research. There are different views about what should be measured as ethnic difference (e.g. language, religion, ethnic identity, somatic difference); and categorisations tend, in any case, to change over time or between contexts. In addition, many datasets don't (seem to) adequately support analysis of all the minority groups we might in theory be interested in, for instance when some minority categories are represented by only a few cases. Finally, many ethnic differences are strongly correlated with other socio-demographic differences, such as population age structures (as illustrated in Figure 1). Such correlations make it very challenging to interpret statistical patterns of difference satisfactorily.
Researchers do often find solutions to these problems, but usually only through significant effort in 'data management', such as recoding ethnicity categories or defining new ethnicity classifications. In the Data Management through e-Social Science (DAMES) Node our general objective is to promote improved standards of data management in the social sciences, in terms of consistency and collaboration between research projects. In GEMDE, we try to do this by:
- Documenting different classifications of ethnicity (what we call 'MUGs')
- Facilitating access to statistical data about classifications (what we call 'MIRs')
- Facilitating preparation and analysis of data on ethnicity (by using MUGs and MIRs)
GEMDE is one of three related services provided in DAMES under the umbrella term of 'GESDE' (Grid Enabled Specialist Data Environments). They are 'Grid Enabled' because they use an 'e-Science' approach to organise and distribute their information (see also FAQ 7). The GESDE applications cover:
- Occupations: The GEODE service (e.g. use GEODE to perform a transformation from the occupational unit group measure ISCO-88 to a derived occupation-based social classification)
- Education: The GEEDE service (e.g. use GEEDE to look up summary information on common qualification types in the UK in the 1960s)
- Ethnic minorities: The GEMDE service (e.g. use GEMDE to find out how to reclassify the 13 category version of the UK Census 2001 ethnic group question into a simpler 4 category version)
The unifying theme of all the GESDE services concerns effective collaboration. We are aware that there are expert researchers in the social sciences who develop their own modifications to data on ethnicity (and occupations and educational data). We seek to provide resources to allow experts to share their modifications, and to allow other researchers to obtain them, documenting and citing their provenance. We talk more about the DAMES Node's GESDE project in our technical paper Lambert et al (2008). We also gave a talk on the background to GEMDE and its current features at the NCRM methods network seminar on ethnicity at University of Essex, May 2010 (ppt slides).

HOW TO USE GEMDE
We're still developing GEMDE. Please contact us with feedback/comments on the service and its usability.
GEMDE is intended to be a resource for social scientists undertaking research which involves data on ethnicity. There are generally two ways in which you might want to use GEMDE: to deposit data (for instance, if you have created a new information resource as a by-product of your work), or to search for and access data. In general, many more users of GEMDE are interested in the latter.
1) Entering the GEMDE portal
ACCESS THE GEMDE PORTAL
The portal is an internet site which allows us to put various controls on access to data, and to integrate that access with particular programmes for searching, reviewing, uploading or analysing data. In GEMDE, we use a portal system known as 'Liferay'.
You can enter the portal either as a Guest or as a named user. The former doesn't require individual level authentication [guest login instructions]. The latter requires you to identify yourself through Shibboleth security (instructions on named user authentication). When you login as a guest, you can search for and access data resources stored at GEMDE. When you login as a named user, you can also upload resources to GEMDE, provide quality ratings, and you may be permitted to access certain secure data resources.
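The difference between guest and named-user access can be summarised as a small permission check. This is a minimal sketch only: the action names (`search`, `access`, `upload`, `rate`, `secure_access`) and the per-user `secure_permission` flag are hypothetical, and the code says nothing about how the Liferay portal or Shibboleth actually enforce access.

```python
from enum import Enum


class Role(Enum):
    GUEST = "guest"            # no individual-level authentication
    NAMED_USER = "named_user"  # identified through Shibboleth


# Hypothetical action names, chosen for illustration only.
GUEST_ACTIONS = frozenset({"search", "access"})
NAMED_USER_ACTIONS = GUEST_ACTIONS | {"upload", "rate"}


def allowed(role: Role, action: str, secure_permission: bool = False) -> bool:
    """Return True if a user with this role may perform the action."""
    if action == "secure_access":
        # Secure resources: named users only, and only if individually permitted.
        return role is Role.NAMED_USER and secure_permission
    permitted = GUEST_ACTIONS if role is Role.GUEST else NAMED_USER_ACTIONS
    return action in permitted
```

Modelling secure access as a separate flag reflects the wording above: named users "may be permitted" to access secure resources, so being a named user is necessary but not sufficient.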
Once you have entered the portal, you will see a number of links to different resources associated with DAMES. The GEMDE tab (or 'portlet') covers resources linked with this service; other tabs indicate other resources linked to DAMES.
2) Finding data with GEMDE
You can use the 'Browse' and 'Search' options in the GEMDE portlet to search for MUGs and MIRs.
Did you remember...? MUGs are 'ethnic Minority Unit Groups': systematic lists of the categories of an ethnicity scheme. MIRs are 'ethnic Minority Information Resources': databases of summary data about MUGs.
Other types of data and information can also be found at GEMDE, including a resource for obtaining certain bespoke summary statistics from fixed microdata records, and information about the quality ratings of resources linked to GEMDE. Ways of finding these data are described on the portal itself and in our extended GEMDE help page.
[For additional instructions on the current prototype, see our workshop materials ]
3) Depositing data with GEMDE
We encourage you to submit new data resources to GEMDE (e.g. information files you've got that relate to ethnicity research in the social sciences, perhaps produced as a by-product of a recent research project).
Why should you bother? Being altruistic helps, but there are other good reasons to send in information files to GEMDE! First, it's good scientific practice to disseminate documentation and information on the data files or schemes used in your research - indeed sending your materials in to GEMDE may be a very effective way for you to conduct your work in accordance with replication policies of journals, ethics boards and the like. Second, publishing supplementary materials has the potential to publicise your own research, such as to encourage citations of your own work or data files.
Submitting data to GEMDE usually involves two stages:
The first stage is designed to let you send in basic information about the resource with minimal hassle (there's a form asking for the information we'd like to have about the resource, but we've kept it as short as possible). You can also upload the data file(s) to the GEMDE system at the same time. At the second stage, there is an opportunity to provide much more information about the resource (i.e. 'metadata'), as well as to edit or amend earlier records. In practice, some of the details for the second stage are quite often filled in by members of the GEMDE project rather than by the original supplier of the data.
[For instructions on using the current prototype, see our workshop materials ]

WORKSHOP ON GEMDE PROTOTYPE SERVICES, 28 JANUARY 2010
Access the workshop materials:
Presentations
1. Introduction to data on ethnicity (pdf; ppt)
2. Introduction to GEMDE (pdf; ppt)
3. GEMDE features and demo (pdf; ppt)
4. Review and consultation (pdf; ppt)
Lab session
- Lab handout (pdf; doc)
- Questionnaire (pdf; doc)
- Stata command file examples (gemde_bhps_examples.do)
- Example macro for recoding BHPS ethnic groups (bhps_ethnicity_cmbined.do)
- Example MUGs [folder]
- Example MIRs [folder]

FREQUENTLY ASKED QUESTIONS
1) What's a MUG?
- We use this acronym for 'Minority Unit Group'. This is our own terminology which draws parallels with the better known 'Occupational Unit Group' (see listing of OUGs at the GEODE site). A Minority Unit Group is any systematic listing of the categories of a measure of ethnicity.
- Example: In the UK, the Census ethnic group questions are well-known examples of MUGs. For instance, the 2001 Census ethnic group question for England and Wales constitutes a MUG with the following principal categories: "British", "Irish", "Other White", "White and Black Caribbean", "White and Black African", "White and Asian", "Other Mixed", "Indian", "Pakistani", "Bangladeshi", "Other Asian", "Black Caribbean", "Black African", "Other Black", "Chinese", "Other Ethnic Group" (see e.g. the SARs listing of this variable).
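To make the idea concrete, a MUG can be held as nothing more than a named, ordered list of category labels. The sketch below uses the 2001 England and Wales categories listed above; the `MUG` class and the numeric codes it assigns (list position, starting at 1) are hypothetical, not the codes used by the Census or stored at GEMDE.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MUG:
    """A Minority Unit Group: a named, ordered listing of ethnicity categories."""
    name: str
    categories: Tuple[str, ...]

    def code_of(self, label: str) -> int:
        """Hypothetical numeric code: the label's position in the list, from 1."""
        return self.categories.index(label) + 1


CENSUS_2001_EW = MUG(
    name="UK Census 2001 ethnic group (England and Wales)",
    categories=(
        "British", "Irish", "Other White",
        "White and Black Caribbean", "White and Black African",
        "White and Asian", "Other Mixed",
        "Indian", "Pakistani", "Bangladeshi", "Other Asian",
        "Black Caribbean", "Black African", "Other Black",
        "Chinese", "Other Ethnic Group",
    ),
)
```

Under this illustrative coding, `CENSUS_2001_EW.code_of("Indian")` returns 8. Freezing the object reflects the fact that a published MUG is a fixed reference list that other resources point to.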
2) What's a MIR?
- We use this acronym for 'Minority Information Resource'. This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' we used on the GEODE project to describe databases of information about occupations (cf. Lambert et al. 2007).
- Typical examples of MIRs are summary statistical data about the categories of a MUG (e.g., the table of median ages for each ethnic group, which is summarised in Figure 1 above); and documentation or information about recodings of MUGs which have been used in a particular study (e.g., a Stata syntax command file which tells you how I recoded the UK 2001 Census MUG into the three-category division of 'White', 'Asian' and 'Other' used in the analysis presented in a paper).
- Social scientists are not, in general, aware of the existence of many MIRs (compared, say, with the widespread use of popular Occupational Information Resources such as Harry Ganzeboom's ISEI and EGP translation files). There are, however, various useful data resources in existence, which we seek to publicise in GEMDE. We argue that better communication and dissemination of MIRs is an important step towards better scientific practice in the replication and standardisation of research.
- So, in our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR). Keep up!
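The link between a MUG and a recode-type MIR can be sketched in a few lines, using the three-category 'White', 'Asian', 'Other' example above. The assignment of categories to groups is an assumption made for illustration (here 'Asian' covers only Indian, Pakistani, Bangladeshi and Other Asian, and everything that is neither White nor Asian becomes 'Other'); a real study would document its own choices. The check reflects the point that every MIR links to a MUG: a recode should cover exactly that MUG's categories.

```python
# Categories of the UK 2001 Census ethnic group MUG (England and Wales).
CENSUS_2001_EW = (
    "British", "Irish", "Other White",
    "White and Black Caribbean", "White and Black African",
    "White and Asian", "Other Mixed",
    "Indian", "Pakistani", "Bangladeshi", "Other Asian",
    "Black Caribbean", "Black African", "Other Black",
    "Chinese", "Other Ethnic Group",
)

# Assumed grouping for illustration; a real study documents its own choices.
WHITE = {"British", "Irish", "Other White"}
ASIAN = {"Indian", "Pakistani", "Bangladeshi", "Other Asian"}

# The recode itself, kept as data: each MUG category maps to one new category.
RECODE_3CAT = {
    cat: "White" if cat in WHITE else "Asian" if cat in ASIAN else "Other"
    for cat in CENSUS_2001_EW
}


def check_recode(mug_categories, recode):
    """Raise ValueError unless the recode covers exactly the MUG's categories."""
    missing = set(mug_categories) - set(recode)
    unexpected = set(recode) - set(mug_categories)
    if missing or unexpected:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")


def apply_recode(values, recode):
    """Recode a sequence of category labels, e.g. one survey variable."""
    return [recode[v] for v in values]


check_recode(CENSUS_2001_EW, RECODE_3CAT)  # passes: the recode is complete
```

For example, `apply_recode(["Irish", "Chinese", "Pakistani"], RECODE_3CAT)` gives `["White", "Other", "Asian"]`. Keeping the recode as a table rather than as scattered if-statements is what makes it easy to deposit, cite, and reapply unchanged to other data using the same MUG.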
3) What sort of data can I get at GEMDE? What coverage is there - in terms of countries and time periods?
- GEMDE coordinates and supplies aggregate data and metadata about measures of ethnicity. Examples are tables of summary statistics on ethnic categories (MIRs), and definitions of ethnic categories themselves (MUGs). We don't supply any survey microdata (we assume you get your microdata elsewhere, such as from a data archive).
- GEMDE resources cover any data related to the quantitative analysis of ethnicity in social science research. We don't impose any limits on this, and data can cover the various related concepts such as ethnic identity, language, nationality or national origins, religion, and somatic differences.
- Due to our UK base, we have a de facto weighting towards contemporary survey data resources from the UK. Indeed, the GEMDE portal should contain listings of all ethnic group categorisations used on major social surveys in the UK over the last 20 years.
- We also support numerous data resources from other countries and/or time periods. We don't impose any restrictions on the scope of data resources which may be registered with GEMDE, so long as they can broadly be described as 'MUGs' or 'MIRs'!
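Summary-statistic MIRs of the kind described above are typically produced by aggregating microdata that users obtain elsewhere. A minimal sketch, assuming the microdata arrive as (category, age) pairs; the function name and output layout are hypothetical. Reporting the number of cases next to each median matters because some minority categories may be represented by only a few cases.

```python
from collections import defaultdict
from statistics import median


def median_age_mir(records):
    """Aggregate (category, age) microdata pairs into a per-category summary table.

    Returns {category: {"n": number of cases, "median_age": median age}},
    with categories in alphabetical order.
    """
    ages = defaultdict(list)
    for category, age in records:
        ages[category].append(age)
    return {
        category: {"n": len(values), "median_age": median(values)}
        for category, values in sorted(ages.items())
    }
```

Only the aggregated table would be shared; the individual records never leave the user's own environment, which matches GEMDE's policy of not supplying microdata.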
4) Aren't there other services already giving this information?
- We really should stress that GEMDE isn't the only place where you can find online resources related to the analysis of survey data on ethnicity. Many data providers or research support services provide similar information. Many GEMDE resources are themselves nothing more than links onwards to externally published resources. Our top suggestions for other useful links are:
- ESDS Government, theme on Ethnicity : Advice for contemporary UK datasets
- National Statistics Harmonisation Guide : UK contemporary standards
- UK Data Archive : Search data resources, and access documentation on ethnicity measures
- IPUMS international ethnicity codes : Lists ethnicity categories for numerous international census datasets
- GEMDE's contribution is to support the collation of a wide range of information from different sources, and its subsequent distribution to researchers. In many instances, therefore, GEMDE acts merely as a link between researchers and existing data resources. Nevertheless, in the course of developing GEMDE we have also undertaken our own research, which has led us to generate many original data resources (i.e. MIRs and MUGs).
5) How do I know that GEMDE resources are of a good quality?
- GEMDE allows almost anybody, so long as they register with us, to deposit data resources in the portal. This approach opens the door to the publication of poor-quality resources, as well as to potential duplication. We are deliberately 'pluralist' in our approach insofar as we want to support all approaches, even though we recognise that this runs the risk of disseminating problematic materials. Ultimately we put the onus on the researchers using GEMDE to evaluate for themselves the data they are accessing. However, we have also taken several steps to ensure quality control on resources:
- We are developing 'quality ranking' measures on the GEMDE service
- We monitor resources supplied to GEMDE and manually edit or remove problematic material
- We provide citation instructions which should help to guarantee the replicability of analyses that use the resources (and allow easy substitution if resources are revised)
6) Why is it called 'GEMDE'?
- The initials are for 'Grid Enabled ethnic Minority Data Environment'.
- We use an 'M' in the middle because we already have a different service called 'GEEDE' ('Grid Enabled Educational Data Environment'). The M stands for 'minority'.
- See FAQ 7 for what the terms 'Grid Enabled' and 'Data Environment' entail.
7) Why do you refer to the 'Grid' and to 'e-Science'?
- GEMDE is one of a number of projects in the DAMES Node which seeks to exploit e-Science tools and services in order to help with 'data management' for social science research. e-Science refers to making use of a broad range of developing information technologies associated with enhanced online communication (though some people would describe it slightly differently - see .... for a review).
- 'Grid Enabled' is nowadays a somewhat old-fashioned way of referring to data or resources which exploit an e-Science approach. The 'Grid' is a concept of electronic collaboration and coordination which is central to e-Science, but most people in the field nowadays suggest that the term 'e-Science' involves more than the Grid alone.
- The phrase 'Data Environment' is used to refer to the coordinated structure linking information resources and data files which is common to all three of our 'GESDE' provisions.
8) Why is this work important?
- Our motivation is that we believe that there is, in general, room for improvement in how social science researchers deal with data on ethnicity. We think that many analyses of ethnicity data don't manage to take full advantage of the complex and multifaceted concepts that are, substantively, of interest.
- Things are multifaceted because many different social dimensions of difference related to ethnicity can be identified (e.g. language skill, immigrant status or immigrant 'generation', religion, identity, visibility). The ideal analysis would describe the position of people in various different situations defined by all combinations of these measures, but few published analyses are able to work at such levels of disaggregation.
- Things are complex because ethnic minority groups also tend to have numerous other distinctive aspects to their socio-demographic/socio-economic profiles (such as age profiles, regional settlement patterns, educational experiences, intergenerational relationships). These typically interact with the processes under study. They ought to be fully controlled for in analysis, but it is not easy to achieve this in many circumstances.
- In GEMDE, we can't immediately solve all the difficulties of working with quantitative data on ethnicity, but we think that we can open doors to higher standards of analysis in a number of ways:
- Supporting easy documentation and replication in analysis (e.g. being able to tell how an earlier analysis used measures of ethnicity, and to transform alternative data in exactly the same way)
- Supporting innovative treatments of measures of ethnicity which (arguably) improve the chances that an analysis recognises the complex and multifaceted nature of ethnic distinctions (e.g. providing data for scaling ethnic categories)
- Promoting understanding of the statistical properties of ethnic groups (e.g. making it easier for researchers to recognise how the distinctive age profiles of minority groups affect descriptive comparisons between groups).
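One familiar way to see how age profiles affect descriptive comparisons is direct age standardisation: compare a group's crude rate (weighted by its own age structure) with the rate it would have under a common standard age structure. This is a textbook demographic technique offered as an illustration, not a description of any specific GEMDE tool; the age bands and inputs are whatever the user supplies, and all dictionaries are assumed to share the same age-band keys.

```python
def crude_rate(group_pop, group_rate):
    """Overall rate for a group, weighted by the group's own age structure.

    group_pop:  {age_band: number of people in the group}
    group_rate: {age_band: rate (e.g. proportion with some outcome) in that band}
    """
    total = sum(group_pop.values())
    return sum(group_pop[band] / total * group_rate[band] for band in group_rate)


def direct_standardised_rate(group_rate, standard_pop):
    """Rate the group would show if it had the standard population's age structure."""
    total = sum(standard_pop.values())
    return sum(standard_pop[band] / total * group_rate[band] for band in group_rate)
```

For a group with a young age profile and an outcome that rises with age, the crude rate comes out below the standardised rate; comparing the two makes the age-structure effect on a descriptive comparison visible.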
9) What is the difference between a 'supplier' and an 'original creator'?
- Information resources on ethnicity stored on GEMDE are sometimes deposited by the people who also created them (for example, if you generated some summary statistics on ethnic groups during your research, and wish to disseminate them via GEMDE). Alternatively, they may be supplied to GEMDE by somebody who finds them useful, but who didn't originally produce them (e.g., you might have transcribed some ethnic group definitions from a journal article, which you think will be useful to other researchers). Therefore in GEMDE we distinguish between the 'supplier' (the person who sends the information into GEMDE, via our portal), and the 'original creator' (the author of the resource, who may or may not be the same person who is supplying the data to GEMDE). When you supply data to GEMDE, please do let us know whether you are the original creator of the data, and if not, provide us with information on who is!
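The supplier/original-creator distinction can be captured as separate fields on a deposit record. This is a hypothetical sketch: the class and field names are illustrative and are not GEMDE's actual metadata schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DepositRecord:
    """Hypothetical provenance fields for a resource deposited at GEMDE."""
    title: str
    supplier: str                           # person sending the resource in via the portal
    supplier_is_creator: bool               # did the supplier also author the resource?
    original_creator: Optional[str] = None  # author, needed when the supplier is not

    @property
    def creator(self) -> Optional[str]:
        """The original creator, or None if not yet recorded."""
        return self.supplier if self.supplier_is_creator else self.original_creator

    def provenance_complete(self) -> bool:
        """True once we know who originally created the resource."""
        return self.creator is not None
```

A record where the supplier is not the creator and no creator is named would be flagged as incomplete, which is exactly the information the request above asks depositors to provide.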

CONTACT
We're still developing GEMDE. Please contact us with feedback/comments on the service and its usability. Feel free to tell us...
- Did you understand what we're trying to do here..?
- What needs to be clearer?
- Did the portal work when you used it?
- What problems did you find? How could we fix them?
- Have we forgotten something?
GEMDE is work undertaken by the Data Management through e-Social Science Node, supported by the ESRC. The main work in developing the GEMDE service has been undertaken by Tom Doherty, Susan McCafferty, and Paul Lambert (see DAMES Node personnel). Many others from the Node have also contributed to the resource, especially Richard Sinnott, John Watt, Vernon Gayle, Alison Bowes, Larry Tan and Guy Warner.
- To send us feedback on GEMDE, we suggest emailing or contacting Paul Lambert in the first instance:
- Alternatively, there is also a feedback form within the GEMDE portal itself.
Workshops/Outreach events/Publications
We will be presenting GEMDE at several workshops and events in the coming years, and the service is described in some of our papers from the DAMES Node. For the latest outputs see:

References
Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
Khattab, N. (2009). Ethno-religious Background as a Determinant of Educational and Occupational Attainment in Britain. Sociology, 43(2), 304-322.
Lambert, P. S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.
Lambert, P.S., Gayle, V., Tan, K.L.L., Blum, J.M., Bowes, A., Jones, S., Turner, K.J., Warner, G., Sinnott, R.O. and Bihagen, E. (2008). Grid Enabled Specialist Data Environments: Forward Planning for the GE*DE Services for Specialist Data on Occupations, Educational Qualifications, and Ethnicity, University of Stirling: Technical Paper 2008-1 of the Data Management through e-Social Science Research Node (www.dames.org.uk).
Lambert, P.S., Tan, K.L.L., Turner, K.J., Gayle, V., Sinnott, R.O. and Prandy, K. (2007). 'Data Curation Standards and Social Science Occupational Information Resources', International Journal of Digital Curation, 2(1), 73-91.
Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.
Last updated 18/DEC/2010, by Paul Lambert


