*******************************************************. *** Data Management through e-Social Science ** ** www.dames.org.uk ** ** ** ESRC DAMES Research Node, data management training programme: ** ** ** Workshop of 24-25 August 2009 ** Data Management for Social Survey Research ** ** ** LAB 2: ADVANCED DATA MANAGEMENT WITH STATA ** SUBFILE: RAE_EXEMPLAR.DO ** ** ** www.dames.org.uk ** Paul Lambert / Vernon Gayle, 24 August 2009 *******************************************************. *******************************************************. *******************************************************. ****** LAB 2 SUB-FILE: RAE_EXEMPLAR.DO *******************************************************. ************ ** Background: This example file correpsonds to analyses which were * used within the working paper: * Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: * A methodological comment on using results from the UK Research Assessment * Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of * the Data Management through e-Social Science research Node (www.dames.org.uk) * * Within this example file, comments such as [stata1] correspond to references * within the above working paper. * * The RAE data is convenient to access and exhibits many features that ultimately * make it quite complex, so it serves as a good exemplar. * To readers who work in HE, it is also an intrinsically interesting analysis, though * it might possibly bore the socks off those from other backgrounds... * ************ *******************************************************. ** NOTIFICATION OF FILE LOCATIONS / DIRECTORIES AND STATA SETUP ** ** ** ** i) File location declarations: *** For the commands below to work, you should begin by running the following ** macros, which tell Stata where to look for the relevant data files (mentioned ** above) on your machine : . global path1 "d:\dames09\data\rae\" ** * The location of a folder where you have saved the * open access online data files on the RAE results: * rae2008_1.dta * rae2001_1.dta * Main table of 2008 RAE results.xls ; * these files are also available from * http://www.longitudinal.stir.ac.uk/workshop_materials.html; the last one is free to * download from http://www.rae.ac.uk/ * * This folder also needs to hold copies of the * following data and command files: * uoa0108recode.do * uni_typology.dta ; * these files are also available at * http://www.longitudinal.stir.ac.uk/workshop_materials.html ***** global path2 "d:\dames09\work\" * (the location of your working directory - where you will save * newly created data files and output) . global path9 "d:\dames09\temp\" * (a location of a temporary folder where you can save intermediate files) . ****************************************************************************. ****************************************************************************. ************************************* ******** [Stata-1] ** Opens Stata format data file derived from the downloadable excel file ** as produced in [SPSS-1] ************************* use $path1\rae2008_1.dta, clear summarize ************************************* ************************************* ******** [Stata-2] ** Inspecting RAE quality profile measures use $path1\rae2008_1.dta, clear summarize capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 capture drop power_1 gen power_1 = gpa_1*fte summarize gpa_1 power gsort -gpa_1 -power_1 tab uoaname if uoa==40 list heiname fte s* uc gpa_1 power_1 if uoa==40 numlabel _all, add tab hei3 capture drop scot gen scot=(hei3==25 | hei3==26 | hei3==27 | hei3==32 | /// hei3==55 | hei3==61 | hei3==64 | hei3==73 | /// hei3==86 | hei3==92 | hei3==93 | hei3==107 | hei3==115 | hei3==111 | /// hei3==138 | hei3==139 | hei3==140 | hei3==147 ) tab heiname scot if scot==1 list heiname fte s* uc gpa_1 power_1 if uoa==40 & scot==1 list uoaname fte s* uc gpa_1 power_1 if hei3==139 ************************************* ************************************* ******** [Stata-3] ** Average measures of GPA use $path1\rae2008_1.dta, clear summarize capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 histogram gpa_1, title(Unweighted) yscale( range(0 2)) graph save bit1, replace gen ftei=floor(fte+1) histogram gpa_1 [fweight=ftei], title(Weighted by FTE) yscale( range(0 2)) graph save bit2, replace graph combine bit1.gph bit2.gph graph export $path2\gpa_histograms_1.emf, as(emf) replace ************************************* ************************************* ******** [Stata-4] ** Mean numbers of UoA submissions by HEIs (ignoring multiple submissions): use $path1\rae2008_1.dta, clear summarize gen uoa_u=1 collapse (sum) uoa_u, by(hei3) summarize tab uoa_u codebook uoa_u tab hei3 if uoa_u == 1 tab hei3 if uoa_u == 53 codebook uoa_u if uoa_u >= 2 histogram uoa_u , title("Number of UoA's, all HEI's") yscale( range(0 0.08)) graph save bit1, replace histogram uoa_u if uoa_u >= 2, title("Number of UoA's, exlcuding specialists") /// yscale( range(0 0.08)) graph save bit2, replace graph combine bit1.gph bit2.gph, title(RAE 2008) graph export $path2\uoas_1.emf, as(emf) replace table hei3 if uoa_u >= 40, c(mean uoa_u) ************************************* ************************************* ******** [Stata-5] ** a) Data linkage and enhancements: RAE data linked with number of UoA's per * institution and external database of University typologies use $path1\rae2008_1.dta, clear summarize gen uoa_u=1 collapse (sum) uoa_u, by(hei3) summarize sort hei3 sav $path9\m1.dta, replace use $path1\uni_typology.dta, clear summarize sort hei3 sav $path9\m2.dta, replace use $path1\rae2008_1.dta, clear summarize sort hei3 merge hei3 using $path9\m1.dta drop _merge summarize sort hei3 merge hei3 using $path9\m2.dta drop _merge sav $path1\rae2008_2.dta, replace ** b) Data used to show difference between university types use $path1\rae2008_2.dta, replace summarize table oxbridge [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table ancients [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table civic [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table redbrick [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table plate [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table sixties [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table russell [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table group94 [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table pre92 [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table poly [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table nonpoly [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) table nonuni [aweight=fte], c(mean s4 mean s3 mean s2 mean s1) capture drop wt gen wt=s4*(fte/100) graph hbar (mean) oxbridge ancients civic redbrick plate sixties russell group94 pre92 poly nonpoly /// [aweight=wt], title("Proportion of staff in 4-star category from..") ** c) Correlates between GPA, number of UoA submissions, & Unitypes use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 summarize label variable uoa_u "Number of UoAs in HEI" label variable gpa_1 "Grade Point Average" scatter gpa_1 uoa_u correlate gpa_1 uoa_u oxbridge ancients russell scatter gpa_1 uoa_u, title(All) yscale( range(0 4)) xscale( range(0 50)) graph save bit0 , replace scatter gpa_1 uoa_u if russell==1, title(Russell) yscale( range(0 4)) xscale( range(0 50)) graph save bit1 , replace scatter gpa_1 uoa_u if sixties==1, title(Sixties) yscale( range(0 4)) xscale( range(0 50)) graph save bit2 , replace scatter gpa_1 uoa_u if group94==1, title(94 Group) yscale( range(0 4)) xscale( range(0 50)) graph save bit3 , replace scatter gpa_1 uoa_u if poly==1, title(Poly) yscale( range(0 4)) xscale( range(0 50)) graph save bit4 , replace scatter gpa_1 uoa_u if pre92==0, title(Post-92) yscale( range(0 4)) xscale( range(0 50)) graph save bit5 , replace graph combine bit0.gph bit5.gph bit4.gph bit1.gph bit3.gph bit2.gph graph export $path2\uoas_2.emf, as(emf) replace ************************************* ************************************* ******** [Stata-6] ** Mean standardisations using conventional gpa-measure use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 sav $path9\m1.dta, replace summarize gpa_1 collapse (mean) mean1=gpa_1 (sd) sd1=gpa_1 [aweight=fte], by(uoa3) summarize sort uoa3 sav $path9\m2.dta, replace use $path9\m1.dta, clear sort uoa3 merge uoa3 using $path9\m2.dta tab _merge drop _merge capture drop gpa_2 gen gpa_2 = 50 + (15*((gpa_1 - mean1) / sd1)) summarize gpa_2 [aweight=fte] * Example: standardised subject rankings at Stirling university list uoa3 s4 s3 s2 s1 fte gpa* if hei3==139 table uoa3 if hei3==139, c(mean s4 mean s3 mean fte mean gpa_1 mean gpa_2) format(%4.2f) * Example: ranking within 1960 universities using within uoa standardised scores graph hbar (mean) gpa_2 if sixties==1 [aweight=fte], /// over(hei3, label(labsize(vsmall)) sort(1) ) title(1960's: HIE mean gpa's standardised within UoAs) ************************************* ************************************* ******** [Stata-7] ** Comparison of three related panels use $path1\rae2008_2.dta, clear numlabel _all, add capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 tab uoa3 tab hei3 if uoa3==40 table hei3 if uoa3==14 table hei3 if uoa3==49 table uoa3 if uoa3==49 | uoa3==14 | uoa3==40 [aweight=fte] , /// c(mean s4 mean s3 mean s2 mean s1 mean gpa_1) format(%6.3g) ************************************* ************************************* ******** [Stata-8] ** Calculation of UoA level summary measures and their mapping to unit level data use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 sav $path9\m1.dta, replace summarize gpa_1 russell collapse (mean) mean1=gpa_1 (sd) sd1=gpa_1 (mean) rusmean=russell [aweight=fte], by(uoa3) summarize label variable rusmean "Proportion of UoA members from Russell group HEIs" sort uoa3 sav $path9\m2.dta, replace use $path9\m1.dta, clear sort uoa3 merge uoa3 using $path9\m2.dta tab _merge drop _merge capture drop gpa_2 gen gpa_2 = 50 + (15*((gpa_1 - mean1) / sd1)) label variable gpa_2 "Within UoA standardised GPA" summarize gpa_2 summarize gpa* summarize gpa* [aweight=fte] sort uoa3 sav $path9\temp2.dta, replace summarize gpa_2 fte collapse (mean) gpa_2u=gpa_2 [aweight=fte], by(hei3) summarize sort hei3 label variable gpa_2u "HEI level average of within UoA standardised GPA" sav $path9\m2.dta, replace use $path9\temp2.dta, clear sort hei3 merge hei3 using $path9\m2.dta tab _merge drop _merge collapse (mean) unit_g=gpa_2u [aweight=fte], by(uoa3) label variable unit_g "UOA level average of HEI level standardised GPAs" summarize sort uoa3 sav $path9\m3.dta, replace use $path9\temp2.dta, clear sort uoa3 merge uoa3 using $path9\m3.dta tab _merge drop _merge sort hei3 merge hei3 using $path9\m2.dta tab _merge drop _merge summarize * Some illustrative results table uoa3 [aweight=fte], c(mean gpa_1 mean gpa_2 mean unit_g sd unit_g) table uoa3 [aweight=fte], c(mean gpa_1 mean unit_g mean rusmean mean russell) correlate unit_g rusmean graph hbar (mean) unit_g , over(uoa3, label(labsize(tiny)) sort(1) ) graph save bit1, replace graph hbar (mean) rusmean , over(uoa3, label(labsize(tiny)) sort(1) ) graph save bit2, replace graph hbar (mean) unit_g if sixties==1, over(hei3, label(labsize(tiny)) sort(1) ) graph save bit3, replace graph hbar (mean) rusmean if sixties==1, over(hei3, label(labsize(tiny)) sort(1) ) graph save bit4, replace graph combine bit1.gph bit2.gph, title(Ranking of UoAs by host HEI averages) graph export $path2\uao_rank1.emf, as(emf) replace graph combine bit3.gph bit4.gph, title(Ranking of 1960s HEIs by UoA rankings) graph export $path2\hei_60s_rank1.emf, as(emf) replace * Example of unit level data from Stirling Univ. list uoa3 s4 s3 fte gpa* unit_g rusmean if hei3==139 ***************** ************************************* ******** [Stata-9] ** Mean standardisations using alternative gpa-measure use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*100 + s3*50 + s2*20 + uc*-40) / 100 sav $path9\m1.dta, replace summarize gpa_1 collapse (mean) mean1=gpa_1 (sd) sd1=gpa_1 [aweight=fte], by(uoa3) summarize sort uoa3 sav $path9\m2.dta, replace use $path9\m1.dta, clear sort uoa3 merge uoa3 using $path9\m2.dta tab _merge drop _merge capture drop gpa_2 gen gpa_2 = 50 + (15*((gpa_1 - mean1) / sd1)) summarize gpa_2 [aweight=fte] * Example: standardised subject rankings at Stirling university list uoa3 s4 s3 s2 s1 fte gpa* if hei3==139 table uoa3 if hei3==139, c(mean s4 mean s3 mean fte mean gpa_1 mean gpa_2) format(%4.2f) * Example: ranking within 1960 universities using within uoa standardised scores graph hbar (mean) gpa_2 if sixties==1 [aweight=fte], /// over(hei3, label(labsize(vsmall)) sort(1) ) title(1960's: HIE mean gpa's standardised within UoAs) ************************************* ****************************************************************** ****************************************************************** *** [Stata-10] * * Version of analytical outputs presented in section 3 of DAMES Technical Paper 2008-3 * * * 1) Ranking of HEIs by Average GPA scores, within selected HEI typologies * use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1) / 100 summarize gpa_1 sav $path9\m1.dta, replace collapse (mean) mean1=gpa_1 (sd) sd1=gpa_1 [aweight=fte], by(uoa3) summarize sort uoa3 sav $path9\m2.dta, replace use $path9\m1.dta, clear sort uoa3 merge uoa3 using $path9\m2.dta tab _merge drop _merge capture drop gpa_2 gen gpa_2 = 50 + (15*((gpa_1 - mean1) / sd1)) capture drop gpa_1b gen gpa_1b=gpa_1*20 summarize uoa_u gpa_2 gpa_1b[aweight=fte] graph hbar (mean) gpa_2 gpa_1b if sixties==1 & uoa_u > 2 [aweight=fte], /// over(hei3, label(labsize(tiny)) sort(1) ) title(1960's HEIs) /// bar(1, bcolor(gs5)) bar(2, bcolor(gs11)) bargap(-10) fxsize(50) fysize(90) /// legend( label(1 "Z GPA") label(2 "GPA*20")) graph save bit1, replace graph hbar (mean) gpa_2 gpa_1b if russell==1 & uoa_u > 2 [aweight=fte], /// over(hei3, label(labsize(tiny)) sort(1) ) title(Russell group HEIs) /// bar(1, bcolor(gs5)) bar(2, bcolor(gs11)) bargap(-10) fxsize(50) fysize(90) /// legend( label(1 "Z GPA") label(2 "GPA*20")) graph save bit2 , replace graph hbar (mean) gpa_2 gpa_1b if group94==1 & uoa_u > 2 [aweight=fte], /// over(hei3, label(labsize(tiny)) sort(1) ) title(1994 group HEIs) /// bar(1, bcolor(gs5)) bar(2, bcolor(gs11)) bargap(-10) fxsize(50) fysize(100) /// legend( label(1 "Z GPA") label(2 "GPA*20")) graph save bit3 , replace graph hbar (mean) gpa_2 gpa_1b if poly==1 & uoa_u > 2 [aweight=fte], /// over(hei3, label(labsize(tiny)) sort(1) ) title(Former Poly HEIs) /// bar(1, bcolor(gs5)) bar(2, bcolor(gs11)) bargap(-10) fxsize(50) fysize(100) /// legend( label(1 "Z GPA") label(2 "GPA*20")) graph save bit4 , replace graph combine bit3.gph bit1.gph, /// title("RAE 2008, comparisons across HEIs") /// note("Average GPA throughout HEI, weighted by FTE, with and without UoA standardisation") graph export $path2\hei_rank_1.emf, as(emf) replace graph combine bit2.gph bit4.gph , /// title("RAE 2008, comparisons across HEIs") /// note("Average GPA throughout HEI, weighted by FTE, with and without UoA standardisation") graph export $path2\hei_rank_2.emf, as(emf) replace ******************************************************* ** 2. Rankings of UOA across institutions ** Calculation of UoA level summary measures and their mapping to unit level data use $path1\rae2008_2.dta, clear capture drop gpa_1 gen gpa_1 = (s4*4 + s3*3 + s2*2 + s1*1) / 100 sav $path9\m1.dta, replace summarize gpa_1 russell collapse (mean) mean1=gpa_1 (sd) sd1=gpa_1 (mean) rusmean=russell [aweight=fte], by(uoa3) summarize label variable rusmean "Proportion of UoA members from Russell group HEIs" sort uoa3 sav $path9\m2.dta, replace use $path9\m1.dta, clear sort uoa3 merge uoa3 using $path9\m2.dta tab _merge drop _merge capture drop gpa_2 gen gpa_2 = 50 + (15*((gpa_1 - mean1) / sd1)) label variable gpa_2 "Within UoA standardised GPA" summarize gpa_2 summarize gpa* summarize gpa* [aweight=fte] sort uoa3 sav $path9\temp2.dta, replace summarize gpa_2 fte collapse (mean) gpa_2u=gpa_2 [aweight=fte], by(hei3) summarize sort hei3 label variable gpa_2u "HEI level average of within UoA standardised GPA" sav $path9\m2.dta, replace use $path9\temp2.dta, clear sort hei3 merge hei3 using $path9\m2.dta tab _merge drop _merge collapse (mean) unit_g=gpa_2u [aweight=fte], by(uoa3) label variable unit_g "UOA level average of HEI level standardised GPAs" summarize sort uoa3 sav $path9\m3.dta, replace use $path9\temp2.dta, clear sort uoa3 merge uoa3 using $path9\m3.dta tab _merge drop _merge sort hei3 merge hei3 using $path9\m2.dta tab _merge drop _merge capture drop rusmean2 gen rusmean2=rusmean*100 summarize graph hbar (mean) unit_g rusmean2 , /// over(uoa3, label(labsize(tiny)) sort(1) ) /// bar(1, bcolor(gs5)) bar(2, bcolor(gs11)) bargap(-10) xsize(9) ysize(12) /// legend( label(1 "(1)") label(2 "(2)")) /// title("Ranking of UOAs across HEIs", span) /// note("RAE 2008: average zGPAs of HEIs hosting UOA staff (1), " /// " and percent of UOA staff in Russell group institutions (2)", size(vsmall) ) graph export $path2\uoa_rank3.emf, as(emf) replace ****** ****************************************************************** ****************************************************************** **** Overview in a sentence: ** In this analysis we tried to show how standardisations of RAE responses in terms of the * number of FTE staff within the submission, and the relative prestige of different units of assessment, * lead to different results in terms of ranking of institutions; we released a data file with our * derived measures, including a comparison with the 2001 system, which is available on the DAMES website. use http://www.dames.org.uk/rae2008/rae2008_3.dta, clear summarize sort w2001 gpa_2 list heiname w2001 gpa_1 gpa_2 if uoa3==62 ****************************************************************** ****************************************************************** ****************************************************************** ****************************************************************** ** EOF