– missing datasets

The recent launch of has shown that there is much demand for raw data out there. But despite there being a fascinating array of educational data released, there are a couple of missing datasets.

A recent FoI request of mine revealed that Ofsted have for many years been publishing the data from their inspections. This seems not to be widely known, but it is very welcome. The published format (Excel files) and presentation are not necessarily to my liking, but at least it’s available.

The second missing dataset is an odd one. The School Revenue Balances data is published annually by the DCSF and purports to show how much money is being hoarded by naughty headteachers and school governors instead of being spent on the pupils. It doesn’t, at least not in the way it is presented, but that argument is for another post. It’s still a very useful dataset, and a little bit of reverse engineering can yield data such as individual school budgets going back to the millennium.

There are a couple of small problems with this data as presented. Firstly, schools are not identified by their URN (Unique Reference Number), the primary key used in most other educational datasets. Instead, it uses a two-field key of LocalAuthority-EstablishmentNumber. This is how schools were previously identified, but it has been superseded by the URN for quite some time now. The other issue is that the data is not normalised. Normalisation is a database design term for structuring data so that each fact is stored once, in a consistent shape, rather than repeated or spread across columns. The lack of it is inherent in the data format (Excel) and the presentational requirements.
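To give a flavour of the first fix, here is a minimal Python sketch of re-keying balance rows from the old LocalAuthority-EstablishmentNumber key to a URN. It assumes you already have a register extract listing each school’s LA code, establishment number and URN; the field names `la`, `estab`, `urn` and `laestab` are hypothetical, not the column headings of any published file. Rows that find no match are kept aside rather than silently dropped, since closed or reorganised schools may not map cleanly.

```python
from typing import Dict, List, Tuple

# (local authority code, establishment number): the old two-field key
Key = Tuple[int, int]


def parse_laestab(key: str) -> Key:
    """Split a hypothetical 'LA-Estab' string such as '351-2001' into two ints."""
    la, estab = key.split("-")
    return int(la), int(estab)


def build_urn_lookup(register: List[dict]) -> Dict[Key, int]:
    """Index a register extract (assumed fields: la, estab, urn) by the old key."""
    return {(int(r["la"]), int(r["estab"])): int(r["urn"]) for r in register}


def attach_urns(rows: List[dict], lookup: Dict[Key, int]) -> Tuple[List[dict], List[dict]]:
    """Return (rows with a 'urn' field added, rows whose key had no match)."""
    matched, unmatched = [], []
    for row in rows:
        urn = lookup.get(parse_laestab(row["laestab"]))
        if urn is None:
            unmatched.append(row)  # keep for manual checking
        else:
            matched.append({**row, "urn": urn})
    return matched, unmatched
```

The values below are made up purely for illustration: `attach_urns([{"laestab": "351-2001"}], build_urn_lookup([{"la": "351", "estab": "2001", "urn": "100001"}]))` returns one matched row carrying `urn` 100001 and an empty unmatched list.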

That does mean, though, that any meaningful query of the data has to be preceded by some in-depth, hardcore data manipulation.
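One typical step of that manipulation is normalising a spreadsheet-style table that has one column per year into one row per school per year, which is much easier to query. The Python sketch below assumes a layout like that; the column names (`urn`, `2008-09` and so on) are hypothetical and will need adjusting to the real files. Blank cells are skipped rather than stored as empty values.

```python
from typing import Iterable, List, Sequence


def unpivot(
    rows: Iterable[dict],
    id_fields: Sequence[str],
    value_fields: Sequence[str],
    var_name: str = "year",
    value_name: str = "value",
) -> List[dict]:
    """Turn one wide row per school into one long row per (school, year)."""
    out = []
    for row in rows:
        ids = {f: row[f] for f in id_fields}  # identifying columns repeated per row
        for field in value_fields:
            value = row.get(field)
            if value in (None, ""):
                continue  # skip blank spreadsheet cells
            out.append({**ids, var_name: field, value_name: value})
    return out
```

For example, `unpivot([{"urn": 100001, "2008-09": 1200, "2009-10": ""}], ["urn"], ["2008-09", "2009-10"])` gives `[{"urn": 100001, "year": "2008-09", "value": 1200}]`. In this long shape a per-year total or a per-school trend becomes a simple filter or group-by.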

Both datasets are valuable and deserve inclusion on . I’m working on improving their format and presentation, and I’ll blog about that and make the results available here when I’m done.

