
On Thursday, 7th May, the UK voted in a Conservative government by a small majority, to the surprise of most. The surprise was not so much that the Conservatives won the largest number of seats, but that a majority government could be formed at all – a hung parliament was widely expected. As the final votes were being counted on Friday morning, I set about analysing some of the results. In particular I was interested in how the UK system – in which 650 MPs are elected into parliament by separate votes in each constituency – affected things. I created a webpage to present this; this post is about how I did that.

Getting the data

The dataset I wanted included both the overall results – number of seats and votes won by each party – and the per-constituency results. I couldn’t find any website providing this data in an easily accessible form, so I set about scraping the results from the BBC’s live coverage.

Obtaining the overall results proved to be very straightforward: the above page contains the results in a JSON string. So far, so good.

Per constituency results turned out to be a little more tricky. The BBC provided a results page for each constituency (e.g. the constituency I voted in). Unfortunately these results weren’t in a nice JSON format, so I’d have to parse the HTML. The BeautifulSoup Python module is great for this:

#!/usr/bin/env python2
"""
Take the HTML from the constituency page and extract the results into a JSON file
"""

import argparse
import json

import BeautifulSoup

parser = argparse.ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args()

with open(args.filename) as handle:
    html_string = handle.read()
    html = BeautifulSoup.BeautifulSoup(html_string, convertEntities=BeautifulSoup.BeautifulSoup.HTML_ENTITIES)

# Get the constituency name
constituency = html.find("h1", attrs={"class": "constituency-title__title"}).text

# To extract the data, we can just target the table columns with the required
# css classes; they will be in the same order as each other so we don't
# need to worry about getting the parent

names = html.findAll("div", attrs={"class":'party__name--long'})
names = [name.text for name in names]

votes = html.findAll("li", attrs={"class":'party__result--votes essential'})
votes = [int(vote.contents[0].replace(',','')) for vote in votes]

# Construct a datastructure to json encode
data = {}
data["name"] = constituency
data["results"] = zip(names, votes)

# Write data out to a json file
with open(args.filename + ".json", "w") as handle:
    json.dump(data, handle)
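One portability note on the script above: it targets Python 2, where zip() returns a list. On Python 3, zip() returns a lazy iterator that the json module cannot serialise, so the pairs need wrapping in list(). The following is only a minimal sketch of that one step; the function name build_record and the sample values are hypothetical and not part of the original script.

```python
import json


def build_record(constituency, names, votes):
    """Pair each party name with its vote count, preserving page order."""
    return {
        "name": constituency,
        # list() matters on Python 3: a bare zip object is not JSON serialisable.
        "results": list(zip(names, votes)),
    }


# Hypothetical values, for illustration only.
record = build_record("Example Constituency", ["Party A", "Party B"], [12345, 6789])
encoded = json.dumps(record)
```

The encoded structure has the same shape as the files the script writes: a name plus a list of [party, votes] pairs.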

Cool. But I still needed to get the data for all the individual constituencies. The BBC results pages clearly had an ID in the URL (e.g. http://www.bbc.co.uk/news/politics/constituencies/E140006020), so I just needed to find a list of IDs somewhere. After a bit of digging, I found that the winner of each constituency was being fetched on the results page in a nice JSON string. The network traffic analyser in the Chrome developer tools was great for finding this. After a quick bit of manipulation of this JSON structure, we have a text file containing all 650 constituency IDs. A simple bash pipeline later…

cat ../codes.txt | parallel -n1 -j2 'wget "http://www.bbc.co.uk/news/politics/constituencies/{}" -O {}.html'

…and we have all the pages we need, fetched two at a time with parallel and wget. By the way, I love GNU Parallel.
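The "quick bit of manipulation" that produced the list of IDs is not shown in this post, and neither is the shape of the BBC's JSON. As a rough sketch only, assuming the JSON is a list of objects that each carry an "id" key (a hypothetical layout), the IDs could be pulled out like this. The function name extract_ids and the placeholder IDs are also made up for illustration.

```python
import json


def extract_ids(winners_json):
    """Return the sorted, de-duplicated constituency IDs found in the JSON text."""
    winners = json.loads(winners_json)
    return sorted({entry["id"] for entry in winners})


# Placeholder data in the assumed layout; these are not real constituency IDs.
sample = '[{"id": "X00000002", "winner": "Party B"}, {"id": "X00000001", "winner": "Party A"}]'
ids = extract_ids(sample)

# One ID per line, the format a codes.txt-style file fed to parallel would need.
codes_text = "\n".join(ids) + "\n"
```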

With the data now in an easy format to work with, it was pretty straightforward to knock together some quick⁽¹⁾ Python scripts to extract some results.
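Those scripts are not reproduced here, but a sketch of the kind of thing involved might look like the following. It assumes each record has the {"name": ..., "results": [[party, votes], ...]} shape written by the scraper above; the function name tally and the sample records are hypothetical.

```python
from collections import Counter


def tally(records):
    """Return (national votes per party, seats won per party)."""
    votes = Counter()
    seats = Counter()
    for record in records:
        results = record["results"]
        for party, count in results:
            votes[party] += count
        # First past the post: the party with the most votes takes the seat.
        winner, _ = max(results, key=lambda pair: pair[1])
        seats[winner] += 1
    return votes, seats


# Made-up records for illustration; real ones would come from json.load()
# on each of the per-constituency .json files.
records = [
    {"name": "Seat 1", "results": [["Party A", 100], ["Party B", 60]]},
    {"name": "Seat 2", "results": [["Party A", 40], ["Party B", 70]]},
    {"name": "Seat 3", "results": [["Party A", 55], ["Party B", 50]]},
]
votes, seats = tally(records)
```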

Generating graphs

Initially I planned to create a static image with a few graphs. But I decided that it would be easier to share the analysis as a webpage. First up, I would need a JavaScript graphing library⁽²⁾. I’ve used Flot in the past, but I decided to try out Google Charts for no reason other than that it’s good to experiment. The documentation was pretty clear, as were the examples, so it was easy to get started. First up, a bar chart of how the result would be affected if seats were allocated based on the proportion of the national vote won by each party:

function draw_seats(){
    var data = google.visualization.arrayToDataTable([
        ['Party', 'Difference',{ role: 'style' }],
        ['Conservative', -90, COLOURS['con']],
        ['Labour', -33, COLOURS['lab']],
        ['UKIP', 82, COLOURS['ukip']],
        ['Lib. Dem.', 43, COLOURS['ld']],
        ['SNP', -25, COLOURS['snp']],
        ['Green', 24, COLOURS['grn']],
        ['DUP', -4, COLOURS['dup']],
        ['Plaid Cymru', 1, COLOURS['pc']],
    ]);

    var options = {
        animation: {
            "startup": true,
            "duration": 1000
        },
        title: 'Change in seat allocation if number of seats were determined by percentage of vote',
        chartArea: {width: '60%'},
        legend: { position: 'none' },
        height: 250,
        hAxis: {
            title: 'Seats gained/lost if proportional to vote percentage',
            minValue: 0
        },

        vAxis: {
            title: 'Party'
        }
    };

    var chart = new google.visualization.BarChart(document.getElementById('seats_chart_div'));
    chart.draw(data, options);
}
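The differences plotted in this chart come from comparing each party's actual seat count with a share of the 650 seats proportional to its national vote. The post does not say how fractional seats were rounded, so the sketch below assumes a largest-remainder rule. The function name proportional_difference and the toy figures are hypothetical and are not the election data.

```python
def proportional_difference(votes, seats, total_seats):
    """Seats gained (+) or lost (-) per party under proportional allocation."""
    total_votes = sum(votes.values())
    quotas = {p: total_seats * v / total_votes for p, v in votes.items()}

    # Give every party the whole-number part of its quota first...
    allocation = {p: int(q) for p, q in quotas.items()}

    # ...then hand any remaining seats to the largest fractional remainders.
    leftover = total_seats - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - allocation[p], reverse=True)
    for party in by_remainder[:leftover]:
        allocation[party] += 1

    return {p: allocation[p] - seats.get(p, 0) for p in votes}


# Toy numbers for illustration only.
toy_votes = {"Party A": 500, "Party B": 300, "Party C": 200}
toy_seats = {"Party A": 7, "Party B": 3, "Party C": 0}
difference = proportional_difference(toy_votes, toy_seats, total_seats=10)
```

In this toy case Party A holds 7 of 10 seats on half the vote, so it would lose 2 under a proportional split, while Party C would gain 2.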

Getting a PNG image of the graph from Google Charts for this post was also straightforward:

I also generated a pie chart of how seats are allocated under the current system. Setting the colours of the pie slices could not be done in the same way as with the colours of the bars. I’ve no idea why; the latter method seemed much simpler and less bloated.

function draw_result(){
    var data = google.visualization.arrayToDataTable([
        ['Party', 'Seats'],
        ['Conservative', 331],
        ['Labour', 232],
        ['SNP', 56],
        ['Lib. Dem.', 8],
        ['DUP', 8],
        ['Plaid Cymru', 3],
        ['UKIP', 1],
        ['Green', 1],
        ['Other', 10],
    ]);

    var options = {
        animation: {
            "startup": true,
            "duration": 1000
        },
        pieSliceText: 'value',
        pieHole: 0.3,
        chartArea:{
            width: '80%',
            height: '80%'
        },
        title: "Number of seats won",
        legend: { position: 'none' },
        // Slice colours, listed in the same order as the data rows above
        slices: [
            {color: COLOURS['con']},
            {color: COLOURS['lab']},
            {color: COLOURS['snp']},
            {color: COLOURS['ld']},
            {color: COLOURS['dup']},
            {color: COLOURS['pc']},
            {color: COLOURS['ukip']},
            {color: COLOURS['grn']},
            {color: COLOURS['other']},
        ]
    };

    var chart = new google.visualization.PieChart(document.getElementById('results_chart_div'));
    chart.draw(data, options);
}

Creating the webpage

Since (web) design is really not my forte, I thought I’d try out Bootstrap to help ensure everything looked halfway reasonable. This was the first time I’ve tried using one of these frameworks, but it was pretty straightforward to get going with for my simple use case. I used the panel components to separate out the different parts of the analysis, with a simple grid to present tables and graphs with simple analysis alongside. I was pleased that this seemed to look OK on mobile devices pretty much straight away, although the table is at risk of being cut off on smaller screens.

Summing up

Despite not having previously used Google Charts or Bootstrap, I found it pretty straightforward to generate this simple webpage. Both of these tools passed the test of being easy to get started with. Overall, I found this a good way to present the information, and it made the analysis easy to share with friends.

Finally, putting any personal political viewpoints aside, I think it demonstrates how crazy the current system is.


1. Time was of the essence; I wanted to have this ready to share whilst interest in the results was high.
2. OK, so I could have created the images offline, but I was committed to the web endeavour! Plus this would give a little interactivity to the plots.

Ben Smithers


Analysing the UK General Election Results
