Current .csv file - v0.1

Changelog

  • v0.1
    • First public release

Citation

Sonnet, Luke. 2019. “2018 Pakistani General Election Polling Station Data.” https://doi.org/10.17605/osf.io/mtsnd.

Metadata

Description

Dataset name: 2018 Pakistani General Elections Electoral Area and Census Block data

Overview

Each row in this data is unique to the combination of constituency, polling station, and census block, and was released by the Electoral Commission of Pakistan as Form 28.
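As a quick sanity check of the row-level uniqueness described above, the following sketch lists any identifier that appears on more than one row. It assumes the rows have already been read into a list of dictionaries keyed by column name (for example with Python's `csv.DictReader`); the sample rows and their identifier values are made up for illustration.

```python
from collections import Counter


def duplicated_keys(rows, key="constituency_ps_id_block_code"):
    """Return the key values that occur on more than one row, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical rows for illustration only; real rows come from the .csv file.
rows = [
    {"constituency_ps_id_block_code": "X1-1-100"},
    {"constituency_ps_id_block_code": "X1-1-101"},
    {"constituency_ps_id_block_code": "X1-2-100"},
]

print(duplicated_keys(rows))
```

An empty list means every row has its own identifier, which is what the summary statistics for `constituency_ps_id_block_code` suggest.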

Some polling station areas include several census blocks, and some census blocks are split across several polling stations, so the mapping of polling station to census block is many-to-many. Furthermore, female-only and male-only polling stations can have different mappings from polling station to census block.
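One way to work with a many-to-many mapping is to build a lookup in each direction. The sketch below is only an illustration: it assumes rows are dictionaries with the `constituency_ps_id` and `block_code` columns, and the sample values are invented.

```python
from collections import defaultdict


def build_mappings(rows):
    """Build station -> blocks and block -> stations lookups."""
    station_to_blocks = defaultdict(set)
    block_to_stations = defaultdict(set)
    for row in rows:
        station = row["constituency_ps_id"]
        block = row["block_code"]
        if not block:  # skip rows where the block code is missing
            continue
        station_to_blocks[station].add(block)
        block_to_stations[block].add(station)
    return station_to_blocks, block_to_stations


# Hypothetical rows: station S1 covers two blocks, block B2 is shared.
rows = [
    {"constituency_ps_id": "S1", "block_code": "B1"},
    {"constituency_ps_id": "S1", "block_code": "B2"},
    {"constituency_ps_id": "S2", "block_code": "B2"},
    {"constituency_ps_id": "S3", "block_code": ""},
]

station_to_blocks, block_to_stations = build_mappings(rows)
print(sorted(station_to_blocks["S1"]))
print(sorted(block_to_stations["B2"]))
```

Because National and Provincial Assembly rows are stacked in the same file, it is probably safer to build these lookups separately for each assembly.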

The ECP released Form 28s separately for the National and Provincial Assembly constituencies. While the delimitation of polling station areas should be identical for the two constituencies, in the data they are not. We report both of them here stacked together.
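Since the two sets of forms are stacked, most analyses will want to separate them first. A minimal sketch, assuming rows are dictionaries with an `assembly` column; the labels "National" and "Provincial" used in the sample rows are an assumption and should be checked against the actual values in the file.

```python
from collections import defaultdict


def split_by_assembly(rows):
    """Group rows by the value of their assembly column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["assembly"]].append(row)
    return dict(groups)


# Hypothetical rows; the assembly labels here are assumed, not confirmed.
rows = [
    {"assembly": "National", "constituency_ps_id": "S1"},
    {"assembly": "Provincial", "constituency_ps_id": "S2"},
    {"assembly": "National", "constituency_ps_id": "S3"},
]

groups = split_by_assembly(rows)
for label, members in sorted(groups.items()):
    print(label, len(members))
```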

Note: 714 of the 461290 non-missing block codes actually represent several census blocks. As such, the block_code field is a string. For these rows, the block_code field usually has a few block codes pasted together. However, splitting the rows that represent multiple census blocks is difficult, as it is unclear how to divide the voters and booths across census blocks.

While this only affects a small percentage of rows, it could present a challenge for those polling stations. If you are having difficulties with this, please leave an issue or email Luke Sonnet.
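For users who only need the list of census blocks behind a combined code, the sketch below expands the dashed form illustrated under `block_code` in the variable summary ("332030301-06-07"). It assumes each dashed suffix replaces the trailing digits of the first code. This is an interpretation of how the codes seem to be written, not a confirmed rule, and other pasted formats in the data may need different handling. It does not attempt to divide voters or booths across the expanded blocks.

```python
def expand_block_code(code):
    """Expand a code like '332030301-06-07' into individual block codes.

    Assumption: each suffix after a dash replaces the last len(suffix)
    characters of the first (full) code.
    """
    parts = code.split("-")
    base = parts[0]
    expanded = [base]
    for suffix in parts[1:]:
        if not suffix or len(suffix) > len(base):
            raise ValueError(f"cannot interpret block code: {code!r}")
        expanded.append(base[: len(base) - len(suffix)] + suffix)
    return expanded


print(expand_block_code("332030301-06-07"))
print(expand_block_code("332030301"))
```

A code without dashes is returned as a one-element list, so the function can be applied to every row.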

Variable summary

Throughout, a value of -88 denotes that the data should have been reported but was missing on the forms, either due to a scanning error or some other omission. Please check the variable distributions before using this data.
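Sentinel values like this are easy to include in sums and means by accident. The sketch below converts them to `None` before analysis. It takes the set of sentinel codes as a parameter because the summary statistics further down show a minimum of -99 for the booth variables, so -88 may not be the only code in use. Treating the strings "" and "NA" as blanks is also an assumption about how the .csv encodes missing cells.

```python
MISSING_CODES = frozenset({-88, -99})  # -99 appears as a minimum in the booth summaries
BLANKS = frozenset({"", "NA"})         # assumed encodings of empty cells


def clean_count(value, missing_codes=MISSING_CODES):
    """Return value as an int, or None if it is blank or a sentinel code."""
    if value is None:
        return None
    text = str(value).strip()
    if text in BLANKS:
        return None
    number = int(text)
    return None if number in missing_codes else number


print(clean_count("208"))
print(clean_count("-88"))
```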

  • constituency_ps_id: the constituency_id and ps_id pasted together, a unique identifier of the constituency-polling station; use this to merge with PS level data
  • constituency_ps_id_block_code: constituency_ps_id pasted with the block_code to generate a unique identifier for each polling station-census block code.
  • province
  • assembly
  • constituency_id
  • constituency_area
  • ps_id: the official serial number of the polling station
  • ps_name_from_form28: the name of the polling station; not always consistent within constituency_ps_id, and may not match the ps_name from the polling station data
  • block_code: the census block code; note, there are some block_code values that have dashes in them. These seem to represent more than one census block code (e.g. “332030301-06-07” seems to represent “332030301”, “332030306”, and “332030307”). Unfortunately this is how the census blocks were reported and we are unable to figure out which voters correspond to which of the census blocks. If you need help creating a full linking between polling stations and each of these block codes, please leave a message.
  • block_code_type: “Urban”, “Rural”, or “Unknown”
  • name_ea_rural: the name of the “rural” electoral area covered by this polling station-census block
  • name_ea_urban: the name of the “urban” electoral area covered by this polling station-census block
  • voter_serials_assigned_to_station: sometimes, the serial number ranges of the voters covered in this EA (per polling station) are reported. They are repeated here verbatim from the forms.
  • male_voters: the number of registered male voters in this polling station-census block
  • female_voters: the number of registered female voters in this polling station-census block
  • total_voters: the number of registered voters in this polling station-census block
  • male_booths: the number of assigned male booths for this polling station (Note: apparently due to errors, booth counts sometimes differ across rows with the same polling station id in this dataset)
  • female_booths: the number of assigned female booths for this polling station (Note: apparently due to errors, booth counts sometimes differ across rows with the same polling station id in this dataset)
  • total_booths: the total number of assigned booths for this polling station (Note: apparently due to errors, booth counts sometimes differ across rows with the same polling station id in this dataset)
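Two checks follow naturally from the variable descriptions above: flagging polling stations whose booth counts disagree across rows, and summing block-level voters up to the polling station for merging with PS level data. The sketch below assumes rows are dictionaries whose numeric fields have already been converted to integers with missing codes removed; the sample rows are invented.

```python
from collections import defaultdict


def inconsistent_booths(rows, field="total_booths"):
    """Return stations that report more than one distinct booth count."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["constituency_ps_id"]].add(row[field])
    return {ps: sorted(vals) for ps, vals in seen.items() if len(vals) > 1}


def voters_per_station(rows, field="total_voters"):
    """Sum block-level voter counts within each polling station."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["constituency_ps_id"]] += row[field]
    return dict(totals)


# Hypothetical rows: S1 spans two blocks and has conflicting booth counts.
rows = [
    {"constituency_ps_id": "S1", "total_booths": 3, "total_voters": 400},
    {"constituency_ps_id": "S1", "total_booths": 4, "total_voters": 250},
    {"constituency_ps_id": "S2", "total_booths": 2, "total_voters": 300},
]

print(inconsistent_booths(rows))
print(voters_per_station(rows))
```

Booth counts are reported per polling station rather than per census block, so unlike voters they should not be summed across rows.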

Metadata for search engines

  • keywords: constituency_ps_id_block_code, constituency_ps_id, province, assembly, constituency_id, constituency_area, ps_id, ps_name_from_form28, block_code, block_code_type, name_ea_rural, name_ea_urban, voter_serials_assigned_to_station, male_voters, female_voters, total_voters, male_booths, female_booths and total_booths

Variables

constituency_ps_id_block_code

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
constituency_ps_id_block_code character 0 461320 461320 0 461320 11 62

constituency_ps_id

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
constituency_ps_id character 0 461320 461320 0 167616 5 9

province

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
province character 0 461320 461320 0 4 3 11

assembly

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
assembly character 0 461320 461320 0 2 8 10

constituency_id

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
constituency_id character 0 461320 461320 0 849 3 5

constituency_area

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
constituency_area character 0 461320 461320 0 682 4 54

ps_id

Distribution

0 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
ps_id integer 0 461320 461320 120.13 94.13 1 48 98 168 924 ▇▃▂▁▁▁▁▁

ps_name_from_form28

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
ps_name_from_form28 character 0 461320 461320 0 83096 3 242

block_code

Distribution

30 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
block_code character 30 461290 461320 0 163074 3 52

block_code_type

Distribution

0 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
block_code_type character 0 461320 461320 0 3 5 7

name_ea_rural

Distribution

170869 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
name_ea_rural character 170869 290451 461320 0 52597 2 244

name_ea_urban

Distribution

291065 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
name_ea_urban character 291065 170255 461320 0 25992 1 244

voter_serials_assigned_to_station

Distribution

69248 missing values.

Summary statistics
name data_type missing complete n empty n_unique min max
voter_serials_assigned_to_station character 69248 392072 461320 0 632 1 244

male_voters

Distribution

122 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
male_voters integer 122 461198 461320 252.54 265.27 0 0 208 413 3573 ▇▂▁▁▁▁▁▁

female_voters

Distribution

162 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
female_voters integer 162 461158 461320 199.47 214.25 0 0 158 327 2499 ▇▂▁▁▁▁▁▁

total_voters

Distribution

153 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
total_voters integer 153 461167 461320 458.75 338.62 0 221 389 616 4341 ▇▃▁▁▁▁▁▁

male_booths

Distribution

102 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
male_booths integer 102 461218 461320 1.57 1.53 -99 0 2 2 5 ▁▁▁▁▁▁▁▇

female_booths

Distribution

102 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
female_booths integer 102 461218 461320 1.37 1.43 -99 0 1 2 5 ▁▁▁▁▁▁▁▇

total_booths

Distribution

102 missing values.

Summary statistics
name data_type missing complete n mean sd p0 p25 p50 p75 p100 hist
total_booths integer 102 461218 461320 2.94 1.2 -99 2 3 4 6 ▁▁▁▁▁▁▁▇

Codebook table

JSON-LD metadata

The following JSON-LD can be found by search engines if you share this codebook publicly on the web.

{
  "name": "2018 Pakistani General Elections Electoral Area and Census Block data",
  "description": "\n### Overview\n\nEach row in this data is unique to the combination of constituency, polling station, and census block, and was released by the Electoral Commission of Pakistan as Form 28.\n\nSome polling station areas include several census blocks, and several census blocks are across several polling stations. The mapping of polling station to census block is many-to-many; this means that the same polling Furthermore, female and male only polling stations can have different mappings from polling station to census block.\n\nThe ECP released Form 28s separately for the National and Provincial Assembly constituencies. While the delimitation of polling station areas should be identical for the two constituencies, in the data they are not. We report both of them here stacked together.\n\nNote: 714 of the 461290 non-missing block codes actually represent several census blocks. As such, the `block_code` field is a string. For these rows, the `block_code` field usually has a few block codes pasted together. However, splitting the rows that represent multiple census blocks is difficult, as it is unclear how to divide the voters and booths across census blocks.\n\nWhile this only affects a small percentage of rows, it could prevent a challenge for those polling stations. If you are having difficulties with this, please leave an issue or email [Luke Sonnet](lukesonnet.com).\n\n### Variable summary\n\nThroughout, a value of `-88` denotes that the data should have been reported but was missing on the forms, either due to a scanning error or some other ommission.  
Please check the variable distributions before using this data.\n\n* constituency_ps_id: *this is used to merge with PS level data*; the pasted together constituency_id and ps_id, a unique identifier of the constituency-polling station\n* constituency_ps_id_block_code: `constituency_ps_id` pasted with the block_code to generate a unique identifier for each polling station-census block code.\n* province\n* assembly\n* constituency_id\n* constituency_area\n* ps_id: the official serial number of the polling station\n* ps_name_from_form28: the name of the polling station; not always consistent within `constituency_ps_id`, and may not match the `ps_name` from the polling station data\n* block_code: the census block code; note, there are some `block_code` values that have dashes in them. These seem to represent more than one census block code (e.g. \"332030301-06-07\" seems to represent \"332030301\", \"332030306\", and \"332030307\"). Unfortunately this is how the census blocks were reported and we are unable to figure out which voters correspond to which of the census blocks. If you need help creating a full linking between polling stations and each of these block codes, please leave a message.\n* block_code_type: \"Urban\", \"Rural\", or \"Unknown\"\n* name_ea_rural: the name of the \"rural\" electoral area covered by this polling station-census block\n* name_ea_rural: the name of the \"urban\" electoral area covered by this polling station-census block\n* voter_serials_assigned_to_station: sometimes, the serial number range of the voters covered in this EA (per polling station) are reported. 
They are repeated here verbatim from the forms.\n* male_voters: the number of registered male voters in this polling station-census block\n* female_voters: the number of registered female voters in this polling station-census block\n* total_voters: the number of registered voters in this polling station-census block\n* male_booths: the number of assigned male booths for this polling station (Note: some errors seem to show different booths within polling station id in this dataset)\n* female_booths: the number of assigned female booths for this polling station (Note: some errors seem to show different booths within polling station id in this dataset)\n* total_booths: the total number of assigned booths for this polling station (Note: some errors seem to show different booths within polling station id in this dataset)\n\n\n\n## Table of variables\nThis table contains variable names, labels, their central tendencies and other attributes.\n\n|name                              |data_type |missing |complete |n      |empty |n_unique |min |max |mean   |sd     |p0  |p25 |p50 |p75 |p100 |hist     |\n|:---------------------------------|:---------|:-------|:--------|:------|:-----|:--------|:---|:---|:------|:------|:---|:---|:---|:---|:----|:--------|\n|constituency_ps_id_block_code     |character |0       |461320   |461320 |0     |461320   |11  |62  |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|constituency_ps_id                |character |0       |461320   |461320 |0     |167616   |5   |9   |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|province                          |character |0       |461320   |461320 |0     |4        |3   |11  |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|assembly                          |character |0       |461320   |461320 |0     |2        |8   |10  |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|constituency_id                   |character |0       |461320   |461320 |0     |849      |3   |5   |NA     |NA     
|NA  |NA  |NA  |NA  |NA   |NA       |\n|constituency_area                 |character |0       |461320   |461320 |0     |682      |4   |54  |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|ps_id                             |integer   |0       |461320   |461320 |NA    |NA       |NA  |NA  |120.13 |94.13  |1   |48  |98  |168 |924  |▇▃▂▁▁▁▁▁ |\n|ps_name_from_form28               |character |0       |461320   |461320 |0     |83096    |3   |242 |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|block_code                        |character |30      |461290   |461320 |0     |163074   |3   |52  |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|block_code_type                   |character |0       |461320   |461320 |0     |3        |5   |7   |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|name_ea_rural                     |character |170869  |290451   |461320 |0     |52597    |2   |244 |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|name_ea_urban                     |character |291065  |170255   |461320 |0     |25992    |1   |244 |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|voter_serials_assigned_to_station |character |69248   |392072   |461320 |0     |632      |1   |244 |NA     |NA     |NA  |NA  |NA  |NA  |NA   |NA       |\n|male_voters                       |integer   |122     |461198   |461320 |NA    |NA       |NA  |NA  |252.54 |265.27 |0   |0   |208 |413 |3573 |▇▂▁▁▁▁▁▁ |\n|female_voters                     |integer   |162     |461158   |461320 |NA    |NA       |NA  |NA  |199.47 |214.25 |0   |0   |158 |327 |2499 |▇▂▁▁▁▁▁▁ |\n|total_voters                      |integer   |153     |461167   |461320 |NA    |NA       |NA  |NA  |458.75 |338.62 |0   |221 |389 |616 |4341 |▇▃▁▁▁▁▁▁ |\n|male_booths                       |integer   |102     |461218   |461320 |NA    |NA       |NA  |NA  |1.57   |1.53   |-99 |0   |2   |2   |5    |▁▁▁▁▁▁▁▇ |\n|female_booths                     |integer   |102     |461218   |461320 |NA    |NA       |NA  |NA  
|1.37   |1.43   |-99 |0   |1   |2   |5    |▁▁▁▁▁▁▁▇ |\n|total_booths                      |integer   |102     |461218   |461320 |NA    |NA       |NA  |NA  |2.94   |1.2    |-99 |2   |3   |4   |6    |▁▁▁▁▁▁▁▇ |\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.8.1).",
  "identifier": "https://osf.io/mtsnd/",
  "datePublished": "2019-10-21",
  "creator": {
    "@type": "Person",
    "givenName": "Luke",
    "familyName": "Sonnet",
    "email": "luke.sonnet@gmail.com"
  },
  "citation": "Sonnet, Luke. 2019. “2018 Pakistani General Election Polling Station Data.” https://doi.org/10.17605/osf.io/mtsnd.",
  "url": "https://osf.io/mtsnd/",
  "keywords": ["constituency_ps_id_block_code", "constituency_ps_id", "province", "assembly", "constituency_id", "constituency_area", "ps_id", "ps_name_from_form28", "block_code", "block_code_type", "name_ea_rural", "name_ea_urban", "voter_serials_assigned_to_station", "male_voters", "female_voters", "total_voters", "male_booths", "female_booths", "total_booths"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "constituency_ps_id_block_code",
      "@type": "propertyValue"
    },
    {
      "name": "constituency_ps_id",
      "@type": "propertyValue"
    },
    {
      "name": "province",
      "@type": "propertyValue"
    },
    {
      "name": "assembly",
      "@type": "propertyValue"
    },
    {
      "name": "constituency_id",
      "@type": "propertyValue"
    },
    {
      "name": "constituency_area",
      "@type": "propertyValue"
    },
    {
      "name": "ps_id",
      "@type": "propertyValue"
    },
    {
      "name": "ps_name_from_form28",
      "@type": "propertyValue"
    },
    {
      "name": "block_code",
      "@type": "propertyValue"
    },
    {
      "name": "block_code_type",
      "@type": "propertyValue"
    },
    {
      "name": "name_ea_rural",
      "@type": "propertyValue"
    },
    {
      "name": "name_ea_urban",
      "@type": "propertyValue"
    },
    {
      "name": "voter_serials_assigned_to_station",
      "@type": "propertyValue"
    },
    {
      "name": "male_voters",
      "@type": "propertyValue"
    },
    {
      "name": "female_voters",
      "@type": "propertyValue"
    },
    {
      "name": "total_voters",
      "@type": "propertyValue"
    },
    {
      "name": "male_booths",
      "@type": "propertyValue"
    },
    {
      "name": "female_booths",
      "@type": "propertyValue"
    },
    {
      "name": "total_booths",
      "@type": "propertyValue"
    }
  ]
}
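To use this metadata programmatically, the variable list can be read back from the `variableMeasured` array. A minimal sketch, assuming the JSON-LD has been saved or copied as valid JSON text; the short `example` string below is a cut-down stand-in for the full record above, not the record itself.

```python
import json


def variable_names(jsonld_text):
    """Return the names listed under variableMeasured in a JSON-LD record."""
    record = json.loads(jsonld_text)
    return [item["name"] for item in record.get("variableMeasured", [])]


# Cut-down stand-in for the full JSON-LD record, for illustration only.
example = """
{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {"name": "province", "@type": "propertyValue"},
    {"name": "assembly", "@type": "propertyValue"}
  ]
}
"""

print(variable_names(example))
```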