Birthday Graph for South Africa

One of the projects I worked on in 2013 gave me access to a large list of South African ID numbers. Since the South African ID number encodes the person’s birthdate in its first 6 digits (YYMMDD), I realised it would be possible to make a South African version of this map of common birthdays in the US. I grabbed digits 3 – 6 (month and day) of each ID and discarded the rest to be safe. From there, it was simple enough to calculate the frequency distribution. The US map only shows the ranking, not the actual distribution, so I have made an interactive version that does both:

[Image: birthday ranking, 1 to 366]

The picture above shows the ranking from 1 to 366. It is kind of interesting, but random things often look like patterns. Things get more interesting when you look at the frequency distribution. The 1st of January is a massive outlier, which was very unexpected given how things look on the US version of the map. At about 400,000, it is more than twice the next highest, the 10th of October, at about 189,000.
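For anyone who wants to try this on their own data, here is a minimal sketch of the counting and ranking steps described above. It is not the code I used for the interactive version. The function names are made up for illustration, and the sample IDs are fake placeholders (only the first six digits mean anything here).

```python
from collections import Counter


def month_day(id_number: str) -> str:
    """Return digits 3 to 6 (MMDD) of an ID number; the rest is ignored."""
    return id_number[2:6]


def birthday_frequencies(id_numbers):
    """Count how many IDs fall on each month-day combination."""
    return Counter(month_day(i) for i in id_numbers)


def rank_days(freq):
    """Map each MMDD key to its rank, where 1 is the most common day.

    Ties are broken by calendar order so the result is deterministic.
    """
    ordered = sorted(freq, key=lambda k: (-freq[k], k))
    return {key: rank for rank, key in enumerate(ordered, start=1)}


# Fake placeholder IDs, not real data: YYMMDD followed by zeros.
sample_ids = [
    "8001010000000",
    "7501010000000",
    "9010100000000",
    "6806160000000",
]

freq = birthday_frequencies(sample_ids)
ranks = rank_days(freq)
```

The frequency table is what drives the distribution plot, and the rank table is what a US-style ranking map would show. Keeping both makes it easy to see when the top-ranked day is also a large outlier in absolute terms.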

[Image: frequency distribution of birthdays]

[Images: spread across days of the week and across months]

The spread across days of the week and across months looks about right. To confirm that there was not a problem with the source data, I did the same analysis using a 10% sample set from the Census data and saw the same trend.

Frequency Plot for Census Data
Frequency Plot for Other Data

Either a disproportionately large number of South Africans are New Year’s babies, or (and this is my personal hypothesis) before 1994 a large number of South Africans did not have official ID numbers or birth certificates. A valid ID number was required to vote in the first official democratic election, so the process of allocating ID numbers to those who did not have one must have started before then. If the person applying for the ID number did not have a valid birth certificate, or their date of birth was not known, then they were probably given the date of 1 January and a guess at the year. The other dates that stand out are 2 Feb, 3 Mar, 4 April, etc. This fits the same pattern: if you only know the month, it would be simplest to just match the day to the month numerically, i.e. 2-2, 3-3, 4-4 etc. The 16th of June and the 25th of December also stand out. The anomalous dates become less pronounced as you narrow the date range to exclude older people. Given that my source data was the voters’ roll, I could not do this for people born in the last 20 years. If I can get access to that data, I will do an updated version.
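One rough way to put a number on this is to measure what share of all birthdays falls on the twelve dates where the day matches the month (1 Jan, 2 Feb, and so on) and compare it with the 12/366 you would expect if birthdays were spread evenly. The sketch below assumes a frequency table keyed by MMDD strings. The function names and the toy counts are invented for illustration, and the "day equals month" rule is only my hypothesis, not a documented allocation rule.

```python
def is_matching_date(mmdd: str) -> bool:
    """True when the day number equals the month number (0101, 0202, ...)."""
    month, day = int(mmdd[:2]), int(mmdd[2:])
    return month == day


def matching_share(freq) -> float:
    """Fraction of all counted birthdays that fall on matching dates."""
    total = sum(freq.values())
    flagged = sum(n for mmdd, n in freq.items() if is_matching_date(mmdd))
    return flagged / total if total else 0.0


# Share expected if every one of the 366 possible days were equally likely.
EXPECTED_UNIFORM_SHARE = 12 / 366

# Toy counts for illustration only, not the real figures.
toy_freq = {"0101": 4, "0202": 2, "0616": 2, "1225": 2}
share = matching_share(toy_freq)
```

A share well above the uniform baseline would be consistent with dates being assigned rather than recorded, although it cannot say who assigned them or why.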


3 thoughts on “Birthday Graph for South Africa”

  1. Awesome job!

    It is obviously an artifact that comes from some allocation process. Apart from your hypothesis, the result could quite easily be a Department of Home Affairs screw-up. Does the 10% sample provide a date of birth? Could you provide a break-down of the relevant socio-geographical indicators?

    1. How old are these people?
    2. Where do they live?
    3. What is their race?

    Of course, if you’re correct, this group will almost certainly not include any whites. My guess is that they are very rural if their date of birth is not known.

    1. The 10% sample does provide the date of birth. It will take a bit of time to do more analysis but I’m fairly sure your 3 questions can be answered. Will do an update to this when I get some time.

  2. New Year baby theory 2 – There was a time, not so long ago, when admission to schools depended on which half of the year you were born in. So if you were born during the 1st half of the year, you were allowed to attend school the year you turned 6, whereas if you were born in the 2nd half of the year, you were forced to delay your entry by a year.

    Many parents would hold off registration of the kid until the new year to allow them to go to school earlier.
