DIY OPEN DATA ANALYTICS
What we hear most often from NGOs and small organisations is this:
"We have all this data but we have no idea what to do with it - we know we should be doing something"
Yes, they should. It is not difficult to generate a great deal of insight from what is mostly free data and free resources. No expensive database system is required.
Often this data is a combination of lists of supporters or customers, a few separate lists of sign-ups, some transactions, a few mailing lists and maybe a rough media plan. The mix is different for everyone: for charities it could be supporter data, recipient data or both; for those receiving grants it may be community impact data; for SMEs it may be customer and supplier data. The general issues, however, are the same.
The initial step is joining all this data together. This is done using a common identifier, with the records usually grouped up to a unique person or small community. Tools such as SQL, Python, R or even simply Excel can do the job. The benefits of seeing all the information about your supporters in one record are immense. You can see:
-who your longest supporters are (these are your most loyal)
-who is still active (who do you need to try to reactivate?)
-who are your most valuable (prioritize your activities on these)
-who are inactive and least valuable (why are you wasting money sending magazines to these?)
-who has potential (how can you engage them to the next level?)
And so on. Simple but effective. But this is just the beginning.
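As a rough illustration of the joining step, the sketch below rolls a transaction list up to one record per supporter using only the Python standard library. The field names (supporter_id, joined, amount), the sample rows and the 365-day definition of "active" are all assumptions made for the example, not a prescribed layout.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data; in practice these rows would come from CSV exports.
supporters = [
    {"supporter_id": 1, "name": "A. Nguyen", "joined": date(2009, 3, 1)},
    {"supporter_id": 2, "name": "B. Smith", "joined": date(2018, 7, 15)},
    {"supporter_id": 3, "name": "C. Rossi", "joined": date(2012, 1, 20)},
]
transactions = [
    {"supporter_id": 1, "date": date(2023, 11, 2), "amount": 50.0},
    {"supporter_id": 1, "date": date(2024, 5, 9), "amount": 75.0},
    {"supporter_id": 2, "date": date(2019, 2, 1), "amount": 20.0},
]


def build_single_view(supporters, transactions, today, active_days=365):
    """Return one summary record per supporter, keyed by supporter_id."""
    # Group transactions by the common identifier.
    by_supporter = defaultdict(list)
    for t in transactions:
        by_supporter[t["supporter_id"]].append(t)

    view = {}
    for s in supporters:
        txns = by_supporter.get(s["supporter_id"], [])
        last_gift = max((t["date"] for t in txns), default=None)
        view[s["supporter_id"]] = {
            "name": s["name"],
            # Whole years since joining (approximate, ignores leap days).
            "tenure_years": (today - s["joined"]).days // 365,
            "total_value": sum(t["amount"] for t in txns),
            "last_gift": last_gift,
            # "Active" here means a transaction within the last `active_days` days.
            "active": last_gift is not None
            and (today - last_gift).days <= active_days,
        }
    return view


single_view = build_single_view(supporters, transactions, today=date(2024, 6, 30))
# Rank supporters from most to least valuable.
most_valuable = sorted(
    single_view.values(), key=lambda r: r["total_value"], reverse=True
)
```

Sorting the same records by tenure_years, or filtering on active, gives the other views in the list above.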
If you have contact information for the customers you can do so much more. And a great place to start is the postal address.
The first thing to do is geo-code the address. This means cleaning up the address and adding the latitude and longitude co-ordinates, so we know physically where the person is located. This can mostly be done for free, provided the addresses are in a semi-legible state. It can also be done using the G-NAF (free) data in Australia or the PAF (not free) file in New Zealand.
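To give a feel for what "cleaning up and geo-coding" involves, here is a minimal sketch that normalises an address string and looks it up in a reference table. The abbreviation list is only an illustrative subset, and the REFERENCE dictionary is a hypothetical stand-in for a real address file; the address and co-ordinates in it are made up.

```python
import re

# Illustrative subset of street-type abbreviations, not a complete standard.
# Note that "st" can also mean "saint", which a real cleaner would need to handle.
STREET_TYPES = {
    "st": "street",
    "rd": "road",
    "ave": "avenue",
    "av": "avenue",
    "dr": "drive",
    "ct": "court",
}


def normalise_address(raw):
    """Lower-case, drop punctuation, collapse whitespace, expand street types."""
    # Keep letters, digits, whitespace and "/" (used in unit numbers like 3/12).
    cleaned = re.sub(r"[^\w\s/]", " ", raw.lower())
    tokens = [STREET_TYPES.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)


# Hypothetical reference table keyed by normalised address.
# The address and the (latitude, longitude) pair are invented for illustration.
REFERENCE = {
    "12 example street oakleigh vic 3166": (-37.9000, 145.0900),
}


def geocode(raw, reference=REFERENCE):
    """Return (latitude, longitude) for a matched address, or None if unmatched."""
    return reference.get(normalise_address(raw))
```

For example, geocode("12 Example St., Oakleigh  VIC 3166") matches the reference entry despite the punctuation and spacing differences, while an address that is not in the table returns None and can be set aside for manual review.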
Once we have the co-ordinates we can map the customers. The map below shows this for a hypothetical charity organisation based in Oakleigh, in Melbourne's south-east. The points are mapped in a free (up to a generous limit) cloud-based tool called CartoDB. The red dot is the charity location and the orange dots are its supporters. You can zoom in and out and move around Australia.
Here we can see where clusters of customers are based, and likewise how influential the location of the charity is. A catchment can also be worked out, for example that 70% of customers are within a 10km radius of the charity location. This will help plan and focus activities.
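A catchment figure like this can be worked out from the co-ordinates alone. The sketch below uses the haversine formula for great-circle distance; the function names and the sample points are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(centre, points, radius_km):
    """Fraction of points lying within radius_km of the centre."""
    if not points:
        return 0.0
    inside = sum(
        1 for lat, lon in points if haversine_km(*centre, lat, lon) <= radius_km
    )
    return inside / len(points)


def catchment_radius(centre, points, share=0.7):
    """Smallest distance that contains at least `share` of the points."""
    if not points:
        raise ValueError("at least one point is required")
    distances = sorted(haversine_km(*centre, lat, lon) for lat, lon in points)
    k = math.ceil(share * len(distances))
    return distances[k - 1]


# Hypothetical charity location and supporter co-ordinates.
charity = (-37.90, 145.09)
supporter_points = [
    (-37.90, 145.09),
    (-37.91, 145.10),
    (-37.95, 145.15),
    (-37.80, 144.96),
]
within_10km = share_within(charity, supporter_points, 10.0)
radius_70 = catchment_radius(charity, supporter_points, share=0.7)
```

share_within answers "what share of supporters are within 10km?", while catchment_radius turns the question around and asks how far out you need to go to capture 70% of them.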
The next step unleashes a whole new world of data insight. What we can do is use the latitude and longitude co-ordinates to determine which meshblock or SA1 the customer sits in. The meshblock is the lowest commonly used geographic structure in Australia (the same is true in New Zealand, where an identical method can be used). There are 347k meshblocks in Australia, so under 100 people per meshblock on average. SA1 is the next level up. The Australian structure can be seen below:
Vast quantities of open data are held within this structure. This includes all census data, which is mostly made publicly available at meshblock or SA1 level. Although we can't use this data to find out information about specific individuals, we can use it to identify key trends or skews against the more general population data. Data held internally by the organisation (such as age, gender and occupation, which may be stored in the database) can also be enhanced with this data.
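Working out which meshblock or SA1 a co-ordinate sits in is a point-in-polygon test against the area boundaries. In practice this would normally be done with published boundary files and a GIS tool; the sketch below only shows the idea with a plain ray-casting test. The two square "areas" and their codes are invented for the example.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    `polygon` is a list of (lon, lat) vertices in order, without repeating
    the first vertex at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_area(lon, lat, areas):
    """Return the code of the first area containing the point, or None."""
    for code, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


# Hypothetical boundaries: two adjacent squares with made-up codes.
AREAS = {
    "AREA_A": [(145.08, -37.91), (145.10, -37.91), (145.10, -37.89), (145.08, -37.89)],
    "AREA_B": [(145.10, -37.91), (145.12, -37.91), (145.12, -37.89), (145.10, -37.89)],
}

area_for_supporter = assign_area(145.09, -37.90, AREAS)
```

Once every supporter carries an area code, any published table keyed on that code can be attached to the supporter record with the same kind of join used earlier.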
Variables we can process the supporter base on will include:
-social services payments
From here we can start doing some profiling. The chart below looks at the distance of supporters from the location. We can see confirmation that most supporters are within 4km. Comparing this to population counts, we can see penetration is much higher in this zone too, although it is highest in the slightly outer ring of 3-4km. This (hypothetically) could be because the charity location is surrounded by apartments or commercial zones less affiliated with the charity.
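Penetration here simply means supporters divided by population within each distance ring. A minimal sketch, assuming 1km rings and entirely made-up counts:

```python
def penetration_by_band(supporter_km, population_by_band, band_width_km=1.0):
    """Supporters per head of population in each distance band.

    `supporter_km` is a list of supporter distances from the location.
    `population_by_band` maps a band index (0 = 0-1km, 1 = 1-2km, ...) to the
    population living in that ring. Supporters beyond the last band are ignored.
    """
    counts = {band: 0 for band in population_by_band}
    for d in supporter_km:
        band = int(d // band_width_km)
        if band in counts:
            counts[band] += 1
    return {
        band: (counts[band] / pop if pop else 0.0)
        for band, pop in population_by_band.items()
    }


# Hypothetical inputs: distances in km and ring populations.
distances_km = [0.2, 0.8, 1.5, 2.1, 3.2, 3.4, 3.7, 3.9, 5.0]
ring_population = {0: 1000, 1: 2000, 2: 2500, 3: 1000}

penetration = penetration_by_band(distances_km, ring_population)
best_band = max(penetration, key=penetration.get)
```

With these invented numbers the 3-4km ring comes out on top, which mirrors the pattern described above; real ring populations would come from the census counts attached to each area.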
In the next two charts we can start to see how the charity profiles socio-economically. On the first chart, we can see the population around the location peaks with young people (aged in their 20s), yet the charity's supporters are a much older group. The charity probably knows this, but it gives confirmation and a strong indication of where to target future acquisition activities. The second chart looks at a SEIFA socio-economic variable (10 = [roughly] the most wealthy). Supporters tend to fall into this wealthy category at quite extreme levels. There is very little penetration below 5, so targeting in these areas should be avoided. This is not just about geographical targeting. These profiles guide things like print media, website tone and messaging, and which TV shows to advertise on.
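One common way to put a number on a skew like this is a profile index: the supporter share in a category divided by the population share, times 100, so that 100 means "in line with the population". This is offered as one possible approach rather than the method behind the charts, and the age bands and counts below are hypothetical.

```python
def profile_index(supporter_counts, population_counts):
    """Index each category: 100 * supporter share / population share.

    Above 100 means supporters are over-represented in that category,
    below 100 means under-represented.
    """
    s_total = sum(supporter_counts.values())
    p_total = sum(population_counts.values())
    if not s_total or not p_total:
        raise ValueError("both supporter and population totals must be non-zero")

    index = {}
    for category, pop in population_counts.items():
        s_share = supporter_counts.get(category, 0) / s_total
        p_share = pop / p_total
        # A category with no population cannot be indexed.
        index[category] = round(100 * s_share / p_share) if p_share else None
    return index


# Hypothetical counts by age band.
supporters_by_age = {"20-39": 10, "40-59": 30, "60+": 60}
population_by_age = {"20-39": 50, "40-59": 30, "60+": 20}

age_index = profile_index(supporters_by_age, population_by_age)
```

The same function works unchanged for SEIFA deciles or any other categorical variable attached to the areas.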
As discussed, similar profiling can be done on almost endless variables. A dashboard such as the one below can be developed, bringing together the key variables that are most pertinent for the organisation. This provides rich insight and is a very effective tool. It also brings in forecast information, so one is focused not just on present or past data but on what is expected to happen in the future.
This can also be expressed geographically. For instance, the charity is interested in expanding its activities to New South Wales and Queensland. It is initially interested not only in where the population resides but also in where future growth is likely to occur. The map below shows this: the size and density of the bubbles represent the number of people, while the colouring represents the expected growth over the next five years (the darker the green, the greater the growth).
The next step starts getting more strategic and sophisticated. The charity wants to know the best areas in which to target its future activities in New South Wales and Queensland. Some simple statistical modelling has been done to score each area based on the key variables found in the profiling activities (for instance, the charity wanted to target older areas with high income but low internet connectivity, high crime rates and high levels of people born outside Australia). Each area is scored to find the best targets. This gives great insight into where to focus future activities or consider land purchases or the like.
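The exact model is not spelled out here, so the sketch below shows one simple option: standardise each variable to a z-score and add the results up with a weight of +1 where high values are wanted and -1 where low values are wanted. The area names, variables, values and weights are all hypothetical.

```python
from statistics import mean, pstdev


def score_areas(areas, weights):
    """Rank areas by a weighted sum of z-scores.

    `areas` maps an area name to a dict of variable values.
    `weights` maps a variable name to its weight: positive if high values
    are desirable, negative if low values are desirable.
    """
    names = list(areas)
    scores = {name: 0.0 for name in names}
    for var, weight in weights.items():
        values = [areas[n][var] for n in names]
        mu, sigma = mean(values), pstdev(values)
        for n in names:
            # A variable with no spread cannot separate areas, so it adds nothing.
            z = (areas[n][var] - mu) / sigma if sigma else 0.0
            scores[n] += weight * z
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical areas and profiling variables.
candidate_areas = {
    "Area A": {"median_age": 48, "median_income": 90, "internet_pct": 70},
    "Area B": {"median_age": 30, "median_income": 60, "internet_pct": 95},
    "Area C": {"median_age": 40, "median_income": 75, "internet_pct": 85},
}
# Older and higher income are wanted; lower internet connectivity is wanted.
target_weights = {"median_age": 1.0, "median_income": 1.0, "internet_pct": -1.0}

ranked = score_areas(candidate_areas, target_weights)
```

Equal weights are the simplest starting point. If the profiling shows one variable matters far more than the others, its weight can be raised, or the weights can be estimated with a regression on existing supporter penetration.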
This can also be mapped, as per below, to see groups of areas which score highly (the larger the bubble, the higher the score). Areas with a high density of large bubbles would be the optimal places to set up. The single best place can be modelled, although we would note that other, non-measurable considerations would also have to be taken into account. Note again that this is hypothetical data.
STRATEGY & ACTION
This insight can now be put into action. Actions are based on evidence, not just on a hunch or on what has been done before. This likewise lays the foundations for even more sophisticated modelling or segmentation to make even further gains. But like anything, get the basics right first.
It can be seen that a great deal of insight can be generated with just address data, and we have only scratched the surface. Such methods can be used to identify high-value supporters, plan store locations for retailers, plan marketing and media, and likewise guide overall organisational strategy.
All of this data and these tools are freely available. We are happy to help with putting this data together, along with your own data, to generate the insight you need to guide and build your organisation.