Population Center of Ohio

There are a few different ways of determining a population’s center, but the method I used for this graph is the one currently used by the Census [PDF]. If every person weighed the same, where would the balancing point be? Mathematically, it’s the average latitude (φ) and longitude (λ) of every person’s position, with a correcting factor for the convergence of the meridians toward the poles.

[Equation describing population center]

I found Brandon Martin-Anderson’s details about his census dot-map very helpful, although I used my own method to process the data since I didn’t need his level of detail.
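For reference, the balancing-point calculation described above can be written as a pair of weighted averages. This is a reconstruction that follows the structure of the R script further down, not a copy of the original equation image. Here w_i is the number of people in block i, lat_i and lon_i are that block's coordinates, and the cosine is taken of the latitude expressed in radians.

```latex
\[
\overline{\mathrm{lat}} = \frac{\sum_i w_i \,\mathrm{lat}_i}{\sum_i w_i},
\qquad
\overline{\mathrm{lon}} = \frac{\sum_i w_i \,\mathrm{lon}_i \cos(\mathrm{lat}_i)}{\sum_i w_i \cos(\mathrm{lat}_i)}
\]
```

The cosine term down-weights longitudes of blocks nearer the poles, where a degree of longitude covers less ground.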

The Census provides shapefiles on its FTP site here that describe each Census block (its smallest geographic unit). You can download them all or pick a specific state by looking up its FIPS code.

Next, I processed the shapefile in Python using the shapefile library (pyshp). This script loads each census block (a shape within the file), averages the latitude and longitude of its boundary points, and extracts the number of people in that particular block. Finally, all the data is written to a text file for final processing in R.

import shapefile

def main():
	sf = shapefile.Reader("D:\\oh_shape")
	shapes = sf.shapes()
	records = sf.records()
	
	rows = []
	
	for i in range(len(shapes)):
		# Field positions 4 and 7 depend on the attribute layout of this particular shapefile.
		block_id = records[i][4]
		location = get_average_lat_lon(shapes[i].points)
		num = records[i][7]
		rows.append(Row(block_id,location,num))
		
	# The with-block closes the file even if a write fails.
	with open("..\\data\\data.txt","w") as f:
		f.write("id\tx\ty\tnum\n")

		for r in rows:
			f.write("{}\t{}\t{}\t{}\n".format(r.id,r.location[0],r.location[1],r.num))
	
def get_average_lat_lon(points):
	x = []
	y = []
	for p in points:
		x.append(float(p[0])) #The longitude component
		y.append(float(p[1]))	#The latitude component
		
	# Despite the function name, the result is ordered (longitude, latitude), i.e. (x, y).
	return sum(x)/float(len(x)),sum(y)/float(len(y))

class Row:
	def __init__(self, id, location, num):
		self.id = id
		self.location = location
		self.num = num

if __name__=="__main__":
	main()
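Averaging the boundary points is quick, but it pulls the result toward whichever side of a block has more vertices. If that matters, an area-weighted centroid is one alternative. The sketch below is not what the script above does; `polygon_centroid` is a hypothetical helper that assumes each shape is a single simple ring and treats longitude and latitude as flat x/y coordinates, which is reasonable only for shapes as small as census blocks.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    points = [tuple(p) for p in points]
    # Shapefile rings repeat the first vertex at the end; drop the repeat.
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]
    n = len(points)
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate shape (zero area): fall back to the plain vertex average.
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return cx / (3.0 * area2), cy / (3.0 * area2)


# A 2x2 square with an extra vertex on its bottom edge.
square = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
center = polygon_centroid(square)
```

For this square the centroid stays at the geometric middle, whereas a plain vertex average would be dragged toward the bottom edge by the extra point.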

The R script is even shorter: one line to read the data, one to calculate the weighted averages, and then the ggplot output.

oh<-read.table("data.txt",header=T,sep="\t")
# cos() expects radians, so convert the latitude from degrees first
ohCenter <- c(sum(oh$num * oh$x * cos(oh$y*pi/180))/sum(oh$num * cos(oh$y*pi/180)),sum(oh$num * oh$y)/sum(oh$num))

library(ggplot2)
png(filename="..\\images\\ohio_population_center.png",
    width=2100,height=2000,units="px",pointsize=24,type="cairo")
ggplot(oh,aes(x=x,y=y))+
  geom_point(aes(size=num),alpha=0.50)+
  geom_hline(yintercept=ohCenter[2], color="steelblue", size=3)+
  geom_vline(xintercept=ohCenter[1], color="steelblue", size=3)+
  #coord_fixed()+
  theme(panel.background = element_rect(fill="white",color="white"),
        text = element_text(size=32),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())+
  scale_size_continuous(name="People per block")+
  labs(x="",y="",title="Population Center of Ohio")
dev.off()
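If you'd rather stay in Python for the whole pipeline, the averaging step can be sketched as below. This is a minimal illustration rather than part of the original workflow: `population_center` is a hypothetical name, and it assumes each row is a (longitude, latitude, population) tuple with coordinates in degrees.

```python
import math


def population_center(rows):
    """Return (lon, lat) of the mean population center.

    rows: iterable of (lon_deg, lat_deg, population) tuples.
    """
    sum_w = 0.0
    sum_w_lat = 0.0
    sum_w_cos = 0.0
    sum_w_lon_cos = 0.0
    for lon, lat, pop in rows:
        # math.cos works in radians, so convert the latitude first.
        c = math.cos(math.radians(lat))
        sum_w += pop
        sum_w_lat += pop * lat
        sum_w_cos += pop * c
        sum_w_lon_cos += pop * lon * c
    return sum_w_lon_cos / sum_w_cos, sum_w_lat / sum_w


# Two made-up blocks on the same parallel, one three times as populous.
lon, lat = population_center([(-84.0, 40.0, 100), (-82.0, 40.0, 300)])
```

With both blocks at the same latitude the cosine factor cancels, so the result is simply the population-weighted average longitude, three quarters of the way toward the larger block.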

This map really shows the sprawl of the three largest cities in Ohio and solidifies the concept of “metro area”. When you look at the large version of the map, you can also notice some interesting oddities. For example, to the southwest of Columbus there is a large dot seemingly in the middle of nowhere. I first thought this was an error, but when I plugged the coordinates into Google Maps, I found the Orient prison. What other inferences can you draw about population from this map?

5 thoughts on “Population Center of Ohio”

  1. Hello

    Sorry to bother you, but I tried to calculate the population center of Brazil from its municipalities’ populations, and the result seemed a bit off (too far west, in my opinion). The data could be wrong, but I checked some cities and the populations seem spot on (as does the total population). Could you try to run your R script on it? https://www.dropbox.com/s/syeajl0s4wnzcea/data

    1. Phillip

      Hmm, it looks OK to me. I got -48.89439 -17.02068. However, because Brazil’s population is so unevenly distributed, perhaps a median population center may be more accurate than a mean population center? The median population center is defined as “the point through which a north-south line and an east-west line each divides the total population of the country in half”. Using my R script, this can be calculated this way:

      newx<-rep(oh$x,oh$num)
      newy<-rep(oh$y,oh$num)
      median(newx)
      median(newy)
      

      That gives me -46.45014, -20.08646. (Note: Not sure if we need to factor the cos(latitude) into this calculation.) See this Wikipedia page for more details.
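The median center from the reply above can also be computed without expanding every block into one entry per person, which gets memory-hungry for a whole country. The Python sketch below is one way to do it; `weighted_median` is a hypothetical helper, and it returns the lower median, so it can differ slightly from R's `median()` when the halfway point falls exactly between two blocks.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("weighted_median needs at least one value")


# Made-up longitudes and populations for three blocks.
median_x = weighted_median([-84.0, -83.0, -82.0], [1, 1, 5])
```

Calling it once on the x column and once on the y column, each time with the population column as weights, gives the two coordinates of the median center.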
