The Census Bureau provides shapefiles at its FTP site here that describe each Census block, its smallest geographic unit. You can download them all or pick a specific state by looking up its FIPS code.
Next, I processed the shapefile in Python using the shapefile library (pyshp). The script loads each census block (a shape within the file), averages the latitude and longitude of its vertices, and extracts the number of people in that block. Finally, all the data is written to a tab-separated text file for final processing in R.
    import shapefile  # provided by the pyshp package


    def main():
        sf = shapefile.Reader("D:\\oh_shape")
        shapes = sf.shapes()
        records = sf.records()

        rows = []
        for i in range(len(shapes)):
            block_id = records[i][4]  # block identifier
            location = get_average_lat_lon(shapes[i].points)
            num = records[i][7]  # number of people in the block
            rows.append(Row(block_id, location, num))

        # Write one tab-separated line per block for processing in R
        with open("..\\data\\data.txt", "w") as f:
            f.write("id\tx\ty\tnum\n")
            for r in rows:
                f.write("{}\t{}\t{}\t{}\n".format(r.id, r.location[0], r.location[1], r.num))


    def get_average_lat_lon(points):
        """Return the mean (longitude, latitude) of a block's vertices."""
        x = [float(p[0]) for p in points]  # The longitude component
        y = [float(p[1]) for p in points]  # The latitude component
        return sum(x) / len(x), sum(y) / len(y)


    class Row:
        def __init__(self, id, location, num):
            self.id = id
            self.location = location
            self.num = num


    if __name__ == "__main__":
        main()
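Averaging the vertices is only an approximation of a block's location: a block with many vertices along one edge gets pulled toward that edge. As a hedged alternative (not what the script above does), the area-weighted centroid of a single ring can be sketched with the shoelace formula. `polygon_centroid` is a hypothetical helper, and it assumes one simple, non-self-intersecting ring, ignoring multi-part shapes and holes.

```python
def polygon_centroid(points):
    """Area-weighted centroid of one simple polygon ring of (x, y) pairs.

    Assumption: a single ring without holes or self-intersections.
    """
    pts = [(float(p[0]), float(p[1])) for p in points]
    if not pts:
        raise ValueError("polygon has no points")
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the repeated closing vertex

    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(len(pts)):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % len(pts)]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0.0:
        # Degenerate ring (zero area): fall back to the plain vertex mean
        return (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)
```

For a 2 by 2 square with extra vertices along its bottom edge, the vertex mean lands below the middle, while this sketch still returns the middle of the square.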
The R script is even shorter, basically a line to read the data, a line to calculate the averages, and then the ggplot output.
    oh <- read.table("data.txt", header=TRUE, sep="\t")

    # Population-weighted center; longitude is also weighted by cos(latitude).
    # R's cos() expects radians, so convert latitude from degrees first.
    latRad <- oh$y * pi / 180
    ohCenter <- c(sum(oh$num * oh$x * cos(latRad)) / sum(oh$num * cos(latRad)),
                  sum(oh$num * oh$y) / sum(oh$num))

    library(ggplot2)
    png(filename="..\\images\\ohio_population_center.png",
        width=2100, height=2000, units="px", pointsize=24, type="cairo")
    ggplot(oh, aes(x=x, y=y)) +
      geom_point(aes(size=num), alpha=0.50) +
      geom_hline(yintercept=ohCenter[2], color="steelblue", size=3) +
      geom_vline(xintercept=ohCenter[1], color="steelblue", size=3) +
      #coord_fixed() +
      theme(panel.background = element_rect(fill="white", color="white"),
            text = element_text(size=32),
            axis.text = element_blank(),
            axis.ticks = element_blank(),
            panel.grid = element_blank()) +
      scale_size_continuous(name="People per block") +
      labs(x="", y="", title="Population Center of Ohio")
    dev.off()
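For readers who prefer Python, the center calculation from the R script can be sketched roughly as follows. The function name `weighted_mean_center` and its arguments are hypothetical. As in the R formula, longitude is weighted by population times cos(latitude) and latitude by population alone; this sketch assumes coordinates in degrees and converts latitude to radians before taking the cosine.

```python
import math


def weighted_mean_center(lons, lats, pops):
    """Population-weighted mean center, returned as (longitude, latitude).

    Assumption: lons and lats are in degrees, pops are non-negative counts.
    """
    if not pops or sum(pops) == 0:
        raise ValueError("need at least one block with population")

    # Longitude weights shrink toward the poles, where a degree of
    # longitude covers less ground.
    lon_weights = [p * math.cos(math.radians(lat)) for p, lat in zip(pops, lats)]

    lon_center = sum(w * lon for w, lon in zip(lon_weights, lons)) / sum(lon_weights)
    lat_center = sum(p * lat for p, lat in zip(pops, lats)) / sum(pops)
    return lon_center, lat_center
```

With two blocks at the same latitude, the result is simply the population-weighted average of their longitudes, which makes for an easy sanity check.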
This map really shows the sprawl of the three largest cities in Ohio and solidifies the concept of “metro area”. When you look at the large version of the map, you can also notice some interesting oddities. For example, to the southwest of Columbus there is a large dot seemingly in the middle of nowhere. I first thought this was an error, but when I plugged the coordinates into Google Maps, I found the Orient prison. What other inferences can you draw about population from this map?
Sorry to bother you, but I tried to calculate the population center of Brazil from the municipalities' populations, and the result seemed a bit off (too far west, in my opinion). The data could be wrong, but I checked some cities and their populations seem spot on, as does the total population. Could you try running your R script on it? https://www.dropbox.com/s/syeajl0s4wnzcea/data
Hmm, it looks OK to me. I got -48.89439 -17.02068. However, because Brazil’s population is so unevenly distributed, perhaps a median population center may be more accurate than a mean population center? The median population center is defined as “the point through which a north-south line and an east-west line each divides the total population of the country in half”. Using my R script this can be calculated this way:
That gives me -46.45014, -20.08646. (Note: I'm not sure whether we need to factor the cos(latitude) weighting into this calculation.) See this Wikipedia page for more details.
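As a rough illustration of the median center defined above, here is a minimal Python sketch (it is not the R snippet from this reply). It takes the population-weighted median of longitude and of latitude independently. The names `weighted_median` and `median_center` are hypothetical, and no cos(latitude) factor is applied, which is an assumption of this sketch.

```python
def weighted_median(values, weights):
    """Smallest value at which the running weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("need at least one value")


def median_center(lons, lats, pops):
    """Median population center, returned as (longitude, latitude).

    Each coordinate is chosen so that about half the population lies
    on either side of the north-south and east-west lines through it.
    """
    return weighted_median(lons, pops), weighted_median(lats, pops)
```

Because each axis is handled on its own, one very large city can move the median center much less than it moves the mean center.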
Thanks, that’s what I thought, but it’s still somewhat mind boggling.
I’m curious – what are the latitude/longitude of the ‘crosshairs’ of your final result? (I grew up about 20 miles west of the spot, as far as I can tell.)
40.45521, -82.34894