Spatial Patterns of Urban Innovation and Productivity


Radu Stancut
Foundations of Urban Science
Assignment #5
Final Paper

Spatial Patterns of Urban Innovation and Productivity

The purpose of creating a science of cities is to bring a fact-based rigor and standardization to a critical human subject: the way we live with others and the planet. To the extent that we can fruitfully observe our urban environment, capture accurate readings, and allow for hypothesis testing, we are beholden to do so. Affecting our surroundings in an intentional and predictable manner, ideally for the mutual benefit of our civilization and the environment, grants us greater control over our long-term success as a species.

Many practices may come to bear when developing this new science, and we should be opportunistic in taking what works in other fields and applying their techniques. Jane Jacobs famously tackled the topic of what kind of a problem a city is.[1] Whether or not we come to agree with her assessment that cities are problems of organized complexity, we should follow her rationale: identify the features and functions of the urban environment, see what analogous problems we have tackled in other areas, most especially the sciences, and apply similar approaches and methods, modified appropriately for the urban field and that most messy of subject matters: people.

Recent increases in technological capabilities, such as storage, computational power, and easy access to data, coupled with a belief that there are valuable and actionable insights to be found in data, have ushered in the concept of a science of cities. This paper takes the notion of a science of cities to mean that urban environments may now be considered objects of study within a scientific framework, where the structure and behavior of cities may be systematically studied via observation and experiment.[2]

Lit Review

Any science of cities approach would appear to require delving into big data. The availability of new forms and sources of data is opening up the possibility of taking measurements at a speed never previously available in human history.[3] The belief would seem to be that with enough data we will be able to identify patterns and delve deeper,[4] perhaps identifying underlying principles and laws. Big data is certainly a social phenomenon,[5] but its effectiveness will depend on how it is used and the principles put in place. We have, for instance, the following challenges to consider:[6]

- Exponential data growth
- New types of data
- Privacy and access
- Institutional barriers
- Use and relevance

This paper deals primarily with the last item and uses data to attempt to extract insights on urban behavior and outcomes through a modest analysis of GDP and patent information. The goal is to pick up on potential power laws, see if they hold, and see whether they can tell us something about how a city behaves.[7]

[1] Jacobs, J. 1961. The Death and Life of Great American Cities. New York: Random House, Inc.
[2] http://www.oxforddictionaries.com/us/definition/american_english/science
[3] Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
[4] http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
[5] boyd, d. & Crawford, K. 2012. Critical Questions for Big Data. Information, Communication & Society, 15:5, 662-679. DOI: 10.1080/1369118X.2012.678878
[6] Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
[7] Bettencourt, L.M.A. & West, G. 2011. Bigger Cities Do More with Less. Scientific American.
Materials and Methods

In exploring cities for patterns and regularities, this paper focused on economic, population, and innovation features of Metropolitan Statistical Areas (MSAs). The unit of analysis for all research below was the MSA, unless specified otherwise. The three main sections below correspond to the following questions, each of which is expanded on during the analysis in its respective section:

1. What is the relationship between patenting performance and economic performance?
2. What is the technological profile of the New York MSA? How does this profile compare with those of the Boston, Houston, and San Jose MSAs?
3. How diverse are the metropolitan patenting portfolios, and what does the resulting pattern reveal about patenting across metropolitan areas?

Data was collected from three main sources: the Bureau of Economic Analysis (BEA), which provided GDP per capita numbers by MSA and broke down the technology classes found within MSAs; the Census, for population numbers; and the U.S. Patent Office (USPTO), where patents by both technology class and MSA could be found. The variables for each section are described below, as part of the methods of data manipulation and analysis. The table below outlines the breakdown of data by section and may be used as a reference guide.

Section | Unit of Analysis | Variables | Sources
I. Patents and Economic Development | MSA | 1) per capita real GDP; 2) population; 3) patent intensity | BEA; Census; USPTO
II. Technological Profiles of Metropolitan Areas | MSA | 1) patents by MSA by technology class | USPTO
III. Technological Heat Maps of Metropolitan Areas | MSA | 1) tally of technology class patents by MSA | BEA; USPTO
I. Patents and Economic Development

Per capita real GDP was compared against patent intensity, within each MSA, in order to better understand the relationship between patenting performance and economic performance. Per capita real GDP was acquired directly from the BEA, while patent intensity had to be constructed from MSA patents (USPTO) and population (Census). Patent intensity was defined as (MSA patents / MSA population) x 100,000, since the raw ratio tends to be small for some locations.

Data was collected from the sources mentioned above and loaded into Python, and the resulting tables were matched on MSA ID codes (FIPS). Merging the data sets produced a single table with population, per capita GDP, and patent counts, spanning 2001 through 2012, inclusive.

As suggested in the assignment document, per capita GDP and patents were averaged over a five-year window. The five-year time frame was used to smooth the numbers and help minimize distortions. Additionally, different time frames were used for the patents (2001-2005) and GDP (2008-2012) to account for the time delay in patents coming on-line and to set up the analysis for possible causality, with patents leading to greater economic activity and not the other way around.

With the data merged and the averages calculated, it was possible to construct the patent intensity variable (formula above) and generate plots. Plotting the log of both GDP and patent intensity shows a positive relationship (regression coefficient of 3.45; R-squared: 0.875; see Appendix). Subsequent plots show additional positive relationships between population and patent intensity (Appendix: 'Population Influence on Patent Intensity') and between population and average GDP (Appendix: 'Population and MSA GDP'). All three plots and sets of numbers provide evidence that size matters: the larger the MSA, the more patent activity there is likely to be and the higher the GDP.
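The construction of the patent intensity variable described in this section can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used for the paper: the MSA codes, yearly values, and the helper names `window_average` and `patent_intensity` are all hypothetical, and the original analysis may well have used a data-frame library instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical yearly records keyed by an MSA code. The numbers are made up
# for illustration and are NOT taken from the BEA, Census, or USPTO files.
patents = [  # (msa_code, year, patent_count)
    ("10000", 2001, 120), ("10000", 2002, 130), ("10000", 2003, 125),
    ("10000", 2004, 140), ("10000", 2005, 135), ("10000", 2006, 999),
    ("20000", 2001, 10), ("20000", 2002, 12), ("20000", 2003, 8),
    ("20000", 2004, 11), ("20000", 2005, 9),
]
gdp_per_capita = [  # (msa_code, year, per capita real GDP)
    ("10000", 2008, 50000), ("10000", 2009, 51000), ("10000", 2010, 52000),
    ("10000", 2011, 53000), ("10000", 2012, 54000),
    ("20000", 2008, 40000), ("20000", 2009, 40500), ("20000", 2010, 41000),
    ("20000", 2011, 41500), ("20000", 2012, 42000),
]
population = {"10000": 500_000, "20000": 100_000}  # msa_code -> population


def window_average(records, first_year, last_year):
    """Average each MSA's yearly values over an inclusive year window."""
    by_msa = defaultdict(list)
    for msa, year, value in records:
        if first_year <= year <= last_year:
            by_msa[msa].append(value)
    return {msa: mean(values) for msa, values in by_msa.items()}


def patent_intensity(avg_patents, msa_population):
    """(MSA patents / MSA population) x 100,000, as defined in the text."""
    return avg_patents / msa_population * 100_000


# Different windows for patents and GDP, mirroring the lag described above.
avg_patents = window_average(patents, 2001, 2005)
avg_gdp = window_average(gdp_per_capita, 2008, 2012)

# Keep only MSAs present in all three sources (an inner join on the MSA code).
merged = {
    msa: {
        "population": population[msa],
        "gdp_avg_2008_2012": avg_gdp[msa],
        "pat_int_2001_2005": patent_intensity(avg_patents[msa], population[msa]),
    }
    for msa in sorted(avg_patents.keys() & avg_gdp.keys() & population.keys())
}

for msa, row in merged.items():
    print(msa, row)
```

Matching on the MSA code before computing the ratio means that an MSA missing from any one source drops out of the table rather than producing a misleading intensity value.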
II. Technological Profiles of Metropolitan Areas

Below I describe the technological profile of the New York MSA and compare it with three other metropolitan areas: Boston, Houston, and San Jose. In each instance, the variables analyzed were counts of patents by technology class within each MSA. All data for this section was acquired from the USPTO and loaded into Python, where numbers were tabulated and plots generated. The focus was on the top 10 technologies of each MSA and what could be ascertained from this information.

New York

The top 10 technologies of New York account for nearly a third (32%, Appendix) of all patent activity. This is in line with what we will see from the other three MSAs below. As with each MSA, an index was created for the top 10 technologies, pegged against the top technology, and we see a marked drop off of ~40% from the top technology (Drug, Bio-Affecting and Body Treating Compositions) to the technology in second place (Multiplex Communications). This drop off is also not uncommon for the selected MSAs, with one exception (San Jose).
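As a worked illustration of the index and the top-10 share, the sketch below uses the New York patent counts listed in the Appendix. The function names are hypothetical, and the MSA-wide total of 61,587 patents is an assumption: it is not stated in the paper and was back-computed here from the reported class percentages.

```python
# Top 10 New York patent counts by USPTO technology class (from the Appendix).
new_york_top10 = {
    "424": 5212, "370": 3138, "705": 1848, "455": 1589, "435": 1509,
    "532": 1473, "375": 1416, "709": 1323, "438": 1295, "707": 1139,
}

# ASSUMPTION: total New York patents, back-computed from the reported
# "Class %" column (e.g. 5212 / 0.08462824947); not stated in the paper.
NEW_YORK_TOTAL = 61587


def index_against_top(counts):
    """Express each class count as a fraction of the largest class count."""
    top = max(counts.values())
    return {cls: n / top for cls, n in counts.items()}


def top_share_percent(counts, total):
    """Share of all patents accounted for by the listed classes, in percent."""
    return 100 * sum(counts.values()) / total


idx = index_against_top(new_york_top10)
drop_off = 1 - idx["370"]  # drop from the first to the second technology
share = top_share_percent(new_york_top10, NEW_YORK_TOTAL)

print(f"index of class 370: {idx['370']:.3f}")
print(f"drop off from #1 to #2: {drop_off:.1%}")
print(f"top 10 share of total: {share:.2f}%")
```

Pegging every class against the leader makes the four MSAs directly comparable even though their total patent counts differ widely.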
The final exploratory step was to plot the top 10 technologies for each city through the years collected (2000-2011) to get a better idea of patent activity over time. Plots for each of the subsequent cities may be found in the Appendix; the New York ones were presented here for convenience and as a reference.
Boston

Of the three additionally selected MSAs, Boston is the one most in line with New York. Boston's top ten technologies account for a third of all patent activity and there is a similar drop off from technology #1 (Drug, Bio-Affecting and Body Treating Compositions) to technology #2 (Chemistry: Molecular Biology and Microbiology).

Houston

Houston introduces our first differences in the MSA comparison: it is more top heavy than the two northeast MSAs, with the top 10 technologies accounting for 43% of all patent activity, and the drop off from the first place patent class (Wells [shafts or deep borings in the earth, e.g., for oil and gas]) to the second (Synthetic Resins or Natural Rubbers) is over 60%. Houston would appear to be the least diverse innovatively, with respect to patent activity over the past decade, of the four MSAs highlighted in this report.

San Jose

San Jose is also unlike the northeast MSAs, but in a different way than Houston. San Jose would appear to be both more concentrated and more diverse than New York, a paradox revealed by the numbers.
San Jose's top 10 technologies account for nearly 40% of all patent activity, but within this group the patents are more evenly distributed; five of the San Jose technology classes are within roughly 40% of the lead patent category (Semiconductor Device Manufacturing: Process), while New York and Boston have only one such class each.

Lastly, the plotting of patents by year shows that 2010 and 2011 were exceptional for all four MSAs in the following technological areas, something that would require additional research to explain:

- New York: Multiplex Communications; DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing)
- Boston: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems)
- Houston: Boring or Penetrating the Earth
- San Jose: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems); DP: Database and File Management or Data Structures (Data Processing)

III. Technological Heat Maps of Metropolitan Areas

Here again we take a global look at MSAs and, through the use of a different visualization, a heat map, attempt to glean a better understanding of urban innovation by comparing tallies of technology class patents by MSA. Two variables were mapped against one another, patent technology classes on the vertical axis and MSAs on the horizontal axis, both from the USPTO. This resulted in a large grid: a 481 (patent technology classes) x 367 (MSAs) matrix. A for loop was implemented in Python to read each instance of a technology class per MSA, and where a match was found a Y was placed in the respective patent/MSA cell. Following the completion of the for loop, the Y instances were summed by MSA and the grid was sorted along the horizontal axis (MSAs) from fewest Ys to most. Due to the density of the matrix, Y cells were further highlighted in green to provide a clearer visual representation.
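The grid-building procedure described above might look something like the following sketch. It is an illustration under stated assumptions rather than the original script: the records and the short class list are hypothetical stand-ins for the 481 x 367 USPTO matrix, and the variable names are invented for the example.

```python
# Hypothetical (msa, technology_class) patent records; not real USPTO data.
records = [
    ("New York", "424"), ("New York", "370"), ("New York", "705"),
    ("Boston", "424"), ("Boston", "435"),
    ("Houston", "166"),
    ("New York", "424"),  # repeated pairs still yield a single "Y"
]

# Hypothetical full class list; "D30" has no patents in any MSA here.
classes = ["166", "370", "424", "435", "705", "D30"]
msas = sorted({msa for msa, _ in records})

# Rows are technology classes, columns are MSAs; cells start out empty.
grid = {cls: {msa: "" for msa in msas} for cls in classes}

# Mark a "Y" wherever an MSA has at least one patent in a technology class.
for msa, cls in records:
    grid[cls][msa] = "Y"

# Sum the Ys per MSA, then order the columns from fewest Ys to most.
y_counts = {msa: sum(grid[cls][msa] == "Y" for cls in classes) for msa in msas}
ordered_msas = sorted(msas, key=lambda m: y_counts[m])

# Classes with no Y in any MSA show up as empty rows in the grid.
gaps = [cls for cls in classes if not any(grid[cls].values())]

for cls in classes:
    print(cls.ljust(4), " ".join(grid[cls][m] or "." for m in ordered_msas))
```

Sorting the columns by their Y totals is what makes the filled area grow denser from left to right, so rows that stay empty across the whole width stand out.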
Above we can see the green areas, instances of patent activity by MSAs within technology areas, becoming denser as we scan from left to right. This is the expected and uninteresting part; what is non-trivial, however, is the set of gaps, or black areas, shown above. Based on the image above and the corresponding data, we can report that MSAs are lagging in several patent areas (listed in the Appendix).
Conclusion

Based on the plots and numbers presented, I would tentatively argue that MSAs, at least in the United States, show a consistent and super-linear effect in the relationships between population and GDP per capita and between population and patent intensity. Throughout our group we saw increases above the ratio of 1.0, suggesting that greater populations lead to greater returns, in this case on wealth and innovation as measured by our proxy statistics.

Additional data could be collected to investigate the topics pointed out in the Materials and Methods section more thoroughly. So far, what has been shown is correlation; it would be interesting to test for causality and see in which direction the effect is more pronounced: GDP to patent intensity or vice versa. Population was investigated on an MSA level but not taken into consideration by land area, in other words by density. Digging into population density could be helpful in identifying whether there is an optimal MSA for innovation. The top 10 patent classes in particular could be examined further, for example by comparing performance against industry payroll and C-level employee counts (to account for the outsourcing of industry), and by tracking changes in each MSA's top 10 over time to reveal shifts in innovation and economic drivers over decades.
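For readers unfamiliar with the term, "super-linear" refers to a power law of the form Y = Y0 * N^beta with beta > 1, where N is population and Y is an urban output. A minimal sketch of how such an exponent can be estimated by least squares on logged values follows. The data are synthetic, generated from an assumed exponent of 1.15 chosen purely for illustration, and the function name `fit_power_law` is hypothetical; this is not the regression reported in the Appendix.

```python
import math


def fit_power_law(populations, outputs):
    """Least-squares fit of log(Y) = log(Y0) + beta * log(N).

    Returns (Y0, beta). A beta above 1.0 indicates super-linear scaling.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    y0 = math.exp(y_bar - beta * x_bar)
    return y0, beta


# Synthetic, noise-free data built from an ASSUMED exponent of 1.15.
populations = [50_000, 200_000, 1_000_000, 5_000_000, 20_000_000]
outputs = [2.0 * n ** 1.15 for n in populations]

y0, beta = fit_power_law(populations, outputs)
print(f"Y0 = {y0:.3f}, beta = {beta:.3f}")
print("super-linear" if beta > 1.0 else "linear or sub-linear")
```

With beta = 1.15, doubling the population multiplies the output by 2^1.15, or roughly 2.2, which is the sense in which larger populations yield more than proportional returns.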
Appendix

MSA Patenting and Economic Performance

                            OLS Regression Results
==============================================================================
Dep. Variable:       GDP Avg 2008-2012   R-squared:                      0.875
Model:                             OLS   Adj. R-squared:                 0.875
Method:                  Least Squares   F-statistic:                    2410.
Date:                 Tue, 16 Dec 2014   Prob (F-statistic):         3.55e-157
Time:                         12:09:49   Log-Likelihood:               -941.79
No. Observations:                  344   AIC:                            1886.
Df Residuals:                      343   BIC:                            1889.
Df Model:                            1
==============================================================================
                        coef    std err          t      P>|t|   [95.0% Conf. Int.]
------------------------------------------------------------------------------
Pat Int 2001-2005     3.4530      0.070     49.088      0.000     3.315     3.591
==============================================================================
Omnibus:                         3.021   Durbin-Watson:                  1.740
Prob(Omnibus):                   0.221   Jarque-Bera (JB):               2.732
Skew:                           -0.188   Prob(JB):                       0.255
Kurtosis:                        3.222   Cond. No.                        1.00
==============================================================================
Population Influence on Patent Intensity

                            OLS Regression Results
==============================================================================
Dep. Variable:       Pat Int 2001-2005   R-squared:                      0.882
Model:                             OLS   Adj. R-squared:                 0.882
Method:                  Least Squares   F-statistic:                    2561.
Date:                 Tue, 16 Dec 2014   Prob (F-statistic):         3.54e-161
Time:                         12:09:14   Log-Likelihood:               -483.37
No. Observations:                  344   AIC:                            968.7
Df Residuals:                      343   BIC:                            972.6
Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Pop 2000       0.2132      0.004     50.610      0.000         0.205     0.222
==============================================================================
Omnibus:                         9.290   Durbin-Watson:                  1.987
Prob(Omnibus):                   0.010   Jarque-Bera (JB):              10.041
Skew:                            0.314   Prob(JB):                     0.00660
Kurtosis:                        3.553   Cond. No.                        1.00
==============================================================================
Population and MSA GDP

                            OLS Regression Results
==============================================================================
Dep. Variable:       GDP Avg 2008-2012   R-squared:                      0.936
Model:                             OLS   Adj. R-squared:                 0.936
Method:                  Least Squares   F-statistic:                    5032.
Date:                 Tue, 16 Dec 2014   Prob (F-statistic):         4.95e-207
Time:                         12:08:19   Log-Likelihood:               -3684.0
No. Observations:                  344   AIC:                            7370.
Df Residuals:                      343   BIC:                            7374.
Df Model:                            1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Pop 2000    3282.5482     46.276     70.935      0.000      3191.528  3373.568
==============================================================================
Omnibus:                       120.603   Durbin-Watson:                  1.757
Prob(Omnibus):                   0.000   Jarque-Bera (JB):             455.576
Skew:                            1.506   Prob(JB):                    1.18e-99
Kurtosis:                        7.766   Cond. No.                        1.00
==============================================================================
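A note on reading the three regression tables: each reports 343 residual degrees of freedom for 344 observations with a single regressor, which is what a fit with one estimated parameter, that is, without an intercept term, would produce. Whether that was intended is not stated in the paper. Assuming it is the case, the slope and R-squared would be computed along the lines of the sketch below, where the R-squared is the uncentered version that regression software commonly reports for models with no constant. The data and the function name are hypothetical.

```python
def ols_through_origin(xs, ys):
    """Fit y = slope * x with no intercept.

    Returns (slope, uncentered R-squared, residual degrees of freedom).
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    slope = sxy / sxx

    ss_res = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    # Uncentered total sum of squares: measured from zero, not from the mean.
    ss_tot_uncentered = sum(y * y for y in ys)
    r_squared = 1 - ss_res / ss_tot_uncentered

    df_resid = len(xs) - 1  # one estimated parameter (the slope)
    return slope, r_squared, df_resid


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slope, r_squared, df_resid = ols_through_origin(xs, ys)
print(f"slope = {slope:.3f}, uncentered R^2 = {r_squared:.4f}, df = {df_resid}")
```

Because the uncentered R-squared is measured from zero rather than from the mean of the dependent variable, it tends to be high whenever the values sit far from zero, so it is not directly comparable to the R-squared of a model that includes an intercept.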
New York

Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 5212 | 8.462824947 | 1
370 | Multiplex Communications | 3138 | 5.095231136 | 0.602072141
705 | DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing) | 1848 | 3.000633251 | 0.354566385
455 | Telecommunications | 1589 | 2.580089954 | 0.304873369
435 | Chemistry: Molecular Biology and Microbiology | 1509 | 2.450192411 | 0.289524175
532 | Organic Compounds (includes Classes 532-570) | 1473 | 2.391738516 | 0.282617038
375 | Pulse or Digital Communications | 1416 | 2.299186517 | 0.271680737
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1323 | 2.148180623 | 0.253837299
438 | Semiconductor Device Manufacturing: Process | 1295 | 2.102716482 | 0.248465081
707 | DP: Database and File Management or Data Structures (Data Processing) | 1139 | 1.849416273 | 0.218534152
Top 10, % of total: 32.38021011

Boston

Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 3326 | 8.274661027 | 1
435 | Chemistry: Molecular Biology and Microbiology | 2143 | 5.331508894 | 0.644317498
370 | Multiplex Communications | 1397 | 3.475556661 | 0.420024053
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1136 | 2.826222167 | 0.341551413
128 | Surgery (includes Class 600) | 1089 | 2.709292201 | 0.327420325
250 | Radiant Energy | 1004 | 2.497823112 | 0.301864101
707 | DP: Database and File Management or Data Structures (Data Processing) | 994 | 2.472944396 | 0.298857486
606 | Surgery (instruments) | 871 | 2.166936186 | 0.261876127
532 | Organic Compounds (includes Classes 532-570) | 847 | 2.107227267 | 0.254660253
382 | Image Analysis | 631 | 1.569846996 | 0.189717378
Top 10, % of total: 33.43201891
Houston

Class | Class Title | Total | Class % | Class IDX of Top
166 | Wells (shafts or deep borings in the earth, e.g., for oil and gas) | 3259 | 15.49322558 | 1
520 | Synthetic Resins or Natural Rubbers (includes Classes 520-528) | 1272 | 6.047064416 | 0.390303774
175 | Boring or Penetrating the Earth | 1049 | 4.986926551 | 0.321877877
702 | DP: Measuring, Calibrating, or Testing (Data Processing) | 636 | 3.023532208 | 0.195151887
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 551 | 2.619443784 | 0.169070267
324 | Electricity: Measuring and Testing | 537 | 2.552888044 | 0.164774471
585 | Chemistry of Hydrocarbon Compounds | 502 | 2.386498693 | 0.15403498
532 | Organic Compounds (includes Classes 532-570) | 479 | 2.277157119 | 0.1469776
73 | Measuring and Testing | 468 | 2.224863323 | 0.143602332
507 | Earth Boring, Well Treating, and Oil Field Chemistry | 391 | 1.858806751 | 0.119975453
Top 10, % of total: 43.47040647

San Jose

Class | Class Title | Total | Class % | Class IDX of Top
438 | Semiconductor Device Manufacturing: Process | 5418 | 6.050453952 | 1
370 | Multiplex Communications | 4785 | 5.343562598 | 0.88316722
257 | Active Solid-State Devices (e.g., Transistors, Solid-State Diodes) | 3695 | 4.126324723 | 0.681985973
365 | Static Information Storage and Retrieval | 3466 | 3.870593096 | 0.639719454
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 3420 | 3.819223425 | 0.631229236
707 | DP: Database and File Management or Data Structures (Data Processing) | 3219 | 3.594760293 | 0.594130676
360 | Dynamic Magnetic Information Storage or Retrieval | 2789 | 3.114565535 | 0.514765596
711 | Memory (Electrical Computers and Digital Processing Systems) | 2578 | 2.878935084 | 0.475821336
345 | Computer Graphics Processing and Selective Visual Display Systems | 2416 | 2.698024501 | 0.445921004
714 | Error Detection/Correction and Fault Detection/Recovery | 2221 | 2.480261762 | 0.409929863
Top 10, % of total: 37.97670497
Gaps in Patent Activity

901 Robots
902 Electronic funds transfer
903 Hybrid electric vehicles (HEVs)
930 Peptide or protein sequence
968 Horology
976 Nuclear technology
977 Nanotechnology
984 Musical instruments
987 Organic compounds containing a Bi, Sb, As, or P atom or containing a metal atom of the 6th to 8th group of the periodic system
D01 Edible products
D02 Apparel and haberdashery
D03 Travel goods and personal belongings
D04 Brushware
D05 Textile or paper yard goods; sheet material
D06 Furnishings
D07 Equipment for preparing or serving food or drink not elsewhere specified
D08 Tools and hardware
D09 Packages and containers for goods
D10 Measuring, testing, or signalling instruments
D11 Jewelry, symbolic insignia, and ornaments
D12 Transportation
D13 Equipment for production, distribution, or transformation of energy
D14 Recording, communication, or information retrieval equipment
D15 Machines not elsewhere specified
D16 Photography and optical equipment
D17 Musical instruments
D18 Printing and office machinery
D19 Office supplies; artists' and teachers' materials
D20 Sales and advertising equipment
D21 Games, toys, and sports goods
D22 Arms, pyrotechnics, hunting and fishing equipment
D23 Environmental heating and cooling; fluid handling and sanitary equipment
D24 Medical and laboratory equipment
D25 Building units and construction elements
D26 Lighting
D27 Tobacco and smokers' supplies
D28 Cosmetic products and toilet articles
D29 Equipment for safety, protection, and rescue
D30 Animal husbandry
D32 Washing, cleaning, or drying machine
D34 Material or article handling equipment
D99 Miscellaneous
G9B Information storage based on relative movement between record carrier and transducer
PLT Plants