Summarizes the results of my MA thesis on archaeological predictive modeling. In addition to a summary of the southwest Kansas model, this presentation also contains reference to previous work at Fort Hood, Texas and possible model enhancements.
Archaeological Predictive Model for the High Plains of Southwestern Kansas
Plains Conference 2006
Topeka, Kansas
Joshua S. Campbell
Department of Geography
University of Kansas
Project Outline
• Use existing data sources to develop an empirical predictive model of open-air archaeological sites on the High Plains of southwestern Kansas
• The binary logistic regression model relates the presence or absence of archaeological material to geographic variables extracted from modern map data
• Final output is a probability surface in which each raster cell contains a probability score describing its environmental similarity to known site locations
Point of Rocks
Middle Spring
Archaeological Sample Selection
• Training Sample
  • Site (1) = 151 sites (7,917 cells)
  • Nonsite (0) = 12,303 random cells
• Testing Sample
  • Site (1) = 75 sites (3,344 cells)
  • Nonsite (0) = 3,142 random cells
• Over 90% of sites are less than 2,000 years old (Brown, 1978)
A Priori Probabilities
• Site-present event class {S}:
  • Pr(S) = 11,261 / 20,440,315 = 0.000551
  • (0.05% of all 30m² land parcels contain a site)
• Morton County = 0.3%
• Site-absent event class {S’}:
  • Pr(S’) = 20,429,054 / 20,440,315 = 0.999449
  • (99.95% of all land parcels do not contain a site)
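The base rates above are plain cell-count proportions. A minimal sketch of that arithmetic, using only the counts given on the slides (the function and constant names are illustrative, not from the original work):

```python
# Cell counts taken from the slides.
SITE_CELLS = 11_261        # training (7,917) + testing (3,344) site cells
TOTAL_CELLS = 20_440_315   # all raster cells in the study area


def prior_probabilities(site_cells: int, total_cells: int) -> tuple[float, float]:
    """Return (Pr(S), Pr(S')) as simple proportions of raster cells."""
    pr_site = site_cells / total_cells
    return pr_site, 1.0 - pr_site


pr_s, pr_not_s = prior_probabilities(SITE_CELLS, TOTAL_CELLS)
print(f"Pr(S)  = {pr_s:.6f}  ({pr_s:.2%} of cells)")
print(f"Pr(S') = {pr_not_s:.6f}  ({pr_not_s:.2%} of cells)")
```

Because the site-present class is so rare, any useful model has to be judged against this very low base rate rather than against raw accuracy.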
Geographic Variables
• Slope
• Relief (3): 150m, 300m, and 600m radius
• Shelter Index
• Distance to Intermittent Water
• Distance to Perennial Water
• Distance to Playa Lake
• Landforms
Landform Variable
• Generated by reclassifying the 229 soil series in the study area into 6 landform categories
  • Upland
  • Slopes
  • Floodplain
  • Sand Dunes
  • Semi-Sand
  • Playas (buffered to 90m to represent activity zone)
Landform – Site Distribution
Sites are differentially located with respect to landform, while the Non-Site sample reflects a random sampling of the overall landscape

Landform      Entire Landscape   All Sites   Non-Sites
Floodplain     6.0%               9.1%        5.9%
Upland        55.8%              14.7%       56.2%
Semi-Sand     13.0%               9.6%       12.5%
Slopes         7.5%              39.1%        6.7%
Sand          15.5%              27.0%       16.7%
Playa          2.2%               0.5%        2.0%
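One way to read the landform table is to divide each landform's share of sites by its share of the landscape: values above 1 mean sites are over-represented there. The sketch below does that with the percentages from the table; the ratio is an illustrative summary and was not part of the original analysis.

```python
# (share of entire landscape %, share of all sites %) from the landform table.
LANDFORM_SHARES = {
    "Floodplain": (6.0, 9.1),
    "Upland": (55.8, 14.7),
    "Semi-Sand": (13.0, 9.6),
    "Slopes": (7.5, 39.1),
    "Sand": (15.5, 27.0),
    "Playa": (2.2, 0.5),
}


def site_to_landscape_ratio(shares: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Ratio of site share to landscape share for each landform."""
    return {name: sites / landscape for name, (landscape, sites) in shares.items()}


ratios = site_to_landscape_ratio(LANDFORM_SHARES)
# Print landforms from most to least over-represented among sites.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<11} {ratio:.2f}")
```

On these figures, Slopes carry roughly five times as many sites as their area alone would suggest, while Upland and Playa carry far fewer.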
Predictive Model
• The ‘Z’ equation is then defined as:
  Z = -2.701459
      + (Dist. to Intermittent Water * -0.000328)
      + (Dist. to Perennial Water * -0.000053)
      + (Floodplain * 0.005184)
      + (Upland * -0.969463)
      + (Semi-Sand * 0.509768)
      + (Slopes * 1.535854)
      + (Sand * 0.867705)
      + (Relief150 * -0.009843)
      + (Relief600 * -0.012883)
      + (Shelter Index * 0.001617)
      + (Slope * -0.025225)
• This equation was entered into the GIS and then further modified by the probability equation:
  Probability (event) = 1 / (1 + e^(-Z))
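The two equations above can be applied to a single raster cell as follows. This is a minimal sketch, not the GIS implementation used in the project. The coefficients are copied from the slide; the dictionary keys, the 0/1 coding of the landform terms (with Playa as the omitted category), and the example cell values are assumptions made for illustration.

```python
import math

# Intercept and coefficients copied from the Z equation on the slide.
INTERCEPT = -2.701459
COEFFICIENTS = {
    "dist_intermittent": -0.000328,
    "dist_perennial": -0.000053,
    "floodplain": 0.005184,   # landform terms assumed to be 0/1 indicators
    "upland": -0.969463,
    "semi_sand": 0.509768,
    "slopes": 1.535854,
    "sand": 0.867705,
    "relief150": -0.009843,
    "relief600": -0.012883,
    "shelter_index": 0.001617,
    "slope": -0.025225,
}


def site_probability(cell: dict[str, float]) -> float:
    """Logistic probability 1 / (1 + e^-Z) for one cell; missing variables count as 0."""
    z = INTERCEPT + sum(coef * cell.get(name, 0.0) for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical cells that differ only in landform (values are made up).
base = {"dist_intermittent": 200.0, "dist_perennial": 1500.0, "relief150": 5.0,
        "relief600": 12.0, "shelter_index": 10.0, "slope": 2.0}
slopes_cell = {**base, "slopes": 1.0}
upland_cell = {**base, "upland": 1.0}
print(f"slopes cell: {site_probability(slopes_cell):.3f}")
print(f"upland cell: {site_probability(upland_cell):.3f}")
```

In a GIS the same expression would be evaluated for every cell at once with a raster calculator, producing the probability surface described in the project outline.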
Model Evaluation
• Determine the number of sample locations accurately predicted in the Site-Present and Site-Absent event classes
  • Testing uses a set of sites and non-sites withheld from model development
• A ‘cut-point’ is established at the value at which 85% of sites are correctly classified
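The cut-point rule can be sketched as a search over the predicted scores of the withheld site cells: take the highest threshold that still classifies the target share of sites as Site-Present. The function names and the sample scores below are hypothetical, and the original work may have located the cut-point differently (for example, by reading it off the cumulative distribution plot).

```python
import math


def cut_point_for_site_accuracy(site_scores: list[float], target: float = 0.85) -> float:
    """Highest threshold at which at least `target` of site scores are >= threshold."""
    if not site_scores:
        raise ValueError("site_scores must not be empty")
    ranked = sorted(site_scores, reverse=True)
    needed = math.ceil(target * len(ranked))  # number of sites that must be captured
    return ranked[needed - 1]


def classification_accuracy(site_scores: list[float], nonsite_scores: list[float],
                            cut_point: float) -> tuple[float, float]:
    """Share of sites scored >= cut_point and share of non-sites scored below it."""
    site_correct = sum(s >= cut_point for s in site_scores) / len(site_scores)
    nonsite_correct = sum(s < cut_point for s in nonsite_scores) / len(nonsite_scores)
    return site_correct, nonsite_correct


# Hypothetical predicted probabilities for withheld testing cells.
test_sites = [0.05, 0.12, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
test_nonsites = [0.01, 0.03, 0.08, 0.15, 0.40]
cut = cut_point_for_site_accuracy(test_sites, target=0.85)
print(cut, classification_accuracy(test_sites, test_nonsites, cut))
```

Fixing site accuracy first and then checking how many non-sites fall below the threshold reflects the trade-off on the slide: missing a site is treated as the more costly error.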
[Figure: Predicted Probability of Known Non-Sites (series: Testing Non-Sites; x-axis: Probability %; y-axis: Number of Cells)]
[Figure: Predicted Probability of Known Sites (series: Testing Sites; x-axis: Probability %; y-axis: Number of Cells)]
[Figure: Cumulative Distribution of Testing Pixels (series: Non-Sites, Sites; x-axis: Probability Score; y-axis: Percentage of Correct Samples)]
Cut-Point
• 85% of the Site-Present testing sample was accurately predicted at the probability value 0.11
• At the 0.11 cut-point, 41% of the landscape is classified as in the Site-Present class
  • Optimally this value would be 33% or less, but the current results are in line with other published results
Conclusions
• A site is 2.15x more likely to be found in a cell classified as Site-Present than by random chance alone
  • A site is 0.25x as likely to be found in the Site-Absent class as by random chance
  • Model performs equally well in Morton County, even with the higher base-rate probability
• Considering there are over 20 million cells in the study area, this represents a significant increase over random
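A rough way to see where multipliers like these come from is to divide the share of sites captured by a class by the share of the landscape that class covers. The sketch below applies that ratio to the rounded cut-point figures (85% of sites, 41% of the landscape). It is an assumption that the reported values were derived this way: the rounded inputs give about 2.07 and 0.25, so the 2.15 figure presumably reflects unrounded values or a slightly different calculation.

```python
def ratio_over_chance(share_of_sites: float, share_of_landscape: float) -> float:
    """How much more (or less) likely a site is in a class than under random placement."""
    return share_of_sites / share_of_landscape


SITES_CAPTURED = 0.85         # share of testing sites at or above the 0.11 cut-point
LANDSCAPE_CLASSIFIED = 0.41   # share of the landscape classified Site-Present

site_present = ratio_over_chance(SITES_CAPTURED, LANDSCAPE_CLASSIFIED)
site_absent = ratio_over_chance(1 - SITES_CAPTURED, 1 - LANDSCAPE_CLASSIFIED)
print(f"Site-Present: {site_present:.2f}x chance")
print(f"Site-Absent:  {site_absent:.2f}x chance")
```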
Model Enhancements
• Qualitative / ‘non-universe’ data integration
• Model results could be negatively impacted by including a dataset in which the complete universe is not known, or that has a very low occurrence
• Springs, quarries, trails, rockshelters, rock art, etc.
• A synthetic approach is proposed that integrates a statistical model as a ‘baseline’ with the addition of specific features
Model Enhancements
• Landscape change / Subsurface modeling
  • Predict the age and cultural significance of a subsurface location (voxel)
  • Accurate paleo-landscape reconstructions (Fort Riley – CHILD model)
  • Multi-temporal archaeological models based on the paleo-landscape reconstructions
• A synthetic approach combining a geoarchaeological and statistical surface model has been developed for Fort Hood, Texas
Figure 8
Local Collectors (What about the living rooms?)
• Significant collections are in the hands of an aging group of collectors
• Archaeological resources are out there; it is up to the research community and the State to put together an integrated approach to address the issue
• It is hoped this model will help in that process
Joshua S. Campbell
Email: [email protected]
Department of Geography
University of Kansas