Archaeological Predictive Model for the High Plains of Southwestern Kansas

Plains Conference 2006
Topeka, Kansas

Joshua S. Campbell
Department of Geography
University of Kansas

Project Outline

• Use existing data sources to develop an empirical predictive model of open-air archaeological sites on the High Plains of southwestern Kansas

• The binary logistic regression model relates the presence or absence of archaeological material to geographic variables extracted from modern map data

• The final output is a probability surface in which each raster cell contains a probability score describing its environmental similarity to known site locations

Point of Rocks

Middle Spring

Archaeological Sample Selection

• Training sample
  • Site (1) = 151 sites (7,917 cells)
  • Nonsite (0) = 12,303 random cells

• Testing sample
  • Site (1) = 75 sites (3,344 cells)
  • Nonsite (0) = 3,142 random cells

• Over 90% of sites are less than 2,000 years old (Brown, 1978)

A Priori Probabilities

• Site-present event class {S}:
  • Pr(S) = 11,261 / 20,440,315 ≈ 0.000551
  • (about 0.055% of all 30 m × 30 m land parcels contain a site)

• Morton County base rate: Pr(S) = 0.3%

• Site-absent event class {S’}:
  • Pr(S’) = 20,429,054 / 20,440,315 ≈ 0.999449
  • (about 99.945% of all land parcels do not contain a site)
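The a priori probabilities are simple cell-count ratios. The sketch below reproduces them; it assumes that the 11,261 site cells are the training and testing site cells combined (7,917 + 3,344), which matches the slide total exactly. The function name is illustrative.

```python
# Counts from the slide. Assumption: the 11,261 site cells are the
# training (7,917) and testing (3,344) site cells combined.
SITE_CELLS = 7_917 + 3_344
TOTAL_CELLS = 20_440_315  # all 30 m cells in the study area


def prior_probabilities(site_cells, total_cells):
    """Return (Pr(S), Pr(S')): the shares of cells with and without a site."""
    if not 0 <= site_cells <= total_cells:
        raise ValueError("site_cells must lie between 0 and total_cells")
    p_site = site_cells / total_cells
    return p_site, 1.0 - p_site


p_s, p_not_s = prior_probabilities(SITE_CELLS, TOTAL_CELLS)
print(f"Pr(S)  = {p_s:.6f}")      # site-present class: 0.000551
print(f"Pr(S') = {p_not_s:.6f}")  # site-absent class:  0.999449
```

The two probabilities sum to one by construction, so the site-absent count (20,429,054) is simply the total minus the site cells.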

Geographic Variables

• Slope

• Relief at three radii: 150 m, 300 m, and 600 m

• Shelter Index

• Distance to Intermittent Water

• Distance to Perennial Water

• Distance to Playa Lake

• Landforms
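Two of the variable families above, relief within a radius and distance to a water feature, can be sketched on a toy grid. This is a minimal illustration under stated assumptions: 30 m cells, relief taken as the elevation range (max minus min) inside a circular window, and straight-line distance between cell centres. The slides do not define these operations, and the function names are hypothetical.

```python
import math

CELL_SIZE_M = 30.0  # assumed raster resolution (30 m cells)


def local_relief(dem, radius_m, cell=CELL_SIZE_M):
    """Elevation range (max - min) inside a circular window of radius_m.

    Assumption: "relief" means the local elevation range; the slides do not
    give the exact definition or window shape.
    """
    rows, cols = len(dem), len(dem[0])
    r = int(radius_m // cell)  # window half-width in cells
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                dem[i + di][j + dj]
                for di in range(-r, r + 1)
                for dj in range(-r, r + 1)
                if 0 <= i + di < rows and 0 <= j + dj < cols
                and math.hypot(di, dj) * cell <= radius_m
            ]
            out[i][j] = max(window) - min(window)
    return out


def distance_to_feature(mask, cell=CELL_SIZE_M):
    """Euclidean distance (m) from each cell centre to the nearest True cell.

    Brute force over all feature cells: fine for a toy grid, far too slow for
    a 20-million-cell study area. Raises ValueError if mask has no True cell.
    """
    targets = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not targets:
        raise ValueError("mask contains no feature cells")
    return [
        [min(math.hypot(i - ti, j - tj) for ti, tj in targets) * cell
         for j in range(len(mask[0]))]
        for i in range(len(mask))
    ]


dem = [[100, 101, 102], [100, 105, 103], [99, 100, 101]]
water = [[False, False, True], [False, False, False], [False, False, False]]
print(local_relief(dem, 30.0))
print(distance_to_feature(water))
```

In a real GIS workflow these would be focal-statistics and Euclidean-distance raster operations; the loops here only make the definitions explicit.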

Landform Variable

• Generated by reclassifying the 229 soil series in the study area into 6 landform categories:
  • Upland
  • Slopes
  • Floodplain
  • Sand Dunes
  • Semi-Sand
  • Playas (buffered to 90 m to represent the activity zone)
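Because the Z equation in the Predictive Model section has a coefficient for every landform except Playa, Playa appears to act as the reference category, with the other landforms entered as 0/1 indicators. A minimal sketch of that reading follows. The soil series keys are hypothetical placeholders for the 229 real series, and the 90 m playa buffer is omitted.

```python
LANDFORMS = ("Upland", "Slopes", "Floodplain", "Sand", "Semi-Sand", "Playa")
REFERENCE = "Playa"  # assumed baseline: the Z equation has no Playa term

# Hypothetical lookup table. The real one assigns each of the 229 soil
# series in the study area to a landform; these keys are placeholders.
SERIES_TO_LANDFORM = {
    "series_A": "Upland",
    "series_B": "Slopes",
    "series_C": "Floodplain",
    "series_D": "Sand",
    "series_E": "Semi-Sand",
    "series_F": "Playa",
}


def landform_indicators(series):
    """Reclassify a soil series, then return 0/1 flags for each non-reference landform."""
    landform = SERIES_TO_LANDFORM[series]
    return {lf: int(lf == landform) for lf in LANDFORMS if lf != REFERENCE}


print(landform_indicators("series_B"))
```

A Playa cell gets all-zero indicators, so its landform contribution to Z is just the intercept.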

Landform – Site Distribution

Sites are differentially located with respect to landform, while the Non-Site sample reflects a random sampling of the overall landscape.

Landform      Entire Landscape   All Sites   Non-Sites
Floodplain          6.0%            9.1%        5.9%
Upland             55.8%           14.7%       56.2%
Semi-Sand          13.0%            9.6%       12.5%
Slopes              7.5%           39.1%        6.7%
Sand               15.5%           27.0%       16.7%
Playa               2.2%            0.5%        2.0%
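One simple way to put a number on "differentially located" is to divide each landform's share of sites by its share of the landscape. Values above 1 mean that sites occur on that landform more often than its area alone would suggest. This ratio is an illustration computed from the table, not a statistic reported in the slides.

```python
# (entire landscape %, all sites %) from the landform table
LANDFORM_PCT = {
    "Floodplain": (6.0, 9.1),
    "Upland": (55.8, 14.7),
    "Semi-Sand": (13.0, 9.6),
    "Slopes": (7.5, 39.1),
    "Sand": (15.5, 27.0),
    "Playa": (2.2, 0.5),
}


def site_to_area_ratio(landscape_pct, site_pct):
    """Share of sites divided by share of landscape (>1 = over-represented)."""
    return site_pct / landscape_pct


for landform, (area, sites) in LANDFORM_PCT.items():
    print(f"{landform:<10} {site_to_area_ratio(area, sites):5.2f}")
```

Slopes stands out at about 5.2, while Upland is about 0.26, which matches the pattern described above.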

Predictive Model

• The ‘Z’ equation is then defined as:

$$
\begin{aligned}
Z = {} & -2.701459 - 0.000328\,(\text{Dist. to Intermittent}) - 0.000053\,(\text{Dist. to Perennial}) \\
& + 0.005184\,(\text{Floodplain}) - 0.969463\,(\text{Upland}) + 0.509768\,(\text{Semi-Sand}) \\
& + 1.535854\,(\text{Slopes}) + 0.867705\,(\text{Sand}) \\
& - 0.009843\,(\text{Relief}_{150}) - 0.012883\,(\text{Relief}_{600}) \\
& + 0.001617\,(\text{Shelter Index}) - 0.025225\,(\text{Slope})
\end{aligned}
$$

• This equation was entered into the GIS and then transformed by the probability (logistic) equation:

$$
\text{Probability(event)} = \frac{1}{1 + e^{-Z}}
$$
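The two equations can be evaluated per cell as below. The coefficients are copied from the Z equation. The variable names are hypothetical, and the units of the continuous variables are whatever the model was fitted with, which the slides do not state. This is a sketch of the calculation, not the GIS implementation that was used.

```python
import math

INTERCEPT = -2.701459

# Coefficients transcribed from the Z equation. Landform terms are 0/1
# indicators (Playa has no term); units of the continuous variables are
# not given in the slides.
COEFFICIENTS = {
    "dist_intermittent": -0.000328,
    "dist_perennial": -0.000053,
    "floodplain": 0.005184,
    "upland": -0.969463,
    "semi_sand": 0.509768,
    "slopes": 1.535854,
    "sand": 0.867705,
    "relief150": -0.009843,
    "relief600": -0.012883,
    "shelter_index": 0.001617,
    "slope": -0.025225,
}


def z_score(cell):
    """Linear predictor Z for one raster cell (every variable must be present)."""
    return INTERCEPT + sum(b * cell[name] for name, b in COEFFICIENTS.items())


def site_probability(cell):
    """Logistic transform of Z: Probability(event) = 1 / (1 + e^-Z)."""
    return 1.0 / (1.0 + math.exp(-z_score(cell)))


# Hypothetical cell on the Slopes landform; all values are illustrative only.
example = {name: 0.0 for name in COEFFICIENTS}
example.update(slopes=1.0, dist_intermittent=500.0, relief150=10.0)
print(round(site_probability(example), 3))
```

Requiring every variable (a `KeyError` otherwise) is deliberate. Silently treating a missing layer as 0 would bias Z without warning.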

Model Evaluation

• Determine the number of sample locations accurately predicted in the Site-Present and Site-Absent event classes

• Testing uses a set of sites and non-sites withheld from model development

• A ‘cut-point’ is established at the value at which 85% of sites are correctly classified
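The cut-point rule can be written as "the highest probability threshold that still places at least 85% of the testing sites in the Site-Present class." A minimal sketch, assuming a site counts as correctly classified when its score is at or above the threshold. The function name is hypothetical.

```python
import math


def cut_point(site_probs, target=0.85):
    """Highest threshold at which at least `target` of testing sites have p >= threshold."""
    if not site_probs:
        raise ValueError("need at least one testing site")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ranked = sorted(site_probs, reverse=True)
    # Number of sites that must fall at or above the cut-point; round() guards
    # against floating-point noise such as 16.999999999999996.
    k = math.ceil(round(target * len(ranked), 9))
    return ranked[k - 1]


probs = [i / 20 for i in range(1, 21)]  # hypothetical testing-site scores
print(cut_point(probs))
```

With the 20 hypothetical scores above, 17 of 20 sites (85%) score at or above the returned value.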

[Figure: Predicted Probability of Known Non-Sites (Testing Non-Sites); x-axis: Probability %; y-axis: Number of Cells]

[Figure: Predicted Probability of Known Sites (Testing Sites); x-axis: Probability %; y-axis: Number of Cells]

[Figure: Cumulative Distribution of Testing Pixels (Sites and Non-Sites); x-axis: Probability Score (0.0 to 1.0); y-axis: Percentage of Correct Samples]
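One plausible reading of the cumulative chart is that, for each threshold, it plots the share of testing sites scoring at or above the threshold and the share of testing non-sites scoring below it. That is an interpretation of the axis labels, not something the slides spell out. The sketch uses hypothetical scores.

```python
def percent_correct(site_probs, nonsite_probs, threshold):
    """Return (share of sites with p >= threshold, share of non-sites with p < threshold)."""
    sites_ok = sum(p >= threshold for p in site_probs) / len(site_probs)
    nonsites_ok = sum(p < threshold for p in nonsite_probs) / len(nonsite_probs)
    return sites_ok, nonsites_ok


# Hypothetical testing scores, for illustration only.
sites = [0.05, 0.2, 0.6, 0.9]
nonsites = [0.01, 0.05, 0.3, 0.5]

for t in [i / 10 for i in range(11)]:  # 0.0, 0.1, ..., 1.0 as on the x-axis
    s_ok, n_ok = percent_correct(sites, nonsites, t)
    print(f"{t:.1f}  sites {s_ok:.0%}  non-sites {n_ok:.0%}")
```

Raising the threshold trades site accuracy for non-site accuracy. The cut-point is where the site curve reaches the chosen 85% level.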

Cut-Point

• 85% of the Site-Present testing sample was accurately predicted at the probability value 0.11

• At the 0.11 cut-point, 41% of the landscape is classified in the Site-Present class

• Ideally this value would be 33% or less, but the current results are in line with other published results
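The "41% of the landscape" figure is the share of all raster cells whose score reaches the cut-point. A minimal sketch of that count, assuming no-data cells are stored as `None` and excluded. The function name and grid are hypothetical.

```python
def share_site_present(prob_surface, cut=0.11):
    """Fraction of valid raster cells with probability >= cut (None = no data)."""
    cells = [p for row in prob_surface for p in row if p is not None]
    return sum(p >= cut for p in cells) / len(cells)


# Tiny hypothetical probability surface.
surface = [[0.05, 0.2], [0.11, None], [0.01, 0.09]]
print(share_site_present(surface))
```

The lower this share is at a fixed 85% site-capture rate, the more the model narrows the area that needs attention.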

Conclusions

• A site is 2.15 times more likely to be found in a cell classified as Site-Present than by random chance alone

• A site is only 0.25 times as likely to be found in a cell classified as Site-Absent as by random chance

• The model performs equally well in Morton County, even with its higher base-rate probability

• Considering that there are over 20 million cells in the study area, this represents a substantial improvement over random chance
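Ratios like these follow from Bayes' rule. The chance of finding a site in a class, relative to the base rate, is Pr(site | class) / Pr(site) = Pr(class | site) / Pr(class), which is the share of sites in the class divided by the class's share of the landscape. The sketch plugs in the rounded slide values (85% of sites and 41% of the landscape in Site-Present). It gives about 0.25 for Site-Absent, but about 2.07 rather than 2.15 for Site-Present. The unrounded inputs behind the 2.15x figure are not given here.

```python
def lift(share_of_sites_in_class, share_of_area_in_class):
    """Pr(site | class) / Pr(site), computed as Pr(class | site) / Pr(class)."""
    return share_of_sites_in_class / share_of_area_in_class


present = lift(0.85, 0.41)  # Site-Present class (rounded slide values)
absent = lift(0.15, 0.59)   # Site-Absent class: the remaining sites and area
print(f"Site-Present: {present:.2f}x  Site-Absent: {absent:.2f}x")
```

Because the ratio is taken relative to the base rate, it does not depend on Pr(S) itself. This is consistent with the model performing similarly in Morton County despite its higher base rate.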

Model Enhancements

• Qualitative / ‘non-universe’ data integration

• Model results could be negatively impacted by including a dataset for which the complete universe is not known, or that has a very low occurrence

• Springs, quarries, trails, rockshelters, rock art, etc.

• A synthetic approach is proposed that integrates a statistical model as a ‘baseline’ with the addition of specific features
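One hypothetical way to realize such a synthetic approach is a simple overlay. A cell stays Site-Present if the statistical baseline puts it there, and it is also promoted to Site-Present if it lies within a buffer of a known specific feature such as a spring or quarry. The 300 m buffer, the coordinate representation, and the function name are all illustrative assumptions, not details from the slides.

```python
import math


def synthetic_class(prob, cell_xy, features_xy, cut=0.11, buffer_m=300.0):
    """Return 1 (Site-Present) if the baseline model or a nearby feature says so.

    Hypothetical overlay rule: baseline score >= cut, OR the cell centre lies
    within buffer_m metres of any known feature point (projected coordinates).
    """
    if prob >= cut:
        return 1
    return int(any(math.dist(cell_xy, f) <= buffer_m for f in features_xy))


springs = [(200.0, 0.0)]  # hypothetical feature location (m)
print(synthetic_class(0.05, (0.0, 0.0), springs))
```

An overlay like this leaves the statistical model untouched. The sparse, incompletely surveyed feature data only add Site-Present cells, so they do not need to be entered as regression predictors.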

Model Enhancements

• Landscape change / subsurface modeling
  • Predict the age and cultural significance of a subsurface location (voxel)
  • Accurate paleo-landscape reconstructions (Fort Riley – CHILD model)
  • Multi-temporal archaeological models based on the paleo-landscape reconstructions

• A synthetic approach combining a geoarchaeological model and a statistical surface model has been developed for Fort Hood, Texas

Figure 8

Local Collectors (What about the living rooms?)

• Significant collections are in the hands of an aging group of collectors

• Archaeological resources are out there; it is up to the research community and the State to put together an integrated approach to address the issue

• It is hoped that this model will help in that process

Joshua S. Campbell
Email: [email protected]

Department of Geography
University of Kansas
