Social Science-Conscious Analysis Case Study: The Cost of Public School

Riley H

Why The Cost of Public School?

New York City has some of the best and worst schools in the state, and in the country.

Sometimes these are right next to each other.

A Closer Look at Adjacent Schools

P.S. 11 in Midtown West performed worse than 60% of all schools in New York State.

P.S. 59 in Midtown East, a 10-minute walk away, is the 19th-best elementary school in the state.

The Problem

In a perfect world, how would you answer your question?

For us, the perfect solution involved selling identical houses directly across a school-zone boundary from each other. We’d then measure the price difference. It was important that the other neighborhood factors that drive price stay as similar as possible between the two houses, so that the only price difference we measured was the one associated with the school.
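As a rough sketch of that ideal experiment, assuming you could actually find matched pairs of near-identical homes on either side of a zone boundary, the school's price effect is simply the average within-pair difference. The pair data and the `school_premium` helper below are hypothetical, made up for illustration.

```python
from statistics import mean

# Hypothetical, made-up pairs: each pair is two near-identical homes on
# opposite sides of a school-zone boundary (zone A vs. zone B).
pairs = [
    {"pair": 1, "zone_a_price": 950_000, "zone_b_price": 870_000},
    {"pair": 2, "zone_a_price": 1_200_000, "zone_b_price": 1_110_000},
    {"pair": 3, "zone_a_price": 780_000, "zone_b_price": 735_000},
]


def school_premium(pairs):
    """Average within-pair price difference (zone A minus zone B).

    Because each pair is matched on everything except the school zone,
    the within-pair difference is what gets attributed to the school.
    """
    return mean(p["zone_a_price"] - p["zone_b_price"] for p in pairs)


print(school_premium(pairs))
```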

With unlimited data, how would you demonstrate your hypothesis was true?

Identifying an exact method to nail down the problem we want to solve is sometimes the hardest step.

Start by detailing your “ideal experiment”: what you would do with all the data you could ever want.

From there, you can break it down into pieces that are possible.

What can you actually acquire?

With few exceptions, high-quality data and computational time are in extremely short supply!

Cut down your question based on what data you can acquire, but make sure you remain true to the core social issue!

For The Cost of Public School Project

We focused on the following:

● What data do we need on housing?

● What types of housing can we acquire, and how will the data we can't get affect the experiment?

● What factors other than the housing itself could affect its price, and how can we get accurate data for them and quantify them?

The Data

Community Data Sites

Community data sites are great if they’re available. They can be a godsend for projects like these if the community in question has been diligent in modernizing its processes.

Unfortunately, most cities still use handwritten forms for much of their work, and those details end up scanned into the system as the dreaded PDF, in barely readable type. In other words, useless.

NOOOO! NOT HANDWRITTEN PDFS!

OUR PRECIOUS DATA…

LOST TO THE ETHER! D:

Caveats of Third Party Sites

● May not be free and clear to use, even just for research purposes. Make sure you check the terms!

● Limits on how much data you can get in a period of time.

● May require a sign up and approval process before allowing API usage.

● API may be slow.

● Pulling data in general moves slowly.
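One way to stay inside a provider's request limits is to throttle on your side before each call. This is a minimal sliding-window sketch; `RateLimiter` is a hypothetical helper, not part of any site's SDK, and the actual limits should come from the provider's terms.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window).

    Hypothetical helper: plug in whatever limits the provider's terms state.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def _expire(self, now):
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._expire(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._expire(now)
        self.calls.append(now)
```

In use, you'd create something like `RateLimiter(max_calls=60, period=60.0)` and call `wait()` before every request. The `clock` and `sleep` parameters are there so the limiter can be tested without actually waiting.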

Fixing the Data: Sometimes Your Research Needs Researching

Preliminary data exploration is important to make sure what you have makes sense.

But what does “sense” refer to?

In some cases, it will be obvious, but not in all of them. Cross-referencing what you have with other sources of information may save you trouble later!
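For example, one cheap cross-check is to compare per-neighborhood medians in your data against figures from an independent source and flag big disagreements. The `flag_against_reference` helper, the area names, the reference numbers, and the 50% tolerance are all hypothetical placeholders.

```python
from statistics import median


def flag_against_reference(rows, reference, tolerance=0.5):
    """Flag areas whose median price differs from an external reference
    median by more than `tolerance` (a fraction of the reference)."""
    by_area = {}
    for row in rows:
        by_area.setdefault(row["area"], []).append(row["price"])
    flagged = {}
    for area, prices in by_area.items():
        ref = reference.get(area)
        if ref is None:
            continue  # no independent figure to compare against
        m = median(prices)
        if abs(m - ref) / ref > tolerance:
            flagged[area] = (m, ref)
    return flagged


# Hypothetical listings and reference medians from "another source".
listings = [
    {"area": "Area A", "price": 800_000},
    {"area": "Area A", "price": 900_000},
    {"area": "Area B", "price": 90_000},
    {"area": "Area B", "price": 110_000},
]
reference = {"Area A": 820_000, "Area B": 700_000}
print(flag_against_reference(listings, reference))
```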

Well, the data looks okay...

Cursory summaries of the data (means, medians, quartiles, and ranges) may not show anything particularly strange, even when something is wrong.

Check for duplicate rows and for incorrect values that look realistic enough to slip past you!

These are common side effects of using an API from a third-party site, and they won’t be easy to find!
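A minimal sketch of a duplicate check, assuming you can name the columns that should uniquely identify a listing. Here two hypothetical rows share an address and unit but carry different IDs, so an exact-row comparison would miss them; keying on `address` and `unit` (an assumption about what identifies a listing) catches them.

```python
from collections import Counter


def find_duplicates(rows, key_fields):
    """Return key tuples that appear more than once, with their counts.

    `key_fields` names the columns that should uniquely identify a listing;
    which columns those are is a judgment call for your data.
    """
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


listings = [  # hypothetical rows, e.g. as returned by a third-party API
    {"id": 101, "address": "123 W 45th St", "unit": "4B", "price": 650_000},
    {"id": 102, "address": "123 W 45th St", "unit": "4B", "price": 650_000},
    {"id": 103, "address": "200 E 57th St", "unit": "12A", "price": 1_100_000},
]
print(find_duplicates(listings, ["address", "unit"]))
# → {('123 W 45th St', '4B'): 2}
```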

Feature or Flat Wrong?

After getting odd results from our regression models, we went back to the data and found many listings with very small square footage. Some were clearly wrong, like listings of 10 square feet. Others were merely dubious, since NYC apartments really can be tiny.

Where should we have drawn the line? You may find yourself making this sort of judgement, and that’s where your community research comes in handy!
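One way to make that judgment explicit is a two-threshold triage: drop values that can't be real, and set aside borderline ones for a closer look. The `triage_sqft` helper and both cut-offs are hypothetical placeholders, not the lines we actually drew.

```python
def triage_sqft(rows, hard_min=100, soft_min=250):
    """Split listings into (keep, dubious, drop) by square footage.

    Both cut-offs are hypothetical; community research should set them.
    """
    keep, dubious, drop = [], [], []
    for row in rows:
        sqft = row["sqft"]
        if sqft < hard_min:
            drop.append(row)     # e.g. 10 sq ft: almost certainly an entry error
        elif sqft < soft_min:
            dubious.append(row)  # possible for a tiny NYC studio; check by hand
        else:
            keep.append(row)
    return keep, dubious, drop
```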

Reasonable results don’t always mean you have good data.

The Model

Yay! It’s a Clean Dataset!

After a lot of hard work, we finally have what we need to proceed: a beautiful, clean data set.

At this point, you will probably notice that your clean data is substantially smaller than what you originally had, perhaps too small to carry out your original experiment.

You can try to find more data, or use a model!

Modeling For a New Purpose

We used our model to generate the data we were missing so that we could actually complete the experiment, rather than using its predictions directly as results.

With our secondary experiment in mind, we constructed a set of “fake” housing data to give us price averages in areas of New York City that our third-party site did not cover.
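Here is a minimal sketch of that idea, assuming a model that has already been fitted: place the same template homes in each area, predict their prices, and average. The coefficients, area effects, area names, and templates are all hypothetical stand-ins, not values from the study.

```python
from statistics import mean

# Hypothetical fitted coefficients; in practice these come from the regression.
COEFS = {"intercept": 50_000.0, "sqft": 900.0, "bedrooms": 40_000.0}
# Hypothetical per-area adjustments learned by the same model.
AREA_EFFECT = {"Area X": 120_000.0, "Area Y": -30_000.0}


def predict(listing):
    """Linear prediction from the hypothetical coefficients above."""
    return (COEFS["intercept"]
            + COEFS["sqft"] * listing["sqft"]
            + COEFS["bedrooms"] * listing["bedrooms"]
            + AREA_EFFECT[listing["area"]])


def synthetic_area_average(area, templates):
    """Put identical template homes in `area` and average predicted prices."""
    fake = [dict(t, area=area) for t in templates]
    return mean(predict(home) for home in fake)


templates = [{"sqft": 700, "bedrooms": 1}, {"sqft": 1000, "bedrooms": 2}]
for area in AREA_EFFECT:
    print(area, synthetic_area_average(area, templates))
```

Because the templates are identical across areas, the gap between two areas' averages comes only from the model's area terms, which mirrors the identical-houses idea of the ideal experiment.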

Problems with this Approach

The Actual Model

Ours was a linear regression model including the following features. Make sure that the type of model and the features involved work for your project.
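The feature list itself didn't survive in this version of the slides, so any columns you plug in are placeholders. As a self-contained sketch of what fitting a linear regression involves, this solves ordinary least squares through the normal equations, (XᵀX)b = Xᵀy, in plain Python; `fit_ols` is a hypothetical helper, and a real project would more likely use an established library.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of feature rows; an intercept column is prepended here.
    The system is solved with Gauss-Jordan elimination and partial pivoting.
    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

Each row of `X` would hold one listing's features (for instance, hypothetical square footage and bedroom counts) and `y` the matching prices. Perfectly collinear features can make the elimination break down, with a division by zero or wildly unstable coefficients, which is itself a useful warning sign about the feature set.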

The Analysis

Variety Helps Catch Errors

Analysis can be one of the most intense parts of a social science project. It's more than getting averages and crunching numbers; you have to know not only what the numbers mean, but also what they are defining SOCIALLY.

This is where a diverse team comes in handy! Team members' personal experience can point to where to go next and what you've missed.

Don’t Forget the People Aspect

We specifically brought in people who know a lot about certain areas of NYC, former realtors who are now researchers, and people who own property in the areas we were examining closely.

We also used our own experiences as residents of the city to guide our choices.

We found that our numbers did in fact reflect lived experience.

Don’t forget the community you want to serve.

They should be driving your research direction.

Look For the Reasons Why

If it turns out that your research doesn’t reflect lived experience, examine why!

It could point to a serious error in your question, its framing, the data set, or your analysis of the results!

Use the community to your advantage rather than working against it.

Back to Square One

Thank You

To my team at Microsoft, Glenda Ascencio, Anastassiya Neznanova, and Thomas Patino, and our leads, Jake Hofman, Amit Sharma, and Jenn Wortman Vaughan.

To Microsoft's Data Science Summer School, headed by Jennifer Chayes at Microsoft Research.

And to everyone who encouraged me to give a data science talk!

More about Myself

I am a student at CUNY Queens College graduating in May with a BS in Computer Science and a BA in Mathematics.

If you have questions, comments, or want to recruit, please contact [email protected].

https://github.com/techiecheckie

https://www.linkedin.com/in/techiecheckie

Bibliography

1. NYCOpenData, nycopendata.socrata.com
2. GreatSchools, greatschools.org
3. StreetEasy, streeteasy.com
4. NYC GeoClient API, developer.cityofnewyork.us/api/geoclient-api
5. Microsoft Data Science Summer School, ds3.research.microsoft.com