Our Data, Ourselves Hack-Day Department of Digital Humanities Giles Greenway Tobias Blanke Jenifer Pybus Mark Cote

Our Data, Ourselves Hack-Day

Department of Digital Humanities

Giles Greenway

Tobias Blanke

Jenifer Pybus

Mark Cote

The Project:

• What and how much data do smartphone apps collect?

• What can it say about us and how is it used?

• What do we think about this?

• Can we put it to better use?

• ~20 Young Rewired State coders issued with Android 'phones.

• Custom MobileMiner app reports on app usage.

• Sends data to a modified CKAN instance.

• CKAN is written in Python, based on the Pylons framework.

• Released by the Open Knowledge Foundation: http://ckan.org/

The Data:

• Poll /proc/<pid>/net/<tcp/udp>

• Look for sockets/ports.

• Count transmitted/received bytes.

• GSM cell ids.

• Mobile and wireless networks.

• App notifications.

• Periodically save data to an internal SQLite database that users can access.

• Upload data to a CKAN instance.
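As an illustration of the polling step, here is a minimal Python sketch of how the socket table in /proc/<pid>/net/tcp can be decoded. It is not MobileMiner's own code (the app runs on Android); the sample rows, the function names, and the little-endian byte order are assumptions, and only IPv4 entries are handled.

```python
import socket
import struct

# Subset of the kernel's TCP state codes; unknown codes are passed through.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}


def parse_endpoint(field):
    """Decode a 'HEXADDR:HEXPORT' field for an IPv4 socket."""
    addr_hex, port_hex = field.split(":")
    # Assumption: the address was written on a little-endian machine,
    # so the 32-bit value is packed little-endian to recover the octets.
    addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return addr, int(port_hex, 16)


def parse_proc_net(text):
    """Return one dict per socket listed in a /proc/<pid>/net/tcp dump."""
    sockets = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        sockets.append({
            "local": parse_endpoint(fields[1]),
            "remote": parse_endpoint(fields[2]),
            "state": TCP_STATES.get(fields[3], fields[3]),
        })
    return sockets


# Hypothetical sample in the layout the kernel uses (trailing columns omitted).
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0BB8 00000000:0000 0A 00000000:00000000 00:00000000 00000000 10001        0 12345
   1: 0F02000A:C350 0A01A8C0:0BB8 01 00000000:00000000 00:00000000 00000000 10001        0 12346
"""

for sock in parse_proc_net(SAMPLE):
    print(sock)
```

The uid column (not decoded here) is what would let a monitor attribute a socket to a particular app, since Android normally gives each installed app its own user id.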

MobileMiner App: http://kingsbsd.github.io/MobileMiner


GSM Cell Tower Locations: http://opencellid.org

• Full GPS is too invasive, and consumes power.

• Avoid use of Google location API.

• OpenCellId provides locations of (many) cell towers.

• Currently include UK database within the app.

• Next: Bridge MobileMiner to cell DB via CKAN API?
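One way such a bridge might look, assuming the cell database were loaded into CKAN's DataStore: the sketch below builds a `datastore_search` request URL for a single cell. The host name, the resource id, and the field name `cid` are hypothetical placeholders, not values from the project.

```python
import json
from urllib.parse import urlencode


def datastore_search_url(base_url, resource_id, filters=None, limit=100):
    """Build a GET URL for CKAN's datastore_search action (API version 3)."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # CKAN expects the filters parameter as a JSON-encoded dictionary.
        params["filters"] = json.dumps(filters)
    return base_url.rstrip("/") + "/api/3/action/datastore_search?" + urlencode(params)


# Hypothetical host, resource id and column name, for illustration only.
print(datastore_search_url("http://ckan.example.org/", "cells", {"cid": 1234}, limit=1))
```

Fetching that URL (for example with `urllib.request.urlopen`) would return a JSON document whose `result.records` list holds the matching rows, provided the DataStore extension is enabled on the instance.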

CKAN:

Processing The Data:

• Aggregate app usage per user per day.

• Cluster GSM cells by k-means using the scikit-learn Python library.

• Label clusters using OpenStreetMap.

• Gather app data by scraping the Play Store (BeautifulSoup, PhantomJS & Selenium).
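To make the clustering step concrete, the sketch below runs plain k-means (Lloyd's algorithm) over cell tower coordinates. It is a dependency-free stand-in for illustration, not the project's code, which used scikit-learn. The coordinates are made up, and treating latitude and longitude as planar coordinates is an assumption that only holds reasonably over small areas.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical (latitude, longitude) pairs for cells seen by one user.
cells = [(51.51, -0.12), (51.52, -0.11), (51.50, -0.13),
         (53.48, -2.24), (53.47, -2.25)]
centroids, labels = kmeans(cells, k=2)
print(centroids, labels)
```

With scikit-learn the same step is a call to `sklearn.cluster.KMeans`; the resulting centroids are the points one would then look up in OpenStreetMap to label each cluster.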

Docker: https://www.docker.com/

• Docker Linux Containers: Dockerfile->Image->Container

• Installs CKAN, packages, libraries.

• Link to containers for PostgreSQL and Solr.

• Create users and database tables.

• Provide access to the data via IPython Notebooks.

• Provide tools like NumPy, scikit-learn and NLTK.

• Allows users to experiment.

• Documents the software environment.

• Allows for easy deployment.

• Free public image hosting.

Questions:

• Can we link app usage to physical locations?

• Can we make use of cells whose locations are unknown?

• Can we cluster on a spatial AND temporal basis?

• Do apps with certain permissions use more data?

The Line!

Getting an .apk package:

http://apps.evozi.com/apk-downloader/

Fighting Back?

• Grab the app's .apk package file from a rooted phone?

• Decompress the package and examine AndroidManifest.xml.

• Decompile the app and examine the source code.

Fighting Back: Decompressing the .apk

apktool d com.onetouchgame.TheLine.apk

http://code.google.com/p/android-apktool/

AndroidManifest.xml

<receiver android:enabled="true" android:name="com.simplecreator.app.RemoteNotificationReceiver">

<intent-filter>

<action android:name="cn.jpush.android.intent.REGISTRATION"/>

<action android:name="cn.jpush.android.intent.UNREGISTRATION"/>

<action android:name="cn.jpush.android.intent.MESSAGE_RECEIVED"/>

<action android:name="cn.jpush.android.intent.NOTIFICATION_RECEIVED"/>

<action android:name="cn.jpush.android.intent.NOTIFICATION_OPENED"/>

<action android:name="cn.jpush.android.intent.ACTION_RICHPUSH_CALLBACK"/>

<category android:name="com.onetouchgame.TheLine"/>

</intent-filter>

</receiver>

<service android:name="com.umeng.update.net.DownloadingService" android:process=":DownloadingService"/>

<activity android:name="com.umeng.update.UpdateDialogActivity" android:theme="@android:style/Theme.Translucent.NoTitleBar"/>

• The app receives intents from the push notification service jpush.cn. It also declares components from umeng.com, a mobile analytics service.

• Is that why it had open sockets on port 3000?
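Reading the decoded manifest by eye works for one app; for many apps the same inspection could be scripted. The following is a minimal sketch using Python's standard XML parser on the text manifest that apktool produces. The enclosing `<manifest>` and `<application>` elements and the `package` attribute in the sample are assumed for illustration, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = "{%s}name" % ANDROID_NS


def receiver_actions(manifest_xml):
    """Map each <receiver> class name to the intent actions it listens for."""
    root = ET.fromstring(manifest_xml)
    return {
        receiver.get(NAME): [action.get(NAME) for action in receiver.iter("action")]
        for receiver in root.iter("receiver")
    }


def components(manifest_xml):
    """List (tag, class name) for every receiver, service and activity."""
    root = ET.fromstring(manifest_xml)
    wanted = ("receiver", "service", "activity")
    return [(el.tag, el.get(NAME)) for el in root.iter() if el.tag in wanted]


# Abridged sample; the outer elements are assumed, the inner ones come from the slide.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.onetouchgame.TheLine">
  <application>
    <receiver android:enabled="true" android:name="com.simplecreator.app.RemoteNotificationReceiver">
      <intent-filter>
        <action android:name="cn.jpush.android.intent.REGISTRATION"/>
        <action android:name="cn.jpush.android.intent.MESSAGE_RECEIVED"/>
        <category android:name="com.onetouchgame.TheLine"/>
      </intent-filter>
    </receiver>
    <service android:name="com.umeng.update.net.DownloadingService" android:process=":DownloadingService"/>
  </application>
</manifest>"""

print(receiver_actions(SAMPLE))
print(components(SAMPLE))
```

Class and action names that do not start with the app's own package name are a quick pointer to bundled third-party libraries.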


Fighting Back: Decompile the App

http://code.google.com/p/dex2jar/

dex2jar.sh com.onetouchgame.TheLine

Decompile the .jar file:

Fighting Back: “The Usual Suspects”

Look for PhoneStateListeners and LocationListeners:

if (paramLocation != null) {
    d1 = paramLocation.getLatitude();
    d2 = paramLocation.getLongitude();
    boolean bool1 = d1 < 29.999998211860657D;

• Classes provided by tencent.com (a mobile ad service) reference latitude and longitude.

• Classes provided by jpush.cn and umeng.com also reference LocationListeners.
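The search for "usual suspects" can be automated once the decompiler has written Java sources to disk. This is a rough sketch under that assumption; the identifier list, the function names, and the `.java` layout of the decompiler output are illustrative choices rather than part of the original workflow.

```python
import os
import re

# Identifiers worth flagging; extend the alternation as needed.
SUSPECTS = re.compile(
    r"\b(LocationListener|PhoneStateListener|getLatitude|getLongitude)\b"
)


def scan_text(text):
    """Return (line_number, identifier) for each suspect found in the text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECTS.finditer(line):
            hits.append((number, match.group(1)))
    return hits


def scan_tree(source_root):
    """Yield (relative_path, line_number, identifier) for .java files under a directory."""
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in sorted(filenames):
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, identifier in scan_text(handle.read()):
                    yield os.path.relpath(path, source_root), number, identifier


snippet = (
    "if (paramLocation != null) {\n"
    "  d1 = paramLocation.getLatitude();\n"
    "  d2 = paramLocation.getLongitude();\n"
    "}\n"
    "class A implements LocationListener {}"
)
print(scan_text(snippet))
```

Grouping the hits by the leading package of each file path would show which bundled library each reference comes from.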

Download our app:

Follow us on Twitter: @KingsBSD

Read our blog:

Slideshare: http://www.slideshare.net/kingsBSD/