Challenges in A/B Testing Mobile Native Apps

Eric Florenzano, RuPy Brno 2012

Why am I interested in this?

• Started a company to make iOS apps

• Ended up building A/B testing tools

• Twitter acquired our IP

• All open sourced: github.com/clutchio

Show of hands

• Who does “web stuff?”

• Keep your hand up if you’ve done A/B testing on the web.

Show of hands

• Who does iOS or Android apps?

• Keep your hand up if you’ve done A/B testing on these apps.

What is A/B Testing?

• Show users different variations of your app

• Track those users and their actions

• Analyze their actions to find the most effective variation

Examples of A/B Tests

• Button text: do people understand “Register” better than “Sign Up”?

• On a mobile website, try replacing the e-mail field with a phone number field and verifying by text message.

• Does sign-up conversion go up or down if we give people more introductory reading material?

A/B Tests on the Web Today

• Backend determines A/B test bucket

• Front-end changes display

Example of A/B Test on Web

{% ab user "phone-signup" %}
  <input name="phone" type="tel" placeholder="Phone Number" />
{% else %}
  <input name="email" type="email" placeholder="E-Mail Address" />
{% endab %}

Now Mobile

if (ab(user, @"phone-signup")) {
    label.text = @"Phone Number";
} else {
    label.text = @"E-Mail Address";
}

However...

• How does this ab() function work on mobile?

• What can our latency be on this call?

• Can it talk to a database?

• What happens when you are offline?

Solution: Manifest Upfront

• Download manifest of all A/B tests on launch

• Client must be smart enough to make its own decision, immediately, offline
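A minimal Python sketch of the idea, for illustration only. The manifest layout, the `weight` field and the `choose_variation` helper are assumptions made up for this example, not the format any particular tool uses. The point is that once the manifest is on the device, picking a variation is a local weighted random draw with no network call.

```python
import json
import random

# Hypothetical manifest, as it might look after being downloaded at launch.
MANIFEST_JSON = """
{
  "phone-signup": {"variations": [
    {"name": "phone", "weight": 1},
    {"name": "email", "weight": 3}
  ]}
}
"""

def choose_variation(manifest, test_name, rng=random):
    """Pick a variation by weight using only local data (works offline)."""
    variations = manifest[test_name]["variations"]
    names = [v["name"] for v in variations]
    weights = [v["weight"] for v in variations]
    return rng.choices(names, weights=weights, k=1)[0]

manifest = json.loads(MANIFEST_JSON)
print(choose_variation(manifest, "phone-signup"))
```

With these made-up weights, roughly three out of four users would land in the "email" variation.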

Problem: First Launch

• Manifest downloads asynchronously at launch

• What about the first launch?

• What about A/B tests on that first screen?

• One possible answer: bundle a manifest with the app. Sucks. Adds extra step to build process.
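One way the bundled-manifest fallback could look, sketched in Python. The file paths and the `load_manifest` helper are hypothetical. The sketch prefers the last manifest downloaded to a local cache and falls back to the copy shipped inside the app, so the first launch still has something to decide with.

```python
import json
from pathlib import Path

def load_manifest(cache_path, bundled_path):
    """Return the cached manifest if readable, else the bundled one, else {}."""
    for path in (Path(cache_path), Path(bundled_path)):
        try:
            return json.loads(path.read_text())
        except (OSError, ValueError):
            # Missing file or invalid JSON: try the next source.
            continue
    return {}
```

The asynchronous download would then overwrite the cache file, so later launches pick up the fresh copy.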

Goal Tracking

• How do we determine success or failure?

• When is a test completed?

Goal Tracking

goal_reached(user, "phone-signup")

Goal Tracking

• Must have capability on frontend and backend

• Frontend example: registration page completed successfully

• Backend example: e-mail verified successfully

Goal Tracking Mobile Gotcha

• What happens when the phone is offline?

Goal Tracking on Mobile

• Store all results in a local phone database

• Upload all of the information periodically

• If possible, upload everything when the app is quit
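A rough Python sketch of the store-then-upload pattern, using SQLite as the local database. The table layout, the client-generated `event_id` and the `upload` callback are all assumptions for illustration. Rows are deleted only after the upload callback reports success, so nothing is lost while the phone is offline.

```python
import sqlite3
import time
import uuid

def open_store(path=":memory:"):
    """Open (or create) the local goal store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS goals ("
        "event_id TEXT PRIMARY KEY, test TEXT, variation TEXT, ts REAL)"
    )
    return db

def record_goal(db, test, variation):
    """Save a reached goal locally; works with no network connection."""
    event_id = str(uuid.uuid4())  # hypothetical ID, useful for deduplication
    db.execute(
        "INSERT INTO goals VALUES (?, ?, ?, ?)",
        (event_id, test, variation, time.time()),
    )
    db.commit()
    return event_id

def flush(db, upload):
    """Try to upload all pending rows; return how many were uploaded."""
    rows = db.execute("SELECT event_id, test, variation, ts FROM goals").fetchall()
    if not rows or not upload(rows):
        return 0  # nothing pending, or upload failed: keep rows for next time
    db.executemany("DELETE FROM goals WHERE event_id = ?", [(r[0],) for r in rows])
    db.commit()
    return len(rows)
```

`flush` would be called on a timer and, where the platform allows it, when the app goes to the background or quits.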

Note: Complications

• May receive some data twice -- have to query to double-check first

• May receive very old data -- working set of data now very wide

• How long is too long to wait for data to come back?
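A server-side sketch of how these complications might be handled, in Python with SQLite standing in for the real database. It assumes every uploaded event carries a unique client-generated `event_id`, which is not something the slides specify, and the 30-day cutoff is an arbitrary placeholder for "too long".

```python
import sqlite3

MAX_AGE_SECONDS = 30 * 24 * 3600  # hypothetical cutoff for very old data

def open_server_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE goal_events ("
        "event_id TEXT PRIMARY KEY, test TEXT, variation TEXT, ts REAL)"
    )
    return db

def ingest(db, events, now):
    """Store uploaded events, skipping duplicates and stale ones."""
    accepted = 0
    for event_id, test, variation, ts in events:
        if now - ts > MAX_AGE_SECONDS:
            continue  # too old to count
        seen = db.execute(
            "SELECT 1 FROM goal_events WHERE event_id = ?", (event_id,)
        ).fetchone()
        if seen:
            continue  # already received, e.g. a retried upload
        db.execute(
            "INSERT INTO goal_events VALUES (?, ?, ?, ?)",
            (event_id, test, variation, ts),
        )
        accepted += 1
    db.commit()
    return accepted
```

The explicit query before each insert is the "double-check" step; it makes uploads safe to retry.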

Side-Note: User/Test Consistency

• It’s important for a user to have a consistent experience

• Once a user is placed in a test bucket, they should remain there for that session

• How long is a “session”? On mobile, a good heuristic is “forever, until they update their app.”
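One possible way to peg a user to a bucket, sketched in Python. The `BucketStore` class and its JSON file are hypothetical. The first decision for each test is written to local storage and every later call returns the stored answer, even if the weights have changed since.

```python
import json
import random
from pathlib import Path

class BucketStore:
    """Remembers which variation this device was assigned for each test."""

    def __init__(self, path):
        self.path = Path(path)
        try:
            self.assignments = json.loads(self.path.read_text())
        except (OSError, ValueError):
            self.assignments = {}  # first run, or unreadable file

    def bucket_for(self, test_name, names, weights, rng=random):
        # Only draw once per test; afterwards always return the stored bucket.
        if test_name not in self.assignments:
            self.assignments[test_name] = rng.choices(names, weights=weights, k=1)[0]
            self.path.write_text(json.dumps(self.assignments))
        return self.assignments[test_name]
```

Clearing the file on app update would implement the "forever, until they update their app" heuristic.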

So...Minimum Requirements

• Download manifest up-front

• Make weighted random decisions offline

• Track goals and store progress in a local db

• Peg users to the same bucket during the session

• Periodically upload progress

Problem: Slow Release Cycle

• Each release might take weeks or even months

• A/B testing at this time scale is frustrating

• Can anything be done to improve it?

Solution: Parameterized Tests

• Instead of a simple boolean if-else, pass back data

• Now you can change the tests on the server

• Still need to think ahead, but this can add lots of flexibility

Parameterized Test Example

[AB test:@"login" data:^(NSDictionary *data) {
    btn.title = [data objectForKey:@"title"];
}];

Note

• Remember, this decision still needs to be made instantly, and off-line

• So all of this data now becomes part of the manifest

• Data must be kept compact, so it can’t include e.g. large binary blobs
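To make the manifest idea concrete, here is a small Python sketch in which each variation carries its own data dictionary. The manifest structure, the button titles and the `variation_data` helper are invented for the example. Changing a title on the server only requires shipping a new manifest, not a new app release.

```python
import random

# Hypothetical manifest: each variation has a weight and a small data payload.
MANIFEST = {
    "login": {
        "variations": [
            {"weight": 1, "data": {"title": "Log In"}},
            {"weight": 1, "data": {"title": "Sign In"}},
        ]
    }
}

def variation_data(manifest, test_name, rng=random):
    """Choose a variation locally and return its data payload."""
    variations = manifest[test_name]["variations"]
    weights = [v["weight"] for v in variations]
    return rng.choices(variations, weights=weights, k=1)[0]["data"]

data = variation_data(MANIFEST, "login")
print(data["title"])  # the app would set the button title from this value
```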

Interpreting the Data

• We care about two things:

• Which variation is winning?

• How confident are we about it?

Which variation is winning?

• Can easily calculate a ratio for each variation:

• How many people have seen this variation?

• How many of those people have reached the goal?
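The ratio is simple enough to sketch in a few lines of Python. The counts below are made-up numbers, and the `conversion_rates` helper is just an illustration.

```python
def conversion_rates(counts):
    """Map each variation to reached / seen, given (seen, reached) pairs."""
    rates = {}
    for name, (seen, reached) in counts.items():
        # Guard against variations nobody has seen yet.
        rates[name] = reached / seen if seen else 0.0
    return rates

# Made-up example data: (people who saw it, people who reached the goal)
rates = conversion_rates({"phone": (200, 30), "email": (180, 18)})
best = max(rates, key=rates.get)
```

The highest ratio alone does not say whether the difference is real, which is where the confidence question comes in.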

How confident are we about it?

• Statistics. My worst subject in school.

• Must choose a significance level (the “p-value” threshold): the higher the threshold, the fewer results you need, but the lower the accuracy

• Now compute a confidence interval

• I’m told the Agresti-Coull Interval is a good choice for calculating a confidence interval

• Open source JavaScript ABBA library is great!
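For reference, a short Python sketch of the Agresti-Coull interval using its standard textbook definition. It is not taken from the ABBA source, and the function name and the 95% default are choices made for this example.

```python
import math
from statistics import NormalDist

def agresti_coull(successes, trials, confidence=0.95):
    """Return (low, high) bounds for the true conversion rate."""
    # z-score for a two-sided interval, about 1.96 at 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = trials + z * z                       # adjusted sample size
    p_adj = (successes + z * z / 2) / n_adj      # adjusted proportion
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

low, high = agresti_coull(50, 100)
```

If the intervals for two variations do not overlap, that is a reasonable sign the difference is not just noise; overlapping intervals usually mean the test needs more data.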

ABBA Example: http://www.thumbtack.com/labs/abba/

Questions?

• @ericflo on Twitter

• github.com/clutchio
