Postdoc, Center for Computer Games Research, IT University of Copenhagen: doesn't pay much, but offers the opportunity for "blue sky research"
General expertise with the user experience, game testing, game design, etc.
Primer on empirical research methods
Game Testing 101: General principles
Game testing during the production cycle, with an introduction to several key methods
Focus: Theory, Practice & Tools
Scientific theory
Empirical research approaches
Empirical game studies
All technology testing is based on empirical research and evaluation methods
To understand what games testing really is, you must understand empirical research approaches
If not, you blindly use methods you do not understand
Science: Any systematic knowledge or practice.
Science generally refers to a way of acquiring knowledge through the scientific method, as well as the organized body of knowledge gained through such research.
Adheres to positivist philosophy: Only authentic knowledge is scientific knowledge
Science = Logic + Observation
Three types of science:
▪ Natural science: The study of natural phenomena
▪ Social science: The study of human behavior and societies
▪ Formal science: Mathematics; uses a priori rather than empirical methods, includes statistics and logic
The first two are empirical sciences and the third is a mixture; however, all three feed into each other
▪ A priori = deductive knowledge (independent of experience)
▪ A posteriori = Inductive knowledge (dependent on experience)
Experimental science: Another term for empirical sciences
Applied science: Application of scientific research to specific human needs – such as game testing!
The two are often combined
Empirical sciences: Knowledge obtained from observable phenomena
Reproducible: Phenomena must be reproducible under experimental conditions by other scientists in order to be validated
Careful, objective and systematic study of an area of knowledge
Must follow the scientific method
The scientific method: A body of techniques for investigating phenomena and acquiring knowledge
Collection of data through observation and experimentation, and the formulation and testing of hypotheses
Evidence must be observable, empirical and measurable, subject to principles of reasoning
Empirical research must follow:
1. Define the question
2. Gather information and resources (observe)
3. Form a hypothesis
4. Perform an experiment and collect data
5. Analyze the data
6. Interpret the data and draw conclusions that serve as a starting point for new hypotheses
7. Redo the entire cycle if necessary
8. Publish the results
9. Retest (frequently done by other scientists)
Alternative: Exploratory approach, with similar requirements on objectivity and reasoning but without forming a hypothesis
Example:
1. Are there any bugs in this feature of game X?
2. Get the game, set up a lab and assimilate knowledge from other test cases
3. Hypothesis: There probably are some bugs in our game ...
4. Run tests and collect test data
5. Analyze the data
6. Interpret the test data and draw conclusions: We found X bugs; do we have reason to believe all the bugs have been found?
7. Redo the entire cycle if necessary
8. Publish the results to the bug database and get designers to fix them
9. Retest to see if the bugs have been fixed
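Step 6 asks whether we have reason to believe all the bugs have been found. One rough way to approach that, borrowed from ecology and sometimes applied to software inspection, is a capture-recapture estimate: if two testers hunt the same feature independently, the overlap between their findings hints at how many bugs exist in total. This is a swapped-in technique, not one the slides prescribe; the sketch below assumes independent testers and equally findable bugs, which rarely holds exactly, and the bug IDs are hypothetical.

```python
def lincoln_petersen(found_a: set[str], found_b: set[str]) -> float:
    """Estimate the total number of bugs from two independent test passes.

    Lincoln-Petersen estimator: N is approximately n_a * n_b / m, where n_a
    and n_b are the bugs each tester found and m is the number both found.
    Assumes the testers work independently and every bug is equally findable.
    """
    overlap = len(found_a & found_b)
    if overlap == 0:
        raise ValueError("no overlap between the passes; the estimate is undefined")
    return len(found_a) * len(found_b) / overlap


# Hypothetical bug IDs logged by two testers working independently
tester_a = {"B1", "B2", "B3", "B4", "B5", "B6"}
tester_b = {"B4", "B5", "B6", "B7", "B8"}

estimated_total = lincoln_petersen(tester_a, tester_b)
found_so_far = len(tester_a | tester_b)
print(f"found {found_so_far}, estimated total about {estimated_total:.0f}")
```

Here the two passes found 8 distinct bugs while the estimate is 10, which suggests the answer to step 6 is "probably not yet" and the cycle should be repeated.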
Game testing should always follow the scientific method!
A hypothesis defines an expected relationship between variables, which can be empirically tested.
For example:
▪ Eliminating the minimap in StarCraft will increase player engagement
▪ Making the bazooka do more damage will balance the weapons in this game
▪ There are no bugs in this level
Empirical research methods come in two forms:
Quantitative methods: Collect numerical data, strictly objective, analyzed using statistical methods
Qualitative methods: Collect data in the form of text, images, sounds etc., drawn from observations, interviews, documentary evidence etc., and analyzed using qualitative data analysis methods (e.g. content coding)
Data and analysis can be subjective: Relies on researcher experience
Qualitative: More appropriate in the early stages of research (exploratory research) and for theory building
▪ Qualitative methods apply well in real-world settings, but lack validity and control
▪ Problem with subjective interpretation of the data
Examples:
▪ Case study: Observations carried out in a real-world setting
▪ Action research: Applying a research idea in practice, evaluating the results and modifying the idea (a cross between experiment and case study)
Quantitative: Appropriate when theory is well developed. Theory testing and refinement
Examples:
▪ Experiment: Apply a treatment, measure the results. This is the only method that can demonstrate a causal relationship between variables; associated with the scientific method
▪ Survey: Asking rated questions in an interview
▪ Historical data: e.g. patterns in investments
Most high-quality research includes both types of methods
Method selection is critical to success of any project
Selection must be driven by state of knowledge
Determining whether a hypothesis/theory is supported – easy with bugs, hard with game balancing
Quantitative data analysis: Use of statistical methods to identify patterns and relationships in the data
Qualitative data analysis: More subjective, relies on the researcher’s knowledge to identify patterns, extract themes and make generalizations
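As a minimal illustration of quantitative analysis, Pearson's correlation coefficient measures how strongly two numeric variables move together. The data below (session length versus a 1-7 engagement rating) are invented for the example, and the function name is our own. In line with the point about experiments, a correlation alone does not demonstrate a causal relationship.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: session length (minutes) and a 1-7 engagement rating
session_minutes = [12, 25, 31, 40, 55]
engagement = [2, 3, 5, 5, 7]
print(round(pearson(session_minutes, engagement), 3))
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 indicate none; whether the relationship is meaningful still requires a significance test and, for causality, an experiment.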
Data is objective – otherwise it is information
Processed (refined) information is termed knowledge
Generally: Data → Information → Knowledge
Foundational principle for all IT industries
QA is a knowledge acquisition process
Summarized: QA is the empirical process of acquiring data, refining the data into information, and converting it to knowledge that can be implemented by company stakeholders (design, marketing etc.)
A reason why companies hire lots of testers during crunch time ...
QA in the games industry
Components of game testing
General purposes of game testing
Testing phases: Intro
Purpose of game testing:
To see how specific components of, or the entirety of, a game is played by people
The litmus test that allows developers to evaluate the state of the game and the quality of the gaming experience
The company (sort of ...)
QA is not necessarily part of the main company: keeping QA separate eliminates bias
QA is viewed as a necessary evil – low pay and crappy conditions are common
QA informs what is wrong in games under development (causing frustration)
Many forget QA can also tell what is good (causing happiness)
General software industry: QA takes 8-12% of total resources
Games industry: less than 1% ...
General software industry: QA throughout production
Games industry: QA often delegated to secondary position in production pipeline
Result:
Digital games have horrible quality compared to, e.g., desktop applications
Non-technical game testing falls within HCI
HCI: Human-Computer Interaction
Mixture of computer science, psychology etc.
Many different types of measures – quantitative and qualitative
20+ years of use in the software industry
Purpose: Technical, content, functional
Phase: Positioning in the development cycle
Testing method: e.g. usability, bug hunting
Game feature: The element being tested
Technical: Issues relating to the game engine itself and hardware; well-established methods common to software development
Functional: Bug hunting, stability, integrity of game assets, gameplay, localization issues, controls, interface
Content: Presentation, graphics, level design, game story, user experience
The game production cycle has 3 general steps: pre-production, production and post-production
Most game companies follow agile development:
▪ Sprints and Scrums
▪ Rapid iterations of game elements
▪ Requires QA to follow the same iterative nature
Pre-production: Focus group tests, benchmarking
Production: Metrics, bug hunting, playtesting, usability testing, game test labs
Post-production: Post-mortems and managing communities
Important phase, but often overlooked
Testing of design and concepts: Story, character, world, artwork
Two typical methods: focus groups and benchmarking
Popular method, but problematic
Good use can lead to valuable insights, bad use to disaster
Good for generating ideas, player impressions, norms/values of the audience
Bad for providing concrete feedback to specific issues
Intensive design: Few units but lots of variables
Central weakness: Non-representativeness of the group participants
Analytical selection: Group participants should display the characteristics required to illustrate the case at hand
Size: 3-12 participants
Fewer ruins the interaction; more makes the group impossible to manage
Testers: build a good tester database
Screen people before adding them
Cover target audience and outside it
Types of participants
Internal: From the company
▪ Literature advises against using people we know in focus groups
▪ But some internal testers can be treated as expert testers
External: From outside the company
▪ Fans and non-fans: The problem with bias
Practical considerations for running focus groups
Homogeneous or heterogeneous structure?
Should participants know each other?
▪ They are less likely to speak freely if they do
▪ It is easier to get people to talk if they do
Group size
Small groups: Good for digging deep into the associations of players; low degree of moderation, loose structure
Large groups: Good for gathering many different perspectives; high degree of moderation, tight structure
Running a focus group
Prepare in advance: interview guide, purpose
Decide on a loose or tight structure
▪ A loose structure is harder to compare across groups
▪ A tight structure gives less chance of new knowledge
Visual aids should be ready
The moderator
Monitors and moderates the focus group
Incredibly important: must be a good listener and highly attentive to the participants and the social interaction
Usually teamed with an observer
A note of warning:
Focus groups are often run by marketing, game testing by QA – this is BAD!
Consumer testing (on e.g. the box art) should be run by marketing, but NOT game test focus groups
Benchmarking: Little used in the industry; an early-phase method
A form of requirements analysis
Methodical evaluation of competing games, recording what works and what does not
Provides the minimum benchmark the new game must meet
Production: The vast majority of testing happens during this phase
Early testing of game controls and specific game elements
Later testing of alpha builds, mechanics, story etc.
Iterative test pattern following agile development
E.g. on a bi-weekly basis:
▪ Define the tests needed
▪ Run tests on the newest builds
▪ Collate and analyze the data
▪ Deliver reports
▪ Log results in the test database
Metrics: Numerical data drawn from client installs or servers
Tracking what players do when they play, e.g.:
▪ Who shoots whom, where and when? (heatmaps)
▪ Which areas of the map do players explore?
▪ Is the balancing between weapons working?
Immensely useful in games!
Logging the X,Y (Z) position of the player in real time and what people do in that time
Technique developed extensively by Microsoft Game Labs for Halo 3 (though used by others before)
Questions this can answer, e.g.:
▪ Which areas of the level map do players utilize?
▪ What do players do with their time? (is it what we thought?)
▪ Does the level promote the behavior we anticipated?
▪ Where do players experience problems progressing through a level?
▪ Do our players move through the map as we expected?
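A minimal sketch of how logged X,Y positions could be turned into a heatmap that answers "which areas of the level map do players utilize?". It assumes a hypothetical log of (x, y) world coordinates sampled at fixed intervals and a hypothetical grid cell size; a real pipeline would also record time, player and build, and would render the counts as an image over the map.

```python
from collections import Counter


def position_heatmap(positions, cell_size: float) -> Counter:
    """Count logged player positions per grid cell.

    positions: iterable of (x, y) world coordinates sampled over time.
    Returns a Counter mapping (cell_x, cell_y) -> number of samples.
    """
    heat = Counter()
    for x, y in positions:
        # Floor division maps each coordinate onto a grid cell index
        cell = (int(x // cell_size), int(y // cell_size))
        heat[cell] += 1
    return heat


# Hypothetical position samples logged once per second
samples = [(1.0, 1.5), (1.2, 1.9), (3.5, 0.2), (1.8, 1.1), (9.9, 9.9)]
heat = position_heatmap(samples, cell_size=2.0)
for cell, count in heat.most_common(2):
    print(cell, count)

# Cells no sample ever touched, within a hypothetical 10 x 10 map
all_cells = {(cx, cy) for cx in range(5) for cy in range(5)}
unvisited = all_cells - set(heat)
print(f"{len(unvisited)} of {len(all_cells)} cells never visited")
```

Hot cells show where players spend their time; cells that are never visited may point to areas the level fails to draw players into.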
Logging the weapons used, the target, effect and the position when used.
Weapon balance is a key aspect of multi-player games, notably PvP.
Questions this can answer, e.g.:
▪ Is this weapon too hard or too easy to use?
▪ Is a specific weapon too effective, or too ineffective? (balancing)
▪ Are there specific maps where specific weapons are too effective?
▪ Do players use all the weapons the game offers?
▪ Do players use the weapons as the developers intended? (if not, how can we use this?)
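The weapon questions above can be approached with a simple aggregation over a kill log. This sketch assumes a hypothetical log that records only the weapon used for each kill, and it uses an even split of kills across weapons as the reference point with a hypothetical relative tolerance of 50%. An even split is not necessarily the design goal, so the flags are prompts for investigation, not verdicts.

```python
from collections import Counter


def weapon_kill_shares(kill_log: list[str]) -> dict[str, float]:
    """Fraction of all kills made with each weapon."""
    counts = Counter(kill_log)
    total = sum(counts.values())
    return {weapon: n / total for weapon, n in counts.items()}


def flag_outliers(shares: dict[str, float], all_weapons: list[str],
                  tolerance: float = 0.5) -> dict[str, str]:
    """Flag weapons whose kill share is far from an even split.

    A weapon is 'too effective?' if its share exceeds the even share by more
    than `tolerance` (relative), 'underused?' if it falls short by as much.
    Weapons with no kills at all are flagged as 'unused'.
    """
    even = 1 / len(all_weapons)
    flags = {}
    for weapon in all_weapons:
        share = shares.get(weapon, 0.0)
        if share == 0.0:
            flags[weapon] = "unused"
        elif share > even * (1 + tolerance):
            flags[weapon] = "too effective?"
        elif share < even * (1 - tolerance):
            flags[weapon] = "underused?"
    return flags


# Hypothetical kill events from one play session: the weapon used per kill
kills = ["rifle"] * 6 + ["bazooka"] * 12 + ["pistol"] * 2
weapons = ["rifle", "bazooka", "pistol", "knife"]
print(flag_outliers(weapon_kill_shares(kills), weapons))
```

Splitting the same log by map would address whether specific weapons are too effective on specific maps.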
Logging how players fare in the face of game challenges – and other players
Balancing player tasks and challenge levels is one of the hardest design tasks
Questions this can answer, e.g.:
▪ How long do players survive on this map?
▪ Do players ever complete the map objectives?
▪ Is this map favorable to a specific side/team?
▪ Are there any patterns in the way people play the game?
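A minimal sketch of per-map challenge metrics: mean survival time and each team's share of round wins. The round records, map names and team names are hypothetical. Like all metrics, these numbers show what happens (e.g. red winning most rounds on one map) but not why; a handful of rounds is also far too few to conclude that a map is unbalanced.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical round records: (map name, winning team, player survival times in s)
rounds = [
    ("harbor", "red", [45.0, 80.0, 120.0]),
    ("harbor", "red", [30.0, 95.0]),
    ("harbor", "blue", [60.0, 150.0]),
    ("desert", "blue", [200.0, 180.0]),
    ("desert", "red", [210.0, 170.0, 190.0]),
]


def summarize(rounds):
    """Per map: mean survival time and each team's share of round wins."""
    survival = defaultdict(list)
    wins = defaultdict(lambda: defaultdict(int))
    for map_name, winner, times in rounds:
        survival[map_name].extend(times)
        wins[map_name][winner] += 1
    report = {}
    for map_name, times in survival.items():
        total_rounds = sum(wins[map_name].values())
        report[map_name] = {
            "mean_survival_s": mean(times),
            "win_share": {team: n / total_rounds
                          for team, n in wins[map_name].items()},
        }
    return report


for map_name, stats in summarize(rounds).items():
    print(map_name, stats)
```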
Metrics can inform WHAT players are doing
Metrics cannot inform WHY players are doing something
For this we need other types of tests
Bug hunting is a heavily structured process of locating game flaws and reporting them
Bug hunting is important because of the myriad opportunities for conflicts in the game code, objects etc.
Usually done by professional game testers: a lousy job with huge turnover and lots of issues
Two overall purposes
▪ Finding new bugs: "Hmm, I wonder what happens if I try running into this door whilst firing the bazooka?"
▪ Trying to recreate old bugs that may have been fixed: "Hold 'up' when entering or leaving a room and any currently-held items will be dropped"
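Recreating old bugs works best when each report carries exact reproduction steps and a status history. The sketch below shows one possible structure and a regression pass over a new build. The field names, status values and the `reproduce` callback (standing in for a tester or an automated script following the steps) are hypothetical, not the schema of any particular bug tracker.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"          # developers report it as fixed
    VERIFIED = "verified"    # QA could no longer reproduce it
    REOPENED = "reopened"    # QA reproduced it again on a newer build


@dataclass
class BugReport:
    bug_id: str
    summary: str
    steps: list[str]
    build_found: str
    status: Status = Status.OPEN
    history: list[str] = field(default_factory=list)


def regression_pass(bugs: list[BugReport], build: str,
                    reproduce: Callable[[BugReport], bool]) -> None:
    """Retest every bug marked FIXED on a new build and update its status.

    `reproduce` returns True if following the bug's steps still triggers it.
    """
    for bug in bugs:
        if bug.status is not Status.FIXED:
            continue
        bug.status = Status.REOPENED if reproduce(bug) else Status.VERIFIED
        bug.history.append(f"{build}: {bug.status.value}")


item_drop = BugReport(
    bug_id="BUG-101",
    summary="Held items dropped when changing rooms",
    steps=["Pick up any item", "Hold 'up'", "Enter or leave a room"],
    build_found="0.3.1",
    status=Status.FIXED,
)
# Simulate a tester who still sees the items drop on the new build
regression_pass([item_drop], build="0.3.2", reproduce=lambda bug: True)
print(item_drop.status, item_drop.history)
```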
"The design equivalent of bug hunting" (Rouse, 2003)
When players see if the game is fun and try to find faults in the mechanics themselves: AI, controls, balancing, etc.
Playtesting is typically done with stable builds, feedback gathered via structured questionnaires and game logs
Playtesting is a form of user experience analysis:
Evaluating the factors that determine the experience of using a product
Focused on content and functionality
More than 20 years of method history within the software and consumer products industries
Established methods that are beginning to be adapted to the unique nature of digital games
Focus groups, benchmarking
Usability testing (technical QA)
Focus groups, bug hunting, playtesting
Bug hunting, playtesting, benchmarking
Playtesting must provide specific, structured feedback:
"The third boss was hard to kill" – is not specific
”The third boss on the second map was hard to kill – I had no idea what to do or whether I had missed a special weapon. Perhaps I should have used the big rock, but I was not sure” – is specific
“Problem: Players do not find the rocket launcher for killing [boss name] on [map name]. It is not obvious to them that they should sidetrack to locate this weapon before the encounter” – is specific and structured
Key to working with playtesters:
Knowing when to take their opinions seriously, and when not to
Understanding the biases they operate under
Need different kinds of playtesters:
▪ Internal playtesters: experts, but subjective
▪ Professional playtesters: game testers who also provide feedback on gameplay etc.; typically hard-core gamers only
▪ Amateur playtesters: show us how players will react when meeting the game
▪ Non-gamers: locate gameplay issues that are non-intuitive and overlooked by experienced gamers
Some people to avoid as playtesters:
▪ People you know personally
▪ Your boss
▪ Hard-core fans of the game/company
▪ Idiots: "The fourth type of people that you do not want to have testing your game is idiots. Idiots tend to say idiotic things and have idiotic opinions, and as a result will not be of much help to you" (Rouse).
But who are the idiots? The people who disagree with the designer?
Early playtesting
▪ Best done by experienced people who can overlook obvious flaws
▪ Generally guided playtesting
Late playtesting
▪ Good with inexperienced testers, when the game is tuned and balanced
▪ Ideally all kinds of testers at alpha and beta
▪ Generally unguided playtesting: players roam around
Interactive products need to facilitate the tasks the user is performing with it
The goals: Accessibility, utility and ease of use (and satisfaction)
In software, usability traditionally applied to interface design (quick, intuitive, easy)
Usability: In the domain of user-centered design
In games, pure usability is not enough (but really nice!)
The interactive experience must also be fun, immersive and engaging
Usability methods therefore adapted to gaming
Purpose: To find out how players interact with the game and track functional problems as well as content problems
In practice, there are many different ways of running usability tests, e.g. expert tests, think-aloud tests, task completion tests, and interface and functionality tests
Figure: usability test setup (user position, observer position)
Data gathered from usability testing: screen capture, metrics, comments from the player, error rates, behavior analysis, survey data, interview data, and yet more ...
Task completion tests
Small samples are used: 6-10 people in a single test round
Rationale: 10 people locate 80-95% of the interface problems a group of 30 would find
Session times: 60-90 minutes
Two main phases
1) Player spends some unstructured time, familiarizing themselves with the game and controls
2) Player goes through a specific set of pre-defined tasks
Tasks based on pre-planned use cases
Can be done on a paper mock-up, prototype or alpha/beta build
Either observer next to the player or in a neighbouring room
Often followed by a structured survey or interview
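The small-sample rationale is often explained in usability research (notably in work associated with Nielsen and Landauer) with a cumulative problem-discovery model: if each participant independently encounters a given problem with probability p, then n participants are expected to find a share 1 - (1 - p)^n of all problems. The slides do not state this model, and the values of p below are assumptions; under them, the model lands in the 80-95% range quoted above.

```python
def share_found(p: float, n: int) -> float:
    """Expected share of existing problems found by n participants.

    Assumes every problem is hit independently by each participant with
    the same probability p: found = 1 - (1 - p) ** n.
    """
    return 1 - (1 - p) ** n


def relative_to(p: float, n: int, reference_n: int = 30) -> float:
    """Problems found by n participants relative to a larger reference group."""
    return share_found(p, n) / share_found(p, reference_n)


for p in (0.15, 0.25):
    print(f"p={p}: 10 people find {relative_to(p, 10):.0%} of what 30 would")
```

With p = 0.15 this gives roughly 81% and with p = 0.25 roughly 94%. The model's independence and equal-probability assumptions rarely hold exactly, which is one reason to run several small test rounds rather than one.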
Heuristic evaluation: Much used in interface design
Heuristics are design principles, e.g.:
▪ Games should be easy to learn, hard to master
▪ Game interfaces should be intuitive to the player
Heuristics often used with expert usability tests, notably early in the design phase
Problems found earlier are easier to fix
There is still no good list of heuristics for games
Heuristic: Things the player needs to see should stand out. (i.e. everything the player needs to see needs to be big enough to be perceived)
The only object in this car chase game that can damage you is the tiny red dot just to the left of your car.
The object is too small for some players to see. The challenge should be evading the bullet, not seeing it in the first place.
Post-mortems
▪ Documentation of the experiences during the game production, for use in new productions
▪ Rounded tests of the game, setting benchmarks for the next game
Community feedback
▪ Collecting metrics for e.g. rankings and score boards
▪ Running support and updates
▪ Running tests of patch updates
▪ Client data: a fantastic wealth of metrics data!
▪ MMOs: Running tests of new content, data mining on player activities, etc.
Game testing is carried out in game testing labs
Laboratories provide a controlled setting, however, it also removes the player from his/her natural environment
Fundamental assumption of lab-based research:
Conditions within correspond to conditions without
3 general types, often mixed:
▪ Focus group labs: Any multimedia-capable room
▪ Playtesting labs: Banks of computers with wall partitions; open setups for multi-player
▪ Usability labs: Two rooms connected by one-way mirrors and cameras; tester in one room, observers in the other
Figure: example game test lab layout (Rooms 1-3 with observation areas and one-way mirrors; networked and high-end PCs; ceiling microphones; a CONE 180-degree WTW screen; motion capture suits, VR gloves, stereoscopic goggles and head-mounted displays; coffee maker)
QA is vital in the games industry for numerous reasons, e.g.
- In meeting high consumer standards and minimizing return and support costs
- “No longer confined to the production domain of bug-hunting, testers are expanding into the territory of usability and focus group testing to help ensure higher customer expectations are met” [Gamasutra] (and the expectations are rising dramatically).
- QA is a breeding ground for talent of all types and a key internal recruitment resource
Game testing is not game design by committee
Input that comes from game testing should not be used and implemented without careful consideration
Testing may indicate something should change, but if your gut instinct says no, think about it first
You cannot please everyone
Game testers usually have more experience than focus group participants, so their opinions should matter more; use them for more than bug hunting
QA has low status in the industry: game designers do not always listen or care
▪ No easy way to solve this
▪ Studios such as Lionhead are changing this perspective in the industry
Game characters/avatars have: a developed visual appearance; behaviour that emphasises the character theme; audio
Game characters/avatars often lack: distinct, defined personalities; backgrounds; in-depth integration into the game world
Of course there are exceptions!
“… Full character design, but with a necessarily one-dimensional personality so that the player can flesh out its motivations. The trick is to strike a balance between establishing the actor’s personality without letting that personality disturb the player” [Guard].
“At the end of the day, a game character shouldn't have anything more than superficial personality traits since, whatever the point of view, the player needs to retain as much control as possible.” [Rolling & Morris].
Question: Could characters with personalities, backgrounds etc. be useful in game design?
Hypothesis: Game characters with distinct personalities and backgrounds different from the player's will prevent players from being entertained by and utilizing the characters
Follow-up questions:
1. If the hypothesis is disproven, how broad is the solution space for the design of complex characters?
2. Furthermore, are there any character elements that should be avoided in order not to alienate players?
Empirical testing of game character-player interaction
Focus on multi-player games, across digital/tabletop formats: non-digital RPG, digital RPG, digital RPG with GM
Focus on Role-Playing Games (RPGs) – obvious target genre for character development
Potential problem: Need a measure for checking that players comprehend their characters (otherwise hypothesis cannot be disproven/proven)
Assumptions:
1.Laboratory conditions do not affect playstyle
2.Sample is representative of the population
3.Variables known and controlled
Character design
▪ Recall: Requirements for a logical and consistent approach
Approach:
▪ Used the EPAQ model as a foundation for defining personality/background elements
▪ Integrated characters via the game story
▪ Used the popular D20 system to define stats/rules components
▪ All characters designed using the same template
The EPAQ Model
Describes personality via adjectives/behaviors on a sliding scale
Agency and communion form the core of the scale; both have positive and negative features, e.g.:
▪ Degree of self-assuredness
▪ Degree of dependency on others
Unmitigated communion and unmitigated agency are unhealthy extremes, e.g.:
▪ Cannot be happy without others being happy
▪ Egoistical to the detriment of others
A character: Old-fashioned army lieutenant
C character: Compassionate reporter
UA character: Egomaniacal, cynical wartime cameraman
UC character: Self-sacrificing politician
MIX character: Conflicted mascot of a major soft-drink company
Obtaining player-character personality differences:
Obtained by comparing EPAQ point scores of characters and players
Across 4 components of EPAQ model (UA, UC, C, A)
Analyzed a total of approx. 140 player-character pairs across 3 game setups
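The slides do not say exactly how character and player EPAQ scores were compared. As one plausible reading, here is a minimal sketch assuming each person and character receives a point score on the four components (A, C, UA, UC) and the difference is the sum of absolute per-component differences. The metric and all scores are hypothetical.

```python
# The four EPAQ components compared in the study
COMPONENTS = ("A", "C", "UA", "UC")


def personality_difference(player: dict[str, float],
                           character: dict[str, float]) -> float:
    """Sum of absolute per-component differences (a hypothetical metric).

    Larger values mean the character's personality profile is further
    from the player's own.
    """
    return sum(abs(player[c] - character[c]) for c in COMPONENTS)


# Hypothetical point scores for one player and two characters
player = {"A": 22, "C": 25, "UA": 10, "UC": 14}
lieutenant = {"A": 28, "C": 15, "UA": 16, "UC": 8}
reporter = {"A": 18, "C": 27, "UA": 9, "UC": 15}

print(personality_difference(player, lieutenant))
print(personality_difference(player, reporter))
```

One such difference score per player-character pair could then be correlated with that player's FUN and SYMPA results.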
Measures:
▪ FUN model [modified from Newman 2005]: Multi-component measure of the quality of the gaming experience; includes immersion/engagement
▪ SYMPA (player-character engagement)
▪ Experience (for each platform)
▪ Group dynamics (player-player)
▪ Other questionnaires (various character-evaluation aspects)
▪ Coding of transcribed verbal and chat communication
▪ Game logs of chat and behavior
▪ Recording (audio-visual) of in-game behavior
Performed an initial pilot experiment, used to verify the experiment setup and procedure
Qualitative and quantitative methods used
Questionnaires, recordings, game logs, transcription and coding, interviews, focus groups …
Correlation, STDEV, multi-variate statistics, ANOVA, cluster methods, factor analysis
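As an illustration of one listed technique, a one-way ANOVA tests whether mean scores differ across more than two groups, for example FUN scores across the three game setups. The sketch computes only the F statistic; turning it into a p-value means comparing it with the F distribution on (k - 1, n - k) degrees of freedom, which SciPy's `scipy.stats.f_oneway` does directly. The scores below are invented for the example.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic for a one-way ANOVA across two or more groups.

    F = (between-group mean square) / (within-group mean square).
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n = len(groups), len(all_scores)

    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical FUN scores per game setup
tabletop = [5.0, 6.0, 7.0]
digital = [4.0, 5.0, 6.0]
digital_gm = [6.0, 7.0, 8.0]
print(round(one_way_anova_f([tabletop, digital, digital_gm]), 2))
```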
Data evaluation:
▪ Internal consistency of questionnaire constructs
▪ Variance across results
▪ Correlation of results
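The slides do not name the statistic used for internal consistency; a common choice for questionnaire constructs is Cronbach's alpha, which compares the variance of individual items with the variance of respondents' total scores. A minimal sketch, assuming one row per respondent and one column per item of a single construct, with invented ratings:

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for one questionnaire construct.

    responses: one row per respondent, one column per item in the construct.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("a construct needs at least two items")
    items = list(zip(*responses))                 # columns = items
    totals = [sum(row) for row in responses]      # each respondent's total
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 1-5 ratings: 4 respondents x 3 items of an engagement construct
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
]
print(round(cronbach_alpha(ratings), 2))
```

A commonly cited rule of thumb treats values of about 0.7 or higher as acceptable, though with only a few respondents and items the estimate is unstable.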
Qualitative: Interviews etc. formed a second source of information, acting as a qualifier on the quantitative methods
No correlation between the personality of the character, that of the player and the FUN or SYMPA constructs
Indicates that (adult) players are not negatively impacted by playing characters with personalities different from (or similar to) their own
No indication that a complex character ruins the gaming experience in any of the three formats investigated
Hypothesis has been disproven
Interviews show players comprehended their characters (qualitative methods good for this type of problem)
Game format impacts the way players utilize characters:
▪ Games activate/promote different behavior and character element use
▪ Players need to be prompted, or have the opportunity, to activate the elements of characters in order to engage with these elements
Broad character activation increases engagement with the gameplaying activity
Other results:
Player-character relationship is a key influence on the gaming experience: strong correlation between SYMPA and FUN across formats
Experiment limitations:
▪ Have looked at MPGs (multi-player games), not SPGs (single-player games)
▪ Have looked at RPGs, not FPS or similar games
▪ Not a comparative study! Conclusions do not say that characters with complex psychologies are better than characters without
Next iteration of experiments could ideally be a comparative study