Agility for Big Data
My journey applying an Agile method to Big Data applications
Who I am
What is the hardest part about bringing agility to your big data applications?
"The more data you give the business, the more questions they will ask"
Jose Carlos Eiras, who served as CIO at Kraft Foods, Philip Morris, General Motors and DHL
Reporting over Workable Software
Reporting over Workable Software
Problems experienced
  Customers don't know what they want until they see it
  Very long feedback cycle because of waiting for quality data
  Developing workable software is much more expensive than generating a report manually
  Workable software without data to use is even more expensive
  The switching cost between tasks is high, but the switching cost between projects is even higher
  Releasing a feature to all users results in more questions coming in, either because of data quality or for other valid reasons
  Very low product success rate, lots of wasted resources and low team spirit
Reporting over Workable Software
Solutions
  Focus on a very specific customer group and generate reports for them
    Collect data that targets a very specific customer group, e.g. parents in the Box Hill area who work in IT
    Generate reports manually
    Data quality is easier to control over a small amount of data
    Deliver reports to end users in the most cost-effective way, e.g. face to face, email, or open source BI tools
    Get feedback and test hypotheses
  Focus on a subset of data while discovering the value of existing data
    Apply a new methodology to a subset of data, where it is much more effective
    Data quality is easier to control on a subset of data
    Focus on one customer and get feedback from that client
    Test hypotheses
Reporting over Workable Software
Solutions
  Data Freedom: empowering people (example: data scientists exploring the value of data)
    Provide an SQL-like interface so users can easily access the data
    Provide a semantic schema so users can easily find the right data
    Document your data where necessary to help other people understand, decipher and use it
    Provide easy-to-use report design tools for accessing data, such as Pentaho or JasperReports
    Provide easy-to-use scheduling tools such as Oozie, or general BI tools
    As a mindset, developers should support other people in exploring data freely, in whatever ways they like
    Where data must be accessed through developers, those developers should ask what stops other users from accessing it directly
    Add safeguards to prevent cluster overloading
  The overall result: the speed of feedback increases dramatically
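One way to picture the "safeguard to prevent cluster overloading" point is a thin guard in front of the SQL-like interface. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the real cluster, and the names `guarded_query`, `demo_connection`, `MAX_ROWS` and the `visits` table are hypothetical. It allows a single read-only SELECT and caps the number of rows returned.

```python
import sqlite3

MAX_ROWS = 1000  # hypothetical cap; tune to what the cluster can tolerate


def guarded_query(conn, sql, max_rows=MAX_ROWS):
    """Run a user query only if it is a single SELECT, and cap its row count."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select"):
        raise ValueError("only read-only SELECT statements are allowed")
    if ";" in statement:
        # Crude check: rejects any remaining semicolon, even inside a string literal.
        raise ValueError("multiple statements are not allowed")
    # Wrap the user's query so the cap applies whatever the query contains.
    limited = f"SELECT * FROM ({statement}) LIMIT {int(max_rows)}"
    return conn.execute(limited).fetchall()


def demo_connection():
    """Build a small in-memory table standing in for cluster data (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (customer_id INTEGER, suburb TEXT)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [(i, "Box Hill") for i in range(50)],
    )
    return conn
```

A real deployment would more likely lean on the query engine's own quotas or queue limits; the point here is only that free exploration and a hard ceiling on cost can coexist.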
Reporting over Workable Software
More to try
  Automated data quality control
  Explore different ways for the customer service team to address data quality issues
  Sampling data for product discovery programs
  Explore ways to test a hypothesis even more quickly
    Example: customer-centric data collection and reporting
  Explore a wider scale of data freedom through web services
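As a rough illustration of what "automated data quality control" could look like at its simplest, the sketch below runs a set of named rules over records and counts failures per rule. The rule names, the record fields (`customer_id`, `age`) and the function `check_quality` are assumptions made up for this example, not part of any particular tool.

```python
def check_quality(records, rules):
    """Count, for each named rule, how many records fail it."""
    failures = {name: 0 for name in rules}
    for record in records:
        for name, rule in rules.items():
            if not rule(record):
                failures[name] += 1
    return failures


# Hypothetical rules for a hypothetical customer record layout.
RULES = {
    "has_customer_id": lambda r: r.get("customer_id") is not None,
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}
```

Running this on every data batch and reporting the counts gives the customer service team something concrete to act on before a report goes out.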
Continuous Delivery
Continuous delivery, where to start?
  Problems: legacy systems, low unit test coverage, low functional/integration test coverage, no acceptance testing, not enough test data, and so on
  Start with an easy problem so that it is achievable and helps to build team trust
  You must have test data and integration test suites
Continuous Delivery
Build pipeline: dev box -> build -> daily build server -> alpha -> beta -> production
Testing data: you will never cover all scenarios, so what do you do?
  Hybrid data fixtures that combine manually produced data, generated data, and data from production
  Versioning data
  Keep data as clean as code; refactor your data often
  Backward and forward compatibility
Vertical slicing of stories, architecture and teams
NoSQL database engines
Start continuous delivery for some components NOW and learn from it
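The "versioning data" and "backward and forward compatibility" points can be sketched as a reader that upgrades old records step by step and ignores fields it does not know. Everything named here is hypothetical: the `schema_version` field, the v1 field `area` renamed to `suburb` in v2, and the `DEFAULTS` table are assumptions chosen for illustration.

```python
CURRENT_VERSION = 2

# Fields the current code understands, with defaults for records that lack them.
DEFAULTS = {"customer_id": None, "suburb": "unknown", "occupation": "unknown"}


def upgrade_v1_to_v2(raw):
    """Hypothetical refactoring: v1 called the field 'area', v2 calls it 'suburb'."""
    upgraded = dict(raw)
    if "area" in upgraded:
        upgraded["suburb"] = upgraded.pop("area")
    upgraded["schema_version"] = 2
    return upgraded


UPGRADES = {1: upgrade_v1_to_v2}


def read_record(raw):
    """Read a record written by an older, current, or newer version of the code."""
    record = dict(raw)
    version = record.get("schema_version", 1)
    # Backward compatibility: apply upgrades one version at a time.
    while version < CURRENT_VERSION:
        record = UPGRADES[version](record)
        version = record["schema_version"]
    # Forward compatibility: keep known fields, ignore fields from newer writers.
    result = {name: record.get(name, default) for name, default in DEFAULTS.items()}
    result["schema_version"] = CURRENT_VERSION
    return result
```

Keeping each upgrade as its own small function is what makes "refactor your data often" affordable: a rename becomes one more entry in `UPGRADES` rather than a migration of every stored record.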
Deployment != Release
Separate deployment from release
Tips
  Data batch toggles
  Feature toggles
  Customer / Country / Region releases
  Manually generated report area
  Don't forget about exclusive toggles
Leave release up to the production manager. They release and they organize press releases.
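A minimal sketch of how deployed code can stay unreleased until someone flips a toggle is shown below. It assumes that "exclusive toggles" means toggles that must never be enabled at the same time, which is one reading of the slide; the classes `Toggle` and `ToggleRegistry` and the toggle names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Toggle:
    name: str
    enabled: bool = False
    regions: frozenset = frozenset()         # empty set means "all regions"
    exclusive_with: frozenset = frozenset()  # toggles that must not be on together


class ToggleRegistry:
    def __init__(self, toggles):
        self._toggles = {t.name: t for t in toggles}

    def is_released(self, name, region):
        """A feature is released only if its toggle is on for this region."""
        toggle = self._toggles.get(name)
        if toggle is None or not toggle.enabled:
            return False
        if toggle.regions and region not in toggle.regions:
            return False
        return True

    def validate(self):
        """Return sorted pairs of mutually exclusive toggles that are both enabled."""
        conflicts = set()
        for toggle in self._toggles.values():
            for other_name in toggle.exclusive_with:
                other = self._toggles.get(other_name)
                if other is not None and toggle.enabled and other.enabled:
                    conflicts.add(tuple(sorted((toggle.name, other_name))))
        return sorted(conflicts)
```

With something like this in place the code can be deployed everywhere, while the decision to release per customer, country or region stays a configuration change owned by whoever manages the release.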
Q&A
What is the hardest part about bringing agility to your big data applications?
My Personal Information
LinkedIn Profile: http://au.linkedin.com/pub/charlie-cheng/24/92/978/
Twitter: @charlie_cheng
Are you looking for training and finding it hard to choose the right course? We are running a customer discovery program on this at StudyIsFun. Please contact me at firstname.lastname@example.org if you are interested.