Forecasting Bookings using Machine Learning


ABSTRACT

Bookings revenue is one of the most important sales metrics for companies to track and analyze. Quite simply, it represents how much (in both value and deal count) was booked by the customer partners. When companies fail to give Wall Street accurate revenue forecasts, their stock price is affected. Understanding these forecasts is essential for marketplace expansion and for determining the most effective product mix per opportunity. Forecasts are also vital in estimating demand and creating effective channels. The CEO executive committee needs this information at two levels. First, they need to know how well the company is doing on a total sales basis. Second, they need to know how the respective sales organizations are doing against their quota and bookings goals. We develop a predictive forecast model trained on historical earnings data, aggregated by product type, category, and region and split into services and product revenue, that gives a range of expected revenue values for the coming months and quarters.

SUPPORTING DATASETS

SYSTEM ARCHITECTURE

The raw data is ingested into Snowflake, the data store that holds all enterprise data. After preprocessing, it is consumed by the predefined events of the compute cluster, a multi-node Apache Spark cluster on AWS EMR. The time-series model is developed on the Apache Spark cluster using the Spark Streaming and MLlib frameworks. We use Apache Zeppelin and its compatibility with the Spark cluster for real-time visual analytics.

APPLICATION & USE CASES

We developed this model with two main applications in mind:
1. Obtain an understanding of the underlying data and the major trend points affecting the growth of the business.
2. Fit a predictive model and proceed to forecasting, monitoring, or even feedback and alerts.

The model is mainly developed for the following applications:
• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Workload Projections

However, because of the difficulty of aggregating data within the time available for this project, we limited our exploration to sales forecasting and bookings predictions.

CONCLUSION

Actuals Vs Predictions

REFINING THE MODEL

The project can be further improved by adding variable parameters within the same model using the ARIMA-X method and decision trees. Some additional parameters that can be introduced to this model are:
• Customer rep productivity
• Opportunity ranking
• Sales pipeline

ACKNOWLEDGEMENTS

• Shantanu Biswas & Badrinarayan Jagannathan for their unending support, Pavan Rangavajhala, Sreenivasa Pocha, Suman Shanthakumar and the Enterprise Data Platform team.
• FB open source community for their work on fbprophet.
• Apache Zeppelin and Apache Spark open source dev community.
• AWS Infrastructure team.
• Ben Chen, Grace Ng and the University Talent Program team.

DATA PREPROCESSING

Raghunandana Jayarama Reddy¹, Ameet Ubhayaker²
¹Intern, ²Director, Enterprise Data Platform, Juniper Networks, California, USA

Forecasting Bookings using Machine Learning

Data preparation and preprocessing is an important requirement for this model and involves a lot of querying and ETL processing. Whenever data is gathered from different sources, it is collected in a raw format that is not suitable for analysis. All the data ingested from the core applications and 3rd party apps is stored in Snowflake, the Enterprise Data Store, for ETL processing and analytics; it arrives through Amazon S3 buckets via middleware.

The data preprocessing step involves cleaning, integration, reduction, and transformation, and is important mainly for the reasons below:
• Inaccurate data (missing data)
• The presence of noisy data (erroneous data and outliers)
• Inconsistent data
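The cleaning, noise handling, and reduction steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the poster's actual ETL code: the record layout, the median fill rule, the `factor * median` outlier cap, and all function names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records: (ISO date, region, bookings value or None if missing).
raw = [
    ("2017-01-15", "AMER", 120.0),
    ("2017-01-20", "AMER", None),      # missing value
    ("2017-01-28", "EMEA", 80.0),
    ("2017-02-03", "AMER", 5000.0),    # outlier
    ("2017-02-11", "AMER", 130.0),
    ("2017-02-19", "EMEA", 90.0),
]

def clean(records):
    """Data cleaning: replace missing values with the median of observed values."""
    fill = median(v for _, _, v in records if v is not None)
    return [(d, r, fill if v is None else v) for d, r, v in records]

def cap_outliers(records, factor=3.0):
    """Noise handling: cap values above factor * median (a crude, assumed rule)."""
    cap = factor * median(v for _, _, v in records)
    return [(d, r, min(v, cap)) for d, r, v in records]

def aggregate_monthly(records):
    """Data reduction: sum bookings per (year-month, region)."""
    totals = defaultdict(float)
    for d, r, v in records:
        totals[(d[:7], r)] += v
    return dict(totals)

monthly = aggregate_monthly(cap_outliers(clean(raw)))
for key in sorted(monthly):
    print(key, monthly[key])
```

In a real deployment each step would more likely be a query or a Spark job over the data store; the point here is only the order of operations.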

Data Integration

Data Cleaning

Data Reduction

Data Transformation (Normalization, Aggregation, Generalization)

Figure 3: System architecture diagram

Figure 1: Datasets used in the forecast analytics model

Figure 2: The data preprocessing workflow

1. Visual Interactive Pipeline:
We introduce a visual-interactive system for the generation of time series preprocessing pipelines; the conceptual workflow is shown in Figure 3. In the following, we call the preprocessing pipeline a time series scenario. We choose a generalizable approach for time series preprocessing. Beginning with the selection of raw data, a variety of preprocessing operations can be added to the pipeline and (re-)arranged in arbitrary order.
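One way to picture such a rearrangeable scenario is as an ordered list of functions applied to the raw series. The sketch below is illustrative only; `run_pipeline`, `clip_to`, and `difference` are hypothetical names, not part of the system described here.

```python
from typing import Callable, List

Series = List[float]
Step = Callable[[Series], Series]

def clip_to(lo: float, hi: float) -> Step:
    """Build a step that clips every value into the range [lo, hi]."""
    return lambda s: [min(max(x, lo), hi) for x in s]

def difference(s: Series) -> Series:
    """First-order differencing: change between consecutive values."""
    return [b - a for a, b in zip(s, s[1:])]

def run_pipeline(series: Series, steps: List[Step]) -> Series:
    """Apply the steps in the given order; reordering the list changes the scenario."""
    for step in steps:
        series = step(series)
    return series

raw = [10.0, 12.0, 15.0, 11.0]

# The same operations arranged in two different orders.
scenario_a = run_pipeline(raw, [clip_to(0.0, 12.0), difference])
scenario_b = run_pipeline(raw, [difference, clip_to(0.0, 12.0)])

print(scenario_a)
print(scenario_b)
```

Because every step takes a series and returns a series, any step can be inserted, removed, or moved without changing the others, which is the property the pipeline relies on.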

2. System and Views:
We aim to make the different operations as exchangeable and compatible as possible. Hence, the data model of our input time series consists of a list of so-called time-value pairs, each containing a time stamp and a corresponding value. This data model can represent virtually all characteristics of time series data, such as non-equidistant time stamps or missing values. The user can select the preferred normalization variant with a single click.
• Views based on month, quarter and year
  o Statistical view
  o Detailed view
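The time-value pair model and two common normalization variants can be sketched as follows. The class name `TimeValue`, the sample data, and the choice of min-max and z-score as the variants are assumptions for illustration; the poster does not name the variants it offers.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev
from typing import List, Optional

@dataclass(frozen=True)
class TimeValue:
    timestamp: date
    value: Optional[float]  # None marks a missing value

# Hypothetical series with a missing value and non-equidistant time stamps.
series = [
    TimeValue(date(2017, 1, 31), 100.0),
    TimeValue(date(2017, 2, 28), None),
    TimeValue(date(2017, 4, 30), 300.0),
    TimeValue(date(2017, 5, 31), 200.0),
]

def _observed(s: List[TimeValue]) -> List[float]:
    return [p.value for p in s if p.value is not None]

def min_max(s: List[TimeValue]) -> List[TimeValue]:
    """Rescale observed values to [0, 1]; missing values stay missing."""
    vals = _observed(s)
    lo, span = min(vals), (max(vals) - min(vals)) or 1.0
    return [TimeValue(p.timestamp, None if p.value is None else (p.value - lo) / span)
            for p in s]

def z_score(s: List[TimeValue]) -> List[TimeValue]:
    """Center on the mean and divide by the population standard deviation."""
    vals = _observed(s)
    mu, sigma = mean(vals), pstdev(vals) or 1.0
    return [TimeValue(p.timestamp, None if p.value is None else (p.value - mu) / sigma)
            for p in s]

print([p.value for p in min_max(series)])
print([p.value for p in z_score(series)])
```

Keeping the missing value as `None` rather than dropping the pair preserves the time stamp, so later steps can decide how to fill it.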

GENERAL PROCESS TO DERIVE FORECASTS

Figure 4: A generic workflow of the forecast model

MODEL FEATURES

RESULTS

Figure 5: Logistic regression compute results

Figure 6: Model training results

The following figures show the computation results for the algorithms being run on the model.

The following results are the visual plots extracted from the Zeppelin notebook, which is running on the EMR Spark compute cluster.

Figure 7b: Forecast series with regressors

Figure 7a: Forecast series without regressors

As seen in the figures above, the default range values are often not appropriate and seem very large, but they can be reduced when the seasonality needs to fit higher-frequency changes and generally be less smooth. The additional features implemented in the model, seen in Figure 7b, are specifying custom seasonality trends, product sales in a specific geographic area, and identifying and normalizing the effect of outliers using additional regressors.
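Custom seasonality of the kind mentioned above is commonly represented with Fourier terms, which is also how Prophet's documentation describes its seasonal components. The sketch below only generates such terms for an assumed period and order; it does not call Prophet, and the function name and the monthly period of 12 are illustrative assumptions.

```python
import math
from typing import List

def fourier_features(t: float, period: float, order: int) -> List[float]:
    """Return [sin, cos] pairs for harmonics 1..order of a seasonal cycle.

    A higher order adds higher-frequency harmonics, so a fitted seasonal
    curve can change faster and be less smooth.
    """
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# Hypothetical yearly seasonality on monthly data: period of 12 time steps.
low_order = fourier_features(t=5, period=12, order=1)   # 2 features, smooth
high_order = fourier_features(t=5, period=12, order=3)  # 6 features, more flexible

print(len(low_order), len(high_order))
```

A regression on these columns, together with any additional regressors, then yields the seasonal component of the forecast.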

The prediction ranges appeared to be accurate, with a percentage error (SSE) of up to ±10%. The prediction ranges are relatively large because of the comparatively small dataset and the number of parameters introduced to the model. However, these time series are affected by a number of parameters, and the model cannot predict a change in values if the parameters are not predefined. The correctness of the values clearly depends on the definition of the variable parameters and the accuracy of the data.
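To make the error figures concrete, the sketch below computes a per-period percentage error and the sum of squared errors (SSE) for a pair of actual and predicted series. Percentage error and SSE are different quantities, so both are shown. The numbers are made up for illustration and are not the poster's results.

```python
from typing import List

def percentage_errors(actuals: List[float], predictions: List[float]) -> List[float]:
    """Signed percentage error of each prediction relative to its actual value."""
    return [100.0 * (p - a) / a for a, p in zip(actuals, predictions)]

def sse(actuals: List[float], predictions: List[float]) -> float:
    """Sum of squared errors, in squared units of the series."""
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions))

def within_tolerance(actuals: List[float], predictions: List[float],
                     tol_pct: float = 10.0) -> bool:
    """True if every prediction is within +/- tol_pct percent of its actual."""
    return all(abs(e) <= tol_pct for e in percentage_errors(actuals, predictions))

# Hypothetical quarterly bookings (actual vs predicted).
actuals = [100.0, 200.0, 400.0]
predictions = [105.0, 190.0, 380.0]

print(percentage_errors(actuals, predictions))
print(sse(actuals, predictions))
print(within_tolerance(actuals, predictions))
```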

REFERENCES

1. Introduction to Time Series models
2. Introduction to Time Series and Forecasting
3. Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science
4. Forecasting: principles and practice
5. Time-Critical Decision Making for Business Administration
6. Fbprophet, an open source tool by FB open source community.

3. User-support for Parameter Setting:
We give details about the parameterization of a single module, which is a problem in itself. Each preprocessing module in the system provides ensembles of n alternative parameter values, as appropriate, where n is a user-specified parameter. The time series arising from the alternative parameterizations are visualized as a line chart bundle in the detail view.
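The idea of an ensemble of n alternative parameter values can be sketched with a single smoothing module. The module (a trailing moving average), the way alternatives are generated (consecutive window sizes), and all names are assumptions chosen for illustration.

```python
from typing import Dict, List

def moving_average(series: List[float], window: int) -> List[float]:
    """Trailing moving average; shorter windows are used at the start."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def parameter_ensemble(series: List[float], base_window: int, n: int) -> Dict[int, List[float]]:
    """Run the module with n alternative window sizes, starting at base_window."""
    windows = [base_window + k for k in range(n)]
    return {w: moving_average(series, w) for w in windows}

raw = [10.0, 20.0, 30.0, 40.0]
bundle = parameter_ensemble(raw, base_window=1, n=3)

# Each entry is one line of the bundle that the detail view would plot.
for window, smoothed in bundle.items():
    print(window, smoothed)
```

Plotting all entries of `bundle` on one axis gives the line chart bundle, which lets the user compare parameter choices before committing to one.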

Figure 8: Experimentation results for different datasets

https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm

http://amzn.to/2rY3Wrg

http://amzn.to/2qEgqA7

https://www.otexts.org/book/fpp

http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
