ABSTRACT
One of the most important sales metrics for companies to track and analyze is bookings revenue. Quite simply, this represents how much (in both value and deal count) was booked by the customer partners. Companies that fail to give Wall Street accurate revenue forecasts risk hurting their stock price. Understanding these forecasts is essential for marketplace expansion and for determining the most effective product mix per opportunity. Forecasts are also vital for estimating demand and creating effective channels. The CEO's executive committee needs this information at two levels: first, how well the company is doing on a total sales basis; second, how the respective sales organizations are doing against their quota and bookings goal. We developed a predictive forecast model, trained on historical earnings data aggregated by product type and category and by region, and split into services and product revenue. The model gives a range of expected revenue values for the forthcoming months and quarters.
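A minimal sketch of the aggregation step described above, written in plain Python rather than on the Snowflake/Spark stack the poster uses. The record fields (`category`, `region`, `revenue_type`, `quarter`, `amount`) and the function name are hypothetical. Bookings are totalled both by value and by deal count, matching the two ways the abstract says bookings are measured.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical booking record: field names are illustrative, not an actual schema.
Booking = Dict[str, object]
GroupKey = Tuple[str, str, str, str]  # (category, region, revenue_type, quarter)


def aggregate_bookings(records: List[Booking]) -> Dict[GroupKey, Tuple[float, int]]:
    """Total booked value and deal count per (category, region, revenue_type, quarter)."""
    value: Dict[GroupKey, float] = defaultdict(float)
    count: Dict[GroupKey, int] = defaultdict(int)
    for r in records:
        key = (str(r["category"]), str(r["region"]), str(r["revenue_type"]), str(r["quarter"]))
        value[key] += float(r["amount"])  # bookings by value
        count[key] += 1                   # bookings by deal count
    return {key: (value[key], count[key]) for key in value}
```

For example, two `product` bookings of 120 and 80 in the same category, region, and quarter collapse into one group with a total of 200 and a deal count of 2, while a `services` booking in that quarter forms a separate group.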
SUPPORTING DATASETS
The raw data is ingested into the Snowflake data store, which stores all the enterprise data. After preprocessing, the data is consumed by predefined events on the compute cluster, an Apache Spark multi-node cluster on AWS EMR. The time-series model is developed on the Spark cluster using the Spark Streaming and MLlib frameworks. We leveraged Apache Zeppelin's features and its compatibility with the Spark cluster for real-time visual analytics.
SYSTEM ARCHITECTURE
APPLICATION & USE CASES
We developed this model with two main applications in mind:
1. Obtain an understanding of the underlying data and the major trend points affecting the growth of the business.
2. Fit a predictive model and proceed to forecasting, monitoring, or even feedback and alerts.
The model is designed to be applicable to the following areas:
• Economic forecasting
• Sales forecasting
• Budgetary analysis
• Stock market analysis
• Yield projections
• Process and quality control
• Inventory studies
• Workload projections
However, because aggregating the data for all of these areas was difficult within the project's time limits, we limited our exploration to sales forecasting and bookings prediction.
CONCLUSION
Actuals Vs Predictions
REFINING THE MODEL
The model can be further improved by adding variable parameters within the same model using the ARIMAX method (ARIMA with exogenous variables) and decision trees. Some additional parameters that could be introduced are:
• Customer rep productivity
• Opportunity ranking
• Sales pipeline
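As a rough illustration of the ARIMAX idea, the sketch below fits the simplest member of that family, an ARX(1) model y_t = c + phi*y_{t-1} + beta*x_t (no differencing and no moving-average terms), by ordinary least squares. The single exogenous regressor x could stand for one of the parameters listed above, such as sales pipeline value. The function names and data are hypothetical, and a production model would normally use a statistics library rather than this hand-rolled solver.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                # Eliminate column `col` from every other row (fails if the system is singular).
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_arx1(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate (c, phi, beta) in y_t = c + phi * y_{t-1} + beta * x_t by least squares."""
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    c, phi, beta = solve_linear(xtx, xty)
    return c, phi, beta
```

On noise-free data generated from y_t = 2 + 0.5*y_{t-1} + 3*x_t, the fit recovers c = 2, phi = 0.5, and beta = 3 up to floating-point error. Each added parameter from the list above would become another column in `rows`.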
ACKNOWLEDGEMENTS
• Shantanu Biswas and Badrinarayan Jagannathan for their unending support; Pavan Rangavajhala, Sreenivasa Pocha, Suman Shanthakumar, and the Enterprise Data Platform team.
• The FB open source community for their work on fbprophet.
• The Apache Zeppelin and Apache Spark open source developer communities.
• The AWS Infrastructure team.
• Ben Chen, Grace Ng, and the University Talent Program team.
Forecasting Bookings using Machine Learning
Raghunandana Jayarama Reddy¹, Ameet Ubhayaker²
¹Intern, ²Director, Enterprise Data Platform, Juniper Networks, California, USA

DATA PREPROCESSING
Data preparation and preprocessing are important requirements for this model and involve extensive querying and ETL processing. Data gathered from different sources arrives in raw formats that are not suitable for analysis. All data ingested from the core applications and third-party apps flows through Amazon S3 buckets, via middleware, into Snowflake, the Enterprise Data Store, for ETL processing and analytics.
The data preprocessing step involves cleaning, integration, reduction, and transformation, and it is important mainly because of:
• Inaccurate data (missing data)
• Noisy data (erroneous data and outliers)
• Inconsistent data
Data Integration
Data Cleaning
Data Reduction
Data Transformation (Normalization, Aggregation, Generalization)
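The sketch below illustrates three of the preprocessing concerns above on a single numeric series: missing data (filled by linear interpolation), noisy data (outliers clipped to a median-absolute-deviation band), and transformation (min-max normalization). These specific techniques and the threshold k = 3 are assumptions chosen for illustration, not the poster's documented method; integration and reduction are omitted.

```python
import statistics
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Cleaning: replace None by linear interpolation between the nearest known neighbours.

    Gaps at either end take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(values[right])
        elif right is None:
            filled.append(values[left])
        else:
            frac = (i - left) / (right - left)
            filled.append(values[left] + frac * (values[right] - values[left]))
    return filled


def clip_outliers(values: List[float], k: float = 3.0) -> List[float]:
    """Noise handling: clip points farther than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    lo, hi = med - k * mad, med + k * mad
    return [min(max(v, lo), hi) for v in values]


def min_max(values: List[float]) -> List[float]:
    """Transformation: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

On the series [10, missing, 14, 13, 500, 12], the gap is filled with 12 and the spike of 500 is clipped to 15.5 before rescaling, so the spike no longer dominates the normalized range.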
Figure 3: System architecture diagram
Figure 1: Datasets used in the forecast analytics model
Figure 2: The data preprocessing workflow
1. Visual Interactive Pipeline: We introduce a visual-interactive system for generating time series preprocessing pipelines; the conceptual workflow is shown in Figure 3. In the following, we call the preprocessing pipeline a time series scenario. We chose a generalizable approach to time series preprocessing: beginning with the selection of raw data, a variety of preprocessing operations can be added to the pipeline and (re-)arranged in arbitrary order.
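A non-visual sketch of the pipeline idea: each preprocessing operation is a function from time-value pairs to time-value pairs, and a pipeline is simply an ordered list of such functions, so operations can be added and rearranged freely. The operation names are hypothetical and do not reproduce the interactive system's actual modules.

```python
from typing import Callable, List, Optional, Tuple

# Time-value pairs: (timestamp, value), where None marks a missing value.
Series = List[Tuple[float, Optional[float]]]
Operation = Callable[[Series], Series]


def drop_missing(series: Series) -> Series:
    """Remove pairs whose value is missing (timestamps become non-equidistant)."""
    return [(t, v) for t, v in series if v is not None]


def moving_average(window: int) -> Operation:
    """Build an operation replacing each value by the trailing mean of up to `window` points.

    Assumes missing values have already been removed from the series.
    """
    def op(series: Series) -> Series:
        values = [v for _, v in series]
        out: Series = []
        for i, (t, _) in enumerate(series):
            chunk = values[max(0, i - window + 1): i + 1]
            out.append((t, sum(chunk) / len(chunk)))
        return out
    return op


def min_max(series: Series) -> Series:
    """Rescale values to [0, 1]; assumes no missing values."""
    values = [v for _, v in series]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(t, (v - lo) / span if span else 0.0) for t, v in series]


def run_pipeline(series: Series, operations: List[Operation]) -> Series:
    """Apply the operations in the order given; reordering the list changes the pipeline."""
    for op in operations:
        series = op(series)
    return series
```

Running `drop_missing`, `moving_average(2)`, `min_max` on the values 0, 10, (missing), 0, 10 yields 0, 1, 1, 1, while swapping the last two operations yields 0, 0.5, 0.5, 0.5. This is why letting the user rearrange operations is meaningful.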
2. System and Views: We aim to make the different operations as exchangeable and compatible as possible. Hence, the data model of our input time series consists of a list of so-called time-value pairs, each containing a timestamp and a corresponding value. This data model can represent virtually all characteristics of time series data, such as non-equidistant timestamps or missing values. The user can select the preferred normalization variant with a single click.
• Views based on month, quarter, and year
  o Statistical view
  o Detailed view
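The time-value-pair data model and the month, quarter, and year views can be sketched with the standard library. Timestamps are `datetime.date` values with arbitrary spacing and missing values are `None`. The `statistical_view` function and its summary fields (count, total, mean) are assumptions for illustration; a detailed view would simply list the raw pairs falling in each period.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Time-value pairs: timestamps need not be equally spaced and values may be missing (None).
Series = List[Tuple[date, Optional[float]]]


def period_key(d: date, granularity: str) -> str:
    """Map a date to its month, quarter, or year label."""
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    if granularity == "quarter":
        return f"{d.year}Q{(d.month - 1) // 3 + 1}"
    if granularity == "year":
        return str(d.year)
    raise ValueError(f"unknown granularity: {granularity}")


def statistical_view(series: Series, granularity: str) -> Dict[str, Dict[str, float]]:
    """Summarize observed values per period; missing values are skipped."""
    groups: Dict[str, List[float]] = {}
    for d, v in series:
        if v is not None:
            groups.setdefault(period_key(d, granularity), []).append(v)
    return {
        key: {"count": len(vals), "total": sum(vals), "mean": sum(vals) / len(vals)}
        for key, vals in sorted(groups.items())
    }
```

Because the view is computed from the raw pairs, switching between month, quarter, and year only changes `granularity`; the underlying series is never resampled.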
GENERAL PROCESS TO DERIVE FORECASTS
Figure 4: A generic workflow of the forecast model
MODEL FEATURES
RESULTS
Figure 5: Logistic regression compute results
Figure 6: Model training results
The following figures show the computation results of the algorithms run on the model.
The following results are visual plots extracted from the Zeppelin notebook running on the EMR Spark compute cluster.
Figure 7a: Forecast series without regressors
Figure 7b: Forecast series with regressors
As seen in the figures above, the default range values are often not appropriate and can be very large; they can be tuned when the seasonality needs to fit higher-frequency changes and be generally less smooth. Additional features implemented in the model, visible in Figure 7b, include custom seasonality trends, product sales in a specific geographic area, and the identification and normalization of outlier effects using additional regressors.
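For reference, fbprophet fits an additive model in which custom seasonalities and extra regressors are simply additional terms. The decomposition below follows the form described in the Prophet documentation (the symbols are Prophet's standard ones, not taken from this poster): g(t) is the trend, s(t) the seasonality, h(t) holiday effects, x_j(t) the additional regressors with coefficients β_j in the default additive mode, and ε_t the error.

```latex
y(t) = g(t) + s(t) + h(t) + \sum_{j} \beta_j \, x_j(t) + \varepsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right)
```

Here P is the seasonal period (for example 365.25 days for yearly seasonality) and N is the Fourier order; a larger N lets s(t) follow faster, less smooth changes. In Prophet, a custom seasonality is added with `add_seasonality` (which takes `period` and `fourier_order`) and an extra regressor with `add_regressor`.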
The prediction ranges appeared accurate, with a percentage error (SSE) of up to ±10%. The prediction ranges are relatively wide because of the comparatively small dataset and the number of parameters introduced into the model. In addition, these time series are affected by many parameters, and the model cannot predict changes in values driven by parameters that are not predefined. The correctness of the values therefore depends on the definition of the variable parameters and the accuracy of the data.
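The error figures above can be computed as follows. SSE sums squared differences and is therefore in squared currency units, whereas a ±10% band is a relative measure, so this hypothetical sketch computes both the SSE and a signed percentage error per period. The function names and sample figures are illustrative only.

```python
from typing import List, Sequence


def sse(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Sum of squared errors (in squared currency units)."""
    if len(actuals) != len(predictions):
        raise ValueError("length mismatch")
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions))


def percentage_errors(actuals: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Signed error of each prediction relative to the actual value, in percent."""
    if len(actuals) != len(predictions):
        raise ValueError("length mismatch")
    return [100.0 * (p - a) / a for a, p in zip(actuals, predictions)]


def within_band(errors: Sequence[float], band: float = 10.0) -> bool:
    """True if every percentage error lies within +/- band percent."""
    return all(abs(e) <= band for e in errors)
```

For actuals of 100, 200, and 400 against predictions of 110, 190, and 400, the SSE is 200 and the percentage errors are +10%, -5%, and 0%, which fall within a ±10% band.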
3. User-support for Parameter Setting: We now describe the parameterization of a single module, which is a problem in itself. Where appropriate, each preprocessing module in the system provides ensembles of n alternative parameter values, where n is a user parameter. The time series arising from the alternative parameterizations are visualized as a line chart bundle in the detail view.
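A sketch of parameter ensembles for one module. Exponential smoothing stands in for a generic preprocessing module (the poster does not name its modules), and its smoothing factor alpha is the parameter being varied: the module generates n alternative values, and each yields one series in the bundle that a detail view could draw as overlapping lines. The function names and the default range [0.1, 0.9] are assumptions.

```python
from typing import List, Tuple


def exponential_smoothing(values: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def alpha_ensemble(n: int, low: float = 0.1, high: float = 0.9) -> List[float]:
    """n alternative smoothing factors spread evenly over [low, high]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [(low + high) / 2]
    step = (high - low) / (n - 1)
    return [low + k * step for k in range(n)]


def line_chart_bundle(values: List[float], n: int) -> List[Tuple[float, List[float]]]:
    """One smoothed series per alternative parameter value, ready to plot as a bundle."""
    return [(alpha, exponential_smoothing(values, alpha)) for alpha in alpha_ensemble(n)]
```

With n = 3 the ensemble is roughly 0.1, 0.5, and 0.9; the middle member smooths the series 0, 10, 10 into 0, 5, 7.5, and the spread of the bundle shows the user how sensitive the module is to its parameter.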
Figure 8: Experimentation results for different datasets