Data scientist, according to Harvard Business Review, is “the sexiest job of the 21st century”, and it feels like pure magic to watch raw data turn into a clear prediction with definite figures. In this article, we’re going to show you the magical process of predicting sales step by step. It started when we got the idea to take part in a Kaggle competition for data scientists. The task was to help Rossmann store managers create effective schedules for their employees, based on the predicted sales and the average check.
Here we have to say a few words about Rossmann stores. Rossmann is the second largest drugstore chain in Germany, founded in 1972 by Dirk Rossmann. It operates over 3,000 drug stores in 7 European countries and employs more than 28,000 people. Rossmann offers over 17,500 different items in its biggest retail outlets. Besides pharmaceutical goods, you can also find pet food, a wide choice of wines, toys, and stationery.
The photo Rossmann Innenansicht eines Ladens by Jan Hagelskamp1 is licensed under CC-BY 4.0
Before getting the solution, Rossmann store managers had to predict daily sales and the number of customers up to six weeks in advance. Store sales, in turn, are influenced by many factors, such as promotions, competitors in the area, school and state holidays, seasonality, and locality. With thousands of individual managers predicting sales based on their own unique circumstances, the accuracy of the forecasts varied considerably. Therefore, the task was to make a reliable sales forecast (including the number of customers and the average check) for 1,115 stores across Germany, which Rossmann store managers could use to create effective staff schedules and increase productivity and motivation.
Step 1
We were provided with historical sales data for 1,115 Rossmann stores. The data came in CSV format, and the selection contained 15 attributes (customers, assortment, store type, state holiday, sales, etc.). We added the attributes DAY and MONTH, extracted from the timestamp in the given data. For the hypothesis checks that followed, we excluded attributes whose influence was obvious, e.g. whether the store was open or closed on a particular weekday. As a result, every attribute left in the selection became a hypothesis on whether that single attribute influences the number of customers and the average check.
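The loading step in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: the column names and the sample rows are assumptions made up for the example, and DAY is taken to mean the day of the month.

```python
import io

import pandas as pd

# Tiny stand-in for the competition CSV. The column names are assumptions
# modelled on the attributes named in the text; the values are made up.
SAMPLE_CSV = """Store,Date,Sales,Customers,Open,Promo,StateHoliday,SchoolHoliday
1,2015-07-31,5263,555,1,1,0,1
1,2015-08-01,4870,510,1,0,0,1
2,2015-07-31,6064,625,1,1,0,1
2,2015-08-02,0,0,0,0,0,0
"""


def load_sales(source) -> pd.DataFrame:
    """Read the sales CSV and add DAY and MONTH columns derived from the date."""
    # StateHoliday is read as text so that codes and "0" share one type.
    df = pd.read_csv(source, parse_dates=["Date"], dtype={"StateHoliday": str})
    df["DAY"] = df["Date"].dt.day      # day of the month (assumed meaning of DAY)
    df["MONTH"] = df["Date"].dt.month  # calendar month, 1 to 12
    return df


sales = load_sales(io.StringIO(SAMPLE_CSV))
print(sales[["Store", "Date", "DAY", "MONTH"]])
```

With a real file, `load_sales("train.csv")` would work the same way, where the file name is again only a placeholder.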
Although the data we received were structured, we still needed complex data-processing algorithms, as we had to handle a large number of transactions.
Taking this into account, we proceeded to the next step: checking the hypotheses.
Step 2
We visualized the hypotheses based on the chosen attributes and concluded that some of them did not influence the result, so they were excluded from the selection (e.g. SCHOOL HOLIDAYS). On the other hand, some hypotheses required additional parameters to give more accurate results. For instance, when we visualized the attribute STORE (data for all shops), the data showed that Rossmann shops had fewer customers on Sundays than on other days. However, when we included the attribute OPEN, it turned out that only a few shops were open on Sundays, and those shops were visited by more customers than on weekdays. The average check, however, was lower.
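The Sunday effect is easy to reproduce on a toy table. The sketch below uses made-up numbers and assumed column names (with DayOfWeek running from 1 for Monday to 7 for Sunday); it only illustrates why adding OPEN changes the picture.

```python
import pandas as pd

# Made-up rows: only store 1 opens on Sundays (DayOfWeek == 7).
rows = [
    # Store, DayOfWeek, Open, Customers
    (1, 5, 1, 600),
    (1, 6, 1, 620),
    (1, 7, 1, 900),
    (2, 5, 1, 500),
    (2, 6, 1, 520),
    (2, 7, 0, 0),
    (3, 5, 1, 400),
    (3, 6, 1, 410),
    (3, 7, 0, 0),
]
df = pd.DataFrame(rows, columns=["Store", "DayOfWeek", "Open", "Customers"])

# Averaging over every row: closed shops contribute zeros, so Sunday looks quiet.
all_rows = df.groupby("DayOfWeek")["Customers"].mean()

# Averaging over open shops only: the Sunday picture reverses.
open_only = df[df["Open"] == 1].groupby("DayOfWeek")["Customers"].mean()

comparison = pd.DataFrame({"all_rows": all_rows, "open_only": open_only})
print(comparison)
```

The same comparison on the real data is what led us to treat OPEN as a filter rather than as a hypothesis of its own.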
The graph shows how promo offers influence the sales
Based on the examples described above, we concluded that not all the given attributes influence sales equally, so we made some corrections (adding some attributes and excluding others) and ended up with 4 attributes that influenced sales significantly. They were:
- promo – indicates whether a store is running a promo on that day;
- year/week – describes the year and calendar week;
- state holiday – indicates a state holiday;
- annual sales increase – shows how the sales increase each year.
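As a rough illustration, the four attributes could be derived from daily records like this. The column names, the sample values, and in particular the way the annual increase is computed (mean sales per year relative to the first year) are assumptions for the sake of the example, not a description of our exact pipeline.

```python
import pandas as pd

# Made-up daily records; column names are assumptions.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-06-03", "2013-12-25", "2014-06-02", "2014-12-25"]),
    "Promo": [1, 0, 1, 0],
    "StateHoliday": ["0", "c", "0", "c"],  # "0" is taken to mean "no holiday"
    "Sales": [5000, 0, 5500, 0],
})

# year/week: ISO year and ISO calendar week of each date.
iso = df["Date"].dt.isocalendar()
df["Year"] = iso["year"].astype(int)
df["Week"] = iso["week"].astype(int)

# state holiday: collapse the holiday code into a 0/1 flag.
df["IsStateHoliday"] = (df["StateHoliday"] != "0").astype(int)

# annual sales increase: mean sales of each year relative to the first year.
yearly_mean = df.groupby("Year")["Sales"].mean()
df["AnnualGrowth"] = df["Year"].map(yearly_mean / yearly_mean.iloc[0])

features = df[["Promo", "Year", "Week", "IsStateHoliday", "AnnualGrowth"]]
print(features)
```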
While checking the hypotheses, we found quite a number of interesting facts and correlations that are not directly related to this task but might be useful to the Rossmann marketing department.
The graphs show how different holidays influence the number of customers and the average check
Step 3
Our next step was to choose a model type. At first, we tried a linear regression model, but it didn’t work out, as it had a margin of error of 40%. Then we tried a decision tree model, but the result was still inaccurate, so we ended up with a decision forest model, which suited our type of data and the given task well.
Step 4
The model was built and trained (here we used the scikit-learn library), and we had to make some adjustments to improve its training result and thus increase its accuracy. To do that, we had to change the model we trained: using a lognormal distribution instead of the usual one gave us the required accuracy. With the adjustments described above, we reached 88% accuracy, which we found satisfactory for this particular business task, although we could see ways to improve it further.
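One common way to apply the lognormal idea with scikit-learn is to train the forest on the logarithm of sales and convert the predictions back afterwards. The sketch below does that on synthetic data; the data generator, the hyperparameters, and the error measure (mean absolute percentage error) are all assumptions chosen for illustration, and the numbers it prints say nothing about the real Rossmann data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the four attributes (all values are made up).
rng = np.random.default_rng(0)
n = 2000
promo = rng.integers(0, 2, n)
week = rng.integers(1, 53, n)
holiday = (rng.random(n) < 0.05).astype(int)
year_growth = rng.choice([1.0, 1.05, 1.10], n)

# Synthetic sales with multiplicative (lognormal) noise.
base = 5000 * year_growth * (1 + 0.3 * promo) * (1 - 0.5 * holiday)
season = 1 + 0.1 * np.sin(2 * np.pi * week / 52)
sales = base * season * rng.lognormal(mean=0.0, sigma=0.1, size=n)

X = np.column_stack([promo, week, holiday, year_growth])
X_train, X_test, y_train, y_test = train_test_split(
    X, sales, test_size=0.25, random_state=0
)

# Fit on log(1 + sales) so that multiplicative effects become additive.
model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
model.fit(X_train, np.log1p(y_train))

# Undo the transform to get predictions back on the sales scale.
pred = np.expm1(model.predict(X_test))

# Mean absolute percentage error on the held-out rows (assumed metric).
mape = np.mean(np.abs(pred - y_test) / y_test)
print(f"mean absolute percentage error: {mape:.1%}")
```

Swapping `RandomForestRegressor` for `LinearRegression` or `DecisionTreeRegressor` in the same script is a quick way to compare the three model types mentioned in Step 3.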
Reaching the required accuracy is a time-consuming process, as it is always necessary to optimize the machine learning algorithm and check the result. Although we have reached 88% accuracy, the result can be improved given more time.
As a result, we have created a model that Rossmann store directors can use to predict sales 6 weeks in advance (via the number of customers and the average check). Based on this prediction, they will be able to create an effective schedule for their employees.
The graph shows the difference between the actual sales and the predicted ones for a selected calendar month
Our next step might be to create a visual interface for predicting sales, so that it will be possible to enter any given day, e.g. the first Tuesday of June 2017, and predict how many customers will visit a particular store and how much money they will spend there.
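Behind such an interface, the first job would be turning a request like "the first Tuesday of June 2017" into the attributes the model expects. A minimal sketch using only the standard library is shown below; the function names and the feature keys are hypothetical, and the annual increase attribute is left out for brevity.

```python
from datetime import date, timedelta


def first_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Return the first date in the month that falls on `weekday` (Monday = 0)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset)


def feature_row(day: date, promo: bool, state_holiday: bool) -> dict:
    """Build a feature dictionary for one day (keys are illustrative)."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "Promo": int(promo),
        "Year": iso_year,
        "Week": iso_week,
        "IsStateHoliday": int(state_holiday),
    }


TUESDAY = 1  # Python's date.weekday() counts Monday as 0
query_day = first_weekday_of_month(2017, 6, TUESDAY)
row = feature_row(query_day, promo=False, state_holiday=False)
print(query_day, row)
```

The resulting row would then be passed to the trained model for the selected store.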