Project author: vinay-jaju

Project description:
Outlier/Anomaly detection using Prophet and Altair
Language: Jupyter Notebook
Repository: git://github.com/vinay-jaju/Outlier-Anomaly-detection.git
Created: 2019-06-05T12:24:19Z
Project page: https://github.com/vinay-jaju/Outlier-Anomaly-detection

License:



Anomaly detection using Prophet and Altair

View/Fork on Kaggle

Packages used: fbprophet and altair

Data Used:

Data on sunspots since 1749: Sunspots.txt

Using Prophet:

  • Prophet expects the input data to have two columns:

    • ds - the timestamp/date
    • y - the variable to forecast or check for anomalies
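
As a minimal sketch of getting data into that shape, the snippet below renames two columns to `ds` and `y` and parses the dates with pandas. The input column names `Month` and `Sunspots` and the three values are placeholders made up for illustration; they are not taken from Sunspots.txt, whose layout is not described here.

```python
import pandas as pd

# Hypothetical raw data: column names and values are illustrative
# placeholders, not the contents of Sunspots.txt.
raw = pd.DataFrame({
    "Month": ["1749-01-01", "1749-02-01", "1749-03-01"],
    "Sunspots": [10.0, 20.0, 30.0],
})

# Rename to the two column names Prophet looks for and parse the dates.
clean_df = raw.rename(columns={"Month": "ds", "Sunspots": "y"})
clean_df["ds"] = pd.to_datetime(clean_df["ds"])

print(clean_df.dtypes)
```

A frame shaped like `clean_df` is what the fit/predict step takes as input.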

Fit and Predict:

    from fbprophet import Prophet

    def fit_predict_model(dataframe, interval_width=0.99, changepoint_range=0.8):
        # Disable the built-in seasonalities; interval_width sets the width of
        # the uncertainty band that is later used to flag anomalies.
        m = Prophet(daily_seasonality=False, yearly_seasonality=False,
                    weekly_seasonality=False,
                    seasonality_mode='multiplicative',
                    interval_width=interval_width,
                    changepoint_range=changepoint_range)
        m = m.fit(dataframe)
        # Predict on the training dates so every observation gets a band.
        forecast = m.predict(dataframe)
        # Keep the observed values next to the predictions.
        forecast['fact'] = dataframe['y'].reset_index(drop=True)
        print('Displaying Prophet plot')
        fig1 = m.plot(forecast)
        return forecast

    pred = fit_predict_model(clean_df)

Detecting Anomalies:

  • The light blue band in the plot above is bounded by yhat_upper and yhat_lower.
  • If the observed y value is greater than yhat_upper or less than yhat_lower, it is an anomaly.
  • Each anomaly also gets an importance score based on how far it lies outside the band, relative to the observed value: (fact - yhat_upper) / fact above the band and (yhat_lower - fact) / fact below it.

```
def detect_anomalies(forecast):
    forecasted = forecast[['ds', 'trend', 'yhat', 'yhat_lower', 'yhat_upper', 'fact']].copy()

    # 'fact' is already attached by fit_predict_model, so it is not reassigned here.

    # 1 = above the upper bound, -1 = below the lower bound, 0 = inside the band
    forecasted['anomaly'] = 0
    forecasted.loc[forecasted['fact'] > forecasted['yhat_upper'], 'anomaly'] = 1
    forecasted.loc[forecasted['fact'] < forecasted['yhat_lower'], 'anomaly'] = -1

    # anomaly importances
    forecasted['importance'] = 0.0
    forecasted.loc[forecasted['anomaly'] == 1, 'importance'] = \
        (forecasted['fact'] - forecasted['yhat_upper']) / forecasted['fact']
    forecasted.loc[forecasted['anomaly'] == -1, 'importance'] = \
        (forecasted['yhat_lower'] - forecasted['fact']) / forecasted['fact']

    return forecasted

pred = detect_anomalies(pred)
```
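
To make the rule concrete, here is a rough sketch of the same labelling and importance computation for a single point, in plain Python. The helper name `classify_point` and the numbers are made up for illustration and are not part of the notebook.

```python
def classify_point(fact, yhat_lower, yhat_upper):
    """Return (anomaly, importance) for one observation.

    anomaly is 1 above the band, -1 below it, and 0 inside it.
    """
    if fact > yhat_upper:
        return 1, (fact - yhat_upper) / fact
    if fact < yhat_lower:
        return -1, (yhat_lower - fact) / fact
    return 0, 0.0


# Made-up numbers, chosen only to exercise each branch.
above = classify_point(fact=200.0, yhat_lower=50.0, yhat_upper=150.0)
below = classify_point(fact=20.0, yhat_lower=50.0, yhat_upper=150.0)
inside = classify_point(fact=100.0, yhat_lower=50.0, yhat_upper=150.0)
print(above, below, inside)
```

Because the score divides by the observed value, a point below the band can score above 1 (1.5 in the second call), and a point below the band with an observed value of exactly 0 has no finite score.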

Visualize the results: