In this week’s issue, the final entry in our series of data preparation tutorials, we discuss the influence of outliers on our time series analysis. Dealing with outliers can be tricky, as they might result from a number of factors, from simple data entry errors to influential and unexpected “black swan” events. In this tutorial, we discuss the problem of outliers, how to detect them, and what we can do about them.
We start with a basic definition of outliers, which typically appear as a result of human error, instrument error, or natural deviations in populations. Then we explain a few means of testing for outliers and developing robust models capable of dealing with black swan events.