~/devreads

#statistics

29 posts

4 Apr 2019

19 May 2018

Dominic Steinitz 8 min read

Introduction This blog started off life as a blog post on how I use nix but somehow transformed itself into a “how I do data visualisation” blog post. The nix is still here though quietly doing its work in the background. Suppose you want to analyze your local election results and visualize them using a … Continue reading Cartography in…

haskell, statistics

25 Feb 2018

Dominic Steinitz 7 min read

Introduction For the blog post still being written on variational methods, I referred to the still excellent Bishop (2006) who uses as his example data, the data available in R for the geyser in Yellowstone National Park called “Old Faithful”. While explaining this to another statistician, they started to ask about the dataset. Since I … Continue reading Reproducibility and…

haskell, statistics

2 Jan 2018

Edwin Wise 5 min read

Recently, during a holiday lull, I decided to look at another way of modeling event stream data (for the purposes of anomaly detection). I’ve dabbled with (simplistic) event stream models before but this time I decided to take a deeper look at Twitter’s anomaly detection algorithm [1], which in turn is based (more or less) […]

big data, bigdata, data modeling, statistics

4 Jul 2016

Dominic Steinitz 6 min read

Introduction Recall from the previous post that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion. where the final equation is a … Continue reading Modelling an…

bayesian, statistics

1 May 2016

Dominic Steinitz 5 min read

Introduction This is a bit different from my usual posts (well apart from my write up of hacking at Odessa) in that it is a log of how I managed to get LibBi (Library for Bayesian Inference) to run on my MacBook and then not totally satisfactorily (as you will see if you read on). … Continue reading Fun with…

bayesian, machine learning, statistics

17 Apr 2016

Dominic Steinitz 2 min read

Introduction In their paper Betancourt et al. (2014), the authors give a corollary which starts with the phrase “Because the manifold is paracompact”. It wasn’t immediately clear why the manifold was paracompact or indeed what paracompactness meant although it was clearly something like compactness which means that every cover has a finite sub-cover. It turns … Continue reading Every Manifold…

bayesian, semi-riemannian manifolds, statistics, symplectic manifolds

20 Jan 2016

Dominic Steinitz 19 min read

Introduction The equation of motion for a pendulum of unit length subject to Gaussian white noise is We can discretize this via the usual Euler method where and The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of forward filtering / … Continue reading Particle Smoothing

bayesian, haskell, statistics

15 Jan 2016

Dominic Steinitz 4 min read

Introduction The equation of motion for a pendulum of unit length subject to Gaussian white noise is We can discretize this via the usual Euler method where and The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of using Stan and, … Continue reading Inferring Parameters…

bayesian, statistics

6 Dec 2015

Dominic Steinitz 15 min read

Introduction Let be a (hidden) Markov process. By hidden, we mean that we are not able to observe it. And let be an observable Markov process such that That is the observations are conditionally independent given the state of the hidden process. As an example let us take the one given in Särkkä (2013) where … Continue reading Naive Particle…

bayesian, haskell, machine learning, probability, statistics

7 Jun 2015

Dominic Steinitz 1 min read

Here’s the same analysis of estimating population growth using Stan. data { int<lower=0> N; // number of observations vector[N] y; // observed population } parameters { real r; } model { real k; real p0; real deltaT; real sigma; real mu0; real sigma0; vector[N] p; k <- 1.0; p0 <- 0.1; deltaT <- 0.0005; sigma … Continue reading Population Growth…

bayesian, machine learning, statistics

6 Jun 2015

Dominic Steinitz 4 min read

Introduction Let us see if we can estimate the parameter for population growth using MCMC in the example in which we used Kalman filtering. We recall the model. And we are allowed to sample at regular intervals In other words , where is known so the likelihood is Let us assume a prior of then … Continue reading Population Growth…

bayesian, haskell, statistics, markov chain monte carlo

31 May 2015

27 Apr 2015

Dominic Steinitz 3 min read

Introduction Suppose you want to sample from the truncated normal distribution. One way to do this is to use rejection sampling. But if you do this naïvely then you will run into performance problems. The excellent Devroye (1986) who references Marsaglia (1964) gives an efficient rejection sampling scheme using the Rayleigh distribution. The random-fu package … Continue reading Rejection Sampling

haskell, probability, statistics
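The rejection sampling excerpt above mentions an efficient scheme based on the Rayleigh distribution. A minimal sketch of one common formulation of that idea follows, for the tail of a standard normal beyond a positive cut-off `a`. It is written in Python rather than the post's Haskell, and the function name, the cut-off of 2.0 and the sample size are illustrative assumptions, not taken from the post or from the random-fu package.

```python
import math
import random


def sample_normal_tail(a, rng):
    """Draw one sample from a standard normal conditioned on X > a, for a > 0.

    Proposal: x = sqrt(a^2 - 2 ln u) with u uniform on (0, 1], which is a
    Rayleigh variate conditioned on x >= a. The ratio of the target density
    to the proposal density is proportional to 1 / x, so we accept with
    probability a / x.
    """
    if a <= 0:
        raise ValueError("this sketch assumes a > 0")
    while True:
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
        x = math.sqrt(a * a - 2.0 * math.log(u))
        if rng.random() * x <= a:
            return x


rng = random.Random(0)  # fixed seed so the run is repeatable
tail_samples = [sample_normal_tail(2.0, rng) for _ in range(20000)]
tail_mean = sum(tail_samples) / len(tail_samples)
```

The naive alternative, drawing standard normals until one exceeds `a`, accepts only about 2.3% of draws at `a = 2.0`, which is the performance problem the excerpt alludes to. The acceptance rate of the sketch above improves as `a` grows.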

11 Mar 2015

Dominic Steinitz 12 min read

Introduction Simple models for e.g. financial option pricing assume that the volatility of an index or a stock is constant, see here for example. However, simple observation of time series shows that this is not the case; if it were then the log returns would be white noise. One approach which addresses this, GARCH (Generalised AutoRegressive … Continue reading Stochastic…

bayesian, finance, haskell, statistics

9 Sept 2014

Dominic Steinitz 7 min read

Summary An extended Kalman filter in Haskell using type level literals and automatic differentiation to provide some guarantees of correctness. Population Growth Suppose we wish to model population growth of bees via the logistic equation We assume the growth rate is unknown and drawn from a normal distribution but the carrying capacity is known and … Continue reading Fun with…

bayesian, haskell, machine learning, probability, statistics

23 Aug 2014

Dominic Steinitz 5 min read

Importance Sampling Suppose we have a random variable with pdf and we wish to find its second moment numerically. However, the random-fu package does not support sampling from such a distribution. We notice that So we can sample from and evaluate > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-} > {-# OPTIONS_GHC -fno-warn-type-defaults … Continue reading Importance Sampling

bayesian, haskell, statistics

6 Aug 2014

Dominic Steinitz 6 min read

Introduction Suppose we have a particle moving at constant velocity in 1 dimension, where the velocity is sampled from a distribution. We can observe the position of the particle at fixed intervals and we wish to estimate its initial velocity. For generality, let us assume that the positions and the velocities can be perturbed at … Continue reading Fun with…

bayesian, haskell, probability, statistics

19 Jul 2014

Dominic Steinitz 3 min read

Suppose we wish to estimate the mean of a sample drawn from a normal distribution. In the Bayesian approach, we know the prior distribution for the mean (it could be a non-informative prior) and then we update this with our observations to create the posterior, the latter giving us improved information about the distribution of … Continue reading Fun with…

bayesian, haskell, probability, statistics
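The prior-to-posterior update described in the excerpt above has a closed form in the simplest case. The sketch below assumes a normal prior on the mean and a known observation variance, neither of which the excerpt confirms. The function name and the numbers are illustrative, and the code is Python rather than the post's Haskell.

```python
def posterior_for_mean(prior_mean, prior_var, data_var, observations):
    """Conjugate update for the mean of a normal with known variance.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data.
    """
    n = len(observations)
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_mean * prior_precision + sum(observations) / data_var
    )
    return post_mean, post_var


# Prior N(0, 1), observation variance 1, three observations.
post_mean, post_var = posterior_for_mean(0.0, 1.0, 1.0, [1.0, 2.0, 3.0])
# posterior mean 1.5, posterior variance 0.25
```

The posterior variance of 0.25 is smaller than both the prior variance and the variance of the sample mean (1/3), which is the "improved information" the excerpt refers to.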

14 Jul 2014

lukaseder 1 min read

Imagine you want to collect detailed usage statistics to tune your Oracle database, e.g. if you want to have A-Rows and A-Time values in your execution plans (by default, Oracle only reports E-Rows and E-Time with “E” for “Estimated”. But usually, you will care more about the “A” for “Actual”). All you have to do … Continue reading Logon Triggers:…

sql, logon triggers, oracle, statistics, statistics level

15 Jun 2014

Dominic Steinitz 10 min read

This is really intended as a draft chapter for our book. Given the diverse natures of the intended audiences, it is probably a bit light on explanation of the Haskell (use of monad transformers) for those with a background in numerical methods. It is hoped that the explanation of the mathematics is adequate for … Continue reading Gibbs Sampling…

bayesian, haskell, statistics

9 Apr 2014

Dominic Steinitz 8 min read

Introduction It’s possible to do Gibbs sampling in most languages and since I am doing some work in R and some work in Haskell, I thought I’d present a simple example in both languages: estimating the mean from a normal distribution with unknown mean and variance. Although one can do Gibbs sampling directly in R, it … Continue reading Gibbs Sampling…

bayesian, haskell, statistics
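The example named in the excerpt above, a normal distribution with unknown mean and variance, can be sketched as a two-step Gibbs sampler. The version below is in Python rather than R or Haskell, and it assumes the non-informative prior p(mu, sigma^2) proportional to 1/sigma^2. The post's own prior, data and code are not visible in the excerpt, so the synthetic data and every name here are illustrative.

```python
import math
import random


def gibbs_normal(data, n_iter, rng):
    """Gibbs sampler for (mu, sigma^2) of a normal model.

    Assumed prior: p(mu, sigma^2) proportional to 1 / sigma^2.
    Full conditionals under that prior:
      mu | sigma^2, data     ~ Normal(xbar, sigma^2 / n)
      1/sigma^2 | mu, data   ~ Gamma(shape = n / 2, rate = ss / 2)
    where ss is the sum of squared deviations from the current mu.
    """
    n = len(data)
    xbar = sum(data) / n
    mu, sigma2 = xbar, 1.0  # arbitrary starting point
    draws = []
    for _ in range(n_iter):
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n))
        ss = sum((x - mu) ** 2 for x in data)
        # random.gammavariate takes (shape, scale); scale = 1 / rate.
        precision = rng.gammavariate(n / 2.0, 2.0 / ss)
        sigma2 = 1.0 / precision
        draws.append((mu, sigma2))
    return draws


rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.gauss(5.0, 2.0) for _ in range(200)]  # synthetic data
xbar = sum(data) / len(data)

draws = gibbs_normal(data, 3000, rng)
kept = draws[500:]  # discard the first 500 draws as burn-in
mu_estimate = sum(mu for mu, _ in kept) / len(kept)
```

Under this prior the posterior mean of `mu` equals the sample mean, so `mu_estimate` should land close to `xbar`, which gives a quick sanity check on the sampler.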

2 Apr 2014

Dominic Steinitz 4 min read

Introduction The other speaker at the Machine Learning Meetup at which I gave my talk on automatic differentiation gave a very interesting talk on A/B testing. Apparently this is big business these days as attested by the fact I got 3 ads above the wikipedia entry when I googled for it. It seems that people … Continue reading Student’s T…

haskell, statistics

10 Jan 2014

Dominic Steinitz 2 min read

I have recently started providing consultancy to a hedge fund and as far as I can see, R looks like it has a good set of libraries for this domain. In my previous job I used an embedded domain specific language in Haskell (Frankau et al. 2009). I’d like to be able to use Haskell … Continue reading Getting Financial…

statistics

23 Oct 2013

Dominic Steinitz 16 min read

I had a fun weekend analysing car parking data in Westminster at the Future Cities Hackathon along with Amit Nandi, Bart Baddeley, Jackie Steinitz, Ian Ozsvald and Mateusz Łapsa-Malawski. Apparently, in the world of car parking, where Westminster leads the rest of the UK follows. For example, Westminster is rolling out individual parking bay monitors. Our analysis … Continue reading Parking in…

haskell, statistics

13 Oct 2013

Dominic Steinitz 12 min read

Preface The intended audience of this article is someone who knows something about Machine Learning and Artificial Neural Networks (ANNs) in particular and who recalls that fitting an ANN required a technique called backpropagation. The goal of this post is to refresh the reader’s knowledge of ANNs and backpropagation and to show that the latter … Continue reading Backpropogation is…

haskell, machine learning, numerical methods, statistics

30 Apr 2013

Dominic Steinitz 7 min read

Introduction Having shown how to use automated differentiation to estimate parameters in the case of linear regression let us now turn our attention to the problem of classification. For example, we might have some data about people’s social networking such as volume of twitter interactions and number of twitter followers together with a label which … Continue reading Logistic Regression…

haskell, machine learning, probability, statistics

26 Apr 2013

Dominic Steinitz 6 min read

Introduction Automated differentiation was developed in the 1960’s but even now does not seem to be that widely used. Even experienced and knowledgeable practitioners often assume it is either a finite difference method or symbolic computation when it is neither. This article gives a very simple application of it in a machine learning / statistics … Continue reading Regression and…

haskell, machine learning, probability, statistics, uncategorized
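The excerpt above says automated differentiation is neither a finite difference method nor symbolic computation. One way to see this is forward-mode differentiation with dual numbers, sketched below in Python rather than the post's Haskell. The `Dual` class, `dual_sin` and the test function `f` are illustrative names, and the post may well use a different formulation or library.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 == 0.

    Arithmetic on Dual values propagates exact derivatives alongside
    values: no step size (unlike finite differences) and no expression
    tree (unlike symbolic differentiation).
    """

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants with zero derivative.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def dual_sin(x):
    """sin with the chain rule applied to the derivative part."""
    x = Dual.lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate f at x with a unit seed derivative and read off f'(x)."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return x * x + 3 * x + dual_sin(x)


df_at_2 = derivative(f, 2.0)  # analytically 2*2 + 3 + cos(2)
```

The result agrees with the analytic derivative to floating-point precision because each operation applies its differentiation rule exactly, which is the property that makes the technique useful for fitting regression parameters by gradient methods.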

31 Dec 2012

Henrik Warne 1 min read

The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog. Here’s an excerpt: About 55,000 tourists visit Liechtenstein every year. This blog was viewed about 170,000 times in 2012. If it were Liechtenstein, it would take about … Continue reading →

uncategorized, meta, statistics, stats