Saturday, October 26, 2013


I was converting a PNG (which represented partials sums of the Weierstrass elliptic function) to a GIF and was confused about why the PNG looked fine at low resolution but the GIF looked bad.

Then I learned that GIFs can only use 256 colors and we have to do something to map the many colors to the fewer colors. We could just use the closest available color in the new set of colors, but that results in sharp jumps and "color banding" where the colors change.
So instead we use (and ImageMagick's default is) dithering, which assigns pixels to a mixture of color values that locally average to the appropriate shade. I had a great time reading about this in ImageMagick's documentation! Here's a description of the default method:
The pixel is assigned the color closest to that pixels value, and any difference between the pixels original color and the selected color, is saved and added to the next pixels color values (which is always a neighbouring pixel) before a new color is again selected. In this way any color variations between selected colors and the images original color is distributed to the other pixels in the same area. The result is that while only specific colors will be assigned to the final image, the same basic overall color for that area will closely match the original image.
When there aren't enough dots, the picture looks blurry.
Resolution: 800x600 Resolution: 200x150
PNG (lots of colors)
no dither (GIF, 256 colors)
default dither (GIF, 256 colors)
Here's my code on Github, and here are the ImageMagick commands that produced these:
convert -resize 100% -dither None ./zoomed_in/Weierstrass001.png ./res100no_dither.gif
convert -resize 50% -dither None ./zoomed_in/Weierstrass001.png ./res50no_dither.gif
convert -resize 100%  ./zoomed_in/Weierstrass001.png ./res100default_dither.gif
convert -resize 25%  ./zoomed_in/Weierstrass001.png ./res25default_dither.gif
convert -resize 100%  ./zoomed_in/Weierstrass001.png ./res100many_colors.png
convert -resize 25%  ./zoomed_in/Weierstrass001.png ./res25many_colors.png

Wednesday, October 16, 2013

A Bayesian Model for a Function Increasing by Chi-Squared Jumps (in Stan)

This post will describe a way I came up with of fitting a function that's constrained to be increasing, using Stan. If you want practical help, standard statistical approaches, or expert research, this isn't the place for you (look up “isotonic regression” or “Bayesian isotonic regression” or David Dunson, whose work Andrew Gelman pointed me to). This is the place for you if you want to read about how I thought about setting up a model, implemented the model in Stan, and created graphics to understand what was going on.

Here's the source for this post, including markdown and and R code, and here's the Stan model it uses.

Many thanks to Naftali Harris for comments and insights on an earlier version.

I recently read a paper that Andrew Gelman wrote back in 1996 about theoretical considerations that can restrict model choice even “in the abence of data with minimal applied context”. As one example, he describes the problem fitting a function \( f(t) \) that is known to be increasing and where \( f(0)=0 \) and \( f(1)=1 \) are known. We estimate the function given noisy observations of its value at \( (N-1) \) equally spaced sample points \( \theta_i=f(\frac{i}{N}) \) (for \( i \in \{1,2,3,...,N-1\} \)).

What prior should we use for the \( \{\theta_i\} \)? It might seem like a reasonable “uninformative” prior would make the \( \{\theta_i\} \) independent and uniform on \( [0,1] \), with the restriction that non-increasing sets of \( \{\theta_i\} \) are given zero probability. But this turns out badly! Gelman notes that this prior on the \( \{\theta_i\} \) is the same as the distribution of the order statistics of a samples of size \( N-1 \) from the uniform distribution on \( [0,1] \). As the number of points in the discretization increases, the mass of the prior concentrates around a straight line, which will overwhelm the data.

To illustrate, here are some samples from this distribution, for various \( N \):

plot of chunk sample_from_restricted_uniform
This weird! Under the uniform distribution, each (legal) path is equally likely. That's why if we fit this model to noisy data, the maximum likelihood estimate (which is the mode of the posterior) would be reasonable. But it turns out that for large \( k \), almost all legal paths are basically straight lines, so the bulk of the posterior would be basically a straight line no matter what the data say. It reminds me a bit of statistical mechanics, which is largely about micro-level randomness determining macro-level properties. (I'm no expert, but I recommend this book.)