0xDE
01 October 2007 @ 08:18 pm

This afternoon I saw an interesting talk by Lawrence Brown on baseball statistics, as part of the UCI statistics seminar.

The title of the talk is "In-Season Prediction of Batting Averages: A Field-test of Simple Empirical Bayes and Bayes Methodologies". In simpler terms, Brown considers the following problem: at a certain point in the season, your favorite player has accumulated a certain batting average (the ratio of the number of hits he has made to the number of times he has been at bat). What should you predict as the probability of a hit in his next at-bat?
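The obvious prediction is the player's current average, but early in the season that number is noisy. The sketch below shows one textbook empirical Bayes alternative, which is not necessarily the estimator from Brown's talk: fit a beta prior to all players' averages by the method of moments, then report each player's posterior mean, which pulls his average toward the pooled average of all players. The function name `empirical_bayes_averages` and the sample numbers are made up for illustration.

```python
def empirical_bayes_averages(hits, at_bats):
    """Shrink each player's batting average toward the pooled average.

    Hypothetical helper: fits a Beta(alpha, beta) prior to all players by the
    method of moments, then returns each player's posterior mean
    (h + alpha) / (n + alpha + beta).
    """
    k = len(hits)
    raw = [h / n for h, n in zip(hits, at_bats)]
    m = sum(hits) / sum(at_bats)  # pooled average, used as the prior mean
    mean_raw = sum(raw) / k
    # Observed spread of the raw averages across players.
    s2 = sum((r - mean_raw) ** 2 for r in raw) / (k - 1)
    # Part of that spread expected from binomial luck alone.
    noise = sum(m * (1 - m) / n for n in at_bats) / k
    # Estimated spread of the players' true hit probabilities (kept positive).
    tau2 = max(s2 - noise, 1e-6)
    # alpha + beta behaves like a number of "prior at-bats"; clamp at 0.
    strength = max(m * (1 - m) / tau2 - 1, 0.0)
    alpha, beta = m * strength, (1 - m) * strength
    return [(h + alpha) / (n + alpha + beta) for h, n in zip(hits, at_bats)]


# Made-up early-season numbers for four hypothetical players.
hits = [20, 10, 5, 15]
at_bats = [50, 50, 40, 60]
for h, n, est in zip(hits, at_bats, empirical_bayes_averages(hits, at_bats)):
    print(f"raw {h / n:.3f} -> shrunk {est:.3f}")
```

Each shrunk estimate is a weighted average of the player's raw average (weight: his at-bats) and the pooled average (weight: alpha + beta), so players with fewer at-bats are pulled harder toward the middle.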

 
 
0xDE
03 April 2007 @ 09:22 pm
Thought of the day: as described in Regression Depth and Center Points, Tukey depth (a combinatorial measure of the quality of a statistical estimate of the location of a cloud of points) and regression depth (a similar measure of how well a hyperplane fits a set of points) can both be described in terms of distances in the dual graph of a hyperplane arrangement. For Tukey depth, the point to be evaluated is dual to a hyperplane, and its depth is the length of the shortest path from that hyperplane to a particular chamber "at vertical infinity" in the dual arrangement. For regression depth, the hyperplane to be evaluated is dual to a point in some chamber, and its depth is the length of the shortest path from that chamber to a particular plane "at infinity" in the arrangement.

But the dual graphs of hyperplane arrangements are a special case of partial cubes, graph distance is defined in any partial cube, and the hyperplanes of an arrangement generalize to Djoković classes in partial cubes (aka tokens in media). So Tukey depth and regression depth both have natural definitions in partial cubes, at least once a vertex or Djoković class is designated as "infinite": they are just the graph distance from this infinite object.
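To make "graph distance from the infinite object" concrete, here is a small sketch assuming the partial cube is given explicitly by bit-string vertex labels (its isometric embedding into a hypercube). The 6-cycle example, the choice of `000` as the infinite vertex, and the rule that a Djoković class's depth is its distance to the nearest edge of that class are illustrative assumptions, not definitions from the paper. The code computes distances by breadth-first search and checks the defining property of a partial cube: graph distance equals Hamming distance between labels.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph distance from source to every reachable vertex (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hamming(a, b):
    """Number of positions where two equal-length labels differ."""
    return sum(x != y for x, y in zip(a, b))


def class_depth(adj, dist, coord):
    """Hypothetical depth of the Djokovic class for bit `coord`: distance from
    the infinite vertex to the nearest edge whose endpoints differ in that bit."""
    return min(min(dist[u], dist[v])
               for u in adj for v in adj[u] if u[coord] != v[coord])


# A 6-cycle is a small partial cube; each vertex is labeled by a bit string.
cycle = ["000", "001", "011", "111", "110", "100"]
adj = {v: [cycle[(i - 1) % 6], cycle[(i + 1) % 6]] for i, v in enumerate(cycle)}

infinite = "000"  # hypothetical choice of the vertex "at infinity"
depth = bfs_distances(adj, infinite)

# In a partial cube, graph distance equals Hamming distance between labels.
assert all(depth[v] == hamming(v, infinite) for v in cycle)
print(depth)
```

In this labeling each Djoković class is simply the set of edges that flip one bit, which is the partial-cube analogue of crossing one hyperplane of an arrangement.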

I'm not sure what this is good for, but maybe applying the same definitions to other partial cubes will lead to interesting notions of robust statistical estimation for the systems modeled by those graphs.
 
 
0xDE
02 August 2006 @ 04:50 pm
Six of last year's ten most frequently stolen car models are Acura Integras. My 1990 model doesn't make the list, but probably only because there are fewer of them on the road than the more recent ones. At least it's a stick, so the thieves who only know how to drive automatic will be stuck... Time to start locking the door when I leave it parked around town? (Via Fark)
 
 
0xDE
10 April 2006 @ 09:30 pm
Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition. When my students first saw this title they were sure it had been caused by a spellchecker run amok, and they spent some time and amusement trying to figure out what "pants" really means. It's really pants, and I don't mean the British slang for "bad": a pair of pants is a topological surface with three boundary curves. I'd copy the abstract here, but it's only a link away. Anyway, the paper is about approximation algorithms for hierarchical clustering, under various definitions of what it means for a clustering to be good.

This is the paper for which I needed the entropy inequality that I was discussing here a while back. The inequality comes up in proving the approximation ratio for one of the problems: my algorithm's solution quality is upper bounded by one kind of entropy, and the optimal solution quality is lower bounded by the other kind.

I also have slides from a departmental seminar on the same results available. They used to be dry and technical, but Mike Goodrich persuaded me to add some clip art of SpongeBob. So now they're dry and technical with a thin veneer of cartoon humor on top.

ETA: Jeff finds some prior art
 
 
0xDE
20 March 2006 @ 06:37 pm
Some progress on the problem I mentioned earlier, of density near the origin in central limits. I can now handle another fairly broad case, that of centrally symmetric distributions. In fact, for these distributions, a stronger statement about density near the origin can be made, applying to all bounded-radius balls rather than merely to sufficiently large ones.
 
 
0xDE
06 March 2006 @ 05:29 pm

More mathematics in which I'm undereducated: local central limit theorems. If we add together a bunch of independent, identically distributed random variables, the central limit theorem tells us that, after suitable rescaling, the distribution of the sum looks Gaussian on a large scale. A local central limit theorem tells us that the distribution also looks Gaussian on a small scale, in small neighborhoods.
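For a concrete instance, take the sum of n fair coin flips (0/1 variables, so the sum takes every integer value from 0 to n). The ordinary central limit theorem only says the cumulative distribution approaches a Gaussian; the local version, in this lattice case, says each individual probability P(S_n = k) is close to the normal density with mean n/2 and variance n/4, evaluated at k. The sketch below compares the two; n = 1000 and the chosen values of k are arbitrary.

```python
import math


def exact_pmf(n, k):
    """P(S_n = k) for S_n the number of heads in n fair coin flips."""
    return math.comb(n, k) / 2**n


def gaussian_density_approx(n, k):
    """Local CLT prediction: the normal density with the same mean and
    variance as S_n, evaluated at k (the lattice spacing here is 1)."""
    mean, var = n / 2, n / 4
    return math.exp(-((k - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


n = 1000
for k in (480, 490, 500, 510, 520):
    exact, approx = exact_pmf(n, k), gaussian_density_approx(n, k)
    print(k, f"{exact:.6f}", f"{approx:.6f}", f"ratio {approx / exact:.5f}")
```

The lattice structure matters: with ±1 steps instead of 0/1, the sum can only take every other integer value, so each attainable probability is about twice the Gaussian density there.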

 
 
0xDE
02 February 2006 @ 12:52 pm
A great example from college sports of misleading statistics.

Apparently, especially athletic college freshmen are scored by ability on a 10-point scale (where these numbers come from, and whether they mean anything, I don't know), and colleges are then ranked by comparing the vectors of scores of their incoming frosh. But the comparison is done by the average of the nonzero scores, so a vector like (6,0,0,0,...) (average: 6) ranks better than a vector that completely dominates it, like (9,9,8,6,4,2,2,1,1,1,1,1,1,1,1,1,1,1) (average: 51/18 ≈ 2.83). And this actually happens and skews the rankings...
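The arithmetic is easy to check. This sketch (the function names are made up for illustration) computes the nonzero-only average used for the rankings alongside the ordinary mean over all 18 slots, and confirms that the second vector dominates the first position by position.

```python
def nonzero_average(scores):
    """The ranking statistic described above: the mean of the nonzero scores only."""
    nonzero = [s for s in scores if s != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def plain_average(scores):
    """Ordinary mean over every slot, zeros included."""
    return sum(scores) / len(scores)


def dominates(a, b):
    """True if a is at least b in every position (both sorted in decreasing
    order and zero-padded to equal length) and the two are not identical."""
    n = max(len(a), len(b))
    a = sorted(a, reverse=True) + [0] * (n - len(a))
    b = sorted(b, reverse=True) + [0] * (n - len(b))
    return all(x >= y for x, y in zip(a, b)) and a != b


small = [6] + [0] * 17
big = [9, 9, 8, 6, 4, 2, 2] + [1] * 11

print(nonzero_average(small), nonzero_average(big))  # 6.0 vs about 2.83
print(plain_average(small), plain_average(big))      # about 0.33 vs about 2.83
print(dominates(big, small))                         # True
```

Averaging only the nonzero entries throws away the number of good recruits, which is exactly the information that makes the second vector better.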