0xDE<br /><br />Graham on Erdős on Egyptian fractions (2015-05-20)<br /><br />In a recent paper Ron Graham <a href="http://www.math.ucsd.edu/~ronspubs/13_03_Egyptian.pdf">surveys the work of Paul Erdős on Egyptian fractions</a>. Did you know that Erdős' second paper was on the subject? I didn't. It proved that the sum of a harmonic progression can never form an Egyptian fraction representation of an integer (there is always at least one prime that appears in only one term). Graham himself is also a fan, having studied Egyptian fractions in his Ph.D. thesis.<br /><br />Another of Erdős' papers surveyed by Graham is also somewhat related to the subject of my recent blog posts on sequences of highly composite numbers. This paper (famous for formulating the Erdős–Straus 4/n = 1/x + 1/y + 1/z conjecture) included another conjecture that every rational number x/y (between 0 and 1) has an Egyptian fraction representation with O(log log y) terms. However, the best bound known so far is larger, O(sqrt log y).<br /><br />For any number z, let D(z) be the smallest number with the property that every positive integer less than z can be expressed as a sum of at most D(z) divisors of z (not necessarily distinct). Then a stronger version of Erdős' conjecture (for which the same bounds are known) is that, for every y, there exists a number z larger than y (but not too much larger) with D(z) = O(log log z). With such a z, you can split x/y into floor(xz/y)/z + remainder/yz and then use the sum-of-divisors property of z to split each of these two terms into a small number of unit fractions.<br /><br />Computing D(z) for small values of z is not particularly hard, using a dynamic programming algorithm for the subset sum problem. 
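<br /><br />A minimal sketch of such a computation follows. It is not the implementation used for the results reported here; the helper names are made up for illustration, and because the divisors may repeat it uses the coin-change form of the dynamic program rather than subset sum proper.

```python
from math import isqrt

def divisors(z):
    """All positive divisors of z, in increasing order."""
    small = [d for d in range(1, isqrt(z) + 1) if z % d == 0]
    large = [z // d for d in reversed(small) if d * d != z]
    return small + large

def D(z):
    """Largest, over 1 <= m < z, of the fewest divisors of z that sum to m."""
    divs = divisors(z)
    fewest = [0] * z  # fewest[m] = minimum number of divisors summing to m
    for m in range(1, z):
        # The last divisor used is some d <= m; the rest must sum to m - d.
        fewest[m] = 1 + min(fewest[m - d] for d in divs if d <= m)
    return max(fewest)

print(D(24))  # 3, since 23 = 12 + 8 + 3 needs three divisors
```

The table has z entries and each entry looks at every divisor of z, so the running time is roughly z times the number of divisors of z.<br /><br />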
So, based on the guess that the highly composite numbers would have small values of D(z), I tried looking for the biggest highly composite number with each value. In this way I found that D(24) = 3; D(180) = 4; D(5040) = 5; and D(1081080) = 6. That is, every positive integer less than 1081080 can be represented as a sum of at most six divisors of 1081080, and some require exactly six. Based on this, every x/y with y at most 1081080 can be represented as an Egyptian fraction with at most 12 terms.<br /><br />Each number in the sequence 2, 6, 24, 180, 5040, 1081080, ... is within a small factor of the 1.6 power of the previous number; another way of saying the same thing is that the numbers in this sequence obey an approximate multiplicative Fibonacci recurrence in which each number is approximately the product of the previous two. The next number in the sequence might still be within reach of calculation, using a faster programming language than my Python implementation. If that 1.6-power pattern could be shown to continue forever, then Erdős' log-log conjecture would be true.<br /><br />Mid-May linkage (2015-05-16)<br /><br /><ul><li><a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2560572">An economic analysis of public domain photos on Wikipedia</a> shows that "massive social harm was done by the most recent copyright term extension that has prevented millions of works from falling into the public domain since 1998" (<a href="https://plus.google.com/100003628603413742554/posts/7DaUpswgdY2">G+</a>)</li><br /><li><a href="http://blogs.ams.org/visualinsight/2015/05/01/twin-dodecahedra/">An infinite tree of regular dodecahedra</a> sharing a cube of vertices between each neighboring pair (<a href="https://plus.google.com/100003628603413742554/posts/UpKo5xNmZn9">G+</a>)</li><br /><li><a href="https://igorpak.wordpress.com/2015/05/02/you-should-watch-combinatorics-videos/">Combinatorics 
videos</a> collected by Igor Pak (<a href="https://plus.google.com/100003628603413742554/posts/KuGUzGoTqZw">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=a3QqKBWHarA">The inspiration for some of Man Ray's art in a collection of mathematical models</a> (<a href="https://plus.google.com/100003628603413742554/posts/UhfwavcvYdo">G+</a>)</li><br /><li><a href="https://plus.google.com/+FrancoisDorais/posts/E3Yh9YwQTQN">Did you know you could get bibtex directly from a doi?</a> (<a href="https://plus.google.com/100003628603413742554/posts/Qbx6xEERaup">G+</a>)</li><br /><li><a href="http://blogs.plos.org/everyone/2015/05/01/plos-one-update-peer-review-investigation/">Journal editor canned for using sexist referee report</a> (<a href="https://plus.google.com/100003628603413742554/posts/jmxuXn5GZ1W">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=ploETyBDM7I">Trilingual powers of two</a> in a video on street-vendor cookie-making (<a href="https://plus.google.com/100003628603413742554/posts/5HQMVbkkJ9S">G+</a>)</li><br /><li><a href="http://www.metafilter.com/149171/The-International-Journal-of-Proof-of-Concept-or-Get-The-Fuck-Out">Winner, best name of an actual publication</a> (hacker zine PoC||GTFO; <a href="https://plus.google.com/100003628603413742554/posts/SgfjCrpmKPP">G+</a>)</li><br /><li><a href="https://www.chromeexperiments.com/experiment/100000-stars">3d visualization of nearby stars</a> (<a href="https://plus.google.com/100003628603413742554/posts/XQsfYnUMWsE">G+</a>)</li><br /><li><a href="https://doajournals.wordpress.com/2015/05/11/historical-apc-data-from-before-the-april-upgrade">Over 2/3 of listed open access journals charge no author fees</a> (<a href="https://plus.google.com/100003628603413742554/posts/geLxaXBzBge">G+</a>)</li><br /><li><a href="http://www.theguardian.com/technology/2015/may/14/dear-google-open-letter-from-80-academics-on-right-to-be-forgotten">Open letter to Google by 80 academics</a> asking for greater 
transparency on "right to be forgotten" (<a href="https://plus.google.com/100003628603413742554/posts/WGj2wwQtU1J">G+</a>)</li></ul><br /><br />Parametric knapsacks for number-theoretic sequences (2015-05-15)<br /><br />One of the key principles of <a href="http://11011110.livejournal.com/307881.html">parametric optimization</a> is that, when you are faced with optimizing the nonlinear combination of two linear values (sums of element weights, costs, etc.) you should instead look at the set of optima for all possible linear combinations of the same two values. Let's see how this applies to the <a href="http://11011110.livejournal.com/305481.html">number-theoretic knapsack problems</a> I posted about <a href="http://11011110.livejournal.com/309343.html">earlier this week</a>.<br /><br />In the knapsack problem, we are trying to optimize the total profit of a subset of the given elements, subject to the condition that their total size is at most a given threshold. This can be expressed as a nonlinear combination of these two linear values in which the function of profit and size is the identity function on profit when the size is small enough and zero otherwise. This isn't the nice sort of quasiconvex function that parametric methods are best-suited for, but the fractional knapsack problem instead involves a greedy algorithm for maximizing the profit/size ratio, and this sort of ratio is quasiconvex. So in any case, following the parametric approach, let's replace both of these nonlinear combinations by the linear combination profit − λ·size, let the parameter λ vary, and see what solutions we get.<br /><br />For any particular value of λ, the answer is very simple: the optimal solutions are the ones that take all elements for which profit/size > λ (the ones that make a positive contribution to the solution value), and any subset of the elements for which profit/size = λ (the ones whose contribution is zero). 
The smallest-size optimal solution is the one that takes only the elements for which profit/size > λ. So the set of all smallest-size optimal solutions is almost exactly the same as the set of solutions generated by the greedy algorithm that adds one element at a time in order by the profit/size ratio. To make this algorithm generate exactly the smallest-size optimal solutions, we need to modify it so that when there are ties in profit/size ratio it adds all tied elements at once rather than adding them one at a time. When the set of profit/size values is discrete (as it is in our problems) this set of solutions also has the property that each solution is the unique optimal solution for a nonempty range of parameter values.<br /><br />Now suppose we go back to the number-theoretic sequences that I started with (the highly abundant numbers and the highly composite numbers), expand out the definition of the profit and size functions in the parametric optimization functions profit − λ·size, and eliminate the logs in these functions by exponentiating. Then the resulting characterization of the smallest-size optimal solutions is exactly how the colossally abundant numbers and superior highly composite numbers are defined. That is, it is no coincidence that starting with the knapsack-problem formulations of the highly abundant and highly composite numbers, and then applying the greedy algorithm to the resulting knapsack problems, gave these other two sequences: it falls out directly from the parametric analysis above and the definitions of these sequences.<br /><br />However, OEIS states that <a href="http://oeis.org/A073751">the correctness of the generation algorithm</a> for the successive factors of the colossally abundant numbers is still conjectural rather than proven. How can this be, when we have seen above that the greedy algorithm always works for sequences like this? 
The part that must still be unknown concerns the possibility of ties: is it ever possible for two or more knapsack elements to have the same profit/size ratio? If so, we must take both or all of them at once rather than letting them be chosen one at a time. And this is problematic from the algorithmic point of view because it involves testing complicated expressions involving logarithms for exact equality.<br /><br />Specifically, in the highly abundant number version of the problem, we need to know whether there can exist two prime powers <i>p<sup>i</sup></i> with the same value of the expression log<sub><i>p</i></sub>(<i>p</i><sup><i>i</i> + 1</sup> − 1)/(<i>p</i><sup><i>i</i></sup> − 1). In the highly composite number version of the problem, we need to know whether there can exist two prime powers with the same value of the expression log<sub><i>p</i></sub>(<i>i</i> + 1)/<i>i</i>. In both cases, it seems unlikely, but obviously that's not a proof. More generally, Alaoglu and Erdős conjectured in 1944 (in connection with this problem) that two expressions log<sub><i>p</i></sub><i>q</i> with different prime bases and rational arguments can only be equal if they're both integers, but (although it is known that there can be no three-way ties) this remains unproven.<br /><br />Fractional knapsacks and colossal abundance (2015-05-14)<br /><br />In <a href="http://11011110.livejournal.com/305481.html">a recent post</a> I observed that the largest <a href="https://en.wikipedia.org/wiki/Highly_abundant_number#References">highly abundant number</a> below some threshold <i>n</i> could be found as the optimal solution of a certain knapsack problem in which the items to be packed into the knapsack are prime powers <i>p<sup>i</sup></i> with profit log (<i>p</i><sup><i>i</i> + 1</sup> − 1)/(<i>p</i><sup><i>i</i></sup> − 1) and size log <i>p</i> (both independent of <i>n</i>), and with knapsack capacity log 
<i>n</i>. In particular every highly abundant number has a factorization that can be generated as the solution to this knapsack problem with the number itself as the threshold.<br /><br />Unfortunately, the knapsack problem is NP-complete, making its solutions vary in complicated ways, and making it tricky to extract useful information about highly abundant numbers and their factorizations from this formulation. But fortunately, there's a class of knapsack problems that are really easy to solve: the ones where the optimal fractional solution is the same as the optimal integer solution. These are the solutions that you get by a greedy algorithm that at each step chooses the item with maximum profit/size. This greedy strategy is not optimal for all capacities, but it is optimal when the capacity happens to equal the solution size. So which highly abundant numbers have this greedy property?<br /><br />To test this, I wrote <a href="http://www.ics.uci.edu/~eppstein/0xDE/frachab.py">a simple piece of Python code</a> that starts with the number 1, repeatedly chooses a not-already-chosen prime power that maximizes the profit/size ratio defined above, and multiplies the current number by the base of the chosen prime power. I computed the profits and sizes sloppily using floating point numbers and the built-in log function, but that seems to be good enough for small prime powers. Here are the first few results:<pre>
1
2
6
12
60
120
360
2520
5040
55440
720720
1441440
4324320
21621600
367567200
6983776800
160626866400
321253732800
9316358251200
288807105787200
2021649740510400
6064949221531200
224403121196654400
9200527969062830400
395622702669701707200
791245405339403414400
37188534050951960476800
1970992304700453905270400
116288545977326780410953600
581442729886633902054768000
35468006523084668025340848000
2376356437046672757697836816000
168721307030313765796546413936000
12316655413212904903147888217328000
135483209545341953934626770390608000
10703173554082014360835514860858032000
21406347108164028721671029721716064000
1776726809977614383898695466902433312000
5330180429932843151696086400707299936000
474386058264023040500951689662949694304000
46015447651610234928592313897306120347488000
598200819470933054071700080664979564517344000
60418282766564238461241708147162936016251744000
6223083124956116561507895939157782409673929632000
665869894370304472081344865489882717835110470624000
72579818486363187456866590338397216244027041298016000
8201519488959040182625924708238885435575055666675808000</pre>This calculation was essentially instantaneous; I cut it off here because it was a conveniently sized screenful of numbers rather than out of any difficulty in continuing the sequence for many more terms.<br /><br />When I tried looking this up in OEIS, I had two surprises. First (except for the leading one) this exactly matches all the known terms of the sequence of <a href="http://oeis.org/A004490">colossally abundant numbers</a>, which have quite a different definition from the highly abundant numbers. <s>Why? I don't know. Do this sequence and the sequence of colossally abundant numbers stay equal forever? I also don't know. And second, this calculation goes much farther than the known entries for the colossally abundant numbers in OEIS (about half of the terms shown above). The computation was so quick that I would tag the sequence "easy" if I were adding it as a new one to OEIS, but the colossally abundant numbers aren't tagged easy and have no listed algorithm for generating their sequence. Does this give a new easy way to calculate the colossally abundant numbers?</s> Update: <a href="http://oeis.org/A073751">this sequence of factors</a> looks like it is calculated the same way, so this method does seem to be known, but still somewhat conjectural. It's not clear whether it was obtained using the greedy knapsack idea or through some other reasoning.<br /><br />The same knapsack formulation applies to other sequences of numbers maximizing multiplicative functions, and the same fractional-knapsack greedy trick can be used to find easy-to-compute subsequences of those other sequences. 
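<br /><br />To make the greedy procedure concrete, here is a minimal sketch with the profit function passed in as a parameter. It is not the linked frachab.py; the names greedy_sequence, sigma_profit and next_prime are illustrative only. It assumes that the profit/size ratio decreases both with the exponent and with the prime, as it does for the profit function above. Under that assumption, only the next power of each prime already in use and the smallest unused prime need to be considered at each step.

```python
from math import log

def next_prime(n):
    """Smallest prime greater than n, by trial division (fine for small primes)."""
    candidate = max(n + 1, 2)
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate

def sigma_profit(p, i):
    """Profit of the prime power p^i in the highly abundant knapsack."""
    return log((p ** (i + 1) - 1) / (p ** i - 1))

def greedy_sequence(count, profit=sigma_profit):
    """First `count` numbers produced by the greedy choice, starting from 1."""
    exponents = {}  # prime -> exponent chosen so far
    fresh = 2       # smallest prime not yet used
    value = 1
    result = [value]
    while len(result) < count:
        # Candidates: next power of each used prime, plus the smallest new prime.
        candidates = [(p, e + 1) for p, e in exponents.items()]
        candidates.append((fresh, 1))
        # Floating-point ratios, so exact ties (if any exist) are not detected.
        p, i = max(candidates, key=lambda c: profit(*c) / log(c[0]))
        exponents[p] = i
        if p == fresh:
            fresh = next_prime(fresh)
        value *= p
        result.append(value)
    return result

print(greedy_sequence(10))
```

Passing a different profit function gives the corresponding greedy sequence for another multiplicative function.<br /><br />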
For instance, the <a href="https://en.wikipedia.org/wiki/Highly_composite_number">highly composite numbers</a> have knapsack problems with profit log (<i>i</i> + 1)/<i>i</i>, and the greedy knapsack method applied to this profit function gives what looks like the sequence of <a href="http://oeis.org/A002201">superior highly composite numbers</a>. Are others as interesting? I also don't know.<br /><br />Congratulations, Dr. Bannister! (2015-05-12)<br /><br />My student <a href="http://www.ics.uci.edu/~mbannist/">Michael Bannister</a> passed his thesis defense this afternoon. Michael has published nearly a dozen papers on topics involving graph algorithms and computational geometry (see his home page for a complete listing). His thesis research involved lower bounds and fixed-parameter upper bounds for graph drawing: inapproximability of layout compaction, the use of Galois theory to prove the nonexistence of exact algorithms for optimizing the vertex placement in many styles of graph drawing, and parameterized algorithms for one-page and two-page crossing minimization.<br /><br />Michael has also been one of our most popular teaching assistants and has enthusiastically encouraged undergraduates to take part in research projects, leading to a poster at last year's Graph Drawing symposium and an ongoing project that we hope to turn into another publication. Next year he'll be putting those skills to good use as a visiting assistant professor at <a href="https://en.wikipedia.org/wiki/Pomona_College">Pomona College</a>, a highly selective private school also located in Southern California, while his wife (another theoretician, Jenny Lam) finishes her own doctorate.<br /><br />Congratulations, Michael, and congratulations, Pomona! 
Our loss is your gain.<br /><br />Tallying preference ballots efficiently (2015-05-08)<br /><br />The <a href="https://en.wikipedia.org/wiki/Schulze_method">Schulze method</a> for determining the results of multiway votes has three parts:<br /><br />1. Use the ballots to determine the results (winner and margin of victory) of each possible head-to-head contest.<br />2. Perform an all-pairs <a href="https://en.wikipedia.org/wiki/Widest_path_problem">widest path</a> computation on a directed complete graph weighted by the margins of victory.<br />3. Find the candidate with wider outgoing than incoming paths to all other candidates.<br /><br />The second part can be done in cubic time using the <a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm">Floyd-Warshall algorithm</a> (the choice in practice) or faster using fast matrix multiplication. And the third part is easy. But what about the first part? Here, some Wikipedia editor <a href="https://en.wikipedia.org/w/index.php?title=Schulze_method&diff=next&oldid=428562163">wrote in 2011</a> that the first part, "if implemented in the most straightforward way, takes time proportional to C<sup>2</sup> times the number of voters" (where C is the number of candidates). But then last year some other editor <a href="https://en.wikipedia.org/w/index.php?title=Schulze_method&type=revision&diff=629989153&oldid=628267803">tagged this claim</a> as being original research.<br /><br />This raised some questions for me. Given how straightforward it is, can this really be considered to be original research? Is it possible to find a published source for the time analysis of this step that can be used to untag it? (If you know of one, please tell me or add it to the article.) Is the algorithm with this time bound really "the most straightforward way"? 
And if this is the time bound you get by doing things straightforwardly, can we get a better time bound by trying to be more clever?<br /><br />To begin with, I think the most straightforward way of solving this is the following. I'll assume that each ballot is stored as a sorted array of the candidates, most-preferred first. For each pair of candidates, loop over all ballots, and search the ballot array sequentially to find the first position that has one of the two candidates in the pair; tally that ballot as a win for the candidate that was found. When you've looped through all the ballots, compare the tallies for the two candidates to determine the winner, and subtract the tallies to determine the margin of victory. But this takes time O(C<sup>3</sup>n), not O(C<sup>2</sup>n).<br /><br />The O(C<sup>2</sup>n)-time method that was intended is presumably something like the following one. We will make a matrix M[i,j] that will eventually store the number of voters who prefer candidate i to candidate j, initially all zeros. We then loop through the ballots one at a time. For each ballot B, each i in the range from 1 to C, and each j in the range from i+1 to C, we add one to the count for M[B[i],B[j]]. Finally, after computing this matrix, we can compare M[i,j] to M[j,i] as before to determine each pairwise winner, or subtract these two numbers to determine the margin of victory.<br /><br />But when the number of voters is big (larger than C!) there's a different way to tally the votes that's more efficient. First, sort the ballots, so that all people who voted the same way are collected into the same group. (This can be done by treating each vote as a number in the <a href="https://en.wikipedia.org/wiki/Factorial_number_system">factorial number system</a> and applying <a href="https://en.wikipedia.org/wiki/Counting_sort">counting sort</a> to these numbers). 
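<br /><br />As a rough sketch of the matrix tally and of the grouping idea: the code below assumes that candidates are numbered from 0 and that each ballot is a tuple listing candidates from most to least preferred. It groups identical ballots with a hash-based Counter, which is a swapped-in shortcut and not the factorial-number-system counting sort described above. The function name is made up for illustration.

```python
from collections import Counter

def pairwise_tally(ballots, num_candidates):
    """Return M where M[i][j] counts the ballots ranking i above j."""
    M = [[0] * num_candidates for _ in range(num_candidates)]
    # Group identical ballots so that each distinct ranking is scanned once.
    groups = Counter(tuple(ballot) for ballot in ballots)
    for ballot, group_size in groups.items():
        for a in range(len(ballot)):
            for b in range(a + 1, len(ballot)):
                # ballot[a] is ranked above ballot[b] by everyone in the group.
                M[ballot[a]][ballot[b]] += group_size
    return M

M = pairwise_tally([(0, 1, 2), (0, 1, 2), (1, 2, 0)], 3)
print(M[0][1] - M[1][0])  # margin of candidate 0 over candidate 1
```

<br /><br />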
Then, apply the O(C<sup>2</sup>n)-time method to the grouped ballots, looping over all groups rather than all individual ballots and changing the part that adds one to M[B[i],B[j]] so that instead it adds the size of a group of ballots. The running time is O(Cn) to number and sort the ballots, plus O(C<sup>2</sup>C!) to tally them. So we've reduced the coefficient of n from quadratic in C down to linear in C, at the expense of adding another term that is a much larger function of C. For systems like the Oscars or Hugos that have only five candidates and thousands of voters, this could be a win.<br /><br />It's not possible to achieve a time of just O(Cn), without the extra term, because even when n is tiny the output size is C<sup>2</sup>. But it is possible to trade off between the grouped and ungrouped tallying methods, when n is intermediate in size. To do so, group the candidates (arbitrarily) into blocks of B candidates (preferably a power of two; we'll pick the right size for B later). We can partition a voter's preferences into blocks in time O(Cn) by using bucket sort to partition the candidates into blocks in their preference order, and we can determine the voter's preferences between the candidates in the union of two blocks in time O(B) by applying a merge algorithm, comparing candidates using a reverse index of the positions of each candidate in the voter's preference list. There are O((C/B)<sup>2</sup>) pairs of blocks, so combining the times for splitting votes into blocks and for applying the factorial method to each pair of blocks gives a total runtime of O(Cn + C<sup>2</sup>n/B + C<sup>2</sup>(2B)!). 
The right choice for B is the one that makes the second and last terms of this runtime approximately equal (B proportional to log n/log log n), and this logarithmic factor is the amount by which the middle term of the time bound improves on the "straightforward" O(C<sup>2</sup>n)-time method.<br /><br />Linkage (2015-05-01)<br /><br />Some good discussions this time over on G+, especially for the vote-off-the-island post but also on the golden spiral, P=NP counterexample, and election-system posts.<br /><ul><li><a href="http://www.thisiscolossal.com/2015/04/raw-rendered-experimental-3d-artworks-by-joey-camacho/">3d rendered art by Joey Camacho</a> (<a href="https://plus.google.com/100003628603413742554/posts/VrV9SCVLukg">G+</a>)</li><br /><li><a href="http://www.theguardian.com/science/alexs-adventures-in-numberland/2015/mar/14/pi-day-2015-pi-rivers-truth-grime">Rivers don't actually approximate semicircles</a> (<a href="https://plus.google.com/100003628603413742554/posts/c5UuWWziNxM">G+</a>)</li><br /><li><a href="http://www.usatoday.com/story/tech/2015/04/19/chris-roberts-one-world-labs-united-rsa-computer-security-tweets/26036397/">Intimidating researchers from discussing known vulnerabilities in fly-by-wire systems</a> (<a href="https://plus.google.com/100003628603413742554/posts/iTDu67mZXtg">G+</a>)</li><br /><li><a href="http://www.kurims.kyoto-u.ac.jp/icalp2015/accepted-ICALP-A.html">ICALP accepted papers</a> (<a href="https://plus.google.com/100003628603413742554/posts/dc3a2EMZ9NJ">G+</a>)</li><br /><li><a href="http://makezine.com/2015/04/20/understand-1700-mechanical-linkages-helpful-animations/">Animations of mechanical linkages</a> (<a href="https://plus.google.com/100003628603413742554/posts/KMVL1uVadxp">G+</a>)</li><br /><li><a href="https://plus.google.com/+DavidRoberts/posts/5scZ4Hvzh5d">If impact factors are so obviously irrelevant, why do we still use them?</a> (<a 
href="https://plus.google.com/100003628603413742554/posts/bCAYQMsUL4x">G+</a>)</li><br /><li><a href="http://hechingerreport.org/californias-multi-million-dollar-online-education-flop-is-another-blow-for-moocs/">California MOOC boondoggle flops</a> (<a href="https://plus.google.com/100003628603413742554/posts/CLSjy3GgSVD">G+</a>)</li><br /><li><a href="http://chronicle.com/article/Iowa-Legislator-Wants-to-Give/229589/">Enabling students to vote disliked instructors off the island</a> (<a href="https://plus.google.com/100003628603413742554/posts/2ycUKjQuGBi">G+</a>)</li><br /><li><a href="http://shorts2014.quantumlah.org/">Festival of short films on quantum mechanics</a> (<a href="https://plus.google.com/u/0/100003628603413742554/posts/3hbTaLaE7Um">G+</a>)</li><br /><li><a href="https://xkcd.com/spiral/">What do Don Sheehy, a sewing machine, and the golden spiral have to do with each other?</a> (<a href="https://plus.google.com/100003628603413742554/posts/9YqCMRdFW2Y">G+</a>)</li><br /><li><a href="http://arxiv.org/abs/1504.06890">Undergraduates publish counterexamples to P=NP proofs</a> as a result of a research seminar at Rochester conducted by Lane Hemaspaandra (<a href="https://plus.google.com/100003628603413742554/posts/gX9ETXHXGod">G+</a>)</li><br /><li><a href="http://www.cut-the-knot.org/Curriculum/SocialScience/%28171%292015.pdf">Deciding elections by who has the best median-voter score</a> (<a href="https://plus.google.com/100003628603413742554/posts/A62YhjhCfEt">G+</a>)</li></ul><br /><br />Perturbing weighted elements to make set weights distinct (2015-04-21)<br /><br />Suppose you have a polynomial-time algorithm that operates on sets of weighted elements, and involves comparisons of the weights of different sets. (This describes many different algorithms for shortest paths, minimum spanning trees, minimum weight matchings, <a href="http://11011110.livejournal.com/307881.html">closures</a>, etc.) 
But suppose also that your algorithm is only guaranteed to work correctly when different sets always have distinct total weights. When comparisons could come out equal, your algorithm could crash or produce incorrect results. But equal weights are likely to happen when the element weights are small integers, for instance. Is there some semi-automatic way of patching your algorithm to work in this case, without knowing any details about how it works?<br /><br />An obvious thing to try is to add small distinct powers of two to the element weights. If these numbers are small enough they won't affect initially-unequal comparisons. And if they're distinct powers of two then their sums are also distinct, so each two sets get a different perturbation. But this method involves computing with numbers that have an additional <i>n</i> bits of precision (where <i>n</i> is the number of elements in the problem), and a realistic analysis of this method would give it a near-linear slowdown compared to the unperturbed algorithm. Can we do better?<br /><br />Exactly this issue comes up in my latest preprint, "<a href="http://arxiv.org/abs/1504.04931">Rooted Cycle Bases</a>" (with McCarthy and Parrish, arXiv:1504.04931, to appear at WADS). The paper is motivated by some problems concerning <a href="http://11011110.livejournal.com/279049.html">kinematic chains</a>, and studies problems of finding a cycle basis of a given graph in which all basis cycles are constrained to contain a specific edge. When all cycles have distinct weights a simple greedy algorithm can be used to find a minimum-weight basis, but if there are ties then this algorithm can easily go astray. 
Its analysis is complicated enough that, rather than trying to add special case tie-breaking rules to the algorithm and proving that they still work correctly, I'd like a general-purpose method for converting algorithms that work for distinct path and cycle weights into algorithms that don't require distinctness.<br /><br />If randomization is allowed, it's not difficult to perturb the weights efficiently, so that additions and comparisons of weights still take constant time. Just let ε be a sufficiently small number (or by symbolic computation treat it as an infinitesimal) and perturb each element weight by a randomly chosen integer multiple of ε where the random integers of this scheme have polynomial magnitude. These integers are small enough that (on a machine capable of addressing its own memory) they fit into a machine word, so adding them and comparing their sums takes constant time per operation. And by choosing the polynomial to be large enough, we can ensure that with high probability each two sets that we compare will have different perturbations. (We don't care about the many other pairs of sets that we don't compare.)<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/0xDE/set-comparison.png"></div><br /><br />The deterministic case is trickier. To solve it (in an appendix of the preprint) I define a data structure that can build up a persistent collection of sets, by adding one element at a time to a previously-constructed set, and then can answer queries that seek the smallest index of an element that belongs to one set and not another. Essentially, it involves a binary tree structure imposed on the elements, and a recursive representation of each set that follows the tree structure but shares substructures with other sets, so that differing elements can be found by tracing down through the tree looking for non-shared substructures. 
The figure above (from the paper) illustrates in a schematic way what it looks like; see the appendix for details. This allows the power-of-two technique to work, by replacing numerical comparisons on high-precision numbers by these set queries. It would also be possible to add element-removal operations, although I didn't need these for the cycle basis problem. But it's a bit cumbersome and slow: comparing two sets with this method takes logarithmic time, and adding an element to a set is slightly slower than that. And the details involve deterministic integer dictionary data structures that are theoretically efficient but for practical problem sizes probably worse than binary search trees. So I think there's definitely scope for coming up with a cleaner and faster solution.<br /><br />The red door (2015-04-17)<br /><br />I couldn't resist photographing this door to a lecture hall in the science sector of the UCI campus. I'm not sure what the pink paint brushmarks are: vandalism? Rustoleum? But they make a nice pattern.<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/pix/reddoor/2-m.jpg" border="2" style="border-color:black;" /></div><br /><br /><b>( <a href="http://www.ics.uci.edu/~eppstein/pix/reddoor/1.html">Another shot of the same door</a> )</b><br /><br />Parametric closures (2015-04-17)<br /><br />My latest arXiv preprint, <a href="http://arxiv.org/abs/1504.04073">The Parametric Closure Problem</a> (arXiv:1504.04073, to appear at WADS) concerns an old optimization problem that can be used, among other applications, in the planning process for open-pit mining.<br /><br />Suppose you have the mining rights to a three-dimensional patch of earth and rock, in which the ore is of a type and depth that make it appropriate to remove the ore by digging down to it from above rather than by tunneling. 
You can make a three-dimensional model of your mining area, in which different three-dimensional blocks of material might represent ore of different values or worthless overburden (the stuff on top of the ore that you have to remove to get to the ore). Each block has its own value: the profit that can be extracted from its ore minus the cost of digging it out and processing it. Additionally, each block has some blocks above it (maybe staggered in a three-dimensional brick-wall pattern) that have to be removed first before you can get to it. Some blocks are worth digging for; others are buried so deeply under other worthless material that it would cost more to dig them out than you would get in profit from them. How should you go about deciding which blocks to excavate and which to leave in place?<br /><br />This can be modeled mathematically by the <a href="https://en.wikipedia.org/wiki/Closure_problem">closure problem</a>, in which you have as input a partially ordered set (the blocks of the mine, ordered by which ones have to be excavated first before you can get to which other ones) with weights on each element (the net profit of excavating each block). The goal is to find a downward-closed subset of the partial order (a set of blocks such that, whenever a block is in the set, so is all of its overburden) with maximum total weight. Alternatively, instead of a partial order, you can think about a directed acyclic graph, in which you have to find a set of vertices with no outgoing edges; the problem is essentially the same. It has long been known that this can be solved in polynomial time using a transformation to the minimum cut problem.<br /><br />Ok, but that assumes that the price of the material you're extracting (gold, say) is fixed. What happens as the price of gold varies? If gold is more expensive, it will be worthwhile to dig deeper for it; if it is cheap enough, you might even prefer to shut down the whole mine. 
How many different mining plans do you need for different prices of gold, and how can you compute them all? This is an example of a parametric optimization problem, one in which the weight of each element depends continuously on a parameter rather than being a fixed number.<br /><br />Alternatively, what if you want to optimize a quantity that isn't just a sum of element weights? Suppose, for instance, that it takes a certain up-front cost to extract a block of ore, but that you only get the value of the gold in the ore later. How can you choose a mining plan that maximizes your return-on-investment, the ratio between the profit you expect and the cost you have to pay now? This can also be modeled as a parametric problem, where the weight of a block has the form C × profit − cost for an unknown parameter C. If you can find all the different mining plans that would be obtained by different choices of C, you can then search through them to choose the plan with the optimal return-on-investment, and this turns out to be optimal.<br /><br />My paper defines the parametric (and bicriterion) closure problems, but I was only able to find polynomial-time solutions (and polynomial bounds on the number of different solutions to be found) for some special cases of partial orders, including series-parallel partial orders, semiorders, and orders of bounded width. However, the partial orders arising in the mining problem are unlikely to be any of these, so a lot more remains to be done. In particular I'd like to know whether there can exist a partial order whose parametric closure problem has exponentially many solutions, or whether they all have only a polynomial number of solutions. (Anything in between would also be interesting.)<br /><br />Incidentally, it's tempting to try to generalize closures of partial orders to feasible sets of antimatroids, and ask for an algorithm that can find the maximum weight feasible set. 
Unfortunately, this antimatroid closure problem is NP-complete. Consider, for instance, an antimatroid defined from a family of sets <i>S<sub>i</sub></i> in which there is one antimatroid element <i>x<sub>i</sub></i> corresponding to each set <i>S<sub>i</sub></i>, another antimatroid element <i>y<sub>j</sub></i> corresponding to each element of a set, and the feasible sets consist of any subset of the <i>x<sub>i</sub></i>'s together with any of the <i>y<sub>j</sub></i>'s that are covered by sets among the chosen <i>x<sub>i</sub></i>'s. If we give the <i>x<sub>i</sub></i>'s small equal negative weights and the <i>y<sub>j</sub></i>'s big equal positive weights, then the optimal feasible set is given by the optimal solution to a set cover problem. Although this complexity result doesn't prove anything about the number of solutions to the corresponding parametric problem, it makes me think that the parametric antimatroid problem is likely to be exponential.<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:307498Linkage for tax day2015-04-16T05:41:47Z2015-04-16T05:41:47Z<ul><li><p><a href="https://www.youtube.com/watch?v=RYH_KXhF1SY">Fractal flat torus flyover video</a> (<a href="https://plus.google.com/100003628603413742554/posts/ESNzvcmNWPL">G+</a>)</p></li>
<li><p><a href="http://retractionwatch.com/2015/04/01/you-cant-make-this-stuff-up-plagiarism-guideline-paper-retracted-for-plagiarism/">The author of the plagiarized article on plagiarism turns out to have himself been a past victim of plagiarism</a> (<a href="https://plus.google.com/100003628603413742554/posts/fuW4FPEHzuC">G+</a>)</p></li>
<li><p><a href="http://arxiv.org/abs/1501.03837">2048 NP-hardness</a> (<a href="https://plus.google.com/100003628603413742554/posts/UXrCSYbdW4c">G+</a>)</p></li>
<li><p><a href="http://mkukla.com/Stone/stone_07_1.html">Michael Kukla's organically-shaped stone sculpture</a> (<a href="https://plus.google.com/100003628603413742554/posts/Yudx1zunsN6">G+</a>)</p></li>
<li><p><a href="http://greenupgrader.com/15763/water-saving-tip-the-shower-bucket/">Save water: shower with a bucket</a> (<a href="https://plus.google.com/100003628603413742554/posts/53q1YH9JpBx">G+</a>)</p></li>
<li><p><a href="http://www.nytimes.com/2015/04/03/opinion/south-koreas-invasion-of-privacy.html">The reach of the all-seeing eye extends to the land of morning calm</a> (<a href="https://plus.google.com/100003628603413742554/posts/gFrJMvfEECz">G+</a>)</p></li>
<li><p><a href="http://research.cs.queensu.ca/cccg2015/">CCCG call for papers</a> (<a href="https://plus.google.com/100003628603413742554/posts/CP2rxPCRTLB">G+</a>)</p></li>
<li><p><a href="http://www.thisiscolossal.com/2015/04/layered-glass-sculptures-niyoko-ikuta/">Layered cut-glass sculpture by Niyoko Ikuta</a> (<a href="https://plus.google.com/100003628603413742554/posts/LexAfZzuV1L">G+</a>)</p></li>
<li><p><a href="http://theconversation.com/using-wikipedia-a-scholar-redraws-academic-lines-by-including-it-in-his-syllabus-39103">Editing Wikipedia as course assignment</a> (<a href="https://plus.google.com/100003628603413742554/posts/8pBR5Vx3pzv">G+</a>)</p></li>
<li><p><a href="https://dl.dropboxusercontent.com/u/73307148/www.wads.org/Home/accepted.html">WADS accepted papers</a> (<a href="https://plus.google.com/100003628603413742554/posts/Wxn6MF5orK7">G+</a>)</p></li>
<li><p><a href="http://www.dataisnature.com/?p=2138">The fractal architecture and algorithmic design of Hindu temples</a> (<a href="https://plus.google.com/100003628603413742554/posts/9FWZLhi5P2c">G+</a>)</p></li>
<li><p><a href="http://m759.net/wordpress/?p=49049">4d hypercube in a 4x4 planar grid</a> (<a href="https://plus.google.com/100003628603413742554/posts/8GTbEjYwvFK">G+</a>)</p></li>
<li><p><a href="http://boingboing.net/2015/04/13/village-has-a-model-village-wh.html">Self-containing model village Droste effect</a> (<a href="https://plus.google.com/100003628603413742554/posts/CZiE7z5P6ch">G+</a>)</p></li>
<li><p><a href="http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext">CACM on formal methods at Amazon</a> (<a href="https://plus.google.com/100003628603413742554/posts/6CrVgj1zCsn">G+</a>)</p></li>
<li><p><a href="https://plus.google.com/117663015413546257905/posts/E4cfuyhawYh">Infinite reflection within a mirrored sphere</a> (<a href="https://plus.google.com/100003628603413742554/posts/YskmJuQUcik">G+</a>)</p></li></ul>urn:lj:livejournal.com:atom1:11011110:307408Linkage2015-04-01T03:56:27Z2015-04-01T03:56:27Z<ul><br /><li><a href="http://www.bbc.com/news/technology-31302312">Non-uniformly-random playlists sound more random than random ones</a> (<a href="https://plus.google.com/100003628603413742554/posts/RsmyXs8GpMZ">G+</a>)</li><br /><li><a href="http://www.metafilter.com/147970/If-you-can-read-this-sentence-you-can-talk-with-a-scientist">Is it a good thing that science is monolingual?</a> Is it even true? (<a href="https://plus.google.com/100003628603413742554/posts/aV9jJwG3ddr">G+</a>)</li><br /><li><a href="http://blog.wikimedia.org/2015/03/17/raspberry-pi-tanzania-school/">Bringing Wikipedia to a school without electricity</a> (<a href="https://plus.google.com/100003628603413742554/posts/eg7upcxdKRC">G+</a>)</li><br /><li><a href="https://plus.google.com/115585433364871264133/posts/7pYEqXYu36G">Erik Demaine presents the MAA Centennial Lecture</a> (<a href="https://plus.google.com/100003628603413742554/posts/Q6PNMKv51db">G+</a>)</li><br /><li><a href="https://quomodocumque.wordpress.com/2015/03/18/math-bracket-2015/">What if we held elimination tournaments based on the strength of math departments?</a> (<a href="https://plus.google.com/100003628603413742554/posts/dyjP8wBTbX8">G+</a>)</li><br /><li><a href="http://bldgblog.blogspot.co.uk/2014/06/mathematical-equations-as-architectonic.html">Mathematical equations as architectonic forms</a> (<a href="https://plus.google.com/100003628603413742554/posts/gujwuuur157">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=AxJubaijQbI">Persi Diaconis on good and bad ways to shuffle cards</a> (<a href="https://plus.google.com/100003628603413742554/posts/jS8u9tWRJV3">G+</a>)</li><br /><li><a 
href="http://www.cems.uvm.edu/~darchdea/problems/problems.html">Open problems in topological graph theory</a> from the late Dan Archdeacon (<a href="https://plus.google.com/100003628603413742554/posts/iRsQaEVpaGP">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=OuF-WB7mD6k">How to fix a wobbly table.</a> But only if the problem is the uneven ground, not the table itself. (<a href="https://plus.google.com/100003628603413742554/posts/VqteLTwupnP">G+</a>)</li><br /><li><a href="http://www.scientificamerican.com/article/new-form-of-ice-forms-in-graphene-sandwich/">Square ice in graphene sandwiches</a> (<a href="https://plus.google.com/100003628603413742554/posts/aW4jWARGbvx">G+</a>)</li><br /><li><a href="http://news.sciencemag.org/scientific-community/2015/03/editor-quits-journal-over-pay-expedited-peer-review-offer">Nature Publishing Group lets authors pay for faster reviews.</a> One editor quits in disgust. (<a href="https://plus.google.com/100003628603413742554/posts/eNcuZdZGEfW">G+</a>)</li><br /><li><a href="https://hbr.org/2015/03/the-5-biases-pushing-women-out-of-stem">Five biases pushing women out of STEM</a> (<a href="https://plus.google.com/100003628603413742554/posts/E5ATuXyMZvP">G+</a>)</li><br /><li><a href="https://en.wikipedia.org/wiki/Entropy_compression">Entropy compression</a>, proving that randomized algorithms terminate because their past histories have too little information (<a href="https://plus.google.com/100003628603413742554/posts/gKxGtagGxRe">G+</a>)</li><br /><li><a href="https://www.simonsfoundation.org/multimedia/mathematical-impressions-multimedia/mathematical-impressions-the-golden-ratio/">George Hart on why you shouldn't believe many claims about appearances of the golden ratio</a> (<a href="https://plus.google.com/100003628603413742554/posts/JcBqFQGkJyr">G+</a>)</li><br /></ul>urn:lj:livejournal.com:atom1:11011110:307017Clique minors in de Bruijn graphs2015-03-22T05:18:21Z2015-03-22T05:18:21ZIn my new Wikipedia article 
on <a href="https://en.wikipedia.org/wiki/Queue_number">the queue number of graphs</a>, the binary de Bruijn graphs form an important family of examples. These are 4-regular graphs with one vertex for every <i>n</i>-bit binary string, and with an edge from every string of the form 0s or 1s to s0 or s1. <a href="http://11011110.livejournal.com/75392.html">I posted about them</a> here several years ago, with the following drawing, which can be interpreted as a 2-queue drawing with one queue for the edges that wrap around the left side and another for the edges that wrap around the right.<br /><br /><div align="center"><a href="http://en.wikipedia.org/wiki/Image:DeBruijn-3-2.png"><img src="http://www.ics.uci.edu/~eppstein/0xDE/dbg32b.png" border="0"></a></div><br /><br />Graph minors also showed up in the article, and it occurred to me to wonder: do de Bruijn graphs belong to any minor-closed graph families? The answer should be no, because they're too highly connected, but can we quantify this? One way would be to determine the <a href="https://en.wikipedia.org/wiki/Hadwiger_number">Hadwiger number</a> of the de Bruijn graphs, i.e., the size of their largest clique minors. As long as this is not bounded by a constant, the de Bruijn graphs do not belong to any nontrivial minor-closed family. And in fact, that turns out to be true: the Hadwiger number is somewhere near the square root of the number of vertices.<br /><br />One direction is easy: an <i>n</i>-vertex de Bruijn graph has 2<i>n</i> edges, and a <i>k</i>-vertex clique minor needs at least <i>k</i>(<i>k</i> − 1)/2 edges, so <i>k</i> has to be at most approximately 2√<i>n</i>.<br /><br />In the other direction, it's possible to exhibit an explicit clique minor of size nearly the square root of <i>n</i> in any de Bruijn graph. 
To do so, I need three ingredients:<br /><br />(1) A representative vertex in the de Bruijn graph for each clique vertex,<br /><br />(2) A path in the de Bruijn graph between any two representative vertices (not necessarily disjoint from the other paths), and<br /><br />(3) A mapping from the vertices within these paths to representative vertices, such that each path can be split into two segments that are mapped to the two endpoints of the path.<br /><br />With these ingredients, the minor itself can be formed by throwing away non-path vertices and contracting path edges between pairs of vertices that are mapped to the same endpoint as each other. (Every clique minor of any graph can be represented in this way.)<br /><br />So here are the representative vertices: for order-<i>k</i> de Bruijn graphs (with <i>n</i> = 2<sup><i>k</i></sup> vertices) they are the binary strings of the form 1<i>x</i>1<i>y</i>1<i>y</i>, where <i>x</i> is a string of about log<sub>2</sub> <i>k</i> consecutive 0's and <i>y</i> is a string of length (<i>k</i> − len(<i>x</i>) − 3)/2 that doesn't contain <i>x</i> as a substring. The <i>y</i> part of this is what distinguishes this representative vertex from all the other ones, and we will look for this string to determine how to map path vertices to representative vertices. The <i>x</i> part of the string carries no useful identifying information, but instead will allow us to find <i>y</i> even when the string has been shifted and mangled in the process of finding a path between two representative vertices. With this choice of the length of <i>x</i>, a constant fraction of the strings that are the right length to be <i>y</i> are valid (don't contain <i>x</i> as a substring). 
The number of valid <i>y</i>'s, and therefore the size of the clique minor that we find, is proportional to the square root of <i>n</i>/log <i>n</i>.<br /><br />To find a path from one representative vertex to another, we simply follow edges that shift the bitstring left by one position, shifting in the bits of the second representative vertex as we shift out the bits of the first. This actually gives two paths between each two representative vertices (one in each direction) but that isn't a problem; just pick one of the two.<br /><br />In order to define the mapping from path vertices to representative vertices, it's convenient to think of a bitstring (vertex of the de Bruijn graph) as having its left end wrapped around and glued to the right end to form a single cyclic sequence of bits. As we follow the path, the string <i>x</i> of consecutive 0's will rotate from the left side of the string to the right and then back to the left, but will always be uniquely identifiable as the only string of consecutive 0's of the correct length in this cyclic sequence. From the position of <i>x</i>, in any path vertex, we can identify two substrings in the cyclic sequence, in the correct positions relative to <i>x</i> to be the <i>y</i>'s of a representative vertex. For the first half of the path, one of these two <i>y</i> substrings will be equal to the <i>y</i> of the starting vertex of the path, and the second will be arbitrary (some mix of the two path endpoints). For the second half of the path, the pattern is reversed: the other of the two <i>y</i> substrings will be equal to the <i>y</i> of the ending vertex of the path, and the one that matched the starting vertex will be a mix. But we can tell which of these two situations is the case by looking at the position of the consecutive 0's. 
So we map each path vertex to the representative vertex for one of its two <i>y</i> substrings, the one that isn't mixed up.<br /><br />So which of sqrt(<i>n</i>) (the edge-counting upper bound) and sqrt(<i>n</i>/log <i>n</i>) (the explicit construction of a clique minor) is closer to the truth? I'm not sure. On the one hand, if you have representative vertices <i>k</i> units apart from each other (as seems necessary, up to constant factors) with disjoint paths between them in the clique minor, then comparing the total number of edges in these paths with the total number of edges in the complete minor would show that the sqrt(<i>n</i>/log <i>n</i>) bound is tight. On the other hand, in the construction above, the paths are not disjoint, and they can't be because the representative vertex doesn't have high degree. But I don't know how to define the mapping from path vertices to representative vertices without, seemingly, wasting bits on the <i>x</i> strings which are used only as markers to determine where in the path each vertex is.<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:306907Shattered glass2015-03-21T04:55:15Z2015-03-21T04:55:15ZA broken pane in the main stairwell of my department's building (maybe a bird strike?) 
gave me a chance to play with the geometry of shattered glass.<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/pix/brenglass/1-m.jpg" border="2" style="border-color:black;" /></div><br /><br /><b>( <a href="http://www.ics.uci.edu/~eppstein/pix/brenglass/index.html">The rest of the photos</a> )</b>urn:lj:livejournal.com:atom1:11011110:306573Linkage for the ides of March2015-03-16T01:11:03Z2015-03-16T01:11:03Z<ul><li><a href="http://blogs.ams.org/visualinsight/2015/03/01/schmidt-arrangement/">The Schmidt arrangement</a>, triangular rosettes of circles from number theory (<a href="https://plus.google.com/100003628603413742554/posts/RM8JpoWaoA4">G+</a>)</li><br /><li><a href="https://archive.org/details/vieleckeundvielf00bruoft">Brückner, <i>Vielecke und Vielflache</i> (1900)</a>, with <a href="http://rudygodinez.tumblr.com/post/79054495133/prof-dr-max-bruckner-four-plates-from-the-book-vielecke">a tumblr post of some stellated polyhedra photographed in the book</a> (<a href="https://plus.google.com/100003628603413742554/posts/fZ4Txj3kJ7p">G+</a>)</li><br /><li><a href="http://www.fq.math.ca/Announcements/Riordan6.pdf">$1000 prize for solving open problems in OEIS</a> (<a href="https://plus.google.com/100003628603413742554/posts/Dj3e9dfkDaX">G+</a>)</li><br /><li><a href="http://bit.ly/1aKcndl">MAA celebrates Women's History Month</a> (<a href="https://plus.google.com/100003628603413742554/posts/TZsABVCueCi">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=0eC4A2PXM-U">Tensegrity robot video</a> (<a href="https://plus.google.com/100003628603413742554/posts/c7UwZtBgv52">G+</a>)</li><br /><li><a href="http://theconversation.com/you-probably-havent-heard-of-these-five-amazing-women-scientists-so-pay-attention-38329">Mini-biographies of five women scientists</a> (<a href="https://plus.google.com/100003628603413742554/posts/XvdMgN1vNhW">G+</a>)</li><br /><li><a href="http://www.youtube.com/watch?v=l4bmZ1gRqCc">Numberphile video on the 
diversity of human natural-language number systems</a> (<a href="https://plus.google.com/100003628603413742554/posts/73Njxen87s2">G+</a>)</li><br /><li><a href="http://iacopoapps.appspot.com/hopalongwebgl/">Interactive 3d fractal fly-through</a> (<a href="https://plus.google.com/100003628603413742554/posts/Vem5zGKrBpX">G+</a>)</li><br /><li><a href="http://www.laurenbcollister.com/well-well-look-whos-at-it-again">Yet more Elsevier misbehavior</a> (charging for access to open-access papers; <a href="https://plus.google.com/100003628603413742554/posts/BSXDuPdECwK">G+</a>)</li><br /><li><a href="http://sarielhp.org/blog/?p=8827">Sad news of Jirka Matousek's death</a> (<a href="https://plus.google.com/100003628603413742554/posts/DTwFj8qmvvT">G+</a>)</li><br /><li><a href="http://retractionwatch.com/2015/03/12/yes-we-are-seeing-more-attacks-on-academic-freedom-guest-post-by-historian-of-science-and-medicine/">Increasing attacks on academic freedom</a> (<a href="https://plus.google.com/100003628603413742554/posts/ELfJRoJda4c">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=2E9m6yDEIj8">Vi Hart throws cold water on the whole Pi day thing and how arbitrary it is</a> (<a href="https://plus.google.com/100003628603413742554/posts/6YiHR4YHEj4">G+</a>)</li><br /><li><a href="https://cameroncounts.wordpress.com/2015/03/15/folding-de-bruijn-graphs/">Folding de Bruijn graphs</a> (<a href="https://plus.google.com/100003628603413742554/posts/DUGi3ED1ppW">G+</a>)</li></ul>urn:lj:livejournal.com:atom1:11011110:306195Photos from Bellairs2015-03-15T04:31:47Z2015-03-15T04:31:47ZI was in Barbados last week for the <a href="http://cglab.ca/~morin/misc/bb2015/">Third Annual Workshop on Geometry and Graphs</a>. 
This time, unlike <a href="http://11011110.livejournal.com/286162.html">my visit last year</a>, I remembered to bring my camera.<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/pix/bellairs15/23-m.jpg" border="2" style="border-color:black;" /></div><br /><br /><b>( <a href="http://www.ics.uci.edu/~eppstein/pix/bellairs15/index.html">Many more photos, not all by me</a> )</b>urn:lj:livejournal.com:atom1:11011110:305937The nearest neighbor in an antimatroid2015-03-06T06:46:36Z2015-03-06T06:49:41ZFranz Brandenburg, Andreas Gleißner, and Andreas Hofmeier have <a href="http://dx.doi.org/10.1142/S1793830913600033">a 2013 paper</a> that considers the following problem: given a finite partial order P and a permutation π of the same set, find the nearest neighbor to π among the linear extensions of P. Here "nearest" means minimizing the <a href="https://en.wikipedia.org/wiki/Kendall_tau_distance">Kendall tau distance</a> (number of inversions) between π and the chosen linear extension. Or, to put it another way: you are given a directed acyclic graph whose vertices are tagged with distinct numbers, and you want to choose a topological ordering of the graph that minimizes the number of pairs that are out of numerical order.<br />Among other results they showed that this is NP-hard, 2-approximable, and fixed-parameter tractable.<br /><br />An idea I've been pushing (most explicitly in my recent <i>Order</i> paper) is that, when you have a question involving linear extensions of a partial order, you should try to generalize it to the basic words of an <a href="https://en.wikipedia.org/wiki/Antimatroid">antimatroid</a>. So now, let A be an antimatroid and π be a permutation on its elements. What is the nearest neighbor of π among the basic words of A? Can the fixed-parameter algorithm for partial orders be generalized to this problem?<br /><br />Answer: Yes, no, and I don't know. 
Yes, the problem is still fixed-parameter tractable with a nice dependence on the parameter. No, not all FPT algorithms generalize directly. And I don't know, because I don't seem to have subscription access to the journal version of the BGH paper, the <a href="http://www.uni-passau.de/fileadmin/files/forschung/mip-berichte/MIP-1102.pdf">preprint version</a> doesn't include the FPT algorithm, and I don't remember clearly enough what Franz told me about this a month or so ago, so I can't tell which one they're using.<br /><br />But anyway, here's an easy FPT algorithm for the partial order version of the problem (that might or might not be the BGH algorithm). For any element x, we can define a set L of the elements coming before x in the given permutation π, and another set R of the elements coming after x in the permutation; L, x, and R form a three-way partition of the elements. We say that x is "safe" if there exists a linear extension of P that gives the same partition for x. Otherwise, we call x "unsafe". Then in the linear extension nearest to π, every safe element has the same position that it has in π. For, if we had a linear extension σ for which this wasn't true, then the sequence (σ ∩ L),x,(σ ∩ R) would also be a linear extension and would have fewer inversions. On the other hand, every unsafe element participates in at least one inversion, so if the optimal solution value is k then there can be at most 2k unsafe elements. Therefore, we can restrict both π and P to the subset of unsafe elements, solve the problem on the resulting <a href="https://en.wikipedia.org/wiki/Kernelization">linear-sized kernel</a>, and then put back the safe elements in their places, giving an FPT algorithm.<br /><br />You can define safe elements in the same way for antimatroids but unfortunately they don't necessarily go where they should. 
As an extreme example, consider the antimatroid on the symbols abcdefghijklmnopqrstuvwxyz* whose basic words are strings of distinct symbols that are alphabetical up to the star and then arbitrary after it, and the permutation π = zyxwvutsrqponmlkjihgfedcba* that wants the symbols in backwards order but keeps the star at the end. The star is safe, but if we put it in its safe place then the only possible basic word is abcdefghijklmnopqrstuvwxyz* with 325 inversions. Instead, putting it first gives us the basic word *zyxwvutsrqponmlkjihgfedcba with only 26 inversions. So the same kernelization doesn't work. It does work to restrict π and A to the elements whose positions in π are within k steps of an unsafe element, but that gives a bigger kernel (quadratic rather than linear).<br /><br />Instead, let's try choosing the elements of the basic word one at a time. At each step, if the element we choose comes later in π than i other elements that we haven't chosen yet, it will necessarily cause i inversions with those other elements, and the total number of inversions of the word we're finding is just the sum of these numbers i. So when the number of inversions is small, then in most steps we should choose i = 0, and in all steps we should choose small values of i. In fact, whenever it's possible to choose i = 0, it's always necessary to do so, because any basic word consistent with the choices we've already made that doesn't make this choice could be made better by moving the i = 0 element up to the next position.<br /><br />So this leads to the following algorithm for finding a basic word with distance k: at each step where we can choose i = 0, do so. 
And at each step where the antimatroid doesn't allow the i = 0 choice, instead recursively try all possible choices of i from 1 to k that are allowed by the antimatroid, but then subtract the value of i we chose from k because it counts against the number of inversions we have left to find.<br /><br />Each leaf of the recursion takes linear time for all its i = 0 choices, so the main factor in the analysis is how many recursive branches there are. This number is one for k = 0 (because we can never branch), and it's also one for k = 1 (because at a branch point we can only choose i = 1 after which we are in the k = 0 case). For each larger value of k, the first time we branch we will be given a choice of all possible smaller values of k, and the total number of branches in the recursion will be the sum of the numbers of branches for these smaller values. That is, if R(k) denotes the number of recursive branches for parameter k, it obeys the recursion R(0) = R(1) = 1, R(k) = sum<sub>i<k</sub>R(i), which solves to R(k)=2<sup>k−1</sup>. So this algorithm is still fixed-parameter tractable, with only single-exponential dependence on k.<br />If we don't know k ahead of time, we can run the whole algorithm for k = 1,2,3,... and the time bound will stay the same.<br /><br />Given the existence of this simple O(2<sup>k</sup>nI) algorithm (where I is the time for testing whether the antimatroid allows an element to be added in the current position), does it make sense to worry about a kernelization, which after all doesn't completely solve the problem, but only reduces it to a smaller one? Yes. The reason is that if you kernelize (using the O(k<sup>2</sup>)-size kernel that restricts to elements that are within k steps of an unsafe element) before recursing, you separate out the exponential and linear parts, and get something more like O(nI + 2<sup>k</sup>k<sup>2</sup>I). 
But the difference between quadratic and linear kernels is swamped by the exponential part of the time bound, so rather than looking for smaller kernels it would be better to look for a more clever recursion with less branching.<br /><br />The same authors also have <a href="http://dx.doi.org/10.1007/s10878-012-9467-x">another paper</a> on <a href="https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient">Spearman footrule distance</a> (how far each element is out of its correct position, summed over all the elements) but the kernelization in this paper looks a little trickier and I haven't thought carefully about whether the same approach might work for the antimatroid version of that problem as well.<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:305884Linkage for the end of a short month2015-03-01T02:09:07Z2015-03-01T17:02:04Z<ul><li><a href="http://www.theguardian.com/science/alexs-adventures-in-numberland/2015/jan/13/golden-ratio-beautiful-new-curve-harriss-spiral">The Harriss spiral</a> (<a href="https://plus.google.com/100003628603413742554/posts/cj7FuVzPcyY">G+</a>)</li><br /><li><a href="http://www.thisiscolossal.com/2015/02/ice-sand-scultpures-lake-michigan/">Wind-carved towers of sand and ice</a> (<a href="https://plus.google.com/100003628603413742554/posts/KAj7MgLygwJ">G+</a>)</li><br /><li><a href="http://boingboing.net/2015/01/28/watch-beachbot-make-large-scal.html">Beachbot</a>, a giant etch-a-sketch for your local beach (<a href="https://plus.google.com/100003628603413742554/posts/N7zNSZubpGG">G+</a>)</li><br /><li><a href="http://www.koutschan.de/data/link/index.html">Linkages that can draw any algebraic curve</a> (<a href="https://plus.google.com/100003628603413742554/posts/AojzKM96uR3">G+</a>)</li><br /><li><a href="https://cp4space.wordpress.com/2015/02/19/proto-penrose-tilings/">Precursors to the Penrose tiling</a> in the works of Kepler and the Islamic architects (<a 
href="https://plus.google.com/100003628603413742554/posts/h8aVPY67v4v">G+</a>)</li><br /><li><a href="http://fivethirtyeight.com/datalab/academy-awards-best-picture-instant-runoff/">Instant-runoff demo</a> (<a href="https://plus.google.com/100003628603413742554/posts/AQbqNjFsXi6">G+</a>)</li><br /><li><a href="https://3010tangents.wordpress.com/category/women-in-math">Women in mathematics</a> (<a href="https://plus.google.com/100003628603413742554/posts/KkeBR6hDLrD">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=i5oc-70Fby4">Big Bang Theory Eye of the Tiger Scene</a> (<a href="https://plus.google.com/100003628603413742554/posts/dUNx4JEs1n6">G+</a>)</li><br /><li><a href="http://www.scfbm.org/content/8/1/7/">Why using git is good scientific practice</a> (<a href="https://plus.google.com/100003628603413742554/posts/J21fqi9ZUqS">G+</a>)</li><br /><li><a href="https://en.wikipedia.org/wiki/Klam_value">Klam values</a> and other colorful neologisms from the parameterized complexity crowd (<a href="https://plus.google.com/100003628603413742554/posts/3aQLAeeKckW">G+</a>)</li><br /><li><a href="http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/">Timsort is broken</a> (and has been for the past 12 years) (<a href="https://plus.google.com/100003628603413742554/posts/MHsutRHNrQ1">G+</a>)</li></ul>urn:lj:livejournal.com:atom1:11011110:305481Highly abundant numbers are practical2015-02-26T18:49:00Z2015-02-27T19:00:19ZA <a href="https://en.wikipedia.org/wiki/Highly_abundant_number#References">highly abundant number</a> is a positive integer <i>n</i> that holds the record (among it and smaller numbers) for the biggest sum of divisors σ(<i>n</i>). 
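<br /><br />A minimal Python sketch of this definition, offered as an illustration rather than as code from the OEIS entry or the Wikipedia article: it sieves σ for every number up to a limit and then reports the numbers that set a new record. The function names <code>divisor_sums</code> and <code>highly_abundant</code> are hypothetical, chosen only for this example.<br /><br />

```python
def divisor_sums(limit):
    """Return a list sigma with sigma[n] = sum of the divisors of n, for 1 <= n <= limit."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides exactly the multiples d, 2d, 3d, ... so add it to each of them.
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def highly_abundant(limit):
    """List the n <= limit whose sigma(n) exceeds sigma(m) for every smaller m."""
    sigma = divisor_sums(limit)
    records = []
    best = 0
    for n in range(1, limit + 1):
        if sigma[n] > best:
            best = sigma[n]
            records.append(n)
    return records


print(highly_abundant(30))
# prints [1, 2, 3, 4, 6, 8, 10, 12, 16, 18, 20, 24, 30]
```

The sieve takes O(<i>N</i> log <i>N</i>) time for a limit of <i>N</i>, which is enough for small experiments but not for searching far out in the sequence.<br /><br />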
While cleaning up some citations on the Wikipedia article, I ran across an unsolved problem concerning these numbers, posed by Jaycob Coleman and listed on <a href="https://oeis.org/A002093">the OEIS entry for them</a>: are all sufficiently large highly abundant numbers practical?<br /><br />A <a href="https://en.wikipedia.org/wiki/Practical_number">practical number</a> <i>n</i> has the property that all numbers up to <i>n</i> can be expressed as sums of distinct divisors of <i>n</i>. This can be tested by looking at the factorization of <i>n</i>: define the <<i>p</i>-smooth part of <i>n</i> to be the product of the prime factors of <i>n</i> that are less than <i>p</i>. Then <i>n</i> is practical if and only if, for each prime factor <i>p</i> of <i>n</i>, <i>p</i> is at most one more than the sum of divisors of the <<i>p</i>-smooth part of <i>n</i>. So, for instance, the highly abundant number 10 is not practical: the <5-smooth part of 10 is 2, and 5 is too big compared to σ(2) = 3. Also, 3 is not practical as its <3-smooth part is only one. Are these the only exceptions?<br /><br />As with other questions involving record-holders for <a href="https://en.wikipedia.org/wiki/Multiplicative_function">multiplicative functions</a>, the highly abundant numbers can be thought of as solutions to special instances of the <a href="https://en.wikipedia.org/wiki/Knapsack_problem">knapsack problem</a>: if we define the size of a prime power <i>p<sup>i</sup></i> to be log <i>p</i>, and we define its profit to be the logarithm of the factor (<i>p</i><sup><i>i</i> + 1</sup> − 1)/(<i>p</i><sup><i>i</i></sup> − 1) by which including <i>p<sup>i</sup></i> as a divisor of <i>n</i> would cause σ to increase (relative to the next lower power of <i>p</i>), then the factorization of <i>n</i> is given by the set of prime powers whose sizes add to at most log <i>n</i> and whose profits add to the largest number possible. 
I don't know how to use this knapsack view of the problem directly (in part because knapsack is a hard problem) but it is helpful in thinking about showing that certain factors must be present or absent because they would lead to a better knapsack solution.<br /><br />For instance, suppose that <i>n</i> is highly abundant, let <i>p</i> be the smallest prime that does not divide <i>n</i>, and let <i>P</i> be the largest prime factor of <i>n</i>. Then it must be true that <i>P</i> < <i>p</i><sup>2</sup>. For, if not, let <i>q</i> = floor(<i>P</i>/<i>p</i>). We could replace <i>P</i> in the factorization of <i>n</i> by <i>pq</i>, giving a smaller number than <i>n</i> with a bigger contribution to σ: at least (<i>p</i> + 1)<i>q</i>, versus at most <i>P</i> + 1, or smaller if <i>P</i> appears to a higher power than one.<br /><br />Based on this fact, it's straightforward to show that all highly abundant numbers that are divisible by four are practical. More strongly, the same is true for other numbers <i>n</i> that are divisible by four and have the same inequality for <i>p</i> and <i>P</i>. For, if the first missing prime <i>p</i> is 3, then the sum of divisors of the <<i>p</i>-smooth part is at least 7, big enough to cover any prime factor <i>P</i> that satisfies the inequality. And for each additional prime factor of <i>n</i> smaller than <i>p</i>, the bound on <i>P</i> grows by a factor of at most four (by <a href="https://en.wikipedia.org/wiki/Bertrand%27s_postulate">Bertrand's postulate</a>) and the sum of divisors of the smooth part grows by a factor of at least four, so this sum of divisors always remains large enough to satisfy the condition for being practical.<br /><br /><s>But in their early work on highly abundant numbers, Alaoglu and Erdős observed that 210 is the largest highly abundant number to include only one factor of two in its prime factorization. All larger highly abundant numbers are divisible by four, and by the argument above they are all practical. 
The remaining cases are small enough to test individually, and they are all practical. So Jaycob Coleman's conjecture is true.</s><br /><br />Update: this claim about 210 is obviously wrong. 630 is highly abundant and is also not divisible by four. So here's a better argument along the same lines. The case <i>p</i> = 2 is easy to handle: <i>P</i> can only be 3, so <i>n</i> is a power of three. If it is not 3 itself, we could replace a factor of 9 in it by a factor of 8, getting a smaller number with a bigger contribution to σ (15 vs 13). So the only odd highly abundant number is 3. Similarly, if the first missing prime is 3, then <i>n</i> must be {2,5,7}-smooth. If it is divisible by 25, we can replace this factor by 24 (with a contribution of at least 32 to σ instead of 31) and if it is divisible by 7, we can replace this factor by 6 (with a contribution greater than 8 to σ instead of 8). So the only possible highly abundant numbers that are even but not divisible by 3 are powers of two and their multiples by five, and the only one of those that can be impractical is 10.<br /><br />Next, suppose that the first missing prime is 5, and there is only one factor of two. The <5-smooth part is at least 6 and its sum of divisors is 12, big enough to cover all primes less than 11, and if any of these primes is a factor of <i>n</i> then including it in the smooth part boosts the sum of divisors to large enough to cover all remaining factors. Similarly, if there is more than one factor of three, then the sum of divisors of the smooth part is at least 39, covering all possible prime factors. So the only possible impractical numbers in this case are not divisible by 5, 7, or 11 but are divisible by exactly one factor of 2 or 3 and may be divisible by 13, 17, 19, or 23. A factor of 13 can be replaced by a factor of 10 (contributing 14 to σ in either case, so giving a smaller number with the same sum of divisors). 
A factor of 17 can be replaced by a factor of 15 (contributing 19.5 to σ instead of 18). A factor of 19 can be replaced by a factor of 18 (contributing 23 1/3 instead of 20) and a factor of 23 can be replaced by a factor of 20 (contributing 30 instead of 24). So none of these cases give rise to new exceptions.<br /><br />Finally, if the first missing prime is 7, then the <7-smooth part is at least 30 and its sum of divisors is at least 72, big enough to cover all primes less than 49, and from here we can use the same Bertrand postulate argument.<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:305358Halin graph algorithms made simple2015-02-19T01:49:18Z2015-02-19T01:49:18ZI have a new paper on the arXiv, <a href="http://arxiv.org/abs/1502.05334">D3-reducible graphs</a> (arXiv:1502.05334), but it's a small one that is not related to this week's many conference submission deadlines (ICALP yesterday, COLT tomorrow, WADS Friday). One reason for its existence was that I wanted an implementable algorithm for working with <a href="https://en.wikipedia.org/wiki/Halin_graph">Halin graphs</a> (the graphs that you get by drawing a tree in the plane, with no degree-two vertices, and then connecting the leaves by a cycle surrounding the tree) and the algorithms that I could find for them were based on linear-time planarity testing, something I haven't yet worked up the courage to try implementing. Instead I found that it's possible to recognize Halin graphs, and to solve a wide class of related problems (such as finding their planar embeddings, decomposing them into a tree and a cycle, or finding a Hamiltonian cycle) using a simple reduction-based algorithm that repeatedly finds and simplifies certain local configurations within the graph. 
The two reductions that I used are shown below; one of them collapses a triangle of degree-three vertices to a point, and the other shortens certain paths of degree-three vertices.<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/0xDE/D3-reductions.png"></div><br /><br />Every Halin graph can be simplified by these reductions to a complete graph on four vertices; in terms of the tree and cycle decomposition of the Halin graph, one of these reductions removes the children from a tree node with two leaf children, and the other removes the middle of three consecutive leaf children. But if you want to use these to recognize Halin graphs only, you need to restrict them a little, because some other graphs can also be simplified to the same four-vertex complete graph, and that's mostly what the paper is about. I call these D3-reducible graphs, and they have a lot of properties in common with the Halin graphs: they are planar, minimally 3-vertex-connected, Hamiltonian, bounded treewidth, etc. One of the smallest examples of a D3-reducible graph that is not a Halin graph is the truncated tetrahedron graph:<br /><br /><div align="center"><img src="http://www.ics.uci.edu/~eppstein/0xDE/trunctet.png"></div><br /><br />I have updated my <a href="http://www.ics.uci.edu/~eppstein/PADS/">PADS Python algorithm library</a> to include the new Halin graph recognition algorithm, and some related algorithms, as <a href="http://www.ics.uci.edu/~eppstein/PADS/Halin.py">Halin.py</a>. 
(I also updated the license text for the library, to use the MIT license — you can do almost anything you want but don't hold me responsible for it — rather than trying to claim that the code is public domain, which I'm told is not so meaningful legally.)<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:304968Linkage2015-02-16T01:32:12Z2015-02-16T01:32:12ZI don't know what Google+ is doing under the hood (and don't really want to know) but whatever it is seems kind of bloated to me, enough to kill my browser and the responsiveness on my whole machine when I try to open 14 G+ tabs at once. But anyway, here they are:<br /><ul><li><a href="http://www.slate.com/articles/technology/bitwise/2014/12/wikipedia_editing_disputes_the_crowdsourced_encyclopedia_has_become_a_rancorous.single.html">Sexism and bureaucracy at Wikipedia</a> and <a href="http://ergodicity.net/2015/01/23/linkage-55/">an update on the Walter Lewin sexual harassment story</a> (<a href="https://plus.google.com/100003628603413742554/posts/TzcWwiVhKtr">G+</a>)</li><br /><li><a href="http://www.dailykos.com/story/2015/01/28/1360765/-Gov-Scott-Walker-seeks-300-million-in-university-cuts-but-220-million-to-build-Bucks-a-new-arena">Wisconsin gov. Walker seeks major cuts on universities so he can build a sportsball facility;</a> Calif. gov. 
Brown isn't much better (<a href="https://plus.google.com/100003628603413742554/posts/SKkguKRxmLB">G+</a>)</li><br /><li><a href="http://www.confsearch.org/confsearch/faces/pages/topic.jsp?topic=Theory&sortMode=1&graphicView=1">Conference search: Theory</a> (<a href="https://plus.google.com/100003628603413742554/posts/9My7JoFhSgN">G+</a>)</li><br /><li><a href="https://twitter.com/INTERESTING_JPG/status/562618942217531393">Automated textual image analysis results</a> and <a href="http://deeplearning.cs.toronto.edu/i2t">engine</a> (<a href="https://plus.google.com/100003628603413742554/posts/grLqEiKonkk">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=GznQgTdEdI4">Super eggs</a>: the mathematics behind the shape of, among other things, Azteca Stadium in Mexico City (<a href="https://plus.google.com/100003628603413742554/posts/2QrxUEH2NDx">G+</a>)</li><br /><li><a href="http://libraries.calstate.edu/equitable-access-public-stewardship-and-access-to-scholarly-information">Cal State Univ. gives up on Wiley journals after hefty price increases and refusal to unbundle</a> (<a href="https://plus.google.com/100003628603413742554/posts/URkXdWxDzew">G+</a>)</li><br /><li><a href="https://gilkalai.wordpress.com/2015/02/06/from-oberwolfach-the-topological-tverberg-conjecture-is-false">Topological Tverberg counterexample</a>. It is true for all prime-power dimensions but that wasn't good enough to be true for all dimensions. (<a href="https://plus.google.com/100003628603413742554/posts/KRqdQqCt9Gw">G+</a>)</li><br /><li><a href="http://blog.matthen.com/post/97284098616/take-a-rectangle-and-cut-it-along-a-random-line">Randomly cut and flipped rectangles</a> from another Tumblr of interesting mathematical visualizations (<a href="https://plus.google.com/100003628603413742554/posts/4Dw5FthmMjg">G+</a>)</li><br /><li><a href="http://www.maureeneppstein.com/mve_journal/?p=634">1961 interview with F1 racing driver Bruce McLaren's family</a>. 
From my mother's blog; McLaren was her second cousin. (<a href="https://plus.google.com/100003628603413742554/posts/4vd3YZSkdfK">G+</a>)</li><br /><li><a href="http://www.theguardian.com/science/alexs-adventures-in-numberland/2015/feb/10/muslim-rule-and-compass-the-magic-of-islamic-geometric-design">Muslim rule and compass: the magic of Islamic geometric design</a> (<a href="https://plus.google.com/100003628603413742554/posts/HorwnpBrtM9">G+</a>)</li><br /><li><a href="http://www.umass.edu/gradschool/sites/default/files/iranian_student_admissions_2_2015.pdf">UMass Amherst bans Iranian STEM grad students</a> (<a href="https://plus.google.com/100003628603413742554/posts/VDYSkY69tGe">G+</a>)</li><br /><li><a href="http://www.metafilter.com/146924/Paper-Engineering-Over-700-years-of-Fold-Pull-Pop-and-Turn">Many links on pop-up books and related paper engineering problems</a> (<a href="https://plus.google.com/100003628603413742554/posts/eijbaYWgV4w">G+</a>)</li><br /><li><a href="http://www.win.tue.nl/SoCG2015/?page_id=601">SoCG accepted papers, with abstracts</a> (<a href="https://plus.google.com/100003628603413742554/posts/LAJJqRDivFX">G+</a>)</li><br /><li><a href="http://boingboing.net/2015/02/14/facebook-tells-native-american.html">The Nymwars continue at Facebook</a> (<a href="https://plus.google.com/100003628603413742554/posts/Wa3EkaMXgog">G+</a>)</li></ul>urn:lj:livejournal.com:atom1:11011110:304679Where do you get your BibTeX data?2015-02-06T06:35:27Z2015-02-06T19:30:55ZFormatting a couple hundred references for a proposal led me to wonder: If you find yourself wanting to look up the BibTeX data for a paper, where do you go? 
And how much do you have to edit it yourself afterwards?<br /><br />The three most obvious choices for me are <a href="http://www.informatik.uni-trier.de/~ley/db/">DBLP</a>, <a href="http://dl.acm.org/">ACM Digital Library</a>, or <a href="http://www.ams.org/mathscinet/">MathSciNet</a>.<br /><br />There used to be a project to maintain a collective file "geom.bib" with all the references that any computational geometer would ever use. I still have about 18 copies of it on my computer (presumably not all in sync with each other) from various papers that used it, but it became unwieldy (too big to use as one file) and seems to have fallen by the wayside. Additionally, many publishers supply citation files for their own publications, so you could use those, or even take the time to write your own. But my experience is that most of the publishers are not good at generating clean data (e.g. they use hyphens instead of en-dashes for page ranges, or permute conference title words into a different order than what you'd want to use in a citation), although at least they're better at it than Google scholar.<br /><br />The big three above all have their quirks, but they generate pretty clean data (especially if you tell DBLP not to use crossref). Copying from them can be a lot easier and less error-prone than typing it all in yourself, and picking one source and sticking to it could also help achieve greater consistency. DBLP has the best coverage for Computer Science, I think. I recently looked at a five-year window of my papers (for the prior work section of that proposal) and it missed only three (two in non-computer science journals about topology and mathematical psychology, and the third in an edited volume about cellular automata).<br /><br />My own idiosyncratic preference is for MathSciNet, though. 
Their coverage is almost as good for my purposes (sometimes better) but what ends up making the difference for me is their care about the capitalization of title words and formatting of math in titles. DBLP and ACM leave lots of words capitalized and let the bibtex style lowercase them later, which mostly works, but fails when some words are proper nouns that should stay capitalized. MathSciNet takes care to lowercase everything to how it should appear in a citation (my preference) and to protect the letters that should remain uppercase. And for titles that contain formulas, MathSciNet gets it right and the other two don't.<br /><br />Example: ACM: "The h-Index of a Graph and Its Application to Dynamic Subgraph Statistics".<br />DBLP: "The h-Index of a Graph and Its Application to Dynamic Subgraph Statistics" (journal version); "The \emph{h}-Index of a Graph and Its Application to Dynamic Subgraph Statistics" (conference version).<br />MathSciNet: "The {$h$}-index of a graph and its application to dynamic subgraph statistics". One of these is correct and the others aren't.<br /><br />But maybe there's some new tool or database that beats all of these that I haven't yet found out about. One of my co-authors uses Zotero, but I haven't tried that myself. Are systems like it based on shared libraries rather than comprehensive databases still useful?<br /><br />(See also <a href="https://plus.google.com/u/0/100003628603413742554/posts/T7msni7sGmJ">discussion on G+</a> from the same post.)<a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:304478Linkage2015-02-01T03:17:46Z2015-02-01T03:17:46ZDid you know...<ul><li>... that <a href="http://www.imdb.com/title/tt2582802/">Bernard Chazelle's son directed a film that has been nominated for a best-picture Oscar?</a> (<a href="https://plus.google.com/100003628603413742554/posts/FsKPpc8K545">G+</a>)</li><br /><li>... 
that <a href="http://www.sciencepubs.org/content/347/6217/14.full">the rebellion in Ukraine has caused many scientists and whole universities to move?</a> (<a href="https://plus.google.com/100003628603413742554/posts/fAThtaZX9kT">G+</a>)</li><br /><li>... that <a href="https://adamsheffer.wordpress.com/2015/01/19/a-list-of-recent-papers/">there have been many recent papers on counting geometric incidences?</a> (<a href="https://plus.google.com/100003628603413742554/posts/5AB15iLt8kc">G+</a>)</li><br /><li>... that <a href="http://www.wired.com/2015/01/chocolates-whose-intricate-architecture-designed-tweak-taste-buds/">the shape of a piece of 3d-printed chocolate might influence its flavor?</a> (<a href="https://plus.google.com/100003628603413742554/posts/Ri7GagMRtza">G+</a>)</li><br /><li>... that <a href="http://www.wired.com/2015/01/quanta-curves-from-flatness-kirigami/">placing precise slits in a flat paper surface can cause it to curve in predictable ways?</a> (<a href="https://plus.google.com/100003628603413742554/posts/RxzVP7VWdkJ">G+</a>)</li><br /><li>... that <a href="https://www.youtube.com/watch?v=on3ZLLKQp-4">the waterbear is a new fast knightship in Conway's game of life?</a> (<a href="https://plus.google.com/100003628603413742554/posts/hkGgm2ohJfG">G+</a>)</li><br /><li>... that <a href="http://www.thisiscolossal.com/2015/01/intricate-modular-paper-sculptures-by-richard-sweeney/">Richard Sweeney's paper-folding artworks are inspired by snow and clouds?</a> (<a href="https://plus.google.com/100003628603413742554/posts/e6xLbXJeeJS">G+</a>)</li><br /><li>... that <a href="https://www.google.com/webmasters/tools/mobile-friendly/">Google has a service for checking whether your home page is mobile-friendly?</a> (<a href="https://plus.google.com/100003628603413742554/posts/8fdGejK5U1W">G+</a>)</li><br /><li>... 
that <a href="http://hyrodium.tumblr.com/post/109000595139/i-made-gif-animations-of-sum-of-square-numbers">the sum of the first n squares is n(n+1)(2n+1)/6?</a> (<a href="https://plus.google.com/100003628603413742554/posts/JxFABtKkxj1">G+</a>)</li><br /><li>... that <a href="https://facultystaff.richmond.edu/~ebunn/homocentric/">epicycles can be visualized by spheres spinning inside each other?</a> (<a href="https://plus.google.com/100003628603413742554/posts/P5H2889XWxR">G+</a>)</li><br /><li>... that <a href="https://www.youtube.com/watch?v=74BGYzSkMeU">Paul Erdős traveled to Madras to meet Krishnaswami Alladi when Alladi was only an undergraduate?</a> (<a href="https://plus.google.com/100003628603413742554/posts/MRo43mTmSGN">G+</a>)</li><br /><li>... that <a href="http://aperiodical.com/2015/01/apiological-mathematical-speculations-about-bees-part-1-honeycomb-geometry/">you can persuade bees to make honeycombs in nonstandard tessellations by giving them patterned foundation plates?</a> (<a href="https://plus.google.com/100003628603413742554/posts/7ytwWuMJzsJ">G+</a>)</li><br /><li>... that <a href="http://googlescholar.blogspot.com/2015/01/blast-from-past-reprint-request.html">professors used to send each other postcards requesting printed copies of their recent papers?</a> (<a href="https://plus.google.com/100003628603413742554/posts/UPntqoRxWtk">G+</a>)</li><br /><li>... that <a href="http://boingboing.net/2015/01/22/origami-dollar-bill-koi.html">you can fold a dollar bill into a fish?</a> (<a href="https://plus.google.com/100003628603413742554/posts/RPB3AWW8Dhb">G+</a>)</li><br /><li>... 
that <a href="http://www.jebiga.com/strandbeest-kinetic-animal-sculptures-theo-jansen/">Theo Jansen's autonomous walking creatures have no brains?</a> (<a href="https://plus.google.com/100003628603413742554/posts/G8Cd8U1MEsk">G+</a>)</li></ul>urn:lj:livejournal.com:atom1:11011110:304362The linear algebra of edge sets of graphs2015-01-22T23:29:25Z2015-01-23T03:29:20ZThis quarter, in my advanced algorithms class, I've been going through <a href="http://www.cc.gatech.edu/fac/Vijay.Vazirani/book.pdf">Vazirani's <i>Approximation Algorithms</i> book</a> chapter-by-chapter, and learning lots of interesting material that I didn't already know myself in the process.<br /><br />One of the things I recently learned (in covering chapter 6 on feedback vertex set approximation)<sup>*</sup> is that, although all the students have taken some form of linear algebra, many of them have never seen a vector space in which the base field is not the real numbers or in which the elements of the vector space are not tuples of real coordinates. So instead of discussing the details of that algorithm I ended up spending much of the lecture reviewing the theory of binary vector spaces. These are very important in algebraic graph theory, so I thought it might be helpful to write a very gentle introduction to this material here.<br /> <br />First of all we need the concept of a <a href="https://en.wikipedia.org/wiki/Field_(mathematics)">field</a>. This is just a system of elements in which we can perform the usual arithmetic operations (addition, subtraction, multiplication, and division) and expect them to behave like the familiar real number arithmetic: addition and multiplication are associative and commutative, there are special values 0 and 1 that are the identities for addition and multiplication respectively, subtraction is inverse to addition, division is inverse to multiplication by anything other than zero, and multiplication distributes over addition. 
The field that's important for this material is a particularly simple one, <a href="https://en.wikipedia.org/wiki/GF(2)">GF(2)</a>, in which the required special values 0 and 1 are the only elements. The arithmetic of these two values can be described as ordinary integer arithmetic mod 2, or equivalently it can be described by saying that addition is Boolean xor and multiplication is Boolean and. Subtraction turns out to be the same as addition, and division by 1 (the only value that it's possible to divide by) is just the identity operation. It's not hard to verify that these operations have all the desired properties of a field, and doing so maybe makes a useful exercise (Exercise 1).<br /><br />Next, a <a href="https://en.wikipedia.org/wiki/Vector_space">vector space</a> is a collection of elements that can be added to each other and multiplied by <a href="https://en.wikipedia.org/wiki/Scalar_(mathematics)">scalars</a> from a field. (One can generalize the same concept to other kinds of arithmetic than fields but then one gets modules instead of vector spaces.) The vector addition operation must be commutative and invertible; this implies that it has an identity element, and this element (whatever it happens to be) is called the zero vector. Additionally, scalar-scalar-vector multiplications must be associative, scalar multiplication by the special element 1 of the field must be the identity operation, and scalar multiplication must be distributive over both vector and field addition.<br /><br />One easy way to construct vector spaces over a field <b>F</b> is to make its elements be <i>k</i>-tuples of elements of <b>F</b> with the addition and scalar multiplication operations acting independently on each coordinate, but it's not the only way. 
For the vector spaces used in this chapter, a different construction is more natural: we let the elements of the vector space be sets in some family of sets, and the vector addition operation be the <a href="https://en.wikipedia.org/wiki/Symmetric_difference">symmetric difference</a> of sets. The symmetric difference <i>S</i> Δ <i>T</i> of two sets <i>S</i> and <i>T</i> is the set of elements that occur in one but not both of <i>S</i> and <i>T</i>. This operation is associative, commutative, and invertible, where the inverse of a set is the same set itself: <i>S</i> Δ <i>T</i> Δ <i>T</i> = <i>S</i> regardless of which order you use to perform the symmetric difference operations. If a nonempty family of sets has the property that the symmetric difference of every two sets in the family stays in the family, then these sets can be interpreted as the elements of a vector space over GF(2) in which the vector addition operation is symmetric difference, the zero vector is the empty set (necessarily in the family because it's the symmetric difference of any other set with itself), scalar multiplication by 0 takes every set to the empty set, and scalar multiplication by 1 takes every set to itself. One has to verify that these addition and multiplication operations are distributive, but again this is a not-very-difficult exercise (Exercise 2).<br /><br />As with other kinds of vector spaces, these vector spaces of sets have bases, collections of vectors such that everything in the vector space has a unique representation as a sum of scalar products of basis vectors. Every two bases have the same number of vectors as each other (Exercise 3: prove this), and this number is called the dimension of the vector space. If the dimension is <i>d</i>, the total number of vectors in the vector space is always exactly 2<sup><i>d</i></sup>, because that is the number of different ways that you can choose a scalar multiple (0 or 1) for each basis vector. 
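<br /><br />As a small sanity check on the 2<sup><i>d</i></sup> count, here is a minimal Python sketch (not from Vazirani's book; the function names <code>symdiff_basis</code> and <code>span</code> are made up for illustration). It finds a basis for the family generated by some sets under symmetric difference, using Gaussian elimination over GF(2), and then lists every vector in the span. It assumes that the set elements can be compared with each other, so that <code>min</code> works.

```python
def symdiff_basis(sets):
    """Return a basis (list of frozensets) for the GF(2) span of the given sets.

    Gaussian elimination: each basis vector is stored under its smallest
    element (its pivot), and no two basis vectors share a pivot.
    """
    basis = {}  # pivot element -> basis vector whose minimum is that pivot
    for s in sets:
        v = frozenset(s)
        while v:
            pivot = min(v)
            if pivot not in basis:
                basis[pivot] = v   # v is independent of the earlier vectors
                break
            v = v ^ basis[pivot]   # symmetric difference cancels the pivot
        # if v became empty, s was a sum of earlier vectors and adds nothing
    return list(basis.values())


def span(basis):
    """Return all symmetric differences of subsets of the basis."""
    vectors = {frozenset()}        # the zero vector is the empty set
    for b in basis:
        vectors |= {v ^ b for v in vectors}
    return vectors


family = [{1, 2}, {2, 3}, {1, 3}]
basis = symdiff_basis(family)
print(len(basis), len(span(basis)))
```

For the three two-element subsets of {1, 2, 3}, the third set is the symmetric difference of the first two, so the basis has two sets and the span has 2<sup>2</sup> = 4 sets (the three given ones and the empty set).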
<br /><br />The families of sets that are needed for this chapter are subsets of edges of a given undirected graph. These can also be interpreted as subgraphs of the graph, but they're not quite the same because the usual definition of a subgraph also allows you to specify a subset of the vertices (as long as all edges in the edge subset have endpoints in the vertex subset), and we won't be doing that. Every graph has three important vector spaces of this type associated with it, the edge space, the cycle space, and the cut space. The edge space is the family of all subsets of edges (including the set of all edges of the given graph and the empty set). That is, it is the <a href="https://en.wikipedia.org/wiki/Power_set">power set</a> of the set of all edges; it has a natural basis in which the basis vectors are the one-edge sets, and its dimension is the number of edges in the graph.<br /><br />The <a href="https://en.wikipedia.org/wiki/Cycle_space">cycle space</a> is the family of all subsets of edges that have even degree at all of the vertices of the graph (Exercise 4: prove that this family is closed under symmetric difference operations). So it includes the simple cycles of the graph, but it also includes other subgraphs; for instance in the graph of an octahedron (a six-vertex graph with four edges at each vertex) the set of all edges is in the cycle space, as are the sets of edges formed by pairs of triangles that touch each other at a single vertex and the sets complementary to triangles or 4-cycles. It's always possible to find a basis for the cycle space in which the basis elements are themselves simple cycles; such a basis is called a <a href="https://en.wikipedia.org/wiki/Cycle_basis">cycle basis</a>. 
For instance you can form a "fundamental cycle basis" by choosing a spanning forest of the given graph and then finding all cycles that have one edge <i>e</i> outside this forest and that include also the edges of the unique path in the forest that connects the endpoints of <i>e</i>. Or, for a planar graph, you can form a cycle basis by choosing one cycle for each bounded face of a planar embedding of the graph. There are lots of interesting algorithmic problems associated with the cycle space and its cycle bases, but for this chapter the main thing that's needed is to compute its dimension, which has the nice formula |<i>E</i>| − |<i>V</i>| + <i>c</i>, where <i>E</i> is the edge set of the given graph, <i>V</i> is the vertex set, and <i>c</i> is the number of connected components. One name for this dimension is the <a href="https://en.wikipedia.org/wiki/Circuit_rank">cyclomatic number</a> of the graph, and the book chapter denotes it as cyc(<i>G</i>). (It's also possible to interpret it topologically as the first Betti number of the graph but for students who don't already know about binary vector spaces that would probably be more confusing than helpful.)<br /><br />The cut space of the graph doesn't take part in this chapter, but can be defined similarly as the set of all cut-sets of the graph. A <a href="https://en.wikipedia.org/wiki/Cut_(graph_theory)">cut</a> of a graph is a partition of its vertices into two disjoint subsets; in some contexts we require the subsets to both be nonempty but we don't do that here, so the partition into an empty set and the set of all vertices is one of the allowed cuts. The corresponding cut-set is the set of edges that have one endpoint in each of the two subsets. The family of cut-sets is closed under symmetric difference (Exercise 5) so it forms a vector space, the cut space. 
If the edges are all given positive weights and the graph is connected, then the minimum weight basis of the cut space can be represented by a tree on the vertices of the graph, in which each tree edge determines a cut (the partition of the tree into two subtrees formed by deleting that edge) and has an associated number (the weight of its cut). This tree is called the <a href="https://en.wikipedia.org/wiki/Gomory%E2%80%93Hu_tree">Gomory–Hu tree</a> of the graph and it came up (stripped of its linear-algebra origin) earlier, in an approximation for <i>k</i>-cuts in chapter 4. I also have a recent preprint on computing this basis and this tree for graphs that can be embedded onto low-genus surfaces: see <a href="http://arxiv.org/abs/1411.7055">arXiv:1411.7055</a>.<br /><br /><small><sup>*</sup>Unrelatedly, in preparing to cover this topic, I was confused for a long time by a typo in this chapter. On page 56 it states that, for a minimal feedback set, "clearly" the sum over feedback vertices of the number of components formed by deleting that one vertex equals the number of feedback vertices plus the number of components that are formed by deleting the whole feedback set but that touch only one vertex in the set. This is not true. 
What is true, and what is needed for the later argument, is that the left hand side is greater than or equal to the right hand side.</small><a name='cutid1-end'></a>urn:lj:livejournal.com:atom1:11011110:304060Linkage2015-01-16T04:16:46Z2015-01-16T04:16:46Z<ul><li><a href="http://www.thisiscolossal.com/2015/01/pixel-a-mesmerizing-dance-performance-incorporating-digital-projection/">Real-time 3d special effects in modern dance</a> (<a href="https://plus.google.com/100003628603413742554/posts/Hp8vcVRmzHS">G+</a>)</li><br /><li><a href="http://stemfeminist.com/2015/01/05/450/">How not to react to conference talks that happen to be presented by women</a> (<a href="https://plus.google.com/100003628603413742554/posts/KvCqKMhU84U">G+</a>, including also an unrelated report from the SODA business meeting)</li><br /><li><a href="http://www.neatorama.com/2015/01/07/Iced-Intrigue/">Photos of icy landscapes</a> showing how varied the geometry of ice can be (<a href="https://plus.google.com/100003628603413742554/posts/M9v6nj2Kfu2">G+</a>)</li><br /><li><a href="http://awards.acm.org/press_releases/fellows-2014b.pdf">New ACM fellows</a> (<a href="https://plus.google.com/100003628603413742554/posts/8HgRjNyNuQE">G+</a>)</li><br /><li><a href="http://www.maths.manchester.ac.uk/~jm/Choreographies/about.html">n-body choreographies</a> (strange solutions to the n-body problem in which all bodies follow each other along a curve; <a href="http://gminton.org/#choreo">more</a> and <a href="https://en.wikipedia.org/wiki/N-body_choreography">still more</a>; <a href="https://plus.google.com/100003628603413742554/posts/84uAkqPtzrM">G+</a>)</li><br /><li><a href="http://www.washingtonpost.com/news/speaking-of-science/wp/2015/01/08/men-on-the-internet-dont-believe-sexism-is-a-problem-in-science-even-when-they-see-evidence/?Post+generic=?tid%3Dsm_twitter_washingtonpost">Men (on the Internet) don’t believe sexism is a problem in science, even when they see evidence</a> (<a 
href="https://plus.google.com/100003628603413742554/posts/9kgtv1mh5SR">G+</a>)</li><br /><li><a href="https://plus.google.com/101584889282878921052/posts/VbBk9JrLxqm">The fractional chromatic number of the plane</a> (<a href="https://plus.google.com/100003628603413742554/posts/Ea6VqUWL6XG">G+</a>)</li><br /><li><a href="https://www.youtube.com/watch?v=KboGyIilP6k">Elwyn Berlekamp video on dots-and-boxes strategy</a> (<a href="https://plus.google.com/100003628603413742554/posts/UrgtLhCcEi9">G+</a> <a href="https://plus.google.com/113862074718836293294/posts/aJi4HxTP9Pe">reshare</a>)</li><br /><li><a href="http://richardelwes.co.uk/2015/01/02/the-grothendieck-song/">Richard Elwes sings the Grothendieck Song for us</a> (<a href="https://plus.google.com/100003628603413742554/posts/XDe3WtoERW5">G+</a>)</li><br /><li><a href="http://www.thisiscolossal.com/2015/01/fascinating-3d-printed-fibonacci-zoetrope-sculptures/">Animated shapes from a 3d printed object, a turntable, and a strobe light</a> (<a href="https://plus.google.com/100003628603413742554/posts/Jpk5j2sKQqB">G+</a> <a href="https://plus.google.com/117273001021476361745/posts/QjURgBC7K3j">reshare</a>)</li><br /><li><a href="http://gruze.org/tilings/">Why tilings by regular polygons can't include the pentagon</a> (<a href="https://plus.google.com/100003628603413742554/posts/PZMj7dnC9oC">G+</a> via <a href="http://www.metafilter.com/146120/No-Pentagons">MF</a>)</li></ul>