20 October 2006 @ 07:54 am
An old paper of mine on finding complete bipartite subgraphs in sparse graphs can be reinterpreted as doing a form of data mining called formal concept analysis (see e.g. Choi for the connections between these two formalisms). A system of objects and attributes (say, photos in Flickr and their tags) can be represented as a bipartite graph, and "concepts" are bicliques (maximal complete bipartite subgraphs). The concepts form a lattice in which the join and meet operations both involve intersecting the vertex sets of bicliques on one side of the bipartition and then extending that intersection to a biclique on the other side.

Anyway, the result of my paper, rephrased in this language, was that if the object-attribute graph is sparse, then the total size of all the concepts in the concept lattice is linear in the number of objects and attributes, and the lattice can be generated in linear time. Or at least, the set of bicliques can be generated in that time; I didn't address the connections between bicliques in the concept lattice structure.
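To make the vocabulary concrete, here is a minimal sketch of concepts and the meet operation on a toy photo/tag system. It is a brute-force enumeration over all subsets of objects, not the linear-time method from the paper, and the data and function names (`photos`, `common_attrs`, `objects_having`, `concepts`, `meet`) are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy context: each photo (object) maps to its set of tags (attributes).
photos = {
    "p1": {"beach", "sunset"},
    "p2": {"beach"},
    "p3": {"sunset", "statue"},
}

def common_attrs(context, objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    attrs = set().union(*context.values())
    for o in objs:
        attrs &= context[o]
    return frozenset(attrs)

def objects_having(context, attrs):
    """Objects that carry every attribute in attrs."""
    return frozenset(o for o in context if attrs <= context[o])

def concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects.

    Exponential in the number of objects; only meant for tiny examples.
    """
    found = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attrs(context, subset)
            found.add((objects_having(context, intent), intent))
    return found

def meet(context, c1, c2):
    """Intersect the object sides, then extend to a biclique on the attribute side."""
    extent = c1[0] & c2[0]
    return (extent, common_attrs(context, extent))

beach = (frozenset({"p1", "p2"}), frozenset({"beach"}))
sunset = (frozenset({"p1", "p3"}), frozenset({"sunset"}))
print(len(concepts(photos)))        # 6 for this context, counting top and bottom
print(meet(photos, beach, sunset))  # the concept for photo p1 with both tags
```

The join is the mirror image: intersect the attribute sides and extend with `objects_having`.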

Which all makes it a little odd to see a paper by Lindig claiming that in systems with sparse object-attribute graphs, the size of the concept lattice empirically seems to grow quadratically. I think the resolution of this conflict is that the definitions of "sparse" are different: in my paper, a system is sparse if there's some absolute bound k such that any subsystem of N objects and attributes has at most kN relations. Equivalently (up to a constant factor in k), the system can be constructed from a (Barabási-like) growth process in which one adds objects and attributes one at a time, each new object or attribute connected to at most k earlier attributes or objects. In Lindig's, the systems are generated randomly with a small but fixed fill-in factor, so one can view them as a form of Erdős–Rényi random graph... I'm wondering whether his quadratic growth rate is less about sparsity and more about randomness.
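The growth-process version of sparsity can be checked directly: repeatedly peel off a vertex of smallest remaining degree, and the largest degree seen at removal time is the smallest k for which such a growth order exists (the graph's degeneracy). The sketch below assumes a toy photo/tag system and uses hypothetical names (`context_graph`, `degeneracy`); it is a simple quadratic-time version written for clarity, not speed.

```python
def context_graph(context):
    """Build an undirected bipartite adjacency map from object -> attribute sets.

    Vertices are tagged ("obj", name) or ("attr", name) so the two sides cannot collide.
    """
    adj = {}
    for obj, attrs in context.items():
        adj.setdefault(("obj", obj), set())
        for a in attrs:
            adj[("obj", obj)].add(("attr", a))
            adj.setdefault(("attr", a), set()).add(("obj", obj))
    return adj

def degeneracy(adj):
    """Return (k, order): the degeneracy k and a growth order in which every
    vertex is adjacent to at most k vertices that come earlier in the order."""
    remaining = {v: set(ns) for v, ns in adj.items()}
    k = 0
    removed = []
    while remaining:
        # Peel off a vertex of minimum remaining degree.
        v = min(remaining, key=lambda u: len(remaining[u]))
        k = max(k, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
        removed.append(v)
    # Reversing the removal order gives the insertion (growth) order.
    return k, removed[::-1]

# Hypothetical toy data, as an illustration only.
photos = {
    "p1": {"beach", "sunset"},
    "p2": {"beach"},
    "p3": {"sunset", "statue"},
}
graph = context_graph(photos)
k, order = degeneracy(graph)
print(k)  # 1: this toy graph is a path, so each new vertex needs only one earlier neighbor
```

A complete bipartite system with three objects and three attributes would give k = 3 instead, since every vertex there has degree 3 and nothing can be peeled off more cheaply.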

23 January 2006 @ 04:38 am
Apparently the easy way to increase the views for your photos on Flickr is to use "nude" as a keyword or tag. At least it's been working out that way for one of my photos, which doesn't have any other reason I can think of to be viewed ten times as often as the others. The fact that the nude in question is a bronze statue doesn't seem to be deterring the viewers...

31 December 2005 @ 03:38 pm

My father-in-law was discussing the other day how much less useful Google has become lately. He's an electrical engineer; it used to be that, if he wanted to know about some part, he could just google it and find a spec sheet for what he wanted. Nowadays his answer is instead buried among huge numbers of sites trying to sell him the part or the information on it, and not providing the actual information.

I encountered the same phenomenon just now, trying to find online galleries of posters like the one from which my new icon's art comes. I found the poster itself in a book, Off the Wall: Psychedelic Rock Posters From San Francisco, by Amélie Gastaut and Jean-Pierre Criqui, which I picked up in the Gallery Bookshop in Mendocino, and it was not hard to find scanned images of it from various poster shops on the web. But when I realized that there must be online galleries similar to the collection in the book, and went looking for them, I found the search more difficult than expected; the obvious keywords like "psychedelic poster art" just led to commercial sites.

Eventually I found some sites by entering more specific information, the name of the artist (Bonnie MacLean) of the poster in question. Apparently online art exhibits care more about that kind of information than online memorabilia salesmen do. For future reference, the two sites I found are Pooter's Psychadelic Shack, and Professor Poster.

Google has had some success splitting out different kinds of searches: Google blog search separate from its main web page search, etc. I wonder whether a similar more specialized non-commerce search would be helpful? The question is how to distinguish the commercial sites from the other ones. It isn't the .com address (both of the real sites I found had that), and it's not even the actual information content (many of the commercial sites have fine poster collections), but how it's organized and the intent behind the organization; it seems difficult to determine that intent automatically. Alternatively, one could imagine that people who make lists of links to interesting art sites are more likely to list the noncommercial ones; I wonder how well Kleinberg's hub-and-authority model does at picking such sites out, relative to Google's more naive PageRank algorithm?
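For anyone who wants to play with that last question, here is a rough sketch of both scoring schemes as plain power iterations on a made-up link graph. The site names, the damping factor of 0.85, and the iteration count are all assumptions for illustration, and the textbook PageRank here is certainly not what Google actually runs. The graph is contrived so that the two methods disagree (two shops that link only to each other, and link lists that mostly point at galleries), so it says nothing about how either behaves on the real web.

```python
def hits(links, iterations=50):
    """Kleinberg-style hub and authority scores by repeated averaging.

    links maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page is a good authority if good hubs point to it.
        auth = {n: sum(hub[s] for s in links if n in links[s]) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # A page is a good hub if it points to good authorities.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank; pages without out-links spread their rank evenly."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, ())
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical link graph: three link lists, two galleries, two shops.
links = {
    "list_a": ["gallery1", "gallery2"],
    "list_b": ["gallery1", "gallery2"],
    "list_c": ["gallery1", "shop1"],
    "shop1": ["shop2"],
    "shop2": ["shop1"],
}
hub, auth = hits(links)
rank = pagerank(links)
print(sorted(auth, key=auth.get, reverse=True))
print(sorted(rank, key=rank.get, reverse=True))
```

In this toy graph the galleries come out as the strongest authorities, while the two mutually linking shops accumulate the most PageRank.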

03 November 2005 @ 02:22 pm
The first story I've seen about successful use of Google Print, and it's a recreational geometry book. Eric Gjerde used Google Print to find Greg Frederickson's book Hinged Dissections: Swinging and Twisting. He writes: "I can honestly say I would never have found this book if it was not indexed in Google Print."

ETA: Metafilter thread on Google Print. Most of which seems to be comments of the form "this looks neat, why would you want to stop them from doing that?"

22 September 2005 @ 04:48 pm

Saw a talk today on how event-based indexing of all multimedia data ever recorded anywhere will save us from the hell that is text documents and searching for keywords on Google. I dunno, I kind of like searching for keywords on Google. I feel I know much more, am much smarter, when I work in combination with Google than on my own. And when I struggle with indexing multimedia data (say, my photos) it's the opposite problem from indexing everything: I want to be selective in what I index, throw away the unimportant photos so they don't distract from the good ones. The talk's claim that photographic data is more objective than text rankled a little as well, since I know from experience that my own photography tends to succeed when I have an emotional response to the subject, and fail otherwise. Even a fully automated webcam is set at a viewpoint which was chosen by a human for a reason, rather than achieving complete objectivity by being completely random and purposeless.

The thought of recording and accessing all human experience as video reminds me of the Borges story of the country where the mapmakers got so good at making maps that they made one at 1:1 scale. Or was it Lafferty? Probably both. Not much use as a map, anyway. And since I like doing Google keyword searches so much: the first Google search I tried finds a bunch of relevant stuff, the sixth hit from which is exactly the quote I wanted:

...In that Empire, the craft of Cartography attained such Perfection that the Map of a Single province covered the space of an entire City, and the Map of the Empire itself an entire Province. In the course of Time, these Extensive maps were found somehow wanting, and so the College of Cartographers evolved a Map of the Empire that was of the same Scale as the Empire and that coincided with it point for point. Less attentive to the Study of Cartography, succeeding Generations came to judge a map of such Magnitude cumbersome, and, not without Irreverence, they abandoned it to the Rigours of sun and Rain. In the western Deserts, tattered Fragments of the Map are still to be found, Sheltering an occasional Beast or beggar; in the whole Nation, no other relic is left of the Discipline of Geography.

Which reminds me, I also kind of like text documents. How could I read Borges without them?