0xDE (11011110) wrote,

Visualizing the space of linear threshold functions

A linear threshold function is a function that maps n-tuples of Boolean variables to a single Boolean variable. Such a function is defined by choosing a weight wi to each variable and a target weight W. We can then compute the function value by summing the weights of the true input variables, and letting the output be true if this sum is at least W. Geometrically, the 2n possible input tuples can be mapped to the vertices of a hypercube {-1,+1}n by making the coordinate value +1 whenever the corresponding input variable is true and -1 whenever it is false. The weights define a hyperplane in Rn, such that the hypercube vertices for which the function is true are on one side of the hyperplane and the hypercube vertices for which the function is false are on the other side. It appears to be open to determine exactly how many different threshold functions exist; OEIS lists the number of threshold functions only up to eight variables. If these numbers grow exponentially with the square of the dimension (as seems likely from the OEIS data) then this would lead to an exponentially small bound on the margin of linear classifiers for hypercubes, answering Leo Kontorovich's recent question about how small these margins can get.

Linear threshold functions on n variables are equivalent to symmetric linear threshold functions on n + 1 variables. A symmetric function is one defined by a hyperplane through the origin, or equivalently, a system of weights with W = 0, or a threshold function that true for exactly one of each opposite pair of hypercube vertices. To turn a symmetric function on n + 1 variables into an asymmetric one on fewer variables, set the last variable to false. In the other direction, given a linear threshold function defined by a separating hyperplane H in Rn, view Rn as passing through one of the facets of a hypercube in a space of one higher dimension, and form a central hyperplane that passes through H and the origin. Thus, counting symmetric threshold functions gives the same answers as counting asymmetric ones, but allows us to go to one more variable: nine variables instead of eight for the counts in OEIS. It makes sense to think of the system of weights defining a symmetric linear threshold function as the coordinates of a vector, in the same space that the hypercube is embedded into. Two different systems of weights give the same function value when evaluated at a pair of opposite hypercube vertices if and only if the two weight vectors are on the same side of a hyperplane that perpendicularly bisects the two vertices. So, if we group vectors into equivalence classes according to whether they produce the same function, these equivalence classes are just the cells of a hyperplane arrangement, formed by this collection of perpendicular bisectors. What does this arrangement look like?

To begin with, we can normalize the vectors so they lie on a surface of one lower dimension than the number of variables. For instance, weight systems (w1,w2,w3) defining three-variable symmetric threshold functions can be interpreted as points in three-dimensional space, but multiplying all the weights by the same positive scalar doesn't change the function they define. So we can normalize each vector by dividing by its magnitude, producing a unit vector that lies within the same cell of the arrangement. All such unit vectors form a sphere, and on this sphere, the hyperplanes bisecting each pair of hypercube vertices form an arrangement of four great circles, cutting the sphere up into a pattern of spherical quadrilaterals and spherical triangles, with the combinatorial structure of a cuboctahedron (left, below). However, one can also normalize in a different way, by dividing each vector by the largest absolute value of any coordinate, so that the normalized vectors lie on the surface of the hypercube itself. In this cubical normalization, the plane that bisects two opposite vertices of a cube cuts the surface of the cube in a regular hexagon, and these hexagons combine to cut each square face of the cube by a rotated square connecting the midpoints of the cube edges (right, below). Even more simply, if we view each square face of the cube as being subdivided into four smaller squares, each smaller square is cut diagonally by a single arrangement plane.

    

For four-dimensional symmetric linear threshold functions, something similar happens. A four-dimensional hypercube has eight cubical facets that fit together four to a vertex and three to an edge. Seven of these cubes are shown below; the eighth connects to the eight outer square faces of the complex formed by these seven.

If we normalize the vectors corresponding to four-dimensional weight systems to the surface of the hypercube, the partition of the weight vectors by what function they define corresponds to a partition of the hypercube surface as sliced by hyperplanes perpendicularly bisecting pairs of opposite hypercube vertices. Each hyperplane cuts the hypercube surface in a regular octahedron, and these octahedra crisscross each other within each cubical facet of the hypercube to form a stella octangula, a three-dimensional star formed by two interpenetrating tetrahedra. Even more simply, if we view each cubical facet of the hypercube as being subdivided into eight smaller cubes, each smaller cube is cut by four planes into a regular tetrahedron surrounded by four irregular tetrahedra. The regular tetrahedra form 64 cells of the overall arrangement, while the irregular tetrahedra join up to form 8 octahedra (at the center of each hypercube facet) and 32 bipyramids (surrounding each hypercube edge), so the arrangement has 64+8+32=104 cells corresponding to the 104 different four-dimensional symmetric linear threshold functions.

    

Curiously, if one takes this hyperplane arrangement, and adds to it the hyperplanes that subdivide each cubical facets into eight smaller cubes (as shown right, above, for a single cubical facet of the hypercube) the result is a simplicial arrangement. It has eight planes for the eight pairs of opposite hypercube vertices, and four more planes for the subdivision into smaller cubes. As a central arrangement in four dimensions it has 64+8*8+32*2=192 tetrahedral-cone cells, while as a simplicial arrangement in three-dimensional projective space it has half that, 96 tetrahedra.

What happens when we try to generalize all this to higher dimensions? I don't know.

Tags: arrangements, combinatorics, hypercube, stella octangula, threshold function
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded  

  • 6 comments