Though the historical evolution of the diatonic scale is long and probably not perfectly understood, the scale itself has properties that let us derive it objectively as a very useful scale.

There is a property of some scales called "Myhill's Property" (http://en.wikipedia.org/wiki/Myhill's_property). Scales with this property are perceptually very friendly: they are easy to follow, think about, and sing. The reason is that every "generic interval" in them comes in exactly two sizes. A generic interval is one like "second" or "third" that describes only how many steps of the scale it spans, not its exact size. The diatonic scale has Myhill's Property because every generic interval comes in two types: there are major and minor seconds, major and minor thirds, and so on. (Fourths and fifths use different names, perfect or augmented fourth and perfect or diminished fifth, but each still comes in just two sizes in the diatonic scale.)
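To make "two sizes for every generic interval" concrete, here is a minimal Python sketch that measures each generic interval of C major. It assumes 12-TET and writes notes as semitones above C; the names `C_MAJOR`, `specific_sizes`, and `has_myhill_property` are hypothetical, chosen only for illustration.

```python
# C major in 12-TET, as semitones above C (an assumption for illustration).
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]
PERIOD = 12  # the octave, in semitones

def specific_sizes(scale, generic, period=PERIOD):
    """Return the set of distinct sizes (in semitones) of one generic interval.

    `generic` counts scale steps: 1 = second, 2 = third, 3 = fourth, ...
    """
    n = len(scale)
    sizes = set()
    for i in range(n):
        j = i + generic
        # Past the top of the scale, wrap around and add whole periods.
        upper = scale[j % n] + period * (j // n)
        sizes.add(upper - scale[i])
    return sizes

def has_myhill_property(scale, period=PERIOD):
    """True if every generic interval (second through seventh) comes in exactly two sizes."""
    return all(len(specific_sizes(scale, k, period)) == 2 for k in range(1, len(scale)))

names = ["second", "third", "fourth", "fifth", "sixth", "seventh"]
for k, name in enumerate(names, start=1):
    print(name, sorted(specific_sizes(C_MAJOR, k)))
print(has_myhill_property(C_MAJOR))
```

Each generic interval prints two neighbouring sizes, e.g. [1, 2] for seconds (minor and major) and [3, 4] for thirds. A scale such as harmonic minor, [0, 2, 3, 5, 7, 8, 11], fails the check because its seconds come in three sizes.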

The only way to arrive at scales with Myhill's Property is by using two intervals, a "generator" and a "period". One stacks the generator on top of itself repeatedly and then reduces each newly arrived note in the stack back into a single period.
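The generator-and-period recipe can be sketched in a few lines of Python. This sketch assumes pure 3:2 fifths (Pythagorean tuning) measured in cents, and the helper names `generate_scale` and `step_sizes` are hypothetical. It looks only at the seconds, which is enough to show that the number of stacked notes matters: not every count produces a scale with two step sizes.

```python
import math

OCTAVE = 1200.0                  # period, in cents
FIFTH = 1200 * math.log2(3 / 2)  # generator: a pure 3:2 fifth, about 701.955 cents

def generate_scale(generator, period, count):
    """Stack `generator` on itself to get `count` notes, reduce each into [0, period), and sort."""
    return sorted((k * generator) % period for k in range(count))

def step_sizes(scale, period):
    """Distinct step sizes (seconds), rounded to 0.1 cent to hide floating-point noise."""
    # Pair each note with the next one; the last note pairs with the first note plus one period.
    steps = [b - a for a, b in zip(scale, scale[1:] + [scale[0] + period])]
    return sorted({round(s, 1) for s in steps})

for count in (5, 6, 7, 8):
    print(count, step_sizes(generate_scale(FIFTH, OCTAVE, count), OCTAVE))
```

Under these assumptions, 5 and 7 notes give two step sizes (for 7 notes, about 90.2 and 203.9 cents), while 6 and 8 notes give three, so those scales lack Myhill's Property.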

For the diatonic scale (let's say, C major), the period is the octave (a frequency ratio of approximately 2:1) and the generator is the fifth (approximately 3:2). Stacking six fifths on top of one another, which gives seven notes in all, produces:

F-C-G-D-A-E-B

But these notes span several octaves, so to make a scale out of them we must reduce each one back into the original octave, lowering it by as many octaves as needed. Doing so arrives at this sequence:

F-G-A-B-C-D-E (or any rotation thereof, such as C-D-E-F-G-A-B)
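The stacking and octave reduction can also be done with exact frequency ratios. This sketch assumes pure 3:2 fifths and uses Python's standard `fractions.Fraction` for exact arithmetic; `reduce_to_octave`, `NAMES`, and the other names are hypothetical helpers. It stacks six fifths above F, reduces each note into one octave, and sorts the result.

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)   # assumed pure fifth
OCTAVE = Fraction(2)
NAMES = ["F", "C", "G", "D", "A", "E", "B"]  # order in the stack of fifths

def reduce_to_octave(ratio):
    """Divide (or multiply) by 2 until the ratio lies in [1, 2)."""
    while ratio >= OCTAVE:
        ratio /= OCTAVE
    while ratio < 1:
        ratio *= OCTAVE
    return ratio

# Ratios above F for F, C, G, D, A, E, B: 1, 3/2, (3/2)^2, ... (3/2)^6.
stack = [FIFTH ** k for k in range(len(NAMES))]

# Reduce each into one octave, then sort by pitch (ratio first, name second).
scale = sorted(zip((reduce_to_octave(r) for r in stack), NAMES))
for ratio, name in scale:
    print(name, ratio)
```

It prints F 1, G 9/8, A 81/64, B 729/512, C 3/2, D 27/16, E 243/128, which is the F-G-A-B-C-D-E order above, with each ratio measured from F.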

This set of notes has Myhill's Property as described above, which makes it perceptually friendly. It is also very simple: it has relatively few notes before it repeats at the octave, which is perceptually helpful. We usually need somewhere between 5 and 10 notes in a scale for it to seem functional.

Finally, it also has very low "error", that is, a small difference between the actual sizes of its intervals and the frequency ratios we say they represent. We say that C to E is approximately 5/4 (which is actually 386 cents*), that C to Eb is approximately 6/5 (which is actually 316 cents), and that C to G is approximately 3/2 (which is actually 702 cents). Though the diatonic scale never approximates all of those intervals perfectly (in 12-TET, for example, they are 400, 300, and 700 cents), it gets pretty close, and so it is low error.
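The cent values above come from the formula cents = 1200 * log2(ratio). This short Python sketch computes them and, assuming 12-TET as the tuning, compares each just ratio with its tempered size; the `INTERVALS` table and the `cents` helper are illustrative.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

# (interval, just ratio it is taken to represent, its size in 12-TET in cents)
INTERVALS = [
    ("C-E (major third)", 5 / 4, 400),
    ("C-Eb (minor third)", 6 / 5, 300),
    ("C-G (perfect fifth)", 3 / 2, 700),
]

for name, ratio, tet in INTERVALS:
    just = cents(ratio)
    # Positive error: the tempered interval is wider than the just ratio.
    print(f"{name}: just {just:.1f} c, 12-TET {tet} c, error {tet - just:+.1f} c")
```

Under that assumption the errors come out to roughly +14 cents for the major third, -16 for the minor third, and -2 for the fifth.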

*Cents: http://en.wikipedia.org/wiki/Cent_(music)

(How we decide which frequency ratio each interval represents has to do with what's called the "mapping", but that is a bit much to get into here. I'll leave it at this for now; let me know if I haven't explained anything well.)

The chromatic scale can ALSO be reached by stacking fifths on top of one another (eleven fifths, giving twelve notes), and it ALSO has Myhill's Property. (The chromatic scale actually has TWO sizes of step, the minor second AND the augmented unison. The fact that they are the same size is a consequence of our use of 12-TET, or 12-edo.) It is also just as *accurate* as the diatonic scale. The one thing it lacks that the diatonic scale has is *simplicity*: once you get to 12 notes in an octave, the scale is much harder to follow. So the diatonic scale tends to work as the fundamental scale, operating within a (more complicated) chromatic gamut.
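The two sizes of chromatic step show up when twelve notes are stacked a fifth apart. In this sketch, which assumes pure 3:2 fifths for the untempered case, the chain splits into steps of about 90.2 cents (the minor second, e.g. E-F) and 113.7 cents (the augmented unison, e.g. F-F#); with 12-TET's 700-cent fifth, both collapse to 100 cents. The helper names `chain_of_fifths` and `step_sizes` are hypothetical.

```python
import math

def chain_of_fifths(fifth_cents, count, period=1200.0):
    """Stack `count` notes a fifth apart, reduce each into one octave, and sort."""
    return sorted((k * fifth_cents) % period for k in range(count))

def step_sizes(scale, period=1200.0):
    """Distinct step sizes, rounded to 0.1 cent to hide floating-point noise."""
    steps = [b - a for a, b in zip(scale, scale[1:] + [scale[0] + period])]
    return sorted({round(s, 1) for s in steps})

pure_fifth = 1200 * math.log2(3 / 2)  # about 701.955 cents
print("pure fifths:  ", step_sizes(chain_of_fifths(pure_fifth, 12)))
print("12-TET fifths:", step_sizes(chain_of_fifths(700.0, 12)))
```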

Does any of that make sense?