Ideally, Earth would find out the real disk space usage of each file in addition to its file size.
Depending on the file system, the disk space occupied by a file can differ substantially from its file size (its length in bytes). For instance, a single-byte file (file size 1) occupies at least one block on many file systems, which - depending on block size - can translate to several kilobytes of real disk usage.
It might be worthwhile investigating the source code of the "du" tool from the GNU core utilities (http://www.gnu.org/software/coreutils/coreutils.html) to find a good way of doing this. For example, "echo x > /tmp/foo; du -sh /tmp/foo" reports 4K on my workstation, although the file itself is only two bytes long (the "x" plus a newline).
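As a rough illustration of the idea (not Earth's code, and not necessarily how "du" does it internally), the allocated size can be read from the st_blocks field that stat() returns on Unix-like systems. The sketch below is in Python; the function name file_sizes and the constant BLOCK_UNIT are made up for this example, and it assumes st_blocks is counted in 512-byte units, which holds on Linux and most Unix systems but is not guaranteed everywhere. st_blocks is not available on Windows.

```python
import os

# Assumption: st_blocks is reported in 512-byte units. This is the case on
# Linux and most Unix systems, but it is not guaranteed on every platform.
BLOCK_UNIT = 512


def file_sizes(path):
    """Return (file_size, disk_usage) in bytes for a single path.

    file_size  is the length of the file in bytes (st_size).
    disk_usage is the space actually allocated on disk (st_blocks * 512).
    Symbolic links are not followed, so a link is measured itself.
    """
    st = os.lstat(path)
    return st.st_size, st.st_blocks * BLOCK_UNIT


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        size, usage = file_sizes(name)
        print(f"{name}: file size {size} bytes, disk usage {usage} bytes")
```

Run against /tmp/foo from the example above, this should show a file size of 2 bytes next to a disk usage that matches what "du" reports. Note that the disk usage can also come out smaller than the file size, for example for sparse files.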