One of the largest obstacles to creating effective data-mining software is the scarcity of good data sets. If good data sets were easily available on the web, I think more people would come up with innovative data-mining applications. Some data sets that I would like to see:
I can imagine several practical problems with such a data source. The first is copyright: ensuring that the site had the right to distribute so much data would be a daunting task. Perhaps responsibility for copyright could rest with the submitter?
The format of the data would also be an interesting problem. To be worthwhile, the data would likely have to be constrained to a known set of well-documented formats. Converting existing data to an acceptable format, and verifying that each file actually conforms to it, would require a fairly large effort.
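As a rough sketch of what the verification step might look like, suppose the site accepted CSV files and each data set came with a simple schema mapping column names to types. The schema, the column names, and the `validate_csv` function below are all hypothetical, made up purely for illustration; a real repository would need something far more thorough.

```python
import csv
import io

# Hypothetical schema: column name -> parser that raises ValueError on bad input.
SCHEMA = {"id": int, "temperature_c": float, "station": str}


def validate_csv(text, schema):
    """Return a list of human-readable problems; an empty list means the file passed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))

    # The header must list exactly the schema's columns, in the same order.
    if reader.fieldnames != list(schema):
        return [f"expected columns {list(schema)}, found {reader.fieldnames}"]

    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, parse in schema.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: bad {column} value {row[column]!r}")
    return problems


if __name__ == "__main__":
    sample = "id,temperature_c,station\n1,21.5,north\n2,warm,south\n"
    for problem in validate_csv(sample, SCHEMA):
        print(problem)
```

A check like this only confirms that a file is well-formed. It says nothing about whether the values are accurate or what they mean.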
Documenting the meaning of the data contained in the files would be another major challenge. If the repository contained a lot of data, but nobody knew what it meant, it would be worthless.
Finally, the site's success would be part of its problem. Transferring large data sets over the internet would create a hell of a bandwidth bill. Finding a way to pay that bill, whether through donations, advertising, or perhaps a fee for use of the site, would be crucial to its survival.
The value of such a website is, in my mind, undeniable. If you could build a community of users around it, I believe that novel applications of data-mining techniques would inevitably arise. While the administrative problems would be significant, the success of large open sites such as SourceForge and Wikipedia leads me to believe that it's a feasible project.