Motivation: Wading through and addressing the confusion of scholarly communication semantics.
Problem statement: Different names, same shtick. There are too many concepts in scholarly communication that are practically the same but go by different names. Or worse, concepts that are not the same but are still lumped together with the concepts they are related to. To me, this causes unnecessary vagueness and confusion. Since it is of the utmost importance that scholarly communication uses clear wording, this must be addressed.
Findings: In the scientific community, there seem to be real differences in meaning between terms that the official dictionaries define identically, such as archive and repository. Yet these differences may not be known to everyone involved in science, so people sometimes simply use one term (repository) for both, even where that term is not the more accurate one.
Conclusion: There is still plenty of semantic ambiguity regarding the archiving of research works.
“Semantics? Here we go again!”
Over at The RePEc blog, Christian Zimmermann blogged about the Webometrics Ranking of World Repositories. And the list is a nice reference, but it does not seem to differentiate between those ranked in terms of representing the concept of “repository”. For example: RePEc calls itself a decentralized database. Is that the same as an institutional repository? I doubt it. It certainly is not a central repository, like arXiv, which calls itself an (e-print) archive by the way. RePEc looks and sounds more like a collection of repositories under one name. Classifying it as a repository is therefore incorrect, just like how calling a house a room is incorrect.
With that out of the way, let us check out the others as well. E-LIS, #2 on the list, also calls itself an (OA) archive. The #4 in the rankings, University of Southampton’s e-Prints Soton, does call itself a (research) repository, so points for consistency! The #5 on the list, Academic Archive On-line, obviously calls itself an (academic) archive. #6 on the list, Citeseer, calls itself a library and a search engine. A search engine is definitely not a repository, but what about a library? I can see how a library can be a repository, but is a repository a library? #7 on the list is Ecole polytechnique fédérale de Lausanne, and I do not know what they call their Infoscience. They do not seem to have a problem with their place on this repository listing, though. SSRN, #37 on the list, calls itself a (research) network and an (e)library. Skipping a bit to #18, the Arts and Humanities Data Service, they call their product a “Data Service”. Moving down to #26, Ohio State University, they call their product a “Knowledge Bank”, which uses DSpace (and DSpace refers to itself as an open-source platform for accessing, managing, and preserving scholarly works).
Now, a repository and an archive are, according to my English dictionary, defined as the same thing, so archives are repositories and vice versa. But if they are the same thing, why would anyone prefer one term to the other? A top repository ranking where the majority of those listed do not refer to themselves as repositories (I counted 9 on that list with “repository” in their name) is a bit awkward, if you ask me. And I wonder whether the more significant differences in naming, such as “library”, “database”, “knowledge bank”, “service” and “network”, also reflect similar differences in what they offer in terms of functionality and data.
Thing is, I read the paper “Developing a model for e-prints and open access journal content in UK further and higher education” by Alma Swan and Paul Needham and here is an excerpt:

Although often used interchangeably, we use the term ‘institutional archive’ here in preference to ‘institutional repository’. This is in part because the term ‘archive’ is used in many official names (e.g. Institutional Archives Registry, Open Archives Initiative) and in part because it reflects an activity (authors ‘self-archive’ their work – they do not ‘self-reposit’). Most importantly, though, the use of the term repository is now generally coming to denote something more than an e-print archive; rather, an institutional collection of material that contains far more than e-prints, such as grey literature, institutional-specific digital collections and so on. Since the remit of our study was to develop a model for the delivery and management of e-print and open access journal content only, the term institutional archive is the most accurate and appropriate.

I suppose this reasoning explains the preference for the term “archive” rather than “repository” for arXiv and co. Perhaps this also explains why a ranking of “repositories” is a safer bet (it covers more concepts) than a ranking of “archives”. However, this brings me back to the point: is this difference between the terms “repository” and “archive” known to those ranked? And is that the reason why some of them choose to refer to themselves as an “archive” as opposed to a “repository”? Hmmmm, more food for thought, I guess. Maybe I should try to e-mail them. I would certainly like to know, because then at least I would no longer be confused about which term to use.

