Gell-Mann Amnesia Effect

The term "Gell-Mann Amnesia effect" was coined by the late Michael Crichton in a talk entitled Why Speculate, given to the International Leadership Forum, La Jolla, in 2002. Below is an excerpt from that talk:

Media carries with it a credibility that is totally undeserved. You have all experienced this, in what I call the Murray Gell-Mann Amnesia effect. (I call it by this name because I once discussed it with Murray Gell-Mann, and by dropping a famous name I imply greater importance to myself, and to the effect, than it would otherwise have.)

Briefly stated, the Gell-Mann Amnesia effect works as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward - reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.

In any case, you read with exasperation or amusement the multiple errors in a story - and then turn the page to national or international affairs, and read with renewed interest as if the rest of the newspaper was somehow more accurate about far-off Palestine than it was about the story you just read. You turn the page, and forget what you know.

That is the Gell-Mann Amnesia effect. I'd point out it does not operate in other arenas of life. In ordinary life, if somebody consistently exaggerates or lies to you, you soon discount everything they say. In court, there is the legal doctrine of falsus in uno, falsus in omnibus, which means untruthful in one part, untruthful in all.

I think this has become worse in recent years. Much of the mainstream press and TV news seems to dwell in the realm of speculation rather than in dry, objective reportage. The important lesson is, frankly, to doubt everything you read in the news unless you have reason to trust the source. This is exhausting, and makes the whole business of actually reading "speculative" news reporting somewhat pointless.

As Crichton said, introducing the transcript of the talk on his website:

In recent years, media has increasingly turned away from reporting what has happened to focus on speculation about what may happen in the future. Paying attention to modern media is thus a waste of time.

In recent months I have successfully weaned myself off daily news consumption. I pick up bits and pieces here and there, but I no longer intentionally go to news sources. At the weekend, I catch up with digests from a few trusted sources. I do not think this has significantly impaired my awareness of current affairs, while it has certainly saved me from wasting a lot of time!

Folder Indexes With Obsidian and Dataview

With the excellent Dataview plugin for Obsidian installed, inserting the snippet below into a note will create a table listing:

  • all notes in the same folder as the note, and in all sub-folders, recursively
  • all notes which link to the note
  • all notes linked to by the note

Particularly when used in a "folder note" (a note which serves as the key note in any given folder), this is a simple way to create a kind of "section index" for that part of the folder hierarchy.

```dataview
TABLE rows.file.link AS Pages
WHERE
	(
		contains(file.folder, this.file.folder)
		OR contains(file.inlinks, this.file.link)
		OR contains(file.outlinks, this.file.link)
	)
	AND file.path != this.file.path
GROUP BY file.folder AS Folder
```

Open and Engaged Conference 2023

I attended the British Library's annual Open and Engaged conference on 2023-10-30, held in their conference centre in St Pancras, London. At the time, the British Library had just discovered that it had been subjected to a cyber attack (this is ongoing at the time of writing). Despite the ensuing disruption - staff unable to access their email or documents, and the BL's internet access offline - they managed the remarkable feat of hosting the event with little evidence of the chaos in the background. I found the day interesting, and made the following notes from the various speakers' presentations.

Keynote from Monica Westin

Monica (Internet Archive) gave an entertaining talk on new ownership models for cultural heritage institutions. From this I learned about two interesting initiatives:

Internet Archive Scholar

This was conceived as an archiving solution, but then evolved to become a search service.

This fulltext search index includes over 25 million research articles and other scholarly documents preserved in the Internet Archive. The collection spans from digitized copies of eighteenth century journals through the latest Open Access conference proceedings and pre-prints crawled from the World Wide Web. https://scholar.archive.org

OJS - Beacon

This is a function in the open-source OJS platform to report usage statistics back to a central collection, to aid in ongoing product design and marketing.

More than 8 million items have been published with Open Journal Systems, our open-source publishing software trusted by more than a million scholars in almost every country on the planet. (Global Usage of OJS - Public Knowledge Project)

Mia Ridge, Living with Machines

Mia (British Library) described the Living with Machines project.

Living with Machines is both a research project, and a bold proposal for a new research paradigm. In this ground-breaking partnership between The Alan Turing Institute, the British Library, and the Universities of Cambridge, East Anglia, Exeter, and London (QMUL, King’s College), historians, data scientists, geographers, computational linguists, and curators have been brought together to examine the human impact of industrial revolution. https://livingwithmachines.ac.uk/about/

Douglas McCarthy, This is not IP I'm familiar with

Douglas (Delft University of Technology) talked about "the strange afterlife and untapped potential of public domain content in GLAM institutions". This was an excellent talk on different perceptions of copyright in the GLAM sector. There was a startling contrast between UK and non-UK institutions in their respective treatment of digital surrogates of The Rake’s Progress, with UK institutions largely avoiding public domain licensing and claiming copyright instead. True to his topic, Douglas has made his slides available here and they are well worth viewing. I particularly liked his use of the "Drake meme" and have been using it in my own work - most recently presenting to a workshop in Nigeria.

Emma Karoune, The Turing Way

Emma (The Alan Turing Institute) spoke about Community-led Resources for Open Research and Data Science. Her team has assembled a rich set of resources to support data-science communities.

The Turing Way project is open source, open collaboration, and community-driven. We involve and support a diverse community of contributors to make data science accessible, comprehensible and effective for everyone. Our goal is to provide all the information that researchers and data scientists in academia, industry and the public sector need to ensure that the projects they work on are easy to reproduce and reuse. Welcome — The Turing Way

I have made a note to examine more closely the Community Handbook they have produced - not only for its content, but also for the way in which they have produced it.

Iryna Kuchma, Collective action for driving open science agenda in Africa and Europe

Iryna (EIFL) spoke remotely on this LIBSENSE initiative.

EIFL, WACREN and AJOL will collaborate on a new three-year project to support no-fee open access (OA) publishing in Africa (diamond OA) that launches in November 2023 to empower African diamond OA community of practice and offer cost-efficient, open, public, shared publishing infrastructures. https://libsense.ren.africa/wp-content/uploads/2023/08/LIBSENSE-Collaboration-for-sustainable-open-access-publishing-in-Africa.pptx.pdf

I was interested in this because I have been doing some work with LIBSENSE and am developing an awareness of open-science in Africa more generally.

Presentation on Notify project to the Samvera Virtual Connect meeting

This was a short, relatively technical introduction to the COAR Notify project for the Samvera Virtual Connect meeting.

Download presentation as PDF

Five Prerequisites for a Sustainable Knowledge Commons

COAR Infographic

I very much like this infographic from COAR. I've been working with COAR on the Next Generation Repositories Working Group, and we have been gradually building a picture of a technological future for repository systems. As this work has progressed over the last year or so, it has become clear that there is an opportunity to describe a sustainable knowledge commons. While the Next Generation Repositories group is assembling a picture of the technical components and protocols which can make this work, this infographic covers some other, non-technical aspects which will also be required.

I recommend taking a look at the document from which I have taken this image - it adds some useful context.

My New Venture

Today is my final day at EDINA. Rather than stepping into a new role in another institution, I'm taking a bit of a leap into the unknown. I have started my own consultancy business, Antleaf, a vehicle which allows me to take on new, challenging and rewarding work.

I'm pleased to say that, through Antleaf, I have a contract to act as the Managing Director of the Dublin Core Metadata Initiative (DCMI), and I'm negotiating a contract with an institute in Japan to help with an exciting development there, so Antleaf seems to be off to a good start!

If you need the help of an information professional with both development and management experience, please do get in touch.

It feels like a fresh start for me, which is always an invigorating - if slightly nerve-wracking - feeling!

Leaving EDINA

After four good years, I am moving on from EDINA. My last day there will be the 16th of October.

I have very much enjoyed my time at EDINA, which has allowed me to work with some very smart people, on some great services and projects. For many years EDINA has made a valuable contribution to the fabric of teaching, learning and research in universities in the UK and I am grateful for having had the chance to be a part of that, working in such areas as scholarly communications, digital preservation, mobile development and citizen science, metadata management, open-access repositories and more. I'd like to thank my colleagues at EDINA for their enthusiasm and support, and for the free exchange of ideas and knowledge which has been a foundation of the culture there.

I would particularly like to thank Peter Burnhill, founder and recently retired director of EDINA. It was Peter who brought me into EDINA, and I have benefited enormously from being able to test ideas against his wisdom and insight.

EDINA is in the process of finding a new direction, and I wish my colleagues there the very best of luck for the future.

As EDINA transitions, it feels like a good time for me to do likewise. I have decided to start my own consultancy, something I have been considering for some time. I am excited (and maybe a little nervous!) about it, but this is already shaping up to give me the freedom to do worthwhile and interesting work. More on this in another post soon!

Melissa Terras keynote, BL Labs Symposium, 2016

These are some rough notes from what I thought was an interesting keynote from Melissa Terras, Director of the UCL Centre for Digital Humanities, at this year's BL Labs Symposium.

Melissa has a blog: Adventures in Digital Cultural Heritage and a recommended book: Defining Digital Humanities

Melissa started by asserting that reuse of digital cultural heritage data is still rare, and that preservation of such data is problematic. Of the content digitised under the National Lottery Fund's New Opportunities programme around the turn of the millennium, roughly 60% is no longer available now.

However, a number of changes - referred to collectively under the unofficial label #openglam - have converged to give hope that the situation may be improving:

  • funders now frequently mandate that research data will be made available for long periods - up to 10 years.
  • licensing is greatly simplified with the growing adoption of Creative Commons licences
  • technical frameworks, which address the challenge of making such data available for others to use, are becoming available
  • projects are more willing to address these issues

Melissa then went on to describe how UCL has been working with the British Library's archive of digitised 19th century books. These books, numbering 65,000, were digitised by Microsoft and then handed back to the BL in 2012 under a CC0 license.

The data generated by the digitisation of these books, and the subsequent OCR output, comprises about 224GB of text data in ALTO XML format. This is too much data to make available over the network - and it is this fact which creates the need for better infrastructure services to allow researchers to work with the data.

The UCL Centre for Digital Humanities engages with science faculties as well as humanities faculties. Any member of UCL staff can access what is effectively 'unlimited' local compute power. What has become apparent is that this local infrastructure is typically optimised for science with the following characteristics:

  • one large dataset
  • one or two complex queries
  • single output (the answer), often a visualisation

whereas the requirements for a researcher wanting to work with the digitised books data are more like this:

  • to work with 65,000 individual datasets
  • to make one simple query
  • to generate multiple outputs - e.g. hundreds of pages, which the researcher will take away and process further

UCL has therefore been designing computational platforms which allow users to filter the 65,000 books and find, for example, 300 books about some subject and then to download this data to process on a laptop. This project has also been good for computer science students who have been invited to design platforms to solve these kinds of problems.

Melissa suggested that there was a small number of very common query 'types':

  • searches for all variants of a word
  • searches that return keywords in context traced over time
  • "NOT" searches - for a word or phrase while ignoring another word or phrase
  • searches for a word when in close proximity to a second word
  • searches based on image metadata

... all returned in a derived dataset, in context

Melissa proposed that these would give 90% of what researchers working with a collection like the BL's 19th-century digitised books would want. Furthermore, librarians are quite capable of applying these basic recipes as a service for researchers, and they can build on them to offer more sophisticated searches.
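
As a purely illustrative aside (mine, not from the talk), here is a minimal sketch of one such recipe - a "keyword in context" search over a folder of plain-text files - returning each hit as part of a small derived dataset. The directory name and the search term are invented for the example:

```python
from pathlib import Path

def keyword_in_context(corpus_dir, term, window=8):
    """Yield (filename, snippet) for each occurrence of `term`,
    with `window` words of context on either side."""
    term = term.lower()
    for path in Path(corpus_dir).glob("*.txt"):
        words = path.read_text(encoding="utf-8").split()
        for i, word in enumerate(words):
            if word.strip(".,;:!?\"'()").lower() == term:
                snippet = " ".join(words[max(0, i - window): i + window + 1])
                yield path.name, snippet

# Build a small derived dataset of hits which a researcher could take away
hits = list(keyword_in_context("bl_19c_books_text", "locomotive"))
```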

Melissa identified the following best practices:

  1. support derived datasets - people want to take a subset of the data away to process further
  2. document decisions - researchers need to know about the dataset - the decisions about how it was generated, provenance, how their query is working etc.
  3. offer fixed/defined datasets (has the data changed since the query was run?)
  4. support normalisations - e.g. if you find more mentions of your query term in later books, it might simply be because there are more books in the collection from those years (see the sketch below)
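
To make the normalisation point concrete, here is a minimal sketch (my own illustration, with invented figures rather than data from the talk) of dividing raw hit counts by the number of books in the collection for each year:

```python
def normalised_frequency(hits_per_year, books_per_year):
    """Return hits per book for each year, so that growth in the collection
    itself does not masquerade as growth in the use of a term."""
    return {
        year: hits_per_year[year] / books_per_year[year]
        for year in hits_per_year
        if books_per_year.get(year)
    }

# Toy figures: raw hits rise over time, but so does the size of the collection
hits_per_year = {1840: 12, 1860: 30, 1880: 66}
books_per_year = {1840: 400, 1860: 1000, 1880: 2200}
print(normalised_frequency(hits_per_year, books_per_year))
# {1840: 0.03, 1860: 0.03, 1880: 0.03} - no real increase once normalised
```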

Cooperative Open Access eXchange (COAX)

(This is the second of two posts forming my contribution to Open Access Week 2015.)

The following proposal was written as a thought-experiment to test, in a recognisable problem-space, the idea outlined in my previous post, The Active Repository Pattern. I was able to call on the advice of colleagues at EDINA who have world-class expertise in the area of 'routing' open-access metadata and content.

The United Kingdom Council of Research Repositories (UKCoRR) was invited to comment, and many members of that organisation provided some very valuable feedback, for which I am very grateful! It is fair to say that this idea has generated a degree of interest from that community.

This is the first public airing of the proposal, with the UKCoRR feedback incorporated.

Cooperative Open Access eXchange (COAX) - a proposal

Elevator pitch

According to estimates by colleagues at EDINA, based on data gathered by the Repository Junction Broker (also formerly known as the Jisc Publications Router), the mean number of authors for a scholarly article is 6.3. This means that, when the corresponding author does the right thing and deposits the manuscript in their local institutional repository, there are, on average and potentially, 5 other repositories which might be interested in getting a copy of that manuscript. What if institutional repositories could participate in a cooperative mechanism, sharing manuscripts between them according to need?

The primary ‘user-story’

As a repository manager, I would like to receive, or at least be notified of, a paper deposited in another institutional repository where that paper has one or more authors affiliated to my institution, so that I can more easily acquire papers of relevance to my repository (and, in so doing, satisfy my funder's requirements).

Basic assumptions

  1. A journal article is often written by more than one author, from more than one institution: that should be the default use case for repositories.
  2. The author(s)’ accepted manuscript (AAM) is held by the publisher and by the corresponding-author. It may also be held by another of the listed authors.
  3. Any one of the listed authors may deposit the AAM in their respective institutional repository.
  4. There are internationally-accepted standard identifiers for many of the objects and parties within the field of scholarly communication, such as DOI, ISSN, ISNI (ORCID) etc. Although these are not always present in the metadata, it is desirable that they should be.
  5. The primary user-story is best served by an ‘event-driven’ approach, where repositories send immediate notification that they have had a new paper deposited into them, rather than by a periodic ‘harvesting’ approach, where some system enquires of all repositories about recent deposits.
  6. Widespread use of a single, well-defined metadata application profile will mitigate some aspects of this problem space. Examples include RIOXX, which is in the process of being implemented across UK institutional repositories and is gaining interest beyond the UK, and OpenAIRE, which is already established in parts of Europe.

Working principles and constraints

  1. This document is concerned with a narrow, focussed problem-space. This is essentially defined in the primary user-story, above. Reference is made to aspects of this which have wider relevance or application, but these are not explored in detail here.
  2. The document attempts to articulate and design the Simplest Solution that could Possibly Work (outlined later in The Minimum Viable Product section). The approach leans towards a light-weight solution, with minimal centralised infrastructure.
  3. Initially, other related solutions, projects or initiatives are ignored until the problem and a possible, logical approach to solving it, have been clearly articulated.

Terminology and component parts

The following actors, assets and events appear in the use-cases described below - although not all in every use-case. The bold, short name in square brackets is the label used in the descriptions and diagrams for each use-case.

  • institutional repository, which may be one of:
    • the repository which already has the paper [Origin IR]
    • the repository which may be interested in receiving notification about and (perhaps) a copy of the paper [Target IR]
  • institutional repository manager [IRM]
  • manuscript, usually a PDF [MS]
  • metadata record describing the MS [Record]
  • system for handling ‘notifications’ to/from repositories [Hub]
  • system for matching manuscripts to appropriate repositories [Matcher]
  • database of institutional repositories with contact details [IR DB]
  • database of metadata records [Record DB]
  • a text-mining function for extracting metadata from manuscripts [Text-miner]

More detailed descriptions of some of these components can be found in a later section.

The Minimum Viable Product (use-case 1)

Use-case 1 describes the ‘Minimum Viable Product’ (MVP) which can support the primary user-story.

This MVP establishes some components and data-flows. These not only set the ground for a minimal system which can begin to satisfy the primary user-story, but also provide the foundations for future development and improvement.

Description

When an Origin IR has an MS (together with some metadata) deposited [1] into it, the Origin IR automatically notifies [2] the Hub, sending it the metadata Record and the MS. The Hub calls [3] the Matcher, which invokes [4] the Text-miner which mines the MS for extra metadata. The Matcher then enhances the Record and stores [5] it in the Record DB before identifying which (if any) IRM(s) to notify. The Matcher then instructs [6] the Hub to send [7] an email notification, containing the Record, to the IRM(s) it believes might be interested in that Record.

Notes
  • the Matcher references the Record DB which records what has been sent and where, so that it knows not to send the same Record more than once to a given IRM
  • the MS itself is not transmitted, but the Record should contain the URL from which it can be directly downloaded by the interested IRM
  • the MS is not kept or made available to Target IRs under any circumstances.
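
To make the MVP data-flow easier to follow, here is a minimal sketch of the steps described above. It is entirely illustrative: the class names, the metadata fields and the email text are assumptions made for this sketch, not part of any existing system or API.

```python
# Illustrative walk-through of MVP steps [2]-[7]; every name here is hypothetical.

def text_mine(manuscript_text):
    """[4] Stand-in for the Text-miner: extract extra metadata from the MS."""
    return {"mentioned_institutions": ["University A", "University B"]}

class Matcher:
    def __init__(self, record_db, ir_db):
        self.record_db = record_db  # Record DB: Records received so far
        self.ir_db = ir_db          # IR DB: institution name -> repository manager email

    def handle(self, record, manuscript_text):
        record = {**record, **text_mine(manuscript_text)}  # enhance the Record
        self.record_db.append(record)                      # [5] store it
        return [self.ir_db[inst]                           # choose IRM(s) to notify
                for inst in record["mentioned_institutions"] if inst in self.ir_db]

class Hub:
    def __init__(self, matcher):
        self.matcher = matcher

    def notify(self, record, manuscript_text):
        """[2] Called by the Origin IR; [3] calls the Matcher; [6]/[7] sends emails."""
        for irm_email in self.matcher.handle(record, manuscript_text):
            print(f"email to {irm_email}: new record {record['title']!r} "
                  f"available from {record['ms_url']}")

hub = Hub(Matcher(record_db=[], ir_db={"University B": "repo-manager@b.example.ac.uk"}))
hub.notify({"title": "An example paper",
            "ms_url": "https://origin-ir.example.org/eprint/123.pdf"},
           manuscript_text="...full text of the AAM...")
```
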
Challenges

The primary challenge to be overcome is in deciding which Target IRs might be interested in a given MS. There is more than one possible approach to ascertaining this, and the COAX strategy is to employ several such approaches, including:

  • assertion in metadata records coming from Origin IRs, where other institutions are directly referenced
  • text-mining manuscripts for references to institutions
  • finding indirect associations with institutions through author affiliations
    • exploiting ORCIDs
    • extrapolation based on comparing records in bulk (e.g. these authors often write together)

This use-case depends upon the possibility of extracting some usable metadata from the manuscript itself. However, there is no substantial extra work or cost for the Origin IR to despatch the manuscript along with the metadata record, so we include this in the MVP.

It is hoped that widespread adoption of ORCIDs may create opportunities for matching authors to institutions (and hence repositories). However, it is not yet certain that the use of ORCIDs will become ubiquitous (let alone reliable) and, in any case, it will be some time before such levels of mainstream use can be achieved. Moreover, as the ORCID system matures, it will bring with it a growing issue of ‘false positives’ in this use-case, as each author’s number of affiliated institutions becomes potentially larger.

Another significant challenge is to avoid sending unnecessary notifications to Target IRs. If a manuscript is deposited in more than one participating Origin IR, then it would be beneficial if the system could recognise that it has already received information and sent notification about this manuscript, and avoid sending notification again. Such matching will be inexact, since most metadata records associated with a deposited AAM do not, at the time of deposit, contain a global identifier (typically a DOI). However, there may be strategies which can be used to match records to an acceptable level of accuracy without such global identifiers in place.

Other use-cases

These other use-cases are offered for comparison.

Use-case 2: simple email notification triggered by initial deposit with manuscript as attachment

This use-case builds on use-case 1, with an additional step of fetching the MS from the originating IR and adding it to the email as an attachment for convenience.

Description

When an Origin IR has an MS (together with some metadata) deposited [1] into it, the Origin IR automatically notifies [2] the Hub, sending it the metadata Record and the MS. The Hub calls [3] the Matcher, which invokes [4] the Text-miner which mines the MS for extra metadata. The Matcher then enhances the Record and stores [5] it in the Record DB before identifying which (if any) IRM(s) to notify. The Matcher then instructs [6] the Hub to send [7] an email notification, containing the Record and the MS (as email attachment), to those IRM(s) it believes might be interested in that Record.

Notes on use-case 2
  • builds on use-case 1, with an additional step of fetching and sending the manuscript.
Challenges

AAMs are not always simple PDFs - they can be in various formats, and may comprise multiple files. Furthermore, AAMs may be under embargo and therefore not openly available on the Web. This means that the automated retrieval of AAMs is a significant challenge. Use-case 1 avoids this problem by relying on repository managers to fetch the AAM manually (requesting it by email if necessary).

Manuscripts held in repositories may be under embargo. The system would need to take account of this such that it did not disseminate the manuscript to other repositories in violation of any embargo agreement with the publisher. The MVP (use-case 1) avoids this issue by never disseminating the manuscript to Target IRs.

Use-case 3: automated fetch and deposit of manuscript and metadata

Description

When an Origin IR has an MS (together with some metadata) deposited [1] into it, the Origin IR automatically notifies [2] the Hub, sending it the metadata Record and the MS. The Hub calls [3] the Matcher, which invokes [4] the Text-miner which mines the MS for extra metadata. The Matcher then enhances the Record and stores [5] it in the Record DB before identifying which (if any) IRM(s) to notify. The Matcher then instructs [6] the Hub to deposit [7] into the Target IR(s) a SWORD package, containing the Record and the MS.

Notes on use-case 3
  • this use-case introduces a more sophisticated level of machine-to-machine interoperability, where the Target IR is able and willing to accept metadata and content deposited directly into it from a trusted source, without requiring immediate intervention by the IRM
Challenges

(as in use-case 2)

Descriptions of some of the main components

Matcher

The Matcher is a service which:

  • matches different metadata records (coming from different repositories) describing the same resource
  • extracts authors from metadata records and identifies their repository affiliations

The Matcher operates with approximations of a ‘match’, observing pre-defined thresholds of ‘confidence’ before instructing the Hub to send notifications.
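
As an illustration of the kind of approximate matching involved - a hedged sketch rather than a specification - two records which lack a shared DOI might be compared on normalised title similarity, and treated as describing the same resource only above a confidence threshold:

```python
from difflib import SequenceMatcher

def normalise(title):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())

def same_resource(record_a, record_b, threshold=0.93):
    """Treat two records as describing the same resource if they share a DOI,
    or if their normalised titles are similar above the confidence threshold."""
    if record_a.get("doi") and record_a.get("doi") == record_b.get("doi"):
        return True
    similarity = SequenceMatcher(
        None, normalise(record_a["title"]), normalise(record_b["title"])).ratio()
    return similarity >= threshold

print(same_resource({"title": "The Active  Repository Pattern"},
                    {"title": "the active repository pattern"}))  # True
```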

Hub

The Hub is envisaged as an industry-standard message handling system, capable of receiving ‘messages’ from registered systems and relaying these, asynchronously, to other registered systems or processes according to a defined set of rules.

Next steps

I hope that this has been sufficient to stimulate some thinking about how an Active Repository Pattern might allow us to accelerate open-access to all publicly-funded research. Please do comment below!

The Active Repository Pattern

(This is the first of two posts forming my contribution to Open Access Week 2015.)

Context

Institutional repositories

It is easy to overlook, or take for granted, the way in which the drive towards open-access (over the last decade or more) has succeeded not only in creating several viable "institutional-repository" software packages, but also in encouraging libraries and IT departments in universities to deploy them. It should be recognised that individual universities have shown, and continue to show, commitment to maintaining their repositories in spite of shrinking budgets.

While these repository systems are various, they mostly adhere to certain standard protocols, common metadata formats and conventions, allowing for a degree of potential interoperability. It is this potential for interoperation which elevates the institutional repository from a local system to a networked system.

This achievement should be celebrated!

Repositories as infrastructure

Since institutional repositories have generally been developed with a degree of interoperability, we can consider their potential role in a wider infrastructure. Currently, the interoperability of institutional repositories is most clearly realised in the way in which they expose metadata records and (sometimes) content in a standard way, so that this information can be 'harvested' by an external process. The use of a standard protocol, OAI-PMH, as well as standardised metadata profiles such as OpenAIRE and RIOXX, allows institutional repositories to be first-class components in a distributed infrastructure. Since institutional repositories are where open-access metadata and content are created and managed, their role in this distributed infrastructure is both vital and fundamental. And because institutional repositories are controlled by their host institutions, they are collectively less vulnerable to political or business decisions made by any single organisation.

Centrally-provided services supporting open-access

In parallel with the rise of the institutional repository, there has been significant investment in centralised services which support open-access by interacting with institutional repositories. There can be valid technical reasons for providing such services from a centralised platform. For example, it has for some time been generally accepted that in order to search for open-access papers, the metadata records of institutional repositories first need to be harvested and aggregated into one database, which can then serve a centralised search-portal or similar online service. However, searching is not the only way in which open-access papers can be discovered (as is discussed later).

Many such services have operated at a national or regional level, as they have been paid for with public money. This creates a paradox: academic research (and therefore open-access to scholarly publications) is not an activity which is comfortably bounded by geo-politics. While services created in this way are often deployed openly, allowing global use (for example Sherpa RoMEO), such global access is vulnerable to being withdrawn, since the service provider bears no commitment to users beyond its own context.

An alternative to nationally-funded services is provision by private corporations. This can, on the face of it, appear more sustainable: after all, if the provision of such services increases a profit to be made (even indirectly), then support and investment are likely to continue. Of course this comes with its own risks, not least of which is that the corporations most likely to develop and support such services might be ambivalent about the goal of ubiquitous, global open access.

So, while the centralised provision of services, whether publicly or privately financed, might prove to be effective in some circumstances, it incurs the risk of dependence on a single organisation, the service-provider. Moreover, in a de-centralised infrastructure based heavily on the presence of institutional repositories, this centralised model of service-provision might not be the best fit, architecturally.

Institutional repositories as active participants

One curious side-effect of the architecture of infrastructure that has evolved to support open-access is that institutional repositories currently play a largely passive role. Essentially, institutional repositories act as databases of metadata and papers, and are not even especially Web-friendly.

This need not be the case. Other approaches to distributed online infrastructure have started to mature in recent years. In particular, strategies which depend on active notification are increasingly interesting in this space. We can conceive of repositories as active components in an open-access infrastructure, rather than passive data-silos. With a modest amount of development (in many cases the deployment of a 'plugin' or similar would be sufficient), institutional repositories could become systems which actively send notifications triggered by events such as, for example, the addition or modification of a metadata record or paper. Standard protocols (for example PubSubHubbub) for sending notifications are already in mainstream use. And when it comes to conveying the detail of the repository event, mechanisms more sophisticated than OAI-PMH exist already: indeed, ResourceSync would serve as a successor to OAI-PMH in this respect.
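
As a sketch of what this 'active' behaviour might look like in practice - a generic, webhook-style notification invented for illustration, not a prescription of PubSubHubbub, ResourceSync or any other particular protocol - a repository plugin could POST a small JSON message to each subscribed service whenever a record is added or modified:

```python
import json
import urllib.request

def notify_subscribers(subscriber_urls, event_type, record_uri):
    """Send a minimal JSON notification to each subscribed service whenever
    a repository event occurs (e.g. a metadata record is added or modified)."""
    payload = json.dumps({
        "event": event_type,   # e.g. "record.created", "record.modified"
        "object": record_uri,  # URI of the metadata record or paper concerned
    }).encode("utf-8")
    for url in subscriber_urls:
        request = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(request, timeout=10)

# Hypothetical usage, triggered from the repository's 'on deposit' hook:
# notify_subscribers(["https://aggregator.example.org/inbox"],
#                    "record.created",
#                    "https://repo.example.ac.uk/record/123")
```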

Open-access infrastructure would likely retain the need for some centralised services. Even many types of peer-to-peer system retain the need for a central directory of participating peers, for example. However, the idea is to reduce the dependence on central services by moving more of the responsibility (and therefore functionality) out to the distributed institutional repositories. The centralised services required to support an infrastructure of distributed, active repository components could be modest, inexpensive and easily replaced.

Peer-to-peer systems work when the peers have a vested interest in participating, and when enough of them are sustained. Our institutional repositories fit this model. Increasingly, higher education institutions are committed to providing open access to their scholarly outputs. They also have a vested interest in gathering papers authored by each of their researchers, even when that researcher was not the lead author. This means that institutional repositories have an incentive to actively share papers, rather than simply making available what they already have. A distributed, peer-to-peer architecture of events and notifications would serve this purpose well.

I believe we need to do more to exploit the latent value in our institutional repositories. From the point of view of the network, they can and need to be much more than passive databases, and with a very modest technical investment they can start to be active components in the global infrastructure.

I call this the Active Repository Pattern. In my next post, Cooperative Open Access eXchange (COAX) I offer a proposal setting out an approach to this in more concrete terms.

Please feel free to leave any thoughts or comments below!


Recent Mastodon Posts (mastodon.social/@paulwalk)

If you use Obsidian and you want to create an “index note” for a folder, listing all of the notes in that folder and any sub-folders, then this is one way to do it that doesn’t need an extra plugin (it uses the DataView plugin but all Obsidian users have that one installed anyway, right…?)

paulwalk.net/2023/folder-index

“…the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well […] You read the article and see the journalist has absolutely no understanding of either the facts or the issues […] you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate […] than the baloney you just read.” geer.tinho.net/crichton.why.sp

This seems like science done right:

“This preprint is special because it does not contain any data*: it is a pre-registration of a study. […] submitting these plans for peer review, both formal (through PCI RR) and informal (everyone is invited to comment at PubPeer). […] So, not only does this approach helps achieve a sound methodology before the experiments starts, it also helps to solve the problem of publishing bias where “negative results” don’t get published…” raphazlab.wordpress.com/2023/1


Our First Pre-Registration is Live! Replication of…

After months of efforts, my co-authors and I are absolutely delighted to share this preprint, which is special in many ways: Said, Maha, Mustafa Gharib, Samia Zrig, and Raphaël Lévy. 2023. “Replica…

This is such a well-written piece (jenniferplusplus.com/losing-th) from @jenniferplusplus
I’ve already experimented with LLMs to produce boilerplate software code (Golang) and, used judiciously, this has worked well. But there's a price - even for an experienced developer - which Jennifer outlines so clearly here. For a junior developer the trade-off is worse, meaning that the negative impact on the whole endeavour of making good software will worsen over time. If you build software, read this now.


Losing the imitation game

AI cannot develop software for you, but that's not going to stop people from trying to make it happen anyway. And that is going to turn all of the easy software development problems into hard problems.