Bits of Learning

Learning sometimes happens in big jumps, but mostly in tiny little steps. I share my baby steps of learning here, mostly on topics around programming, programming languages, software engineering, and computing in general, and occasionally on other disciplines of engineering or even science. I mostly learn through examples and by doing, and this place is a logbook of my learning experiences. You may find several things interesting here: cute little snippets of (hopefully useful) code, a bit of backing theory, and a lot of musing on how learning can be so much fun.

Thursday, April 25, 2013

A Really Open Research Publication Process

What follows is an imagined discussion about an alternative research publication system that could bring about several positive changes compared with the way publication is done today:
  • It could drastically reduce the delay between doing research and reporting it.
  • It could significantly lower the barrier to publication that is raised today in the name of quality, a parameter that, as it is currently measured, is highly subjective.
  • It could significantly increase the activity of authoring, reading, reviewing and ranking research work.
The write-up below is an approximate transcript of how I worked out the idea in my own mind. So a secondary intention is to present a new way of reporting research, in the form of dialogues, which doesn't merely present an idea after it has taken shape but also hints at how it evolved in the researcher's head.


The Present System

How is research publication done today?
A researcher writes an article for a specific conference or a journal. The article is reviewed by a group of programme committee members, mostly anonymous, who give their votes, along with review comments, on whether the article should be considered for presentation and publication in conference proceedings (or journal). Depending on that, the fate of the article is decided.

What happens if the article gets published?
A research article getting published is a matter of prestige for a researcher. It is documented proof that the researcher has made a contribution to the existing body of knowledge of the human race.

What happens if the article doesn't get published?
It's a matter of disappointment for the researcher. But it's not the end of the world. The article rejected from a conference or journal can be improved and resubmitted elsewhere for publication. The comments provided by the peer reviewers are supposed to provide valuable constructive inputs to the researcher to improve his paper.

What are the advantages of this method?
Many. As mentioned above, the very event of publication is a matter of prestige. The fact can be cited in the researcher's resume to assert the point that he has made contributions in the field of knowledge.

Is that all?
Of course not! Publication in a well-recognised journal or conference (let's call either a 'forum') brings the work to the view of the entire world. The invention or discovery thus gets timestamped, and the inventor's (or discoverer's) claim to it is permanently etched in human history. So no one else can come along later and stake a claim to the same idea.

Then what happens?
So other researchers can't claim the work to be theirs anymore. But the reported work is read by other researchers and gives them more ideas. They can build upon your work to further the knowledge in that field.

You mentioned the word 'well-recognised' somewhere. What's that?
There are a large number of fora. Some are more famous (well recognised) than others. That's because they have established a track record of publishing high-quality research articles over time. Consequently, users of research ideas (for example, researchers and industries) are likely to refer to these fora more than others when they look for scholarly work useful for their purpose. Since the work published in these fora gets seen by a large and/or high-quality audience, researchers throng at their doorsteps to get their work published there. Competition rises. So does quality. The whole thing turns into a virtuous cycle. On the other hand, fora that let crappy stuff pass through their filters often lose credibility and audience. Eventually this helps weed out the junk.

Disadvantages of the Present System

All this sounds so rosy! Then why are we here talking about an alternative model?
That'll be clear if you allow me to also talk about the negative aspects of the existing publishing model.

By all means, feel free.
OK. One negative thing is the delay between the inception of an idea and its publication. It could be anything from a few months to eternity, but almost never less than a few months.

Anything else?
Yes, and it may be a bit controversial, but all my fellows in the research community must have seen this happening. These fora (conferences and journals), for obvious reasons, tend to get more and more specialised as time passes. Papers are usually written on very niche topics, inaccessible to the vast majority of scholars in the world. Even though there are millions of scholars, there are almost as many fields of study. Thus it often happens that some subjects have just a handful of authorities in the entire world.

What's the consequence?
The consequence is that every time you submit a paper, it ends up with one of that handful. Researchers are humans too, and humans have their own biases, favourites, eyesores and rivals.

Are you pointing towards unethical conduct?
Not necessarily. But communities develop their own lingo. And the authority figures of any field often start defining what's acceptable as a valid contribution to that area, how it ought to be presented etc.

What's wrong with that system?
In spirit, nothing. But I think much goes missing in the way things get implemented. There's a continuous tussle between the need to evaluate new work and the meticulous building of credibility structures. The process has become so complex that it is no longer possible to say whether it's working.

Which means you can't even say if it's not working. Then why crib?
If you look at the outcomes, you will realise that I don't need to defend myself. There's severe inefficiency. If you compare the good papers that get published with the good work that gets done and thought, you won't find much overlap. More and more students enter the profession of research with their eyes fixed on getting high-impact publications. The whole thing eventually is no different from our current examination-oriented education system: the trick boils down to cracking the system and faring well in it. The current research publication process fails to identify good research in the same way that the current education system fails to identify and nurture intellectual talent in society. And going by that comparison, the job this process does in bringing good research to the world for fruitful consumption can't be very good!

That's a very roundabout argument.
Well, to tell you the truth, I am drawing on a bunch of my own personal experiences. In my own circles, I have seen a large number of good researchers compete and fail in this process of publication. They end up dejected and depressed. On the other hand, I have seen many people who aren't necessarily all that good as researchers, but have somehow mastered a way of thinking and presenting that keeps a steady flow of publications coming in. I know nobody means any harm in all this. But, nevertheless, the process ends up being very unfair and ineffective. Just like our education system.

I still don't see anything unfair. Is there anything wrong happening over here?
There are many researchers who start out very naive. Which is fine. They should pick up ways of working, learning, discovering, inventing and sharing their work as they go. Instead, the pressure of getting their work published weighs on their minds the way the fear of exams weighs on many students. For a large number of researchers, the fear is crippling. They aren't necessarily worse researchers, but they fear the prospect of being evaluated and written off. They fear not being allowed to belong to the community. Maybe they have a weakness. But the weakness isn't scientific; it's psychological. The problem should be addressed instead of being dismissed as inevitable, simply because the talent it causes society to lose is sizable.

The Proposed System

OK. Enough of picking faults. How do you think all this should work?
I feel there are two things that get mixed up in the current model of research publication. One is reporting your work. The other is earning credibility for your work. Inability to earn credibility shouldn't deter you from reporting your work. The basic idea of our alternative model of reporting research is to decouple these two things. We say reporting must happen immediately, no matter what. And the moment you do it, you are in. All threats of ever being disowned, ignored or persecuted by the community are annihilated. Now comes the other part: earning credibility, doing something influential. Well, that's important too. But let that happen subsequently. In fact, ideally, you shouldn't need to earn any credibility. It should come to you for your good work, without your having to work separately for it. You focus on doing good work, report it as soon as it's ready, and let accolades flow in by themselves.

But how, unless you publish it somewhere where people are looking for good work?
Here's where I feel technology today is ripe enough to play a part. Google takes a bunch of keywords and pulls out anything you are looking for from the Web. I am sure that Google, or whatever engine runs underneath, is capable of telling you what relates to your work. If you are writing an article, you use Google to fish out related papers anyway, don't you?

What's going to change?
Consider the scenario today. You work on something. You write it up in the form of a paper. And you go to the ACM and IEEE journals, or the equivalent places in other fields. All this while, you are afraid that the hard disk of your computer will be the grave of your paper if it doesn't get accepted somewhere. That fear shapes your approach to the whole thing. You locate the influential people of your field. You adjust the tone of your paper to resemble the lingo these people have tacitly certified for that field. You take care not to say anything that challenges their authority. Just novel enough to catch their attention. You have to be politically correct. In our so-called scientific world, you end up caring too much about the politics of publication.
Now consider this other model. You just write it up and put it on the Web for anyone willing to read. Now it's in the interest of the ACMs and IEEEs to find your work if it's good. And they have all the technologies to do that. They search the Web. They do data mining. They do machine learning. They do whatever they want. But hell, they must do a good job when someone asks them to feature a bunch of relevant articles on the Web using some keywords. The ACMs and IEEEs don't publish your work anymore. They just recommend.

They close their businesses, right?
I don't know. I don't think so. Recommending good stuff will be a big thing. And then there are ads and promotions. The whole business model will probably change. What will go is this big difference between publishing and not publishing, which comes in the way of a researcher's primary responsibility to society: reporting his work.

Today, choosing the right forum to publish your work is a big thing. Aim too high, and you get rejections. Aim too low, and your work is tagged for oblivion forever. And each time either happens, you run out of a little more steam. Remember that in the current model, publication can happen only once. And publishing in the wrong place pretty much ensures that your work will never be looked at with interest.

What should happen instead is that there ought to be only one publication platform: the Web. And the probability of a work being read by an interested audience shouldn't dwindle with time just because its first-weekend revenue didn't cross a mark. Research works are not feature films. They are potentially for eternity.

So what's the difference? What happened earlier between publication and non-publication will now happen between credibility and non-credibility.
I agree. The model we propose is in no way a socialistic model swearing by equal wealth etc. If research is to be of any value, there must be some way to measure its quality. However, understand that the current system of research publication has created a huge, artificial chasm: the chasm between publication and non-publication. You are either on this side or that. And the fear of falling into the chasm keeps a large number of people from even trying. We, on the other hand, propose to fill up this chasm. You do your best. And your rewards lie somewhere in a continuous space depending on the quality of your work, your luck, your influence, whatever. But surely, there's no disgrace associated with trying and failing. There's only one failure in this model: that of not trying. The rest is all a matter of degree of success. And that, I agree, can vary as wildly as it does in the present-day research world.

What gives you the confidence that what you are talking about is realistic and not a figment of your imagination?
Let me give you a few examples. Consider the blogosphere. Today everyone blogs. It's immensely crowded out there. But some blogs are popular, and some no one reads. Since there's Google to help us out, searching for relevant online articles isn't such a big headache, even if it may require a bit of work. Publishing a blog article has such a low entry barrier: just a push of a button! But getting hits? That, of course, is a very different ball game. And people do play all sorts of silly games there. Links and backlinks. Guest posts. Ads. Promotions. It's again a you-scratch-my-back-I'll-scratch-yours culture. But you continue to publish, because there's no cost associated with it. And the only game that seems to really win is the game of quality. Put up good stuff, and you get hits.

Another example is open-source software. People write thousands of programs every day. It's a breeze to upload your code. But not all of them are as popular as Linux or Eclipse or Apache. No big deal is made of whether you have contributed anything to the open community or not. But making a hugely popular piece of software is a big deal.

Yet another example comes from one of the greatest computer scientists of our times: Edsger Dijkstra. He wasn't a very keen publisher in the regular sense of the word. But he wrote profusely and distributed his work among his peers. That was his way of reporting his work. All those notes, which Dijkstra numbered as EWD ####, are now archived on the University of Texas at Austin website.

Now, there's been only one Dijkstra in history. And Dijkstra could afford to be unconventional. That doesn't mean everyone can.
I understand. I am not talking about what we can currently afford, but about the best way to do things. The collection has all sorts of things: papers, letters, little rough notes, rebuttals... Dijkstra never distinguished between them, I suppose. But the readers do, I am sure. I would bet my head that some of the EWDs receive more hits than others. The distinction is left to the reader, not to a broker. Brokers and agents are gradually becoming obsolete even in real estate. We have the Web and automatic recommender systems to do that job for us.

Wrap Up

OK. Can we have a quick summary of this new system of research publication?
Well, I will just sketch it out. You have an idea you wish to share. Write it up and put it out on the Web. The information retrieval engines on the Web get down to work immediately. They fetch links to other published material that may have a bearing on your work. It's your choice to notice or ignore them. However, it's to your benefit to notice those recommendations. The more you analyse them, the better you will be able to place your work in a continuously evolving semantic web of related research.
People turn up at your page. They read it if they want. They rate it if they want. They comment if they want. The recommender system again comes to your rescue by sorting these feedback comments by relevance, importance, quality and potential impact. You work on the feedback and update your work to your heart's content. That way, a research paper isn't something done and dusted. It's an ever-evolving working document. Hopefully, you acknowledge the contribution of every review comment that leads to an improvement you incorporate.
Someone at ACM or IEEE may want to feed all this activity happening in the semantic neighbourhood of your article into the computation of some sort of impact factor for it. Going forward, someone may decide to feature your article prominently based on this impact factor.

Got it. And what, in very brief, would we gain from this change?
  • More research will get reported. No thought goes wasted. No work goes unreported.
  • The latency between doing a piece of work and publishing it will almost disappear.
  • More reviews will happen.
  • The ratings of a scholarly work will depend on automated algorithms, not on the private judgement of a select few.
  • A piece of research that misses an initial wave of popularity will not be doomed to an eternity of oblivion. It may spring back to popularity ages after it was reported if someone gets interested, because searching will be done by machines. And machines don't forget as easily as we humans do.
  • Just like bloggers, researchers will realise that the only surefire way to popularity is quality. They will direct their efforts into doing good work and writing well, not into knocking at the doors of brokers and agents.
  • There is plenty of good work happening out there. It is not always backed by timeliness, influence, alignment with 'important' work, or some other technical factor, and so it loses out. Immediate publication, however, makes it more likely that it will be seen by an interested audience someday. At least the probability isn't ruled out. Our new model aims to give such work the second chance it badly needs.
Isn't that quite a handful? :)


The above article is itself an example of how I think this new system of research publication should be implemented. I got an idea. I wrote it down in an informal tone. As of now, I stake a claim only to this article, not to the idea. If I am fortunate, it may actually turn out to be a novel idea. But now, no one can steal it from me. And if it does happen to be a novel idea, I put my faith in ever-improving information retrieval technology to make sure that any other claim to coining the same idea will eventually be detected.

The above work is rather raw. I wanted to get it out the moment it occurred to me. I want to claim it as my own, but I don't want to depend on keeping it secret for some period while I work on it, fearing that someone else may be faster than me in cashing in on it. If someone else gets inspired by this idea and gets it into shape before I can, it's perfectly OK. But I also promise that I will keep working on it whenever I have a way to improve it, either through my own ideas or through those that my readers or the search technology provide. In the latter case, I pledge to acknowledge them.

If the above inspires you in any of your work, I invite you to cite it.


Sandeep said...

Nice idea. I liked it very much since it emphasizes sharing ideas.

But how is this different, except for the fact that you propose that it's the journal, and not the author, who should strive to get it printed in the journal?

Shipra Agrawal said...

I am not sure if, like blogs, the quality of research papers could always be judged by their popularity. Note that some research papers are very technical, and require strong expertise and significant time for a reviewer to read and make sure that all the details are correct. In the system you are suggesting, many such technical details will go unread, and errors in them may go undetected, making it likely that a paper suggesting an incorrect but otherwise seemingly attractive solution to a problem will become popular. I think the reason we believe anything published in a famous journal/conference is worth reading is mainly that we believe in its strong peer-reviewing system (which of course is not without its faults). But it takes much effort from the editors/organizers to push the experts (and I am using the term experts loosely to mean anyone with enough technical expertise to understand the proofs etc.), who have limited time on their hands, to even review the submitted papers. I do not know how this would be ensured when there is not much push to review. Experts will end up reviewing only articles that really interest them or align well with their own research agendas. Isn't that even worse?

tnc said...

I was going to mention arXiv too, but I see it's been mentioned already...