Sunday, September 16, 2012

Is the Danish/Norwegian Model a Step Forward?

In Denmark, a new scheme for assigning resources to the universities according to their research performance was recently rolled out. One aspect of this scheme is a mechanism for calculating a score that captures the publication performance of a university. This mechanism is based on a similar mechanism used in Norway.

It works roughly as follows: Each journal is assigned to a subject area and to level 1 or level 2. Within a subject area, at most 20% of the world production (in number of papers) can be in a level 2 journal. Thus, level 2 roughly contains the 20% best journals. A level 1 publication yields 1 point, and a level 2 publication yields 3 points.
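
To make the mechanism concrete, here is a minimal Python sketch of how such a publication score might be computed. The journal-to-level assignment and the assumption of full (non-fractional) counting are mine for illustration; the actual scheme may, for instance, fractionalize points for co-authored papers.

```python
# Minimal sketch of the point model described above (illustrative only).
# Assumes each publication counts fully; co-authorship fractions and the
# subject-area bookkeeping of the real scheme are ignored.

POINTS = {1: 1, 2: 3}  # level 1 -> 1 point, level 2 -> 3 points

# Hypothetical journal-to-level assignment.
JOURNAL_LEVEL = {
    "ACM Transactions on Database Systems": 2,
    "The VLDB Journal": 2,
    "Journal of Data Semantics": 1,
}

def publication_score(outlets):
    """Sum the points earned by a list of publication outlets."""
    return sum(POINTS[JOURNAL_LEVEL[outlet]] for outlet in outlets)

# Two level-2 papers and one level-1 paper yield 3 + 3 + 1 = 7 points.
print(publication_score([
    "ACM Transactions on Database Systems",
    "The VLDB Journal",
    "Journal of Data Semantics",
]))
```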

The following journals in my area (places where I have published or where it would be natural for me to publish) are at level 2:

ACM Transactions on Database Systems
Communications of the ACM
The Computer Journal
Data and Knowledge Engineering
GeoInformatica
IEEE Transactions on Knowledge and Data Engineering
Information Systems
International Journal of Geographical Information Science
Journal of Database Management
Journal of Intelligent Information Systems
SIGMOD Record
Software: Practice & Experience
The VLDB Journal

(This is according to a list that may no longer be completely up to date.)

There is some debate as to whether this ranking is intended to regulate the behavior of researchers. Thus, some maintain that the ranking should only be used at high levels of aggregation, e.g., at the level of universities. However, others believe that the ranking should be applied to each individual researcher.

In my view, if this ranking significantly affects the funding that universities and faculties at universities receive, which will be the case, then deans who care about funding are likely to take steps towards improving the performance of their faculties. Doing otherwise seems simply irresponsible. A scalable way of improving performance is to reward those departments that perform well. So at the end of the day, each individual researcher will be confronted with his or her score and will be rewarded for improving it.

Is this good or is this bad? To shed light on this, let me consider what we have been doing in my old group at Aalborg University. Every five years, an evaluation of the research in my department is carried out. In connection with a recent evaluation, my group prepared this ranking of database journals:

General Journals

1a. ACM Transactions on Database Systems
1b. The VLDB Journal, IEEE Transactions on Knowledge and Data Engineering
2a. Information Systems
2b. Data and Knowledge Engineering, The Computer Journal
3. Journal of Intelligent Information Systems, Knowledge and Information Systems, Journal of Database Management, Journal of Data Semantics, Information Sciences, etc.

Specialized Journals

1. GeoInformatica, Transactions in GIS

We prepared this ranking based on the prestige we associated with the different journals. Thus, we considered publication in ACM TODS to carry the highest prestige, and publication in The VLDB Journal and IEEE Transactions on Knowledge and Data Engineering to carry similar prestige, though less than publication in TODS. Together, we viewed these as the top-3 general database journals. We expected few to disagree – and this expectation has held true. Then follows a group of three journals with clearly less prestige, and among these, we felt that Information Systems was the better one. The listing of journals at the fifth level is incomplete and consists of respectable journals.

Comparing the national ranking to our own, several observations may be made:

First, our ranking is much more detailed than the national one. Second, the journals in our top four levels are all in the top level in the national ranking. Of the 5 journals mentioned in our fifth level, 2 are in the national top level, 1 is not listed, and 2 are at the national level 1. For the specialized journals, 1 is at the top level, and 1 is at level 1.

How do I perform well according to the two schemes? First, consider our own scheme. For the parts of our work that fit the general journals, we need to publish in quite specific journals; publishing in the Journal of Database Management is not a substitute for publishing in ACM Transactions on Database Systems. Other parts of our work are too specific to be publishable in general database journals. For example, some work is specific to geographic information systems and is thus published in specialized outlets in the GIS area.

Second, consider the national scheme. How can I improve my current performance? First, I should avoid publishing in level 1 outlets. But I already avoid level 1, so this will not change my current behavior and thus will not improve my performance. So what can I do? I think there is a lot to be gained from no longer publishing in the best level 2 outlets and instead aiming for the bottom ones. Publication in the bottom ones is much easier. The results need not be as good or novel or significant, and one can slice things thinner and thus write shorter papers that build extensively on previous papers. One can publish things that one could never publish in the best outlets. So I imagine that by choosing my outlets carefully, I could easily triple my score.
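
To make that back-of-the-envelope arithmetic explicit, here is a hypothetical illustration; the output volumes and the slicing factor are invented, and only the 3-points-per-level-2-paper rule comes from the model.

```python
# Hypothetical illustration of gaming the level-2 tier (numbers are invented).
# A demanding level-2 paper and an easy level-2 paper both earn 3 points, so
# slicing the work behind one demanding paper into three easier papers
# triples the score for the same underlying research.

POINTS_PER_LEVEL2_PAPER = 3

demanding_papers_per_year = 2   # assumed current output in the best level-2 outlets
score_now = demanding_papers_per_year * POINTS_PER_LEVEL2_PAPER  # 6 points

slices_per_demanding_paper = 3  # assumed "thin slicing" factor
score_gamed = (demanding_papers_per_year * slices_per_demanding_paper
               * POINTS_PER_LEVEL2_PAPER)  # 18 points

print(score_now, score_gamed)   # 6 18 -- a threefold increase
```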

However, if I did that, my colleagues would wonder what went wrong. And I worry that many would no longer bother to talk with me. The currency of science is reputation, and my reputation would go down the drain. Would this be in the best interest of Denmark? I think not.

And frankly, if I had to publish in the bottom outlets, my motivation to do science would be seriously affected – this is not something I would be likely to spend some 20 hrs of my spare time on each week.

One response is that we simply need to refine the national model. I worry that a committee that needs to cover all of computer science is unable to do this in a meaningful way, even if the committee members are great scientists. And too many compromises need to be made. So I worry that the entire approach is problematic.

Perhaps a bottom-up approach is better. With such an approach, one can let the scientists in a specific research area in a department (or across several departments in different universities) quantify their performance based on the best possible insight into their research area.

In closing, I note that I have not touched upon the unusual importance of conferences in computer science. This causes its own host of problems when creating a simple system that is to apply across all subject areas of science.

[Note: I prepared the above some time ago, and so it is not entirely up-to-date with current developments. Notably, conferences are now in the process of being considered.]

Saturday, July 4, 2009

What Does SIGMOD Need?

Every four years, the ACM SIGMOD membership elects a Chair, a Vice-Chair, and a Secretary/Treasurer. This past week, the results of the 2009 election were announced. I ran for Vice Chair and had the great fortune to be elected.

As part of the election process, the candidates write a statement. I have included my statement below. When I take office, I will start working with my fellow SIGMOD officers on transforming the statement into concrete action. They, too, have written statements and no doubt have their own views on the matters that I address.

I hope to be able to elaborate on my statement in future blog posts. If, having read the statement, you have any comments or suggestions, please share.

Vice Chair statement:

“SIGMOD is an outstanding organization and it is a privilege to be given the opportunity to run for Vice Chair. If elected, my main objectives will be to understand and meet the needs of the community as best as I am able. I am committed to continuing to innovate SIGMOD.

The SIGMOD Conference is of central importance to the SIGMOD community. Recent years have seen increased dissatisfaction with the review process, and many attempts have been made at reengineering the review process within the constraints of a conference setting. However, if the quality of the reviewing itself is not high, such reengineering is ineffective. Many have observed that there are few rewards for good reviewing. I propose that SIGMOD initiates an effort to find ways of rewarding quality reviewing, to be introduced gradually and evaluated in a systematic manner.

I will work to integrate social networking tools into the SIGMOD web site in a way consistent with the high quality of SIGMOD, to ensure that the site reflects the breadth of our community and becomes more dynamic.

In particular, I believe that the very advances brought about by members of our community in the area of web data offer opportunities for further improving the SIGMOD web site. As one example, we should aim for a much more dynamic site that members of the community will want to visit frequently. To achieve that, additional and relevant content, including member-generated content, should be enabled. We should also aim for new ways of establishing an increased SIGMOD presence on the web.

Next, SIGMOD can and should do more to involve and provide services to its members and to the database community across the globe. For example, regional columns may be introduced in SIGMOD Record and on the SIGMOD web site.

Throughout my twenty years as a database researcher, I have had substantial collaborations across Asia, Europe, and the US; and I have spent substantial time in each. As a result, I am aware of the many different perspectives within our profession, and I will work hard to represent and honor them all. I have built a sizable research group in my department and I have served in leadership roles for top conferences as well as on some 140 PCs. That is the kind of commitment I will bring to SIGMOD.”

[http://www.sigmod.org/elections09/candidate.vp.Christian_Jensen.pdf]

Wednesday, June 10, 2009

(Why) Are Database Conferences in Decline?

This week, I took a look at the submission counts and acceptance rates for key database conferences. I started out by updating the statistics that Peter Apers has been maintaining for some time, using the most recent data available. Here, I report my findings.

I view SIGMOD and VLDB as the flagship database conferences, and I believe that most members of the community will agree.

From 1993 to ca. 2002, SIGMOD received numbers of submissions in the range from 200 to 300, in a zigzag-like pattern. Then the number of submissions grew to the 400-to-500 range, peaking at 480 in 2007. The last couple of years have seen numbers that are similar to the 2004 and 2005 level of 431.

Now consider VLDB. The trend for the period 1993 to 2001 is that of a slight increase from around 300 to around 350, with 1999 being an outlier (390). Then follow a few years of marked increases in submissions. The last three years have each seen some 550 submissions. At 626 submissions, 2006 is an outlier.

The VLDB acceptance rates have been in the range from 13 to 19 percent since 1993, with only two years being outside the 14 to 18 percent range. There are no clear patterns. Similarly, the SIGMOD acceptance rates have generally been in the range from 14 to 18 percent during the past decade, with no clear trends toward increase or decrease.
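
For clarity, the acceptance rates quoted throughout are simply the number of accepted papers divided by the number of submissions. A tiny sketch, where the accepted count is a hypothetical figure paired with the 2007 SIGMOD submission count mentioned above:

```python
# Acceptance rate = accepted papers / submitted papers (as a percentage).
def acceptance_rate(accepted, submitted):
    return 100.0 * accepted / submitted

# Hypothetical: 70 accepted papers out of the 480 submissions to SIGMOD 2007
# would give an acceptance rate of roughly 14.6 percent.
print(f"{acceptance_rate(70, 480):.1f}%")
```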

So VLDB and SIGMOD are similarly selective. SIGMOD is seeing a slight decline in submissions, while VLDB is seeing a slight increase. VLDB is attracting the larger number of submissions.

Here, recent submission counts for several conferences are graphed:

I generally have no explanations for the various fluctuations in the submissions, or for why SIGMOD seems to be decreasing slightly while VLDB seems to be increasing slightly. The relatively low figure for SIGMOD 2002 may be due to an overlap between the EDBT reviewing period and the SIGMOD submission date that year. And the locations of the conferences may be a factor. If anyone has good explanations for the figures, I would love to hear them.

Next, I view ICDE and EDBT as the second-most prestigious database conferences.

For ICDE, the numbers of submissions generally were in the 250 to 300 range from 1993 to 2002. Then followed a gradual increase to 521 in 2005. The last three years have seen from about 550 to 650 submissions. It is noted that ICDE had relatively few submissions in 2006, the year when VLDB had relatively many (is this a coincidence?).

The ICDE acceptance rate is characterized by a declining trend: from the 20-25 percent range early on to a situation where four of the last six acceptance rates have been in the range from 12 to 14 percent, with the other two being 19 percent.

EDBT also exhibits slight growth from 1994 to 2002, ending slightly above 200. Then follows a strong growth that takes EDBT to 352 in 2006, and there was a slight decline in 2008. The acceptance rate has been at 17 percent for all years, with the exception of two (16 and 14 percent).

So we are seeing a pattern of slight growth until ca. 2002, then significant growth, followed by a decline or slowing growth over the last couple of years. I wonder why?

Here are the acceptance rates that correspond to the submission counts given above:

Finally, I want to mention CIKM. This conference has grown steadily over the years; and in 2009, CIKM received 847 submissions, which is the highest ever for a database research conference (yes, CIKM spans more broadly than databases). CIKM's acceptance rate has been as high as 40 percent, but it seems to have stabilized in the range from 15 to 18 percent.

It would be interesting to compare with non-database conferences as well.

Let me end by observing that there is of course more to a conference than the number of submissions and the acceptance rate. For example, the expectations associated with a conference with respect to the topics it accepts and its "toughness" affect the quality of the conference and the submission behavior, e.g., leading to self-refereeing, where researchers choose to send only their best papers to the "toughest" conferences.

PS. I chose to manage the conference statistics using Fusion Tables.