
Evaluation summary and metrics: “Artificial Intelligence and Economic Growth”

Summary, metrics and ratings, and the evaluation manager's comments on the evaluations of “Artificial Intelligence and Economic Growth” by Aghion et al.

Published on Mar 16, 2023

Abstract

We organized two evaluations of the paper “Artificial Intelligence and Economic Growth” (1. Seth Benzell, 2. Philip Trammell). The authors also responded. To read the evaluations and the response, click the links at the bottom.

Paper: Artificial Intelligence and Economic Growth (2018) in The Economics of Artificial Intelligence: An Agenda.

Authors: Philippe Aghion, Benjamin F. Jones, Charles I. Jones.

Originally published as NBER Working Paper 23928 (2017).

Evaluation manager’s notes (David Reinstein)

We are grateful to the authors of this paper for agreeing to participate in and engage with The Unjournal’s evaluation, and for following through with this. (Although this is an NBER working paper, it was selected before we began the “Unjournal Direct” track.)

In our current phase, The Unjournal is mainly targeting empirical papers (and papers with quantitative simulations, impact evaluations, direct policy recommendations, etc.). In contrast, this paper would probably be considered ‘applied macroeconomic/growth theory’. Nonetheless, we saw this work as particularly important and influential, for the reasons mentioned here: it considers tradeoffs between positive and negative consequences of AI; it appears in ‘economics of effective altruism and longtermism’ syllabi; and it has nearly 500 citations.

We are also grateful for the extremely diligent work of the evaluators. My impression (from my own experience, from discussions, and given the incentives we have in place) is that referees and colleagues rarely read and check the math and proofs in their peers’ papers. Here Philip Trammell did so and spotted an error in the proof of one of the paper’s central results (the ‘singularity’ in Example 3). Thankfully, he was able to communicate with the authors and work out a corrected proof of the same result (see “Growth given Cobb-Douglas Automation” at philiptrammell.com, currently linked here).

The authors have acknowledged this error (and a few smaller bugs), confirmed the revised proof, and link a marked-up version on their page. This is ‘self-correcting research’, and it’s great!

Even though the same result was preserved, I believe this checking provides a valuable service:

  1. Readers of the paper who saw the incorrect proof (particularly students) might be deeply confused. They might think: ‘Can I trust this paper’s other statements?’ ‘Am I deeply misunderstanding something here? Am I not suited for this work?’ Personally, this happened to me a lot in graduate school; at least some of the time it may have been because of errors and typos in the papers themselves.

  2. I suspect that many math-driven papers contain flaws that are never spotted, and these may sometimes affect the substantive results (unlike in the present case).

Again, I’m grateful to the present authors for being willing to put their work through this public checking, and for acknowledging and correcting the errors. I now have more confidence that the paper’s results are valid and that the authors have confidence in their work. This makes their research output more credible overall, and it sets a great example for the field.

Evaluators were asked to follow the general guidelines available here. For this paper, we did not give specific suggestions on ‘which aspects to evaluate’.

In addition to written evaluations (similar to journal peer review), we ask evaluators to provide quantitative metrics on several aspects of each article. These are put together below.

Metrics

Metrics: data format (github gist)

Link

Data format: we aim to enable analysis of these ratings through code.
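As a rough illustration of the kind of code-based analysis this format is meant to support, here is a minimal sketch in Python using pandas. It simply hard-codes the ratings from the table below into a small dataframe and computes a summary; the column names and structure are our own illustrative assumptions, not the actual schema of the linked gist.

```python
# Minimal sketch of analyzing the evaluators' ratings programmatically.
# Column names and structure are illustrative assumptions; the linked
# gist's actual schema may differ.
import pandas as pd

ratings = pd.DataFrame(
    {
        "category": [
            "Overall assessment",
            "Advancing knowledge and practice",
            "Methods: justification, reasonableness, validity, robustness",
            "Logic & communication",
            "Open, collaborative, replicable",
            "Relevance to global priorities",
        ],
        "eval1_rating": [80, 75, 80, 70, 95, 90],
        "eval2_rating": [92, 97, 70, 45, 80, 92],
    }
)

# Example summary: mean rating per category across the two evaluators.
ratings["mean_rating"] = ratings[["eval1_rating", "eval2_rating"]].mean(axis=1)
print(ratings[["category", "mean_rating"]])
```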

Metrics: table format

| Rating category | Eval. 1 (Seth Benzell): Rating (0-100) | Eval. 1: 90% CI (0-100)* | Eval. 2 (Philip Trammell): Rating (0-100) | Eval. 2: 90% CI (0-100)* | Comments (fn) |
| --- | --- | --- | --- | --- | --- |
| Overall assessment | 80 | (70, 90) | 92 | (80, 100) | |
| Advancing knowledge and practice | 75 | (65, 85) | 97 | (80, 100) | |
| Methods: Justification, reasonableness, validity, robustness | 80 | (75, 85) | 70 | (40, 90) | 1 |
| Logic & communication | 70 | (60, 80) | 45 | (30, 70) | 2 |
| Open, collaborative, replicable* | 95 | (90, 100) | 80 | — | 3 |
| Relevance to global priorities | 90 | (85, 100) | 92 | (80, 100) | 4 |

*Evaluation Manager (David Reinstein): Evaluator 2 wrote “80?” in his rating here; see comment column footnote.

Predictions

| Prediction metric | Eval. 1 (Seth Benzell): Rating (0-5, low to high) | Eval. 1: 90% CI (0-5)* | Eval. 1: Comments (fn) | Eval. 2 (Philip Trammell): Rating (0-5) | Eval. 2: Conf. (0-5)* (high = 5, low = 0) | Eval. 2: Comments (fn) |
| --- | --- | --- | --- | --- | --- | --- |
| What ‘quality journal’ do you expect this work will be published in? (0 = lowest/none, 5 = highest/best) | 5 | — | 6 | — | — | — |
| On a ‘scale of journals’, what ‘quality of journal’ should this be published in? (0 = lowest/none, 5 = highest/best) | 4 | (3.5, 5.0) | 7 | 5 | 4** | 8 |

**Evaluation Manager’s note (David Reinstein): Evaluator 2 indicated “Medium-high” confidence. I am interpreting this as a confidence level of 4 on our scale.
