ICLR 2026 – Institutional Affiliations Dataset and Analysis

TL;DR

A new dataset records institutional affiliations for ICLR 2026 accepted papers, extracted from the paper PDFs rather than from author profiles. It aims to attribute papers more accurately than profile-based methods and comes with a visual map of where the research originates. It offers a new view of AI research hubs, although some parts of the data processing are still being refined.

The dataset has been released and gives a detailed, PDF-based view of institutional affiliations in AI research. Its goal is to attribute research output to institutions using what is printed on each paper, not what an author's profile currently says. It was compiled by an automated pipeline, covers 5,356 papers, and includes visualizations of the research landscape.

The dataset was created by processing the PDFs of all accepted ICLR 2026 papers and extracting author affiliations directly from each document's title block. This avoids a common problem with profile-based affiliation data known as "profile drift," where an author's current employer is incorrectly attached to past publications. The pipeline successfully parsed 96% of papers; the remaining 4% fall back to OpenReview profiles. It produces several data files, including a list of institutions ranked by unique paper count, first-author contributions, and fractional credit.
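
To make the three ranking criteria concrete, here is a minimal Python sketch of how unique paper counts, first-author counts, and fractional credit could be tallied. The input layout, the function name count_institutions, and the fractional rule (one credit per paper, split equally among authors and then among each author's affiliations) are assumptions for illustration; the article does not state the project's exact definitions.

```python
from collections import defaultdict


def count_institutions(papers):
    """Tally three hypothetical ranking metrics per institution.

    `papers` is a list of dicts; each has an "authors" key holding a list of
    authors in byline order, and each author is a list of institution names.
    """
    unique = defaultdict(int)        # papers with at least one author from the institution
    first_author = defaultdict(int)  # papers whose first author lists the institution
    fractional = defaultdict(float)  # assumed rule: 1 credit per paper, split evenly

    for paper in papers:
        authors = paper["authors"]
        if not authors:
            continue

        # Unique count: each institution is counted once per paper.
        seen = set()
        for affiliations in authors:
            seen.update(affiliations)
        for inst in seen:
            unique[inst] += 1

        # First-author count: every institution listed by the first author.
        for inst in set(authors[0]):
            first_author[inst] += 1

        # Fractional credit: equal share per author, split across that
        # author's affiliations. Authors with no affiliation contribute nothing.
        share = 1.0 / len(authors)
        for affiliations in authors:
            for inst in affiliations:
                fractional[inst] += share / len(affiliations)

    return dict(unique), dict(first_author), dict(fractional)


if __name__ == "__main__":
    sample = [
        {"authors": [["A"], ["A", "B"]]},
        {"authors": [["B"]]},
    ]
    unique, first_author, fractional = count_institutions(sample)
    print(unique, first_author, fractional)
```

Under this assumed rule, fractional credit rewards institutions that supply most of a paper's authors, while the unique count treats a single co-author the same as a full team.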

The primary visualization is a treemap of the top 50 institutions, sized by publication count, with region and the academia/industry distinction also shown. Institution names are normalized with around 250 regex rules so that spelling variants of the same institution are counted together. The project also provides sensitivity analyses that compare the different counting methods to assess how robust the institutional rankings are.
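
The normalization step can be pictured as an ordered list of pattern-to-name rules. The sketch below is a hypothetical miniature of that idea: the two rules, the canonical names, and the function normalize_institution are invented for illustration and are not taken from the project's actual rule set of around 250 entries.

```python
import re

# Hypothetical rules: (compiled pattern, canonical institution name).
# The first rule that matches wins, so more specific rules should come first.
RULES = [
    (re.compile(r"\bmassachusetts institute of technology\b|\bmit\b", re.IGNORECASE),
     "MIT"),
    (re.compile(r"\bcarnegie mellon\b|\bcmu\b", re.IGNORECASE),
     "Carnegie Mellon University"),
]


def normalize_institution(raw):
    """Map a raw affiliation string to a canonical name, if a rule matches."""
    text = " ".join(raw.split())  # collapse whitespace left over from PDF extraction
    for pattern, canonical in RULES:
        if pattern.search(text):
            return canonical
    return text  # no rule matched: keep the cleaned string as-is


if __name__ == "__main__":
    for raw in ["Massachusetts Institute of Technology, Cambridge, MA",
                "  CMU  ",
                "Example   Research Lab"]:
        print(repr(raw), "->", normalize_institution(raw))
```

Returning the cleaned string when nothing matches keeps unknown institutions visible in the output, which makes it easier to spot variants that still need a rule.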

Why It Matters

This dataset enhances the accuracy of analyzing research influence and collaboration patterns in AI by providing a more reliable attribution of institutional output. It enables researchers, policymakers, and institutions to better understand the distribution of research efforts, identify leading hubs, and track trends over time. The visualizations facilitate quick comprehension of the research landscape, which is valuable for strategic decisions and funding allocations.

Background

Before this project, affiliation data largely depended on author profiles or manual curation, and profile drift often made it inaccurate. The approach used for ICLR 2026 relies on PDF extraction, a method increasingly adopted for large-scale research analytics. Similar efforts have been made for other conferences, but this pipeline is notable for its scale and automation: it covers every accepted paper at a major AI conference.

“This pipeline provides a more accurate picture of who is shaping AI research right now, by extracting affiliations directly from PDFs rather than relying on potentially outdated profiles.”

— Dmytro Lopushanskyy, project lead

What Remains Unclear

It is not yet clear how the dataset will perform across future conferences or other fields, as the methodology is tailored to PDF layouts common in ICLR papers. Additionally, some affiliations still rely on profile data due to parsing failures, though these are a small minority. The long-term stability of the institution rankings and regional groupings remains to be validated as more data is accumulated over time.
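
The fallback behavior described above can be sketched as a simple preference order: use the PDF-extracted affiliations when parsing produced any, otherwise use the profile data, and record which source was used. The field names "pdf" and "profile" and both function names are assumptions for illustration, not the project's actual schema. If the reported 4% applies to all 5,356 papers, roughly 214 papers would take the fallback path.

```python
def resolve_affiliations(pdf_affiliations, profile_affiliations):
    """Prefer PDF-extracted affiliations; fall back to profile data."""
    if pdf_affiliations:
        return list(pdf_affiliations), "pdf"
    return list(profile_affiliations), "profile"


def fallback_share(papers):
    """Fraction of papers whose affiliations came from the profile fallback."""
    if not papers:
        return 0.0
    sources = [
        resolve_affiliations(p.get("pdf", []), p.get("profile", []))[1]
        for p in papers
    ]
    return sources.count("profile") / len(papers)


if __name__ == "__main__":
    sample = [
        {"pdf": ["A"], "profile": ["B"]},  # PDF parse succeeded, profile ignored
        {"pdf": [], "profile": ["C"]},     # PDF parse failed, profile used
    ]
    print(fallback_share(sample))
```

Keeping the source label next to each record would allow the rankings to be recomputed with the fallback papers excluded, which is one way to check how much they affect the results.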

What’s Next

Next steps include expanding the pipeline to other conferences, refining PDF parsing accuracy, and integrating the dataset into research analytics tools. Further analysis is expected to reveal more detailed collaboration networks and trends. The team plans to publish updates and potentially open-source the pipeline for broader use.

Key Questions

How does this dataset improve upon previous affiliation data?

It extracts affiliations directly from the PDFs, so each paper is attributed to the institutions printed on it at publication time. This avoids the inaccuracies caused by author profile drift and gives more precise institutional attribution.

Can this method be applied to other conferences?

Yes, the pipeline is designed to be adaptable, though adjustments may be needed for different document formats and layouts.

What are the main limitations of the current dataset?

Approximately 4% of papers rely on profile data due to parsing failures, and the methodology may need refinement for different types of PDFs or future conference formats.

How might this dataset influence AI research or policy?

By providing more accurate institutional data, it can inform funding decisions, collaboration strategies, and assessments of research impact across regions and institutions.
