The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.
I'll have a technical blog post out tomorrow with some analysis of the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.
The dataset consists of the following fields:
- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if one was found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to the arXiv taxonomy
- keywords: Keywords extracted from the abstract using GPT-4o
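For illustration, a single record might look like the sketch below. All values are placeholders I made up, and the name of the image field is my assumption (it isn't stated above):

```python
# Hypothetical example record; every value is a placeholder, not real data.
record = {
    "image": "<first-page image (e.g. a PIL.Image)>",  # field name assumed
    "title": "An Example CVPR Paper Title",
    "authors_list": ["First Author", "Second Author"],
    "abstract": "A short abstract describing the paper...",
    "arxiv_link": "https://arxiv.org/abs/0000.00000",  # placeholder id
    "other_link": None,  # project page, if one was found
    "category_name": "cs.CV",  # primary arXiv category
    "all_categories": ["cs.CV", "cs.LG"],
    "keywords": ["example keyword", "another keyword"],  # from GPT-4o
}
```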
To build the dataset, I did the following:
- Scraped the CVPR 2024 website for accepted papers
- Used DuckDuckGo to search for a link to each paper's abstract on arXiv
- Used arxiv.py (a Python wrapper for the arXiv API) to extract the abstract and categories, and to download the PDF of each paper
- Used pdf2image to save an image of each paper's first page
- Used GPT-4o to extract keywords from the abstract
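The steps above can be sketched roughly as follows. This is not the author's actual code; the library calls (`duckduckgo_search`, `arxiv`, `pdf2image`) and the helper names are my assumptions about how such a pipeline could be wired together:

```python
# Sketch of the collection pipeline, assuming the duckduckgo_search,
# arxiv, and pdf2image packages; signatures reflect their public APIs.
import re


def arxiv_id_from_url(url: str):
    """Pull an arXiv id like '2403.12345' out of an abs/pdf URL."""
    m = re.search(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", url)
    return m.group(1) if m else None


def collect_paper(title: str) -> dict:
    """Build one dataset record from a scraped paper title."""
    from duckduckgo_search import DDGS        # pip install duckduckgo-search
    import arxiv                              # pip install arxiv
    from pdf2image import convert_from_path   # pip install pdf2image

    # 1) Search DuckDuckGo for the paper's arXiv page.
    with DDGS() as ddgs:
        hits = ddgs.text(f'"{title}" arxiv.org', max_results=5)
    arxiv_id = next(
        (i for i in (arxiv_id_from_url(h["href"]) for h in hits) if i), None
    )
    if arxiv_id is None:
        raise ValueError(f"No arXiv link found for {title!r}")

    # 2) Fetch metadata and the PDF via the arXiv API.
    result = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id])))
    pdf_path = result.download_pdf(filename=f"{arxiv_id}.pdf")

    # 3) Render page 1 of the PDF as an image.
    first_page = convert_from_path(pdf_path, first_page=1, last_page=1)[0]

    return {
        "image": first_page,
        "title": result.title,
        "authors_list": [a.name for a in result.authors],
        "abstract": result.summary,
        "arxiv_link": result.entry_id,
        "category_name": result.primary_category,
        "all_categories": result.categories,
        # keywords would then be extracted from result.summary with GPT-4o
    }
```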
I then used LangChain evaluators (with GPT-4 as judge) and tracked everything in LangSmith. The trace links are public, so you can inspect the runs.
I hope you find this helpful, and I'm certainly open to feedback, criticism, or ideas for improvement.