LABIC - Bioinformatics and Computational Intelligence Laboratory
Local Repository of Research Datasets
UTFPR-ABCD: Amazon Book Covers Dataset
- Objective:
The dataset was built from the Amazon online bookstore. All the
relationships in it were obtained by visiting Amazon's website and
recording its product recommendations, which in this dataset are
books. The dataset was created to enable data mining studies that relate the
features of book covers to the books' popularity and sales.
- Data Description:
The dataset contains over 180 million relationships between a pool of almost 6
million objects. These relationships are a result of visiting Amazon and
recording the product recommendations that it provides. For our approach, we
filtered the products in the book category and carried out the following
steps:
- Convert the data from JSON (JavaScript Object Notation) format to strict
JSON, which MongoDB can import directly.
- Import the data into a non-relational database (MongoDB was chosen) so that
the data can be queried.
- Select only books with related purchase information (also bought and/or
bought together).
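The conversion and selection steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' actual pipeline: it assumes the raw records are stored one per line as Python-literal-style dictionaries (for example with single-quoted strings), and that the keys inside `related` are named `also_bought` and `bought_together`. The function names are hypothetical, and the MongoDB import step is left out.

```python
import ast
import json


def to_strict_json(line):
    """Parse one loosely formatted record and re-serialize it as strict JSON.

    Assumption: the raw line is a Python-literal-style dict, which
    ast.literal_eval can parse safely (it never executes code).
    """
    record = ast.literal_eval(line)
    return json.dumps(record)


def is_book(record):
    """True if any category path of the record mentions 'Books'."""
    return any("Books" in path for path in record.get("categories", []))


def has_purchase_info(record):
    """True if the record lists also-bought and/or bought-together products.

    Assumption: the key names 'also_bought' and 'bought_together'.
    """
    related = record.get("related", {})
    return bool(related.get("also_bought") or related.get("bought_together"))


def select_books(lines):
    """Yield records that are books and carry purchase relationships."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(to_strict_json(line))
        if is_book(record) and has_purchase_info(record):
            yield record
```

Each line returned by `to_strict_json` is a standalone strict JSON document, so the converted lines can be written to a file, one per line, before being loaded into the database.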
The total number of books obtained up to this point was 59173. A sample of the
dataset can be seen in the Figure below, where:
- asin - ID of the product, e.g. 0000031852;
- title - name of the product;
- price - price in US dollars (at time of crawl);
- imUrl - URL of the product image;
- related - related products (also bought, also viewed, bought
together, buy after viewing);
- salesRank - sales rank information;
- brand - brand name;
- categories - list of categories the product belongs to.
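As a rough aid for working with these records, the field list above can be written down as a typed structure. The field names come from the list; the value types (for instance `price` as a float, or `related` and `salesRank` as dictionaries) are assumptions made for illustration, and `BookRecord` and `missing_fields` are hypothetical names.

```python
from typing import Any, Dict, List, TypedDict


class BookRecord(TypedDict, total=False):
    """Fields of one book record; value types are assumed, not confirmed."""
    asin: str                       # product ID, e.g. "0000031852"
    title: str                      # name of the product
    price: float                    # price in US dollars at time of crawl
    imUrl: str                      # URL of the product image
    related: Dict[str, List[str]]   # relation type -> related product IDs
    salesRank: Dict[str, int]       # sales rank information
    brand: str                      # brand name
    categories: List[Any]           # categories the product belongs to


EXPECTED_FIELDS = tuple(BookRecord.__annotations__)


def missing_fields(record):
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

A check such as `missing_fields(record)` can help spot incomplete records before analysis, since not every product is guaranteed to carry every field.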
In addition, color and object features of the book covers were extracted
using colorgram (Python) and the YOLO neural network, respectively. These
features were included in our database, as is also shown in the Figure below.
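The color extraction step might look roughly like the sketch below. It assumes the `colorgram.py` package, whose `colorgram.extract(image, n)` call returns color objects with `rgb` and `proportion` attributes. The number of colors, the output layout, and the function names are illustrative assumptions, not the settings used to build the dataset. Object detection with YOLO is not shown.

```python
def extract_cover_colors(image_path, n_colors=6):
    """Extract the dominant colors of one cover image.

    Assumption: colorgram.py is installed (pip install colorgram.py).
    n_colors=6 is an arbitrary choice for this sketch.
    """
    import colorgram  # imported lazily so the helper below works without it
    return colorgram.extract(image_path, n_colors)


def colors_to_features(colors):
    """Convert color objects into plain dicts that can be stored in a database.

    Each input object is expected to expose .rgb (with .r, .g, .b) and
    .proportion, as colorgram's color objects do.
    """
    return [
        {
            "rgb": [color.rgb.r, color.rgb.g, color.rgb.b],
            "proportion": round(color.proportion, 4),
        }
        for color in colors
    ]
```

For a locally stored cover image, `colors_to_features(extract_cover_colors(path))` would give a list of dictionaries that can be attached to the corresponding book record.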

- Link to the dataset:
(it will be available for download soon)
- Related Papers:
- Brenda C. S. Berno, Ademir C. Gabardo, Leandro T. Hattori, Andrei
S. Inácio, Matheus Gutoski, Heitor S. Lopes. A Framework for Analyzing Book
Covers and Co-purchases using Object Detection and Data Mining Methods.
Proc. IEEE Latin-American Conference on Computational Intelligence,
2019.