What is this?

This project makes it easy to analyze the Python ecosystem by providing all the code ever published to PyPI via git, parquet datasets with file metadata, and a set of tools to help analyze the data.

Thanks to the power of git, the contents of PyPI take up only 429.8 GB on disk, and thanks to tools like libcst, every Python file can be analyzed on a consumer-grade laptop in a few hours.
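
To give a feel for what per-file analysis can look like, here is a minimal sketch that counts which top-level packages a single source file imports. It uses the standard library's ast module as a dependency-free stand-in for libcst, and the function name count_imports and the sample input are made up for illustration; they are not part of this project's tooling.

```python
import ast
from collections import Counter


def count_imports(source: str) -> Counter:
    """Count the top-level package names imported by one Python source file.

    Hypothetical helper for illustration; it uses the stdlib ``ast`` module
    rather than libcst so that it runs without extra dependencies.
    """
    counts: Counter = Counter()
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # Code on PyPI spans many Python versions; skip files that do not parse.
        return counts
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are ignored.
            counts[node.module.split(".")[0]] += 1
    return counts


example = "import os\nimport numpy.linalg\nfrom os import path\nfrom . import sibling\n"
print(count_imports(example))
```

Summing the returned counters across many files would give a rough picture of which packages are imported most often. Files that fail to parse are skipped rather than raising, which matters when the input covers code written for many different Python versions.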

Download all the code
Explore the datasets


Stats for nerds 🤓

Total files: 1.31 billion (93,488,818 unique)
Total lines of text: 417.4 billion (417,393,577,752 to be precise)
Total uncompressed size: 75.6 TiB (that is ~56,620,475.407 floppy disks)
Lines of code added per second: 5,481 (in the month starting 2024-02-01)
Click here for lots more stats!