The Personal Rasmus Lerdorf

Imagine that you had Rasmus Lerdorf sitting next to you. Rasmus can comment PHP code for you.

This proof of concept uses 443 PHP libraries from GitHub to train a sequence-to-sequence deep learning model with TensorFlow.

Facts about the dataset

We prepared the data by writing PHP snippets and their comments into two files, snippets.dat and comments.dat. The files are aligned line by line, so line 615 in snippets.dat has its corresponding comment on line 615 in comments.dat:

Preview of data
Read "Obtaining the data yourself" if you are interested in how we got this data.
  • A total of 34,105 PHP files in the dataset
  • 33,116 (97.10%) had accompanying PSR Docblocks that could be parsed
  • 989 files (2.90%) were skipped due to incomplete or missing PSR Docblocks
  • The resulting dataset is 181,937 rows (88 MB)
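
The line-by-line alignment between snippets.dat and comments.dat can be illustrated with a minimal Python sketch. This is not code from this repository: the function name `load_pairs` is hypothetical, and the sketch assumes both files are UTF-8 text with exactly one record per line.

```python
def load_pairs(snippets_path, comments_path):
    """Pair line N of the snippets file with line N of the comments file.

    Hypothetical helper for illustration; assumes UTF-8 text files
    with one record per line.
    """
    with open(snippets_path, encoding="utf-8") as f:
        snippets = [line.rstrip("\n") for line in f]
    with open(comments_path, encoding="utf-8") as f:
        comments = [line.rstrip("\n") for line in f]

    # The pairing only makes sense if both files have the same number of lines.
    if len(snippets) != len(comments):
        raise ValueError(
            f"line count mismatch: {len(snippets)} snippets "
            f"vs {len(comments)} comments"
        )
    return list(zip(snippets, comments))
```

With the files described above, `load_pairs("snippets.dat", "comments.dat")[614]` would hold the snippet and comment from line 615, since Python lists are zero-indexed.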

Mining the data

Test