In today’s fast-paced digital world, cybersecurity is a paramount concern for organisations of all sizes. To address this, the Center for Internet Security (CIS) has developed a comprehensive set of guidelines known as CIS Benchmarks (https://www.cisecurity.org/cis-benchmarks).

These benchmarks give security recommendations and best practices to help organisations secure their systems and networks. They cover a wide range of topics, including operating systems, applications, cloud services, and more, with detailed configuration guidance for reducing vulnerabilities and mitigating risks. CIS Benchmarks are widely recognised and trusted across industries, making them an invaluable resource for organisations striving to protect their digital assets.

While CIS Benchmarks offer valuable insights, they are published in PDF format, which is awkward to work with when you want to extract or review the data.

Looking for a more efficient solution, I wrote a Python script that converts CIS benchmark PDFs into CSV format. The script automatically extracts the relevant information and transforms it into a structured, easily manageable format. In applying the CIS benchmarks to my own working environment, I have found CSV a much easier format to share and work with.

The script itself looks for keywords, such as the titles of sections and sub-sections. It saves everything that follows a keyword into a defined list, or a ‘bucket’ as I have tended to think of it. When it comes across the title of the next section, it writes all of its saved ‘buckets’ to the CSV file.
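To make the bucket idea concrete, here is a minimal sketch of that approach, not the actual script. It assumes the PDF text has already been extracted into a list of lines, that section titles look like "1.1.1 Ensure ...", and that the sub-section keywords are Description, Rationale, Audit and Remediation. The names TITLE_RE, KEYWORDS, parse_benchmark and write_csv are my own for illustration.

```python
import csv
import io
import re

# Assumed patterns: a title such as "1.1.1 Ensure ...", and the
# sub-section keywords that each open a new bucket.
TITLE_RE = re.compile(r"^(\d+(?:\.\d+)+)\s+(.+)$")
KEYWORDS = ["Description", "Rationale", "Audit", "Remediation"]


def parse_benchmark(lines):
    """Group already-extracted PDF text lines into one dict per section."""
    rows = []
    current = None  # buckets for the section being read
    bucket = None   # name of the bucket currently being filled

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        title = TITLE_RE.match(line)
        if title:
            # The next section title closes the previous section.
            if current is not None:
                rows.append(current)
            current = {"Number": title.group(1), "Title": title.group(2)}
            current.update({key: [] for key in KEYWORDS})
            bucket = None
        elif line.rstrip(":") in KEYWORDS:
            # A keyword line switches to that bucket.
            bucket = line.rstrip(":")
        elif current is not None and bucket is not None:
            # Anything else goes into the bucket that is currently open.
            current[bucket].append(line)

    if current is not None:
        rows.append(current)
    return rows


def write_csv(rows, handle):
    """Write one CSV row per section, joining each bucket into one cell."""
    writer = csv.DictWriter(handle, fieldnames=["Number", "Title"] + KEYWORDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {k: " ".join(v) if isinstance(v, list) else v for k, v in row.items()}
        )


# Made-up sample text standing in for lines pulled out of a benchmark PDF.
sample = """\
1.1.1 Ensure example setting is enabled
Description:
This setting controls an example feature.
Rationale:
Enabling it reduces exposure.
1.1.2 Ensure another setting is disabled
Description:
Another example.
""".splitlines()

out = io.StringIO()
write_csv(parse_benchmark(sample), out)
print(out.getvalue())
```

Each section becomes one row, with one column per bucket. A real benchmark PDF also has page headers, footers and wrapped lines to deal with, which this sketch ignores.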

The script is saved here at my GitHub page.

I’m an amateur coder, so if anyone has any suggestions for how to improve or streamline the code, I’m all ears!