Facebook open-sources a static analyzer for Python code

Need a tool to check your Python-based applications for security issues? Facebook has open-sourced Pysa (Python Static Analyzer), a tool that looks at how data flows through the code and helps developers prevent data flowing into places it shouldn’t.


How the Python Static Analyzer works

Pysa is a security-focused tool built on top of Pyre, Facebook’s performant type checker for Python.

“Pysa tracks flows of data through a program. The user defines sources (places where important data originates) as well as sinks (places where the data from the source shouldn’t end up),” Facebook security engineer Graham Bleaney and software engineer Sinan Cepel explained.

“Pysa performs iterative rounds of analysis to build summaries to determine which functions return data from a source and which functions have parameters that eventually reach a sink. If Pysa finds that a source eventually connects to a sink, it reports an issue.”
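The idea of iterative summaries can be illustrated with a toy model. The sketch below is not Pysa's implementation. It assumes a heavily simplified program in which each function either returns the result of one other function or passes its parameter to one other function. All function names in it (get_request_data, run_shell, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Function:
    """A toy function description (an assumption of this sketch, not Pysa's model)."""
    name: str
    returns_result_of: Optional[str] = None  # callee whose result this function returns
    passes_param_to: Optional[str] = None    # callee that receives this function's parameter


def build_summaries(functions, sources, sinks):
    """Repeat rounds of analysis until no summary changes (a fixpoint)."""
    returns_source = set(sources)  # functions that return data from a source
    reaches_sink = set(sinks)      # functions whose parameter eventually reaches a sink
    changed = True
    while changed:
        changed = False
        for fn in functions:
            if fn.returns_result_of in returns_source and fn.name not in returns_source:
                returns_source.add(fn.name)
                changed = True
            if fn.passes_param_to in reaches_sink and fn.name not in reaches_sink:
                reaches_sink.add(fn.name)
                changed = True
    return returns_source, reaches_sink


def find_issues(call_sites, returns_source, reaches_sink):
    """Report call sites where source-returning data is handed to a sink-reaching function."""
    return [(producer, consumer) for producer, consumer in call_sites
            if producer in returns_source and consumer in reaches_sink]


# Listed out of order on purpose, so more than one round is needed.
functions = [
    Function("load_username", returns_result_of="read_form_field"),
    Function("read_form_field", returns_result_of="get_request_data"),
    Function("run_report", passes_param_to="execute"),
    Function("execute", passes_param_to="run_shell"),
    Function("format_greeting"),
]
sources = {"get_request_data"}  # where important data originates
sinks = {"run_shell"}           # where that data should not end up

# Each pair means: the result of the first function is passed to the second.
call_sites = [
    ("load_username", "run_report"),
    ("load_username", "format_greeting"),
]

returns_source, reaches_sink = build_summaries(functions, sources, sinks)
issues = find_issues(call_sites, returns_source, reaches_sink)
print(issues)
```

In this sketch only the call from load_username to run_report is reported, because that is the only place where a function summarized as returning source data meets one summarized as reaching a sink.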

Facebook uses it internally to check the Python code that powers Instagram’s servers, and to do so quickly. It checks developers’ proposed code changes for security and privacy issues to prevent them from being introduced into the codebase, and it also detects issues that already exist there.

The issues it finds are flagged and, depending on their type, the report is sent either to the developer or to security engineers for review.

Pysa is freely available as open source, and you can use a number of already developed definitions to help it find security issues.

“Because we use open source Python server frameworks such as Django and Tornado for our own products, Pysa can start finding security issues in projects using these frameworks from the first run. Using Pysa for frameworks we don’t already have coverage for is generally as simple as adding a few lines of configuration to tell Pysa where data enters the server,” the two engineers added.

The tool’s limitations and stumbling blocks

Pysa can’t detect all security or privacy issues, only those related to data flow. What’s more, it can’t detect every data flow–related issue, because Python is a very flexible and dynamic language: code can be imported dynamically, the behavior of a function call can be changed at runtime, and so on.
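To see why this dynamism is hard for any static tool, consider the short sketch below. It is only an illustration, and the helper call_by_name and the Greeter class are hypothetical. The function that actually runs is decided by strings or by a reassignment at runtime, so it cannot always be determined by reading the code alone.

```python
import importlib


def call_by_name(module_name, function_name, argument):
    # The module and function are chosen by strings that may only be known at runtime.
    module = importlib.import_module(module_name)
    function = getattr(module, function_name)
    return function(argument)


class Greeter:
    def greet(self, name):
        return f"Hello, {name}"


def shout(self, name):
    return name.upper()


# Dynamic import and lookup: the callee is not visible as a direct call in the source.
dynamic_result = call_by_name("os.path", "basename", "/tmp/report.txt")

# Changing what a call does: the same call site behaves differently after reassignment.
greeter = Greeter()
before = greeter.greet("ada")
Greeter.greet = shout
after = greeter.greet("ada")

print(dynamic_result, before, after)
```

If the strings passed to call_by_name came from user input or a configuration file, a static analyzer would have no reliable way to know which function receives the data.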

Finally, those who use it have to make a choice about how many false positives and false negatives they will tolerate.

“Because of the importance of catching security issues, we built Pysa to avoid false negatives and catch as many issues as possible. Reducing false negatives, however, may require trade-offs that increase false positives. Too many false positives could in turn cause alert fatigue and risk real issues being missed in the noise,” the engineers explained.

The number of false positives can be reduced by using sanitizers, along with manually added and automatic features.
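As a rough illustration of what a sanitizer does, the toy sketch below treats a flow as a list of function names running from a source to a sink, and drops any flow that passes through a function declared as a sanitizer. This is an assumption-laden simplification rather than Pysa's actual mechanism, and the function names are hypothetical.

```python
def unsanitized_flows(flows, sanitizers):
    """Keep only the flows that never pass through a declared sanitizer."""
    return [flow for flow in flows
            if not any(step in sanitizers for step in flow)]


# Each flow lists the functions data passes through, from source to sink.
flows = [
    ["get_request_data", "render_page"],
    ["get_request_data", "escape_html", "render_page"],
]
sanitizers = {"escape_html"}  # functions assumed to make the data safe

reported = unsanitized_flows(flows, sanitizers)
print(reported)
```

Here only the first flow would still be reported. The second is treated as safe because the data is escaped before it reaches the sink, which is one way a would-be false positive is avoided.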
