Is it feasible to compile Python to machine code?



Answers

As @Greg Hewgill says it, there are good reasons why this is not always possible. However, certain kinds of code (like very algorithmic code) can be turned into "real" machine code.

There are several options:

  • Use Psyco, which emits machine code dynamically. You should choose carefully which methods/functions to convert, though.
  • Use Cython, which is a Python-like language that is compiled into a Python C extension
  • Use PyPy, which has a translator from RPython (a restricted subset of Python that does not support some of the most "dynamic" features of Python) to C or LLVM.
    • PyPy is still highly experimental
    • not all extensions will be present

After that, you can use one of the existing packages (freeze, Py2exe, PyInstaller) to put everything into one binary.

All in all: there is no general answer for your question. If you have Python code that is performance-critical, try to use as much builtin functionality as possible (or ask a "How do I make my Python code faster" question). If that doesn't help, try to identify the code and port it to C (or Cython) and use the extension.

Question

How feasible would it be to compile Python (possibly via an intermediate C representation) into machine code?

Presumably it would need to link to a Python runtime library, and any parts of the Python standard library which were Python themselves would need to be compiled (and linked in) too.

Also, you would need to bundle the Python interpreter if you wanted to do dynamic evaluation of expressions, but perhaps a subset of Python that didn't allow this would still be useful.

Would it provide any speed and/or memory usage advantages? Presumably the startup time of the Python interpreter would be eliminated (although shared libraries would still need loading at startup).




Jython has a compiler targeting JVM bytecode. The bytecode is fully dynamic, just like the Python language itself! Very cool. (Yes, as Greg Hewgill's answer alludes, the bytecode does use the Jython runtime, and so the Jython jar file must be distributed with your app.)







PyPy is a project to reimplement Python in Python, using compilation to native code as one of the implementation strategies (others being a VM with JIT, using JVM, etc.). Their compiled C versions run slower than CPython on average but much faster for some programs.

Shedskin is an experimental Python-to-C++ compiler.

Pyrex is a language specially designed for writing Python extension modules. It's designed to bridge the gap between the nice, high-level, easy-to-use world of Python and the messy, low-level world of C.




This doesn't compile Python to machine code. But allows to create a shared library to call Python code.

If what you are looking for is an easy way to run Python code from C without relying on execp stuff. You could generate a shared library from python code wrapped with a few calls to Python embedding API. Well the application is a shared library, an .so that you can use in many other libraries/applications.

Here is a simple example which create a shared library, that you can link with a C program. The shared library executes Python code.

The python file that will be executed is pythoncalledfromc.py:

# -*- encoding:utf-8 -*-
# this file must be named "pythoncalledfrom.py"

def main(string):  # args must a string
    print "python is called from c"
    print "string sent by «c» code is:"
    print string
    print "end of «c» code input"
    return 0xc0c4  # return something

You can try it with python2 -c "import pythoncalledfromc; pythoncalledfromc.main('HELLO'). It will output:

python is called from c
string sent by «c» code is:
HELLO
end of «c» code input

The shared library will be defined by the following by callpython.h:

#ifndef CALL_PYTHON
#define CALL_PYTHON

void callpython_init(void);
int callpython(char ** arguments);
void callpython_finalize(void);

#endif

The associated callpython.c is:

// gcc `python2.7-config --ldflags` `python2.7-config --cflags` callpython.c -lpython2.7 -shared -fPIC -o callpython.so

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <python2.7/Python.h>

#include "callpython.h"

#define PYTHON_EXEC_STRING_LENGTH 52
#define PYTHON_EXEC_STRING "import pythoncalledfromc; pythoncalledfromc.main(\"%s\")"


void callpython_init(void) {
     Py_Initialize();
}

int callpython(char ** arguments) {
  int arguments_string_size = (int) strlen(*arguments);
  char * python_script_to_execute = malloc(arguments_string_size + PYTHON_EXEC_STRING_LENGTH);
  PyObject *__main__, *locals;
  PyObject * result = NULL;

  if (python_script_to_execute == NULL)
    return -1;

  __main__ = PyImport_AddModule("__main__");
  if (__main__ == NULL)
    return -1;

  locals = PyModule_GetDict(__main__);

  sprintf(python_script_to_execute, PYTHON_EXEC_STRING, *arguments);
  result = PyRun_String(python_script_to_execute, Py_file_input, locals, locals);
  if(result == NULL)
    return -1;
  return 0;
}

void callpython_finalize(void) {
  Py_Finalize();
}

You can compile it with the following command:

gcc `python2.7-config --ldflags` `python2.7-config --cflags` callpython.c -lpython2.7 -shared -fPIC -o callpython.so

Create a file named callpythonfromc.c that contains the following:

#include "callpython.h"

int main(void) {
  char * example = "HELLO";
  callpython_init();
  callpython(&example);
  callpython_finalize();
  return 0;
}

Compile it and run:

gcc callpythonfromc.c callpython.so -o callpythonfromc
PYTHONPATH=`pwd` LD_LIBRARY_PATH=`pwd` ./callpythonfromc

This is a very basic example. It can work, but depending on the library it might be still difficult to serialize C data structures to Python and from Python to C. Things can be automated somewhat...

Nuitka might be helpful.

Also there is numba but they both don't aim to do what you want exactly. Generating a C header from Python code is possible, but only if you specify the how to convert the Python types to C types or can infer that information. See python astroid for a Python ast analyzer.




Pyrex is a subset of the Python language that compiles to C, done by the guy that first built list comprehensions for Python. It was mainly developed for building wrappers but can be used in a more general context. Cython is a more actively maintained fork of pyrex.




Related