Technologies – Italian C++ Community

First steps with Boost.Python

stefano — Wed, 02 Dec 2015 18:12:00 +0000

“Finally, a more modern and functional language”

Who among us wouldn’t want to program in a multi-paradigm, highly expressive, fast-evolving language with a huge standard library? We are talking, of course, about… Python.

There are cases where our usual champion (C++11) is not the best choice. For a prototype to be developed in a hurry, a throwaway script, the server side of a web application, research code… the complexity of C++ is more of a burden than an advantage.

How can we keep exploiting the efficiency of C++, or reuse existing code, without looking like old-fashioned cavemen?

The Python interpreter can load modules written in C and compiled as dynamic libraries. Boost.Python helps enormously in preparing them. Let’s combine the power of Boost and C++ with the simplicity of Python.

Warning: even though all the examples compile, run and pass the tests, this is not the definitive guide to Boost.Python. The code is illustrative and reflects only our (limited) experience with Boost.Python. Do not hesitate to report errors.

A speed problem

Let’s look at a (not too) practical case. Some numbers are equal to the sum of their proper divisors (6 = 3 + 2 + 1; perfect numbers). The marketing department has smelled a business opportunity, but it is essential to compute as many as possible before the competition. Python’s development speed is the winning weapon: after 5 minutes we release Pefect 1.0®:

def trova_divisori(numero):
	divisori = []
	for i in range(1, numero):
		if numero % i == 0:
			divisori.append(i)
	return divisori


def perfetto(numero):
	divisori = trova_divisori(numero)
	return numero == sum(divisori)


def trova_perfetti(quanti_ne_vuoi):
	trovati = 0
	numero_da_provare = 1
	while (trovati < quanti_ne_vuoi):
		if perfetto(numero_da_provare):
			print numero_da_provare
			trovati += 1
		numero_da_provare += 1


if __name__ == "__main__":
	trova_perfetti(4) # Look for more at your own risk.
                        # The wait will be long...

This code is not perfectly “pythonic” (https://www.python.org/dev/peps/pep-0008/), but it really was written, tested and debugged in the time we usually spend reading one compilation error¹.

Too bad the execution time is comparable: 6.5 seconds on my test machine (which is not yours, not the production server, not the PC of the Python fanboy for whom everything runs in a picosecond… it’s an example!).

Like good engineers, let’s look for the bottleneck with the profiler:

import cProfile

... same code as before ...

if __name__ == "__main__":
	cProfile.run('trova_perfetti(4)')

Here is the result:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.420    7.420 :1()
     8128    0.709    0.000    7.326    0.001 purePython-profiler.py:15(perfetto)
        1    0.095    0.095    7.420    7.420 purePython-profiler.py:19(trova_perfetti)
     8128    5.190    0.001    6.523    0.001 purePython-profiler.py:8(trova_divisori)
    66318    0.819    0.000    0.819    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     8128    0.514    0.000    0.514    0.000 {range}
     8128    0.094    0.000    0.094    0.000 {sum}

trova_divisori “steals” almost all of the 6.5 seconds!

boost::python

Nobody denies that efficient code can be written in Python (Java, VisualWhatever, this week’s functional language…), but optimizing the algorithm of trova_divisori is out of the question: we want to show Boost.Python, not give an algebra lesson.

First of all, we get hold of Boost.Python. On a Debian-based Linux machine it is as simple as:

sudo apt-get install libboost-all-dev

You may also need to install Python’s “dev” packages. Instructions for every platform are easy to find online, but installing (and compiling) can be the hardest part. Don’t get discouraged.

This is the C++ code:

#include "boost/python.hpp"  // (1)
#include <stdint.h>          // uint64_t

boost::python::list trovaDivisori(uint64_t numero) // (2)
{
	boost::python::list divisori;
	for (uint64_t i = 1; i < numero; ++i)  // (3)
		if (numero % i == 0)
			divisori.append(i);
	return divisori;
}

BOOST_PYTHON_MODULE(divisori)
{
    using namespace boost::python;
    def("trova_divisori", trovaDivisori);  // (4)
}

(1) Include Boost.Python. It must come before any other header to avoid compilation warnings.
(2) The function equivalent to the one we want to replace in Python. We keep the same signature as the Python original (takes an integer, returns a list) to make the replacement “transparent”.
(3) The algorithm is exactly the same too. Only the syntax changes, and not by much. In this case the whole difference probably comes from running compiled C++ instead of the interpreted loop.
(4) Declare the function in the Python module with “def” (…just like in Python).

The guide (http://www.boost.org/doc/libs/1_59_0/libs/python/doc/) explains all the details very clearly.

Unfortunately, compiling is not exactly trivial, and you will probably have to adapt it case by case. Let’s go through the example one step at a time (it is a single command line, of course):

g++ divisori.cpp			    compile a C++ file, nothing unusual here
 -o divisori.so  			    output file name: Python requires it to match the module name
-I /usr/include/python2.7/	            add Python's headers (Boost is already in my include path)
-l python2.7 -lboost_python -lboost_system  link against Python and Boost
-shared -fPIC -Wl,-export-dynamic           ask for a dynamic library

stackoverflow.com will do the rest. Note that, for the sake of fairness, we are not using g++’s optimization options.

Once our library is somewhere Python can find it (for example the current directory or a directory listed in PYTHONPATH), we can import it in the Python code:

from divisori import trova_divisori

def perfetto(numero):
	divisori = trova_divisori(int(numero)) # Now calls the C++ one
	return numero == sum(divisori)

… same code as before …

Execution time: a little under one second. We are witnessing the classic “80% of the time is wasted in 20% of the code”. The same algorithm is 6 times faster, yet the only part where we spent time on low-level programming (after all, it is still C++98!) is a single function. For everything else we can still enjoy the convenience of Python.

A few more possibilities

Boost.Python is not limited to converting primitive types and wrapping Python lists in a C++ adapter. Here is a selection of “typical” cases for those who program in “C with classes”:

class RiutilizzabileInPython 
{
	public:
		RiutilizzabileInPython() {};
		RiutilizzabileInPython(int x, const std::string& y) {};
		int variabileIstanza;
		static void metodoStatico() {};
		void metodo() {}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;
    class_<RiutilizzabileInPython>("implementata_in_CPP")	//(1)
	.def(init<int, std::string>())				//(2)
	.def_readwrite("variabile_istanza", &RiutilizzabileInPython::variabileIstanza)//(3)
	.def("metodo_statico", &RiutilizzabileInPython::metodoStatico).staticmethod("metodo_statico") //(4)
	.def("metodo", &RiutilizzabileInPython::metodo)		// (5)
    ;
}

(1) Open the class declaration, passing the string with the Python name.
(2) Expose the constructor to Python (…init, ring a bell?).
(3) Python “tradition” does not frown on public instance variables. Here is one.
(4) Just repeat the Python name to expose a static method.
(5) The classic, simple instance method.

Once compiled (…easier said than done…) we can use the C++ class in Python:

from oop import implementata_in_CPP

x = implementata_in_CPP()
y = implementata_in_CPP(3, "ciao")
x.variabile_istanza = 23
implementata_in_CPP.metodo_statico()
x.metodo()

Boost takes care of converting parameters, return types and so on. There are options for directly “exporting” STL classes (and if one is missing you can define it) and for return-value policies (by reference, by copy…). The possibilities are many; rely on the official guide.

When the going gets tough, Boost keeps going. A taste:

class Problems
{
	public:
		void stampa() {
			std::cout << "cout continua a funzionare" << std::endl;
		}

		void eccezione() {
			throw std::runtime_error("Oh, no!!!");
		}

		void coreDump() {
			int * nullPointer = 0;
			*nullPointer = 24;
		}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;

     class_<Problems>("Problems")
	.def("stampa", &Problems::stampa)
	.def("eccezione", &Problems::eccezione)
	.def("coreDump", &Problems::coreDump)
    ;
}

The Python “test driver”, with a sample of the output:

from oop import Problems
p = Problems()
p.stampa()
try:
	p.eccezione()
except RuntimeError as e:
	print "Il codice C++ non ha funzionato: " + str(e);
p.coreDump()

cout continua a funzionare				(1)
Il codice C++ non ha funzionato: Oh, no!!!	        (2)
Segmentation fault (core dumped)			(3)

(1) Debugging with std::cout is not good practice… but it works!
(2) Exceptions are correctly “forwarded” to the Python runtime.
(3) …you thought you would get away with it, didn’t you?

Multithreading

Boost.Python is not the only weapon for problems that demand efficiency. Multithreaded code is a common way to increase performance, whether for finding divisors, mining Bitcoin or cracking passwords. Here is a C++ class that is about to jump into a Python thread.

class JobTrovaDivisori {

	public:
		JobTrovaDivisori(uint64_t numero, uint64_t begin, uint64_t end) :
			numero(numero), begin(begin), end(end) {}
		
		boost::python::list trovaDivisori()
		{
			std::cout << "Start" << std::endl;

			boost::python::list divisori;
			for (uint64_t i = begin; i < end; ++i)
				 if (numero % i == 0)
					divisori.append(i);

			std::cout << "end" << std::endl;
			return divisori;
		}

	private:
		uint64_t numero;
		uint64_t begin;
 		uint64_t end;
};

BOOST_PYTHON_MODULE(fattorizzare)
{
    using namespace boost::python;
    class_<JobTrovaDivisori>("JobTrovaDivisori", init<uint64_t, uint64_t, uint64_t>())
	.def("trova_divisori", &JobTrovaDivisori::trovaDivisori)
    ;
}

The “JobTrovaDivisori” object checks whether the numbers between “begin” and “end” are divisors of “numero”. We parallelize the problem of finding all the divisors into several “jobs”, using each object on a different interval. There is no shared data, so we have no concurrency problems. That is the only positive note of this solution, but once again we leave mathematics (and software engineering) aside.

The Python call:

from threading import Thread
from fattorizzare import JobTrovaDivisori

class Job():							# (1)
	def __init__(self, numero, begin, end):
		self.cppJob = JobTrovaDivisori(numero, begin, end)
		self.divisori = []
	
	def __call__(self):
		self.divisori = self.cppJob.trova_divisori()

		
def trova_divisori_parallelo(numero):			# (2)
	limite = numero / 2

	job1 = Job(numero, 1, limite)
	job2 = Job(numero, limite, numero)

	t1 = Thread(None, job1)
	t2 = Thread(None, job2)
	
	t1.start()
	t2.start()
	t1.join()
	t2.join()

	return [job1.divisori, job2.divisori]


if __name__ == "__main__":
	print trova_divisori_parallelo(223339244);	#(3)

(1) Wrap the C++ job so we “don’t complicate our lives” trying to export a C++ callable.
(2) This function creates 2 jobs, performs the “fork and join” (or, as they say today, “map and reduce”), then returns the results for the main block to print.
(3) Factor an arbitrary number.

Here is the output: remember the “Start” and “end” printouts in the C++ class? After about eight and a half seconds the computation finishes, with no parallelism at all:

Start
end
Start
end
[[1L, 2L, 4L, 53L, 106L, 212L, 1053487L, 2106974L, 4213948L, 55834811L], [111669622L]]

This is no accident. Python objects are protected by the Global Interpreter Lock (GIL). It is up to the programmer of each thread to release it and give the “green light” to the other threads. The catch is that you must not call pure Python code while you do not hold the lock.

As usual in C++, we manage resources with RAII. The idiom for the GIL is (https://wiki.python.org/moin/boost.python/HowTo#Multithreading_Support_for_my_function):

class ScopedGILRelease
{
public:
    inline ScopedGILRelease(){
        m_thread_state = PyEval_SaveThread();
    }
    inline ~ScopedGILRelease(){
        PyEval_RestoreThread(m_thread_state);
        m_thread_state = NULL;
    }
private:
    PyThreadState * m_thread_state;
};

Release the lock in the C++ class:

boost::python::list trovaDivisori() {
	ScopedGILRelease noGil; // (1)
	std::cout << "Start" << std::endl;
		
	boost::python::list divisori;
	for (uint64_t i = begin; i < end; ++i)
		 if (numero % i == 0)  
			divisori.append(i); // (2) Possible core dump!
	std::cout << "end" << std::endl;
	return divisori;
}

(1) When this variable goes out of scope the lock is re-acquired, as if it were a “reversed” smart pointer.
(2) This is where we will get the core dump. But only in production.

Remember the clause “you must not call pure Python code while you do not hold the lock”? Line (2) may do exactly that. Try growing the list out of all proportion (for example, remove the “if (numero…” and store every number in the list). I believe that, probably (rely on the official guides for the real answer!), the Python interpreter has to allocate a bigger list, but without the lock something gets corrupted.

Let’s enclose the parallelizable section in a separate scope, saving the numbers in a variable that is not shared with Python:

boost::python::list trovaDivisori() {
	std::cout << "Start" << std::endl;
	std::vector<uint64_t> divisoriTemp;
	{
		ScopedGILRelease noGil;
		for (uint64_t i = begin; i < end; ++i)
			 if (numero % i == 0) 
				divisoriTemp.push_back(i);
		std::cout << "end" << std::endl;
	} // noGil goes out of scope. We take the lock back.
	boost::python::list divisori;
	BOOST_FOREACH(uint64_t n, divisoriTemp) {
		divisori.append(n);
	}
	return divisori;
}

After six and a half seconds (2 fewer than the “accidentally sequential” version) we get the expected interleaving (Start Start – end end). We can spend those 2 seconds thinking about a less makeshift solution.

This concludes the introduction to Boost.Python. We now know a way to “fit” C++ modules into Python applications, both for reuse and for efficiency. Boost.Python connects the two worlds without sacrificing Python’s simplicity and without limiting what C++ can do, although some care is needed. Above all, from now on we will have the last word in the classic “Python vs C++” flame war on every forum in the world!

¹ It is true that writing a whole program in Python takes less time than fixing a single C++ bug.

Try it. Ready, set, go:

/usr/include/c++/4.8/bits/stl_map.h:646:7: note: no known conversion for argument 1 from 
‘int’ to ‘std::map, std::basic_string<
;char> > >::iterator {aka std::_Rb_tree_iterator, std::basic_string > > >}’
/usr/include/c++/4.8/bits/stl_map.h:670:9: note: template void std::map<_Key, _Tp, _Compare, _Alloc>::insert(_InputIterator, 
_InputIterator) [with _InputIterator = _InputIterator; _Key = int; _Tp = 
std::map, std::basic_string >; _Compare 
= std::less; _Alloc = std::allocator, std::basic_string > > >

First steps with Boost.Python

stefano — Wed, 02 Dec 2015 18:11:41 +0000

“Finally a modern, pragmatic language.”

Who among us wouldn’t want to work with a multi-paradigm, highly expressive, fast-evolving language with a huge standard library? We are talking, of course, about… Python.

There are scenarios where our trusty champion (C++11) doesn’t cut it. For a prototype to rush out in a hurry, a “single use” script, the server side of a web application, research code… the complexity of C++ is more a problem than an asset.

How can we continue to take advantage of C++ efficiency or re-use some already available code without looking like old-fashioned cavemen?

The Python interpreter can load modules written in C and compiled as dynamic libraries. Boost.Python helps a lot in preparing them. It joins the power of Boost and C++ with the ease of use of Python.

Warning: even though all the examples compile, run and pass the tests, this is not the ultimate guide to Boost.Python. The code is meant as an example and mirrors our (minimal) experience with Boost.Python. Do not hesitate to report any error we made.

A speed problem

Let’s see a (not too) practical use case. Some numbers are equal to the sum of their proper divisors (6 = 3 + 2 + 1; perfect numbers). The marketing department believes this is something hot, but we must compute as many perfect numbers as possible and release them before our competitors. The development speed enabled by Python is key: after 5 minutes we release Pefect 1.0®:

def find_divisors(number):
	divisors = []
	for i in range(1, number):
		if number % i == 0:
			divisors.append(i)
	return divisors


def perfect(number):
	divisors = find_divisors(number)
	return number == sum(divisors)


def find_perfect_numbers(how_many):
	found = 0
	number_to_try = 1
	while (found < how_many):
		if perfect(number_to_try):
			print number_to_try
			found += 1
		number_to_try += 1


if __name__ == "__main__":
	find_perfect_numbers(4)  # Look for more at your own risk.
							 # And prepare for a long wait.

This code is not really “pythonic” (https://www.python.org/dev/peps/pep-0008/), but it really was created, tested and debugged in less time than it takes to read a C++ compilation error¹.

Unfortunately the execution time is similar: 6.5 seconds on my test machine (which is not your test machine, nor the production server, nor the Python fanboy’s PC which can run everything in a picosecond… it’s an example!).

Let’s look for the bottleneck with the profiler, like the savvy engineers we are.

import cProfile

... same code as before ...

if __name__ == "__main__":
	cProfile.run("find_perfect_numbers(4)")

Here is the outcome:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.657    5.657 :1()
     8128    0.283    0.000    5.582    0.001 purePython.py:16(perfect)
        1    0.075    0.075    5.657    5.657 purePython.py:21(find_perfect_numbers)
     8128    4.294    0.001    5.229    0.001 purePython.py:8(find_divisors)
    66318    0.528    0.000    0.528    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     8128    0.406    0.000    0.406    0.000 {range}
     8128    0.070    0.000    0.070    0.000 {sum}

find_divisors “steals” almost all of the 5.6 seconds it took to run this test!
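The hot spot can also be extracted programmatically instead of reading the table by eye. The following is a minimal sketch, not part of the original post: it uses the standard cProfile and pstats modules, while hottest_function and count_perfect_below are hypothetical helper names introduced here. It avoids the print statement so that it also runs on Python 3.

```python
import cProfile
import pstats


def find_divisors(number):
    divisors = []
    for i in range(1, number):
        if number % i == 0:
            divisors.append(i)
    return divisors


def perfect(number):
    return number == sum(find_divisors(number))


def count_perfect_below(limit):
    # Hypothetical workload: the same functions as above, bounded by a limit.
    return sum(1 for n in range(1, limit) if perfect(n))


def hottest_function(func, *args):
    # Profile one call and return the name of the function with the
    # largest "tottime" (time spent inside the function itself).
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]
```

Calling hottest_function(count_perfect_below, 2000) should point at find_divisors, which matches the profiler table above.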

boost::python

No-one denies that it is possible to write efficient code in Python (Java, VisualWhatever, this week’s functional language…), but optimizing the algorithm of find_divisors is out of the question: we are here to show off Boost.Python, not to give an algebra lesson.

First of all, we get our hands on Boost.Python. On a Debian-based Linux box this is as easy as typing:

sudo apt-get install libboost-all-dev

You may need to install Python’s “dev” packages. It is easy to find instructions for any platform over the web, but installing (and compiling) the library may be the most difficult step. Do not lose heart.

This is the C++ code:

#include "boost/python.hpp"  // (1)
#include <stdint.h>          // uint64_t

boost::python::list findDivisors(uint64_t number) // (2)
{
	boost::python::list divisors;
	for (uint64_t i = 1; i < number; ++i)  // (3)
		if (number % i == 0)
			divisors.append(i);
	return divisors;
}

BOOST_PYTHON_MODULE(divisors)
{
    using namespace boost::python;
    def("find_divisors", findDivisors);  // (4)
}

(1) Include Boost.Python. It must be included before any other header to avoid compilation warnings.
(2) The function corresponding to the one we want to replace in Python. It keeps the same signature as the Python original (takes an integer, returns a list) so that the replacement is “transparent”.
(3) The logic is exactly the same too, with just a few syntax differences. In this case the difference probably comes entirely from running compiled C++ instead of the interpreted loop.
(4) Declare the function in the Python module with “def” (…hey, it’s just like Python).

The guide (http://www.boost.org/doc/libs/1_59_0/libs/python/doc/) explains all the details clearly.

Compiling, sadly, is not so easy, and you will probably have to adapt the command to your setup. Let’s go through an example step by step (naturally, this is a single line on the console):

g++ divisors.cpp			    compile a C++ file, as usual
 -o divisors.so  			    file name: Python demands it is the same as the module name
-I /usr/include/python2.7/	            to include Python's headers (I already set boost in the path)
-l python2.7 -lboost_python -lboost_system  link against Python and Boost
-shared -fPIC -Wl,-export-dynamic           request to create a dynamic library

stackoverflow.com will cover the rest. Notice that, to level the playing field, I do not use optimization options in g++.
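Part of the adaptation can be delegated to the interpreter itself, which knows where its headers live. This is only a sketch under assumptions: build_command is a hypothetical helper, the Boost library name "boost_python" is taken from the command above (many distributions add a version suffix), and only the interpreter-dependent flags are reproduced. Add the remaining flags from the command above as your platform requires.

```python
import sysconfig


def build_command(source, module, boost_lib="boost_python"):
    # Header directory of the interpreter running this script.
    include_dir = sysconfig.get_paths()["include"]
    # Extension suffix such as ".so"; older interpreters may not define it.
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return [
        "g++", source,
        "-o", module + suffix,   # the file name must match the module name
        "-I", include_dir,
        "-shared", "-fPIC",
        "-l" + boost_lib,        # assumption: adjust to your Boost package
    ]
```

The returned list can be passed to subprocess.call, or joined with spaces and pasted on the console.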

Once our library is somewhere Python can find it (for example the current directory or a directory listed in PYTHONPATH), we can import it in Python:

from divisors import find_divisors

def perfect(number):
	divisors = find_divisors(int(number))  # Calls the C++ implementation
	return number == sum(divisors)

… same code as before …

Run time: a bit less than a second. We are witnessing the classic “80% of time is wasted by 20% of the code”. The same algorithm is 6 times faster, but the part where we had to deal with low level programming (yes, still C++98!) is just one function. Everywhere else we can still take advantage of Python’s practicality.
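Because the C++ function keeps the Python signature, the script can be written so that it works with or without the compiled module. The sketch below is an assumption about how one might wire this up, not something the original post does: it tries the divisors extension first and falls back to a pure-Python function with the same name.

```python
try:
    # C++ extension built in this article, if it is on the module search path.
    from divisors import find_divisors
except ImportError:
    # Pure-Python fallback with the same signature (integer in, list out).
    def find_divisors(number):
        return [i for i in range(1, number) if number % i == 0]


def perfect(number):
    return number == sum(find_divisors(int(number)))
```

The rest of the program does not need to know which implementation was picked up, which is the “transparent” replacement in practice.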

Some more opportunities

Boost.Python is not limited to primitive types conversion or adapters to pass Python lists in C++. Here is a selection of “common” cases often met when doing “C with classes”:

class ReuseInPython 
{
	public:
		ReuseInPython() {};
		ReuseInPython(int x, const std::string& y) {};
		int instanceVariable;
		static void staticMethod() {};
		void method() {}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;
    class_<ReuseInPython>("implemented_in_CPP")		// (1)
	.def(init<int, std::string>())  // (2)
	.def_readwrite("instance_variable", &ReuseInPython::instanceVariable)  // (3)
	.def("static_method", &ReuseInPython::staticMethod).staticmethod("static_method")  // (4)        
	.def("method", &ReuseInPython::method)  // (5)
    ;
}

(1) Open the class declaration, passing a string with its Python name.
(2) Expose the constructor to Python (…init, does that ring a bell?).
(3) Python tradition won’t balk at public instance variables. Here is one.
(4) Just repeat the Python name to expose a static method.
(5) The run-of-the-mill, basic instance method.

Once it is compiled (…easier said than done…) we can use the C++ class in Python:

from oop import implemented_in_CPP

x = implemented_in_CPP()
y = implemented_in_CPP(3, "hello")
x.instance_variable = 23
implemented_in_CPP.static_method()
x.method()

Boost takes care of converting parameters, return types etcetera. There are options to “export” directly STL classes (and more can be defined if something is missing) and for the return type policy (by reference, by copy…). There are really many options, trust the official guide.
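To see what each binding line corresponds to, it may help to look at a pure-Python class with the same surface. This is a hypothetical stand-in written for illustration, not generated by Boost.Python: the real class converts and checks argument types, which this sketch does not, and the initial value 0 is an assumption (the C++ member is left uninitialized).

```python
class implemented_in_CPP(object):
    """Pure-Python stand-in for the class exposed by the oop module."""

    def __init__(self, x=None, y=None):
        # (2) covers both C++ constructors: no arguments, or (int, string).
        # (3) def_readwrite: a plain public attribute.
        self.instance_variable = 0

    @staticmethod
    def static_method():
        # (4) callable on the class itself, without an instance.
        pass

    def method(self):
        # (5) ordinary instance method.
        pass
```

Such a stand-in can also be handy for testing the Python side on a machine where the extension has not been built yet.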

When the going gets tough, Boost keeps going. A sample:

class Problems
{
	public:
		void print() {
			std::cout << "cout still works" << std::endl;
		}

		void exception() {
			throw std::runtime_error("Oh, no!!!");
		}

		void coreDump()	{
			int * nullPointer = 0;
			*nullPointer = 24;
		}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;

    class_<Problems>("Problems")
	.def("print_something", &Problems::print)  // print is a Python keyword.
	.def("exception", &Problems::exception)
	.def("coreDump", &Problems::coreDump)
    ;
}

The Python “test-driver”, with an example of the output:

from oop import Problems
p = Problems()
p.print_something()
try:
	p.exception()
except RuntimeError as e:
	print "The C++ code bombed: " + str(e);
p.coreDump()

cout still works	(1)
The C++ code bombed: Oh, no!!!	(2)
Segmentation fault (core dumped)	(3)

(1) Debugging with std::cout is not a recommended practice… but it works!
(2) Exceptions are correctly “forwarded” to the Python runtime.
(3) …well, what did you expect?
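A segmentation fault takes the whole interpreter down, so no try/except can catch it. One way to keep a test driver alive, sketched here under the assumption of a POSIX system, is to run the risky call in a child interpreter and inspect its exit status. run_isolated and crashed are hypothetical helper names.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    # Run a Python snippet in a child interpreter and return its exit status.
    # On POSIX a negative value -N means the child was killed by signal N.
    return subprocess.call([sys.executable, "-c", code])


def crashed(status):
    # True when the child died from a segmentation fault.
    return status == -signal.SIGSEGV
```

With the module from this section, crashed(run_isolated("from oop import Problems; Problems().coreDump()")) would be expected to report the crash while the parent process keeps running.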

Multithreading

Boost.Python is not the only weapon to tackle problems that demand efficiency. Multithreading is a common way to improve performance, whether for computing divisors, mining bitcoins or cracking passwords. Here is a C++ class which is about to jump into a Python thread:

class JobFindDivisors {

	public:
		JobFindDivisors(uint64_t number, uint64_t begin, uint64_t end) :
			number(number), begin(begin), end(end) {}
		
		boost::python::list findDivisors()
		{
			std::cout << "Start" << std::endl;

			boost::python::list divisors;
			for (uint64_t i = begin; i < end; ++i)
				 if (number % i == 0)
					divisors.append(i);

			std::cout << "end" << std::endl;
			return divisors;
		}

	private:
		uint64_t number;
		uint64_t begin;
 		uint64_t end;
};

BOOST_PYTHON_MODULE(factor)
{
    using namespace boost::python;
    class_<JobFindDivisors>("JobFindDivisors", init<uint64_t, uint64_t, uint64_t>())
	.def("find_divisors", &JobFindDivisors::findDivisors)
    ;
}

The “JobFindDivisors” object checks if the numbers between “begin” and “end” are divisors of “number”. We parallelize the problem of finding all the divisors in many “jobs”, dedicating each object to a different interval. No data is shared between jobs, there are no concurrency problems. This is the only advantage of such a solution, but once again let’s forget about math (and proper software engineering).

The Python call:

from threading import Thread
from factor import JobFindDivisors

class Job():									# (1)
	def __init__(self, number, begin, end):
		self.cppJob = JobFindDivisors(number, begin, end)
		self.divisors = []
	
	def __call__(self):
		self.divisors = self.cppJob.find_divisors()

		
def find_divisors_in_parallel(number):			# (2)
	limit = number / 2

	job1 = Job(number, 1, limit)
	job2 = Job(number, limit, number)

	t1 = Thread(None, job1)
	t2 = Thread(None, job2)
	
	t1.start()
	t2.start()
	t1.join()
	t2.join()

	return [job1.divisors, job2.divisors]


if __name__ == "__main__":
	print  find_divisors_in_parallel(223339244); # (3)

(1) Wrap the C++ job to “keep it simple”, without exporting a C++ callable.
(2) This function creates 2 jobs, does “fork and join” (or, as they say nowadays, “map and reduce”), then returns the results for the main block to print.
(3) Any number would do for factoring.
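The two-way split above generalizes to any number of jobs. The helpers below are a sketch with hypothetical names (split_range, merge_results): the first cuts a half-open interval into contiguous pieces, one per job, and the second flattens the per-job lists into a single sorted list.

```python
def split_range(begin, end, parts):
    # Split [begin, end) into `parts` contiguous half-open intervals
    # whose lengths differ by at most one.
    step, extra = divmod(end - begin, parts)
    intervals = []
    start = begin
    for index in range(parts):
        stop = start + step + (1 if index < extra else 0)
        intervals.append((start, stop))
        start = stop
    return intervals


def merge_results(chunks):
    # Flatten the per-job lists into one sorted list of divisors.
    return sorted(d for chunk in chunks for d in chunk)
```

Each (start, stop) pair would be passed to one Job(number, start, stop), and merge_results would be applied to the divisors attributes after the joins.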

The output: do you remember the “Start” and “end” printouts in the C++ class? After around 8 seconds the computation terminates, with no parallelism whatsoever:

Start
end
Start
end
[[1L, 2L, 4L, 53L, 106L, 212L, 1053487L, 2106974L, 4213948L, 55834811L], [111669622L]]

Working as designed. Python’s objects are protected by the Global Interpreter Lock (GIL). It is up to the programmer to release it in each thread to “give way” to the other threads. The trick is to call pure Python code only when holding the lock.

As usual in C++ we control resources with RAII. The idiom for the GIL is (https://wiki.python.org/moin/boost.python/HowTo#Multithreading_Support_for_my_function):

class ScopedGILRelease
{
public:
    inline ScopedGILRelease(){
        m_thread_state = PyEval_SaveThread();
    }
    inline ~ScopedGILRelease(){
        PyEval_RestoreThread(m_thread_state);
        m_thread_state = NULL;
    }
private:
    PyThreadState * m_thread_state;
};

Release the lock in the C++ class:

boost::python::list findDivisors() {
	ScopedGILRelease noGil;  // (1)
	std::cout << "Start" << std::endl;

	boost::python::list divisors;
	for (uint64_t i = begin; i < end; ++i)
		 if (number % i == 0)
			divisors.append(i);  // (2) Possible core dump!

	std::cout << "end" << std::endl;
	return divisors;
}

(1) When this variable goes out of scope, the lock is taken again, like a “reversed” smart pointer.
(2) Here is where we will certainly take a core dump. But only in production.

Do you remember that “the trick is to call pure Python code only when holding the lock”? Line (2) may do just that, without the lock. You can try to grow the list massively (say, remove the “if (number…” and save all the numbers in the list). I believe that, maybe (please read the official documents for the real answer!), the Python interpreter must allocate a bigger list, but without the lock all it gets is corrupted memory. Creating and returning the boost::python::list also touches Python objects, so those steps need the lock as well.

Let’s encapsulate the parallelizable section in a dedicated scope, saving the numbers in a variable which we do not share with Python:

boost::python::list findDivisors()
{
	std::cout << "Start" << std::endl;
	std::vector<uint64_t> divisorsTemp;
	boost::python::list divisors;
	{
		ScopedGILRelease noGil;
		for (uint64_t i = begin; i < end; ++i)
			if (number % i == 0)
				divisorsTemp.push_back(i);
	} // noGil goes out of scope, we take the lock again.
	BOOST_FOREACH(uint64_t n, divisorsTemp) {
		divisors.append(n);
	}
	std::cout << "end" << std::endl;
	return divisors;
}

After six and a half seconds (2 fewer than the “accidentally sequential” version) we get the expected interleaving (Start Start – end end). We can invest those 2 seconds in thinking of a less duct-tape-and-chewing-gum-oriented solution.
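The same interleaving can be reproduced without any C++, because some standard library calls release the GIL while they wait. The sketch below is an illustration only: run_two_jobs and blocking_job are hypothetical names, and time.sleep stands in for the section of C++ code guarded by ScopedGILRelease.

```python
import threading
import time


def blocking_job():
    # time.sleep releases the GIL while waiting, so the other thread can run.
    time.sleep(0.3)


def run_two_jobs(job):
    # Start two threads that log "Start", run the job, then log "end".
    log = []

    def worker():
        log.append("Start")
        job()
        log.append("end")

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

With a job that gives up the lock, both threads log “Start” before either logs “end”, which is the pattern the fixed C++ class produces.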

This completes the introduction to Boost.Python. Now we know how to “plug” C++ modules into Python applications, either for reuse or for efficiency. Boost.Python connects the two worlds without sacrificing Python’s simplicity and without adding constraints to C++, even if some spots do need care. Above all, from now on we will always have the last word in the unavoidable “Python vs C++” flame war in every forum in the world.

¹ It is true: it takes less time to create a whole program in Python than to fix a single bug in C++.

Try it. Ready, steady, go:

/usr/include/c++/4.8/bits/stl_map.h:646:7: note: no known conversion for argument 1 from 
‘int’ to ‘std::map, std::basic_string<
;char> > >::iterator {aka std::_Rb_tree_iterator, std::basic_string > > >}’

/usr/include/c++/4.8/bits/stl_map.h:670:9: note: template void std::map<_Key, _Tp, _Compare, _Alloc>::insert(_InputIterator, _InputIterator) [with _InputIterator = _InputIterator; _Key = int; _Tp = std::map, std::basic_string >; _Compare = std::less; _Alloc = std::allocator, std::basic_string > > >

Cat: a C++14 functional library

Nicola Bonelli — Wed, 29 Apr 2015 14:20:36 +0000

The rise of functional programming has affected many programming languages, and C++ could not escape it. Techniques such as partial application (via currying) and function composition are now a reality in C++ too, as the spread of libraries like FIT and FTL shows.

Cat is a C++14 library, inspired by Haskell. Cat aims at pushing the functional programming approach in C++ to another level.

The added value of Cat is twofold. On the one hand, it fills gaps in the language with respect to functional programming. For this purpose it provides utility functions and classes (callable wrappers with partial application, sections, utilities for tuples, extended type traits, alternative forwarding functions, etc.).

On the other hand, Cat promotes generic programming with type classes, inspired by category theory. The library includes a framework for building type classes, about a dozen ready-made ones (Functor, Applicative, Monoids, Monads, Read, Show, to mention just a few) and the related instances adapted to C++.

Cat is distributed under the MIT license and it’s available for download at the address https://cat.github.io.

Compilers and the ISO standard: a comparison table

Raffaele Rialdi — Sun, 02 Mar 2014 21:52:12 +0000

All compiler vendors have taken conformance to the ISO standard very seriously.
With the C++11 specification and the now imminent C++14, every developer can “lock away in a safe” their sources, guaranteeing them a long life that outlasts the compiler in use.
So this is not chasing fashion but a real investment that ensures better management of the software life cycle.

As the attached file shows, the two compilers that have already reached the goal are Clang and GCC, with only a few gaps.
On the Intel side, which traditionally cares a great deal about numerical libraries and parallelism support, I had the chance to talk directly with one of the compiler’s managers, who confirmed Intel’s intention to reach full C++11 and C++14 conformance.
As for Microsoft, the recent compiler CTP has already added many missing pieces, and Alessandro Contenti, special guest at our C++ track at the recent Community Days 2014, told us about the work in progress on the compiler that will soon bring it to cover the standards.

CppISO-Feb2014-r1

The table was created by yours truly and may contain errors. If you find one I will be glad to update it.

Enjoy,
Raffaele