Italian C++ Conference 2017
https://www.italiancpp.org/event/conference-2017/ (Sat, 17 Jun 2017)
The C++ conference, made in Italy
Saturday June 17, Milan

 

160+ attendees.

 

Wrap-Up Post

 

Videos

Pictures

 


 

 

 

A track entirely in English,
thanks to our 5 exceptional international guests:

Keynote speaker: Michael Wong

 


Agenda (videos and slides via the links)

| Time | Duration | Room 9 | Room 7 |
|---|---|---|---|
| 8.15 – 9.00 | 45' | Check-in | |
| 9.00 – 9.15 | 15' | Welcome Message, Marco Arena | - |
| 9.15 – 10.30 | 75' | Keynote: C++ executors to enable heterogeneous computing in tomorrow's C++ today (EN), Michael Wong | - |
| 10.30 – 11.00 | 30' | Coffee Break | |
| 11.00 – 12.00 | 60' | Quicker Sorting (EN), Dietmar Kühl | An overly simple C++ idiomatic pattern language for message-based product families (ITA), Carlo Pescio |
| 12.05 – 12.50 | 45' | Diversity and Inclusion in Microsoft (ITA), Paola Presutto | - |
| 12.50 – 14.00 | 70' | Lunch | |
| 13.20 – 13.40 | 20' | Bonus talk by JetBrains | - |
| 14.00 – 15.00 | 60' | Boost vs Qt: What Could They Learn From Each Other? (EN), Jens Weller | Lambda out: a simple pattern for generic output (ITA), Davide Di Gennaro |
| 15.10 – 16.10 | 60' | Monads for C++ (EN), Bartosz Milewski | Costruire un bridge C++ tra NodeJS e C# (ITA), Raffaele Rialdi |
| 16.10 – 16.40 | 30' | Coffee Break | |
| 16.40 – 17.40 | 60' | Functional C++ for Fun and Profit (EN), Phil Nash | Una libreria di rete asincrona scritta in C++ ispirata a Node.js (ITA), Stefano Cristiano |
| 17.40 – 18.00 | 20' | Closing Message, Marco Arena | - |

 

The Italian C++ Conference 2017 follows the Italian C++ Code of Conduct.
So please be professional and relaxed, so that everyone can enjoy the Italian C++ Conference 2017.

 

 

Sponsors


 

Partners

O’Reilly

Coroutines Internals
https://www.italiancpp.org/2016/11/02/coroutines-internals/ (Wed, 02 Nov 2016)

The article’s code is available on Github.

What are coroutines and why should I care?

In The Art of Computer Programming, Donald Knuth introduced coroutines as an alternative to the usual function caller/callee idiom, in which two pieces of code are treated as cooperating equals.
Coroutines can be thought of as language-level constructs that generalize subroutines by providing multiple exit/entry points. A normal subroutine has one starting point and one or more exit (return) points. A coroutine allows entering and exiting its control flow at different spots, which allows for greater code expressiveness, preservation of automatic state across calls, and non-preemptive multitasking.

It has to be noted that different programming languages can provide various levels of support for coroutines, e.g.

  • Languages supporting the yield keyword
  • Languages providing full support for async, await, yield

In this article we’ll focus on the former.
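To make “multiple entry/exit points” concrete before diving into the internals, here is a minimal sketch of a yield-style generator written with C++20’s co_yield (standardized well after this article was written; shown only for intuition, with a deliberately bare-bones generator type):

#include <coroutine>
#include <iostream>

struct generator {
    struct promise_type {
        int current = 0;
        generator get_return_object() {
            return generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(int v) { current = v; return {}; }  // exit point
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    generator(generator&& other) noexcept : handle(other.handle) { other.handle = {}; }
    ~generator() { if (handle) handle.destroy(); }

    bool next() { handle.resume(); return !handle.done(); }  // re-entry point
    int value() const { return handle.promise().current; }

    std::coroutine_handle<promise_type> handle;
};

generator fib() {
    int a = 0, b = 1;
    while (true) {          // the state (a, b) survives across suspensions
        co_yield b;
        int t = a + b; a = b; b = t;
    }
}

int main() {
    auto g = fib();
    for (int i = 0; i < 5 && g.next(); ++i)
        std::cout << g.value() << ' ';  // prints: 1 1 2 3 5
}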

Thoughtful use of coroutines can lead to cleaner and more maintainable code in a variety of situations. As a motivating example, consider the following pseudocode:

function start_accepting() {
  socket.async_accept(accept_handler);
}

function accept_handler() {
  // Serve this connection, then re-arm the accept for the next one
  socket.async_read(read_handler);
  start_accepting();
}

function read_handler(data) {
  request = parse_data(data);
  switch(request) {
    case SEND_DATA: {
      data_to_send = prepare_data_for(request);
      socket.async_write(data_to_send, write_handler);
    } break;
  };
}

function write_handler() {
  ... // continue execution
}

Asynchronous programming is often the preferred way of performing potentially blocking operations without stalling the thread on blocking calls. In the pseudocode above we’re assuming (and omitting for clarity’s sake) that all operations are queued and handled by an event loop manager (a common and powerful idiom in asynchronous application programming, cf. boost::asio).

Coroutines allow modeling the same behavior with more readable code:

coroutine acceptor() {
  while(true) {
    socket.async_accept_connection(yield); // To event manager
    start_coroutine(serve_request);
  }
}

coroutine serve_request() {
  socket.async_read(data, yield);
  request = parse_data(data);
  switch(request) {
    case SEND_DATA: {
      data_to_send = prepare_data_for(request);
      socket.async_write(data_to_send, yield);
      ... // Continue execution
    } break;
  };
}

 

The code in serve_request() uses a sequential-looking paradigm

Coroutines let you create a structure that mirrors the actual program logic. Asynchronous operations don’t split functions, because there are no handlers to define what should happen when an asynchronous operation completes. Instead of having handlers call each other, the program can use a sequential structure.

(boost.asio-coroutines)
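For reference, here is a minimal sketch of the same idea using real Boost.Asio stackful coroutines (assuming a reasonably recent Boost with io_context and boost::asio::spawn; the port number is arbitrary):

#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <array>
#include <memory>

namespace asio = boost::asio;
using asio::ip::tcp;

int main() {
    asio::io_context io;

    asio::spawn(io, [&](asio::yield_context yield) {
        tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 9000));
        for (;;) {
            tcp::socket socket(io);
            acceptor.async_accept(socket, yield);  // suspends the coroutine, not the thread

            auto s = std::make_shared<tcp::socket>(std::move(socket));
            asio::spawn(io, [s](asio::yield_context yield) {
                std::array<char, 1024> data;
                boost::system::error_code ec;
                // Sequential-looking reads/writes, no explicit handlers
                std::size_t n = s->async_read_some(asio::buffer(data), yield[ec]);
                if (!ec)
                    asio::async_write(*s, asio::buffer(data, n), yield[ec]);
            });
        }
    });

    io.run();
}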

 

Standard support

At the time of writing, coroutines didn’t make it into the next standard version (C++17), although recent MSVC versions already ship with an /await switch to test experimental coroutine support.

Coroutines internals

It is important to understand the role of coroutines in providing collaborative, non-preemptive multitasking: spawning a coroutine does not spawn a new thread of execution; rather, coroutines yield execution explicitly, either to their callers (asymmetric coroutines) or to other coroutines (symmetric coroutines).

Since coroutines have been known for a relatively long time, many different implementation techniques (both at language level and at system level) have been devised (an interesting suggested read: Duff’s-device-based coroutines).
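For a taste of the language-level end of that spectrum, here is a minimal sketch of the Duff’s-device technique (after Simon Tatham’s well-known “Coroutines in C”; the macro names are mine):

#define crBegin     static int state = 0; switch (state) { case 0:
#define crReturn(x) do { state = __LINE__; return x; case __LINE__:; } while (0)
#define crFinish    } state = 0

int next_value(void) {
    static int i;           // locals must be static: the stack frame is not preserved
    crBegin;
    for (i = 1; i <= 3; ++i)
        crReturn(i);        // "yield": the next call resumes right here
    crFinish;
    return -1;              // sequence exhausted
}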

Implementing basic support for asymmetric stackful coroutines can be a rewarding experience in terms of understanding how these program-flow constructs interact with their callers and with the underlying memory. Most of the code presented here is a pared-down version of the coroutine implementation by Oliver Kowalke (cf. boost::coroutine2), available with the Boost libraries.

Abstracting away the execution context

In order to implement a coroutine context switch (in its simplest form from the callee to the caller) we need a mechanism to abstract the execution state of our routines and to save/restore stack, registers, CPU flags, etc.

A context switch between threads on x86_64 can be quite costly in terms of performance, since it also involves a trip through kernel syscalls. Context switches between coroutines in the same thread are far more lightweight and require no kernel interaction. The foundation bricks for userland context switches are contained in the boost::context library. These facilities provide a solid abstraction offering ready-to-use stacks (either plain malloc’d stack buffers or even split stacks on supported architectures) to store our context data, and they can complement system-level constructs (e.g. ucontext on Unix systems, fibers on Windows). Note that boost::context used to rely on Windows fibers due to undocumented TEB-swapping related issues; after fixes were deployed, fiber support was dropped starting with the introduction of execution_context v2.

In this article we’ll go for an fcontext implementation in assembly on an x86_64 Unix system (no system calls involved).

Saving the state

When a coroutine yields, an fcontext switch should occur: we save whatever state the routine has at that point in time and JMP to another context. On a recent Unix system, calling conventions, object and executable file formats, and other low-level ABI concerns are defined by the System V ABI. On x86_64 the stack grows downwards, and parameters to functions are passed in registers rdi, rsi, rdx, rcx, r8, r9, plus additional stack space if needed. The stack is always 16-byte aligned before a call instruction is issued. Registers rbx, rsp, rbp, r12, r13, r14, and r15 are preserved across function calls, while rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11 are scratch registers:

| Return value | Parameter registers | Scratch registers | Preserved registers |
|---|---|---|---|
| rax, rdx | rdi, rsi, rdx, rcx, r8, r9 (+ additional stack) | rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11 | rbx, rsp, rbp, r12, r13, r14, r15 |

Therefore, following in boost::context’s footsteps, a reasonable memory layout is the following:

/****************************************************************************************
 *                                                                                      *
 *  ----------------------------------------------------------------------------------  *
 *  |    0    |    1    |    2    |    3    |    4     |    5    |    6    |    7    |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x0   |   0x4   |   0x8   |   0xc   |   0x10   |   0x14  |   0x18  |   0x1c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        R12        |         R13       |         R14        |        R15        |  *
 *  ----------------------------------------------------------------------------------  *
 *  ----------------------------------------------------------------------------------  *
 *  |    8    |    9    |   10    |   11    |    12    |    13   |    14   |    15   |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x20  |   0x24  |   0x28  |  0x2c   |   0x30   |   0x34  |   0x38  |   0x3c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        RBX        |         RBP       |         RIP        |       EXIT        |  *
 *  ----------------------------------------------------------------------------------  *
 *                                                                                      *
 ****************************************************************************************/

The EXIT field is unused for our purposes, but we keep it in place anyway.

The first thing we need is to allocate space to store the context data and make sure it has a valid alignment for the architecture we’re dealing with:

// Allocate context-stack space (64_Kb is a user-defined literal
// for 64 * 1024; a sketch of its definition follows below)
context_stack = (void*)malloc(64_Kb);

// Reserve room for the context data at the top of the stack,
// then align it to a 64-byte boundary
std::size_t space = UNIX_CONTEXT_DATA_SIZE + 64;
sp = static_cast<char*>(context_stack) + 64_Kb - space;

sp = std::align(64, UNIX_CONTEXT_DATA_SIZE, sp, space);
assert(sp != nullptr && space >= UNIX_CONTEXT_DATA_SIZE);
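For completeness, a sketch of the 64_Kb user-defined literal used above (the name follows this article’s convention; its definition is not shown in the original snippet):

constexpr std::size_t operator"" _Kb(unsigned long long kb)
{
    return kb * 1024;  // kilobytes to bytes
}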

boost::context offers both memory-efficient on-demand growing stacks and fixed stack allocations (cf. the Boost docs). In this example we’ll go for a fixed stack allocation.
Since we can’t deal with registers directly in C++, we’ll have to fall back on a pure assembly routine. The GAS backend seems the logical tool of choice for this work. We therefore define an external function, with C linkage, to link against our executable:

extern "C" fcontext_t jump_to_context(fcontext_t context);

What is an fcontext_t? In the x86_64 world it is just a register’s content:

using fcontext_t = void*;

Luckily for us, RIP will already have been set by the fact that we invoke jump_to_context with a CALL instruction, so we get an instruction pointer on the stack for free in our assembly code:

.text
.globl jump_to_context
.type jump_to_context,@function
.align 16
jump_to_context:
    pushq  %rbp  /* save RBP */
    pushq  %rbx  /* save RBX */
    pushq  %r15  /* save R15 */
    pushq  %r14  /* save R14 */
    pushq  %r13  /* save R13 */
    pushq  %r12  /* save R12 */

    /* omissis */

    /* restore RSP (pointing to context-data) from RDI */
    movq  %rdi, %rsp

    popq  %r12  /* restore R12 */
    popq  %r13  /* restore R13 */
    popq  %r14  /* restore R14 */
    popq  %r15  /* restore R15 */
    popq  %rbx  /* restore RBX */
    popq  %rbp  /* restore RBP */

    /* continue... */

.size jump_to_context,.-jump_to_context

/* Mark that we don't need executable stack. */
.section .note.GNU-stack,"",%progbits

Using CMake putting everything together becomes quite easy:

cmake_minimum_required(VERSION 2.8.12)
project(simple_crts CXX ASM)
set(CMAKE_CXX_STANDARD 14)

set_source_files_properties(jump_to_context_x86_64_elf_gas.S
                            PROPERTIES COMPILE_FLAGS "-x assembler-with-cpp")

add_executable(simple_crts simple_crts.cpp jump_to_context_x86_64_elf_gas.S)
# CONAN_LIBS assumes a Conan-based setup for the dependencies
target_link_libraries(simple_crts ${CONAN_LIBS} pthread)

Trampolines to coroutines

Something is missing at this point: we need a valid RIP pointer to the coroutine to jump to. We could enter the coroutine and have another function store this information for us, but there’s a better way which avoids cluttering the coroutine code entirely: using a trampoline function.

Just as in boost::context, we define a trampoline function ourselves which, when jumped to, re-jumps to the caller and saves its context as a pre-stage for the coroutine:

void trampoline(fcontext_t ctx) {

  yield_ctx = jump_to_context(ctx);

  wannabe_coroutine();
}

What we have to do now is a simplified version of the make_context routine to set the first RIP towards the trampoline’s prologue:

// Do make_context's work (simplified)
// Do *NOT* try this at home (or even worse in the office)
void** addr = reinterpret_cast<void**>(static_cast<char*>(sp) +
                                         UNIX_CONTEXT_DATA_RIP_OFFSET);
*addr = reinterpret_cast<void*>(&trampoline);

// In a more complex case there might be additional initialization and
// frame adjustments going on
coroutine_ctx = jump_to_context(sp);

So right now we have a valid trampoline RIP set in place:

/****************************************************************************************
 *                                                                                      *
 *  ----------------------------------------------------------------------------------  *
 *  |    0    |    1    |    2    |    3    |    4     |    5    |    6    |    7    |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x0   |   0x4   |   0x8   |   0xc   |   0x10   |   0x14  |   0x18  |   0x1c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        R12        |         R13       |         R14        |        R15        |  *
 *  ----------------------------------------------------------------------------------  *
 *  ----------------------------------------------------------------------------------  *
 *  |    8    |    9    |   10    |   11    |    12    |    13   |    14   |    15   |  *
 *  ----------------------------------------------------------------------------------  *
 *  |   0x20  |   0x24  |   0x28  |  0x2c   |   0x30   |   0x34  |   0x38  |   0x3c  |  *
 *  ----------------------------------------------------------------------------------  *
 *  |        RBX        |         RBP       |         RIP        |       EXIT        |  *
 *  ----------------------------------------------------------------------------------  *
 *                                           ^^^^^^^^^^^^^^^^^^^^                       *
 ****************************************************************************************/

This kickstarts the bouncing to/from the trampoline:

.text
.globl jump_to_context
.type jump_to_context,@function
.align 16
jump_to_context:
    pushq  %rbp
    pushq  %rbx 
    pushq  %r15 
    pushq  %r14 
    pushq  %r13 
    pushq  %r12 

    /* store RSP (pointing to context-data) in RAX */
    movq  %rsp, %rax

    movq  %rdi, %rsp

    popq  %r12 
    popq  %r13 
    popq  %r14 
    popq  %r15 
    popq  %rbx 
    popq  %rbp 

    /* restore return-address (must have been put on the new stack) */
    popq  %r8

    /*
       pass the old context as first parameter (if we're headed
       towards a landing function)
    */
    movq  %rax, %rdi

    /* indirect jump to new context */
    jmp  *%r8

.size jump_to_context,.-jump_to_context
.section .note.GNU-stack,"",%progbits

It is important to note that we keep the stack aligned during this entire process (recall that the stack has to be 16-byte aligned before a call instruction is issued).

The process roughly goes like this:

[Figure: coroutines_graph1, the jumps between the caller and the trampoline]

Note that the trampoline function might reserve stack space for its parameters as well. In the code above we allocated 64 Kb of heap space to be used as stack space for context operations, so after the first jump the sp automatic variable is no longer reliable; coroutine_ctx should be used instead.

Resuming fcontext

Resuming the trampoline’s fcontext requires another call, with a RIP save and a stack pointer adjustment to coroutine_ctx. The trampoline’s old RIP will be available for free after we’ve restored the first 48 bytes of the fcontext.

Execution can afterwards continue into the designated coroutine. At this point the coroutine should be encapsulated somehow so it can use the yield_ctx context pointer: that is the gateway to our caller’s context (in an asymmetric view).

Each time we want to yield execution back to the caller we’ll have to jump_to_context to the yield_ctx:

void yield() {
  yield_ctx = jump_to_context(yield_ctx);
}

void wannabe_coroutine() {
  std::cout << "I wanna be a coroutine when I grow my stack space up\n";
  yield();
  std::cout << "Hooray!\n";
  yield();
}

Notice that we’re also reassigning the variable with the return value provided by jump_to_context. This assignment is not executed until the control flow comes back to the yield() function:

.. save

/* store RSP (pointing to context-data) in RAX */
movq  %rsp, %rax

.. restore

This is a cooperative behavior example: each jump_to_context() invocation from this point onward actually returns fcontext data for the previous invocation.
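To see the cooperation from the other side, here is a sketch of the caller driving the coroutine with the same primitive (the names follow the article’s globals):

// First jump: enters the trampoline pre-stage / coroutine, runs until yield()
coroutine_ctx = jump_to_context(coroutine_ctx);
std::cout << "back in the caller\n";

// Next jump: resumes the coroutine right after its last yield()
coroutine_ctx = jump_to_context(coroutine_ctx);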

The rest of the code bounces back and forth through the contexts, resulting in the following sequence:

[Figure: coroutines_graph2, the full yield/resume sequence]

At the end of the day the stack is freed (sounds weird to say) and the program terminates.

Exercise: wrapping up

As a didactic exercise (i.e. do not EVER use this code in a production environment) we can use some metaprogramming to wrap our coroutines and avoid polluting our code with stack adjustments and cleanup boilerplate. Eventually we’d like to end up with code like this:

int g_fib = -1;

void fibonacci_gen() {

  int first = 1;
  g_fib = first;
  yield();

  int second = 1;
  g_fib = second;
  yield();

  for (int i = 0; i < 9; ++i) {

    int third = first + second;
    first = second;
    second = third;
    g_fib = third;
    yield();

  }

}

int main() {

  // Outputs the first 10 Fibonacci numbers

  coroutine<void(void)> coro(fibonacci_gen);

  for (int i = 0; i <= 10; ++i) {
      coro.resume();

    if(i) std::cout << g_fib << " ";
  }

}

To do this we create a templated wrapper class:

template <typename Signature>
class coroutine;

that will handle the stack and trampoline setup for us. One difference from the first example is the need for a wrapper function that handles the trampoline invocation (it shields us from implementation-dependent issues):

template <typename Signature>
void call_member_trampoline(coroutine<Signature> *instance, fcontext_t ctx) {
  instance->trampoline(ctx);
}

The trampoline is therefore modified as follows:

void trampoline(fcontext_t ctx) {

  size_t index = yield_ctx.size() - 1;
  yield_ctx[index] = jump_to_context(this, ctx);

  this->m_fn();
}

The only difference in the jump_to_context() function is in handling its new arity:

/* restore RSP (pointing to context-data) from RSI */
movq  %rsi, %rsp

and the promotion of %rdi from scratch-register to parameter-register (since we’re directly jumping to a destination context’s RIP).

The rest of the code remains largely unchanged.
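For orientation, this is roughly the shape the wrapper might take (a sketch only; apart from the fragments above, the member names are hypothetical and the complete implementation lives in the article’s Github repository):

template <typename Signature>
class coroutine;

template <>
class coroutine<void(void)> {
public:
    explicit coroutine(void (*fn)(void)) : m_fn(fn) {
        // allocate and align the context stack, point the RIP slot at
        // call_member_trampoline, then perform the pre-stage jump
    }
    ~coroutine() { /* release the context stack */ }

    void resume() { m_ctx = jump_to_context(this, m_ctx); }

    void trampoline(fcontext_t ctx);  // as shown above

private:
    void (*m_fn)(void);
    fcontext_t m_ctx = nullptr;
};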

Back to boost::context

If you’ve followed through the entire article and made it here, you should by now know what the following boost::context program does:

#include <boost/context/all.hpp>
#include <iostream>
#include <array>

namespace ctx = boost::context::detail;

class Coroutine {
public:
  Coroutine() {
    my_context = ctx::make_fcontext(
      stack.data() + stack.size(),
      stack.size(),
      &Coroutine::dispatch
    );
  }
  virtual ~Coroutine() {}

  void operator()() {
    auto transfer_ctx = ctx::jump_fcontext(my_context, this);
    my_context = transfer_ctx.fctx;
  }

protected:
  void yield() {
    auto transfer_ctx = ctx::jump_fcontext(yield_context, 0);
    my_context = transfer_ctx.fctx;
  }

  virtual void call() = 0;

private:
  static void dispatch(ctx::transfer_t coroutine_ptr) {
    Coroutine *coroutine = reinterpret_cast<Coroutine *>(coroutine_ptr.data);
    coroutine->yield_context = coroutine_ptr.fctx;
    coroutine->call();
    while(true)
      coroutine->yield();
  }

private:
  ctx::fcontext_t my_context;
  ctx::fcontext_t yield_context;
  std::array<intptr_t, 66 * 1024> stack;
};

struct A : public Coroutine {
  void call() {
    std::cout << " __________________________________ " << std::endl;
    yield();
    std::cout << "|    _       _       |_|    _| |_  |" << std::endl;
    yield();
    std::cout << "|   |_|     |_|      | |     | |_  |" << std::endl;
  }
};

struct B : public Coroutine {
  void call() {
    std::cout << "|                                  |" << std::endl;
    yield();
    std::cout << "|  _| |_   _| |_      _    |_   _| |" << std::endl;
    yield();
    std::cout << "|                    |_|     |___| |" << std::endl;
    yield();
    std::cout << "|                                  |" << std::endl;
    yield();
    std::cout << "|__________________________________|" << std::endl;
  }
};

struct C : public Coroutine {
  void call() {
    std::cout << "|                     _       _    |" << std::endl;
    yield();
    std::cout << "| |_   _| |_   _|    | |     | |   |" << std::endl;
  }

  void operator++(int) {
    std::cout << "| ++It - The Italian C++ Community |" << std::endl;
    std::cout << "|__________________________________|" << std::endl;
  }
};


int main() {

  A a;
  B b;
  C c;
  for (size_t i = 0; i<10; ++i) {
    a();
    b();
    c();
  }

  c++; // An entire operator overloading to write 'c++'? Worth it!
}

Final words

Coroutines provide a powerful abstraction that offers the same level of concurrency one would get with asynchronous callbacks, while giving us a chance to write more maintainable code. At the time of writing, Boost offers the context, coroutine and fiber libraries. As we’ve seen, boost::context provides the foundation for userland context switches; boost::coroutine2 offers coroutine support (which conceptually needs no synchronization whatsoever, since coroutines implement non-preemptive cooperative multitasking); and boost::fiber builds on boost::context to add a scheduling mechanism: each time a fiber yields, control is given back to a scheduler which decides the next execution strategy (cf. N4024).

As usual, it is up to the programmer to carefully choose which abstractions to use in a specific context.
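As a parting example, a minimal boost::fiber sketch (assuming Boost ≥ 1.62, where the library first shipped): two fibers cooperatively interleave on a single thread, with the scheduler picking the next fiber at every yield:

#include <boost/fiber/all.hpp>
#include <iostream>

int main() {
    boost::fibers::fiber a([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "a" << i << ' ';
            boost::this_fiber::yield();  // hand control back to the scheduler
        }
    });
    boost::fibers::fiber b([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "b" << i << ' ';
            boost::this_fiber::yield();
        }
    });
    a.join();
    b.join();
    std::cout << '\n';  // typical round-robin output: a0 b0 a1 b1 a2 b2
}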

References and credits

Special thanks to the ++it community and Oliver Kowalke for providing insights and reviews of parts of this article.

First steps with Boost.Python
https://www.italiancpp.org/2015/12/02/first-steps-with-boost-python/ (Wed, 02 Dec 2015)

“Finally a modern, pragmatic language.”

Who among us wants to work with a multi-paradigm, highly-expressive, fast-evolving language with a huge standard library? We are talking, as usual, about… Python.

There are scenarios where our trusty champion (C++11) doesn’t cut it. For a prototype to rush out in a hurry, a “single use” script, the server side of a web application, research code… the complexity of C++ is more a problem than an asset.

How can we continue to take advantage of C++ efficiency or re-use some already available code without looking like old-fashioned cavemen?

The Python interpreter can load modules written in C, compiled as dynamic libraries. Boost.Python helps, a lot, to prepare them. It joins the power of Boost and C++ with the ease of use of Python.

Danger: even if all the examples compile, run and pass the tests this is not the ultimate guide about Boost.Python. The code is meant to be an example, it mirrors our (minimal) experience with Boost.Python. Do not hesitate to report any error we made.

A speed problem

Let’s see a (not too) practical use case. There are numbers which are equal to the sum of their proper divisors (6 = 3 + 2 + 1; perfect numbers). The marketing department believes this is something hot, but we must compute as many perfect numbers as possible and release them before our competitors. The development speed enabled by Python is key; after 5 minutes we release Perfect 1.0®:

def find_divisors(number):
	divisors = []
	for i in range(1, number):
		if number % i == 0:
			divisors.append(i)
	return divisors


def perfect(number):
	divisors = find_divisors(number)
	return number == sum(divisors)


def find_perfect_numbers(how_many):
	found = 0
	number_to_try = 1
	while (found < how_many):
		if perfect(number_to_try):
			print number_to_try
			found += 1
		number_to_try += 1


if __name__ == "__main__":
	find_perfect_numbers(4)  # Look for more at your own risk.
							 # And prepare for a long wait.

This code is not really “pythonic” (https://www.python.org/dev/peps/pep-0008/), but it really was created, tested and debugged in less time than it takes to read a C++ compilation error.[1]

Unfortunately the execution time is similar: 6.5 seconds on my test machine (which is not your test machine, nor the production server, nor the Python fanboy’s PC that runs everything in a picosecond… it’s an example!).

Let’s look for the bottleneck with the profiler, like the savvy engineers we are.

import cProfile

... same code as before ...

if __name__ == "__main__":
	cProfile.run("find_perfect_numbers(4)")

Here is the outcome:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.657    5.657 <string>:1(<module>)
     8128    0.283    0.000    5.582    0.001 purePython.py:16(perfect)
        1    0.075    0.075    5.657    5.657 purePython.py:21(find_perfect_numbers)
     8128    4.294    0.001    5.229    0.001 purePython.py:8(find_divisors)
    66318    0.528    0.000    0.528    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     8128    0.406    0.000    0.406    0.000 {range}
     8128    0.070    0.000    0.070    0.000 {sum}

find_divisors “steals” almost all of the 5.6 seconds it took to run this test!

boost::python

No-one denies that it is possible to write efficient code in Python (Java, VisualWhatever, this week’s functional language…), but optimizing the algorithm of find_divisors is out of the question: we are here to show off Boost.Python, not to give an algebra lesson.

First of all, we get our hands on Boost.Python. On a Linux box this is as easy as typing:

sudo apt-get install libboost-all-dev

You may need to install Python’s “dev” packages. It is easy to find instructions for any platform over the web, but installing (and compiling) the library may be the most difficult step. Do not lose heart.

This is the C++ code:

#include "boost/python.hpp"  // (1)

boost::python::list findDivisors(uint64_t number) // (2)
{
	boost::python::list divisors;
	for (uint64_t i = 1; i < number; ++i)  // (3)
		if (number % i == 0)
			divisors.append(i);
	return divisors;
}

BOOST_PYTHON_MODULE(divisors)
{
    using namespace boost::python;
    def("find_divisors", findDivisors);  // (4)
}

  1. Include Boost.Python. It must be included before any other header to avoid compilation warnings.
  2. The function corresponding to the one we want to replace in Python. It keeps the same signature (takes an integer, returns a list) as the Python original to achieve a “transparent” replacement.
  3. Even the logic is exactly the same, with just a few syntax differences. The C++ runtime should make the difference in this case.
  4. Declare the function with “def” (…hey, it’s just like Python).

The guide (http://www.boost.org/doc/libs/1_59_0/libs/python/doc/) has a clear explanation with all the details.

Compiling, sadly, is not so easy; we will have to adapt the command to your case. Let’s check a step-by-step example (naturally, this is a single line on the console):

g++ divisors.cpp                             compile a C++ file, as usual
 -o divisors.so                              output name: Python demands it matches the module name
 -I /usr/include/python2.7/                  include Python's headers (Boost is already in the path)
 -l python2.7 -lboost_python -lboost_system  link against Python and Boost
 -shared -fPIC -Wl,-export-dynamic           create a dynamic library

stackoverflow.com will cover the rest. Notice that, “to level the playing field”, I do not use optimization options in g++.

Once our library is in the system path (some place where Python can find it) we can include it in Python:

from divisors import find_divisors

def perfect(number):
	divisors = find_divisors(int(number))  # Calls the C++ implementation
	return number == sum(divisors)

… same code as before …

Run time: a bit less than a second. We are witnessing the classic “80% of the time is spent in 20% of the code”. The same algorithm is 6 times faster, but the part where we had to deal with low-level programming (yes, still C++98!) is just one function. Everywhere else we can still take advantage of Python’s practicality.

Some more opportunities

Boost.Python is not limited to primitive type conversions or adapters to pass Python lists into C++. Here is a selection of “common” cases often met when doing “C with classes”:

class ReuseInPython 
{
	public:
		ReuseInPython() {};
		ReuseInPython(int x, const std::string& y) {};
		int instanceVariable;
		static void staticMethod() {};
		void method() {}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;
    class_<ReuseInPython>("implemented_in_CPP")		// (1)
	.def(init<int, std::string>())  // (2)
	.def_readwrite("instance_variable", &ReuseInPython::instanceVariable)  // (3)
	.def("static_method", &ReuseInPython::staticMethod).staticmethod("static_method")  // (4)        
	.def("method", &ReuseInPython::method)  // (5)
    ;
}

  1. Open a class declaration, passing a string with its alias in Python.
  2. Translate the constructor in Python (…init, does that ring a bell?).
  3. The Python “translation” won’t balk at public instance variables. Here is one.
  4. Only repeat the Python name to expose a static method.
  5. The run-of-the-mill, basic instance method.

Once it is compiled (…sounds easy, but…) we can use the C++ class in Python:

from oop import implemented_in_CPP

x = implemented_in_CPP()
y = implemented_in_CPP(3, "hello")
x.instance_variable = 23
implemented_in_CPP.static_method()
x.method()

Boost takes care of converting parameters, return types, etcetera. There are options to directly export STL classes (and more can be defined if something is missing) and to control the return-type policy (by reference, by copy…). There are really many options; trust the official guide.
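As a sketch of one of those options (vector_indexing_suite is part of Boost.Python; the module and class names here are illustrative), exporting an STL container directly gives it list-like behaviour, len(), indexing and iteration included, on the Python side:

#include "boost/python.hpp"
#include "boost/python/suite/indexing/vector_indexing_suite.hpp"
#include <vector>
#include <stdint.h>

BOOST_PYTHON_MODULE(stl_export)
{
    using namespace boost::python;
    // std::vector<uint64_t> becomes a Python class with list-like semantics
    class_<std::vector<uint64_t> >("UInt64Vector")
        .def(vector_indexing_suite<std::vector<uint64_t> >());
}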

When the going gets tough, Boost keeps going. A sample:

class Problems
{
	public:
		void print() {
			std::cout << "cout still works" << std::endl;
		}

		void exception() {
			throw std::runtime_error("Oh, no!!!");
		}

		void coreDump()	{
			int * nullPointer = 0;
			*nullPointer = 24;
		}
};

BOOST_PYTHON_MODULE(oop)
{
    using namespace boost::python;

    class_<Problems>("Problems")
	.def("print_something", &Problems::print  // Print is a Python keyword.    
	.def("exception", &Problems::exception)
	.def("coreDump", &Problems::coreDump)
    ;
}

The Python “test-driver”, with an example of the output:

from oop import Problems
p = Problems()
p.print_something()
try:
	p.exception()
except RuntimeError as e:
	print "The C++ code bombed: " + str(e);
p.coreDump()

cout still works	(1)
The C++ code bombed: Oh, no!!!	(2)
Segmentation fault (core dumped)	(3)
  1. Debugging with std::cout is not a recommended practice… but it works!
  2. Exceptions are perfectly “thrown” into the Python runtime (a sketch of custom exception translation follows after this list).
  3. …well, what did you expect?
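Boost.Python also lets us decide how a C++ exception surfaces in Python. A sketch (register_exception_translator is part of the library; the exception type and module name are illustrative):

#include "boost/python.hpp"
#include <string>

struct CalculationError
{
    std::string message;
};

void translate(const CalculationError& e)
{
    // Surface the C++ exception as a Python ValueError.
    PyErr_SetString(PyExc_ValueError, e.message.c_str());
}

BOOST_PYTHON_MODULE(errors)
{
    boost::python::register_exception_translator<CalculationError>(&translate);
}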

Multithreading

Boost.Python is not the only weapon for tackling problems that demand efficiency. Multithreading is a common way to improve performance, as good for computing divisors as for mining bitcoins or cracking passwords. Here is a C++ class which is about to jump into a Python thread:

class JobFindDivisors {

	public:
		JobFindDivisors(uint64_t number, uint64_t begin, uint64_t end) :
			number(number), begin(begin), end(end) {}
		
		boost::python::list findDivisors()
		{
			std::cout << "Start" << std::endl;

			boost::python::list divisors;
			for (uint64_t i = begin; i < end; ++i)
				 if (number % i == 0)
					divisors.append(i);

			std::cout << "end" << std::endl;
			return divisors;
		}

	private:
		uint64_t number;
		uint64_t begin;
 		uint64_t end;
};

BOOST_PYTHON_MODULE(factor)
{
    using namespace boost::python;
    class_<JobFindDivisors>("JobFindDivisors", init<uint64_t, uint64_t, uint64_t>())
	.def("find_divisors", &JobFindDivisors::findDivisors)
    ;
}

The “JobFindDivisors” object checks whether the numbers between “begin” and “end” are divisors of “number”. We parallelize the problem of finding all the divisors into many “jobs”, dedicating each object to a different interval. No data is shared between jobs, so there are no concurrency problems. This is the only advantage of such a solution, but once again let’s forget about math (and proper software engineering).

The Python call:

from threading import Thread
from factor import JobFindDivisors

class Job():									# (1)
	def __init__(self, number, begin, end):
		self.cppJob = JobFindDivisors(number, begin, end)
		self.divisors = []
	
	def __call__(self):
		self.divisors = self.cppJob.find_divisors()

		
def find_divisors_in_parallel(number):			# (2)
	limit = number / 2

	job1 = Job(number, 1, limit)
	job2 = Job(number, limit, number)

	t1 = Thread(None, job1)
	t2 = Thread(None, job2)
	
	t1.start()
	t2.start()
	t1.join()
	t2.join()

	return [job1.divisors, job2.divisors]


if __name__ == "__main__":
	print find_divisors_in_parallel(223339244)  # (3)

  1. Encapsulate the C++ Job to “keep it simple”, without exporting a C++ callable.
  2. This method creates 2 jobs, does “fork and join” (or, as they say nowadays, “map and reduce”), then returns the results, which the caller prints.
  3. Factoring any number would do.

The output: do you remember the “Start” and “end” printouts in the C++ class? After around 8 seconds the computation terminates, with no parallelism whatsoever:

Start
end
Start
end
[[1L, 2L, 4L, 53L, 106L, 212L, 1053487L, 2106974L, 4213948L, 55834811L], [111669622L]]

Working as designed. Python’s objects are protected by the Global Interpreter Lock (GIL). It is up to the programmer to release it in each thread to “give way” to the other threads. The trick is to call pure Python code only when holding the lock.

As usual in C++ we control resources with RAII. The idiom for the GIL is (https://wiki.python.org/moin/boost.python/HowTo#Multithreading_Support_for_my_function):

class ScopedGILRelease
{
public:
    inline ScopedGILRelease(){
        m_thread_state = PyEval_SaveThread();
    }
    inline ~ScopedGILRelease() {
        PyEval_RestoreThread(m_thread_state);
        m_thread_state = NULL;
    }
private:
    PyThreadState * m_thread_state;
};

Release the lock in the C++ class:

boost::python::list findDivisors() {
	ScopedGILRelease noGil = ScopedGILRelease();  // (1)
	std::cout << "Start" << std::endl;

	boost::python::list divisors;
	for (uint64_t i = begin; i < end; ++i)
		 if (number % i == 0)
			divisors.append(i);  // (2) Possible core dump!

	std::cout << "end" << std::endl;
	return divisors;
}

  1. When this variable goes out of scope, the lock is taken again. Like a “reversed” smart pointer.
  2. Here is where we will certainly take a core dump. But only in production.

Do you remember that “the trick is to call pure Python code only when holding the lock”? Line (2) may do just that, without the lock. You can try to massively grow the list (say, remove the “if (number…” test and save all the numbers in the list). I believe that, maybe (please read the official documents for the real answer!), the Python interpreter must allocate a bigger list, but without the lock all it gets is corrupted memory.
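The mirror-image idiom also exists: a thread that does not hold the GIL (say, a thread spawned in C++) must take it before touching any Python object. A sketch using CPython’s documented PyGILState calls:

class ScopedGILAcquire
{
public:
    inline ScopedGILAcquire() {
        m_state = PyGILState_Ensure();  // blocks until the GIL is ours
    }
    inline ~ScopedGILAcquire() {
        PyGILState_Release(m_state);
    }
private:
    PyGILState_STATE m_state;
};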

Let’s encapsulate the parallelizable section in a dedicated scope, saving the numbers in a variable which we do not share with Python:

boost::python::list findDivisors()
{
	std::cout << "Start" << std::endl;
	std::vector<uint64_t> divisorsTemp;
	boost::python::list divisors;
	{
		ScopedGILRelease noGil = ScopedGILRelease();
		for (uint64_t i = begin; i < end; ++i)
			if (number % i == 0)
				divisorsTemp.push_back(i);
	} // noGil goes out of scope, we take the lock again.
	BOOST_FOREACH(uint64_t n, divisorsTemp) {  // requires <boost/foreach.hpp>
		divisors.append(n);
	}
	std::cout << "end" << std::endl;
	return divisors;
}


After six and a half seconds (2 less than the “accidentally sequential” version) we get the expected interleaving (Start Start – end end). We can invest those 2 seconds in thinking of a less duct-tape-and-chewing-gum-oriented solution.

This completes the introduction to Boost.Python. Now we know how to “push” C++ modules into Python applications, either for re-use or for efficiency reasons. Boost.Python connects the two worlds without sacrificing Python’s simplicity and without adding constraints to C++, even if some spots do need care. Above all, from now on we will always have the last word in the unavoidable “Python vs C++” flame war on every forum in the world.

[1] It is true: it takes less time to create a whole program in Python than to fix a single bug in C++.

Try it. Ready, steady, go:

/usr/include/c++/4.8/bits/stl_map.h:646:7: note: no known conversion for argument 1 from
‘int’ to ‘std::map<int, std::map<std::basic_string<char>, std::basic_string<char> > >::iterator {aka std::_Rb_tree_iterator<std::pair<const int, std::map<std::basic_string<char>, std::basic_string<char> > > >}’

/usr/include/c++/4.8/bits/stl_map.h:670:9: note: template<class _InputIterator> void std::map<_Key, _Tp, _Compare, _Alloc>::insert(_InputIterator, _InputIterator) [with _InputIterator = _InputIterator; _Key = int; _Tp = std::map<std::basic_string<char>, std::basic_string<char> >; _Compare = std::less<int>; _Alloc = std::allocator<std::pair<const int, std::map<std::basic_string<char>, std::basic_string<char> > > >

Community Days 2014
https://www.italiancpp.org/event/community-days-2014/ (Tue, 25 Feb 2014)

[Photo: cpp-it-people]

 

All the photos

 

 


 

We curated a dedicated C++ track for 50+ people!

 

C++11 in azione

Marco Arena

 

Produttività, performance e affidabilità con Visual C++ 2013

Guido Pederzini

   

Le estensioni C++/CX per WinRT

Raffaele Rialdi

 

Standard Library: STL e Boost, la BCL di C++

Alessio Gogna

 

Grafica in poche righe con Visual C++, Cinder e altre librerie open source

Ale Contenti

 

VC6? No, grazie. Come migrare a C++11

Ale Contenti & Raffaele Rialdi

 

[Photo: ILoveItalianCpp]

From the left: Guido Pederzini, Eva Gjeci, Raffaele Rialdi, Ale Contenti, Marco Arena, Alessio Gogna

 
