The result of tag: (2 results)

话说Python内存管理机制篇三之内存分配策略(二)

by LauCyun Aug 01,2017 17:50:34 16,379 views

上篇谈Python内存管理机制篇二之内存分配策略(一)讲了blockpool,接着继续讲arena

3 arena

在Python中,多个pool的集合就是一个arena

3.1 arena大小

上篇谈Python内存管理机制篇二之内存分配策略(一)讲了,pool的大小的默认值为4kb,同样每个arena的大小也有一个默认的值。

在Python3.5中,每个arena的大小的默认值为256kb,那么一个arena有多少个pool呢?很容易算出来,一个arenaARENA_SIZE / POOL_SIZEpool

[Objects/obmalloc.c]

#define ARENA_SIZE              (256 << 10)     /* 256KB */

3.2 arena结构

好了,说了那么多,那么arena是个神马鬼呢?先来看如下代码:

[Objects/obmalloc.c]

/* Record keeping for arenas. */
struct arena_object {
    /* The address of the arena, as returned by malloc.  Note that 0
     * will never be returned by a successful malloc, and is used
     * here to mark an arena_object that doesn't correspond to an
     * allocated arena.
     */
    uptr address;

    /* Pool-aligned pointer to the next pool to be carved off. */
    block* pool_address;

    /* The number of available pools in the arena:  free pools + never-
     * allocated pools.
     */
    uint nfreepools;

    /* The total number of pools in the arena, whether or not available. */
    uint ntotalpools;

    /* Singly-linked list of available pools. */
    struct pool_header* freepools;

    /* Whenever this arena_object is not associated with an allocated
     * arena, the nextarena member is used to link all unassociated
     * arena_objects in the singly-linked `unused_arena_objects` list.
     * The prevarena member is unused in this case.
     *
     * When this arena_object is associated with an allocated arena
     * with at least one available pool, both members are used in the
     * doubly-linked `usable_arenas` list, which is maintained in
     * increasing order of `nfreepools` values.
     *
     * Else this arena_object is associated with an allocated arena
     * all of whose pools are in use.  `nextarena` and `prevarena`
     * are both meaningless in this case.
     */
    struct arena_object* nextarena;
    struct arena_object* prevarena;
};

从上面的源码中不难获知,一个arena是有arena_objectpool集合组成。如图1所示:


图1 arena结构

arena_object仅仅是一个arena的一部分,就像pool_header只是的pool一部分一样,看上去作用是一样的,但是arena_object管理的内存和pool_header管理的内存是有不一样。神马,有不一样?具体的请先看图2


图2 pool和arena的内存布局区别

差别就是:pool_header管理的内存与pool_header自身是一块连续的内存arena_object管理的内存与自身是分离的

看上去没什么大不了的,但是这后面隐藏着这样一个事实:

pool_header被申请时,它所管理的block集合的内存一定也被申请了;但是arena_object被申请时,它所管理的pool集合的内存则没有被申请。换句话说,arena_objectpool集合在某一时刻需要建立联系。注意,这个建立联系的时刻是一个关键的时刻,Python从这个时刻一刀切下,将一个arena_object切分为两种状态:未使用(没有建立联系)和可用(建立了联系)。

3.3 arena两种状态:未使用(没有建立联系)和可用(建立了联系)

当一个arenaarena_object没有与pool集合建立联系时,这时的arena处于“未使用”状态;一旦建立了联系,这时arena就转换到了“可用”状态。对于每一种状态,都有一个arena的链表。

[Objects/obmalloc.c]

/* The head of the singly-linked, NULL-terminated list of available
 * arena_objects.
 */
static struct arena_object* unused_arena_objects = NULL;

/* The head of the doubly-linked, NULL-terminated at each end, list of
 * arena_objects associated with arenas that have pools available.
 */
static struct arena_object* usable_arenas = NULL;

有两个链表:

  • “未使用”的arena的链表表头是unused_arena_objectsarenaarena之间通过nextarena连接,是一个单向链表。
  • “可用”的arena的链表表头是usable_arenasarenaarena之间通过nextarenaprevarena连接,是一个双向链表。

不能理解?好吧,画一张图来展示一下,如图3是某一时刻arena的一个可能状态:


图3 arena在某一时刻的可能状态

3.4  申请新的arena

在了解arena的创建之前,先看一下arena创建时相关的一些参数定义:

[Objects/obmalloc.c]

/* Array of objects used to track chunks of memory (arenas). */
// arena_object 数组
static struct arena_object* arenas = NULL;
/* Number of slots currently allocated in the `arenas` vector. */
// 当前arenas中管理的arena_object的个数, 初始化时为0
static uint maxarenas = 0;

/* The head of the singly-linked, NULL-terminated list of available
 * arena_objects.
 */
// “未使用”的arena_object链表
static struct arena_object* unused_arena_objects = NULL;

/* The head of the doubly-linked, NULL-terminated at each end, list of
 * arena_objects associated with arenas that have pools available.
 */
// “可用”的arena_object链表
static struct arena_object* usable_arenas = NULL;

/* How many arena_objects do we initially allocate?
 * 16 = can allocate 16 arenas = 16 * ARENA_SIZE = 4MB before growing the
 * `arenas` vector.
 */
// 初始化时需要申请的arena_object的个数
#define INITIAL_ARENA_OBJECTS 16

/* Number of arenas allocated that haven't been free()'d. */
// 已分配的还没有被释放的arena_object的个数
static size_t narenas_currently_allocated = 0;

/* Total number of times malloc() called to allocate an arena. */
// 调用malloc()的次数
static size_t ntimes_arena_allocated = 0;
/* High water mark (max value ever seen) for narenas_currently_allocated. */
// narenas_currently_allocated的最大值
static size_t narenas_highwater = 0;

Python运行期间,它会通过new_arena来创建一个arena,先看一下new_arena的过程:

[Objects/obmalloc.c]

/* Allocate a new arena.  If we run out of memory, return NULL.  Else
 * allocate a new arena, and return the address of an arena_object
 * describing the new arena.  It's expected that the caller will set
 * `usable_arenas` to the return value.
 */
static struct arena_object*
new_arena(void)
{
    struct arena_object* arenaobj;
    uint excess;        /* number of bytes above pool alignment */
    void *address;

#ifdef PYMALLOC_DEBUG
    if (Py_GETENV("PYTHONMALLOCSTATS"))
        _PyObject_DebugMallocStats(stderr);
#endif
    //代码[1]:判断unused_arena_objects为NULL,为NULL是则没有“未使用的”arena_object列表,则需要扩充
    if (unused_arena_objects == NULL) {
        uint i;
        uint numarenas;
        size_t nbytes;

        /* Double the number of arena objects on each allocation.
         * Note that it's possible for `numarenas` to overflow.
         */
        //代码[2]:确定本次需要申请的arena_object的个数(第一次为16,之后每次翻倍)
        numarenas = maxarenas ? maxarenas << 1 : INITIAL_ARENA_OBJECTS;
        if (numarenas <= maxarenas)
            return NULL;                /* overflow(溢出了) */
#if SIZEOF_SIZE_T <= SIZEOF_INT
        if (numarenas > PY_SIZE_MAX / sizeof(*arenas))
            return NULL;                /* overflow(溢出了) */
#endif
        nbytes = numarenas * sizeof(*arenas);
        //代码[3]:使用 PyMem_RawRealloc 申请内存
        arenaobj = (struct arena_object *)PyMem_RawRealloc(arenas, nbytes);
        if (arenaobj == NULL)
            return NULL;
        arenas = arenaobj;

        /* We might need to fix pointers that were copied.  However,
         * new_arena only gets called when all the pages in the
         * previous arenas are full.  Thus, there are *no* pointers
         * into the old array. Thus, we don't have to worry about
         * invalid pointers.  Just to be sure, some asserts:
         */
        assert(usable_arenas == NULL);
        assert(unused_arena_objects == NULL);

        //代码[4]:初始化新申请的arena_object,并将其放入unused_arena_objects链表中
        /* Put the new arenas on the unused_arena_objects list. */
        for (i = maxarenas; i < numarenas; ++i) {
            arenas[i].address = 0;              /* mark as unassociated */
            arenas[i].nextarena = i < numarenas - 1 ?
                                   &arenas[i+1] : NULL;
        }

        /* Update globals. */
        unused_arena_objects = &arenas[maxarenas];
        maxarenas = numarenas;
    }

    /* Take the next available arena object off the head of the list. */
    //代码[5]:从unused_arena_objects链表中取出一个“未使用”的arena_object
    assert(unused_arena_objects != NULL);
    arenaobj = unused_arena_objects;
    unused_arena_objects = arenaobj->nextarena; // 更新unused_arena_objects指针,指向下一个arena
    
    //代码[6]:申请arena_object管理的内存
    assert(arenaobj->address == 0);
    address = _PyObject_Arena.alloc(_PyObject_Arena.ctx, ARENA_SIZE);
    if (address == NULL) {
        /* The allocation failed: return NULL after putting the
         * arenaobj back.
         */
        arenaobj->nextarena = unused_arena_objects;
        unused_arena_objects = arenaobj;
        return NULL;
    }
    arenaobj->address = (uptr)address;
    
    //代码[7]:设置初始化相关参数
    ++narenas_currently_allocated;
    ++ntimes_arena_allocated;
    if (narenas_currently_allocated > narenas_highwater)
        narenas_highwater = narenas_currently_allocated;
    
    //代码[8]:设置pool集合的相关信息
    arenaobj->freepools = NULL;
    /* pool_address <- first pool-aligned address in the arena
       nfreepools <- number of whole pools that fit after alignment */
    arenaobj->pool_address = (block*)arenaobj->address;
    arenaobj->nfreepools = ARENA_SIZE / POOL_SIZE;
    assert(POOL_SIZE * arenaobj->nfreepools == ARENA_SIZE);
    // 将pool的起始地址调整为系统页的边界
    // 申请到256KB, 放弃了一些内存, 而将可使用的内存边界pool_address调整到了与系统页对齐
    excess = (uint)(arenaobj->address & POOL_SIZE_MASK);
    if (excess != 0) {
        --arenaobj->nfreepools;
        arenaobj->pool_address += POOL_SIZE - excess;
    }
    arenaobj->ntotalpools = arenaobj->nfreepools;

    return arenaobj;
}

为了更好理解new_arena的过程,画了图4


图4 arena在某一时刻的可能状态

在代码[1]处,Python首先会检查当前unused_arena_objects链表中是否还有“未使用”的arena,检查结果将会决定下一步的动作。

3.4.1 从unused_arena_objects中取一个arena初始化

假如有一个unused_arena_objects,如图5所示:


图5 从unused_arena_objects中取出一个arena

如果unused_arena_objects中还有“未使用”的arena,那么Python将直接在代码[5]处开始,从unused_arena_objects中取出一个arena,接着unused_arena_objects指向下一个arena,如图6所示。


图6 从unused_arena_objects中取出一个arena

然后在代码[6]处,Python申请一块为ARENA_SIZE(256KB)的内存,将内存的地址赋给arenaaddress,此时arena_object就和pool集合建立了联系,也就是成为“可用”的arena了,如图7所示。


图7 申请arena_object管理的内存

所以,到这里这个arena就和unused_arena_objects脱离了关系,被usable_arenas这个组织接收了。

在代码[7]中,Python设置了narenas_currently_allocatedntimes_arena_allocatednarenas_highwater参数。

在代码[8]中,Python设置了一些arena用于维护pool集合的信息。

  • 对申请到的ARENA_SIZE(256KB)内存进行处理,放弃了一些内存,将可用的内存边界pool_addreess调整到了与系统页对齐。
  • freepools设置为NULL,前面讲pool的时候知道,freepools是要等到有pool释放后才起作用。

实际上,pool集合所占用的ARENA_SIZE(256KB)内存进行边界对齐后,是交给pool_addreess来维护。如8所示。


8 设置arena用于维护pool集合的信息

好了,到这为此一个arena的初始化基本上完成,但是还有一个待解决的问题:usable_arenas组织什么时候才能接收这个arena呢?这是后话cheeky

3.4.2 扩充unused_arena_objects

上面讲了如何从unused_arena_objects中取出一个arena进行初始化。

现在回到代码[1]处的判断,如果unused_arena_objectsNULL,则需要扩大arena的集合(小块内存内存池)。Python在内部有一个maxarenas的变量来维护arenas中存储的arena_object个数。在代码[2]中处,Python将待申请的arena_object个数设置为当前arena_object个数(maxarenas)的两倍。当然,在首次初始化时,maxarenas为0,这时将numarenas初始化为16。之后,Python会对numarenas进行检查是否溢出。

如果检查通过的话,则在代码[3]处,通过PyMem_RawRealloc来扩大内存,并在代码[4]处对新申请的arena_object进行设置。特别要注意的是那个貌似毫不起眼的address,代码[4]处将新申请的arenaaddress一律设置为0,这也是一个arena是处于“未使用”状态还是“可用”状态的重要标识。在代码[6]处,一旦arenaarena_objectpool集合建立了联系,那么address就变成了非0。

最后代码[4]处,将新申请的arenas赋值给了unused_arena_objects,这样一来,就又有了“未使用”的arena了。接下来就是回到3.4.1中了。

...

Tags Read More


话说Python内存管理机制篇二之内存分配策略(一)

by LauCyun Jul 26,2017 13:35:42 24,834 views

在Python中,许多时候申请的内存都是小块的内存,这些小块内存在申请后,很快又会被释放,由于这些内存的申请并不是为了创建对象,所以并没有对象一级的内存池机制。这就意味着Python在运行期间会大量地执行mallocfree操作,导致操作系统频繁地在用户态和核心态之间进行切换,这将严重影响Python的执行效率。为了提高Python的执行效率,Python引入了一个内存池机制,用于管理对小块内存的申请和释放。这叫 Pymalloc机制

在Python中,整个小块内存池可以视为一个层次结构,在这个层次结构中,一共分为4层,从下至上分别是:blockpoolarena内存池,如图1。需要说明的是,blockpoolarena都是Python代码中可以找到的实体,而最顶层的内存池只是一个概念上的东西,表示Python对于整个小块内存分配和释放行为的内存管理机制。


图1 Python的小块内存的内存池全景

Python又分为大内存和小内存。大小以512字节为界限,对于大内存使用Malloc进行分配,而对于小内存则使用内存池进行分配。

1 block

在 Python 中,block是最小的内存单元,不同种类的block都有不同的内存大小,这个内存大小的值被称为size class。为了在当前主流的平台都能获得最佳性能,所有的block的长度都是8字节对齐的。

[Objects/obmalloc.c]
/*
 * Alignment of addresses returned to the user. 8-bytes alignment works
 * on most current architectures (with 32-bit or 64-bit address busses).
 * The alignment value is also used for grouping small requests in size
 * classes spaced ALIGNMENT bytes apart.
 *
 * You shouldn't change this unless you know what you are doing.
 */
#define ALIGNMENT               8               /* must be 2^N */
#define ALIGNMENT_SHIFT         3

同时,Python为block的大小设定了一个上限,当申请的内存大小小于这个上限时,Python可以使用不同种类的block来满足对内存的需求;当申请的内存大小超过这个上限时,Python就会对内存的请求转交给第一层的内存管理机制,即PyMem函数族来处理。这个上限值在Python3.5中被设定为512。

[Objects/obmalloc.c]
/*
 * Max size threshold below which malloc requests are considered to be
 * small enough in order to use preallocated memory pools. You can tune
 * this value according to your application behaviour and memory needs.
 *
 * Note: a size threshold of 512 guarantees that newly created dictionaries
 * will be allocated from preallocated memory pools on 64-bit.
 *
 * The following invariants must hold:
 *      1) ALIGNMENT <= SMALL_REQUEST_THRESHOLD <= 512
 *      2) SMALL_REQUEST_THRESHOLD is evenly divisible by ALIGNMENT
 *
 * Although not required, for better performance and space efficiency,
 * it is recommended that SMALL_REQUEST_THRESHOLD is set to a power of 2.
 */
#define SMALL_REQUEST_THRESHOLD 512
#define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)

根据SMALL_REQUEST_THRESHOLDALIGNMENT的限定,由此可知不同种类block的的size class分别为:8,16,24,32,...,512。而每个size class对应一个size class index,这个size class index从0开始,所以对于小于512字节的小内存的分配,得出如下结论:

表1 不同长度的block

Request in bytes Size of allocated block Size class idx
1-8 8 0
9-16 16 1
17-24 24 2
25-32 32 3
33-40 40 4
41-48 48 5
49-56 56 6
57-64 64 7
65-72 72 8
497-504 504 62
505-512 512 63

注:0, SMALL_REQUEST_THRESHOLD + 1 and up: routed to the underlying allocator.

举个栗子[1]

如果要申请一块29字节大小的内存,实际上PyObject_Malloc从内存池中划一个32字节的block(从size class index为3的pool里面划出),如图2所示:


图2

size classsize class index之间的转换:

[Objects/obmalloc.c]
// 从 size class index 转换到 size class
#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)
/*
 * 即:
 * (0+1) * 8 = 8
 * (1+1) * 8 = 16
 */

// 从 size class 转换到 size class index
size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
/* 
 * 即:
 * (8 - 1) / 8 = 0
 * (16 - 8) / 8 = 1
 */

在Python中,block只是一个概念,在Python源码中没有与之对应的实体存在。对象在Python源码中有对应的PyObject;列表在Python源码中对应PyListObject、PyType_List,所以block就很奇怪了,它仅仅是概念上的东西,我们知道它是具有一定大小的内存,但它不与Python源码里的某个东西对应。然而,Python却提供了一个管理block的东西,这就是下面要剖析的pool

 

2 pool

一组 block的集合称为一个pool,换句话说,一个pool管理着一堆有固定大小的内存块。事实上,pool管理着一大块内存,它有一定的策略,将这块大的内存划分为多个小的内存块。

2.1 pool大小

[Objects/obmalloc.c]
/*
 * The system's VMM page size can be obtained on most unices with a
 * getpagesize() call or deduced from various header files. To make
 * things simpler, we assume that it is 4K, which is OK for most systems.
 * It is probably better if this is the native page size, but it doesn't
 * have to be.  In theory, if SYSTEM_PAGE_SIZE is larger than the native page
 * size, then `POOL_ADDR(p)->arenaindex' could rarely cause a segmentation
 * violation fault.  4K is apparently OK for all the platforms that python
 * currently targets.
 */
#define SYSTEM_PAGE_SIZE        (4 * 1024)
#define SYSTEM_PAGE_SIZE_MASK   (SYSTEM_PAGE_SIZE - 1)

/*
 * Size of the pools used for small blocks. Should be a power of 2,
 * between 1K and SYSTEM_PAGE_SIZE, that is: 1k, 2k, 4k.
 */
#define POOL_SIZE               SYSTEM_PAGE_SIZE        /* must be 2^N */
#define POOL_SIZE_MASK          SYSTEM_PAGE_SIZE_MASK

在Python中,一个pool的大小通常为一个系统内存页,由于当前大多数Python支持的系统的内存页都是4KB,所以Python内部也将一个pool的大小定义为4KB

2.2 pool结构

pool是由pool_headerblock集合(N多大小一样的block)组成。其pool_header的结构如下:

[Objects/obmalloc.c]

/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};

其中:

  • ref表示已分配block的数量
  • freeblock表示指向下一个可用的block的起始地址
  • nextoffset表示下一个可用的block的偏移位置
  • maxnextoffset表示pool的最大偏移位置

一个pool管理的所有block,它们的大小都是一样的。如图3


图3 pool的结构图

每一个pool都和一个size class index联系在一起,(block size=8, szidx=0)、 (block size=16, szidx=1)、...、(block size=512, szidx=63),用于内存分配时匹配到拥有对应大小blockpool,这就是pool_headerszidx的意义。

2.3 pool初始化

假设有一块4KB的内存,Python是如何初始化的呢?如下:

[Objects/obmalloc.c]

/* malloc.  Note that nbytes==0 tries to return a non-NULL pointer, distinct
 * from all other currently live pointers.  This may not be possible.
 */

/*
 * The basic blocks are ordered by decreasing execution frequency,
 * which minimizes the number of jumps in the most common cases,
 * improves branching prediction and instruction scheduling (small
 * block allocations typically result in a couple of instructions).
 * Unless the optimizer reorders everything, being too smart...
 */

static void *
_PyObject_Alloc(int use_calloc, void *ctx, size_t nelem, size_t elsize)
{
    size_t nbytes;
    block *bp;
    poolp pool;
    poolp next;
    uint size;

    ...   // pool指向了一块4KB的内存

        init_pool:
            /* Frontlink to used pools. */
            // 连接到 used_pools 双向链表, 作为表头
            // 注意, 这里 usedpools[0] 保存着 block size = 8 的所有used_pools的表头
            next = usedpools[size + size]; /* == prev */
            pool->nextpool = next;
            pool->prevpool = next;
            next->nextpool = pool;
            next->prevpool = pool;
            pool->ref.count = 1;

            if (pool->szidx == size) {
                /* Luckily, this pool last contained blocks
                 * of the same size class, so its header
                 * and free list are already initialized.
                 */
                bp = pool->freeblock;
                assert(bp != NULL);
                pool->freeblock = *(block **)bp;
                UNLOCK();
                if (use_calloc)
                    memset(bp, 0, nbytes);
                return (void *)bp;
            }
            /*
             * Initialize the pool header, set up the free list to
             * contain just the second block, and return the first
             * block.
             */
            // 开始初始化pool_header
            // 这里 size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;  其实是Size class idx, 即szidx
            pool->szidx = size; 

            // 将size class index转换为size,如2转换为24字节
            size = INDEX2SIZE(size); 

            // 注意 #define POOL_OVERHEAD ROUNDUP(sizeof(struct pool_header))
            // bp => 初始化为pool + pool_header size,  跳过pool_header的内存
            bp = (block *)pool + POOL_OVERHEAD;

            // 计算偏移量, 这里的偏移量是绝对值
            // #define POOL_SIZE SYSTEM_PAGE_SIZE  /* must be 2^N */
            // POOL_SIZE = 4kb, POOL_OVERHEAD = pool_header size
            // 下一个偏移位置: pool_header size + 2 * size
            pool->nextoffset = POOL_OVERHEAD + (size << 1);

            pool->maxnextoffset = POOL_SIZE - size;

            // freeblock指向 bp + size = pool_header size + size
            pool->freeblock = bp + size;

            // *freeblock赋值NULL
            *(block **)(pool->freeblock) = NULL;
            UNLOCK();
            if (use_calloc)
                memset(bp, 0, nbytes);
            return (void *)bp;
        }
    ...
}

最后返回的bp就是指向从pool中取出的第一块block的指针。而bp返回的实际是一个地址,这个地址之后有将近 4KB 的内存实际上都是可用的,但是可以肯定申请内存的函数只会使用[bp, bp+zise]这个区间的内存,这是由size class index决定的。

图4所示的一块经过初始化后的4KB内存:


图4 pool初始化后的4KB内存

接下来,看看Python在申请pool时,pool_header中的各个域是怎么变动的?

总体分配的代码如下:

[Objects/obmalloc.c]

        /*
         * Most frequent paths first
         */
        size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
        pool = usedpools[size + size];
        if (pool != pool->nextpool) {
            /*
             * There is a used pool for this size class.
             * Pick up the head block of its free list.
             */
            ++pool->ref.count;
            bp = pool->freeblock; //指针指向下一个可用block起始位置
            assert(bp != NULL);
            
            // 代码[1]
            // 调整 pool->freeblock (假设A节点)指向链表下一个, 即bp首字节指向的下一个节点(假设B节点) 
            // 如果此时!= NULL, 表示 A节点可用, 直接返回
            if ((pool->freeblock = *(block **)bp) != NULL) {
                UNLOCK();
                if (use_calloc)
                    memset(bp, 0, nbytes);
                return (void *)bp;
            }

            // 代码[2]
            /*
             * Reached the end of the free list, try to extend it.
             */
            // 有足够的空间, 分配一个, pool->freeblock 指向后移
            if (pool->nextoffset <= pool->maxnextoffset) {
                /* There is room for another block. */
                pool->freeblock = (block*)pool +
                                  pool->nextoffset;
                pool->nextoffset += INDEX2SIZE(size);

                // 注意, *freeblock指向NULL
                // 设置 *freeblock 的动作正是建立离散自由 block 链表的关键所在
                *(block **)(pool->freeblock) = NULL; 
                UNLOCK();
                if (use_calloc)
                    memset(bp, 0, nbytes);
                return (void *)bp;
            }

            // 代码[3]
            /* Pool is full, unlink from used pools. */
            // pool满了, 需要从下一个pool获取
            next = pool->nextpool;
            pool = pool->prevpool;
            next->prevpool = pool;
            pool->nextpool = next;
            UNLOCK();
            if (use_calloc)
                memset(bp, 0, nbytes);
            return (void *)bp;
        }

举个栗子[2]

假设我们从现在开始连续申请5块29字节内存,由于29字节对应的size class index为3,所以实际上会申请5块32字节的内存。

freeblock指向的是下一个可用的block的起始地址,这一点在图4中也可以看得出。当再次申请32字节的block时,只需返回freeblock指向的地址就可以了,这时freeblock需要向前进,指向下一个可用的block

pool_header中,nextoffsetmaxnextoffset是两个用于对pool中的block集合进行迭代的变量:从初始化pool的结果及图4中可以看到,它所指示的偏移位置正好指向了freeblock之后的下一个可用的block的地址。从这里分配block的动作也可以看到,在分配了block之后,freeblocknextoffset都会向前移动一个block的距离,如此反复,就可对所有的block进行一次遍历。而maxnextoffset指名了该pool中最后一个可用的blockpool开始位置的便移,它界定了pool的边界,当nextoffset > maxnextoffset 时,也就意味着已经遍历完了pool中所有的block了。

2.4 pool分配策略 - 情况1

内存块尚未分配完,且此时不存在回收的block;全新pool进来的时候,分配第一块block。其条件为:

(pool->freeblock = *(block **)bp) == NULL

其分配实现逻辑如下:

[Objects/obmalloc.c]

            bp = pool->freeblock; //指针指向下一个可用block起始位置

            // 代码[2]
            /*
             * Reached the end of the free list, try to extend it.
             */
            // 有足够的空间, 分配一个, pool->freeblock 指向后移
            if (pool->nextoffset <= pool->maxnextoffset) {
                /* There is room for another block. */
                pool->freeblock = (block*)pool +
                                  pool->nextoffset;
                pool->nextoffset += INDEX2SIZE(size);

                // 注意, *freeblock指向NULL
                // 设置 *freeblock 的动作正是建立离散自由 block 链表的关键所在
                *(block **)(pool->freeblock) = NULL; 
                UNLOCK();
                if (use_calloc)
                    memset(bp, 0, nbytes);
                return (void *)bp;
            }

所以栗子[2]的结果如图5所示:


图5 分配5块32字节后的4KB内存

这种情况下,申请内存只需要:申请、前进,申请、前进,...,申请、前进,这个过程很容易。

接下来讲第二种情况。

2.5 pool分配策略 - 情况2:回收了某几个block

上面栗子中,我们已经进行了5次连续的32字节的内存分配,但是现在程序释放了其中的第1块和第4块block,那么下一次再分配32字节的内存时,是用第1块还是第6块内存呢?我们肯定期望它使用第1块内存。

一旦Python运转起来,内存的释放动作将会导致pool中出现大量的离散的自由block,Python必须建立一种机制,将这些离散的自由block组织起来,再次使用。这个机制就是所谓的自由block链表。这个链表的关键就着落在pool_header中的那个freeblock身上。

代码[2]中有*(block **)(pool->freeblock) = NULL这个动作,这个动作正是建立离散自由block链表的关键所在。如果某个block用完后,就会释放,其释放代码如下:

[Objects/obmalloc.c]

/* free */

ATTRIBUTE_NO_ADDRESS_SAFETY_ANALYSIS
static void
_PyObject_Free(void *ctx, void *p)
{
    poolp pool;
    block *lastfree;
    poolp next, prev;
    uint size;
    ...

    pool = POOL_ADDR(p);
    // 判断p指向的block是否属于pool
    if (Py_ADDRESS_IN_RANGE(p, pool)) {
        /* We allocated this address. */
        LOCK();
        /* Link p to the start of the pool's freeblock list.  Since
         * the pool had at least the p block outstanding, the pool
         * wasn't empty (so it's already in a usedpools[] list, or
         * was full and is in no list -- it's not in the freeblocks
         * list in any case).
         */
        assert(pool->ref.count > 0);            /* else it was empty */
        // p被释放,p的第一个字节值被设置为当前的 freeblock 的值
        *(block **)p = lastfree = pool->freeblock; // [1]
        // freeblock的值被更新,指向其首地址,则一个 block 被插入到了离散自由的 block 链表中
        pool->freeblock = (block *)p;              // [2]

        // 相当于往list中头插入了一个节点

        ...
    }
    ...
}

假设这是Python释放的是block p,在经过上面代码中的[1]逻辑后,A中第一个字节的值被设置为了当前freeblock的值,而在经过[2]逻辑后,freeblock的值被更新了,指向了block p的首地址。就这样,一个block被插入到了离散自由block链表中。回到上面的栗子,当第1块和第4块block都被释放后,我们就看到了图6中一个离散自由block链表:


图6 释放了block之后产生的离散自由block链表

freeblock开始,可以很容易以freeblock = *freeblock的方式遍历这条链表,而当发现*freeblockNULL是,则表明到达了该链表的尾部。继续回到上面的栗子中,再分配32字节的内存时,其逻辑代码[1]如下:

[Objects/obmalloc.c]

        /*
         * Most frequent paths first
         */
        size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
        pool = usedpools[size + size];
        if (pool != pool->nextpool) {
            /*
             * There is a used pool for this size class.
             * Pick up the head block of its free list.
             */
            ++pool->ref.count;
            bp = pool->freeblock; //指针指向下一个可用block起始位置
            assert(bp != NULL);
            
            // 代码[1]
            // 调整 pool->freeblock (假设A节点)指向链表下一个, 即bp首字节指向的下一个节点(假设B节点) 
            // 如果此时!= NULL, 表示 A节点可用, 直接返回
            if ((pool->freeblock = *(block **)bp) != NULL) {
                UNLOCK();
                if (use_calloc)
                    memset(bp, 0, nbytes);
                return (void *)bp;
            }
            ...
        }

其结果如图7所示:


图7 使用离散自由block链表分配一个block后的4KB内存

2.6 pool分配策略 - 情况3:pool用完了

如果pool中内存空间都用完了,则执行如下逻辑:

[Objects/obmalloc.c]

        /*
         * Most frequent paths first
         */
        size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
        pool = usedpools[size + size];
        if (pool != pool->nextpool) {
            /*
             * There is a used pool for this size class.
             * Pick up the head block of its free list.
             */
            ++pool->ref.count;
            bp = pool->freeblock; //指针指向下一个可用block起始位置
            assert(bp != NULL);
            
            ...

            // 代码[3]
            /* Pool is full, unlink from used pools. */
            // pool满了, 需要从下一个pool获取
            next = pool->nextpool;
            pool = pool->prevpool;
            next->prevpool = pool;
            pool->nextpool = next;
            UNLOCK();
            if (use_calloc)
                memset(bp, 0, nbytes);
            return (void *)bp;
        }

获取下一个pool(链表上每个poolblock size都是一致的)

好了,pool到此为止,下面进入arena

(全文完)

...

Tags Read More