C++ folly庫解讀（一） Fbstring —— 一個完美替代std::string的庫

string 常見的三種實現方式

eager copy

COW

SSO

Fbstring 介紹

Storage strategies

Implementation highlights

Benchmark

主要類

字串儲存資料結構

small strings

medium strings

large strings

如何區分字串型別 category

small strings

medium strings

large strings

category（）

小端

大端

capacity（）

size（）

字串初始化

initSmall

initMedium

initLarge

特殊的建構函式 —— 不複製使用者傳入的字串

字串複製

copySmall

copyMedium

copyLarge

析構

COW

non-const operator

non-const 與 const

Realloc

其他

__builtin_expect

CMOV 指令

builtin_unreachable && assume（0）

jemalloc

find 演算法

判斷大小端

參考資料

在引入

fbstring

之前，我們首先再回顧一下 string 常見的三種實現方式。

string 常見的三種實現方式

string 中比較重要的 3 個欄位：

char *data。指向存放字串的首地址（在 SSO 的某些實現方案中可能沒有此欄位）。

size_t size。字串長度。

size_t capacity。字串容量。capacity >= size。在字串相加、reserve 等場景下會用到此欄位。

eager copy

這個是最簡單、最好理解的一種，在每次複製時將原 string 對應的記憶體以及所持有的動態資源完整地複製一份，即沒有任何特殊處理。

優點：

實現簡單。

每個物件互相獨立，不用考慮那麼多亂七八糟的場景。

缺點：

字串較大時，複製浪費空間。

COW

這個也算是計算機裡的基本思想了。不同於 eager copy 的每次複製都會複製，此種實現方式為寫時複製，即 copy-on-write。只有在某個 string 要對共享物件進行修改時，才會真正執行複製。

由於存在共享機制，所以需要一個

std：：atomic

，代表被多少物件共享。

寫時複製：

優點：

字串空間較大時，減少了分配、複製字串的時間。

缺點：

refcount 需要原子操作，效能有損耗。

某些情況下會帶來意外的開銷。比如非 const 成員使用［］，這會觸發 COW，因為無法知曉應用程式是否會對返回的字元做修改。典型的如

Legality of COW std::string implementation in C++11

中舉的例子：

std

：：

string

（

“str”

）；

const

char

。

data

（）；

{

std

：：

string

（

）；

（

void

）

［

］；

// 觸發COW

}

std

：：

cout

‘\n’

；

// p指向的原有空間已經無效

其他更細節的缺點可以參考：

std::string 的 Copy-on-Write：不如想象中美好

SSO

Small String Optimization。

基於字串大多數比較短的特點

，利用 string 物件本身的棧空間來儲存短字串。而當字串長度大於某個臨界值時，則使用 eager copy 的方式。

SSO 下，string 的資料結構會稍微複雜點，使用 union 來區分短字串和長字串的場景：

class

string

{

char

start

；

size_t

size

；

static

const

int

kLocalSize

；

union

{

char

buffer

［

kLocalSize

］；

// 滿足條件時，用來存放短字串

size_t

capacity

；

}

data

；

}；

短字串，SSO：

長字串，eager copy：

這種資料結構的實現下，SSO 的閾值一般是 15 位元組。folly 的 fbstring 在 SSO 場景下，資料結構做了一些最佳化，可以儲存 23 個位元組，後面會提到。

優點：

短字串時，無動態記憶體分配。

缺點：

string 物件佔用空間比 eager copy 和 cow 要大。

Fbstring 介紹

fbstring 可以 100%相容 std：：string。配合三種儲存策略和

jemalloc

，可以顯著的提高 string 的效能。

fbstring 支援 32-bit、64-bit、little-endian、big-endian。

Storage strategies

Small Strings （<= 23 chars），使用 SSO。

Medium strings （24 - 255 chars），使用 eager copy。

Large strings （> 255 chars），使用 COW。

Implementation highlights

與 std：：string 100%相容。

COW 儲存時對於引用計數執行緒安全。

對 Jemalloc 友好。如果檢測到使用 jemalloc，那麼將使用 jemalloc 的一些非標準擴充套件介面來提高效能。

find（）

使用簡化版的

Boyer-Moore algorithm

。在查詢成功的情況下，相對於

string：：find（）

有 30 倍的效能提升。在查詢失敗的情況下也有 1。5 倍的效能提升。

可以與 std：：string 互相轉換。

Benchmark

在

FBStringBenchmark.cpp

中。

主要類

：：folly：：fbstring str（“abc”）

中的 fbstring 為 basic_fbstring的別名：

typedef basic_fbstring fbstring；

basic_fbstring 在 fbstring_core 提供的介面之上，實現了 std：：string 定義的所有介面。裡面有一個私有變數 store，預設值即為 fbstring_core。basic_fbstring 的定義如下，比 std：：basic_string 只多了一個預設的模板引數 Storage：

template

typename

，

class

std

：：

char_traits

，

class

std

：：

allocator

，

class

Storage

fbstring_core

class

basic_fbstring

；

fbstring_core 負責字串的儲存及字串相關的操作，例如 init、copy、reserve、shrink 等等。

字串儲存資料結構

最重要的 3 個數據結構 union{Char small*， MediumLarge ml*}、MediumLarge、RefCounted，定義在 fbstring_core 中，基本上所有的字串操作都離不開這三個資料結構。

struct

RefCounted

{

std

：：

atomic

size_t

refCount_

；

Char

data_

［

］；

static

RefCounted

create

（

size_t

size

）；

// 建立一個RefCounted

static

RefCounted

create

（

const

Char

data

，

size_t

size

）；

// ditto

static

void

incrementRefs

（

Char

）；

// 增加一個引用

static

void

decrementRefs

（

Char

）；

// 減少一個引用

// 其他函式定義

}；

struct

MediumLarge

{

Char

data_

；

size_t

size_

；

size_t

capacity_

；

size_t

capacity

（）

const

{

return

kIsLittleEndian

？

capacity_

capacityExtractMask

：

capacity_

；

}

void

setCapacity

（

size_t

cap

，

Category

cat

）

{

capacity_

kIsLittleEndian

？

cap

（

static_cast

size_t

（

cat

）

kCategoryShift

）

：

（

cap

）

static_cast

size_t

（

cat

）；

}

}；

union

{

uint8_t

bytes_

［

sizeof

（

MediumLarge

）］；

// For accessing the last byte。

Char

small_

［

sizeof

（

MediumLarge

）

sizeof

（

Char

）］；

MediumLarge

ml_

；

}；

small strings（SSO）時，使用 union 中的 Char small_儲存字串，即物件本身的棧空間。

medium strings（eager copy）時，使用 union 中的

MediumLarge ml_

Char* data_ ：指向分配在堆上的字串。

size_t size：字串長度。

size_t capacity ：字串容量。

large strings（cow）時，使用

MediumLarge ml_

和 RefCounted：

RefCounted。refCount_ ：共享字串的引用計數。

RefCounted。data_［1］：

flexible array.

存放字串。

ml*。data_指向 RefCounted。data，ml*。size_與 ml。capacity_的含義不變。

但是這裡有一個問題是：SSO 情況下的 size 和 capacity 存在哪裡了？

capacity ：首先 SSO 的場景並不需要 capacity，因為此時利用的是棧空間，或者理解此種情況下的 capacity=maxSmallSize。

size ：利用 small_的一個位元組來儲存 size，

而且具體儲存的不是 size，而是maxSmallSize - s

（maxSmallSize=23，再轉成 char 型別），因為這樣可以 SSO 多儲存一個位元組，具體原因後面詳細講。

small strings :

medium strings :

large strings :

如何區分字串型別 category

字串的 small/medium/large 型別對外部透明，而且針對字串的各種操作例如 copy、shrink、reserve、賦值等等，三種類型的處理方式都不一樣，

所以，我們需要在上面的資料結構中做些“手腳”，來區分不同的字串型別。

因為只有三種類型，所以只需要 2 個 bit 就能夠區分。相關的資料結構為：

typedef

uint8_t

category_type

；

enum

class

Category

：

category_type

{

isSmall

，

isMedium

kIsLittleEndian

？

0x80

：

0x2

，

// 10000000 ， 00000010

isLarge

kIsLittleEndian

？

0x40

：

0x1

，

// 01000000 ， 00000001

}；

kIsLittleEndian 為判斷當前平臺的大小端，大端和小端的儲存方式不同。

small strings

category 與 size 共同存放在 small_的最後一個位元組中（size 最大為 23，所以可以存下），考慮到大小端，所以有移位操作，這主要是為了讓 category（）的判斷更簡單，後面再詳細分析。具體程式碼在 setSmallSize 中：

void

setSmallSize

（

size_t

）

{

……

constexpr

auto

shift

kIsLittleEndian

？

：

；

small_

［

maxSmallSize

］

char

（（

maxSmallSize

）

shift

）；

……

}

medium strings

可能有人注意到了，在 MediumLarge 結構體中定義了兩個方法，

capacity（）

和

setCapacity（size_t cap， Category cat）

，其中 setCapacity 即同時設定 capacity 和 category ：

constexpr

static

size_t

kCategoryShift

（

sizeof

（

size_t

）

；

void

setCapacity

（

size_t

cap

，

Category

cat

）

{

capacity_

kIsLittleEndian

？

cap

（

static_cast

size_t

（

cat

）

kCategoryShift

）

：

（

cap

）

static_cast

size_t

（

cat

）；

}

小端時，將 category = isMedium = 0x80 向左移動

（sizeof（size_t） - 1） * 8

位，即移到最高位的位元組中，再與 capacity 做或運算。

大端時，將 category = isMedium = 0x2 與 cap << 2 做或運算即可，左移 2 位的目的是給 category 留空間。

舉個例子，假設 64 位機器，capacity = 100 ：

large strings

同樣使用 MediumLarge 的 setCapacity，演算法相同，只是 category 的值不同。

假設 64 位機器，capacity = 1000 ：

category()

category（）

為最重要的函式之一，作用是獲取字串的型別：

constexpr

static

uint8_t

categoryExtractMask

kIsLittleEndian

？

0xC0

：

0x3

；

// 11000000 ， 00000011

constexpr

static

size_t

lastChar

sizeof

（

MediumLarge

）

；

union

{

uint8_t

bytes_

［

sizeof

（

MediumLarge

）］；

// For accessing the last byte。

Char

small_

［

sizeof

（

MediumLarge

）

sizeof

（

Char

）］；

MediumLarge

ml_

；

}；

Category

category

（）

const

{

// works for both big-endian and little-endian

return

static_cast

Category

（

bytes_

［

lastChar

］

categoryExtractMask

）；

}

bytes_定義在 union 中，從註釋可以看出來，是為了配合 lastChar 更加方便的取該結構最後一個位元組。

配合上面三種類型字串的儲存，可以很容易理解這一行程式碼。

小端

大端

capacity()

獲取字串的 capaticy，因為 capacity 與 category 儲存都在一起，所以一起看比較好。

同樣分三種情況。

size_t

capacity

（）

const

{

switch

（

category

（））

{

case

Category

：：

isSmall

：

return

maxSmallSize

；

case

Category

：：

isLarge

：

// For large-sized strings， a multi-referenced chunk has no

// available capacity。 This is because any attempt to append

// data would trigger a new allocation。

（

RefCounted

：：

refs

（

ml_

。

data_

）

{

return

ml_

。

size_

；

}

break

；

case

Category

：：

isMedium

：

default

：

break

；

}

return

ml_

。

capacity

（）；

}

small strings ：直接返回 maxSmallSize，前面有分析過。

medium strings ：返回 ml_。capacity（）。

large strings ：

當字串引用大於 1 時，直接返回 size。因為此時的 capacity 是沒有意義的，任何 append data 操作都會觸發一次 cow

否則，返回 ml_。capacity（）。

看下 ml。capacity（）：

constexpr

static

uint8_t

categoryExtractMask

kIsLittleEndian

？

0xC0

：

0x3

；

constexpr

static

size_t

kCategoryShift

（

sizeof

（

size_t

）

；

constexpr

static

size_t

capacityExtractMask

kIsLittleEndian

？

（

size_t

（

categoryExtractMask

）

kCategoryShift

）

：

0x0

/* unused */

；

size_t

capacity

（）

const

{

return

kIsLittleEndian

？

capacity_

capacityExtractMask

：

capacity_

；

}

categoryExtractMask 和 kCategoryShift 之前遇到過，分別用來計算 category 和小端情況下將 category 左移 kCategoryShift 位。capacityExtractMask 的目的就是消掉 category，讓 capacity_中只有 capacity。

對著上面的每種情況下字串的儲存的圖，應該很好理解，這裡不細說了。

size()

size_t

size

（）

const

{

size_t

ret

ml_

。

size_

；

/* constexpr */

（

kIsLittleEndian

）

{

// We can save a couple instructions， because the category is

// small iff the last char， as unsigned， is <= maxSmallSize。

typedef

typename

std

：：

make_unsigned

Char

>：：

type

UChar

；

auto

maybeSmallSize

size_t

（

maxSmallSize

）

size_t

（

static_cast

UChar

（

small_

［

maxSmallSize

］））；

// With this syntax， GCC and Clang generate a CMOV instead of a branch。

ret

（

static_cast

ssize_t

（

maybeSmallSize

）

？

maybeSmallSize

：

ret

；

}

else

{

ret

（

category

（）

Category

：：

isSmall

）

？

smallSize

（）

：

ret

；

}

return

ret

；

}

小端的情況下，medium strings 和 large strings 對應的 ml_的高位元組儲存的是 category（0x80、0x40），而 small strings 儲存的是 size，所以正如註釋說的，可以先判斷 kIsLittleEndian && maybeSmall，會快一些，不需要呼叫 smallSize（）。而且現在絕大多數平臺都是小端。

如果是大端，那麼如果是 small，呼叫 smallSize（），否則返回 ml。size_；

size_t

smallSize

（）

const

{

assert

（

category

（）

Category

：：

isSmall

）；

constexpr

auto

shift

kIsLittleEndian

？

：

；

auto

smallShifted

static_cast

size_t

（

small_

［

maxSmallSize

］）

shift

；

assert

（

static_cast

size_t

（

maxSmallSize

）

smallShifted

）；

return

static_cast

size_t

（

maxSmallSize

）

smallShifted

；

}

比較簡單，不說了。

字串初始化

首先 fbstring_core 的建構函式中，根據字串的長度，呼叫 3 種不同型別的初始化函式：

fbstring_core

（

const

Char

const

data

，

const

size_t

size

，

bool

disableSSO

FBSTRING_DISABLE_SSO

）

{

（

！

disableSSO

size

maxSmallSize

）

{

initSmall

（

data

，

size

）；

}

else

（

size

maxMediumSize

）

{

initMedium

（

data

，

size

）；

}

else

{

initLarge

（

data

，

size

）；

}

initSmall

template

class

Char

inline

void

fbstring_core

Char

>：：

initSmall

（

const

Char

const

data

，

const

size_t

size

）

{

// If data is aligned， use fast word-wise copying。 Otherwise，

// use conservative memcpy。

// The word-wise path reads bytes which are outside the range of

// the string， and makes ASan unhappy， so we disable it when

// compiling with ASan。

#ifndef FOLLY_SANITIZE_ADDRESS

（（

reinterpret_cast

size_t

（

data

）

（

sizeof

（

size_t

）

））

）

{

const

size_t

byteSize

size

sizeof

（

Char

）；

constexpr

size_t

wordWidth

sizeof

（

size_t

）；

switch

（（

byteSize

wordWidth

）

wordWidth

）

{

// Number of words。

case

：

ml_

。

capacity_

reinterpret_cast

const

size_t

（

data

）［

］；

FOLLY_FALLTHROUGH

；

case

：

ml_

。

size_

reinterpret_cast

const

size_t

（

data

）［

］；

FOLLY_FALLTHROUGH

；

case

：

ml_

。

data_

reinterpret_cast

Char

**>

（

const_cast

Char

（

data

））；

FOLLY_FALLTHROUGH

；

case

：

break

；

}

else

#endif

{

（

size

！=

）

{

fbstring_detail

：：

podCopy

（

data

，

data

size

，

small_

）；

}

setSmallSize

（

size

）；

}

首先，如果傳入的字串地址是記憶體對齊的，

則配合 reinterpret_cast 進行 word-wise copy，提高效率。

否則，呼叫 podCopy 進行 memcpy。

最後，透過 setSmallSize 設定 small string 的 size。

setSmallSize ：

void

setSmallSize

（

size_t

）

{

// Warning： this should work with uninitialized strings too，

// so don‘t assume anything about the previous value of

// small_［maxSmallSize］。

assert

（

maxSmallSize

）；

constexpr

auto

shift

kIsLittleEndian

？

：

；

small_

［

maxSmallSize

］

char

（（

maxSmallSize

）

shift

）；

small_

［

］

’\0‘

；

assert

（

category

（）

Category

：：

isSmall

size

（）

）；

}

之前提到過，small strings 存放的 size 不是真正的 size，是

maxSmallSize - size

，這樣做的原因是可以 small strings 可以多儲存一個位元組。

因為假如儲存 size 的話，small中最後兩個位元組就得是\0 和 size，但是儲存maxSmallSize - size，當 size == maxSmallSize 時，small的最後一個位元組恰好也是\0。

initMedium

template

class

Char

FOLLY_NOINLINE

inline

void

fbstring_core

Char

>：：

initMedium

（

const

Char

const

data

，

const

size_t

size

）

{

// Medium strings are allocated normally。 Don’t forget to

// allocate one extra Char for the terminating null。

auto

const

allocSize

goodMallocSize

（（

size

）

sizeof

（

Char

））；

ml_

。

data_

static_cast

Char

（

checkedMalloc

（

allocSize

））；

（

FOLLY_LIKELY

（

size

））

{

fbstring_detail

：：

podCopy

（

data

，

data

size

，

ml_

。

data_

）；

}

ml_

。

size_

size

；

ml_

。

setCapacity

（

allocSize

sizeof

（

Char

）

，

Category

：：

isMedium

）；

ml_

。

data_

［

size

］

‘\0’

；

}

folly 會透過 canNallocx 函式檢測是否使用 jemalloc，如果是，會使用 jemalloc 來提高記憶體分配的效能。關於 jemalloc 我也不是很熟悉，感興趣的可以查查，有很多資料。

所有的動態記憶體分配都會呼叫 goodMallocSize，獲取一個對 jemalloc 友好的值。

再透過 checkedMalloc 真正申請記憶體，存放字串。

呼叫 podCopy 進行 memcpy，與 initSmall 的 podCopy 一樣。

最後再設定 size、capacity、category 和\0。

initLarge

template

class

Char

FOLLY_NOINLINE

inline

void

fbstring_core

Char

>：：

initLarge

（

const

Char

const

data

，

const

size_t

size

）

{

// Large strings are allocated differently

size_t

effectiveCapacity

size

；

auto

const

newRC

RefCounted

：：

create

（

data

，

effectiveCapacity

）；

ml_

。

data_

newRC

data_

；

ml_

。

size_

size

；

ml_

。

setCapacity

（

effectiveCapacity

，

Category

：：

isLarge

）；

ml_

。

data_

［

size

］

‘\0’

；

}

與 medium strings 最大的不同是會透過 RefCounted：：create 建立 RefCounted 用於共享字串：

struct

RefCounted

{

std

：：

atomic

size_t

refCount_

；

Char

data_

［

］；

constexpr

static

size_t

getDataOffset

（）

{

return

offsetof

（

RefCounted

，

data_

）；

}

static

RefCounted

create

（

size_t

size

）

{

const

size_t

allocSize

goodMallocSize

（

getDataOffset

（）

（

size

）

sizeof

（

Char

））；

auto

result

static_cast

RefCounted

（

checkedMalloc

（

allocSize

））；

result

refCount_

。

store

（

，

std

：：

memory_order_release

）；

size

（

allocSize

getDataOffset

（））

sizeof

（

Char

）

；

return

result

；

}

static

RefCounted

create

（

const

Char

data

，

size_t

size

）

{

const

size_t

effectiveSize

size

；

auto

result

create

（

size

）；

（

FOLLY_LIKELY

（

effectiveSize

））

{

fbstring_detail

：：

podCopy

（

data

，

data

effectiveSize

，

result

data_

）；

}

return

result

；

}

}；

需要注意的是：

ml*。data*指向的是 RefCounted。data_。

getDataOffset（）用 offsetof 函式獲取 data*在 RefCounted 結構體內的偏移，

Char data*［1］

為 flexible array，存放字串。

注意對

std：：atomic refCount_

進行原子操作的 c++ memory model ：

store，設定引用數為 1 ： std：：memory_order_release

load，獲取當前共享字串的引用數： std：：memory_order_acquire

add/sub。增加/減少一個引用： std：：memory_order_acq_rel

c++ memory model 是另外一個比較大的話題，可以參考：

incubator-brpc

Memory model synchronization modes

一文帶你看懂 C++11 的記憶體模型

C++11 中的記憶體模型下篇 - C++11 支援的幾種記憶體模型

特殊的建構函式 —— 不複製使用者傳入的字串

上面的三種構造，都是將應用程式傳入的字串，不管使用 word-wise copy 還是 memcpy，複製到 fbstring_core 中，且在 medium 和 large 的情況下，需要動態分配記憶體。

fbstring 提供了一個特殊的建構函式，讓 fbstring_core 接管應用程式自己分配的記憶體。

basic_fbstring 的建構函式，並呼叫 fbstring_core 相應的建構函式。

注意這裡 AcquireMallocatedString 為 enum class，比使用 int 和 bool 更可讀。

/**

* Defines a special acquisition method for constructing fbstring

* objects。 AcquireMallocatedString means that the user passes a

* pointer to a malloc-allocated string that the fbstring object will

* take into custody。

enum

class

AcquireMallocatedString

{}；

// Nonstandard constructor

basic_fbstring

（

value_type

，

size_type

，

size_type

，

AcquireMallocatedString

）

：

store_

（

，

）

{

}

basic_fbstring 呼叫相應的 fbstring_core 建構函式：

// Snatches a previously mallocated string。 The parameter “size”

// is the size of the string， and the parameter “allocatedSize”

// is the size of the mallocated block。 The string must be

// \0-terminated， so allocatedSize >= size + 1 and data［size］ == ‘\0’。

// So if you want a 2-character string， pass malloc（3） as “data”，

// pass 2 as “size”， and pass 3 as “allocatedSize”。

fbstring_core

（

Char

const

data

，

const

size_t

size

，

const

size_t

allocatedSize

，

AcquireMallocatedString

）

{

（

size

）

{

FBSTRING_ASSERT

（

allocatedSize

size

）；

FBSTRING_ASSERT

（

data

［

size

］

‘\0’

）；

// Use the medium string storage

ml_

。

data_

data

；

ml_

。

size_

size

；

// Don‘t forget about null terminator

ml_

。

setCapacity

（

allocatedSize

，

Category

：：

isMedium

）；

}

else

{

// No need for the memory

free

（

data

）；

reset

（）；

}

可以看出這裡沒有複製字串的過程，而是直接接管了上游傳遞過來的指標指向的記憶體。但是，正如註釋說的，這裡直接使用了 medium strings 的儲存方式。

比如 folly/io/IOBuf。cpp 中的呼叫：

// Ensure NUL terminated

writableTail

（）

；

fbstring

str

（

reinterpret_cast

char

（

writableData

（）），

length

（），

capacity

（），

AcquireMallocatedString

（））；

字串複製

同初始化，也是根據不同的字串型別，呼叫不同的函式：

fbstring_core

（

const

fbstring_core

rhs

）

{

assert

（

rhs

！=

this

）；

switch

（

rhs

。

category

（））

{

case

Category

：：

isSmall

：

copySmall

（

rhs

）；

break

；

case

Category

：：

isMedium

：

copyMedium

（

rhs

）；

break

；

case

Category

：：

isLarge

：

copyLarge

（

rhs

）；

break

；

default

：

folly

：：

assume_unreachable

（）；

}

copySmall

template

class

Char

inline

void

fbstring_core

Char

>：：

copySmall

（

const

fbstring_core

rhs

）

{

// Just write the whole thing， don’t look at details。 In

// particular we need to copy capacity anyway because we want

// to set the size （don‘t forget that the last character，

// which stores a short string’s length， is shared with the

// ml_。capacity field）。

ml_

rhs

。

ml_

；

}

正如註釋中所說，雖然 small strings 的情況下，字串儲存在 small中，但是我們只需要把 ml直接賦值即可，因為在一個 union 中。

copyMedium

template

class

Char

FOLLY_NOINLINE

inline

void

fbstring_core

Char

>：：

copyMedium

（

const

fbstring_core

rhs

）

{

// Medium strings are copied eagerly。 Don‘t forget to allocate

// one extra Char for the null terminator。

auto

const

allocSize

goodMallocSize

（（

rhs

。

ml_

。

size_

）

sizeof

（

Char

））；

ml_

。

data_

static_cast

Char

（

checkedMalloc

（

allocSize

））；

// Also copies terminator。

fbstring_detail

：：

podCopy

（

rhs

。

ml_

。

data_

，

rhs

。

ml_

。

data_

rhs

。

ml_

。

size_

，

ml_

。

data_

）；

ml_

。

size_

rhs

。

ml_

。

size_

；

ml_

。

setCapacity

（

allocSize

sizeof

（

Char

）

，

Category

：：

isMedium

）；

}

medium strings 是 eager copy，所以就是“深複製”：

為字串分配空間、複製

賦值 size、capacity、category。

copyLarge

template

class

Char

FOLLY_NOINLINE

inline

void

fbstring_core

Char

>：：

copyLarge

（

const

fbstring_core

rhs

）

{

// Large strings are just refcounted

ml_

rhs

。

ml_

；

RefCounted

：：

incrementRefs

（

ml_

。

data_

）；

}

large strings 的 copy 過程很直觀，因為是 COW 方式：

直接賦值 ml，內含指向共享字串的指標。

共享字串的引用計數加 1。

incrementRefs 和內部呼叫 fromData 這兩個個函式值得看一下：

static

RefCounted

fromData

（

Char

）

{

return

static_cast

RefCounted

（

static_cast

void

（

static_cast

unsigned

char

（

static_cast

void

（

））

getDataOffset

（）））；

}

static

void

incrementRefs

（

Char

）

{

fromData

（

）

refCount_

。

fetch_add

（

，

std

：：

memory_order_acq_rel

）；

}

因為 ml中指向的是 RefCounted 的 data［1］，所以我們需要透過 fromData 來找到 data_所屬的 RefCounted 的地址。我把 fromData 函式內的運算拆開：

static

RefCounted

fromData

（

Char

）

{

// 轉換data_［1］的地址

void

voidDataAddr

static_cast

void

（

）；

unsigned

char

unsignedDataAddr

static_cast

unsigned

char

（

voidDataAddr

）；

// 獲取data_［1］在結構體的偏移量再相減，得到的就是所屬RefCounted的地址

unsigned

char

unsignedRefAddr

unsignedDataAddr

getDataOffset

（）；

void

voidRefAddr

static_cast

void

（

unsignedRefAddr

）；

RefCounted

refCountAddr

static_cast

RefCounted

（

voidRefAddr

）；

return

refCountAddr

；

}

值得關注的是如何轉換不同型別結構體的指標並做運算，這裡的做法是

：

Char* -> void* -> unsigned char* -> 與size_t做減法 -> void * -> RefCounted*

析構

fbstring_core

（）

noexcept

{

（

category

（）

Category

：：

isSmall

）

{

return

；

}

destroyMediumLarge

（）；

}

如果是 small 型別，直接返回，因為利用的是棧空間。

否則，針對 medium 和 large，呼叫 destroyMediumLarge。

FOLLY_MALLOC_NOINLINE

void

destroyMediumLarge

（）

noexcept

{

auto

const

category

（）；

FBSTRING_ASSERT

（

！=

Category

：：

isSmall

）；

（

Category

：：

isMedium

）

{

free

（

ml_

。

data_

）；

}

else

{

RefCounted

：：

decrementRefs

（

ml_

。

data_

）；

}

medium ： free 動態分配的字串記憶體即可。

large ：呼叫 decrementRefs，針對共享字串進行操作。

static

void

decrementRefs

（

Char

）

{

auto

const

dis

fromData

（

）；

size_t

oldcnt

dis

refCount_

。

fetch_sub

（

，

std

：：

memory_order_acq_rel

）；

FBSTRING_ASSERT

（

oldcnt

）；

（

oldcnt

）

{

free

（

dis

）；

}

邏輯也很清晰：先對引用計數減 1，如果本身就只有 1 個引用，那麼直接 free 掉整個 RefCounted。

COW

最重要的一點，也是 large strings 獨有的，就是 COW。任何針對字串寫的操作，都會觸發 COW，包括前面舉過的［］操作，例如：

non-const

at（size_n）

non-const

operator［］（size_type pos）

operator+

append

……

我們舉個例子，比如

non-const

operator［］（size_type pos）

non-const operator[](size_type pos)

reference

operator

［］（

size_type

pos

）

{

return

（

begin

（）

pos

）；

}

iterator

begin

（）

{

return

store_

。

mutableData

（）；

}

來重點看下 mutableData（）：

Char

mutableData

（）

{

switch

（

category

（））

{

case

Category

：：

isSmall

：

return

small_

；

case

Category

：：

isMedium

：

return

ml_

。

data_

；

case

Category

：：

isLarge

：

return

mutableDataLarge

（）；

}

fbstring_detail

：：

assume_unreachable

（）；

}

template

class

Char

inline

Char

fbstring_core

Char

>：：

mutableDataLarge

（）

{

FBSTRING_ASSERT

（

category

（）

Category

：：

isLarge

）；

（

RefCounted

：：

refs

（

ml_

。

data_

）

{

// Ensure unique。

unshare

（）；

}

return

ml_

。

data_

；

}

同樣是分三種情況。small 和 medium 直接返回字串的地址，large 會呼叫 mutableDataLarge（），

可以看出，如果引用數大於 1，會進行 unshare 操作：

void

unshare

（

size_t

minCapacity

）；

template

class

Char

FOLLY_MALLOC_NOINLINE

inline

void

fbstring_core

Char

>：：

unshare

（

size_t

minCapacity

）

{

FBSTRING_ASSERT

（

category

（）

Category

：：

isLarge

）；

size_t

effectiveCapacity

std

：：

max

（

minCapacity

，

ml_

。

capacity

（））；

auto

const

newRC

RefCounted

：：

create

（

effectiveCapacity

）；

// If this fails， someone placed the wrong capacity in an

// fbstring。

FBSTRING_ASSERT

（

effectiveCapacity

ml_

。

capacity

（））；

// Also copies terminator。

fbstring_detail

：：

podCopy

（

ml_

。

data_

，

ml_

。

data_

ml_

。

size_

，

newRC

data_

）；

RefCounted

：：

decrementRefs

（

ml_

。

data_

）；

ml_

。

data_

newRC

data_

；

ml_

。

setCapacity

（

effectiveCapacity

，

Category

：：

isLarge

）；

// size_ remains unchanged。

}

基本思路：

建立新的 RefCounted。

複製字串。

對原有的共享字串減少一個引用 decrementRefs，這個函式在上面的析構小節裡分析過。

設定 ml_的 data、capacity、category。

注意此時還不會設定 size，因為還不知道應用程式對字串進行什麼修改。

non-const 與 const

大家可能注意到了，上面的 at 和［］強調了 non-const，這是因為 const-qualifer 針對這兩個呼叫不會觸發 COW ，還以［］為例：

// C++11 21。4。5 element access：

const_reference

operator

［］（

size_type

pos

）

const

{

return

（

begin

（）

pos

）；

}

const_iterator

begin

（）

const

{

return

store_

。

data

（）；

}

// In C++11 data（） and c_str（） are 100% equivalent。

const

Char

data

（）

const

{

return

c_str

（）；

}

const

Char

c_str

（）

const

{

const

Char

ptr

ml_

。

data_

；

// With this syntax， GCC and Clang generate a CMOV instead of a branch。

ptr

（

category

（）

Category

：：

isSmall

）

？

small_

：

ptr

；

return

ptr

；

}

可以看出區別，non-const 版本的 begin（）中呼叫的是 mutableData（），而 const-qualifer 版本呼叫的是 data（） -> c_str（），而 c_str（）直接返回的字串地址。（插一句，貌似之前fbstring的c_str（）的實現是lazy null terminator的，可以看下這位大佬寫的文章：）

所以，當字串用到［］、at 且不需要寫操作時，最好用 const-qualifer。

我們拿 folly 自帶的

benchmark 工具

測試一下：

#include

“folly/Benchmark。h”

#include

“folly/FBString。h”

#include

“folly/container/Foreach。h”

using

namespace

std

；

using

namespace

folly

；

BENCHMARK

（

nonConstFbstringAt

，

）

{

：：

folly

：：

fbstring

str

（

“fbstring is a drop-in replacement for std：：string。 The main benefit of fbstring is significantly increased ”

“performance on virtually all important primitives。 This is achieved by using a three-tiered storage strategy ”

“and by cooperating with the memory allocator。 In particular， fbstring is designed to detect use of jemalloc and ”

“cooperate with it to achieve significant improvements in speed and memory usage。”

）；

FOR_EACH_RANGE

（

，

）

{

char

str

［

］；

doNotOptimizeAway

（

）；

}

BENCHMARK_DRAW_LINE

（）；

BENCHMARK_RELATIVE

（

constFbstringAt

，

）

{

const

：：

folly

：：

fbstring

str

（

“fbstring is a drop-in replacement for std：：string。 The main benefit of fbstring is significantly increased ”

“performance on virtually all important primitives。 This is achieved by using a three-tiered storage strategy ”

“and by cooperating with the memory allocator。 In particular， fbstring is designed to detect use of jemalloc and ”

“cooperate with it to achieve significant improvements in speed and memory usage。”

）；

FOR_EACH_RANGE

（

，

）

{

const

char

str

［

］；

doNotOptimizeAway

（

）；

}

int

main

（）

{

runBenchmarks

（）；

}

結果是 constFbstringAt 比 nonConstFbstringAt 快了 175%

============================================================================

delve_folly

main

。

relative

time

iter

iters

============================================================================

nonConstFbstringAt

39。85

25。10

——————————————————————————————————————

constFbstringAt

175。57

22。70

44。06

============================================================================

Realloc

reserve、operator+等操作，可能會涉及到記憶體重新分配，最終呼叫的都是 memory/Malloc。h 中的 smartRealloc：

inline

void

checkedRealloc

（

void

ptr

，

size_t

size

）

{

void

realloc

（

ptr

，

size

）；

（

！

）

{

throw_exception

std

：：

bad_alloc

（）；

}

return

；

}

/**

* This function tries to reallocate a buffer of which only the first

* currentSize bytes are used。 The problem with using realloc is that

* if currentSize is relatively small _and_ if realloc decides it

* needs to move the memory chunk to a new buffer， then realloc ends

* up copying data that is not used。 It’s generally not a win to try

* to hook in to realloc（） behavior to avoid copies - at least in

* jemalloc， realloc（） almost always ends up doing a copy， because

* there is little fragmentation / slack space to take advantage of。

FOLLY_MALLOC_CHECKED_MALLOC

FOLLY_NOINLINE

inline

void

smartRealloc

（

void

，

const

size_t

currentSize

，

const

size_t

currentCapacity

，

const

size_t

newCapacity

）

{

assert

（

）；

assert

（

currentSize

currentCapacity

newCapacity

）；

auto

const

slack

currentCapacity

currentSize

；

（

slack

currentSize

）

{

// Too much slack， malloc-copy-free cycle：

auto

const

result

checkedMalloc

（

newCapacity

）；

std

：：

memcpy

（

result

，

currentSize

）；

free

（

）；

return

result

；

}

// If there‘s not too much slack， we realloc in hope of coalescing

return

checkedRealloc

（

，

newCapacity

）；

}

從註釋和程式碼看為什麼函式起名叫

smart

Realloc ：

如果（the currentCapacity - currentSize） _ 2 > currentSize，即 currentSize < 2/3 _ capacity，說明當前分配的記憶體利用率較低，此時認為如果使用 realloc 並且 realloc 決定拷貝當前記憶體到新記憶體，成本會高於直接 malloc（newCapacity） + memcpy + free（old_memory）。

否則直接 realloc。

其他

__builtin_expect

給編譯器提供分支預測資訊。原型為：

long

__builtin_expect

（

long

exp

，

long

）

表示式的返回值為 exp 的值，跟 c 無關。

我們預期 exp 的值是 c。例如下面的例子，我們預期 x 的值是 0，所以這裡提示編譯器，只有很小的機率會呼叫到 foo（）

（

__builtin_expect

（

，

））

foo

（）；

再比如判斷指標是否為空：

（

__builtin_expect

（

ptr

！=

NULL

，

））

foo

（

ptr

）；

在 fbstring 中也用到了

builtin_expect，例如建立 RefCounted 的函式 (FOLLY_LIKELY 包裝了一下

builtin_expect）：

#if __GNUC__

#define FOLLY_DETAIL_BUILTIN_EXPECT（b， t）（__builtin_expect（b， t））

#else

#define FOLLY_DETAIL_BUILTIN_EXPECT（b， t） b

#endif

// Likeliness annotations

// Useful when the author has better knowledge than the compiler of whether

// the branch condition is overwhelmingly likely to take a specific value。

// Useful when the author has better knowledge than the compiler of which code

// paths are designed as the fast path and which are designed as the slow path，

// and to force the compiler to optimize for the fast path， even when it is not

// overwhelmingly likely。

#define FOLLY_LIKELY（x） FOLLY_DETAIL_BUILTIN_EXPECT（（x）， 1）

#define FOLLY_UNLIKELY（x） FOLLY_DETAIL_BUILTIN_EXPECT（（x）， 0）

static

RefCounted

create

（

const

Char

data

，

size_t

size

）

{

const

size_t

effectiveSize

size

；

auto

result

create

（

size

）；

（

FOLLY_LIKELY

（

effectiveSize

））

{

// __builtin_expect

fbstring_detail

：：

podCopy

（

data

，

data

effectiveSize

，

result

data_

）；

}

return

result

；

}

從彙編角度來說，會將可能性更大的彙編緊跟著前面的彙編，防止無效指令的載入。可以參考：

likely() and unlikely()

What is the advantage of GCC's __builtin_expect in if else statements?

CMOV 指令

conditional move，條件傳送。類似於 MOV 指令，但是依賴於 RFLAGS 暫存器內的狀態。如果條件沒有滿足，該指令不會有任何效果。

CMOV 的優點是可以避免分支預測，避免分支預測錯誤對 CPU 流水線的影響

。詳細可以看這篇文件：

amd-cmovcc.pdf

fbstring 在一些場景會提示編譯器生成 CMOV 指令，例如：

const

Char

c_str

（）

const

{

const

Char

ptr

ml_

。

data_

；

// With this syntax， GCC and Clang generate a CMOV instead of a branch。

ptr

（

category

（）

Category

：：

isSmall

）

？

small_

：

ptr

；

return

ptr

；

}

builtin_unreachable && assume(0)

如果程式執行到了

builtin_unreachable 和

assume（0），那麼會出現未定義的行為。例如**builtin_unreachable 出現在一個不會返回的函式後面，而且這個函式沒有宣告為

**attribute\_\_（（noreturn））

。例如

6.59 Other Built-in Functions Provided by GCC

給出的例子：

void

function_that_never_returns

（

void

）；

int

（

int

）

{

（

）

{

return

；

}

else

{

function_that_never_returns

（）；

__builtin_unreachable

（）；

}

如果不加

__builtin_unreachable （）；

，會報

error： control reaches end of non-void function ［-Werror=return-type］

folly 將

builtin_unreachable 和

assume（0）封裝成了

assume_unreachable

：

［［noreturn］］

FOLLY_ALWAYS_INLINE

void

assume_unreachable

（）

{

assume

（

false

）；

// Do a bit more to get the compiler to understand

// that this function really will never return。

#if defined（__GNUC__）

__builtin_unreachable

（）；

#elif defined（_MSC_VER）

__assume

（

）；

#else

// Well， it’s better than nothing。

std

：：

abort

（）；

#endif

}

在 fbstring 的一些特性場景，比如 switch 判斷 category 中用到。這是上面提到過的 mutableData（）：

Char

mutableData

（）

{

switch

（

category

（））

{

case

Category

：：

isSmall

：

return

small_

；

case

Category

：：

isMedium

：

return

ml_

。

data_

；

case

Category

：：

isLarge

：

return

mutableDataLarge

（）；

}

folly

：：

assume_unreachable

（）；

}

jemalloc

官網

API 文件

大致的演算法和原理可以參考 facebook 的這篇部落格：

Scalable memory allocation using jemalloc

find 演算法

使用的簡化的 Boyer-Moore 演算法，文件說明是在查詢成功的情況下比 std：：string 的 find 快 30 倍。benchmark 程式碼在

FBStringBenchmark.cpp

我自己測試的情況貌似是搜尋長字串的情況會更好些。

判斷大小端

// It‘s MSVC， so we just have to guess 。。。 and allow an override

#ifdef _MSC_VER

# ifdef FOLLY_ENDIAN_BE

static

constexpr

auto

kIsLittleEndian

false

；

# else

static

constexpr

auto

kIsLittleEndian

true

；

# endif

#else

static

constexpr

auto

kIsLittleEndian

__BYTE_ORDER__

__ORDER_LITTLE_ENDIAN__

；

#endif

BYTE_ORDER為預定義宏：

，值是

ORDER_LITTLE_ENDIAN

、

ORDER_BIG_ENDIAN

、

ORDER_PDP_ENDIAN

中的一個。

一般會這麼使用：

/* Test for a little-endian machine */

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

c++20 引入了std::endian，判斷會更加方便。

參考資料：

《Linux 多執行緒服務端程式設計：使用 muduo C++ 網路庫》 by 陳碩

漫步Facebook開源C++庫folly（1）：string類的設計

（完）

小蜜蜂問答

小蜜蜂問答

C++ folly庫解讀（一） Fbstring —— 一個完美替代std::string的庫

推薦文章

小蜜蜂問答

小蜜蜂問答

C++ folly庫解讀（一） Fbstring —— 一個完美替代std::string的庫

相關文章

570ml的飯盒多大？

石油一噸有多少升？

如何提取單元格中從左至右的前N個字元？

Python 中儘量不要使用“+”來連線兩個字串

推薦文章