YX.S

2021-02-03

Comparing Laravel methods for writing large amounts of data: an experiment

當人家
Study notes

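Once a system has been running stably for some time, suddenly having to add a new table that references the existing data is a real headache.

Fortunately, Laravel provides methods for reading very large amounts of data:

chunk() and cursor().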

The official documentation describes them as follows:

chunk()

If you need to process thousands of Eloquent records, use the chunk command. The chunk method will retrieve a "chunk" of Eloquent models, feeding them to a given Closure for processing. Using the chunk method will conserve memory when working with large result sets:

In plain terms:

chunk() retrieves the models one block ("chunk") at a time and passes each block to the given closure, so using it on large result sets saves memory.

Flight::chunk(200, function ($flights) {
    foreach ($flights as $flight) {
        // process each $flight here
    }
});
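To make the mechanism concrete, here is a simplified imitation in plain PHP of the paging loop that chunk() runs. It is a sketch under assumptions, not Laravel's source: chunkLike() is a hypothetical helper, and an in-memory array stands in for the query results. Like chunk(), it fetches one page at a time (LIMIT/OFFSET), stops at a short or empty page, and stops early if the closure returns false.

```php
<?php
// A simplified, in-memory imitation of the paging loop behind chunk(); not Laravel's source.
// chunkLike() is a hypothetical helper, and $table (ordered by id) stands in for the query.
function chunkLike(array $table, int $size, callable $callback): int
{
    $queries = 0;
    $page = 1;
    do {
        // Stands in for one query: "LIMIT $size OFFSET ($page - 1) * $size".
        $results = array_slice($table, ($page - 1) * $size, $size);
        $queries++;
        $count = count($results);
        if ($count === 0) {
            break; // the previous page was full, but nothing is left
        }
        // Returning false from the callback stops further chunks.
        if ($callback($results, $page) === false) {
            break;
        }
        $page++;
    } while ($count === $size); // a short page means this was the last one
    return $queries;
}

$seen = [];
$queries = chunkLike(range(1, 10), 3, function (array $rows) use (&$seen): void {
    foreach ($rows as $id) {
        $seen[] = $id; // record every id handed to the callback
    }
});
echo "queries: {$queries}" . PHP_EOL; // queries: 4
```

With 9 rows and a chunk size of 3 the sketch also issues 4 queries: the fourth one comes back empty, which is how the loop learns there is nothing left.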

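Laravel also provides chunkById(), for the case where you update a column that the query also filters on. chunk() pages with an offset, so rows that drop out of the filter shift every later page and some rows get skipped; chunkById() pages by primary key instead and avoids these incorrect updates.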
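The pitfall chunkById() guards against can be reproduced without a database. The following plain-PHP simulation is a sketch: the table, the `active` column and all helper functions are hypothetical, and arrays stand in for queries. Ten rows start inactive and every batch marks its rows active; updateWithOffset() pages the way chunk() does (LIMIT/OFFSET), while updateWithKeyset() pages the way chunkById() does (ids greater than the last id seen).

```php
<?php
// Hypothetical table: $n rows keyed by id, all with active = false.
function makeTable(int $n): array
{
    $rows = [];
    for ($id = 1; $id <= $n; $id++) {
        $rows[$id] = ['id' => $id, 'active' => false];
    }
    return $rows;
}

// Ids matching "WHERE active = false ORDER BY id" (keys are already ascending).
function inactiveIds(array $rows): array
{
    $ids = [];
    foreach ($rows as $row) {
        if (!$row['active']) {
            $ids[] = $row['id'];
        }
    }
    return $ids;
}

// Offset paging, like chunk(): page N = LIMIT $size OFFSET N * $size, re-run for every page.
function updateWithOffset(array $rows, int $size): array
{
    for ($page = 0; ; $page++) {
        $batch = array_slice(inactiveIds($rows), $page * $size, $size);
        if ($batch === []) {
            return $rows;
        }
        foreach ($batch as $id) {
            $rows[$id]['active'] = true; // updates the very column we filter on
        }
    }
}

// Id-based paging, like chunkById(): WHERE id > $lastId ORDER BY id LIMIT $size.
function updateWithKeyset(array $rows, int $size): array
{
    $lastId = 0;
    while (true) {
        $after = array_filter(inactiveIds($rows), fn (int $id): bool => $id > $lastId);
        $batch = array_slice($after, 0, $size);
        if ($batch === []) {
            return $rows;
        }
        foreach ($batch as $id) {
            $rows[$id]['active'] = true;
        }
        $lastId = end($batch); // remember where this page ended
    }
}

$offsetResult = updateWithOffset(makeTable(10), 3);
$keysetResult = updateWithKeyset(makeTable(10), 3);
echo 'missed by offset paging: ' . implode(',', inactiveIds($offsetResult)) . PHP_EOL; // 4,5,6,10
echo 'missed by id paging: ' . implode(',', inactiveIds($keysetResult)) . PHP_EOL;     // (none)
```

With batches of 3, offset paging skips rows 4, 5, 6 and 10: once rows 1 to 3 leave the filtered set, offset 3 lands on row 7. Id-based paging updates all ten rows.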

cursor()

The cursor method allows you to iterate through your database records using a cursor, which will only execute a single query. When processing large amounts of data, the cursor method may be used to greatly reduce your memory usage:

In plain terms:

cursor() walks through the whole table with a single query, and when processing large amounts of data it can greatly reduce memory usage.

foreach (Flight::where('foo', 'bar')->cursor() as $flight) {
    // process each matching $flight here
}
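cursor() is built on a PHP generator, so the loop body receives one model at a time instead of a full collection. The sketch below imitates that with a plain generator; fakeCursor() and its rows are hypothetical. It shows that a generator only produces rows as the loop asks for them. The Laravel documentation also points out that PHP's PDO driver still buffers the raw query results internally, so cursor() reduces memory growth but does not eliminate it.

```php
<?php
// Hypothetical stand-in for cursor(): a generator that yields one row at a time.
// $produced counts how many rows the generator has actually built so far.
function fakeCursor(int $total, int &$produced): Generator
{
    for ($id = 1; $id <= $total; $id++) {
        $produced++;
        yield ['id' => $id];
    }
}

$produced = 0;
foreach (fakeCursor(300000, $produced) as $row) {
    // process $row here; stop early after the fifth row
    if ($row['id'] === 5) {
        break;
    }
}
echo "rows produced: {$produced}" . PHP_EOL; // rows produced: 5
```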

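So, when you need to read an existing table and use its values as foreign keys in a new table, which method should you use?

This post puts the two side by side.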

The experiment:

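1. The source table has a little over 300,000 rows; after filtering, about 92,000 remain, and their values fill one column of the new table.

2. Rows are written with DB::table()->insert();

P.S. For the cursor() runs, inserting all of the nearly 100,000 rows in one statement is too much for MySQL and raises an error, so the array is split with array_chunk() and written in batches.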
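The post does not show the code for the cursor() runs, so the following is only a sketch of the approach described above, with hypothetical names and made-up data: a generator stands in for cursor(), and a closure that records batch sizes stands in for `DB::table('new_table')->insert($batch)` (in Laravel, insert() accepts an array of rows and writes them as one multi-row INSERT).

```php
<?php
// Hypothetical stand-in for cursor() over a source table of $total rows;
// every fourth row passes the filter (made-up numbers chosen to give 92,000 rows).
function sourceRows(int $total): Generator
{
    for ($id = 1; $id <= $total; $id++) {
        yield ['id' => $id, 'keep' => $id % 4 === 0];
    }
}

// Hypothetical stand-in for DB::table('new_table')->insert($batch): it only records batch sizes.
$batchSizes = [];
$insert = function (array $batch) use (&$batchSizes): void {
    $batchSizes[] = count($batch);
};

// Step 1: read through the cursor and collect every row destined for the new table.
// This array holds all ~92,000 rows at once.
$toInsert = [];
foreach (sourceRows(368000) as $row) {
    if ($row['keep']) {
        $toInsert[] = ['source_id' => $row['id']]; // future foreign key column
    }
}

// Step 2: split the array so that no single INSERT statement gets too large.
foreach (array_chunk($toInsert, 1000) as $batch) {
    $insert($batch);
}
echo count($toInsert) . ' rows in ' . count($batchSizes) . ' batches' . PHP_EOL; // 92000 rows in 92 batches
```

If the original runs followed this shape, step 1 keeps the whole filtered set in memory regardless of the batch size, which would be consistent with every cursor() run in the results below reporting the same memory figure.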

Results:

| Method | Time (s) | Memory used (KB) |
| --- | ---: | ---: |
| `chunkById(1000)` | 579.40 | 45039 |
| `cursor() + array_chunk(1000)` | 73.59 | 273212 |
| `chunkById(3000)` | 172.24 | 45039 |
| `cursor() + array_chunk(3000)` | 63.71 | 273212 |
| `chunkById(5000)` | 112.27 | 54810 |
| `cursor() + array_chunk(5000)` | 58.82 | 273212 |
| `chunkById(10000)` | 70.83 | 83494 |
| `cursor() + array_chunk(10000)` | 74.79 | 273212 |
| `chunkById(13000)` | 71.94 | 98873 |
| `cursor() + array_chunk(13000)` | 63.06 | 273212 |

Increasing the batch size further, to 15,000, produces an error:

SQLSTATE[HY000]: General error: 1390 Prepared statement contains too many placeholders

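A multi-row INSERT binds one placeholder per column per row, and MySQL caps a prepared statement at 65,535 placeholders, so the largest safe batch depends on how many columns each row has. As a rule of thumb, keep each batch written via chunk() or cursor() at or below about 10,000 rows.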
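The limit can be turned into a batch size with one division. The post does not say how many columns the new table's rows had; 5 is a hypothetical value below that happens to match the observations (13,000 rows worked, 15,000 failed). maxRowsPerInsert() is a hypothetical helper.

```php
<?php
// MySQL rejects a prepared statement with more than 65,535 placeholders (error 1390).
const MYSQL_MAX_PLACEHOLDERS = 65535;

// Largest multi-row INSERT batch that stays within the limit for a given column count.
function maxRowsPerInsert(int $columnsPerRow): int
{
    return intdiv(MYSQL_MAX_PLACEHOLDERS, $columnsPerRow);
}

$columns = 5; // hypothetical column count per inserted row
echo maxRowsPerInsert($columns) . PHP_EOL; // 13107
```

With 5 columns, 13,000 rows need 65,000 placeholders and fit, while 15,000 rows need 75,000 and do not.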

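chunk() (chunkById() in these tests) seems better suited to work that talks back to the database, such as updating rows, because it uses less memory: 45,039 to 98,873 KB in these runs, versus 273,212 KB for every cursor() run.

cursor() is better suited to reading large amounts of data: it was faster than chunkById() in most runs, most dramatically at 1,000 rows per batch (73.59 s versus 579.40 s), although chunkById(10000) edged it out (70.83 s versus 74.79 s). Compared with get(), it should also save some memory.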

