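While running, Nginx generates a large volume of access and error logs, and by default the log files keep growing. To make them easier to work with (opening, searching), the logs need to be split, for example by date, and Nginx itself does not provide this feature. The methods found through web searches are all alike: use Nginx's Log Rotation feature, that is, move the log file, send the USR1 signal so that Nginx reopens its logs, put the commands in a script, and run it periodically as a cron job. I make no comment on that approach. This article records how to split Nginx logs in a predefined way using awk.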

Prerequisite

The Nginx log format is defined with the log_format directive; the default configuration file path is /etc/nginx/nginx.conf

For details, see the official documentation, Module ngx_http_log_module

A typical log_format definition, following the example in the official documentation, is

log_format main '$remote_addr - $remote_user [$time_local] '
'"$request" $status $bytes_sent '
'"$http_referer" "$http_user_agent" "$gzip_ratio"';

The generated log lines look like

75.97.107.190 - - [10/Aug/2017:13:05:50 +0800] "GET /smart HTTP/1.1" 200 6452 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36"

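Taking date-based splitting as an example, the string [10/Aug/2017:13:05:50 +0800] serves as the basis for splitting the log. Dates are processed with the date command; see man date for details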
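To illustrate the date handling, here is a minimal sketch that assumes GNU date (the --date option) and uses sed only for this demonstration. The variable names stamp and day are hypothetical. It shows why the slashes are turned into dashes before the string is handed to date.

```shell
# The bracketed timestamp from the sample log line, without the brackets.
stamp='10/Aug/2017:13:05:50 +0800'

# Keep the part before the first colon and turn "/" into "-",
# which gives a form GNU date can parse: 10-Aug-2017
day=$(printf '%s\n' "$stamp" | sed -e 's/:.*//' -e 's,/,-,g')
echo "$day"

# LC_ALL=C keeps %b (abbreviated month name) in English.
LC_ALL=C date --date="$day" +%Y          # 2017
LC_ALL=C date --date="$day" +%Y-%m-%b    # 2017-08-Aug
LC_ALL=C date --date="$day" +%F          # 2017-08-10
```

The three format strings correspond to the yearly, monthly, and daily file names used later in this article.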

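The awk commands in this article apply to gawk versions 3 and 4 (gensub is a gawk extension). This article does not explain how to use awk; consult the official documentation, Gawk: Effective AWK Programming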

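The commands in this article apply no conditional filtering, such as keeping only entries after a given day, from a given IP, or containing a given keyword.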

Log Split

This article splits the log by date. The log file is assumed to be ~/access.log, the output files are saved under /tmp, and they are named nginx-###.log

Yearly

By year

# %Y year
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date="a" +\"%Y\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

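Here /.* \[([^:]*):.*/ captures 10/Aug/2017, which is enough for splitting by year. To capture the full timestamp 10/Aug/2017:13:05:50, use /.* \[([^ ]*) .*/ instead; the first colon should then be replaced with a space, as the hourly command does, so that date receives 10-Aug-2017 13:05:50. Also change --date="a" to --date=\""a"\" so that the value is quoted when it reaches the shell, which is required once it contains a space.

The modified commands are as follows

# match 10/Aug/2017
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log
# match 10/Aug/2017:13:05:50
awk '{a=gensub(/.* \[([^ ]*) .*/,"\\1","1",$0);a=gensub(/:/," ","1",a);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log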

The resulting files are named like nginx-2017.log
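The one-liners are dense, so here is a sketch of the same steps written out with comments. It is a variant, not the command above: it uses the portable match and substr instead of gensub, and it caches the result of each date call in an array. In awk a command pipe stays open until close is called, so reading the same command string a second time returns nothing and leaves the variable unchanged; on chronologically ordered logs this goes unnoticed, but on out-of-order lines it can send an entry to the wrong file. The sample data, the outdir variable, and the "unknown" fallback are assumptions made for this demonstration.

```shell
# Hypothetical sample data and output directory, created only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
75.97.107.190 - - [31/Dec/2016:23:59:58 +0800] "GET /a HTTP/1.1" 200 10 "-" "curl/7.50"
75.97.107.190 - - [10/Aug/2017:13:05:50 +0800] "GET /smart HTTP/1.1" 200 6452 "-" "curl/7.50"
75.97.107.190 - - [31/Dec/2016:23:59:59 +0800] "GET /b HTTP/1.1" 200 10 "-" "curl/7.50"
EOF

awk -v outdir="$workdir" '
{
    # Locate "[dd/Mon/yyyy:" and keep the text between "[" and ":".
    if (!match($0, /\[[^:]*:/)) next
    day = substr($0, RSTART + 1, RLENGTH - 2)
    gsub(/\//, "-", day)                     # 10/Aug/2017 -> 10-Aug-2017

    # Run date once per distinct day and remember the answer.
    if (!(day in year)) {
        cmd = "date --date=\"" day "\" +%Y"
        if ((cmd | getline y) <= 0) y = "unknown"   # assumed fallback name
        close(cmd)                           # release the pipe
        year[day] = y
    }
    print > (outdir "/nginx-" year[day] ".log")
}' "$workdir/access.log"

ls "$workdir"
```

With the cache in place, the third sample line, which is older than the second, still lands in the 2016 file.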

Quarterly

By quarter

# %q quarter of year (1..4)
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"-quarter-\"%q\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-2017-quarter-3.log
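The %q format may not be available in older GNU coreutils releases. As a fallback, the quarter can be derived from the month number; the sketch below assumes GNU date for --date, and the function name quarter_of is hypothetical.

```shell
# Print "<year>-quarter-<n>" for a date string GNU date understands,
# e.g. 10-Aug-2017. Hypothetical helper, not part of the original commands.
quarter_of() {
    month=$(LC_ALL=C date --date="$1" +%m) || return 1
    year=$(LC_ALL=C date --date="$1" +%Y) || return 1
    # ${month#0} strips one leading zero so the shell does not read 08 as octal.
    echo "$year-quarter-$(( (${month#0} + 2) / 3 ))"
}

quarter_of 10-Aug-2017    # 2017-quarter-3
```

The integer division maps months 1 to 3 onto quarter 1, months 4 to 6 onto quarter 2, and so on.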

Monthly

By month

# %b locale's abbreviated month name (e.g., Jan)
# %m month (01..12)
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"-\"%m\"-\"%b\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-2017-08-Aug.log

Weekly

By week

# %U week number of year, with Sunday as first day of week (00..53)
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"-week-\"%U\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-2017-week-32.log
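Week numbering depends on the convention chosen. As a small illustration, assuming GNU date, the loop below prints the %U week (Sunday first) next to the ISO 8601 week %V and its ISO year %G for three dates, two of them around a year boundary. The choice of dates is only for demonstration.

```shell
# Compare Sunday-based week numbers (%U) with ISO 8601 weeks (%V, %G).
for d in 31-Dec-2016 01-Jan-2017 10-Aug-2017; do
    LC_ALL=C date --date="$d" "+%F %a  %%U=%U  %%V=%V (ISO year %G)"
done
```

For 01-Jan-2017, a Sunday, %U gives 01 while the ISO week is 52 of ISO year 2016, so the two conventions put that day into different files. If ISO weeks are preferred, %G-week-%V would be the matching format for the file name.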

Daily

By day

# %F full date; same as %Y-%m-%d
# %a locale's abbreviated weekday name (e.g., Sun)
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%F\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log
# file name has weekday name
awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%F\"-\"%a\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-2017-08-10-Thu.log

Hourly

By hour

# %H hour (00..23)
awk '{a=gensub(/.* \[([^ ]*) .*/,"\\1","1",$0);a=gensub(/\:/," ","1",a);gsub(/\//,"-",a);"date --date=\""a"\" +\"%F\"-hour-\"%H\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-2017-08-10-hour-13.log
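Once the time of day is needed, the command replaces the first colon of the timestamp with a space before calling date. The sketch below reproduces that step outside awk, assuming GNU date; sed and the variable names stamp and when are used only for this demonstration.

```shell
stamp='10/Aug/2017:13:05:50'

# Replace only the first ":" (between date and time) with a space,
# then turn "/" into "-": 10-Aug-2017 13:05:50
when=$(printf '%s\n' "$stamp" | sed -e 's/:/ /' -e 's,/,-,g')
echo "$when"

LC_ALL=C date --date="$when" +%F-hour-%H     # 2017-08-10-hour-13
LC_ALL=C date --date="$when" +%Y%m%d-%H:%M   # 20170810-13:05
LC_ALL=C date --date="$when" +%Y%m%d-%T      # 20170810-13:05:50
```

The three format strings are the ones used for the hourly, per-minute, and per-second file names.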

Per Minute

By minute

# %M minute (00..59)
awk '{a=gensub(/.* \[([^ ]*) .*/,"\\1","1",$0);a=gensub(/\:/," ","1",a);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y%m%d\"-\"%H:%M\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-20170810-13:36.log

Per Second

By second

# %S second (00..60)
# %T time; same as %H:%M:%S
awk '{a=gensub(/.* \[([^ ]*) .*/,"\\1","1",$0);a=gensub(/\:/," ","1",a);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y%m%d\"-\"%T\"" | getline b; print > "/tmp/nginx-"b".log"}' ~/access.log

The resulting files are named like nginx-20170810-13:36:09.log

Compression

gzip gz

If the log file has already been compressed, decompress it first and then pass the data to awk through a pipe (|)

# .gz
zcat ~/access.log.gz | awk '{a=gensub(/.* \[([^ ]*) .*/,"\\1","1",$0);a=gensub(/\:/," ","1",a);gsub(/\//,"-",a);"date --date=\""a"\" +\"%F\"-hour-\"%H\"" | getline b; print > "/tmp/nginx-"b".log"}'
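The same idea also works in the other direction: awk can print into a command, so the split files can be written compressed as well. The following is a sketch under assumptions, not the command above: it splits by year only, takes the year straight from the timestamp with split instead of calling date, and writes to a temporary directory through a hypothetical outdir variable.

```shell
# Hypothetical compressed sample log, created only for this demo.
workdir=$(mktemp -d)
{
    echo '75.97.107.190 - - [31/Dec/2016:23:59:59 +0800] "GET /a HTTP/1.1" 200 10 "-" "curl/7.50"'
    echo '75.97.107.190 - - [10/Aug/2017:13:05:50 +0800] "GET /smart HTTP/1.1" 200 6452 "-" "curl/7.50"'
} | gzip > "$workdir/access.log.gz"

# gzip -dc streams the decompressed data, like zcat.
gzip -dc "$workdir/access.log.gz" | awk -v outdir="$workdir" '
match($0, /\[[^ ]* /) {
    stamp = substr($0, RSTART + 1, RLENGTH - 2)    # 10/Aug/2017:13:05:50
    split(stamp, t, "[/:]")                        # t[1]=10 t[2]=Aug t[3]=2017
    # One gzip process per distinct year; awk closes the pipes on exit.
    print | ("gzip > \"" outdir "/nginx-" t[3] ".log.gz\"")
}'

ls "$workdir"
```

Each distinct output name keeps one gzip process open until awk exits, so this suits coarse splits such as by year or by month better than splits by minute or second.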

Example

Take the log /var/log/baseserver_access.log as an example; the file size is 7.9G

To split it by month, the command is

awk '{a=gensub(/.* \[([^:]*):.*/,"\\1","1",$0);gsub(/\//,"-",a);"date --date=\""a"\" +\"%Y\"-\"%m\"-\"%b\"" | getline b; print > "/tmp/nginx-"b".log"}' /var/log/baseserver_access.log

The generated files are

-rw-r--r-- 1 root root 101M Aug 10 14:37 /tmp/nginx-2016-10-Oct.log
-rw-r--r-- 1 root root 656M Aug 10 14:37 /tmp/nginx-2016-11-Nov.log
-rw-r--r-- 1 root root 729M Aug 10 14:37 /tmp/nginx-2016-12-Dec.log
-rw-r--r-- 1 root root 650M Aug 10 14:37 /tmp/nginx-2017-01-Jan.log
-rw-r--r-- 1 root root 735M Aug 10 14:37 /tmp/nginx-2017-02-Feb.log
-rw-r--r-- 1 root root 876M Aug 10 14:37 /tmp/nginx-2017-03-Mar.log
-rw-r--r-- 1 root root 831M Aug 10 14:37 /tmp/nginx-2017-04-Apr.log
-rw-r--r-- 1 root root 937M Aug 10 14:37 /tmp/nginx-2017-05-May.log
-rw-r--r-- 1 root root 1022M Aug 10 14:37 /tmp/nginx-2017-06-Jun.log
-rw-r--r-- 1 root root 1.1G Aug 10 14:37 /tmp/nginx-2017-07-Jul.log
-rw-r--r-- 1 root root 388M Aug 10 14:37 /tmp/nginx-2017-08-Aug.log

The elapsed time reported by time is

real 40m45.080s
user 39m59.076s
sys 0m21.479s

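Processing nearly 8G of logs took more than 40 minutes. The user time is almost equal to the real time, which is consistent with awk being single-threaded and using only one processor core.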
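One way to reduce the per-line work is to skip gensub and the external date call altogether and build the month-based file name directly from the timestamp. The sketch below is an untimed alternative under assumptions, not the command measured above: it uses portable awk, an English month lookup table, sample data, and a hypothetical outdir variable.

```shell
# Hypothetical sample data and output directory, created only for this demo.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
75.97.107.190 - - [31/Dec/2016:23:59:59 +0800] "GET /a HTTP/1.1" 200 10 "-" "curl/7.50"
75.97.107.190 - - [10/Aug/2017:13:05:50 +0800] "GET /smart HTTP/1.1" 200 6452 "-" "curl/7.50"
75.97.107.190 - - [11/Aug/2017:08:00:00 +0800] "GET /b HTTP/1.1" 200 10 "-" "curl/7.50"
EOF

awk -v outdir="$workdir" '
BEGIN {
    # Map English month abbreviations to two-digit month numbers.
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
    for (i = 1; i <= n; i++) num[names[i]] = sprintf("%02d", i)
}
match($0, /\[[^:]*:/) {
    # d[1]=10 d[2]=Aug d[3]=2017
    split(substr($0, RSTART + 1, RLENGTH - 2), d, "/")
    out = outdir "/nginx-" d[3] "-" num[d[2]] "-" d[2] ".log"
    print > out
}' "$workdir/access.log"

ls "$workdir"
```

The file names follow the same nginx-2017-08-Aug.log pattern as the monthly command. Whether this is faster on a given log would need to be measured with time.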

Change Logs

  • 2017.08.10 16:39 Thu Asia/Shanghai
    • First draft completed