Migrating the Logging Stack from ELK to PLG
Why Replace ELK
I have been using ELK (together with Kafka and Filebeat) for log collection and analysis for over three years. It works well and is very powerful, although the learning curve is steep; back then I spent ages getting grok right. The bigger drawback, however, is resource consumption: in environments where resources are tight it simply cannot be used, which limits its applicability.
Recently I heard that Grafana had built Loki, which together with Promtail forms the PLG logging stack, so I decided to try it out. Initial testing shows that PLG does indeed use far fewer resources. The main reason is that in ELK, both ElasticSearch and Logstash are written in Java (Logstash even runs on JRuby), which makes them memory hogs:
The rough minimum to get ELK running: ElasticSearch needs 2 GB of RAM, Logstash 1 GB, Kibana one to two hundred MB (Node.js is much kinder than the JVM), Kafka is optional so I'm not counting it, and Filebeat a few tens of MB (Go is kinder still).
PLG is far less demanding: all three components together can run in one to two hundred MB.
The components of PLG map onto ELK roughly as follows:
- Grafana is the web frontend, corresponding to Kibana
- Loki is the storage and query engine, corresponding to ElasticSearch
- Promtail is the log collection and parsing agent, corresponding to Logstash + Filebeat
 
Installing Loki and Grafana
Docker is of course the easiest way; see the docker-compose.yml file below:
version: '2'
services:
  loki:
    image: grafana/loki:master
    container_name: loki
    restart: always
    ports:
      - 127.0.0.1:3100:3100
    volumes:
      - /var/lib/loki:/loki
      - /etc/loki:/etc/loki
  grafana:
    image: grafana/grafana:master
    container_name: grafana
    restart: always
    depends_on:
      - loki
    ports:
      - 127.0.0.1:3000:3000
    volumes:
      - /var/lib/grafana:/var/lib/grafana
Note that the following directories must be created on the host first:
/etc/loki: owner/group root
/var/lib/loki: owner/group 10001, the loki user's UID inside the container
/var/lib/grafana: owner/group 472, the grafana user's UID inside the container
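The directory setup above can be scripted; a minimal sketch (must be run as root, UIDs as listed above):

```shell
# Create host directories for the bind mounts and hand them over to the
# users that run inside the containers (10001 = loki, 472 = grafana).
mkdir -p /etc/loki /var/lib/loki /var/lib/grafana
chown root:root /etc/loki
chown 10001:10001 /var/lib/loki
chown 472:472 /var/lib/grafana
```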
Loki's configuration file, /etc/loki/local-config.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  #  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
      chunks:
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    cache_location: /loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /loki/chunks
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h  # 7days
chunk_store_config:
  max_look_back_period: 2160h  # 90days
table_manager:
  retention_deletes_enabled: false
  retention_period: 2160h  # 90days
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
Only a few time-related values were changed from the default configuration; see the official documentation for the exact meaning of each parameter.
Now it can be started:
docker-compose up -d
Once it is up, log in to Grafana and add Loki as a data source. The default Grafana credentials are admin/admin. Note that the Loki address uses the container name, loki, not localhost.
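Instead of adding the data source by hand, Grafana can also pick it up from a provisioning file; a minimal sketch (the file would need to be mounted into the grafana container under /etc/grafana/provisioning/datasources, which the compose file above does not yet do):

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100   # container name, not localhost
    isDefault: true
```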
Installing and Configuring Promtail
Download the matching Promtail build from Loki's Releases page; after unpacking it can be run directly.
Taking Nginx logs as an example, configure promtail-config.yaml as follows:
# Promtail Server Config
server:
  http_listen_port: 9080
  grpc_listen_port: 0
# Positions
positions:
  filename: /tmp/positions.yaml
# Address of the Loki server
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: 'nginx-access'
          app: 'nginx-access'
          host: 'your_hostname'
          __path__: /var/log/nginx/*.access.log
    pipeline_stages:
      - match:
          selector: '{job="nginx-access"}'
          stages:
            - regex:
                expression: '^(?P<client_ip>[\w\.]+) - (?P<auth>[^ ]*) \[(?P<timestamp>.*)\] "(?P<verb>[^ ]*) ?(?P<request>[^ ]*)? ?(?P<protocol>[^ ]*)?" (?P<status>[\d]+) (?P<response>[\d]+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
            - labels:
                client_ip:
                auth:
                timestamp:
                verb:
                request:
                response:
                referer:
                agent:
                status:
            - timestamp:
                source: timestamp
                format: "02/Jan/2006:15:04:05 -0700"
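The capture regex above can be sanity-checked outside Promtail, for instance with GNU grep's PCRE mode; the access-log line here is a made-up sample in Nginx's combined format:

```shell
# A fabricated combined-format line; the regex should match it end to end.
line='203.0.113.7 - - [24/Oct/2020:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"'
echo "$line" | grep -qP '^(?P<client_ip>[\w\.]+) - (?P<auth>[^ ]*) \[(?P<timestamp>.*)\] "(?P<verb>[^ ]*) ?(?P<request>[^ ]*)? ?(?P<protocol>[^ ]*)?" (?P<status>[\d]+) (?P<response>[\d]+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"' \
  && echo "regex matches"
```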
Personally I find that while Logstash's grok is powerful, it is also hard to use; Promtail's plain-regex approach is much simpler and covers most needs.
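Once logs are flowing, the extracted labels can be queried from Grafana's Explore view with LogQL; a couple of example queries against the labels defined above:

```logql
# All nginx access logs from this job
{job="nginx-access"}

# Only 5xx responses, using a regex matcher on the extracted status label
{job="nginx-access", status=~"5.."}

# Lines containing a literal substring
{job="nginx-access"} |= "/index.html"
```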
Keep it running under supervisor:
promtail -config.file=/path_to/promtail-config.yaml
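A corresponding supervisor program entry might look like this (the binary and config paths are placeholders to adjust for your setup):

```ini
[program:promtail]
; adjust paths to wherever promtail and its config actually live
command=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yaml
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/promtail.log
```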
Published at [go4pro.org]