Posted on 28-Jan-2021


  • 1 ©2019

    Fastly’s Scalable Global Network

    Taiji Tsuchiya, Fastly K.K.

    Supporting Rapid Growth: Fastly’s Scalable Global Network (Japanese title)

  • 2

    About us

    • Provides an edge cloud platform

    • Founded in March 2011

    • HQ in San Francisco

    • 504 Employees (April 2019)

    – 33% of employees work remotely

    • Listed on NYSE (May 2019)

  • 3

    Fastly’s Global Network

    • 66 POPs, 58 Tbps of capacity, and 75 IX points

    • Cache servers: ~1,700

    • Daily requests: 600+ billion

    (As of September 2019)
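The totals above are easier to grasp as averages. The Python sketch below divides them out under one loud assumption: load is spread evenly across POPs, servers, and the day, which real CDN traffic never is. Under that assumption the figures work out to about 0.88 Tbps of capacity per POP, roughly 6.9 million requests per second network-wide, and roughly 4,100 requests per second per cache server. The constant and function names are illustrative only.

```python
# Back-of-envelope averages from the September 2019 figures on the slide.
# Assumption: load is spread evenly across POPs, servers, and the day,
# so treat the results as rough orders of magnitude, not measurements.
SECONDS_PER_DAY = 24 * 60 * 60

POPS = 66
CAPACITY_TBPS = 58
CACHE_SERVERS = 1_700      # the slide says "~1,700"
DAILY_REQUESTS = 600e9     # the slide says "600+ billion", so this is a lower bound


def averages() -> dict[str, float]:
    """Return evenly-spread averages derived from the slide's totals."""
    requests_per_second = DAILY_REQUESTS / SECONDS_PER_DAY
    return {
        "capacity_per_pop_tbps": CAPACITY_TBPS / POPS,
        "requests_per_second": requests_per_second,
        "requests_per_second_per_server": requests_per_second / CACHE_SERVERS,
    }


if __name__ == "__main__":
    for name, value in averages().items():
        print(f"{name}: {value:,.2f}")
```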

  • 4

    Operations Team

    • Distributed team: a remote team spread around the world

    – Most members are remote

    – Follow-the-sun shifts:

    – APAC 00:00-06:00 UTC (09:00-15:00 JST)

    – EMEA 06:00-12:00 UTC

    – US East 12:00-18:00 UTC

    – US West 18:00-24:00 UTC

    ● 24/7/365 on-call

    ● NetOps / CacheOps

    ● Peering / circuit turn-up

    ● POP builds

    ● Tooling
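A follow-the-sun rotation boils down to a lookup from the current UTC hour to a shift. This minimal Python sketch encodes the four UTC shifts listed above; the names `SHIFTS`, `JST`, and `region_on_call` are hypothetical, and the sketch only answers which region is on call, not which person within it.

```python
# Follow-the-sun on-call rotation as listed on the slide: four 6-hour UTC shifts.
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, UTC+9, no daylight saving

SHIFTS = [  # (start hour inclusive, end hour exclusive, region), all in UTC
    (0, 6, "APAC"),
    (6, 12, "EMEA"),
    (12, 18, "US East"),
    (18, 24, "US West"),
]


def region_on_call(moment: datetime) -> str:
    """Return the region whose shift covers `moment`, which must be timezone-aware."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    hour = moment.astimezone(timezone.utc).hour
    for start, end, region in SHIFTS:
        if start <= hour < end:
            return region
    raise RuntimeError(f"shifts do not cover UTC hour {hour}")


if __name__ == "__main__":
    # 10:30 JST is 01:30 UTC, which falls in the APAC shift.
    print(region_on_call(datetime(2019, 9, 2, 10, 30, tzinfo=JST)))
```

Converting to UTC first keeps the shift table in a single time zone, so regional clocks and daylight-saving changes never have to be encoded in the table itself.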

  • 5

    Agenda

    This talk introduces how Fastly operates a large-scale global network with a small operations team.

    How do we expand and operate our global network with a small team?

    ● Scalable Network Architecture

    ● Network Automation

  • 6

    Scalable Network Architecture

  • 7

    Fastly’s Scalable Network

    ● No Backbone

    ● No Router

    ● No Load Balancer

  • 8

    No Backbone Network: all traffic flows over the Internet.

    ● Easy to add new POPs.

    ● Easy to standardize network configurations.

    [Diagram: Fastly POPs (AS54113) exchange traffic with end users' ISPs and with content origins only over the Internet, through transit, IX, and peer connections; services are announced via IP anycast.]
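With IP anycast, every POP announces the same prefix, and each ISP's BGP routers decide which announcement to use. The Python sketch below models a simplified slice of BGP best-path selection (higher LOCAL_PREF, then shorter AS_PATH, then lowest router ID as the final tie-breaker); the full decision process in RFC 4271 has more steps, such as origin, MED, and eBGP-over-iBGP preference. The AS numbers, prefix, and neighbor names are hypothetical values from documentation ranges, not Fastly's.

```python
# Simplified model of how a network picks one of several routes to the same
# anycast prefix. Only three tie-breakers from the BGP decision process are modeled.
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    neighbor: str               # where the route was learned (transit, IX peer, ...)
    as_path: tuple[int, ...]    # AS numbers the announcement traversed
    local_pref: int = 100       # operator preference; higher wins
    router_id: str = "0.0.0.0"  # BGP identifier of the advertising router


def _router_id_key(router_id: str) -> tuple[int, ...]:
    return tuple(int(octet) for octet in router_id.split("."))


def best_path(routes: list[Route]) -> Route:
    """Highest LOCAL_PREF, then shortest AS_PATH, then lowest router ID."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, len(r.as_path), _router_id_key(r.router_id)),
    )


if __name__ == "__main__":
    # The same anycast prefix is reachable two ways: through a transit provider
    # (AS 64500) or directly from the CDN (AS 64496) over an IX.
    candidates = [
        Route("198.51.100.0/24", "transit", (64500, 64496), router_id="192.0.2.1"),
        Route("198.51.100.0/24", "ix-peer", (64496,), router_id="192.0.2.2"),
    ]
    print(best_path(candidates).neighbor)  # ix-peer: shorter AS_PATH wins
```

Because each ISP runs this choice independently, users are steered to a POP by their own ISP's routing policy, which is why adding a POP needs no backbone: it only needs to announce the same prefix.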

  • 9

    No Router Network: routing software runs on the switches.

    ● Router ports are very expensive; switches are a cost-effective way to expand the network globally.

    [Diagram: an Arista EOS switch holds eBGP sessions with transit, IX, and peer networks and runs userspace services (BGP service (BIRD), LB service, API service, metrics service); each server behind the switch also runs BGP (BIRD).]

  • 10

    No (Hardware) Load Balancer Network: Inbound Traffic

    ● The Internet -> Fastly POP

    ○ ISPs choose the BGP best path

    ● Switch -> Cache

    ○ ECMP load balancing

    + Fastly LB app (Faild): https://www.fastly.com/blog/building-and-scaling-fastly-network-part-2-balancing-requests

    [Diagram: an ISP reaches Fastly POP A and POP B via transit using IP anycast; inside each POP, an ECMP switch spreads traffic across servers running Faild, which sync with one another, run health checks, and generate routes to all servers.]
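Inside a POP, the switch's ECMP spreads inbound traffic by hashing each flow onto one of the equal-cost next hops. The Python sketch below illustrates generic ECMP selection over a health-checked server set; it is not Faild's algorithm, and the names `Flow` and `pick_next_hop` are hypothetical.

```python
# Generic ECMP next-hop selection: hash the flow's 5-tuple and map it onto the
# currently healthy servers. Illustrative only; this is not Faild's algorithm.
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int  # e.g. 6 for TCP
    src_port: int
    dst_port: int


def pick_next_hop(flow: Flow, servers: list[str], healthy: set[str]) -> str:
    """Return the server that should receive every packet of `flow`."""
    candidates = sorted(s for s in servers if s in healthy)
    if not candidates:
        raise RuntimeError("no healthy servers in this POP")
    key = "|".join(str(field) for field in flow).encode()
    # A stable hash keeps all packets of one flow on the same server.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return candidates[digest % len(candidates)]


if __name__ == "__main__":
    servers = ["cache-1", "cache-2", "cache-3", "cache-4"]
    flow = Flow("192.0.2.10", "198.51.100.1", 6, 51515, 443)
    print(pick_next_hop(flow, servers, healthy=set(servers)))
    print(pick_next_hop(flow, servers, healthy={"cache-2", "cache-4"}))
```

Plain modulo hashing remaps many existing flows whenever the healthy set changes, which can break in-progress TCP connections; a production load balancer has to handle that case more gracefully than this sketch does.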

  • 11

    No (Hardware) Load Balancer Network: Outbound Traffic (Cache -> The Internet)

    ● ECMP load balancing per transit/peer

    ● Use MPLS labels so that the servers decide the routing path: https://pc.nanog.org/static/published/meetings/NANOG71/1438/20171002_Barroso_Developing_And_Evolving_v1.pdf

    [Diagram: a server sends traffic for 172.20.0.0/24 to Transit A through either Switch A or Switch B, balancing across the two paths with ECMP.]

    Network          Next hop    MPLS label
    172.20.0.0/24    Switch A    Label A
    172.20.0.0/24    Switch B    Label B
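The table above gives one prefix two equal-cost (next hop, MPLS label) entries. The Python sketch below assumes the server keeps such a table, does a longest-prefix match on the destination, and hashes the flow to pick one entry, whose label would then tell the switch which egress to use. How Fastly actually programs and pushes labels is covered in the linked NANOG talk, not here; `ROUTES`, `lookup`, and the default route are hypothetical.

```python
# Server-side egress choice, assuming the server holds a table of
# prefix -> equal-cost (next hop, MPLS label) entries like the one on the slide.
import hashlib
import ipaddress

ROUTES = {
    # From the slide: two equal-cost paths to 172.20.0.0/24.
    ipaddress.ip_network("172.20.0.0/24"): [("Switch A", "Label A"), ("Switch B", "Label B")],
    # Hypothetical default route, added so every IPv4 destination resolves.
    ipaddress.ip_network("0.0.0.0/0"): [("Switch A", "Label A")],
}


def lookup(dst: str, flow_key: str) -> tuple[str, str]:
    """Longest-prefix match on `dst`, then hash `flow_key` across equal-cost entries."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    entries = ROUTES[best]
    digest = int.from_bytes(hashlib.sha256(flow_key.encode()).digest()[:8], "big")
    return entries[digest % len(entries)]


if __name__ == "__main__":
    next_hop, label = lookup("172.20.0.5", "192.0.2.10:51515->172.20.0.5:443/tcp")
    print(f"send via {next_hop} with {label}")
```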

  • 12

    Network Automation

  • 13

    One-Command Network Operations

    ● Control Peer Traffic

    ● Control Transit Traffic

    ● Drain Traffic for Transit/Peer maintenance

    ● Network Configuration

  • 14

    Control Peer Traffic

    “netops slash --site XXX --provider XXX”

    [Diagram: a FlowCollector measures per-prefix traffic toward a peer (10.0.0.0/24: 3 Gbps, 10.0.1.0/24: 2 Gbps, 10.0.2.0/24: 1 Gbps). When the command is run, PeerSlasher looks up the peer's switch and interface and reduces the traffic on it by 3 Gbps, the volume of 10.0.0.0/24.]
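In the example above, removing 3 Gbps from the peer corresponds to the 10.0.0.0/24 prefix. The talk does not describe how PeerSlasher chooses prefixes; the Python sketch below assumes a simple greedy rule (busiest prefixes first until the target is met), and the function name is hypothetical.

```python
# Hypothetical prefix selection for a "slash": shift the busiest prefixes off a
# peer until the requested amount of traffic has been moved.


def select_prefixes_to_shift(traffic_gbps: dict[str, float], target_gbps: float) -> list[str]:
    """Greedily pick prefixes, busiest first, until their total reaches target_gbps."""
    chosen: list[str] = []
    total = 0.0
    for prefix, gbps in sorted(traffic_gbps.items(), key=lambda item: item[1], reverse=True):
        if total >= target_gbps:
            break
        chosen.append(prefix)
        total += gbps
    return chosen


if __name__ == "__main__":
    # Per-prefix rates from the slide's FlowCollector example.
    flows = {"10.0.0.0/24": 3.0, "10.0.1.0/24": 2.0, "10.0.2.0/24": 1.0}
    print(select_prefixes_to_shift(flows, target_gbps=3.0))  # ['10.0.0.0/24']
```

Greedy selection keeps the number of moved prefixes small but can overshoot: asking for 4 Gbps moves both 10.0.0.0/24 and 10.0.1.0/24, 5 Gbps in total.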

  • 15

    Network Configuration Workflow

    Ansibly: “ansibly deploy switch-XXX --commit”

    ● Build a new circuit (peer or cache): call APIs, send files, validate status

    [Diagram: a change starts as a Pull Request to the Repository (GitHub), goes through review and CI tests, and is merged; Ansibly reads from the Datastore, performs a dry run, and then deploys to the switches. The flow is divided into Day 0 and Day 1 stages.]
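Ansibly's internals are not described in the talk. As a hedged illustration of the dry-run-then-commit pattern in this workflow (mirroring the `--commit` flag in the command above), the Python sketch below diffs a switch's running configuration against the desired one with the standard `difflib` module and only pushes when a commit is requested. `deploy`, `push_config`, and `PUSHED` are hypothetical names; `push_config` is a stand-in that records changes instead of talking to a switch.

```python
import difflib

PUSHED: dict[str, str] = {}  # stand-in for the switches: records what would be deployed


def push_config(hostname: str, config: str) -> None:
    """Hypothetical deploy step; a real tool would talk to the switch here."""
    PUSHED[hostname] = config


def deploy(hostname: str, running: str, desired: str, commit: bool = False) -> list[str]:
    """Return the diff (dry run); push only when commit=True and something changed."""
    diff = list(difflib.unified_diff(
        running.splitlines(), desired.splitlines(),
        fromfile=f"{hostname}/running", tofile=f"{hostname}/desired", lineterm="",
    ))
    if diff and commit:
        push_config(hostname, desired)
    return diff


if __name__ == "__main__":
    running = "interface Ethernet1\n  description transit-a\n"
    desired = "interface Ethernet1\n  description transit-b\n"
    print("\n".join(deploy("switch-a", running, desired)))  # dry run: show only
    deploy("switch-a", running, desired, commit=True)      # apply
```

Making the dry run the default means an operator has to opt in to changing a switch, which pairs naturally with the review and CI steps before merge.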

  • 16

    Next Step: Full Automation

    [Diagram. Current: a person is called, checks the traffic graph, and runs Peer Slash (PeerSlasher) against the switches. Next step: a trigger in an event-driven platform (StackStorm) calls the PeerSlasher APIs, which run Peer Slash against the switches without a person in the loop.]
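The next step replaces the human "called, checks graph, runs command" loop with an event-driven rule. StackStorm organizes this as triggers, rules, and actions defined in its own formats, which are not reproduced here. The plain-Python sketch below only illustrates the idea: a rule watches hypothetical utilization events and fires an action above a threshold. `UtilizationEvent`, `make_rule`, `run_peer_slash`, and the 0.9 threshold are all assumptions.

```python
# Event-driven remediation in plain Python: a rule watches utilization events
# and runs an action when a threshold is crossed. Names and threshold are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UtilizationEvent:
    site: str
    provider: str
    utilization: float  # fraction of interface capacity, 0.0 to 1.0


Action = Callable[[str, str], None]


def make_rule(threshold: float, action: Action) -> Callable[[UtilizationEvent], bool]:
    """Return a handler that fires `action(site, provider)` at or above `threshold`."""

    def handle(event: UtilizationEvent) -> bool:
        if event.utilization >= threshold:
            action(event.site, event.provider)
            return True
        return False

    return handle


if __name__ == "__main__":
    def run_peer_slash(site: str, provider: str) -> None:
        # Stand-in for calling the PeerSlasher API; no real API is assumed here.
        print(f"would slash traffic at {site} toward {provider}")

    on_event = make_rule(threshold=0.9, action=run_peer_slash)
    on_event(UtilizationEvent("site-a", "peer-a", utilization=0.95))
```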

  • 17

    Summary

    We shared what goes on behind the scenes as Fastly runs its global network with a small team.

    How Fastly handles global network operations:

    ● Scalable Network Architecture

    ○ No Backbone

    ○ No Router

    ○ No Load Balancer

    ● Network Automation

    ○ Control Peer Traffic

    ○ Network Configuration

    ○ Full Automation

  • 18

    Discussion

    • Any comments or questions about global network operations? Please feel free to ask!

    • If you have tips or knowledge about global network operations, please share them in the comments.

    • Was this talk helpful? Were there any points you could apply?

    – If it would be hard to introduce these ideas into your own network, what would the hurdles be?

    – What would need to be solved for you to adopt a similar system?

  • 19

    Thank you!