AWS VPN High Availability

This is a refinement of my previous approach. In the previous model, there were two VyOS instances in every AWS region. In this model, there are only two VyOS instances, both in the hub region. All Amazon regions (including the hub region) connect to these VyOS instances. Each line in the diagram below represents two tunnels: every Amazon VPN connection comes with two tunnels, but both tunnels terminate on the same VyOS server at the other end.

[Figure: VPN2]

Total cost per hour comes to (2 * $0.05 per hour * number of regions) + (2 * hourly price of the VyOS instance type). In our deployment, I chose c3.2xlarge, which is $0.42 per hour on demand. For reserved instances, that price comes down to a little over $0.20 per hour per instance. For a total of four regions, the cost is (2 * 0.05 * 4) + (2 * 0.42) = $1.24 per hour with on-demand instances. With 1-year reserved instances, the cost comes down to roughly $0.90 per hour. c3.2xlarge is probably bigger than what we need, but it has high network throughput.
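The cost formula above can be sketched as a small Python helper for trying other region counts or instance prices. The function name hourly_cost and its parameter names are made up for this illustration; the default $0.05 VPN price is the figure quoted above and may not match current AWS pricing.

```python
def hourly_cost(regions, instance_price, vpn_price=0.05, vyos_instances=2):
    """Hourly cost in USD: one VPN connection per VyOS instance in every
    region, plus the VyOS instances themselves."""
    vpn_cost = vyos_instances * vpn_price * regions
    instance_cost = vyos_instances * instance_price
    return round(vpn_cost + instance_cost, 2)


# Four regions with on-demand c3.2xlarge at $0.42 per hour
print(hourly_cost(regions=4, instance_price=0.42))
```

Note that the VPN part grows with the number of regions while the instance part stays fixed, which is where the saving over two VyOS instances per region comes from.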

Figure out your hub AWS region. Launch two VyOS instances from the VyOS AMI in two different availability zones:

  • These should be in public subnet with public IP addresses
  • Enable termination protection if you want to be on the safe side
  • Change shutdown behavior to stop the instance (instead of terminate)
  • Disable source/destination checks (important)
  • Use an open security group until the configuration is done

Allocate two Elastic IPs (EIP) and associate them with the two instances

Upgrade VyOS to the latest version (accept the default values for all the prompts). Reboot after it is done:

$ add system image http://packages.vyos.net/iso/release/version/vyos-version-amd64.iso
$ reboot

In every region (including the hub), create two customer gateways (CGW), one for each VyOS instance

  • Use dynamic routing
  • Use a BGP ASN from the private range (e.g. 65000). Use the same value for all CGWs
  • Use the Elastic IP address of VyOS

Also in every region, create a Virtual Private Gateway (VPG) and attach it to the VPC. Finally, create two VPN connections (one for each CGW):

  • VPG should match the one created before
  • Routing should be dynamic

Once the VPNs are created, download the configuration for each one of them

  • Vendor: Vyatta
  • Platform: Vyatta Network OS
  • Software: Vyatta Network OS 6.5+

There is a lot in common across these configuration files. Depending on the number of regions, you might end up with 2, 4, 6 or 8 configuration files. Separate the files into two groups: the ones associated with CGW1 and the ones associated with CGW2. Then SSH to each VyOS instance and apply the settings that are common to all of its files:

$ ssh -i private-key vyos@elastic-ip-of-cgw
$ configure
set vpn ipsec ike-group AWS lifetime '28800'
set vpn ipsec ike-group AWS proposal 1 dh-group '2'
set vpn ipsec ike-group AWS proposal 1 encryption 'aes128'
set vpn ipsec ike-group AWS proposal 1 hash 'sha1'
set vpn ipsec ipsec-interfaces interface 'eth0'
set vpn ipsec esp-group AWS compression 'disable'
set vpn ipsec esp-group AWS lifetime '3600'
set vpn ipsec esp-group AWS mode 'tunnel'
set vpn ipsec esp-group AWS pfs 'enable'
set vpn ipsec esp-group AWS proposal 1 encryption 'aes128'
set vpn ipsec esp-group AWS proposal 1 hash 'sha1'
set vpn ipsec ike-group AWS dead-peer-detection action 'restart'
set vpn ipsec ike-group AWS dead-peer-detection interval '15'
set vpn ipsec ike-group AWS dead-peer-detection timeout '45'

Next, configure the interfaces. All of the downloaded VPN configurations refer to vti0 and vti1, but you cannot use the same VTIs for multiple tunnels, so replace vti0/vti1 with a unique vtiX/vtiY pair for each configuration file. Example:

set interfaces vti vti3 address '169.A.B.C/30'
set interfaces vti vti3 description 'Oregon to Virginia Tunnel 1'
set interfaces vti vti3 mtu '1436'

set interfaces vti vti4 address '169.X.Y.Z/30'
set interfaces vti vti4 description 'Oregon to Virginia Tunnel 2'
set interfaces vti vti4 mtu '1436'
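
With several downloaded files, renaming the VTIs by hand is error-prone. Here is a minimal Python sketch of the renaming step, assuming each downloaded file only ever mentions vti0 and vti1 and is processed before being pasted into VyOS. The function name renumber_vti is hypothetical, not part of any AWS or VyOS tooling.

```python
import re


def renumber_vti(config_text, first_index):
    """Rename vti0/vti1 in one downloaded configuration to
    vti<first_index>/vti<first_index + 1>."""
    def swap(match):
        # match.group(1) is "0" or "1", the tunnel's position in the file
        return "vti{}".format(first_index + int(match.group(1)))

    # \b keeps names such as vti10 from being touched
    return re.sub(r"\bvti([01])\b", swap, config_text)


sample = (
    "set interfaces vti vti0 mtu '1436'\n"
    "set vpn ipsec site-to-site peer X.Y.Z.A vti bind 'vti1'\n"
)
print(renumber_vti(sample, first_index=3))
```

Giving each file its own starting index (0, 2, 4, ...) keeps every tunnel on a distinct interface.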

In the site-to-site section of the downloaded configuration files, local-address is set to the Elastic IP address of VyOS. VyOS will not accept that, because the EIP is not configured on any of its interfaces. Change it to the private eth0 address (e.g. 10.5.0.10), then apply the site-to-site configuration:

set vpn ipsec site-to-site peer X.Y.Z.A authentication mode 'pre-shared-secret'
set vpn ipsec site-to-site peer X.Y.Z.A authentication pre-shared-secret 'XX1'
set vpn ipsec site-to-site peer X.Y.Z.A description 'Oregon to Virginia Tunnel 1'
set vpn ipsec site-to-site peer X.Y.Z.A ike-group 'AWS'
set vpn ipsec site-to-site peer X.Y.Z.A local-address '10.A.B.C'
set vpn ipsec site-to-site peer X.Y.Z.A vti bind 'vtiX'
set vpn ipsec site-to-site peer X.Y.Z.A vti esp-group 'AWS'
...
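
The local-address change can be applied to a whole downloaded file in one pass. The sketch below assumes the file contains "set vpn ipsec site-to-site peer ... local-address ..." lines in the form shown above. The function name use_private_address is made up for this example, and 10.5.0.10 stands in for your instance's private eth0 address.

```python
import re

# Matches the local-address line of any site-to-site peer
LOCAL_ADDRESS = re.compile(
    r"^(set vpn ipsec site-to-site peer \S+ local-address) .*$",
    re.MULTILINE,
)


def use_private_address(config_text, private_ip):
    """Point every site-to-site local-address at the private eth0 address."""
    return LOCAL_ADDRESS.sub(
        lambda m: "{} '{}'".format(m.group(1), private_ip),
        config_text,
    )


sample = (
    "set vpn ipsec site-to-site peer 1.2.3.4 ike-group 'AWS'\n"
    "set vpn ipsec site-to-site peer 1.2.3.4 local-address '54.0.0.1'\n"
)
print(use_private_address(sample, "10.5.0.10"))
```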

Next configure BGP:

set protocols bgp 650xy neighbor 169.A.B.E remote-as 'xyz1'
set protocols bgp 650xy neighbor 169.A.B.E soft-reconfiguration 'inbound'
set protocols bgp 650xy neighbor 169.A.B.E timers holdtime '30'
set protocols bgp 650xy neighbor 169.A.B.E timers keepalive '30'
...

In my setup, I also changed the NTP servers and the hostname:

set system host-name my-hostname
delete system ntp
set system ntp server 0.a.b.ntp.org
set system ntp server 1.a.b.ntp.org
set system ntp server 2.a.b.ntp.org

Amazon instances only get a route for their own subnet, not for the entire VPC. If you check the output of show ip route, you will see a route for the VyOS subnet only. Add a static route for the entire VPC. The following example assumes you have a 10.X.0.0/16 VPC:

set protocols static route 10.X.0.0/16 next-hop 10.X.0.1 distance 10

Finally, configure the route/network that BGP will advertise to the other end (Amazon). BGP only advertises a network that is already in the routing table, so add a static route for it first:

set protocols static route 10.0.0.0/8 next-hop 10.Y.0.1 distance 100
set protocols bgp 650xy network 10.0.0.0/8
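
Advertising a single 10.0.0.0/8 route only works if every VPC you connect actually falls inside that range. A quick way to check a list of VPC CIDRs is Python's standard ipaddress module. The function name uncovered and the CIDRs in the example are hypothetical.

```python
import ipaddress


def uncovered(advertised, vpc_cidrs):
    """Return the VPC CIDRs that fall outside the advertised network."""
    summary = ipaddress.ip_network(advertised)
    return [
        cidr for cidr in vpc_cidrs
        if not ipaddress.ip_network(cidr).subnet_of(summary)
    ]


# Example VPC ranges; replace with your own
vpcs = ["10.5.0.0/16", "10.6.0.0/16", "172.31.0.0/16"]
print(uncovered("10.0.0.0/8", vpcs))
```

Any CIDR printed here would not be reachable through the advertised route and needs either renumbering or an additional advertised network.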

Commit the changes and back up the configuration. Keep a copy of the configuration somewhere safe (not on the VyOS instances).

commit
save
save /home/vyos/backup.conf
exit

From the backed up configuration file, it is better to remove sections that are specific to the VyOS instance. This way, the configuration can be merged easily when instances need to be replaced later:

  • interfaces ethernet eth0
  • service
  • system

You can refer to the VyOS documentation wiki for details; these are some commands I found useful:

show ip route
show ip bgp
show ip bgp summary
show ip bgp neighbor 169.A.B.E advertised-routes
show ip bgp neighbor 169.A.B.E received-routes
show vpn debug

At this point, all VPN tunnels in all VPCs should be green, and each should be receiving exactly 1 route (the 10.0.0.0/8 network advertised above). Modify all the VPC route tables and enable route propagation. All instances should then be able to reach other instances irrespective of which VPC they are in.

If it is necessary to replace a VyOS instance:

  • Kill the instance that is being replaced
  • Create another instance in the same public subnet with the same private IP
  • Choose the correct security group and SSH key
  • Disable the source/dest checks
  • Reassign the EIP from the old instance to the new one
  • SCP the backup configuration file to the new VyOS instance
  • SSH to the instance:
$ configure
$ delete system ntp
$ commit
$ merge /home/vyos/backup.conf
$ commit
$ save
$ exit

There are 4 tunnels from each VPC to the hub (two per VyOS instance). If one VyOS box dies, traffic will start flowing through the other one. To test this, start a ping from an instance in VPC1 to an instance in VPC2. While this is running, reboot the VyOS1 instance. You should see minimal disruption. Once the VyOS1 box comes back up, reboot VyOS2; traffic should fail over appropriately.

Finally, tighten the security group/NACLs. NTP uses 123/udp (inbound and outbound). IPsec uses 500/udp and the ESP/AH IP protocols (inbound and outbound). BGP uses 179/tcp. And of course you want SSH (22/tcp) open as well. You can restrict the security group/NACLs by port/protocol. Another option is to whitelist the Amazon VPN tunnel IP addresses and allow all traffic from those IPs.

23 thoughts on “AWS VPN High Availability”

  1. jimrippon says:

    Hi Seshu,

    Thanks for the really useful walkthrough, I came across your blog part-way through implementing an almost identical solution.

    I wonder if you came across a situation where Amazon was re-using the same private ASN in its BGP advertisements from two different VPCs? I am seeing that routes aren’t propagating between VPCs when the same ASN is set for the remote-as, and I am hoping there is an obvious solution I am missing.

    Jim

  2. Jim, we did not run into that issue. For each region, we got a different ASN. If you want routes to have multiple hops, you need to update the VyOS bgp neighbor section:

    ebgp-multihop 2
    update-source 169.2xx.2yy.abc

  3. Faced the same problem as Jim described. I was able to connect two VPCs located in California and Ireland with routing working fine, but then I also connected Sydney, which was able to establish IPsec, but no routing was happening. What I found is that the remote-as number for both California and Sydney was 7224. Since this is not in our control, it can mess up the BGP routing. Still trying to find a solution to it.

  4. @Seshu, thanks for the reply, just saw it today. For me, only Ireland is getting routes from the other two regions; both Sydney and California (with AS 7224) are getting only one route (that of Ireland). I raised a ticket with Amazon support, and this is the summary of their response:
    ——
    Unfortunately, we do not have separate AS number for every regions. Some regions do use common AS number.

    The only thing you can do here is to remove the AS number 7224 (by overriding AS Path) before advertising it to other peers. You need to perform this on Vyatta device. This way other BGP peers will accept the prefixes advertised (since they do not see their own AS number in the AS PATH) and will install them in route table.

    I looked for command on Vyatta CLI which can override AS_PATH but was not able to find any. You can contact Vyatta support if they have any specific advice for your scenario.

    Other alternative, if it is feasible for you, may be to use Cisco CSR 1000v in Oregon VPC. It does support “as-override” which will serve your purpose.
    ——

    if you don’t mind, can you please post your bgp summary from vyos?

  5. Antoine says:

    Let’s say I have a direct connect coming in the hub vpc, how do I manage to have packets going between the spoke vpcs and the network at the other side of direct connect ?

  6. Safeer, if I understand you correctly, you are letting your routes propagate all the way through. In our case, we are advertising the whole 10/8 network to all regions. Maybe that is the difference!

  7. Seshu, you are right. I am trying to use BGP to propagate the routes between VPCs. Let me try reconfiguring it with a static route. In this configuration:
    set protocols static route 10.0.0.0/8 next-hop 10.Y.0.1 distance 100
    set protocols bgp 650xy network 10.0.0.0/8
    What would be 10.Y.0.1 ? I understand 10.X.0.0/16 is the vpc in which vyos instances are hosted.

  8. JSull says:

    Can anyone tell me if it is possible to setup the Vyos instance and use it to terminate a VPN into the VPC WITHOUT using the Virtual Gateway? I can’t get the Vyos config to allow me to set eth0 to anything other than DHCP without locking myself out (which I believe Seshu mentioned several times.)

      1. JSull says:

        I can specify it in the instance, and when I run ifconfig I see that eth0 is set to the correct IP. But when I go into the VyOS config and try to remove DHCP and add the private IP, it boots me out of SSH (which makes sense, since the WAN link is changed), and then I can’t get back in until I restart and the non-saved config rolls back. I am probably missing something stupid here.

        BTW, your site is extremely useful and well created. Thank you!

  9. About adding a route for the entire VyOS’ VPC you write:

    set protocols static route 10.X.0.0/16 next-hop 10.X.0.1 distance 10

    In that stanza, does the 10.X.0.1 stand for the VyOS’ own internal IP address?

    1. Amos, no, it is the default router of the subnet that VyOS is in. If VyOS is in the 10.5.6.0/24 subnet, Amazon’s black-box router for that subnet will be 10.5.6.1.

      1. Thanks. That solved a problem I had accessing a local subnet. Now I still have to figure out why VyOS doesn’t forward traffic from any other instance on its side of the tunnel over the ipsec tunnel to the other side.

  10. Chris says:

    Hey Seshu,

    This post was really helpful and awesome. It’s a huge improvement over the single openswan instance with static routes that I was using before to connect my VPCs across regions. I can’t believe we’re almost halfway through 2017 and AWS still has no service for connecting VPCs across regions!

    Anyhow, one question I had: let’s say I have 2 regions in Europe and 2 regions in the US that I want to connect all together. In your scheme above, all the regions would be connected through the single hub region. Let’s say I created my hub in a US region. Then traffic between my two regions in Europe would have increased latency, since it has to travel through the US hub first. I’m curious if you solved this problem at all? Is it possible to have multiple hubs, but still have connectivity to all regions?

    Best,
    Chris

      1. Chris says:

        Hey Seshu,

        Sorry, I mean like having 4 different regions. Let’s say:
        us-east-1
        us-west-2
        eu-west-1
        eu-west-2

        If I put my VPN hub in one of the US regions, then VPN traffic between my two EU regions would have increased latency since it would have to go through the US first. I don’t believe VPC peering would work since these are all separate regions. VPC peering only works within a single region. Anyhow, was curious if there was such a way to take your scheme above but have two different hubs (one in EU and one in US) but still have all the regions networked together. Something kind of like this: https://d0.awsstatic.com/aws-answers/answers-images/regional-transit-vpc-corporate-network.png (but without the whole corporate datacenter piece)

        -Chris

  11. ashapira says:

    Chris I use virtual GW as the hub on my setup.
    In your case I’d create a hub in each continent and let bgp figure out the best route.

    See https://github.com/amosshapira/thermal for a working automated example of how I do this (I use single VyOS instance on each spoke but have an AutoScalingGroup watching it and bringing it up automatically if it stops).

  12. Smiley says:

    For information, today we can change the AS number of a VPC on the AWS side: see the Virtual Private Gateway details (it can be specified at creation time). However, this requires creating a new gateway, so VPN connections may need to be reconfigured…
