Cannot Use Task IAM Roles in ECS

svozza · September 28, 2017, 3:35pm

I am currently running Drone in an ECS cluster and have created the Drone server and Agent as separate ECS tasks. Our build pipelines provision both infrastructure and deploy code all in the same pipelines. This means that the Drone agents need very liberal permissions when interacting with AWS in order to create and destroy infrastructure.

I had initially put these permissions at the EC2 instance level but obviously that means that everything in the cluster (there are other non-Drone related containers running) now has these crazy permissions. I decided to change the cluster to only give minimal permissions to the Drone server and other unrelated containers and the more elevated permissions to the agents.

However, unfortunately, in order to use Task IAM roles an environment variable in the container called AWS_CONTAINER_CREDENTIALS_RELATIVE_URI needs to be set dynamically based on the UUID of the task (more info here). The problem, of course, is that this is available in the agent container but not in any of the pipelines’ containers, thus, any AWS cli commands (both the CLI and SDK rely on this variable) that rely on the permissions set in the Task IAM role will fail unless they are also present in the EC2 instance the container is running on. This is obviously suboptimal from a security point of view.

There is also a way to set an IAM role at the ECS service level (services are composed of one or more tasks) but unfortunately that only works if the service has a load balancer; having to assign a load balancer to every service regardless of whether they need one or not also feels suboptimal. Is there a way around this that I’m not seeing?

bradrydzewski · September 28, 2017, 4:21pm

I have no experience with ECS, but teams using Kubernetes generally deploy agents with a docker:dind container, linked to the agent container. Drone is then configured to use the docker:dind container instead of on the host machine docker daemon.

Perhaps this approach could be used with ECS to workaround these issues.

svozza · September 29, 2017, 8:05am

Thanks for the quick reply, much appreciated! That sounds very complicated so I think I’ll just segregate the Drone server and agents in their own specific cluster.

omerxx · September 29, 2017, 9:32pm

Hi @svozza

I deployed Drone on ECS, it’s a cluster that hosts different services. I use an application load balancer (ALB) on top of the cluster. Each service has it’s role and drone (server and agents) use the role with no problem.
I can explain how I deployed it if you’d like.

I’d also like to hear did you deploy to them as different services and set the communication between them.
You’re welcome to talk to me - omer@devops.co.il

ktruckenmiller · October 2, 2017, 5:13pm

Do you have any code that you can reference for the dind-> drone-agent linking?

ktruckenmiller · October 2, 2017, 5:15pm

I don’t think it’s possible with the ALB right now, since you need TCP + HTTPS for incoming for drone. I just tried this with an NLB + ALB in AWS and it didn’t work as ECS doesn’t want you to use two different load balancers.

If anyone has any insight, I would love it, as I’m keen on getting my setup to 0.8.

omerxx · October 2, 2017, 6:49pm

My current setup in ECS is having drone deployed as a task that has drone server and a few agnts in it. They are linked and communicate over TCP directly.
This is not ideal though, I’m waiting for a change, either AWS ALB start supporting HTTP/2, or TCP. Or maybe drone would have another communication option setup.

ktruckenmiller · October 2, 2017, 8:20pm

ALB does support HTTP/2, but not both HTTP2 and TCP at the same time on different ports.

I am able to get away with multiple agents on a docker swarm setup that I have, but with ECS there is no solution for this that can scale effectively without an overlay network of custom origin.

omerxx · October 2, 2017, 9:46pm

No, ALB only support incoming HTTP/2 but then translates them to HTTP/1.
Which is why gRPC cannot communicate. I’ll be happy to learn otherwise.
Full HTTP/2 support can be a solution. As well as TCP mode

svozza · October 3, 2017, 10:42am

Thanks for the further replies, I’m only getting a chance to reply now. So I have deployed the Drone sever as an ECS service and the agents as a separate ECS service. That’s the only way to do it if you want to have multiple agents talking to one server. The agents service doesn’t need a load balancer (they know how to communicate to the Drone server using the DRONE_SERVER environment variable) so that’s why I can’t use the service role approach. My colleague has put the CloudFormation for a very similar cluster here so you can see the setup I’m discussing here:

github.com

robertstettner/drone-on-ecs/blob/master/template.yml

---
Description: Drone Continuous Integration (drone.io)


Parameters:

  VPC:
    Type: AWS::EC2::VPC::Id
    Description: The VPC that needs provisioning

  PublicSubnets:
    Type: List<AWS::EC2::Subnet::Id>
    Description: The subnets that the load balancer will cover

  PrivateSubnets:
    Type: List<AWS::EC2::Subnet::Id>
    Description: The subnets that the ASG will cover

  KeyName:
    Type: AWS::EC2::KeyPair::KeyName

This file has been truncated. show original

That example is running Drone 0.5 but I’m running 0.8. I’m not sure I follow on why you need TCP, I have the full Drone cluster just running with one ALB, which is a layer 7 load balancer that only operates at the HTTP/S level.

ktruckenmiller · October 3, 2017, 5:44pm

Can we see your cloudformation?

svozza · October 3, 2017, 6:53pm

Unfortunately, my template is in our private company github repo so I can’t but it is virtually identical to the one I’ve posted. The only difference is that I’m using drone 0.8.

ozbillwang · October 5, 2017, 11:17am

Could you give details on how you set ALB with Drone?

I explained my environment here:

http://discuss.harness.io/t/what-are-these-errors-meaning-in-drone-servers-log/603/6?u=ozbillwang

So I am curious how you can work it out with ALB.

ktruckenmiller · October 5, 2017, 12:17pm

^^

Yeah I was going to ask, it can’t be the same because they changed the ports in drone, and its now gRPC instead of websockets

svozza · October 5, 2017, 1:30pm

OK, that’s weird. I’m using websockets and it still works. I have set DRONE_SERVER to wss://drone.example.com/ws/broker and am forwarding any traffic sent to the load balancer on port 8000 to the Drone server task.

svozza · October 5, 2017, 1:35pm

Ah wait, my mistake. I’m actually on Drone 0.7.

Topic		Replies	Views
Drone Autoscaler Drone Support	11	1550	June 15, 2019
I wrote a medium post explaining my Drone deployment using ECS + EB Drone Support	5	2161	April 26, 2018
Update Amazon SDK for plugins/docker Drone Support	2	363	April 13, 2020
AWS Plugins using IAM Roles (No AWS Keys) Drone Support	2	2002	January 24, 2018
Ideal AWS infrastructure from maintainers POV Drone Support	0	269	November 4, 2019

Cannot Use Task IAM Roles in ECS

Related topics