Job Scheduling on AWS - Part 2 - Extending an on-premises job scheduler to AWS

This post is the second in the series on job scheduling on AWS. The first one focused on scheduling jobs using AWS native services. This one focuses on an architecture for extending an on-premises job scheduler to AWS.

(Link to the first blog - https://ideasforcloud.blogspot.com/2021/06/job-scheduling-on-aws.html)

A common requirement when you have hybrid infrastructure (on-premises data center + AWS Cloud) is a single job scheduler that schedules jobs in both the on-premises and AWS Cloud environments. (Note - As I indicated in Part 1, if your requirement is to source data from an on-premises data source, AWS Data Pipeline offers capabilities that help you do that. You can install the AWS Data Pipeline Task Runner on your on-premises server to help manage your on-premises data source.)

Here are the steps you can follow to use your on-premises scheduler to schedule jobs on AWS. I have outlined the steps for setting this up in one AZ. However, depending on your DR and HA needs, you can make this multi-AZ.



Step 1 – Choose an instance type that meets the job scheduler tool's requirements and install your job scheduler agent on an Amazon EC2 instance. Install all required software and updates on that instance, then create an AMI from it.

Step 2 – (Optional) – Create an Amazon RDS instance to hold the scheduler metadata. If you already have a metadata database on premises and it is sufficient, you can skip this step.

Step 3 – Create your Amazon EC2 Auto Scaling group with MIN=MAX=1. This setting ensures that Auto Scaling keeps exactly one EC2 instance running and replaces it if it fails or terminates.
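As a rough illustration of Step 3, here is a minimal Python sketch that builds the arguments for the boto3 Auto Scaling call create_auto_scaling_group with the minimum, maximum and desired capacity all fixed at 1. The helper name build_asg_request, the group name and all the IDs are hypothetical placeholders, and the sketch assumes the AMI from Step 1 is referenced through an EC2 launch template. It only builds and checks the request; the actual API call is left as a comment.

```python
def build_asg_request(name, launch_template_id, subnet_ids, target_group_arns=()):
    """Build keyword arguments for boto3's create_auto_scaling_group.

    All names and IDs passed in are placeholders chosen by the caller.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")

    request = {
        "AutoScalingGroupName": name,
        # The launch template is assumed to point at the AMI created in Step 1.
        "LaunchTemplate": {
            "LaunchTemplateId": launch_template_id,
            "Version": "$Latest",
        },
        # MIN = MAX = 1: Auto Scaling keeps one instance and replaces it on failure.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # One subnet keeps the agent in a single AZ; pass more for multi-AZ.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
    }
    if target_group_arns:
        # Registers the instance with the load balancer target group from Step 4.
        request["TargetGroupARNs"] = list(target_group_arns)
    return request


request = build_asg_request(
    name="scheduler-agent-asg",                 # hypothetical name
    launch_template_id="lt-0123456789abcdef0",  # hypothetical ID
    subnet_ids=["subnet-0aaa1111"],             # hypothetical ID
)

# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("autoscaling").create_auto_scaling_group(**request)
```

If you go multi-AZ later, passing subnets from more than one AZ lets Auto Scaling launch the replacement instance in whichever AZ is healthy.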

Step 4 – Create an ‘internal-facing’ AWS Application Load Balancer (OSI Layer 7). Because it is internal facing, it has only private IP addresses and is not reachable from the internet. In addition, the load balancer provides a DNS name that can be used instead of the Amazon EC2 private IP. Your on-premises job scheduler controller connects to the agent via this DNS name, so it does not have to track changes to the private IP when the Amazon EC2 instance terminates and a new one is created.

Note – Another way to maintain a constant IP would be to use an Elastic IP. However, an Elastic IP is a public address, and many organizations' security policies discourage public IP addresses unless they are really required. This is the reason I have chosen an internal-facing load balancer here.
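To make Step 4 more concrete, here is a minimal Python sketch that builds the arguments for the boto3 Elastic Load Balancing v2 call create_load_balancer with Scheme set to "internal". The helper name build_internal_alb_request and all names and IDs are hypothetical placeholders. One thing to be aware of: an Application Load Balancer needs subnets in at least two AZs, even if the agent instance itself runs in one, so the sketch insists on at least two subnet IDs (it cannot check that they are in different AZs). Also, an Application Load Balancer handles HTTP/HTTPS traffic; if your scheduler agent speaks a plain TCP protocol, an internal Network Load Balancer (Type "network") may be the better fit.

```python
def build_internal_alb_request(name, subnet_ids, security_group_ids):
    """Build keyword arguments for boto3's elbv2 create_load_balancer.

    All names and IDs passed in are placeholders chosen by the caller.
    """
    if len(subnet_ids) < 2:
        # An Application Load Balancer requires subnets in at least two AZs.
        raise ValueError("an Application Load Balancer needs at least two subnets")

    return {
        "Name": name,
        "Subnets": list(subnet_ids),
        "SecurityGroups": list(security_group_ids),
        # "internal" gives the load balancer private IPs only (no public address).
        "Scheme": "internal",
        "Type": "application",
        "IpAddressType": "ipv4",
    }


request = build_internal_alb_request(
    name="scheduler-agent-alb",                         # hypothetical name
    subnet_ids=["subnet-0aaa1111", "subnet-0bbb2222"],  # hypothetical IDs
    security_group_ids=["sg-0ccc3333"],                 # hypothetical ID
)

# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("elbv2").create_load_balancer(**request)
#   dns_name = response["LoadBalancers"][0]["DNSName"]
# dns_name is what the on-premises controller would be configured to use.
```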

Step 5 – Configure the appropriate AWS security groups to allow communication between your on-premises controller and the agent on EC2.
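As an illustration of Step 5, here is a minimal Python sketch that builds the arguments for the boto3 EC2 call authorize_security_group_ingress, allowing inbound TCP on the agent port only from the on-premises controller's network. The helper name build_agent_ingress_request, the security group ID, the CIDR range and the port number are all hypothetical; use the values from your own network and your scheduler's documentation.

```python
import ipaddress


def build_agent_ingress_request(group_id, controller_cidr, agent_port):
    """Build keyword arguments for boto3's authorize_security_group_ingress.

    All IDs, ranges and ports passed in are placeholders chosen by the caller.
    """
    # Raises ValueError if the CIDR is malformed or has host bits set.
    network = ipaddress.IPv4Network(controller_cidr)
    if network.prefixlen == 0:
        raise ValueError("refusing to open the agent port to 0.0.0.0/0")
    if not 1 <= agent_port <= 65535:
        raise ValueError("agent_port must be between 1 and 65535")

    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": agent_port,
                "ToPort": agent_port,
                "IpRanges": [
                    {
                        "CidrIp": str(network),
                        "Description": "On-premises scheduler controller",
                    }
                ],
            }
        ],
    }


request = build_agent_ingress_request(
    group_id="sg-0ccc3333",          # hypothetical security group ID
    controller_cidr="10.20.0.0/16",  # hypothetical on-premises range
    agent_port=7005,                 # hypothetical agent port
)

# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(**request)
```

Restricting the source to the controller's range, rather than opening the port broadly, keeps the channel as narrow as the architecture needs.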


Putting this all together –

You continue to schedule jobs on the on-premises controller, just as you did before. For scheduling and managing jobs on AWS, the on-premises job scheduler controller connects to and coordinates with the agent on the Amazon EC2 instance through the load balancer DNS name. The agent reports the job run status back to the controller. If the Amazon EC2 instance fails or terminates, the Auto Scaling group brings up a new instance from the AMI provided. The optional Amazon RDS instance ensures that any metadata needed by the scheduler is still available when the new instance is ready.
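Once everything is in place, a quick way to confirm the path from the on-premises side is a plain TCP connection test against the load balancer DNS name. The sketch below uses only the Python standard library; the helper name agent_reachable, the DNS name and the port are hypothetical placeholders, and a successful TCP connection only shows that the network path and security groups are open, not that the scheduler agent itself is healthy.

```python
import socket


def agent_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example usage from a server in the on-premises network (placeholder values):
#   agent_reachable("internal-scheduler-agent-alb.example.internal", 7005)
```

Running the same check again after terminating the instance is a simple way to see the Auto Scaling replacement come back behind the same DNS name.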


~ Narendra V Joshi

