This tutorial is intended for beginners who aren’t familiar with EC2 yet, but are generally familiar with MongoDB. EC2 is actually pretty easy, but a lot of the basic info you need to get started is scattered across numerous websites and articles. This post hopefully puts all the necessary details in one place.
The first thing to understand is that every EC2 instance runs an AMI (Amazon Machine Image), which for the EBS-backed images used here is basically a bundle of one or more EBS (Elastic Block Store) snapshots. The physical machine that your instance is hosted on has built-in hard drive space, but it isn’t persistent: when you stop or terminate the instance, whatever is on that disk will be wiped. Amazon already has a database of community AMIs, including basic Ubuntu installs. We can use one of these, then install the necessary packages, update configs, etc. and save the configured snapshot as our own AMI. The problem is that when you search the community AMIs for ‘ubuntu’ you get some 500 results, so which one do we pick? http://alestic.com is a good resource for things related to EC2 and Ubuntu, and they have a list of ‘official’ AMIs from Canonical. I’m basing my EC2 instance in Amazon’s us-east-1 data center, so the AMI identifier for Ubuntu 11.04 EBS 64-bit is ami-1aad5273. If your EC2 instances are located somewhere else, you’ll need the corresponding AMI identifier for that data center, which can be found on alestic.com.
To start off, you can follow the EC2 getting started guide, except instead of the Basic Linux AMI you can use the Ubuntu AMI that I mentioned above. There’s also no need to terminate the instance at the end since we’ll just roll right into customizing this instance for MongoDB.
I like to start by getting any system updates that have come out since the AMI was created:
sudo apt-get update
sudo apt-get upgrade
I also like to install the linux tools dstat and htop to monitor system performance.
After following Amazon’s Getting Started Guide you should have a blank Ubuntu box and be SSH’ed into it. The Linux root partition is usually an EBS volume, and I like to make a second EBS volume that I can mount for just the MongoDB database directory. This way I can detach the database volume and move it to another running instance. So go into the AWS Management Console and click on Volumes on the left. Create a new volume that has ample space for your database. You can’t easily resize these things, so leave room to grow. After you create the EBS volume you need to attach it to your EC2 instance and choose a device name. I usually use /dev/sde.
Next, back in the SSH session on the EC2 instance, we need to format the new volume, mount it, and add it to /etc/fstab so it auto-mounts when we restart. (Note: on Ubuntu Natty 11.04 the drive ends up appearing as /dev/xvde, but on older systems and other flavors of Linux it might still be /dev/sde.)
sudo mkfs -t ext4 /dev/xvde
I’m going to mount my new volume at /db:
sudo mkdir /db
sudo vim /etc/fstab
Add the following line to the bottom of your /etc/fstab:
/dev/xvde /db auto noatime,noexec,nodiratime 0 0
We can either restart to auto-mount it or we can manually mount it now using
sudo mount /dev/xvde /db
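If you script this step rather than editing /etc/fstab by hand, it helps to make the edit idempotent so that re-running the script does not add a duplicate line. The following is a minimal sketch, not part of the original walkthrough; the function name ensure_fstab_entry is hypothetical, and the mount options are simply the ones used in the fstab line above.

```shell
# Hypothetical helper: append an fstab entry only if the mount point
# is not already listed in the given file.
# Usage: ensure_fstab_entry FSTAB_FILE DEVICE MOUNT_POINT
ensure_fstab_entry() {
    fstab_file=$1
    device=$2
    mount_point=$3
    # Look for an uncommented line whose second field is the mount point.
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' \
        "$fstab_file"; then
        echo "entry for $mount_point already present"
    else
        # Same options as the fstab line used in this tutorial.
        printf '%s %s auto noatime,noexec,nodiratime 0 0\n' \
            "$device" "$mount_point" >> "$fstab_file"
        echo "added entry for $mount_point"
    fi
}
```

Run as root against the real file, this would be called as ensure_fstab_entry /etc/fstab /dev/xvde /db. Afterwards, sudo mount -a mounts everything listed in /etc/fstab, and df -h /db confirms that the new volume is the one mounted there.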
Now let’s install MongoDB. Here are the official docs.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo vim /etc/apt/sources.list
Add the following line to /etc/apt/sources.list:
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
Then update the package index, install MongoDB, and create the database directory on the new volume:
sudo apt-get update
sudo apt-get install mongodb-10gen
sudo mkdir /db/mongodb
sudo chown mongodb:mongodb /db/mongodb
Now let’s edit /etc/mongodb.conf and change the location of the database. Near the top, change dbpath so it looks like this:
dbpath=/db/mongodb
I also like to change my oplogSize (specified in megabytes) to something larger than the default, so if a secondary instance is down I have longer to bring it back up before it becomes too stale to re-sync. I also recommend turning on journaling to prevent data corruption.
oplogSize = 10000
replSet = myReplicaSet
journal = true
If you’re using a hostname in the replica set configuration instead of the IP address, you need to configure that in /etc/hostname and /etc/hosts.
/etc/hostname:
db1
/etc/hosts:
127.0.0.1 db1 db1.mydomain.com localhost.localdomain localhost
xxx.xxx.xxx.xxx db1 db1.mydomain.com
(where xxx.xxx.xxx.xxx is this machine’s IP address that you use in the replica set config. Usually the elastic IP.)
After changing hostname information you’ll need to restart the instance for it to take effect.
You need to open a hole in the EC2 firewall for the other replica nodes. Do this by going to the Security Groups section of the EC2 dashboard. Click on the security group you’re using and add a custom TCP rule for port 27017, with xxx.xxx.xxx.xxx/32 as the source for each node (where xxx.xxx.xxx.xxx is the instance’s IP address). Each node of the replica set needs to be able to access every other node of the replica set. The best way to do this is to use the same security group for all of them and add all the IP addresses to the allowed list.
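The same rules can also be added from the command line with the AWS CLI instead of the console, which is a different route from the one this post follows. The sketch below only prints the commands so you can review them before running anything; the function name print_ingress_rules is hypothetical, and the security group ID and IP addresses in the usage note are placeholders.

```shell
# Hypothetical helper: print one AWS CLI command per replica node IP.
# This is a dry run; nothing is executed against AWS.
# Usage: print_ingress_rules SECURITY_GROUP_ID IP [IP...]
print_ingress_rules() {
    group_id=$1
    shift
    for ip in "$@"; do
        # /32 restricts the rule to that single address, as in the console steps.
        echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port 27017 --cidr $ip/32"
    done
}
```

For example, print_ingress_rules sg-0123456789abcdef0 203.0.113.10 203.0.113.11 prints two commands, one per node. Running them assumes the AWS CLI is installed and configured with credentials that are allowed to modify the security group.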
When you have the instance basically set, go back into the AWS control panel, right-click the instance and choose Create Image. You can start up any number of these for the replica set, but you need to change the /etc/hostname and /etc/hosts files to reflect the individual IP address and hostname of each box (db1, db2, db3, etc.).
From here on the instructions in the MongoDB Replica Set Configuration docs are valid. You don’t need to specify the replSet name on the command line since we already set it in the config file. MongoDB should already be running, but you can restart it with /etc/init.d/mongodb restart if you change any configuration parameters.
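Because every instance launched from the image needs its own hostname entries, generating the /etc/hosts lines from the short name, domain, and IP address avoids copy-and-paste mistakes. This is a minimal sketch that follows the same two-line layout as the /etc/hosts example earlier; the function name hosts_lines is hypothetical, and the IP address in the usage note is a placeholder.

```shell
# Hypothetical helper: print the two /etc/hosts lines for one replica set member.
# Usage: hosts_lines SHORT_NAME DOMAIN IP
hosts_lines() {
    name=$1
    domain=$2
    ip=$3
    # Loopback line, matching the layout used earlier in this post.
    printf '127.0.0.1 %s %s.%s localhost.localdomain localhost\n' \
        "$name" "$name" "$domain"
    # Line for the address used in the replica set config (usually the elastic IP).
    printf '%s %s %s.%s\n' "$ip" "$name" "$name" "$domain"
}
```

On the second box you might run hosts_lines db2 mydomain.com 203.0.113.11 and compare the output with /etc/hosts before editing the file. The function only prints text, so /etc/hostname and /etc/hosts still have to be updated separately, followed by a restart of the instance.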
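As a rough illustration of what that replica set configuration step involves, the sketch below builds the configuration document that rs.initiate() accepts from a set name and a list of hostnames. The function name rs_config is hypothetical, the hostnames in the usage note are placeholders, and port 27017 is assumed for every member.

```shell
# Hypothetical helper: print a replica set config document for rs.initiate().
# Usage: rs_config SET_NAME HOST [HOST...]
rs_config() {
    set_name=$1
    shift
    members=""
    i=0
    for host in "$@"; do
        # Separate members with a comma after the first one.
        if [ -n "$members" ]; then
            members="$members, "
        fi
        # Each member gets a sequential _id and the default port 27017.
        members="$members{\"_id\": $i, \"host\": \"$host:27017\"}"
        i=$((i + 1))
    done
    printf '{"_id": "%s", "members": [%s]}\n' "$set_name" "$members"
}
```

On one node only, the output can be passed to the mongo shell, for example mongo --eval "rs.initiate($(rs_config myReplicaSet db1.mydomain.com db2.mydomain.com db3.mydomain.com))". The set name has to match the replSet value in /etc/mongodb.conf, and the hostnames have to resolve from every node.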
9 responses to “How-to Set Up Ubuntu w/ MongoDB Replica Sets on Amazon EC2”
Excellent article. I didn’t need to set up a replica set at this point yet, but this was perfect for getting started and will be a big help later on.
(where xxx.xxx.xxx.xxx is this machine’s IP address)?
Is this the private IP address or an Elastic IP address?
This is the elastic IP, what you’ll be putting in your replica set config. I’ll edit the text to clarify. Thanks for the comment.
Zac – I tried implementing this on an AWS EC2 and followed your instructions closely (changing the name of the disk, of course, as it would apply to my ami).
The problem I had was that the root directory is mounted on a relatively small 8.5G disk. I mounted the larger disk (about 450G) onto /db, per your instructions and started filling up that Mongo DB nicely for a while… until the root directory got “full” (and now I can’t do anything on the server except watch the Mongo DB get bigger).
What’s happening, in your opinion, is it that the root (/) is mounted onto a smaller disk than the /db directory? Or is it that I should configure Mongo differently as to have it not overwhelm the smaller disk somehow? Your insight is greatly appreciated…
Hey Zed, first make sure that the bigger disk you mounted in /db is correctly mounted. An easy way to do that is by running the command “df -h” which will give you the list of all devices mounted in your filesystem, the size of them, and what percent is used. Second, make sure that you changed the location of your database in /etc/mongodb.conf so that it uses /db instead of the default location. If the mongo database is, in fact, using the larger volume mounted at /db, then there might be something else filling up your root directory. You can play around with the command “du -sh /” in various directories to see how much stuff is in there.
how to set up redhat with mongodb replica set on amazon ec2
Based on this post, the replicas you create from the image will mount the EBS volume you created, right? But there is documentation online that specifies that an EBS volume should be attached to only ONE EC2 instance.
Hi, I am following the instructions but getting this error in master mongod log file
NETWORK [ReplExecNetThread-0] Failed to connect to xx.xx.xx.xx:27017, reason: errno:111 Connection refused
2015-07-13T13:06:44.202-0500 W REPL [ReplicationExecutor] Failed to complete heartbeat request to host.domain.com:27017; Location18915 Failed attempt to connect to host.do$