media-nommer is a distributed media encoding system that gets its horsepower from Amazon AWS. This leaves you with a cheaper, more reliable alternative to some commercial paid encoding systems, since you are only paying for your Amazon AWS usage. The software is powered by a combination of Python, Twisted, and Boto, and is licensed under the flexible BSD License.
media-nommer is of interest to you if you need a way to encode media in a cheap, fast, and scalable manner. By using Amazon AWS’s excellent EC2 service, you can automatically scale your number of encoder instances up to meet your demands. Of course, the entire process may be configured, limited, and adapted to meet your needs. Some example use cases where media-nommer might be a good fit:
To learn more about media-nommer, see An Introduction to media-nommer.
Project Status: Beta
License: media-nommer is licensed under the BSD License.
These links may also be useful to you.
media-nommer is a Python-based, distributed media encoding system. It aims to implement much of the same services offered by commercial encoding services. The primary advantage afforded by media-nommer is that one only pays for what they use on Amazon AWS, with no additional price mark-up. It may also be completely customized to meet your needs.
There are any number of encoding services around the web, some of which are cheap and very easy to use. We (DUO Interactive) were searching for a suitable fit for two different projects that needed an extreme amount of encoding scalability. However, the more we looked into it, the more we struggled with picking a service that had a good balance of affordability and permanence. We also were looking for a little more customizability in the encoding process.
Many encoding services use software such as FFmpeg, and encode media to codecs that they have not paid licensing fees for. If the service has any degree of success, it is only a matter of time before MPEG-LA and others come knocking to get their licensing fees. This often spells end-of-game for the smaller (often cheaper) services. On the other end of the spectrum, some 100% legal services have to raise their prices to cover the large licensing fees, putting them outside the realm of what we’d like to pay. A good read on this subject can be found on the FFmpeg legal page.
For those of us who don’t need to encode to royalty-encumbered formats, a cheaper, more customizable, and more permanent solution was to just write our own. The result of all of this was the creation of media-nommer, a Python-based media encoding system with scalability and flexibility in mind.
media-nommer is composed of two primary components, feederd and ec2nommerd.
Distributed encoding requires a director or orchestrator to keep everything running as it should. This is where the feederd daemon comes into play. Here are some of the things it does:
feederd may be run on your current infrastructure, or on EC2. Your EC2 encoding nodes are never in direct contact with feederd, and instead communicate through Amazon AWS. This means no firewall holes need to be opened.
Where feederd is the manager, ec2nommerd is the worker. Your basic unit of work is an EC2 instance that runs ec2nommerd. This daemon simply checks an SQS queue for jobs that need to be encoded, and encodes them until the queue is empty. After a period of inactivity, your EC2 instances will terminate themselves, saving you money.
Due to the early state of this project, there is a good chance that parts of these instructions are out of date at any given time. If you run into any such issue please let us know on our issue tracker.
Note
The following instructions are directed at what your eventual production environment will need. See Hacking on media-nommer for details on setting up a development environment.
For the sake of sanity, good deployment practices, and uniformity, we’re going to make some assumptions. It is perfectly fine if you’d like to deviate, but it will be much more difficult for us to help you:
For the sake of clarity, the media-nommer Python package contains the sources for feederd and ec2nommerd. You will want to install this package on whatever machine you’d like to run feederd on. You do not need to do any setup work for ec2nommerd, as it runs on EC2 instances based on an AMI that we have created for you.
Since we are not yet distributing media-nommer on PyPI, the easiest way to install the package is through pip:
pip install txrestapi twisted
pip install --upgrade git+http://github.com/duointeractive/media-nommer.git#egg=media_nommer
Note
If you don’t have access to pip, you may download a tarball/zip from our GitHub project and install via the enclosed setup.py. See the requirements.txt within the project for dependencies.
We have to install Twisted before media-nommer because of an odd behavior with pip.
Tip
Any time that you upgrade or re-install Twisted, you must also re-install media-nommer.
After you have installed the media-nommer Python package on your machine(s), you’ll need to visit Amazon AWS and sign up for the following services:
It is important to understand that even if you already have an Amazon account, you need to sign up to each of these services specifically for your account to have access to said services. This is a very quick process, and typically involves looking over an agreement and accepting it.
Tip
Signing up for these services is outside the scope of this document. Please contact AWS support with questions regarding this step. Their community forums are also a great resource.
Fees are based on what you actually use, so signing up for these services will incur no costs unless you use them.
After you’ve signed up for all of the necessary services, there are a few steps to work through within the AWS Management Console. Sign in and take care of the following.
You will need to create a media_nommer EC2 Security Group through the AWS management console for your EC2 instances to be part of. You don’t need to add any rules, though, unless you want SSH access to the encoding nodes.
Warning
Failure to create an EC2 security group will result in media-nommer not being able to spawn EC2 instances, which means no encoding for you.
Click on the EC2 tab and find the Key Pairs link on the navigation bar. You can then either create or import a key pair. Make sure to keep track of the name of your keypair, as you’ll need to specify it in your media-nommer configuration later on.
Warning
Failure to create or import an SSH key pair will also lead to media-nommer being unable to spawn EC2 instances.
Once media-nommer is installed, create a directory to store a few files in. Within said directory, create a file called nomconf.py. You will want to add these settings values (at minimum):
# The AWS SSH key pair to use for creating instances. This is just the
# name of it, as per your Account Security Credentials panel on AWS.
EC2_KEY_NAME = 'xxxxxxxxx'
# The AWS credentials used to access SimpleDB, SQS, and EC2.
AWS_ACCESS_KEY_ID = 'YYYYYYYYYYYYYYYYYYYY'
AWS_SECRET_ACCESS_KEY = 'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ'
# The S3 bucket to store a copy of ``nomconf.py`` for nommer instances.
# NOTE: CREATE A BUCKET AND CHANGE THIS VALUE BEFORE YOU START!
CONFIG_S3_BUCKET = 'change-this-value'
# The AWS security groups to create EC2 instances under. They don't need
# any rules set.
EC2_SECURITY_GROUPS = ['media_nommer']
See media_nommer.conf.settings for a full list of settings (and their defaults) that you may override in your nomconf.py.
You will now want to start feederd using whatever init script or init daemon you use. We have had good results with Supervisor, but you can use whatever you’re comfortable with. Make sure that the server running feederd is reachable by the machines that will be sending API requests to start encoding jobs. Here is an example command string, assuming you’re in your top-level media-nommer directory (not media_nommer):
PYTHONPATH=media_nommer twistd -n --pidfile=feederd.pid feederd
From here, you’ll want to select and start using one of the Client API Libraries. These are what communicate with feederd’s RESTful web API, and ultimately help get stuff done.
Distributed encoding is a complex task, with lots of moving parts and components. To create, monitor, and “motivate” our EC2 encoding instances during the process, we have feederd.
feederd is a Twisted plugin that runs as a daemon on the machine of your choice. This can be on your own bare metal server, or on an EC2 instance. The daemon keeps track of encoding jobs through Amazon SimpleDB, tells your EC2 encoder instances to get to work via Amazon SQS, and exposes a lightweight JSON web API for your own custom applications to schedule and manage jobs through.
If feederd is the brains of the operation, ec2nommerd is the brawn. feederd simply saves job state data to SimpleDB and creates SQS queue entries to let your EC2 instances know that there’s work to be done. Each of your EC2 instances runs an ec2nommerd daemon, which pulls jobs from the queue (or terminates the instance if there isn’t any work to be done).
Along with providing an entry point to start, manage, and track the encoding process, feederd also handles scaling your encoding cloud up as needed. Based on your configuration, feederd will look at the current list of jobs that need to be encoded in comparison to the number of encoding instances you currently have running on EC2. It will then decide whether it should spawn additional instances to handle the load.
EC2 instances are spawned from a public AMI that we maintain as part of the project. If you wish to create your own custom AMI, you may easily specify your own or clone and modify ours.
Automated scaling is an optional feature, and can be configured and restricted in a number of different ways. For example, perhaps you don’t want any more than one or two EC2 instances running at any given time.
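The scaling decision can be sketched roughly as follows. This is only an illustration of the rule as the settings reference words it, not feederd's actual code: the function name `should_spawn_instance` and all of its parameter names are hypothetical, and the default values simply mirror the documented defaults (2 jobs per instance, an overflow threshold of 2, and at most 3 instances).

```python
def should_spawn_instance(num_unfinished_jobs, num_active_instances,
                          max_jobs_per_instance=2, overflow_thresh=2,
                          max_instances=3, allow_launch=True):
    """Return True if another EC2 encoder instance looks warranted.

    Hypothetical sketch: all names here are illustrative assumptions.
    """
    # Automated scaling can be switched off entirely.
    if not allow_launch:
        return False
    # Never go past the configured instance cap.
    if num_active_instances >= max_instances:
        return False
    # Capacity is jobs-per-instance times the number of active instances.
    capacity = max_jobs_per_instance * num_active_instances
    # Assumption: "exceeds capacity by this number" is read as >= threshold.
    return num_unfinished_jobs - capacity >= overflow_thresh


print(should_spawn_instance(num_unfinished_jobs=6, num_active_instances=2))
```

With two active instances (capacity 4) and six unfinished jobs, the overflow of 2 meets the threshold, so the sketch answers True. The real daemon may well treat edge cases, such as having no instances running at all, differently.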
To actually get jobs scheduled and work done, you will need to communicate with feederd through its JSON API.
One of the requirements for media-nommer was that at no time would feederd and one of its ec2nommerd instances communicate directly. As noted earlier, job state information is saved and retrieved from SimpleDB by both feederd and ec2nommerd instances. SQS is used to delegate work to the ec2nommerd instances. This does a few things for us:
There is a lot of code that feederd and ec2nommerd share, which can all be found in the media_nommer.core module. This is best thought of as a lower level API that these two components use to save and retrieve job state information.
Source for feederd can be found in the media_nommer.feederd module.
If feederd is the manager, ec2nommerd is the worker. You can also think of the relationship in that feederd feeds your nommers.
ec2nommerd is a Twisted plugin that runs as a daemon on EC2 instances in the cloud. These instances are automatically created by feederd. If an ec2nommerd finds itself with available encoding capacity, it will hit an Amazon SQS queue to see if there is any work available. If it finds anything, it pulls the job details from SimpleDB and hands the job off to the appropriate nommer for encoding.
If, after a period of time, the daemon receives no work to be done, it can be configured to terminate itself to save money.
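To make the worker flow a little more concrete, here is a minimal sketch of the "check the queue, look up the job, hand it off" loop. It is not the real ec2nommerd code: a standard-library `queue.Queue` stands in for the SQS queue, a plain dict stands in for SimpleDB, and the names `drain_jobs`, `job_details`, and `encode` are hypothetical.

```python
import queue


def drain_jobs(job_queue, job_details, encode):
    """Pull job IDs off the queue until it is empty, encoding each job.

    job_queue   -- stand-in for the SQS queue (holds job IDs only)
    job_details -- stand-in for SimpleDB (maps job ID -> job data)
    encode      -- callable that plays the role of a nommer
    """
    handled = []
    while True:
        try:
            job_id = job_queue.get_nowait()
        except queue.Empty:
            # Nothing left to do; the real daemon would go idle here.
            break
        # The queue message only names the job; details live elsewhere.
        encode(job_details[job_id])
        handled.append(job_id)
    return handled


pending = queue.Queue()
pending.put("job-1")
pending.put("job-2")
store = {"job-1": {"source_path": "a.mp4"}, "job-2": {"source_path": "b.mp4"}}
print(drain_jobs(pending, store, encode=print))
```

Keeping only the job ID in the queue message and the details in a separate store is what lets both daemons agree on job state without ever talking to each other directly.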
Nommers are classes that contain all of the logic specific to different kinds of encoding workflows. For example, the FFmpegNommer nommer wraps the FFmpeg command for encoding. Each nommer sub-classes the BaseNommer class, which provides some foundational methods. View the source for these two classes for examples on how they work. You may sub-class or create your own nommer.
The following Nommer classes are currently included with media-nommer. See each for details on what the job_options key should look like in your API calls.
One of feederd’s roles is hosting a web-based JSON API for your custom applications to communicate through. With this API, you may do such things as:
By using a simple web-based API, you are free to either use one of the existing client API libraries, or write your own.
The following API client libraries are for forming and sending JSON queries to media-nommer’s web API. This is important for sending and checking on encoding jobs. Your applications will use the API to make media-nommer do something (other than stare at you blankly).
If you create a library of your own, let us know via the issue tracker and we’ll add yours to the list.
All API calls are sent via POST with the body being JSON. Responses are also JSON-formatted.
Note
You should send these calls to the host/port that your feederd is running on. The port defaults to 8001, but you may specify a different one when you call the twistd command to start the daemon.
This call submits a job for encoding. Here is an example call with POST body:
{
    "source_path": "s3://AWS_ID:AWS_SECRET_KEY@BUCKET/KEYNAME.mp4",
    "dest_path": "s3://AWS_ID:AWS_SECRET_KEY@OTHER_BUCKET/KEYNAME.mp4",
    "notify_url": "http://myapp.somewhere.com/encoding/job_done/",
    "job_options": {
        "nommer": "media_nommer.ec2nommerd.nommers.ffmpeg.FFmpegNommer",
        "options": [
            {
                "outfile_options": [
                    ["vcodec", "libx264"],
                    ["preset", "medium"],
                    ["vprofile", "baseline"],
                    ["b", "400k"],
                    ["vf", "yadif,scale='640:trunc(ow/a/2)*2'"],
                    ["pass", "1"],
                    ["f", "mp4"],
                    ["an", null]
                ]
            },
            {
                "outfile_options": [
                    ["vcodec", "libx264"],
                    ["preset", "medium"],
                    ["vprofile", "baseline"],
                    ["b", "400k"],
                    ["vf", "yadif,scale='640:trunc(ow/a/2)*2'"],
                    ["pass", "2"],
                    ["acodec", "libfaac"],
                    ["ab", "128k"],
                    ["ar", "48000"],
                    ["async", "480"],
                    ["ac", "2"],
                    ["f", "mp4"]
                ],
                "move_atom_to_front": true
            }
        ]
    }
}
To further elaborate:
The response to your request is also JSON-formatted, and looks like this:
{
"success": true,
"job_id": "1f40fc92da241694750979ee6cf582f2d5d7d28e1833"
}
If an error is encountered, success will be false, and additional message and error_code keys will be set:
{
"success": false,
"error_code": "BADINPUT",
"message": "Bad input file. Unable to encode."
}
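If you are writing your own client, the request and response handling might look something like the sketch below. It uses only the Python standard library. The function names `build_job_request` and `parse_job_response` are hypothetical, the URL path is left to the caller because this section does not name one, and sending a `Content-Type` header is an assumption rather than a documented requirement.

```python
import json
import urllib.request


def build_job_request(url, source_path, dest_path, notify_url, job_options):
    """Build (but do not send) a POST request whose body is JSON."""
    payload = {
        "source_path": source_path,
        "dest_path": dest_path,
        "notify_url": notify_url,
        "job_options": job_options,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: the API does not necessarily require this header.
        headers={"Content-Type": "application/json"},
    )


def parse_job_response(raw_body):
    """Return the job ID on success, or raise RuntimeError on failure."""
    response = json.loads(raw_body)
    if response.get("success"):
        return response["job_id"]
    raise RuntimeError(
        "%s: %s" % (response.get("error_code"), response.get("message"))
    )


print(parse_job_response('{"success": true, "job_id": "abc123"}'))
```

To actually submit the job, pass the request to `urllib.request.urlopen()` and feed the response body to `parse_job_response()`.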
The following topics will be useful to you if you would like to help improve media-nommer.
For those that are particularly ambitious, or generous enough to want to contribute to the project, the barrier to entry is reasonably low. We will go over a few topics at a high level to get you started.
The best way to obtain the source is through our git repository on GitHub. If you are going to do anything aside from poke at the code, I would highly recommend visiting our GitHub project and forking the project via the Fork button.
For those not familiar with GitHub, this will create a copy of the repository under your username which you have commit access to. From your repository, you can make modifications and send Pull Requests to the upstream project asking for your changes to be merged in. Meanwhile, you can keep your fork up to date with changes from upstream. This is a whole lot faster, easier, and more fun than the traditional patch juggling method.
If you’d like to skip the whole forking ordeal and just get a copy of our upstream repository on your machine, you’ll probably want to do something like:
git clone https://github.com/duointeractive/media-nommer.git
This will leave you with a media-nommer directory in your current directory.
Tip
For those that aren’t familiar with git and/or GitHub, see the excellent GitHub help with lots of helpful how-tos.
There are a few additional dependencies to install when doing development work. Since everyone is being good little programmers and working under virtualenv, it’s just a matter of switching to said virtual environment and doing this from within your media-nommer dir:
pip install -r requirements.txt
Now you’re set to develop, run unit tests, and build our documentation.
You will now want to review the Configuring section of the Installation document. You may create your nomconf.py in your media-nommer directory for convenience. Return to this document once you have finished following the configuration instructions.
Running unit tests is trivial with nose. cd to your media-nommer directory and just run:
nosetests
The cool thing about nose is that it will find anything that looks like a unit test throughout the entire codebase, without us having to tell it where to look.
As far as writing unit tests, we’d like to shoot for full coverage, so please do write tests for your changes. If you are working on something new, find the tests.py module neighboring the one you’re working on (or create one if it doesn’t already exist) and write your unit tests using the standard Python unittest module. If you need any examples of how our unit tests look, run this to find all of our tests.py modules from your root media-nommer directory:
find . -name tests.py
Look through these for a good idea of what we’re looking for.
Warning
We are very unlikely to accept code contributions without unit tests. It is understood that writing unit tests is boring, tedious, and un-fun, but it is a necessary evil for complex software.
Note
This is suitable for local testing and development. Your actual deployment would probably omit the -n flag to allow daemonization, unless you’re using something like Supervisor.
If you are in your media-nommer directory, you may run feederd locally by doing this:
PYTHONPATH=media_nommer twistd -n --pidfile=feederd.pid feederd
ec2nommerd is designed to run on your EC2 instances, and is not at all meant to run on anything else. While it will do so just fine in most cases, a few features (such as self-termination) obviously won’t work.
If you are in your media-nommer directory, you may run ec2nommerd locally by doing this:
PYTHONPATH=media_nommer twistd -n --pidfile=ec2nommerd.pid ec2nommerd -l
Warning
Make sure to include the -l flag or your daemon will just deadlock while trying to query a web server that is internal to AWS.
We mostly adhere to PEP8, and expect contributors to do the same. A few quick highlights:
After you have made modifications in your GitHub fork, you need only send a Pull Request to us via the aptly named Pull Request button on your fork’s project page. See GitHub’s guide to forking. It’s quick and easy, we promise.
The best way to get help right now is to submit an issue on the issue tracker. This is useful for questions, suggestions, and bugs.
Important to note is the fact that all contributions to the media-nommer project are, like the project itself, BSD-licensed. Please make sure you or your employer are OK with this before contributing.
The following pages are reference material for media-nommer itself. This is probably only interesting to you if you are adding new Nommers. If you simply want to get to encoding, you’ll want to select and start using one of the Client API Libraries.
This module contains settings-related things that are used by ec2nommerd and feederd. You will most likely be interested in the settings module within this one, as that’s where all of the settings and their values reside.
When the ec2nommerd and feederd Twisted plugins start, they use update_settings_from_module() to override the defaults in the settings module with those specified by the user, typically via a user-provided module named nomconf.py (though that name can change with command line arguments).
If you need access to settings, simply import the global settings like this:
from media_nommer.conf import settings
Given another module with settings in uppercase variables on the module, override the defaults in media_nommer.conf.settings with the values from the given module.
Parameters: settings_module (module) – A module with settings as upper-case attributes set. This is typically nomconf.py, although the user can elect to name it something else.
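The override behaviour described here can be pictured with a short sketch. This is not the actual implementation of update_settings_from_module(); the function `update_settings_sketch` and the stand-in objects are hypothetical, and the only rule taken from the text is that upper-case attributes on the user's module replace the defaults.

```python
import types


def update_settings_sketch(defaults, user_module):
    """Copy every upper-case attribute of user_module onto defaults."""
    for name in dir(user_module):
        # Lower-case names (helpers, imports, dunders) are not settings.
        if name.isupper():
            setattr(defaults, name, getattr(user_module, name))


# Stand-ins for media_nommer.conf.settings and a user's nomconf.py.
defaults = types.SimpleNamespace(EC2_KEY_NAME=None, MAX_NUM_EC2_INSTANCES=3)
nomconf = types.ModuleType("nomconf")
nomconf.EC2_KEY_NAME = "my_keypair"
nomconf.helper_value = "ignored"

update_settings_sketch(defaults, nomconf)
print(defaults.EC2_KEY_NAME, defaults.MAX_NUM_EC2_INSTANCES)
```

Settings the user does not mention keep their default values, which is why a nomconf.py only needs to list the handful of required values.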
This module contains default global settings. When ec2nommerd and feederd daemons are started, media_nommer.conf.update_settings_from_module() takes the settings that the user explicitly set and overrides these defaults.
You may override any of these global defaults in your nomconf.py file. You will need to at the least provide the following settings:
Default: None (User must provide)
These AWS credentials are used for job state management via SimpleDB, and queueing with SQS.
Default: None (User must provide)
These AWS credentials are used for job state management via SimpleDB, and queueing with SQS.
Default: 'nommer_config'
The S3 bucket to store a copy of nomconf.py for nommer instances.
Default: The latest upstream AMI compatible with this git revision.
The AMI ID for the media-nommer EC2 instance.
Default: 'm1.large'
The type of instance to run on. Must be at least m1.large. t1.micro and m1.small instances are NOT supported by the default AMI.
Default: None (User must provide)
The AWS SSH key name with which to launch the EC2 instances.
Default: ['media_nommer']
Default: 3600 * 24
If a job sticks in an un-finished state after this long (in seconds), it is considered abandoned, and feederd will discard it.
Default: True
When True, allow the launching of new EC2 encoder instances.
Default: 60
How often feederd should see if it needs to spawn additional EC2 instances.
Default: 60
How often feederd will check for job state changes.
Default: 60 * 5
How often feederd will check for abandoned or expired jobs.
Default: 2
If the number of unfinished jobs exceeds our capacity (MAX_ENCODING_JOBS_PER_EC2_INSTANCE * <Number of Active EC2 instances>) by this number of jobs, look at starting new instances if we have not already exceeded MAX_NUM_EC2_INSTANCES.
Default: 2
The maximum number of jobs that should ever run on a single EC2 instance at the same time.
Default: 3
The maximum number of EC2 instances to run at a time.
Default: 60
Used by EC2 nodes to determine how long to wait between sending a status update via SimpleDB. The node will also check for inactivity greater than the configured value in NOMMERD_MAX_INACTIVITY, and terminate itself if inactivity has exceeded that value, and NOMMERD_TERMINATE_WHEN_IDLE is True.
Default: 60 * 50
How many seconds of inactivity (not working on any jobs) before an instance will terminate itself.
Default: 60
An interval (in seconds) to wait between calls to AWS to check for new jobs.
The path to the qtfaststart bin used by ec2nommerd.
Default: True
When True, allow the termination of idle EC2 instances based on the NOMMERD_MAX_INACTIVITY setting. It is important to keep in mind that you pay for an entire hour when you start an EC2 instance, so setting this timeout to anything below 10 minutes is probably a waste of money. This timeout is in seconds.
Default: 'media_nommer_ec2nommer_state'
The SimpleDB domain for storing heartbeat information from the EC2 encoder instances.
Default: 'media_nommer'
The SimpleDB domain name for storing encoding job state in. For example, the date the job was created, the current state of the job (PENDING, ENCODING, FINISHED, etc.), and the Nommer being used to do the encoding.
Default: 'media_nommer_jstate'
The SQS queue used to notify feederd of changes in job state. For example, when a job goes from PENDING to DOWNLOADING or ENCODING.
Default: 'media_nommer'
The SQS queue used to notify ec2nommerd of new jobs.
Default: (All included storage backends)
Storage backends. The protocol is the key, the value is the backend class used to work with said protocol.
Various configuration-related utility methods.
Given the URI to an S3 location with a valid nomconf.py, download it to the current user’s home directory.
Tip
This is used on the media-nommer EC2 AMIs. This won’t run on local development machines.
Parameters: nomconf_uri (str) – The URI to your setup’s nomconf.py file. Make sure to specify AWS keys and IDs if using the S3 protocol.
Given a user-defined nomconf module (already imported), push said file to the S3 conf bucket, as defined by settings.CONFIG_S3_BUCKET. This is used by the nommers that require access to the config, like FFmpegNommer.
Parameters: nomconf_module (module) – The user’s nomconf module. This may be called something other than nomconf, but the uploaded filename will always be nomconf.py, so the EC2 nodes can find it in your settings.CONFIG_S3_BUCKET.
The core module is home to code that is important to both the media_nommer.ec2nommerd and media_nommer.feederd top-level modules. This is best thought of as a lower level API that all of the pieces of the encoding system use to communicate and interact with one another.
Note
There should be nothing that is specific to just ec2nommerd or just feederd here if at all possible.
Storage backends are used to handle downloading and uploading content over a number of different protocols in an abstracted manner.
media-nommer uses URI strings to represent media locations, whether it be the file to download and encode, or where the output should be uploaded to. A reference to the correct backend for a URI can be found using the get_backend_for_uri() function.
Given a protocol string, return the storage backend that has been tasked with serving said protocol.
Parameters: protocol (str) – A protocol string like ‘http’, ‘ftp’, or ‘s3’.
Returns: A storage backend for the specified protocol.
Given a URI string, return a reference to the storage backend class capable of interacting with the protocol seen in the URI.
Parameters: uri (str) – The URI to find the appropriate storage backend for.
Return type: StorageBackend
Returns: A reference to the backend class to use with this URI.
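As a rough picture of how a URI might be mapped to a backend, consider the sketch below. It is an assumption-laden illustration rather than the project's code: the `*Sketch` names are hypothetical, the two placeholder classes do nothing, and the `BACKENDS` dict merely imitates the protocol-to-class mapping described for the storage backends setting.

```python
from urllib.parse import urlparse


class S3BackendSketch:
    """Placeholder for a backend that would talk to S3."""


class HttpBackendSketch:
    """Placeholder for a backend that would talk to HTTP servers."""


# Protocol string -> backend class (hypothetical contents).
BACKENDS = {"s3": S3BackendSketch, "http": HttpBackendSketch}


def get_backend_for_protocol_sketch(protocol):
    """Look up the backend class registered for a protocol string."""
    try:
        return BACKENDS[protocol]
    except KeyError:
        raise ValueError("No storage backend for protocol %r" % protocol)


def get_backend_for_uri_sketch(uri):
    """Pick a backend class based on the scheme portion of the URI."""
    return get_backend_for_protocol_sketch(urlparse(uri).scheme)


uri = "s3://AWS_ID:AWS_SECRET_KEY@BUCKET/KEYNAME.mp4"
print(get_backend_for_uri_sketch(uri).__name__)
```

Because the lookup keys off the URI scheme alone, supporting a new protocol in this sketch is just a matter of adding one more entry to the mapping.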
This module contains an S3Backend class for working with URIs that have an s3:// protocol specified.
Abstracts access to S3 via the common set of file storage backend methods.
The job state backend provides a simple API for feederd and ec2nommerd to store and retrieve job state information. This can be everything from the list of currently un-finished jobs, to the down-to-the-minute status of individual jobs.
There are two pieces to the job state backend:
Represents a single encoding job. This class handles the serialization and de-serialization involved when saving and loading encoding jobs to and from SimpleDB.
Tip
You generally won’t be instantiating these objects yourself. To retrieve an existing job, you may use JobStateBackend.get_job_object_from_id().
Returns True if this job is in a finished state.
Return type: bool
Returns: True if the job is in a finished state, or False if not.
Serializes and saves the job to SimpleDB. In the case of a newly instantiated job, also handles queueing the job up into the new job queue.
Return type: str
Returns: The unique ID of the job.
Abstracts storing and retrieving job state information to and from SimpleDB. Jobs are represented through the EncodingJob class, instances of which are created and returned as needed.
Any jobs in the following states are considered “finished” in that we won’t do anything else with them. This is a list of strings.
All possible job states as a list of strings.
Given a job’s unique ID, return an EncodingJob instance.
TODO: Make this raise a more specific exception.
Parameters: unique_id (str) – An EncodingJob’s unique ID.
Queries SimpleDB for a list of pending jobs that have not yet been finished.
Return type: list
Returns: A list of unfinished EncodingJob objects.
Pops any new jobs from the job queue.
Warning
Once jobs are popped from a queue and delete() is run on the message, they are gone for good. Be careful to handle errors in the methods higher on the call stack that use this method.
Parameters: num_to_pop (int) – Pop up to this many jobs from the queue at once. This can be up to 10, as per SQS limitations.
Return type: list
Returns: A list of EncodingJob objects.
Pops any recent state changes from the queue.
Warning
Once jobs are popped from a queue and delete() is run on the message, they are gone for good. Be careful to handle errors in the methods higher on the call stack that use this method.
Parameters: num_to_pop (int) – Pop up to this many jobs from the queue at once. This can be up to 10, as per SQS limitations.
Return type: list
Returns: A list of EncodingJob objects.
The feederd module contains the Twisted daemon that monitors queues, creates EC2 instances, schedules encoding jobs, handles response pings from external encoding services, and notifies your external applications when encoding has completed.
Note
Only more general management logic should live in this module. Anything specific like encoding commands or EC2 management should live elsewhere. feederd is just the orchestrator.
Nommers are classes that EC2 instances use with ec2nommerd to consume and produce media in the desired format. These usually involve wrapping external commands, setting job state details as the process continues, and uploading the results.
Classes in this module serve as a basis for Nommers. This should be thought of as a protocol or a foundation to assist in maintaining a consistent API between Nommers.
This is a base class that can be sub-classed by each Nommer to serve as a foundation. Required methods raise a NotImplementedError exception by default, unless overridden by child classes.
Variables: job (EncodingJob) – The encoding job this nommer is handling.
Start nomming. If you’re going to override this, make sure to follow the same basic job state updating flow if possible.
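The sub-classing pattern described here might look roughly like the sketch below. None of these names come from media-nommer itself: `BaseNommerSketch`, `start()`, `encode()`, `EchoNommer`, and the `ERROR` state are all hypothetical stand-ins, and a plain dict plays the part of the job object. The point is only to show a base class whose required hook raises NotImplementedError while the shared entry point keeps the job state updated.

```python
class BaseNommerSketch:
    """Stand-in base class: required hooks raise NotImplementedError."""

    def __init__(self, job):
        # In this sketch the job is a plain dict, not an EncodingJob.
        self.job = job

    def start(self):
        """Shared flow: mark the job as encoding, run it, record the result."""
        self.job["state"] = "ENCODING"
        succeeded = self.encode()
        # "ERROR" is an assumed state name used for illustration only.
        self.job["state"] = "FINISHED" if succeeded else "ERROR"
        return succeeded

    def encode(self):
        raise NotImplementedError("Sub-classes must implement encode().")


class EchoNommer(BaseNommerSketch):
    """Toy nommer that 'encodes' by copying the source path."""

    def encode(self):
        self.job["output"] = self.job["source_path"]
        return True


job = {"source_path": "some_video.mp4", "state": "PENDING"}
EchoNommer(job).start()
print(job["state"])
```

Keeping the state updates in the base class is what lets each nommer focus on its own encoding command while still reporting progress in a consistent way.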
Contains a class used for nomming media on EC2 via FFmpeg. This is used in conjunction with the ec2nommerd Twisted plugin.
This Nommer is used to encode media with the excellent FFmpeg utility.
Example API request
Below is an example API request. The job_options dict that is passed to the media_nommer.feederd JSON API is the important part that is specific to this nommer:
{
'source_path': 'some_video.mp4',
'dest_path': 'some_video_hqual.mp4',
'notify_url': 'http://somewhere.com:8000/job_state_pingback',
'job_options': {
'nommer': 'media_nommer.ec2nommerd.nommers.ffmpeg.FFmpegNommer',
# This options key has ffmpeg command line arguments for a 2-pass
# encoding. If you're doing single pass, you'd only have one
# dict in this list.
'options': [
# First pass command specification.
{
# Just documenting this here so its existence is known.
'infile_options': [],
# Fed to ffmpeg as command line flags. A None value means
# that just the flag name is provided with no arg.
'outfile_options': [
('threads', 0),
('vcodec', 'libx264'),
('preset', 'medium'),
('profile', 'baseline'),
('b', '400k'),
('vf', 'yadif,scale=640:-1'),
# This denotes the first pass in ffmpeg.
('pass', '1'),
('f', 'mp4'),
('an', None),
],
},
# Second pass command specification.
{
'outfile_options': [
('threads', 0),
('vcodec', 'libx264'),
('preset', 'medium'),
('profile', 'baseline'),
('b', '400k'),
('vf', 'yadif,scale=640:-1'),
# Notice that this is now 2, for the second pass.
('pass', '2'),
('acodec', 'libfaac'),
('ab', '128k'),
('ar', '48000'),
('ac', '2'),
('f', 'mp4'),
],
},
], # end options list, max of 2 passes.
}, # end job_options
}
To show how this would be put together, here is the command that would be run for the first pass:
ffmpeg -y -i some_video.mp4 -threads 0 -vcodec libx264 -preset medium -profile baseline -b 400k -vf yadif,scale=640:-1 -pass 1 -f mp4 -an /dev/null
Note that the an key in the outfile_options list of the first pass above has a None value. You’ll need to do this for flags or options that don’t require a value.
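The translation from an options list to a command line can be sketched in a few lines. This is an illustration, not FFmpegNommer's actual code: `build_ffmpeg_cmd` is a hypothetical helper, and placing the infile options before `-i` is an assumption. Each (flag, value) pair becomes `-flag value`, and a None value emits the bare flag.

```python
def build_ffmpeg_cmd(infile, outfile, infile_options=(), outfile_options=()):
    """Turn (flag, value) pairs into an ffmpeg argument list."""

    def flatten(options):
        args = []
        for flag, value in options:
            args.append("-%s" % flag)
            # A None value means the flag takes no argument (e.g. -an).
            if value is not None:
                args.append(str(value))
        return args

    return (["ffmpeg", "-y"] + flatten(infile_options) + ["-i", infile]
            + flatten(outfile_options) + [outfile])


first_pass = [
    ("threads", 0), ("vcodec", "libx264"), ("preset", "medium"),
    ("profile", "baseline"), ("b", "400k"), ("vf", "yadif,scale=640:-1"),
    ("pass", "1"), ("f", "mp4"), ("an", None),
]
cmd = build_ffmpeg_cmd("some_video.mp4", "/dev/null",
                       outfile_options=first_pass)
print(" ".join(cmd))
```

Run against the first-pass options from the example request, the sketch prints the same command shown above. Returning a list rather than a single string also makes the result safe to hand to `subprocess` without shell quoting.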
This module contains tasks that are executed at intervals, and is imported at the time the server is started. The intervals at which the tasks run are configurable via media_nommer.conf.settings.
All functions prefixed with task_ are task functions that are registered with the Twisted reactor. All functions prefixed with threaded_ are the interesting bits that actually do things.
Registers all tasks. Called by the ec2nommerd Twisted plugin.
Looks at the number of currently active threads and compares it against the MAX_ENCODING_JOBS_PER_EC2_INSTANCE setting. If we are under the max, fire up another thread for encoding additional job(s).
The interval at which ec2nommerd checks for new jobs is determined by the NOMMERD_NEW_JOB_CHECK_INTERVAL setting.
Calls threaded_encode_job() for any jobs to encode.
Checks in with feederd in a non-blocking manner via threaded_heartbeat().
Calls threaded_heartbeat().
Given a job, run it through its encoding workflow in a non-blocking manner.
Fires off a threaded task to check in with feederd via SimpleDB. There is a domain that contains all of the running EC2 instances and their unique IDs, along with some state data.
The interval at which heartbeats occur is determined by the NOMMERD_HEARTBEAT_INTERVAL setting.
Contains the NodeStateManager class, which is an abstraction layer for storing and communicating the status of EC2 nodes.
Tracks this node’s state, reports it to feederd, and terminates itself if certain conditions of inactivity are met.
Looks at how long it’s been since this worker has done something, and decides whether to self-terminate.
Parameters: | thread_count_mod (int) – Add this to the amount returned by the call to get_num_active_threads(). This is useful when calling this method from a non-encoder thread. |
---|---|
Return type: | bool |
Returns: | True if this instance terminated itself, False if not. |
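The termination decision described above can be pictured with the following minimal sketch. The function name should_terminate, the max_idle parameter, and its default value are assumptions made for illustration. The real logic lives in NodeStateManager and may differ.

```python
import datetime


def should_terminate(last_activity, now, active_threads, thread_count_mod=0,
                     max_idle=datetime.timedelta(minutes=50)):
    """Hypothetical sketch: decide whether an idle node should shut itself down."""
    # thread_count_mod adjusts the raw thread count, e.g. -1 to discount
    # the non-encoder thread that is making this call.
    busy_threads = active_threads + thread_count_mod
    if busy_threads > 0:
        # Still encoding something, so stay alive.
        return False
    # No work in progress: terminate only after a long enough quiet period.
    return (now - last_activity) > max_idle


now = datetime.datetime(2011, 1, 1, 12, 0, 0)
an_hour_ago = now - datetime.timedelta(hours=1)
print(should_terminate(an_hour_ago, now, active_threads=1, thread_count_mod=-1))
```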
Determine this EC2 instance’s unique instance ID. Lazy load this, and avoid further re-queries after the first one.
Parameters: | is_local (bool) – When True, don’t try to hit EC2’s metadata server; just make up a unique ID instead. When False, query the metadata server for the instance ID. |
---|---|
Return type: | str |
Returns: | The EC2 instance’s ID. |
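The lazy-loading behaviour can be sketched roughly as follows. The class name InstanceIdCache and the local-ID format are hypothetical. The sketch assumes that is_local=True means an ID is made up locally, and that the instance metadata endpoint is reachable without a session token when is_local=False.

```python
import uuid
import urllib.request

# Standard EC2 instance metadata endpoint for the instance ID.
EC2_INSTANCE_ID_URL = 'http://169.254.169.254/latest/meta-data/instance-id'


class InstanceIdCache(object):
    """Hypothetical sketch of a lazily loaded, class-level instance ID."""
    _instance_id = None

    @classmethod
    def get_instance_id(cls, is_local=False):
        if cls._instance_id is None:
            if is_local:
                # Not on EC2: make up a unique ID instead of querying.
                cls._instance_id = 'local-%s' % uuid.uuid4().hex
            else:
                with urllib.request.urlopen(EC2_INSTANCE_ID_URL, timeout=2) as resp:
                    cls._instance_id = resp.read().decode('ascii')
        # Later calls return the cached value without re-querying.
        return cls._instance_id


first_id = InstanceIdCache.get_instance_id(is_local=True)
second_id = InstanceIdCache.get_instance_id(is_local=True)
print(first_id == second_id)
```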
Checks the reactor’s threadpool to see how many threads are currently working. This can be used to determine how busy this node is.
Return type: | int |
---|---|
Returns: | The number of active threads. |
Pat ourselves on the back each time we do something.
Used by contemplate_termination() to determine whether this node’s continued existence is still necessary.
Determine whether this is an EC2 instance or not.
Return type: | bool |
---|---|
Returns: | True if this is an EC2 instance, False if otherwise. |
Sends a status update to feederd through SimpleDB. Lets the daemon know how many jobs this instance is crunching right now. Also updates a timestamp field to let feederd know how long it has been since the instance’s last check-in.
Parameters: | state (str) – If this EC2 instance is anything but ACTIVE, pass the state here. This is useful during node termination. |
---|
This module contains the logic for the Twisted daemon that oversees the encoding process, feederd. Only things that are specific to this daemon should reside within this module. Anything that could conceivably also be useful to media_nommer.ec2nommerd should probably be in media_nommer.core.
Contains the EC2InstanceManager class, which helps manage the currently active instances.
This class manages and creates EC2 instances from the configured encoder AMI. Additional instances are spawned based on the size of the job queue in relation to the amount of manpower available at any point in time.
Returns a list of boto Instance objects matching the media-nommer AMI, as per media_nommer.conf.settings.EC2_AMI_ID. Only running instances are included.
Return type: | list |
---|---|
Returns: | A list of boto.ec2.instance.Instance objects representing currently active media-nommer ec2nommerd instances. |
This module contains tasks that are executed at intervals, and is imported at the time the server is started. Much of feederd’s ‘intelligence’ can be found here.
All functions prefixed with task_ are task functions that are registered with the Twisted reactor. All functions prefixed with threaded_ are the interesting bits that actually do things.
Checks for job state changes in a non-blocking manner.
Calls the instance creation logic in a non-blocking manner.
Prune expired or abandoned jobs from the domain specified in the SIMPLEDB_JOB_STATE_DOMAIN setting. Also prunes feederd’s job cache.
Calls threaded_prune_jobs().
Checks the SQS queue specified in the SQS_JOB_STATE_CHANGE_QUEUE_NAME setting for announcements of state changes from the EC2 instances running ec2nommerd. This lets feederd know it needs to get updated job details from the SimpleDB domain defined in the SIMPLEDB_JOB_STATE_DOMAIN setting.
Looks at the current number of jobs needing encoding and compares them to the pool of currently running EC2 instances. Spawns more instances as needed.
See source of media_nommer.feederd.ec2_instance_manager.EC2InstanceManager.spawn_if_needed() for the logic behind this.
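For intuition only, the comparison between pending jobs and running instances might look something like the sketch below. This is not the logic from spawn_if_needed(). The function name instances_to_spawn and the max_instances cap are assumptions, and jobs_per_instance stands in for the MAX_ENCODING_JOBS_PER_EC2_INSTANCE setting.

```python
import math


def instances_to_spawn(num_unfinished_jobs, num_running_instances,
                       jobs_per_instance, max_instances):
    """Hypothetical sketch: how many extra encoder instances to start."""
    if num_unfinished_jobs <= 0:
        return 0
    # Instances needed to work through the queue at full capacity.
    needed = math.ceil(num_unfinished_jobs / jobs_per_instance)
    # Never exceed the configured ceiling (an assumed setting).
    target = min(needed, max_instances)
    # Only the shortfall needs spawning, and it is never negative.
    return max(target - num_running_instances, 0)


print(instances_to_spawn(5, 1, jobs_per_instance=2, max_instances=10))
```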
Sometimes failure happens, but a Nommer doesn’t handle said failure gracefully. Instead of its state changing to ERROR, the job gets stuck in some un-finished state in the SimpleDB domain defined in the SIMPLEDB_JOB_STATE_DOMAIN setting.
This process finds jobs that haven’t been updated in a very long time (a day or so) that are probably dead. It marks them with an ABANDONED state, letting us know something went really wrong.
Basic job caching module.
Caches currently active media_nommer.core.job_state_backend.EncodingJob objects. This is presently only un-finished jobs, as defined by media_nommer.core.job_state_backend.JobStateBackend.FINISHED_STATES.
On rare occasions, nommers crash so hard that no ERROR state change is made, and the job just gets stuck in a permanent unfinished state (DOWNLOADING, ENCODING, UPLOADING, etc). Rather than hang on to these indefinitely, abandon them by setting their state to ABANDONED.
The inactivity threshold after which jobs are considered abandoned is configurable via the FEEDERD_ABANDON_INACTIVE_JOBS_THRESH setting.
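A minimal sketch of this kind of staleness check is shown below. The SketchJob class, its attribute names, and the contents of FINISHED_STATES here are stand-ins invented for illustration. The real EncodingJob and JobStateBackend.FINISHED_STATES may look different.

```python
import datetime

# Assumed set of terminal states; the real list lives in JobStateBackend.
FINISHED_STATES = ('FINISHED', 'ERROR', 'ABANDONED')


class SketchJob(object):
    """Stand-in for EncodingJob with only the fields this sketch needs."""
    def __init__(self, unique_id, job_state, last_modified):
        self.unique_id = unique_id
        self.job_state = job_state
        self.last_modified = last_modified


def abandon_stale_jobs(jobs, now, threshold):
    """Mark unfinished jobs that have been inactive for too long as ABANDONED."""
    abandoned = []
    for job in jobs:
        if job.job_state in FINISHED_STATES:
            continue
        if now - job.last_modified > threshold:
            job.job_state = 'ABANDONED'
            abandoned.append(job)
    return abandoned


now = datetime.datetime(2011, 1, 2, 12, 0, 0)
jobs = [
    SketchJob('a', 'ENCODING', now - datetime.timedelta(days=2)),
    SketchJob('b', 'ENCODING', now - datetime.timedelta(minutes=10)),
    SketchJob('c', 'FINISHED', now - datetime.timedelta(days=5)),
]
stale = abandon_stale_jobs(jobs, now, threshold=datetime.timedelta(days=1))
print([job.unique_id for job in stale])
```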
Returns a dict of all cached jobs. The keys are unique IDs, the values are the job objects.
Return type: | dict |
---|---|
Returns: | A dictionary with the keys being unique IDs of cached jobs, and the values being EncodingJob instances. |
Given a job’s unique id, return the job object from the cache.
Parameters: | job (str or EncodingJob) – A job’s unique ID or a job object. |
---|---|
Return type: | EncodingJob |
Returns: | The cached encoding job. |
Given a valid job state (refer to media_nommer.core.job_state_backend.JobStateBackend.JOB_STATES), return all jobs that currently have this state.
Parameters: | state (str) – The job state to query by. |
---|---|
Return type: | list of EncodingJob |
Returns: | A list of jobs matching the given state. |
Given a job object or a unique id, return True if said job is cached, and False if not.
Parameters: | job (str or EncodingJob) – A job’s unique ID or a job object. |
---|---|
Return type: | bool |
Returns: | True if the given job exists in the cache, False if otherwise. |
Loads all of the un-finished jobs into the job cache. This is performed when feederd starts.
Looks at the state SQS queue specified by the SQS_JOB_STATE_CHANGE_QUEUE_NAME setting and refreshes any jobs that have changed. This simply reloads the job’s details from SimpleDB.
Return type: | list of EncodingJob |
---|---|
Returns: | A list of changed EncodingJob objects. |
Removes a job from the cache.
Parameters: | job (str or EncodingJob) – A job’s unique ID or a job object. |
---|
Clears jobs from the cache after they have been finished.
TODO: We’ll eventually want to clear jobs from the cache that haven’t been accessed by the web API recently.
Updates a job in the cache. Creates the key if it doesn’t already exist.
Parameters: | job (EncodingJob) – The job to update (or create) a cache entry for. |
---|
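Taken together, the cache methods above amount to a dictionary keyed by unique ID that accepts either an ID or a job object. The following is a simplified sketch under that assumption. SketchJob and SketchJobCache are hypothetical names, and the SQS and SimpleDB interaction is left out entirely.

```python
class SketchJob(object):
    """Stand-in for EncodingJob with only an ID and a state."""
    def __init__(self, unique_id, job_state):
        self.unique_id = unique_id
        self.job_state = job_state


class SketchJobCache(object):
    """Hypothetical in-memory cache keyed by a job's unique ID."""
    CACHE = {}

    @staticmethod
    def _key(job):
        # Accept either a unique ID string or a job object.
        return job if isinstance(job, str) else job.unique_id

    @classmethod
    def update_job(cls, job):
        # Creates the entry if it does not exist yet.
        cls.CACHE[job.unique_id] = job

    @classmethod
    def is_job_cached(cls, job):
        return cls._key(job) in cls.CACHE

    @classmethod
    def get_job(cls, job):
        return cls.CACHE[cls._key(job)]

    @classmethod
    def get_jobs_with_state(cls, state):
        return [j for j in cls.CACHE.values() if j.job_state == state]

    @classmethod
    def remove_job(cls, job):
        # pop() with a default avoids a KeyError for jobs already removed.
        cls.CACHE.pop(cls._key(job), None)


job = SketchJob('abc123', 'ENCODING')
SketchJobCache.update_job(job)
print(SketchJobCache.is_job_cached('abc123'))
```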