
There are many benefits to keeping your organization’s metadata in Github, and they go well beyond version control for Apex and Visualforce code. You get a thorough history of all your organization’s configuration changes by users and administrators, including object customizations, reports, dashboards, and email templates. These additions and deletions will show up in Github at whatever interval you schedule. Ideally there would be a trigger to update Github whenever any change was made, but a daily snapshot is a vast improvement over nothing!

With the sync set up, each commit in Github shows exactly which metadata files were added, changed, or deleted during that snapshot interval.

A handy feature I find myself using is Github’s code search, which lets you search across all of your org’s metadata at once.

Some other benefits:

  • You will have the peace of mind of knowing you can revert your configuration to its state on any date since you started snapshotting your metadata.
  • In organizations with multiple developers or admins, it gives any one person the ability to see everything that was changed by other admins, developers, or even end users since the previous day. Github’s “show diff stats” is a great overview of what has been changed during the snapshot interval.
  • It gives you a form of history tracking at the object level. For instance, say you are an administrator working on a report. You can tell the report has been changed, but you don’t know exactly how. You can open that report as a file on Github and view its history to see the revisions over time, information that would otherwise be lost.
  • It is nice to have a local copy of your metadata in the highly unlikely event Salesforce has a catastrophic failure. You can use that metadata to migrate to another CRM solution, or piece it out into various other systems.
  • If you ever need to revert an object back to a prior state, you can download the metadata file from Github and deploy it to Salesforce via the Eclipse IDE or the Apache Ant Migration Tool.


Looking into this, I couldn’t find any existing solution which would migrate all of this metadata into a git repository on a regular interval. This led me to figure out how to do it with the Migration Tool and Apache Ant.

It is fairly straightforward to get the Migration Tool up and running in a Windows or Linux environment by following Salesforce’s installation instructions.

The challenge is pulling down a copy of ALL of the metadata with the various Ant functions that the Migration Tool provides. Here are the metadata API methods we are given to work with:

  • retrieve
  • bulkRetrieve
  • listMetadata
  • describeMetadata
  • compileAndTest
  • deploy

I needed to use the first 4 of the 6 functions.

What you end up with is a build.xml file which references a couple of package.xml files. Running Ant against this file is all you need to do to download all the metadata in your org.

The build.xml I came up with contains an interesting blend of static and dynamic entries:
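A sketch of its overall shape, assuming credentials are kept in a build.properties file (the property names, target names, and retrieve targets here are illustrative placeholders, not the original file):

    <project name="Retrieve Org Metadata" default="bulkRetrieve"
             basedir="." xmlns:sf="antlib:com.salesforce">
        <property file="build.properties"/>

        <!-- Static entries: top-level types pulled down in bulk,
             plus retrieves driven by the generated package files -->
        <target name="bulkRetrieve">
            <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
                serverurl="${sf.serverurl}" metadataType="ApexClass"
                retrieveTarget="metadata/classes"/>
            <!-- ...one sf:bulkRetrieve per type, per section 1 below... -->
            <sf:retrieve username="${sf.username}" password="${sf.password}"
                serverurl="${sf.serverurl}" unpackaged="objects.xml"
                retrieveTarget="metadata"/>
            <sf:retrieve username="${sf.username}" password="${sf.password}"
                serverurl="${sf.serverurl}" unpackaged="remaining.xml"
                retrieveTarget="metadata"/>
        </target>

        <!-- Dynamic entries: folder retrieves injected by a PHP script
             (see section 4) in place of this placeholder comment -->
        <target name="bulkRetrieveFolders">
            <!-- FOLDER_RETRIEVES_PLACEHOLDER -->
        </target>
    </project>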

There are many other ways you can structure your build.xml file; this was just a logical grouping that made the most sense to me. Here is more information on these groupings of metadata, and how we build the appropriate build.xml/package.xml files to retrieve them.

  1. Top-level components which can be pulled down in bulk easily.
  2. Standard and Custom Objects. Object names retrieved from the metadata REST API. Subcomponents retrieved via Ant’s listMetadata.
  3. Other subcomponents whose names we retrieve with Ant’s listMetadata.
  4. Subcomponents which reside in folders. These consist of Reports, Dashboards, Documents and Email Templates only.

And now we’ll go into detail on each one:

1) Top-level components which can be pulled down in bulk easily

Using bulkRetrieve we are able to create the static entry in the build.xml file to download much of our metadata. This is the easiest way, so I use it to get everything I can. Here is an example of how ours turned out:
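Something along these lines, with one entry per metadata type (credential properties and retrieve targets are placeholders, as in the sketch above):

    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="ApexClass"
        retrieveTarget="metadata/classes"/>
    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="ApexTrigger"
        retrieveTarget="metadata/triggers"/>
    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="Layout"
        retrieveTarget="metadata/layouts"/>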

Your build.xml file entry may be slightly different, as it will contain the metadataTypes relevant to your org. These can differ depending on what types of licensing you have set up with Salesforce. You can get these metadata types either by doing a describeMetadata request with Ant, which will give you output like this:
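The request itself is a one-line Ant task, and the output is one block per metadata type; both shown here are illustrative sketches rather than the original listing:

    <target name="describeMetadata">
        <sf:describeMetadata username="${sf.username}" password="${sf.password}"
            serverurl="${sf.serverurl}" resultFilePath="describe.log"/>
    </target>

    ************** ApexClass **************
    XMLName: ApexClass
    DirName: classes
    Suffix: cls
    HasMetaFile: true
    InFolder: false
    ChildObjects: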

Or by reading the Metadata API documentation.

The documentation and the results of the describeMetadata request were not consistent, so unfortunately our list of bulk metadata types needs to be maintained as a static list. This is acceptable, since Salesforce doesn’t add whole new types of metadata very often. But keep in mind, this is something which will have to be kept up to date, perhaps once a year or so as Salesforce releases major new features. Here is an example of how describeMetadata and the documentation aren’t consistent. You can do a bulkRetrieve of Workflow, and all the child objects will be included in the XML which is returned:
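An entry like this works, and the retrieved Workflow XML includes all the child components (alerts, field updates, rules, and so on):

    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="Workflow"
        retrieveTarget="metadata/workflows"/>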

However this will fail:
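Presumably a wildcard retrieve of one of the child types on its own; something like the following sketch, where WorkflowAlert is only a guess at the original example:

    <!-- Hypothetical reconstruction: asking for a child type directly
         errors out, even though the docs suggest it supports wildcards -->
    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="WorkflowAlert"
        retrieveTarget="metadata/workflowAlerts"/>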

The documentation says both types allow all of their components to be downloaded via wildcard, but this doesn’t appear to be the case.

Between the documentation, the describeMetadata results, and some trial and error, we can fairly easily build a list of all the metadata components we can download in bulk. Here is the list I came up with for our org:

  • ApexClass
  • ApexComponent
  • ApexPage
  • ApexTrigger
  • CustomApplication
  • CustomLabels
  • CustomObjectTranslation
  • CustomPageWebLink
  • CustomSite
  • CustomTab
  • DataCategoryGroup
  • FieldSet
  • Flow
  • Group
  • HomePageComponent
  • HomePageLayout
  • Layout
  • PermissionSet
  • Portal
  • Profile
  • Queue
  • RecordType
  • RemoteSiteSetting
  • ReportType
  • Role
  • Scontrol
  • StaticResource
  • Workflow

Executing “ant bulkRetrieve” at the command line will download all the metadata for these objects into corresponding subdirectories on your machine.

2) Standard and custom objects via the metadata REST API and Ant’s listMetadata

Since the custom objects and subcomponents need to be fully qualified, I use the REST API with some PHP scripting to gather all of the standard and custom object names, and then retrieve and parse the Ant listMetadata files to get all the individual subcomponents.

Here is the PHP script to get the object names:
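A minimal sketch of that script, assuming $instanceUrl and $accessToken come from an earlier OAuth login step (not shown) and that the API version is arbitrary:

    <?php
    // Fetch all standard and custom object names from the REST API
    // "describe global" endpoint.
    function getObjectNames($instanceUrl, $accessToken) {
        $ch = curl_init($instanceUrl . '/services/data/v29.0/sobjects/');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER,
            array('Authorization: Bearer ' . $accessToken));
        $response = json_decode(curl_exec($ch), true);
        curl_close($ch);

        $names = array();
        foreach ($response['sobjects'] as $sobject) {
            $names[] = $sobject['name'];
        }
        return $names;
    }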

With the Ant retrieve function, custom fields are always returned with custom objects; however, we have to specify them explicitly for standard objects. In addition, all other types of object subcomponents (record types, validation rules, etc.) need to be explicitly fully qualified for all objects. This yields a fully comprehensive object definition file instead of a bare-bones version. I wrote some PHP to parse the Ant listMetadata results for all of the object subcomponents, following this convention:
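One plausible shape for that convention: one sf:listMetadata call per subcomponent type, each writing its results to a text file named after the type (the type list and file paths here are placeholders):

    <target name="listMetadata">
        <sf:listMetadata username="${sf.username}" password="${sf.password}"
            serverurl="${sf.serverurl}" metadataType="CustomField"
            resultFilePath="listMetadata/CustomField.txt"/>
        <sf:listMetadata username="${sf.username}" password="${sf.password}"
            serverurl="${sf.serverurl}" metadataType="ValidationRule"
            resultFilePath="listMetadata/ValidationRule.txt"/>
        <!-- ...repeat for WebLink, ListView, and the other
             subcomponent types present in your org... -->
    </target>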

With all of our object names (from the REST API) and subcomponent names (from our Ant listMetadata) loaded into PHP arrays, we are able to generate a package.xml file which will retrieve copies of these comprehensive object metadata XML files.

Here is an abbreviated version of what the package.xml file will look like:
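Roughly like this, with one <types> block per component type (the member names are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <Package xmlns="http://soap.sforce.com/2006/04/metadata">
        <types>
            <members>Account</members>
            <members>Contact</members>
            <members>My_Custom_Object__c</members>
            <name>CustomObject</name>
        </types>
        <types>
            <members>Account.Some_Custom_Field__c</members>
            <members>Contact.Another_Field__c</members>
            <name>CustomField</name>
        </types>
        <types>
            <members>Account.Require_Billing_Country</members>
            <name>ValidationRule</name>
        </types>
        <version>29.0</version>
    </Package>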

I saved this as a separate file named objects.xml, since I like to think of all the object definition metadata as one logical grouping.

3) Other subcomponents gathered from Ant’s listMetadata

We need to repeat a similar process for the rest of the object metadata which needs to have the individual subcomponents fully qualified.

These return text files which look like this:
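Something along these lines for each component (illustrative; the exact field layout depends on the Migration Tool version, and the parsing only cares about the FullName values):

    FileName: objects/Account.object
    FullName/Id: Account.Require_Billing_Country/03d000000000001AAA
    Manageable State: unmanaged
    Namespace Prefix:
    Created By (Name/Id): Admin User/005000000000001AAA
    Last Modified By (Name/Id): Admin User/005000000000001AAA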

We parse the subcomponent names out of them, and build another package.xml file which looks like this (abbreviated):
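It uses the same package.xml format as before; the component types and member names below are placeholders for whatever your org’s listMetadata results contain:

    <?xml version="1.0" encoding="UTF-8"?>
    <Package xmlns="http://soap.sforce.com/2006/04/metadata">
        <types>
            <members>Account.My_Accounts</members>
            <members>Contact.Recently_Viewed</members>
            <name>ListView</name>
        </types>
        <types>
            <members>Account.View_Website</members>
            <name>WebLink</name>
        </types>
        <version>29.0</version>
    </Package>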

For lack of a better name, I called this package “remaining.xml”.

4) Subcomponents which reside in folders

The final items we need to specify in build.xml are the folders which contain subcomponents of type Report, Dashboard, Document, and EmailTemplate. We can get these folder names from the Salesforce REST API. This is not the metadata API we used previously, but rather the data API. It is interesting to note that each folder holds zero or more pieces of metadata, yet the folders themselves are actually records in a database; there is an interesting relationship here between data and metadata. Here is an example of a PHP function which gets all the report folders with the REST API:
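A minimal sketch, again assuming $instanceUrl and $accessToken from an earlier login step; since folders are ordinary records, a SOQL query against the Folder object does the job (pagination via nextRecordsUrl is omitted for brevity):

    <?php
    // Query the Folder records for all report folders via the data API.
    function getReportFolders($instanceUrl, $accessToken) {
        $soql = urlencode("SELECT DeveloperName FROM Folder " .
                          "WHERE Type = 'Report' AND DeveloperName != null");
        $ch = curl_init($instanceUrl . '/services/data/v29.0/query?q=' . $soql);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER,
            array('Authorization: Bearer ' . $accessToken));
        $result = json_decode(curl_exec($ch), true);
        curl_close($ch);

        $folders = array();
        foreach ($result['records'] as $record) {
            $folders[] = $record['DeveloperName'];
        }
        return $folders;
    }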

With these folder names, we can add in these bulkRetrieve requests for the contents of the folders like this (abbreviated):
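One sf:bulkRetrieve per folder, using the containingFolder attribute (the folder names here are placeholders for whatever the REST query returned):

    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="Report"
        containingFolder="Sales_Reports"
        retrieveTarget="metadata/reports"/>
    <sf:bulkRetrieve username="${sf.username}" password="${sf.password}"
        serverurl="${sf.serverurl}" metadataType="Dashboard"
        containingFolder="Executive_Dashboards"
        retrieveTarget="metadata/dashboards"/>
    <!-- ...and likewise for Document and EmailTemplate folders... -->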

This is a dynamically built section of an otherwise static build.xml file. A simple way to get this section into the static file is to load the file and do a search and replace on a placeholder comment, injecting this block into our build.xml file.

Here is the placeholder comment:
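Any unique marker works; for example (the comment text is arbitrary, it just has to match what the PHP script searches for):

    <!-- FOLDER_RETRIEVES_PLACEHOLDER -->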

And we do a search and replace with PHP like so:
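A minimal sketch, assuming the generated sf:bulkRetrieve block has already been built up in $folderRetrieveBlock and the static template is saved as build.template.xml (both names are placeholders):

    <?php
    // Inject the dynamically generated folder retrieves into the static
    // build.xml template by replacing the placeholder comment.
    $template = file_get_contents('build.template.xml');
    $buildXml = str_replace('<!-- FOLDER_RETRIEVES_PLACEHOLDER -->',
                            $folderRetrieveBlock, $template);
    file_put_contents('build.xml', $buildXml);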

Retrieving the metadata

Now that we have our build.xml file, and our two package.xml files constructed (objects.xml and remaining.xml), it is just a matter of using Ant to call the build.xml file to get all the metadata.

Due to the way I constructed my build.xml file, this required calling ant twice:

  1. ant bulkRetrieve
  2. ant bulkRetrieveFolders

There will be a lot of screen output and some waiting while ant does its thing. In our case it takes around 20 minutes to get everything.

Once all the metadata is downloaded, we need to synchronize it with a local git repository, then commit and push that local git repository up to Github.

For files which are added or changed, we can simply copy (overwriting) the files from the local metadata into the local git repo. However, for files which have been removed, we need to explicitly remove them from the local git repo. To do this, I run a PHP script to find the files which are in the git repo but not in the metadata we just downloaded. Then we loop through this list of files and “git rm” each one:
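A minimal sketch of that cleanup pass, assuming the repo’s metadata lives in a subdirectory of the local clone (both paths are placeholders):

    <?php
    // Remove from the git repo any file that no longer exists in the
    // freshly downloaded metadata.
    $repoDir     = '/path/to/local-repo/metadata';   // inside the git repo
    $metadataDir = '/path/to/downloaded/metadata';   // fresh Ant output

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($repoDir,
            FilesystemIterator::SKIP_DOTS));

    foreach ($iterator as $file) {
        $relative = substr($file->getPathname(), strlen($repoDir) + 1);
        if (!file_exists($metadataDir . '/' . $relative)) {
            // Deletes the file and stages the removal in one step.
            exec('cd ' . escapeshellarg($repoDir) .
                 ' && git rm ' . escapeshellarg($relative));
        }
    }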

After our local git repo is synchronized with the metadata we downloaded, we can commit and push the files to Github:
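The final step is plain git, suitable for running from the same cron job (the path and branch name are whatever your setup uses):

    cd /path/to/local-repo
    git add -A
    git commit -m "Salesforce metadata snapshot"
    git push origin master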

If you cron this job to run every night (or the interval of your choice), you will begin to see your org’s changes reflected in Github!


Tags: ant, Github, php

2 Responses to “Syncing Your Org Metadata to Github”

  1. Raja Sampath

    Great post, Daniel. I was able to rewrite this in C#. Just running into one issue. When I run bulk retrieve with multiple sf:bulkretrieve in a single ANT target, the resulting package.xml is constantly getting overwritten. At the end of the processing, only content related to the last sf:bulkretrieve exists in package.xml. Is there any simple workaround to append to package.xml instead of overwriting it? Or is my only option additional manipulation through some code? I really appreciate your inputs on this.


    • Daniel Peter

      Hi Raja, glad it was helpful.

      I only called bulkRetrieve once, so I didn’t run into this specific problem. Are you sure you have to call it multiple times? I do, however, call retrieve with multiple package.xml files. I just name them different names like “objects.xml” or “remaining.xml”, but they are all package.xml files. I then reference these renamed package.xml files in the build.xml file in the “unpackaged” attribute.

      The official docs say that you can only use the “unpackaged” attribute on the retrieve method, and not the bulkRetrieve method. So there is no documented way to pass a filename other than package.xml to bulkRetrieve. Perhaps there is an undocumented way. I would suggest copying the file to different filenames to get around this in your C# script.

      If you want to send me your C# code I can suggest some ways to solve this. Perhaps you could even contribute the C# version alongside my PHP version on Github?


      Daniel Peter
      Sr. Programmer Analyst, Safari Books Online LLC