Node.js: Event Emitters and Listeners

When building client-side applications with JavaScript, handling events is quite common. Consider, for example:


$('#toBeClicked').on('click', function() { alert('clicked') })

Here we handle a click event on the element with id ‘toBeClicked’. When the element is clicked, a ‘click’ event is emitted, which is handled by the statement above.

Just like in the DOM, many objects in Node emit events. Such objects inherit from the EventEmitter constructor.

Let's build a custom event emitter straight away and see what's going on.


var EventEmitter = require('events').EventEmitter       // The EventEmitter constructor in the events module

Now we create our custom emitter, an instance of EventEmitter,

var emitter = new EventEmitter();

We listen for the error event,

emitter.on('error', function(msg1, msg2) { console.log('ERR: ' + msg1 + ' ' + msg2 ) })

Here, function(msg1, msg2) { console.log('ERR: ' + msg1 + ' ' + msg2 ) }

is the callback to be performed once the event is emitted.

We could also attach the listener with addListener,

emitter.addListener('error', function(msg1, msg2) { console.log('ERR: ' + msg1 + ' ' + msg2 ) } )

Now we emit the error event,

emitter.emit('error', 'Bug detected', '@ Step 4')

Once the ‘error’ event is emitted, the listener performs the callback and we see the error logged in the console (as written in the callback function),
ERR: Bug detected @ Step 4

We can add listeners for any number of custom events and handle them when those events are emitted.
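
For instance, here is a minimal sketch with a made-up ‘greet’ event (the event name and its payload are just for illustration),

var EventEmitter = require('events').EventEmitter;

var greeter = new EventEmitter();

// listener for our custom 'greet' event
greeter.on('greet', function(name) {
  console.log('Hello, ' + name + '!');
});

greeter.emit('greet', 'node');    // -> Hello, node!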

Now that we know how events are emitted and handled via listeners, let's try out a small server that listens for requests and processes them.


var http = require('http');

var server = http.createServer(function(request, response) {
  request.on('data', function (chunk) { console.log(chunk.toString()); });

  request.on('end', function() {
    response.write('Request Completed!');
    response.end();
  });

});

console.log("Starting up the server");
server.listen(8000);

Here, the http.createServer method returns an object which inherits from the EventEmitter constructor.

Check out the Node.js API doc for the createServer method in the http module,

http.createServer([requestListener])

It says that requestListener is a function which is automatically added as a listener for the ‘request’ event.

To check what's going on behind the scenes, let's dive into the Node.js code base. The createServer method lives in the http.js module. Inspecting it,

exports.createServer = function(requestListener) {
  return new Server(requestListener);
};

Check the _http_server.js module to find the Server constructor, which has the following lines of code,

if (requestListener) {
  this.addListener('request', requestListener); // event listener for the server
}

As per the above snippet, ‘this’ (the current server instance) listens for the ‘request’ event and attaches the requestListener function to be performed when the event is emitted.

Here,

function(request, response) {
  request.on('data', function (chunk) { console.log(chunk.toString()); });

  request.on('end', function() {
    response.write('Request Completed!');
    response.end();
  });
}

is our requestListener.

Further inspecting the _http_server.js module, we can also see how the request event is emitted,

self.emit('request', req, res); // event emitter for the server

‘req’ and ‘res’ are the request and response objects that are passed as arguments to the requestListener function called when the ‘request’ event is emitted. Here self is ‘this’ (our current server instance).
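
To mimic this pattern ourselves, here is a minimal sketch of a constructor that inherits from EventEmitter and emits ‘request’ just like the real Server does (the ToyServer name and its handle method are made up for illustration),

var EventEmitter = require('events').EventEmitter,
    util = require('util');

// a toy "server" that inherits from EventEmitter
function ToyServer(requestListener) {
  EventEmitter.call(this);
  if (requestListener)
    this.addListener('request', requestListener);
}
util.inherits(ToyServer, EventEmitter);

// simulate an incoming request by emitting the 'request' event
ToyServer.prototype.handle = function(req, res) {
  this.emit('request', req, res);
};

var toy = new ToyServer(function(req, res) { console.log('Got: ' + req.url); });
toy.handle({ url: '/' }, {});    // -> Got: /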

We could well make the server listen for the request event on our own. For this,

var server = http.createServer()
server.on('request', function(request, response) {
  request.on('data', function (chunk) { console.log(chunk.toString()); });

  request.on('end', function() {
    response.write('Request Completed!');
    response.end();
  })
});

Here, when we create the server instance, we do not pass a requestListener (notice that requestListener is optional in http.createServer([requestListener])). Instead we attach a listener of our own on the server, which listens for the ‘request’ event and performs the callback function when it is emitted, i.e.,

server.on('request', function(request, response) { ... });

Rails admin with authlogic

If you have used the rails_admin gem, you would have noticed that it's tightly coupled with Devise. What happens if Devise is not the pick and the application is already integrated with Authlogic? Let's see.

After installing the rails_admin gem, create a ‘rails_admin.rb’ file in config/initializers and paste in the following code,

  RailsAdmin.config do |config|
    config.authorize_with do
      redirect_to root_path, :alert => "You are not authorized!" unless current_user.admin?
    end
  end

  RailsAdmin.config do |config|
    config.authenticate_with do
      unless current_user
        redirect_to login_url
      end
    end
  end

The authentication block checks whether the user is currently logged in, and the authorization block checks whether the logged-in user is an admin.

Also add the rails admin route in the config/routes.rb file.

  mount RailsAdmin::Engine => '/admin', :as => 'rails_admin'

This should enable the admin interface to run at /admin, now authenticated and authorized using Authlogic.

Prototypal Inheritance in Javascript

When I decided to get my hands on JavaScript, the biggest roadblock for me was understanding and following the prototypal inheritance model. Coming from a Ruby background, the inheritance model I had learned and understood was the classical one. The prototypal model has fundamental differences and did drive me nuts at times. Here I try to make some notes on what it's all about, going along with a few snippets.

What exactly is a prototype?
As explained in ‘Javascript: The Definitive Guide’ – An object’s prototype is a reference to another object from which properties are inherited. Every JavaScript object has a second JavaScript object (or null, but this is rare) associated with it. This second object is known as a prototype, and the first object inherits properties from the prototype.

In JavaScript, an object can be created with object literals, with the new keyword, and (in ECMAScript 5) with the Object.create() function:

  1. obj = { }                                 // object literal; objects created this way have Object.prototype as their prototype
  2. obj = Object.create(proto[, properties])  // Object.create(); objects created this way have the first argument to create as their prototype
  3. obj = new Constructor()                   // new keyword; objects created this way have the value of the constructor's prototype property as their prototype

Object.prototype: It's one of those rare objects which do not have a prototype, and hence it is not associated with any other object.
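
You can verify this in the console,

Object.getPrototypeOf(Object.prototype)
   -> null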

Let's try and study the prototype of objects created via each of these three ways.
1) Object literals { }

obj = { }

obj will always have Object.prototype as its prototype. Inspect obj in your console and you will find this.

[Screenshot: inspecting obj and its __proto__ in the console]

__proto__ is the prototype object (Object.prototype) from which obj inherits its properties.

You can always use Object.getPrototypeOf to check the prototype of an object. The below screenshot shows the prototype object of obj.

[Screenshot: Object.getPrototypeOf(obj) in the console]
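
For instance, with the obj created above,

Object.getPrototypeOf(obj) === Object.prototype
   -> true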

2) Object.create()

obj = Object.create(Object.prototype)

Doing this is exactly the same as creating an object with literals (obj = { }). This is because the first argument to the create method forms the prototype of the newly created object, obj. In our statement we have specified Object.prototype as the first argument. Also note that Object.create takes a second optional argument which defines the properties of the object.

We can specify an object or null as the first argument to the create method. If we pass null, the newly created object won't inherit any properties.

 Mammal = Object.create(null, {knownTime: { value: 'infinity' }} )

We have an object Mammal with no inherited properties. It has one property of its own (knownTime, with a value set to ‘infinity’).
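
Since Mammal sits at the end of its own chain (its prototype is null), it doesn't even inherit the usual Object.prototype methods,

Mammal.toString
   -> undefined
Object.getPrototypeOf(Mammal)
   -> null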

Man = Object.create(Mammal)

Object.getPrototypeOf(Man) will return an object with a single property, knownTime. Also,

Man.knownTime
   -> 'infinity'

Now create another object inheriting from ‘Man’

John = Object.create(Man)
John.knownTime
   -> 'infinity'
 

Now let's fiddle around with these objects. Try adding new properties to them.

Set a max height for mammals.

Mammal.maxHeight = '200'
Man.maxHeight
   -> '200'
John.maxHeight
   -> '200'

Set a nice rhythmic vocal for mammals.

Mammal.sound = function() { return( "Wholaa!" ) }
Mammal.sound()
  -> "Wholaa!"
Man.sound()
  -> "Wholaa!"
John.sound()
  -> "Wholaa!"

Sure, Man has a different kind of pitch from the other mammals.

Man.sound = function() { return( "Cheerpp!" )}
Man.sound()
   -> "Cheerpp!"
John.sound()
   -> "Cheerpp!"

Object.getPrototypeOf(John) returns an object (Man, with the property ‘sound’) which has another object (Mammal, with the properties ‘knownTime’, ‘maxHeight’ and ‘sound’) as its prototype.

[Screenshot: Object.getPrototypeOf(John) in the console]

The prototype chain looks as follows

[Diagram: the prototype chain John -> Man -> Mammal -> null]

Object Mammal has the properties ‘knownTime’ and ‘maxHeight’ and the method ‘sound’. Man has Mammal as its prototype object and hence inherits these properties from Mammal. However, it overrides the ‘sound’ method. John has Man as its prototype object and hence inherits properties from Man (which includes Man's own properties and the properties inherited from Mammal).
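
You can walk the chain yourself with Object.getPrototypeOf,

Object.getPrototypeOf(John) === Man
   -> true
Object.getPrototypeOf(Man) === Mammal
   -> true
Object.getPrototypeOf(Mammal)
   -> null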

3) new keyword followed by a Constructor Invocation

Try this,

obj = new Object()

Here again, this is exactly the same as Object.create(Object.prototype) and { }. The prototype of the object obj is Object.prototype.

Here, Object is the constructor which when invoked using the new keyword will return a new object which inherits properties from the prototype of the constructor (here, Object.prototype).

Constructor(as per the Definitive Guide): A constructor is a function designed for the initialization of newly created objects. Constructors are invoked using the new keyword. Constructor invocations using new automatically create the new object, so the constructor itself only needs to initialize the state of that new object. The critical feature of constructor invocations is that the prototype property of the constructor is used as the prototype of the new object.

Also, note that every JavaScript function (except functions returned by the ECMAScript 5 Function.bind() method) automatically has a prototype property. Hence, a constructor, being merely a function, also has the prototype property.
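
You can see this in the console,

function f() { }
typeof f.prototype
   -> "object"
f.prototype.constructor === f
   -> true
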
To make more sense of this, let's create a constructor and try to figure out what exactly happens.

function Startup(config) {
   this.config = config;
   this.motto = function(verbiage) {
      return("Our motto is " + verbiage)
   }
}

Now, create a new instance

s1 = new Startup(true)

s1 is an object with Startup.prototype as its prototype, and hence inherits properties from Startup.prototype

s1.config
  -> true
s1.motto("we change")
  -> "Our motto is we change"

Create another instance,

s2 = new Startup(false)

s2 is an object with Startup.prototype as its prototype, and hence inherits properties from Startup.prototype

s2.config
  -> false
s2.motto("we build")
  -> "Our motto is we build"

Now, we would like a bit more functionality to be available on all objects instantiated from Startup. For that to happen, we need to add more properties to Startup.prototype.

Startup.prototype.space = function(people) { return(people / 5) }
Startup.prototype.stream = function() {
   if(this.config == true)
       return("service");
   else
       return("business");
}

Now, you could use these properties on the instances,

s1.space(25)
   -> 5
s1.stream()
   -> "service"

s2.space(40)
   -> 8
s2.stream()
   -> "business"

[Diagram: the Startup constructor, Startup.prototype and the instances s1 and s2]

As is clear from the image above, we have four objects here: the constructor object (Startup) with its prototype property, the prototype object (Startup.prototype) with all its properties, and the instances (s1 and s2).
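
You can confirm these relationships in the console,

Object.getPrototypeOf(s1) === Startup.prototype
   -> true
s1.constructor === Startup     // inherited from Startup.prototype
   -> true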

Now suppose we reassign the prototype property of Startup

Startup.prototype = { stream : function() { return("No more a startup") } }

Note that reassigning Startup.prototype does not change the prototype of instances that already exist: s1 and s2 still reference the old prototype object, so s1.stream() still returns "service". Only instances created after the reassignment pick up the new prototype object,

s3 = new Startup(true)
s3.stream()
   -> "No more a startup"

On this account, it should be clear how to add new methods to arrays, dates and other such objects created via a constructor invocation. With Array being the constructor function and Array.prototype being the prototype of all objects instantiated via the Array constructor invocation:

A clone method on all arrays,

Array.prototype.clone = function() {
   return this.concat()
}

[1,2,3].clone()
   -> [1,2,3]

Clear off the array,

Array.prototype.clear = function() {
   this.length = 0;
   return this;
}

[1,2,3].clear()
    -> []

Node.js: Streams and Pipes

Node.js is used for building a lot of network applications, and there's a lot of data being passed around. This data could well be huge in size. In Node, all this data is processed the moment it's received, piece by piece, with the help of streams. Here we discuss the usage of streams by writing a small Node script that handles a file upload.

Here’s the actual piece of code that handles a file upload and responds back to the client with the progress of the upload.

  var http = require('http'),
      fs = require('fs');

  var server = http.createServer();
  console.log("Starting up the server");
  server.listen(8000);

  server.on('request', function(request, response) {
    var file = fs.createWriteStream('copy.csv');
    var fileSize = request.headers['content-length'];
    var uploadedSize = 0;

    request.on('data', function (chunk) {
      uploadedSize += chunk.length;
      var uploadProgress = (uploadedSize/fileSize) * 100;
      response.write(Math.round(uploadProgress) + "%" + " uploaded\n" );
      var bufferStore = file.write(chunk);
      if(bufferStore == false)
        request.pause();
    });

    file.on('drain', function() {
      request.resume();
    })

    request.on('end', function() {
      response.write('Upload done!');
      response.end();
    })

  });

The basics: we create a Node server that listens on port 8000. Upon receipt of a request, we create a write stream (the destination file path). Each chunk of data received is written to the destination path, and the upload progress is calculated and sent back in the response.

Let's break the above snippet into pieces and analyse what's happening.

A write stream is created, with ‘copy.csv’ as the destination path to which the received data will be written.

  var file = fs.createWriteStream('copy.csv');

The following piece forms the core of the upload process.

  request.on('data', function (chunk) {
    var bufferStore = file.write(chunk);
    if(bufferStore == false)
      request.pause();
    uploadedSize += chunk.length;
    uploadProgress = (uploadedSize/fileSize) * 100;
    response.write(Math.round(uploadProgress) + "%" + " uploaded\n" );
  });

  file.on('drain', function() {
    request.resume();
  })

Looking at the code: on receiving each chunk of data (via the read stream), it's written to the write stream as
file.write(chunk);

Right now, we need to pause a bit and check whether there might be a cause for worry in this whole read-write streaming process. The answer is yes, and it is very obvious: there is a real possibility that the rate at which data is written to the write stream is lower than the rate at which it's read from the read stream. This is a genuine concern and cannot be ignored. How we handle it forms our next two lines of code.

file.write(chunk) writes the data through an internal buffer. It returns true if the data was flushed, and false if the data was queued in memory because the buffer is full. So we need to handle this by pausing the read stream while the buffer is full.

  var bufferStore = file.write(chunk);
  if(bufferStore == false)
    request.pause();

Also, we need to restart streaming data from the read stream once the buffer has drained. The following lines of code do just that.

  file.on('drain', function() {
    request.resume();
  })

Pipes in Node: so far, we have handled the logic of keeping the read and write rates in sync ourselves. Node.js provides us with pipes, which have this logic already encapsulated.

The following line,

request.pipe(file) // The notion is quite similar to UNIX pipes. Pipes the input into an output stream.

would be equivalent to

  request.on('data', function(chunk) {
    var bufferStore = file.write(chunk);
    if(bufferStore == false)
      request.pause();
  })

  file.on('drain', function() {
    request.resume();
  })

Pipe by itself keeps the read and write rates in sync by pausing and resuming when necessary.
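
Putting it together, here is a minimal sketch of the same upload server rewritten with pipe (the progress reporting is left out for brevity),

  var http = require('http'),
      fs = require('fs');

  var server = http.createServer(function(request, response) {
    // pipe the request stream straight into the file; back-pressure
    // (pausing and resuming on a full buffer) is handled for us
    request.pipe(fs.createWriteStream('copy.csv'));

    request.on('end', function() {
      response.end('Upload done!');
    });
  });

  server.listen(8000);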

Now that we have handled our cause for concern, all that is left is to calculate the upload percentage upon receiving each chunk of data and respond with it.

  uploadedSize += chunk.length;
  uploadProgress = (uploadedSize/fileSize) * 100;
  response.write(Math.round(uploadProgress) + "%" + " uploaded\n" );

Do note that the actual size of the uploaded file is taken from the request headers.

var fileSize = request.headers['content-length'];

Now, when the request ends (i.e. the ‘end’ event is emitted by the request), the final chunk of the response is sent back to the client, indicating that our file upload has completed successfully.

To test this, run the node server and make a request, something like this:

curl -v --upload-file "upload_file.csv" "http://localhost:8000"

and the upload progress can be tracked.

Mercury: The WYSIWYG HTML editor

I had an application where different users would want to edit custom HTML pages to be shown on their web sites. Each user has his own domain (all domains pointing to the same Rails application), and the custom HTML page had to be loaded as per the current domain. In search of a WYSIWYG HTML editor which is easy to set up and simple to start off with, I ended up at Mercury. What's really nice is that Mercury also has a gem for Rails developers, and as I am one, I had no more hesitation in getting started with Mercury.

To get started with Mercury, add


gem 'mercury-rails'

to the Gemfile and bundle it.

Run the rails generator for the mercury files.


rails g mercury:install

A couple of questions will be asked. Answer ‘yes’ to install the layout files.

Now, checking out the directory structure, you can see three additional files: mercury.js and mercury.css in the js and stylesheets assets respectively, and a new layout file for the Mercury editor, mercury.html.erb.

I did remove the mercury css file later on.

One thing to notice here is that the mercury.js file is heavy, and it wouldn't be a good idea to load it on all pages. We want to load it only on the pages that need to be edited. Check out the Mercury layout file and you can see that mercury.js is included there.

    <head>
        <meta name="viewport" content="width=device-width, maximum-scale=1.0, initial-scale=1.0">
        <%= csrf_meta_tags %>
        <title>Mercury Editor</title>
        <%= stylesheet_link_tag 'mercury' %>
        <%= javascript_include_tag 'jquery-1.7', 'mercury' %>
    </head>

Now, to prevent mercury.js from being loaded on all pages, we can move all the other js files in our application to a separate directory and then require that directory in application.js.

My application.js will have,

//= require_tree ./main

where main is the directory which has all the application-specific JavaScript. (Probably could be a better name 🙂)

Now peep into the routes file and you will see this extra line,


mount Mercury::Engine => '/'

This line allows the HTML pages in your application to be edited. An extra ‘/editor’ has to be added at the beginning of each URL path to load the Mercury editor for that page.

Say you have the URL ‘localhost:3000/pages’; all you need to load it in the Mercury layout is to change it to ‘localhost:3000/editor/pages’. You now have Mercury loaded up to edit your page and can see it in the Mercury editor's layout.

[Screenshot: the page loaded in the Mercury editor layout]

However, this isn't quite enough to start editing the page. You need to specify editable regions in the page.
In pages.html.erb,

    <div class="control-group">
        <h3 class="section_header page-header">Pricing page</h3>
        <div id="faq" class="mercury-region" data-type="editable" data-mercury="full">
            <%= render :file => file_path(@domain, 'faq') %>
        </div>
    </div>

Consider this piece of code. A div with id="faq" is made editable with class="mercury-region" and the attributes data-type="editable" and data-mercury="full".

Now you can see the editable region.

[Screenshot: the editable region highlighted in the Mercury editor]

The following line in the above piece of code

<%= render :file => file_path(@domain, 'faq') %>

invokes a helper method and loads the already created sample FAQ template, which can now be edited and saved for the particular domain. As simple as that.

Similarly you could edit more pages here. This is how the contacts page can be edited.

    <div class="control-group">
        <h3 class="section_header page-header">Contact page</h3>
        <div id="contact" class="mercury-region" data-type="editable" data-mercury="full">
            <%= render :file => file_path(@domain, 'contact') %>
        </div>
    </div>

Also, you probably might want to change the save URL of the Mercury editor for a particular page, i.e. the controller action to which the Mercury-edited contents will be POSTed or PUT (depending on the configuration set in mercury.html.erb).

To change the Mercury save URL for this particular page, I wrote this script in the ERB file (pages.html.erb),

    <script>
        $(window).on('mercury:ready', function () {
            Mercury.saveUrl = "<%= pages_upload_admin_domain_path(@domain) %>";
        });
    </script>

You might also want to change the page that's redirected to once we are done editing with Mercury. We can bind to Mercury's save event to get this done.

    $(window).bind('mercury:saved', function() {
        window.location.replace('/admin/domain');
    });

All this saved data has to be dealt with in the controller action. Inspecting the params in the controller action (the Mercury save URL),

{"content"=>
    {"faq"=>
        {"type"=>"full",
         "data"=>{},
         "value"=> "<h1>This is where I have done my FAQ editing</h1>"
         "snippets" => {}
        }     
    },

    {"contact"=>
        {"type"=>"full",
         "data"=>{},
         "value"=> "<h1>This is where I have done my Contacts editing</h1>"
         "snippets" => {}
        }
    }
}

There are two things to notice here. The content hash contains all the Mercury-related stuff. Each hash in the content hash has a key equal to the id of a Mercury-editable HTML division (see the view code pasted above), here ‘faq’ and ‘contact’. The actual edited HTML content can be found under the key ‘value’ (<h1>This is where I have done my Contacts editing</h1>). The controller action can decide how to save this HTML content.

What have I done to solve the case mentioned at the start?

I created a pages directory in public. Within the pages directory I created subdirectories corresponding to the domains. For example, the domain localhost corresponds to the directory named localhost inside public/pages, and the domain remotehost corresponds to the remotehost directory.

I then saved all the edited HTML content as HTML files within these domain-specific directories. When a particular domain was loaded, the HTML pages (e.g. faq and contact) were rendered from the corresponding domain directory in the public folder.

Delayed Jobs in Rails: Adding custom attributes

OK, so this was my exact scenario. While building a bulk emailing application, the client needed to upload a set of email ids as a file and save them to the database. Saving these contact mail_ids for a particular mail group was a delayed process, handled by Rails delayed_job.

@mail_group.delay.save_group_contacts

where @mail_group is the active record group to which the mail_ids being uploaded and saved belong.

The requirement was to show a progress bar for the process of the mail_ids being saved to the mail group. To handle this, I decided to add custom attributes to the delayed_jobs table so as to identify the owner of the delayed job and also track the progress of the job.

To do this,

1) DB migration to add the custom attributes

    class AddColumnToDelayedJob < ActiveRecord::Migration
      def change
        add_column :delayed_jobs, :job_process_status, :integer, :default => 0
        add_column :delayed_jobs, :job_owner_id, :integer
        add_column :delayed_jobs, :job_owner_type, :string
      end
    end

2) A model for the delayed jobs table.

    module Delayed
      class Job < ActiveRecord::Base
        self.table_name = "delayed_jobs"
        attr_accessible :job_owner_id, :job_process_status, :job_owner_type
        belongs_to :job_owner, :polymorphic => true
      end
    end

As seen, three extra attributes (job_owner_id, job_owner_type attributes for establishing a polymorphic association with the job owner of the delayed job and a job_process_status attribute for updating the progress of the job) were added to the delayed jobs table.

Delayed jobs were then created with the job_owner_id and job_owner_type.

    @mail_group.delay(job_owner_id: @mail_group.id, job_owner_type: @mail_group.class.name).save_group_contacts

However this would not be enough to update the custom attributes. An attempt to create a delayed job would produce this

    ActiveModel::MassAssignmentSecurity::Error:
        Can't mass-assign protected attributes: job_owner_id, job_owner_type

As a quick fix, add a config/initializers/delayed_job.rb
and paste in the following code

    class Delayed::Job < ActiveRecord::Base
      self.attr_protected if self.to_s == 'Delayed::Backend::ActiveRecord::Job'   # loads protected attributes for the ActiveRecord instance
    end

Now the delayed job would get saved with the job_owner_id and job_owner_type.

Also, in the mail_group model, set an association to the delayed jobs table.

    class MailGroup < ActiveRecord::Base
      has_many :deferred_jobs, :as => :job_owner, :class_name => "::Delayed::Job"
    end

Now you can access all the delayed jobs of a particular @mail_group as

    @mail_group.deferred_jobs

The job process status, which is updated by the running job, can then be read from each of these jobs, e.g.

    @mail_group.deferred_jobs.first.job_process_status

Git Reset, Revert, Merge Conflicts and the Case Of The Faulty Merge

Git, as we know, is a fast, open source, distributed version control system that is quickly replacing Subversion in open source and corporate programming communities. As a developer, many a time I have been amazed by the power of Git and how it takes care of our code repo. It tracks the files in the project; we periodically commit the state of the project when we want a saved point. The history of our project is shared with other developers for collaboration, to merge their work with ours, and to compare or revert to previous versions of the project or individual files.

Hence most open source developers would have had a taste of Git and its power. We have all done a git init, push, pull, rebase and such in our day-to-day programming activity, and those are quite trivial to most developers.

However, there are certain facets of Git (merges, conflicts, reverts and such) which do create some confusion for developers, at least when they use them for the first time. What made me write this post is an incident that happened to a colleague of mine at work. I will get into that shortly. Before that, let me stitch in a brief on revert and reset in Git.

Revert and Reset

Git provides us multiple methods for fixing mistakes while in development. This is important, because it saves not just our work but that of the others involved in the same project.

If you have made a mess of your working directory but haven't committed the changes, the best way is perhaps to do a hard reset.

$ git reset --hard HEAD

This wipes out the changes you have staged in the index as well as any outstanding changes in your working tree.

Now suppose you have committed your changes but haven't pushed them, and you suddenly feel you shouldn't have made the previous commit (or a sequence of your previous commits); you can again reset hard. This is as simple as doing

$ git reset --hard HEAD~n

This sets HEAD to ‘n’ commits prior to your current head. The problem with doing a git reset --hard, though, is very obvious. Say this is how your commit log looks, with A at its head,

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A

Suppose you do

$ git reset --hard HEAD~3

Now your commit log would be.

o  ->  o  ->  o  ->  D

This means that the changes you made from C right up to A have vanished, and you are not going to get them back. The bottom line is simple: with reset you cannot undo the effects of a single commit in the middle of the history without also discarding everything after it (the exception, of course, is your last commit, as we have already seen).

git revert is just for that.

The current commit log would look like this

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A

At any point in time you realize that ‘C’ is bound to break your code (hopefully it still hasn't); you may well want to undo the changes made by C. This can be done with

$ git revert (commit id of C).

This creates a new commit that undoes commit C. You will be given a chance to enter a new commit message, but the default message, indicating it is ‘the reverse of commit C’, is the most indicative commit message to have.

o  ->  o  ->  o  ->  D  ->  C  ->  B  ->  A  ->  rC

where rC is the reverse of C.

This revert is a straightforward revert (i.e. it just undoes the changes made by the reverted commit). Since all that's being talked about here is a single branch, no complications arise.

Merge and reverting a faulty merge

Now let me talk about the incident I mentioned earlier. It all happened as the result of an accidental merge. My colleague did this

$ git pull origin experimental

while he was still sitting on his master branch. The experimental branch was thereby merged into master. This was totally unintentional (he never planned to do a merge). There were no merge conflicts, however. The mainline code broke, and we had to revert this faulty merge.

Master:        P  ->  x  ->  M
                \          /
Experimental:    A  ->  B

This should give you the picture. P is the point of branching, x is some commit made on the mainline totally unrelated to the side branch, and the side branch has two commits of its own, A and B. M is the merge commit (experimental merged into master). The code broke; hence we needed to revert M (the merge commit).

Master:        P  ->  x  ->  M  ->  W
                \          /
Experimental:    A  ->  B

Now, as seen, the merge has been reverted (W is the reverse of M). This was done with

$ git revert -m 1 (Commit id of M)

This adds W to the commit log as well. The faulty code on the experimental branch was then worked on, fixed, and made ready for the merge (again!). The experimental branch was merged into master once more. What was weird (for us, at that point in time) and noticeable was that the code changes made after the ‘merge revert’ appeared in the master branch, whereas the ones made before the revert didn't. i.e.

Master:        P -> x -> M -> W -> x -> x -> M2
Experimental:    A -> B  - - - - - - - - -  C -> D

Again, the x's are commits unrelated to the experimental branch, and M2 is the second merge. The commits C and D on the experimental branch fix the faulty code in A and B. What's to be noticed is that, after the updated experimental branch has been merged, none of the changes made by A and B appear in the master branch, whereas the changes made in C and D do. The reason was found out soon.

Linus Torvalds explains the situation:

     Reverting a regular commit just effectively undoes what that commit
     did, and is fairly straightforward. But reverting a merge commit also
     undoes the _data_ that the commit changed, but it does absolutely
     nothing to the effects on _history_ that the merge had.

     So the merge will still exist, and it will still be seen as joining
     the two branches together, and future merges will see that merge as
     the last shared state – and the revert that reverted the merge brought
     in will not affect that at all.

That's exactly what happened here. W (the merge revert) undoes the data brought in by M (the merge) but does nothing to the commit history brought in by M. Therefore, when the second merge, M2, is made, the commit history is checked and M is found to be the ‘last shared state’. Hence, only those changes made after the ‘last shared state’, M, are merged into the master branch (i.e. commits C and D). None of the data created in A and B gets merged because, as per the commit history, they have already been merged.

The solution to this problem is also explained by Linus himself. The fix is to ‘revert the revert that brought in W’, i.e. revert W before you do the next merge, M2.

Thus the mainline commit log would be

P  ->  x  ->  M  ->  W  ->  x  ->  x  ->  Y  ->  M2

where Y is the reverse of W and M2 is the merge made after that.

$ git revert (commit id of W)

adds Y to the commit log. The above commit log would be equivalent to

P  ->  x  -> M  ->  x  ->  x  ->  M2

where there is neither W nor Y, after which the second merge, M2, has been performed. Now this would be fine, and all the changes made in the experimental branch should be seen in the master branch (barring merge conflicts). If any merge conflicts arise, git leaves the index and the working tree in a special state that gives us all the information needed to resolve them.

Merge Conflict

A Merge conflict would throw in the following message:

CONFLICT (content): Merge conflict in sample_script.rb
Automatic merge failed; fix conflicts and then commit the result

Trying to switch to the experimental branch would give you this

error: you need to resolve your current index first

The files with conflicts will have markers upon them.

<<<<<<< HEAD:sample_script.rb
"We would be starting off now"
=======
"This would be the end"
>>>>>>> d31f96832d54c2702914d4f605c1d641511fef13:sample_script.rb

Now we need to resolve these conflicts manually, then add the file and commit it.

$ git add sample_script.rb
$ git commit -a

The commit message will already be filled in, indicating that it's a conflict-resolving commit. I prefer not to add anything extra to that.

gitk

It is also helpful to have the ‘gitk’ tool when you are analyzing your commit logs, especially when you have more than one branch. You get a neat graphical representation of your working directory.

$ sudo apt-get install gitk

if you don't already have it.

[Screenshot: gitk showing the commit graph across branches]

This definitely would be helpful in getting a better picture.