
Sunday, December 20, 2020

Migrating issues from GitHub to Jira

Recently we made the transition from managing our issues in GitHub Projects to Jira.

After discovering that the previously available plugin was deprecated, and that the existing scripts only supported GitHub's API v2 (itself deprecated in 2012), I adapted one of them for the v3 API and got good results, so here it is for anyone who wants to use it.


Basically, the currently available flow for a one-time migration from GitHub to Jira is to use GitHub's API to export the issues and then do a bulk import into Jira.  Here's a walk-through of what you'll need to do:
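In outline, each GitHub issue (a JSON object from the v3 API) becomes one CSV row for Jira's importer. Here's a minimal sketch of that mapping, using a stubbed issue hash in place of a live API response (the field names match the real v3 payload; the column layout is abbreviated, and the org/repo in the URL are made up):

```ruby
require 'csv'

# A stubbed GitHub v3 issue payload (real responses carry many more fields)
issue = {
  'number'   => 42,
  'title'    => 'Crash on startup',
  'body'     => 'Steps to reproduce...',
  'labels'   => [{ 'name' => 'Bug' }],
  'html_url' => 'https://github.com/someorg/somerepo/issues/42'
}

# Map label names to a Jira issue type, defaulting to Task
type = issue['labels'].any? { |l| l['name'] == 'Bug' } ? 'Bug' : 'Task'

# One CSV row, with columns matching the importer's header order
row = [
  issue['title'],
  "#{issue['body']} [GitHub Link|#{issue['html_url']}]",
  type
]

csv_line = CSV.generate_line(row)
puts csv_line
```

The full script below does the same thing, just with more columns, comments, and pagination.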

  1. Create a Personal Access Token in GitHub that will allow you to pull issues: How To
  2. Copy the following script locally (it's in Ruby, so make sure you have a Ruby environment):
    require 'json'
    require 'open-uri'
    require 'csv'
    require 'date'
    
    # Github credentials to access your private project
    USERNAME="[username]"
    PASSWORD="[Personal Access Token]"
    
    # Project you want to export issues from
    USER="[user/org that owns the repo]"
    PROJECT="[repo name]"
    
    # Your local timezone offset to convert times
    TIMEZONE_OFFSET="+3"
    
    BASE_URL="https://api.github.com/repos/#{USER}/#{PROJECT}/issues"
    
    GITHUB_CACHE=File.dirname(__FILE__) + "/gh_issues.json"
    
    USERS={
      "[GitHub username]" => "[Jira user/email]",
      ...
    }
    
    SAMPLE_ISSUES=[
      # List of issue IDs you'd like to test with
    ]
    
    LABEL_BLACKLIST=[
      # List of labels that shouldn't be copied, such as those that were used for type/priority
    ]
    
    # Convert GitHub fenced code blocks to Jira {code} markup
    def markdown(str)
      return str.gsub('```', '{code}')
    end
    
    if File.exist? GITHUB_CACHE
      puts "Getting issues from Cache..."
      issues = JSON.parse(File.open(GITHUB_CACHE).read)
    else
      puts "Getting issues from Github..."
    
      # Page through the issues endpoint, 100 issues at a time, until a page comes back empty
      page = 1
      issues = []
      last = [{}]
      while issues.size < page*100 and last.size > 0
        URI.open(
          "#{BASE_URL}?state=open&per_page=100&page=#{page}",
          http_basic_authentication: [USERNAME, PASSWORD]
        ) { |f|
          last = JSON.parse(f.read)
    
          last.each { |issue|
            if issue['comments'] > 0
              puts "Getting #{issue['comments']} comments for issue #{issue['number']} from Github..."
              # Get the comments
              URI.open(issue["comments_url"], http_basic_authentication: [USERNAME, PASSWORD]) { |c|
                issue['comments_content'] = JSON.parse(c.read)
              }
            end
          }
    
          issues += last
          puts "Got #{issues.size} issues so far..."
        }
        page += 1
      end
    
      File.open(GITHUB_CACHE, 'w') { |f|
        f.write(issues.to_json)
      }
    end
    
    puts
    puts
    puts "Processing #{issues.size} issues..."
    
    csv = CSV.new(File.open(File.dirname(__FILE__) + "/issues.csv", 'w'))
    sample = CSV.new(File.open(File.dirname(__FILE__) + "/sample.csv", 'w'))
    
    puts "Initialising CSV file..."
    # CSV Headers
    header = [
      "Summary",
      "Description",
      "Date created",
      "Date modified",
      "Issue type",
      "Priority",
      "Reporter",
      "Assignee",
      "Labels"
    ]
    # We need to add a column for each comment, so this dictates how many comments for each issue you want to support
    20.times { header << "Comments" }
    csv << header
    sample << header
    
    issues.each do |issue|
      puts "Processing issue #{issue['number']}..."
    
      if issue['pull_request']
        puts "  PR found, skipping"
        next
      end
    
      # Work out the type based on our existing labels
      case
        when issue['labels'].any? { |l| l['name'] == 'Bug' }
          type = "Bug"
        when issue['labels'].any? { |l| l['name'] == 'Epic' }
          type = "Epic"
        when issue['labels'].any? { |l| l['name'] == 'Feature' }
          type = "Story"
        else
          type = "Task"
      end
    
      # Work out the priority based on our existing labels
      # (reset to nil first so a value doesn't leak over from the previous issue)
      priority = nil
      case
        when issue['labels'].any? { |l| ['Priority', 'Prod'].include? l['name'] }
          priority = "High"
      end
    
      # Needs to match the header order above; date formats are based on Jira defaults
      row = [
        issue['title'],
        "#{issue['body'].to_s.empty? ? "" : markdown(issue['body']) + "; "}[GitHub Link|#{issue["html_url"]}]",
        DateTime.parse(issue['created_at']).new_offset(TIMEZONE_OFFSET).strftime("%d/%b/%y %l:%M %p"),
        DateTime.parse(issue['updated_at']).new_offset(TIMEZONE_OFFSET).strftime("%d/%b/%y %l:%M %p"),
        type,
        priority,
        USERS[issue['user']['login']],
        USERS[(issue['assignee'] || {})['login']],
        issue['labels'].map { |label| label['name'] }.filter { |label| !LABEL_BLACKLIST.include?(label) }.map { |label| label.gsub(' ', '-') }.join(" ")
      ]
    
      if issue['comments_content']
        issue['comments_content'].each do |c|
          # Date format needs to match hard coded format in the Jira importer
          comment_time = DateTime.parse(c['created_at']).new_offset(TIMEZONE_OFFSET).strftime("%d/%b/%y %l:%M %p")
    
          # Put the comment in a format Jira can parse, removing #s as Jira thinks they're comments
          comment = "#{comment_time}; #{USERS[c['user']['login']]}; #{markdown(c['body']).gsub('#', '').gsub(',', ';')[0...1024]}"
    
          row << comment
        end
      end
    
      csv << row
      if SAMPLE_ISSUES.include? issue['number']
        sample << row
      end
    end
    
  3. Fill in the fields in the script, like user name, access token, etc.
  4. Run the script with: "ruby <script name>". You'll see 3 files created:
    1. gh_issues.json - that's a cache of the issues from GitHub.  As long as it exists, the script will not download the info again but rather reuse this.  Delete the file to re-download.
    2. sample.csv - a short list of issues that's used for import testing so you can check all your fields are translated properly.
    3. issues.csv - that's the full fledged list of all issues.  Only import it in the end when you're done with all the validations.
  5. Follow the Jira instructions Here to import the issues into Jira (use sample.csv to test, issues.csv to do the full import)
That's it.  The last version I found and picked up from was from 2012.
I updated it to the 2020 APIs. Let's see how long those last :)