DataDriven

The Molecule Report

An easy Python interview practice problem on DataDriven. Write and execute real Python code with instant grading.

Domain: Python
Difficulty: easy
Seniority: L4

Problem

Given a DNA sequence string, return a dict with exactly these keys:

  • 'is_valid': True iff the sequence contains only the characters A, C, G, and T.
  • 'gc_content': (count_G + count_C) / len(sequence) * 100 as a float, or 0.0 for an empty sequence.
  • 'nucleotide_counts': a dict of the counts of A, C, G, and T.
  • 'most_common_dinucleotide': the most frequent 2-character substring of consecutive characters. On a tie, return the alphabetically largest (last) such pair, e.g. 'TG' over 'AT' and 'GC'. Return the empty string '' if the sequence has fewer than 2 characters.
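One way to meet the four requirements is sketched below. It is a minimal sketch, not DataDriven's reference solution: the function name `analyze_dna` is hypothetical, and it assumes the check is case-sensitive (lowercase bases make 'is_valid' False and are not counted) and that 'nucleotide_counts' should list all four bases, including those with a count of zero.

```python
from collections import Counter


def analyze_dna(sequence: str) -> dict:
    """Hypothetical sketch of the four required keys for a DNA string."""
    length = len(sequence)

    # is_valid: every character must be one of A, C, G, T.
    # Assumption: case-sensitive; an empty string is vacuously valid.
    is_valid = all(ch in "ACGT" for ch in sequence)

    # nucleotide_counts: assumption, all four bases are listed, zeros included.
    counts = {base: sequence.count(base) for base in "ACGT"}

    # gc_content: percentage of G and C over the full length, 0.0 when empty.
    gc_content = (counts["G"] + counts["C"]) / length * 100 if length else 0.0

    # Count every overlapping 2-character window (none if length < 2).
    pairs = Counter(sequence[i:i + 2] for i in range(length - 1))

    if pairs:
        # Highest count wins; on a tie, the alphabetically largest pair wins,
        # because tuples compare by count first, then by the pair string.
        most_common = max(pairs, key=lambda pair: (pairs[pair], pair))
    else:
        most_common = ""

    return {
        "is_valid": is_valid,
        "gc_content": gc_content,
        "nucleotide_counts": counts,
        "most_common_dinucleotide": most_common,
    }


print(analyze_dna("ATGCATGC")["most_common_dinucleotide"])  # TG
```

For "ATGCATGC", the pairs AT, TG, and GC each occur twice, so the tie rule picks 'TG', matching the example in the problem statement; the GC content is 4 / 8 * 100 = 50.0. The explicit `max` with a `(count, pair)` key is used instead of `Counter.most_common(1)` because the latter breaks ties by first occurrence, not alphabetically.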

Summary

Four letters, four computations: validation, GC content, nucleotide counts, and dinucleotide frequency.

Practice This Problem

Solve this Python problem with real code execution. DataDriven runs your Python code in a real environment and grades it automatically.
