Data Scientist (Technology focus) (closed)
Job Description: Data Scientist (Technology focus)
Role
The Data Scientist processes and analyzes structured and unstructured data using a variety of tools and methods such as artificial intelligence, machine learning algorithms, natural language processing, simulations, and advanced text analytics. The Data Scientist provides insights from a variety of data sources (such as images, voice, text and others) using specialized applications to implement new generation modeling methods beyond regular statistical modeling.
The Data Scientist has the interpersonal communication and presentation skills necessary to work across lines of business with executives, product managers, operations, marketing research, and analytics teams to understand the problem-to-solve and develop insights to run and grow the business.
Responsibilities
- Work as a team member across lines of business, communicating with IT, business customers, and data stewards throughout the program lifecycle to support data analysis and the development of analytical solutions that deliver insight
- Communicate effectively with all stakeholders to deliver analytical solutions across lines of business
- Prepare and present analytical presentations
- Support the program level lifecycle from understanding the problem-to-solve and the job-to-be-done to deliver relevant insights
- Develop solutions frameworks to design, build, and implement complex analytics methods and tactics
- Create self-learning models on large data sets and improve their performance
- Experience in working with large volumes of data and in high-performance computing environments
- Familiarity with Hadoop and MapReduce programming
- Create performance metrics to monitor and track effectiveness of models and tactics
- Define and coordinate access necessary to perform data transformations, model development, and model validation
- Design and develop data mining techniques to gather, process, and analyze complex data sets including structured and unstructured data such as web and device logs, text, and video data
- Support the development, assembly, and reviews of analytical presentations
- Support the development and use of SQL and NoSQL queries to access structured and unstructured data sources in Oracle DBMS, IBM DB2, Netezza, Greenplum, data files, device logs, and similar data formats to enable statistical modeling
Educational Qualification
- PhD in Mathematics / Statistics / High Performance Computing / Econometrics from a leading university
- 1-3 years of experience
Required skills :-
- Overall 8+ years of experience in data warehousing design and development in a distributed environment.
- At least 1 year of working experience building and supporting large-scale Hadoop environments, including design, capacity planning, cluster setup, performance tuning, and monitoring.
- Deep understanding of and hands-on experience with the overall Hadoop ecosystem (HDFS, MapReduce, Pig/Hive, HBase, etc.).
- Demonstrated analytical and problem-solving skills, particularly those that apply to a "Big Data" environment.
- Expert level understanding of relational database concepts, dimensional database concepts and database architecture and design.
- Experience as an architect in at least two large engagements that involved creating the technology and application architecture and design.
- Experience in leading a team of developers and designers in project implementations from requirements to testing and deployment.
- Familiarity with the entire product life cycle (design, implementation, testing, deployment, and maintenance) and with development methodologies such as the iterative model and Agile.
- Understanding of data warehouse approaches, industry standards and best practices, OLTP/OLAP databases.
- Knowledge of the latest technology trends in big data, data warehousing, analytics, and cloud.
- Proficient in any of the leading RDBMS databases, such as Oracle, Sybase, DB2, Teradata, and MS SQL Server, and in Unix/Linux as well as Windows platforms.
- Strong aptitude, problem-solving skills, and written communication skills, including design documentation and proof-of-concept and prototype analysis and documentation.
Desirable skills :-
- Exposure to any of the MPP databases, such as Netezza, Teradata, Aster Data, and Greenplum.
- Understanding of tools and utilities such as Mahout and R, and knowledge of techniques such as text analysis, statistical analysis, and machine learning.
