This Challenge is completed

NVIDIA® Jetson™ Developer Challenge

  • Winners announced
prize pool $42,789

Oct 23, 2017 - Feb 18, 2018 23:59 UTC
Voting: Feb 19 - Mar 04, 2018 23:59 UTC

Seunghyun Lee

Added: Feb 15, 2018

TAGS

  1. All in One for Automated Life,
  2. AI home,
  3. Autonomous Driving,
  4. Health Care,
  5. TV Game

TYPE OF PROJECT

2 Cam based App with TX2

VOTES: 3 LIKES: 6

V - Watching n Talking for Smart Car n Home

    Project description

    My dream has been to build a real "ASURADA", the car-embedded robot from the Japanese animation "新世紀GPX サイバーフォーミュラ" (Future GPX Cyber Formula), by myself. So I have wanted to build a friendly AI product that people can rely on and feel friendship with, not only in the car but also at home!


    I will call my project V! ("V!" looks like "Ai" turned upside down.)


    My goal is to give this hand-held device the ability to see, hear, and speak like a human assistant.


    While driving, it will warn you of several dangerous situations, such as a car suddenly cutting in or the car in front braking, as well as traffic lights changing to red, yellow, or green. You can control the radio volume and channel with hand gestures, since it reads your hand motion. It will also monitor your face continuously to keep you from driving while drowsy.


    It will also listen to what you say and reply like a funny friend, or sometimes give you reliable local information.


    At home, you can take it to your living room; a good place for it may be in front of the TV.


    There, it will monitor your pose and offer a health care game, or play a warning sound if you are sleeping in a bad posture.


    It will also tell you to step back when you are too close to the TV (useful for babies or children who like to watch TV from very close up).


    To reach my goal, I need to analyze video (object and keypoint detection) and audio (what people are saying).


    -------------------------------------------------------PROJECT DETAIL------------------------------------------------------------


    For this, I studied many object detection and facial, hand, and body keypoint detection methods, including YoloV2, SSD, RFCN, DAN, OpenPose, and OpenFace, and I chose YoloV2 as the base detector (face/hand/car/traffic light). Besides existing image sets such as WIDER FACE and the Ibug DB, I collected about 8000 face/hand/traffic images myself, from my car's black box (dashcam) and from recordings of myself. I annotated all the images manually with an open labelling tool.


    For data augmentation, I used the open imgaug Python tool, ending up with a million images.
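

    As an illustration of what the augmentation step has to do for a detector, here is a minimal sketch in plain Python. It does not call imgaug; it only shows that when an image is mirrored, its box labels must be mirrored too. It assumes darknet-style labels (class id, then box center and size normalized to the image), and the function names and sample values are hypothetical.

```python
from typing import List, Tuple

# One darknet-style label: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to [0, 1] relative to the image size.
Label = Tuple[int, float, float, float, float]


def flip_labels_horizontally(labels: List[Label]) -> List[Label]:
    """Mirror box labels so they match a left-right flipped image."""
    # Only the x center moves; width, height, and y center are unchanged.
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in labels]


def flip_image_horizontally(rows: List[List[int]]) -> List[List[int]]:
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in rows]


if __name__ == "__main__":
    # Tiny made-up image with a bright column on the right edge.
    image = [[0, 0, 0, 9],
             [0, 0, 0, 9]]
    # Hypothetical box covering that right-hand column.
    labels = [(1, 0.875, 0.5, 0.25, 1.0)]
    print(flip_image_horizontally(image))
    print(flip_labels_horizontally(labels))
```

    A library such as imgaug applies the same idea to many transforms at once, which is how a few thousand annotated images can be expanded into a much larger training set.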


    For speech recognition, I used DeepSpeech (https://github.com/mozilla/DeepSpeech) and an open ChatBot (https://github.com/AastaNV/ChatBot) provided by NVIDIA.


    For DCNN optimization, I tried several methods and tools, such as TensorFlow's transform_graph with 'quantize_weights', TensorRT, Caffe-Jacinto, and Caffe-Ristretto. I ran into tons of issues and, unfortunately, got bad results. I mean, optimized (quantized or pruned)
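

    To make the weight quantization idea concrete, here is a minimal sketch of 8-bit linear (min/max) quantization in plain Python. It is not the code of any of the tools named above; the function names and weight values are made up for illustration.

```python
from typing import List, Tuple


def quantize_weights_8bit(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes 0..255 spanning [min(weights), max(weights)]."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where hi == lo.
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [-0.51, -0.02, 0.0, 0.013, 0.76]  # made-up values
    codes, lo, scale = quantize_weights_8bit(weights)
    restored = dequantize(codes, lo, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(codes)
    print(f"max abs error: {worst:.5f} (half a step is {scale / 2:.5f})")
```

    Each weight then takes one byte instead of four, at the cost of a rounding error of up to half a quantization step per weight.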


    I tried using depthwise separable conv layers in YoloV2's feature extraction step, but the classification and bbox regression accuracy was also not good enough, although the full Yolo v2 model shrank from 200MB to 130MB. The Yolo network was also too sensitive to quantize.
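

    The size saving from a depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights for one layer, ignoring biases and batch norm; the layer shape is a hypothetical example, not taken from the actual network.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 conv that mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 512, 1024  # hypothetical layer shape
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard:  {std} weights")
    print(f"separable: {sep} weights")
    print(f"ratio:     {std / sep:.1f}x")
```

    The source does not say which layers were replaced, so these numbers are not meant to reproduce the 200MB to 130MB figure.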


    So I just trained full Yolo with the darknet 19 pretrained model and my custom image set. I then trained tiny-yolo (v2) with the full Yolo, because it failed to converge to a low loss when I trained it from scratch with just the darknet-19.


    For hand keypoint detection, I used an open hand gesture project (https://github.com/lmb-freiburg/hand3d) and replaced its hand detection stage (HandSegNet) with the hand image cropped by YoloV2, for better speed.
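

    Feeding a detector box to a keypoint network usually means turning the box into a crop first. The sketch below shows one way to do that under assumptions: the box is given in pixels as (x_min, y_min, x_max, y_max), the keypoint net wants a square input, and a margin of 20% is added around the hand. The function name and the margin value are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def square_crop(box: Box, image_w: int, image_h: int,
                margin: float = 0.2) -> Tuple[int, int, int, int]:
    """Return a square crop around a detection box, kept inside the image."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Use the longer side plus a margin so the whole hand stays in view.
    side = max(x_max - x_min, y_max - y_min) * (1 + margin)
    side = min(side, image_w, image_h)
    # Shift the square back inside the image if it sticks out.
    left = min(max(cx - side / 2, 0), image_w - side)
    top = min(max(cy - side / 2, 0), image_h - side)
    return (int(round(left)), int(round(top)),
            int(round(left + side)), int(round(top + side)))


if __name__ == "__main__":
    # Hypothetical hand box in a 640x480 frame.
    print(square_crop((100, 100, 200, 180), 640, 480))
```

    The resulting region would then be cut from the frame and resized to the input size of the keypoint network.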


    For face landmark detection, I used OpenFace, as I had some minor installation issues when installing the DAN open source on the TX2. I check how long the eyes stay open or closed and whether the mouth is open or closed to see if you are drowsy.
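

    One common way to turn eye landmarks into an open/closed signal is the eye aspect ratio (EAR): the eyelid gap divided by the eye width. The source does not say which measure was actually used, so the sketch below is only an assumption, and the thresholds in it are placeholder values.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six landmarks: corner, two upper lid, corner, two lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    # Two eyelid gaps (upper lid point to the lower lid point below it).
    vertical = (math.hypot(p2[0] - p6[0], p2[1] - p6[1])
                + math.hypot(p3[0] - p5[0], p3[1] - p5[1]))
    # Eye width, corner to corner.
    horizontal = math.hypot(p1[0] - p4[0], p1[1] - p4[1])
    return vertical / (2.0 * horizontal)


def longest_closed_run(ears: Iterable[float], threshold: float) -> int:
    """Length in frames of the longest streak with EAR below the threshold."""
    longest = current = 0
    for ear in ears:
        current = current + 1 if ear < threshold else 0
        longest = max(longest, current)
    return longest


def is_drowsy(ears: Iterable[float], fps: float,
              threshold: float = 0.2, min_closed_seconds: float = 1.0) -> bool:
    """Flag drowsiness if the eyes stayed closed for at least the given time."""
    return longest_closed_run(ears, threshold) >= min_closed_seconds * fps


if __name__ == "__main__":
    # Made-up per-frame EAR values: open, closed for 20 frames, open again.
    ears = [0.3] * 10 + [0.1] * 20 + [0.3] * 5
    print(is_drowsy(ears, fps=15))
```

    The same ratio applied to mouth landmarks gives a mouth open/closed signal, for example to notice yawning.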


    For now, I used several open projects (all under GPL licenses except OpenPose and OpenFace, which both allow use only for academic or non-profit purposes) to get the demo working before the submission deadline, but I will replace all of them with my own integrated DCNN/GAN networks.


    I will update my blog soon to explain in detail what I have done.


    Comments (6)

    1. Seunghyun Lee

      You can see my Caffe-based Yolo v2 detection results on my GitHub!

    2. Seunghyun Lee

      I succeeded in converting darknet Yolo v2 to a caffemodel! I uploaded the model and prototxt to my GitHub. Next, I will keep going and convert it to TensorRT for more optimization!

    3. Seunghyun Lee

      I replaced the face landmark detector (Theano) with a TF version for better optimization and re-trained it from scratch with the 300W DB. It runs at > 25 FPS. I modified the existing net architecture a lot: I removed the whole stage 2 net and replaced all convs with separable depthwise convs (270MB->7MB!!). It runs 2x faster!!

    4. Seunghyun Lee

      Currently, detection runs in real time (24~31 FPS) on the TX2.
      Yolov2 + face keypoint net runs at 15~18 FPS. When a hand is detected, yolov2 + hand net runs at 17~20 FPS while the face net is disabled.
      But after optimizing the nets with TensorRT or TF, I guess all nets would run at more than 20 FPS.

    5. Seunghyun Lee

      For hand keypoint detection, I use just the middle part of the whole model, 'PoseNet'. I removed 'HandSegNet', 'PosePriorNet', and 'ViewPointNet', and I quantized the 'PoseNet' part to get a smaller and faster model. The size changed from 188.4MB (2 pickles) -> 70MB (1 frozen pb) -> 17.6MB (1 qt pb).

    6. Seunghyun Lee

      I'm trying to optimize the hand and face keypoint detectors using TensorRT.
      Since I replaced the existing HandSegNet and face detector with darknet, it runs 2 times faster. After TensorRT, I expect to see a 3~4x speed-up.
      Plus, I'm using an Adafruit I2S amp and ReSpeaker hardware for voice recognition and TTS.

