Some Real-Time-ish Problems-ish THOMAS GUSTAFSSON


  • Title Slide

    Some Real-Time-ish Problems-ish

    THOMAS GUSTAFSSON

  • Title and Content

    • Present some real-time problems from my career

    • Think hard and discuss among yourselves for a while

    • Discussions about possible solutions

    6 December 2019 Info class internal Department / Name / Subject 2

    Agenda

  • Title and Content

    • Scania is a world-leading manufacturer of

    − heavy trucks

    − buses

    − engines

    • Some 50000 employees around the world

    • R&D in Södertälje, Sweden, and Sao Paulo, Brazil


    Scania

  • Title and Content

    • About 3500 employees, mainly based in Södertälje, Sweden

    • Examples of R&D development

    − engine

    − gearbox

    − chassis

    − cab

    − Scania electrical system

    − a lot of ECUs and the software for some of them

    • A lot of SW is developed by Scania, so there are a lot of SW developer opportunities


    Scania R&D

  • Title and Content

    • Test automation framework for complete-vehicle HIL testing (past)

    • Logging system for autonomous vehicles (current)

    • Base software for autonomous vehicles (current)

    • All work has boiled down to system development in general, and to sw development and sw engineering in particular


    My career at Scania

  • Title and Content

    • A Hardware-in-the-Loop (HIL) rig connects to the inputs and outputs of Electronic Control Units (ECUs) and "fools" them into behaving as if they were in a real vehicle

    • Purpose of HIL is to

    − run regression test suites

    − make sure new software changes do not change old behavior

    − run dangerous tests

    − remove a brake management system sensor at 90 km/h

    − manipulate hw that is difficult to get to in real vehicle

    − sensors in engine


    Test automation framework for HIL testing

  • Title and Content

    • A Scania vehicle is determined by its Scania On-board Product Specification (SOPS)

    − A SOPS describes which function product codes (FPCs) the vehicle has, and their values

    − For instance, FPC1 describes the type of vehicle, A is truck, and B is bus

    • To run a test suite:

    − read SOPS and configure input to test system according to FPC conditions

    − flash ECUs

    − parameterize ECUs

    − run tests

    − collect test results
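
The first step, configuring the test system from the SOPS, can be sketched as a simple filter. This is a minimal illustration, not the framework's actual code: the `SuiteEntry` class, the test names, and `FPC2` are hypothetical, and only the meaning of `FPC1` (A is truck, B is bus) comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class SuiteEntry:
    """A test plus the FPC conditions under which it applies (hypothetical model)."""
    name: str
    fpc_conditions: dict = field(default_factory=dict)


def applicable_tests(sops, entries):
    """Return the entries whose FPC conditions all match the vehicle's SOPS."""
    return [
        e for e in entries
        if all(sops.get(fpc) == value for fpc, value in e.fpc_conditions.items())
    ]


# Hypothetical SOPS: FPC1 = A means truck (from the slide); FPC2 is invented.
sops = {"FPC1": "A", "FPC2": "X"}
entries = [
    SuiteEntry("truck_brake_test", {"FPC1": "A"}),
    SuiteEntry("bus_door_test", {"FPC1": "B"}),
    SuiteEntry("generic_lights_test"),  # no conditions: applies to every vehicle
]
selected = [e.name for e in applicable_tests(sops, entries)]
print(selected)
```

Flashing, parameterizing, running, and collecting results would then operate on the selected entries only.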


    Test automation framework for HIL testing

  • Title and Content

    • Electrical system consists of some main CAN buses

    − red for safety critical systems, e.g., engine management system, brake management system, air production system

    − yellow for less critical systems, e.g., information cluster, external lights system

    − green for non-critical systems, e.g., infotainment system

    − brown for ADAS related sensors

    • And a lot of sub-buses to ECUs

    • As many as possible of these CAN buses shall be recorded during testing in HIL


    Test automation framework for HIL testing

  • Title and Content

    • Web based GUI

    • Utility programs like boot program, HTML and WebSocket server, process monitor

    • Log domains: CAN, CCP/XCP, cameras, IO, ethernet, etc.

    • Each log domain is its own

    • Originally Windows based but today Linux based

    • C++ with a CMake build chain targeting both platforms


    Logging system

  • Title and Content

    • Most log domains implement the actor pattern

    − a dedicated thread reading the sensor data and putting it into a FIFO

    − a dedicated thread reading from the FIFO doing some useful stuff with the data
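
The two-thread arrangement can be sketched as follows. This is an illustrative Python model rather than the C++ implementation; `run_log_domain`, the fake sensor, and the `None` sentinel used for shutdown are assumptions made for the example.

```python
import queue
import threading
import time


def reader(read_sample, fifo, n_samples):
    """Thread 1: read the sensor, timestamp immediately, put the item in the FIFO."""
    for _ in range(n_samples):
        sample = read_sample()
        fifo.put((time.monotonic(), sample))
    fifo.put(None)  # sentinel: tells the worker that no more data will arrive


def worker(fifo, handle):
    """Thread 2: take items from the FIFO and do something useful with them."""
    while True:
        item = fifo.get()
        if item is None:
            break
        handle(item)


def run_log_domain(read_sample, handle, n_samples):
    fifo = queue.Queue()  # thread-safe FIFO
    t1 = threading.Thread(target=reader, args=(read_sample, fifo, n_samples))
    t2 = threading.Thread(target=worker, args=(fifo, handle))
    t1.start()
    t2.start()
    t1.join()
    t2.join()


# Fake sensor producing 0..4; "publishing" is modelled as appending to a list.
counter = iter(range(5))
published = []
run_log_domain(lambda: next(counter), published.append, 5)
print([sample for _, sample in published])
```

Because the timestamp is taken in the reader thread, slow processing in the second thread delays publication but does not distort the recorded time.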


    Logging system

    [Figure: sensor to be logged → thread 1 → thread 2 → do something, like publish to middleware]

  • Title and Content


    Logging system

    [Figure: log computer. Inputs: CAN Bus1 and Bus2; Camera1 and Camera2 via a switch; lidar1 and lidar2 via a switch. Components: CAN log domain, camera log domain, lidar log domain, publish/subscribe middleware, write to disk.]

  • Title and Content

    • An application framework is needed for the applications to be based on

    − abstraction layer so the underlying OS can be switched out

    − an application consists of an init method, a step function, and an execution profile like periodic execution

    − Upon start up, all applications start and the OS schedules them
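
A minimal sketch of such a framework is shown below. The `Application` base class, the `period_ms` attribute, and the simulated-time `run` loop (standing in for the OS scheduler) are all hypothetical names chosen for the illustration.

```python
from abc import ABC, abstractmethod


class Application(ABC):
    """Hypothetical base class: init method, step function, execution profile."""
    period_ms = 10  # execution profile: periodic execution with this period

    def init(self):
        """Called once before the first step."""

    @abstractmethod
    def step(self, now_ms):
        """Called once per period."""


class Recorder(Application):
    """Example application that records when its step function ran."""

    def __init__(self, period_ms):
        self.period_ms = period_ms

    def init(self):
        self.calls = []

    def step(self, now_ms):
        self.calls.append(now_ms)


def run(apps, duration_ms):
    """Stand-in for the OS: simulated time, each app stepped on its period."""
    for app in apps:
        app.init()
    for now in range(duration_ms):
        for app in apps:
            if now % app.period_ms == 0:
                app.step(now)


fast, slow = Recorder(20), Recorder(50)
run([fast, slow], 100)
print(fast.calls, slow.calls)
```

Applications only see `init` and `step`, so the loop in `run` could be replaced by a different scheduler or operating system without touching them.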


    Base software

  • Title and Content

    • The phase between applications may change at each startup

    • The behavior of the system can therefore, although it most likely will not, differ slightly each time it runs

    • Determinism is important so test results are meaningful
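
The effect of the phase can be illustrated with two periodic applications that have the same period but different start offsets. The function name, offsets, and periods below are invented for the example.

```python
def release_order(apps, horizon):
    """apps maps name -> (start_offset, period); return release events sorted by time."""
    events = []
    for name, (offset, period) in apps.items():
        t = offset
        while t < horizon:
            events.append((t, name))
            t += period
    return sorted(events)


# Same applications and periods; only the startup phase differs between the runs.
run1 = release_order({"A": (0, 10), "B": (3, 10)}, 20)
run2 = release_order({"A": (4, 10), "B": (3, 10)}, 20)
print([name for _, name in run1])  # A runs before B in every period
print([name for _, name in run2])  # B runs before A in every period
```

If B consumes data produced by A, it sees fresh data in the first run and data that is one period old in the second, which is the kind of difference that makes test results hard to compare.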


    Base software

  • Title and Content

    • A small PIC processor reads the seat heater knob

    − each knob position corresponds to a resistance

    • A digital code shall be generated to an ASIC that controls the heater


    Seat heater controller

    [Figure: seat heater knob with positions 1 to 4, read by the PIC, which sends a digital code to the ASIC]

  • Title and Content

    • Best practice requirements on code

    − no interrupts, or show that interrupt bursts are properly handled

    − code guidelines: no dynamic memory, no pointers

    − consistency checks in code

    − checksums on permanent data structures

    − variables originating from different code paths that shall match

    − use of watchdog
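
Two of the consistency checks can be sketched as below. This is a Python model for illustration only (the real target is a small PIC): the 8-bit additive checksum, the `CheckedTable` class, and the table contents are assumptions, and a CRC would detect more error patterns.

```python
def checksum(data):
    """Simple 8-bit additive checksum (hypothetical choice for the sketch)."""
    return sum(data) & 0xFF


class CheckedTable:
    """Permanent data structure stored together with its checksum."""

    def __init__(self, values):
        self.values = bytearray(values)
        self.stored = checksum(self.values)

    def is_intact(self):
        # Recompute and compare; a mismatch means the data was corrupted.
        return checksum(self.values) == self.stored


def paths_agree(primary, shadow):
    """Values computed along two different code paths are expected to match."""
    return primary == shadow


table = CheckedTable(bytes([0x01, 0x03, 0x07, 0x0F]))
ok_before = table.is_intact()
table.values[2] ^= 0x10  # simulate a flipped bit in memory
ok_after = table.is_intact()
print(ok_before, ok_after)
```

On a failed check the program would typically stop refreshing the watchdog so that the processor is reset into a known state.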


    Seat heater controller

  • Title and Content

    • How to log many (40+) CAN buses in HIL?

    • How to perform response time analysis on logged CAN traffic?

    • How to ensure time synchronization across log domains?

    • How to ensure synchronized execution of applications using the application framework?

    • What is a good sw architecture for the seat heater problem?


    Real-time problems to ponder

  • Title and Content

    • Given:

    − Around 40 CAN buses

    − up to 10 meters apart (CAN specification says how long stubs can be)

    − the CAN frames shall be synchronized in time

    • Solution 1

    − centralized recording

    − time synchronization by the recorder’s clock

    • Solution 2

    − decentralized recording

    − time synchronization must be solved by some means


    Test automation framework and HIL testing

  • Title and Content

    • CAN frames are sent on a bus

    • Several CAN controllers can connect to the bus

    • The arbitration phase determines which CAN controller may continue to send its frame

    • All CAN controllers read and receive the frame at (roughly) the same time


    Test automation framework and HIL testing

  • Title and Content


    Test automation framework and HIL testing

    [Figure: CAN synchronization. A sender puts a synch message with payload 123 on the bus. Two CAN controllers, each attached to its own computer, receive it. One records timestamp: 100023, channel: 1, payload: 123; the other records timestamp: 99, channel: 1, payload: 123.]

  • Title and Content

    • Now we know that the first CAN controller's timestamp 99 refers to the same point in time as the second CAN controller's timestamp 100023

    • A computer program can thus

    − get all CAN frames

    − wait for the same synch message from each CAN controller

    − form a global time

    − sort buffered frames

    − repeat
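
The steps above can be sketched as follows, using the timestamps 99 and 100023 from the example. The function names, controller names, and the extra frames are hypothetical, and the sketch assumes the controller clocks run at the same rate between synch messages, which is why the procedure is repeated.

```python
def offsets_from_sync(sync_stamps, reference):
    """sync_stamps maps controller -> local timestamp of the same synch frame.

    Returns controller -> offset to add to a local timestamp to express it
    in the reference controller's clock, used here as the global time.
    """
    ref = sync_stamps[reference]
    return {ctrl: ref - ts for ctrl, ts in sync_stamps.items()}


def merge(frames, offsets):
    """frames: (controller, local_ts, payload) tuples; return them in global time order."""
    converted = [(ts + offsets[ctrl], ctrl, payload) for ctrl, ts, payload in frames]
    return sorted(converted, key=lambda f: (f[0], f[1]))


# Both controllers saw the synch message with payload 123.
sync = {"ctrl1": 99, "ctrl2": 100023}
offsets = offsets_from_sync(sync, "ctrl1")

# Buffered frames, in arbitrary arrival order (payloads invented).
frames = [
    ("ctrl2", 100040, "b"),
    ("ctrl1", 105, "a"),
    ("ctrl2", 100023, "sync"),
    ("ctrl1", 99, "sync"),
]
merged = merge(frames, offsets)
print(merged)
```

With these numbers the frame stamped 100040 on the second controller maps to global time 116, so it sorts after the first controller's frame at 105.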


    Test automation framework and HIL testing

  • Title and Content

    • Performs a worst-case response time analysis on each message, based on published work by Reinder J. Bril and co-authors (Davis, Burns, Bril, Lukkien): Controller Area Network (CAN) schedulability analysis: Refuted, revisited and revised, 2007

    • Message properties are taken from CAN databases, and the set of CAN frames actually in use is taken from real logs

    • Outputs a list of potential response time problems
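
As a rough illustration of the kind of calculation involved, the sketch below implements a simplified, sufficient (pessimistic) response-time test in the style of that paper, assuming deadlines equal to periods. The queuing delay w is iterated as w = max(B, C) + sum over higher-priority messages k of ceil((w + J_k + tau_bit) / T_k) * C_k, and the response time is J + w + C. The `Message` class, the example bus, and all numbers are hypothetical; the exact analysis in the paper examines every instance in the busy period and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    priority: int    # CAN identifier: lower value means higher priority
    period: int      # T, in microseconds
    tx_time: int     # C, worst-case transmission time in microseconds
    jitter: int = 0  # J, queuing jitter in microseconds


def ceil_div(a, b):
    return -(-a // b)


def response_time(m, msgs, tau_bit):
    """Upper bound on the response time of m, or None if it may exceed the period."""
    hp = [k for k in msgs if k.priority < m.priority]
    lp = [k for k in msgs if k.priority > m.priority]
    # Blocking: longest lower-priority frame, or m's own previous instance.
    blocking = max([k.tx_time for k in lp] + [m.tx_time])
    w = blocking
    while True:
        w_next = blocking + sum(
            ceil_div(w + k.jitter + tau_bit, k.period) * k.tx_time for k in hp
        )
        if m.jitter + w_next + m.tx_time > m.period:
            return None  # potential response time problem
        if w_next == w:
            return m.jitter + w + m.tx_time
        w = w_next


# Hypothetical 500 kbit/s bus: tau_bit = 2 us, 135-bit frames take 270 us.
bus = [
    Message("engine", 0x100, 10_000, 270),
    Message("brake", 0x200, 20_000, 270),
    Message("lights", 0x300, 50_000, 270),
]
results = {m.name: response_time(m, bus, tau_bit=2) for m in bus}
print(results)
```

A message for which the function returns `None` would end up on the list of potential response time problems.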


    Test automation framework and HIL testing

  • Title and Content

    • It works, and since the load is split over several computers it can support logging 40+ CAN buses

    • The solution consists of several programs

    − synch message sender

    − CAN frame receiver

    − Merger that sorts all CAN frames

    − Logger of sorted CAN frames to file

    − Worst-case response time analysis program

    • The distributed solution is much more complex than the centralized solution, even if they had the same number of lines of code


    Reflections on CAN logging solution

  • Title and Content

    • This problem can be split into two problems

    − single computer logging

    − distributed logging

    • Log domains on single computer can rely on the same PC clock for timestamping

    • Important to timestamp as close to the source as possible

    • After the timestamping is done, the time it takes to save to disk does not matter


    Synched time between log domains

  • Title and Content

    • In distributed logging, a global time must be established

    • There are protocols for this

    − NTP can achieve millisecond level synch

    − PTP can achieve sub-millisecond or even sub-microsecond level synch

    • Once a global time is established, the same holds as for a single computer

    − timestamp as close to the source as possible
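
Such protocols estimate the clock offset from a timestamped message exchange. The sketch below shows the standard NTP offset and round-trip delay calculation from four timestamps; the function name and the example numbers are invented, and the offset estimate is only exact when the network delay is the same in both directions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """t0: client send, t1: server receive, t2: server send, t3: client receive.

    t0 and t3 are read from the client clock, t1 and t2 from the server clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)         # round trip, excluding server processing
    return offset, delay


# Invented example in ms: server is 50 ahead, 10 one-way delay, 2 processing time.
offset, delay = ntp_offset_and_delay(t0=1000, t1=1060, t2=1062, t3=1022)
print(offset, delay)
```

Adding the estimated offset to the local clock yields the global time used for timestamping.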


    Synched time between log domains

  • Title and Content

    • On a high abstraction level there are two options for a sw architecture

    − event based

    − time triggered

    • The most sensible sw architecture for this problem, given the limited CPU, is time triggered

    − state machine

    − time slots


    Seat heater

  • Title and Content

    • Time slots

    − start by resetting free running timer

    − do some work

    − wait until the timer reaches a specific value

    − repeat for next slot

    • Slots dedicated for

    − sampling

    − clocking digital signal

    − logic

    • Check the different counters regularly and rewrite the GPIO registers
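
The slot mechanism can be modelled as below. This is a Python simulation for illustration, not PIC code: `FakeTimer` stands in for the free-running hardware timer, each work function returns the number of ticks it pretends to consume, and the slot length of 100 ticks is invented.

```python
SLOT_TICKS = 100  # hypothetical slot length in timer ticks


class FakeTimer:
    """Stand-in for the PIC's free-running hardware timer."""

    def __init__(self):
        self.value = 0

    def reset(self):
        self.value = 0

    def advance(self, ticks):
        self.value += ticks


def run_slot(timer, work):
    timer.reset()                    # start by resetting the free-running timer
    timer.advance(work())            # do some work (simulated duration)
    if timer.value > SLOT_TICKS:
        raise RuntimeError("slot overrun")
    while timer.value < SLOT_TICKS:  # wait until the timer reaches the slot end
        timer.advance(1)
    return timer.value


log = []


def sample():
    log.append("sample")  # read the knob
    return 30


def clock_out():
    log.append("clock")   # clock one bit of the digital signal to the ASIC
    return 10


def logic():
    log.append("logic")   # state machine and consistency checks
    return 55


timer = FakeTimer()
elapsed = sum(run_slot(timer, work) for work in (sample, clock_out, logic))
print(log, elapsed)
```

Each cycle takes exactly three slot lengths regardless of how long the work in each slot takes, which keeps the timing of the digital signal constant without using interrupts.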


    Seat heater

  • Title and Content

    • Solution in user-space

    − one application must be aware of all other applications; call it the ticker, which knows about clock ticks

    − each application must wait to be started by the ticker

    − each application has a period time that is converted into clock ticks by the ticker

    − when the remaining clock ticks reach zero for an application, the ticker releases it

    − the synchronization primitives can be, e.g., mutexes and condition variables
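
A minimal sketch of such a ticker, built on a mutex and a condition variable, is shown below. The `Ticker` class and its method names are hypothetical, clock ticks are generated by a plain loop instead of a real clock, and releases are counted so that an application that falls behind still gets every release.

```python
import threading


class Ticker:
    def __init__(self):
        self._cond = threading.Condition()  # a mutex plus a condition variable
        self._apps = {}                     # name -> [period, remaining, pending]
        self._stopped = False

    def register(self, name, period_ticks):
        """Called from an application's init method with its period in ticks."""
        with self._cond:
            self._apps[name] = [period_ticks, period_ticks, 0]

    def tick(self):
        """One clock tick: count down and release applications that reach zero."""
        with self._cond:
            for state in self._apps.values():
                state[1] -= 1
                if state[1] == 0:
                    state[1] = state[0]
                    state[2] += 1
            self._cond.notify_all()

    def wait_for_release(self, name):
        """Block until released; return False once stopped with nothing pending."""
        with self._cond:
            state = self._apps[name]
            while state[2] == 0 and not self._stopped:
                self._cond.wait()
            if state[2] == 0:
                return False
            state[2] -= 1
            return True

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()


def application(ticker, name, steps):
    while ticker.wait_for_release(name):  # waits at the start of the step function
        steps.append(name)                # the step function itself


ticker = Ticker()
ticker.register("fast", 2)
ticker.register("slow", 3)
steps = {"fast": [], "slow": []}
threads = [
    threading.Thread(target=application, args=(ticker, name, steps[name]))
    for name in steps
]
for t in threads:
    t.start()
for _ in range(6):
    ticker.tick()
ticker.stop()
for t in threads:
    t.join()
print(len(steps["fast"]), len(steps["slow"]))
```

Since every application is released from the same tick counter, the phase between applications is fixed by the ticker rather than by the order in which the processes happened to start.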


    Base software application synchronization

  • Title and Content

    • Solution in user-space


    Base software application synchronization

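    [Figure: interaction between application 1 and the ticker. The init method sends the period time to the ticker; the application waits at the start of its step function; after a number of clock ticks, the ticker says go!; the application then waits at the start of its step function again until the next go!]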

  • Title and Content

    • Solution in kernel space

    − The operating system must have some mechanism to start applications synchronized

    − The operating system must have a notion of period times

    − The operating system must have a notion of priority and possibly priority inheritance


    Base software application synchronization