TARCAD: A Template Architecture for Reconfigurable Accelerator Designs
Muhammad Shafiq, Miquel Pericàs, Nacho Navarro, Eduard Ayguadé
Barcelona Supercomputing Center and Dept. d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain
{muhammad.shafiq, miquel.pericas}@bsc.es, nacho@ac.upc.edu, eduard.ayguade@bsc.es

Abstract—In the race towards computational efficiency, accelerators are achieving prominence. Among the different types, accelerators built from reconfigurable fabric, such as FPGAs, have tremendous potential due to the ability to customize the hardware to the application. However, the lack of a standard design methodology hinders the adoption of such devices and makes portability and reusability across designs difficult. In addition, the generation of highly customized circuits does not integrate nicely with high level synthesis tools. In this work, we introduce TARCAD, a template architecture for designing reconfigurable accelerators. TARCAD enables high customization of the data management and compute engines while retaining a programming model based on generic programming principles. The template features generality and scalable performance over a range of FPGAs. We describe the template architecture in detail and show how to implement five important scientific kernels on it: MxM, the Acoustic Wave Equation, FFT, SpMV and Smith-Waterman. TARCAD is compared with other High Level Synthesis models and is evaluated against GPUs, an architecture that is far less customizable and, therefore, easier to target from a simple and portable programming model. We analyze the TARCAD template and compare its efficiency on a large Xilinx Virtex-6 device to that of several recent GPU studies.

I. INTRODUCTION

The integration levels of current FPGA devices have advanced to the point where all functions of a complex application kernel can be mapped onto a single chip. However, these high density FPGAs appear simply as a sea of logic slices and embedded cores such as general purpose processors, multipliers/adders, multi-ported SRAMs and DSP slices. Currently, everything depends on the FPGA application designer and how well he maps an application to the device. This practice is problematic for several reasons. First, it is a low-level approach that requires a great deal of effort to map the complete application. Second, reusability of modules across projects is significantly reduced. And, last but not least, it is difficult to scientifically compare hardware implementations that adhere to different high-level organizations and interfaces. This emphasizes the need to abstract out these particular hardware structures in a standard architectural design framework. Most of the studies that have ported applications to multiple accelerator architectures (for example, Cope et al. [1], Garland et al. [2] or Shafiq et al. [3]) identify two factors as the most critical ones for achieving high performance for an application. The first factor is the intrinsic

parallelism available in the algorithm being mapped on the accelerator. The second factor is how efficiently the designer arranges the data to be fed to the computational resources. FPGAs have the potential to exploit both of these factors in a highly optimized way. However, FPGAs will not become mainstream accelerators if they are unable to overcome the long-standing challenge of implementing applications in a well defined, simple and efficient way. A plethora of application kernels from the HPC domain have been ported to reconfigurable devices, but most designs are specialized to a single environment due to the lack of a standard design methodology. This work is a step towards the harmonization of data-flow architectures for FPGA-based applications written in HDLs (e.g. Verilog, VHDL) and High Level Languages (HLLs). The architectures generated by HLL-to-HDL/netlist tools (such as ROCCC [4] or GAUT [5]) also follow a simplified and standardized compilation target, but they have been designed specifically as compiler targets, which limits their applicability for HDL designers. In addition, these models are too constrained to support the complex memory organizations or unorthodox compute engines which are often required to best exploit FPGAs. This work proposes an architectural template named TARCAD that allows FPGAs to be exploited efficiently while being supported by a simple programming methodology. TARCAD not only enables HDL designers to work on a highly customizable architecture, it also defines a set of interfaces that make it attractive as a target for an HLL-to-HDL compilation infrastructure. This paper discusses the generic architectural layout of the TARCAD template for reconfigurable accelerators. The proposed architecture is based on decoupling the computations from the data management of the application kernels, a concept reminiscent of Smith's Decoupled Access Execute (DAE) architectures [6]. This makes it possible to independently design specialized architectures for both parts of the kernel in a data-flow envelope supported by our architectural layout. Computation scales depending on the size of the FPGA or the achievable bandwidth from the specialized memory configuration that feeds the compute part. We evaluate the architectural efficiency of an FPGA device for several applications using TARCAD and compare it with GPUs. This is an interesting comparison because both platforms require applications with data level parallelism and kernels free of control divergence.

Figure 1. The compute models: (a) a generic GPU, (b) ROCCC, (c) GAUT.

II. THE TARCAD ARCHITECTURE

A. Accelerator Models for Supercomputing

The TARCAD proposal targets both HDL accelerator designers, by providing them with a standard accelerator design framework, and High Level Synthesis (HLS) tool developers, by giving them a standard layout onto which to map applications. HLS tools define an architectural framework into which they map algorithmic descriptions. The compute models of two such tools, ROCCC [4] and GAUT [5], are shown in Figure 1(b) and (c), together with the compute model of GPUs in Figure 1(a). The basic compute model of ROCCC requires streaming data input from an external host. This data is stored in smart buffers before being consumed by the compute units and again before being sent back to main memory. The GAUT architecture, on the other hand, provides an external interface to access data based on data pointers. The memory model of GAUT is simple and can keep large chunks of data using BRAM as buffer memory. Another architecture that is highly popular nowadays, the Graphics Processing Unit (GPU), uses thread indexes to access data in up to five dimensions. A large number of execution threads helps to hide external memory access latencies by allowing threads to execute based on data availability.

The TARCAD architectural layout provides a generic design framework to map application specific accelerators onto reconfigurable devices. The micro-architectural details of the TARCAD layout are presented in Figure 2. As the figure shows, the TARCAD layout can be partitioned into four main blocks and their constituent sub-blocks. A detailed description of these main blocks (External Memory Interface, Application Specific Data Management Block, Algorithm Compute Back-End and the Event Managing Block) follows.

B. The External Memory Interface

In general, accelerators work on large contiguous data sets or streams of data. However, data accesses within a data set, or across multiple data sets of an algorithm, are not always straightforward. Therefore, accelerators can be made more efficient by providing some external support to manage data accesses in a more regular way. TARCAD supports a Programmable Memory Controller (PMC) as an external interface to the main memory. This controller is inspired by the work of Hussain et al. [7]. It helps to transfer pattern-based chunks of data between the accelerator and the global memory. Among other features, the PMC improves accelerator kernel performance by providing programmable strided accesses. This makes it possible for the PMC to directly handle 1D, 2D and 3D tiling of large data sets rather than doing the same in software on the host processor.
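To illustrate the kind of access pattern the PMC can take over from the host, the short Python sketch below generates the strided request addresses covering one 2D or 3D tile of a larger row-major array. It is a minimal software model written for illustration only; the function name, parameters and request format are assumptions made here, not the actual PMC programming interface.

    def tile_requests(base, dims, elem_size, tile_origin, tile_shape):
        """Generate (address, length) requests covering one tile of a row-major
        1D/2D/3D array. dims, tile_origin and tile_shape are (z, y, x) tuples;
        each request covers one contiguous x-run of the tile."""
        dz, dy, dx = dims
        oz, oy, ox = tile_origin
        tz, ty, tx = tile_shape
        run_bytes = tx * elem_size
        for z in range(oz, oz + tz):
            for y in range(oy, oy + ty):
                addr = base + ((z * dy + y) * dx + ox) * elem_size
                yield (addr, run_bytes)

    # Example: a 64x64x64 tile at origin (0, 64, 128) of a 512^3 volume of
    # 4-byte elements (illustrative values only).
    reqs = list(tile_requests(0x0, (512, 512, 512), 4, (0, 64, 128), (64, 64, 64)))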

Figure 2. TARCAD architectural layout.

C. The Application Specific Data Management Block

TARCAD's application specific data management block helps to arrange data for efficient usage inside the computations. This block consists of four sub-blocks, identified in Figure 2 as the Data-Set (DS) Manager, the Configurable Memory Input Control, the Algorithm Specific Memory Layout and the Programmable Data Distributer. Among these sub-blocks, the Algorithm Specific Memory Layout (mL) plays a central role in designing an efficient accelerator by providing re-arrangement and reuse of data for the compute blocks. A memory layout can be common to various applications, as shown by Shafiq et al. [8]. TARCAD could adopt such a common memory layout as well, but in this paper we only consider memory layouts that are customized per application using the block RAMs (BRAMs) of the device. The pattern of writing data into a customized memory layout can be very different from the pattern of reading from it. A simple example is the memory layout for the FFT (decimation in time) architecture, where data is written sequentially but read out in bit-reversed order. Therefore, TARCAD keeps separate write and read interfaces (CFG MEM-IN-CONTROL and the Programmable Data Distributer) to the memory layout block, as shown in Figure 2. The configurable memory input control (CFG MEM-IN-CONTROL) is used to write data to the memory layout. It is based on a finite state machine (FSM) and works according to a preset design. This memory input control expects various streams of independent data sets through the streaming FIFO channels (DS-ix). Each DS-ix can have multiple sub-channels to consume the peak external bandwidth; however, all sub-channels within a DS-ix represent the same data set.
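As a concrete illustration of the decoupled write and read patterns mentioned above, the following Python sketch generates the two address sequences for the FFT (decimation in time) example: sequential writes into the memory layout and bit-reversed reads out of it. It is only an illustrative software model under the assumption of a power-of-two transform size; it does not describe the actual CFG MEM-IN-CONTROL or Data Distributer implementation.

    def bit_reverse(index, bits):
        """Reverse the lowest `bits` bits of `index`."""
        rev = 0
        for _ in range(bits):
            rev = (rev << 1) | (index & 1)
            index >>= 1
        return rev

    def fft_layout_addresses(n):
        """Write addresses are sequential; read addresses are bit-reversed
        (decimation-in-time input ordering). n must be a power of two."""
        bits = n.bit_length() - 1
        write_order = list(range(n))
        read_order = [bit_reverse(i, bits) for i in range(n)]
        return write_order, read_order

    # For n = 8: writes 0..7, reads 0, 4, 2, 6, 1, 5, 3, 7.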

The Data-Set Manager provides a command and data interface between the reconfigurable device and the external-to-device PMC unit. This Data-Set Manager helps to fill the DS-ix streaming FIFOs. On the reading side of the memory layout, the Programmable Data Distributer is used, which is also an FSM. However, it is programmable in the sense of distributing different sets of data to the different instantiations of the same compute block (see Section II-D).

D. The Algorithm Compute Back-End

The compute Back-End consists of Branch-Handlers, Compute Block Instantiations and the Configurable Memory Output Control. The compute block is the main part of this Back-End and it can have multiple instantiations for an algorithm. Each instantiation of the compute block interfaces with the Programmable Data Distributer through its Branch-Handler. These Branch-Handlers are essentially FIFO buffers that support data pre-fetch in order to avoid a time penalty in case of a branch divergence in the compute block. The TARCAD architecture expects a compute block to be a combination of arithmetic compute units with minimal complexity in the flow of data inside the compute block. All compute blocks either keep a small set of their computational results in the local memory (LM), shareable with other instantiations, or forward the results to the configurable memory output control (CFG MEM-OUT-CONTROL). CFG MEM-OUT-CONTROL collects data from the compute blocks for a specific output data set (DS-Ox). The results collected at CFG MEM-OUT-CONTROL are either routed back to the global memory by the Data-Set Manager or written back to the CFG MEM-IN-CONTROL.

E. The Event Managing Block

The role of the Event Manager is to guide and monitor the kernel mapped on TARCAD. The Event Manager can be an FSM or a simple processor with multiple interrupt inputs. In our current work, we consider the Event Manager to be an FSM. In general, each event in the Event Manager guides and monitors a single phase of kernel execution. The Event Manager is initialized by the user before the execution of a kernel. It holds information such as the set of events (signals from various blocks) for each phase, the input/output memory pointers and the data sizes for the different data sets used in the execution of each phase of a kernel. The Event Manager monitors the execution of the kernel and takes actions on the appropriate events. The actions consist of exchanging information (setting/getting state data) with all the other state-machine based blocks. The Event Manager keeps a set of counters shared across all phases and a set of per-phase registers initialized by the user.

F. TARCAD Implementation

The motive behind the TARCAD layout is to support efficient mapping of application specific accelerators onto reconfigurable devices. These application specific mappings of various designs require the ability to physically change or

scale the data paths, the FSMs, the special memory layouts and the compute blocks. For a reconfigurable device these changes can be made only at compile time. Therefore, we implement TARCAD using a template expansion method. This is a metaprogramming (i.e. code generation) process that generates a specific HDL description of the accelerator based on the TARCAD layout. The template expansion is provided by our prototype translator, the Design of Accelerators by Template Expansion (DATE) system [9]. This is an in-house research tool to support template based expansions of high level domain abstractions. The block diagram in Figure 3 shows the environment of the DATE system.

Figure 3. TARCAD implementation: environment of the DATE system. The user supplies the compute unit's annotated HDL, the memory layout definitions and an accelerator specific parameter set; the DATE translator combines these with the TARCAD block template library to produce the TARCAD-mapped HDL.

The main inputs from the user to the DATE system are the annotated HDL template code for the compute block and the data flow definitions for the memory layout. The annotations used in coding the HDL are similar to those used in the DATE templates [9]. A set of parameters is also passed to the DATE translator to adjust and generate the other HDL design modules from the TARCAD templates for the various blocks maintained inside the TARCAD template library. For example, some important parameters related to the Event Manager are the total number of phases through which a kernel will execute, the total repetitions of a phase, the maximum number of events connected to that phase, the total number of data pointers used in the phase and the equations for the memory block accesses of each pointer in the phase. However, the actual list of data pointers, the monitoring and activation events and the events' target blocks are initialized through special commands directly by the Data-Set Manager at execution start-up or during run-time.
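The exact input format of the DATE translator is not specified here, but conceptually the accelerator specific parameter set resembles a small configuration record. The Python sketch below is a hypothetical example of such a record for one Event Manager phase, using the matrix-multiplication access equation discussed in Section III-A as the example; all field names are assumptions chosen to mirror the parameters listed above, not the real DATE syntax.

    # Hypothetical accelerator-specific parameter set for the DATE translator.
    # Field names are illustrative; they mirror the Event Manager parameters
    # described in the text (phases, repetitions, events, pointers, equations).
    event_manager_params = {
        "num_phases": 1,
        "phases": [
            {
                "repetitions": 1,
                "max_events": 2,                 # e.g. a request and an end event
                "num_data_pointers": 2,          # e.g. matrix A and matrix B
                "access_equations": [
                    "FSa = ISa + i * SSra",      # fetch source for matrix A rows
                    "FSb = ISb",                 # fetch source for matrix B
                ],
            }
        ],
    }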

DSA
FB

CFG MEM | IN | CTR

--- a21 a1m -- a12 a11 bn1 --- b31 b21 b11 bn2 --- b32 b22 b12
-----------

br0ins-0 br0ins-1 br0ins-br0ins--

1 ISa

DSB
FB

bnp --- b3p b2p b1p
BRAM Based FIFOs

FB

= A_pointer 2 ISb = B_pointer 3 SSra = A_row_size 4 SSmb = B_matrix_size 5 loop(EVre) : 6 if (EVrr) : i=0 ; i++ 7 FSa = ISa + i x SSra 8 FSaz = SSra 9 FSb = ISb 10 FSbz = SSmb 11 end_if 12 end_loop

(a)

(b)

matrix-B is scattered around the multiple circular buffers equal to the number of compute block instantiations in the back-end. Therefore, the dot product of an element from the row of Matrix-A is done with multiple columns of MatrixB. Each instantiation of the compute block accumulates the results for the element wise dot product of a row (Matrix-A) and a column (Matrix-B). B. Acoustic Wave Equation (AWE) Solver A common method to solve the Acoustic Wave Equation (AWE) numerically consists of applying a stencil operator followed by a time integration step. Some details on the AWE solver and its implementations are described by Araya et al. [10]. In our TARCAD based mapping of the AWE solver, the two volumes of previous data sets for the time integration part are forwarded to the compute block by using simple FIFO channels in the TARCAD’s memory layout. However, our implementation of the stencil operations follows the memory layout of an 8 × 9 × 8 odd symmetric 3D stencil as shown by Shafiq et al. [3]. In our TARCAD based mapping of the AWE kernel, we consider real volumes of data that are normally larger than the internal memory layout of the accelerator. Therefore, a large input volume is partitioned into its sub-volumes as shown in Figure-5(a). A sub-volume block also needs to copy the so-called ”ghost points” (input points that belong to the neighboring sub-volume). For example, Block 7 shown in Figure 5(a) needs to be fetched as an extended block that includes ghost points from the neighboring Blocks 2, 6, 12 and 8. However, these ghost points are only required for the current volume being used in stencil computations. The TARCAD layout supports offloading the management of block-based data accesses to the programmable memory controller (PMC). In the AWE case, for simplicity, TARCAD accesses the same pattern of the extended sub-volumes from all three input volumes. The CFG MEM-IN-CONTROL discards the ghost points accessed for the two previous volumes used in time integration. The PMC is programmed by the host to access the three volumes of data –block by block– on the request of the Event Manager. The example pseudo code for the FSM of Event Manager is shown in Figure-5(b). In the first three lines of the pseudo code, the FSM does an initialization of the initial source pointers (ISx) for the three input volumes. In the next line, a reset to zero of
Z=M Partitioned Blocks 0 5 Y=P=∞ 10 1 6 11 2 7 12 3 8 13 4 9 14
1 ISa

Figure 4. MM-M : (a) Matrices elements distribution into application specific memory layout and (b) The pseudo code for matrices data accesses by the Event Manager

for the data fetch requests is shown in Figure-4(b). In order to make it clear, the FSM actions are non-blocking (i.e simultaneous but based on conditions) and purpose of the sequential pseudo code is just to give the basic idea of the mechanism. The structure of this FSM already exists as a template in the DATE Translator library (Figure-3). However, an arbitrary number of registers to keep kernel specific information are created from the parameterized information at the translation time. For example in Figure4(b), ISa and ISb are the registers created for the initial source pointers to access matrices from external memory. FSa and FSb are the tuple registers for the fetch source pointers (the current pointers). FSaz and FSbz represent the registers for the fetch sizes of data. The source size registers are mentioned as SSra and SSmb. The external parameters to the DATE System also include simple computational equations to generate data accesses in big chunks, like “F Sa = ISa + i × SSra“ where “i“ is taken as internal incremental variable. The parameterized inputs also creates two events, the ”row request event“ (EVrr) and the “rows end event“ (EVre) coming from the CFG MEM-IN-Control and CFG MEM-OUT-Control respectively. These events are monitored at the Event Manager. At run time, the FSM of the Event Manager corresponding to the pseudo code shown in Figure-4(b) initializes the registers ISa, ISb, SSra and SSrb. This is done by using special initialization commands from an external host. These commands are decoded by the DATA Set Manager and forwarded to the Event Manager. The DATA Set Manager can also hold multiple requests from the Event Manager and forward these requests consecutively to the programmable memory controller (PMC). As in lines 5 and 6 of the pseudo code, the Event Manager monitors the event signals EVrr and EVre and sends the tuples of data for the external memory fetch pointers and their sizes to the Data Set Manager along with necessary control signals. This starts fetching of data by the PMC from both matrices A and B in the external memory. The physical data transactions are directly handled by the Data Set Manager and the CFG MEM-IN/OUT-Controls. The FSMs at CFG MEM-IN/OUTControls are also built based on their own parameterized information and take care for the generation of events EVrr and EVre at the appropriate execution time. During the run, one row of matrix-A is fetched from the external memory into a single circular buffer and used element by element in each cycle while the fetched row from

X= N

= V1_pointer 2 ISb = V2_pointer 3 ISc = V3_Pointer 4 BnV1=BnV2=BnV3= 0 5 loop(EVbe) : 6 if (EVrb) : 7 FSa = ISa 8 FBv1 = BnV1++ 9 FSb = ISb 10 FBv2 = BnV2++ 11 FSc = ISb 12 FBv3 = BnV3++ 13 end_if 14 end_loop

(a)

(b)

Figure 5. 3D-Stencil for : odd symmetric 3D stencil, (a) The large input volume partitioned into sub volumes (b) The pseudo code for sub-volume accesses by the Event Manager

External Memory
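To make the ghost-point handling above more concrete, the following Python sketch computes the extended index range of one sub-volume, clamped at the volume boundary. It is a simplified software illustration only: the function and parameter names are invented for this example, and the halo width is a generic parameter rather than the exact extent implied by the 8 × 9 × 8 stencil.

    def extended_block_range(block_xz, block_size, volume_size, halo):
        """Return ((x_lo, x_hi), (z_lo, z_hi)) index bounds of a sub-volume
        extended by `halo` ghost points on each side, clamped to the volume.
        block_xz is the (x, z) block index; Y is streamed and not partitioned."""
        (bx, bz) = block_xz
        (nx, nz) = volume_size
        x_lo = max(bx * block_size - halo, 0)
        x_hi = min((bx + 1) * block_size + halo, nx)
        z_lo = max(bz * block_size - halo, 0)
        z_hi = min((bz + 1) * block_size + halo, nz)
        return (x_lo, x_hi), (z_lo, z_hi)

    # Example: 320-point blocks of a 1600 x 1600 X-Z plane with a 4-point halo.
    print(extended_block_range((1, 2), 320, (1600, 1600), 4))
    # -> ((316, 644), (636, 964))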

Similar to the MM-M kernel case, the Event Manager of AWE monitors two events. One event, "Block Ends" (EVbe), is sourced from the CFG MEM-OUT-CONTROL and ends the execution of the kernel, while the other event, "Block Request" (EVrb), comes from the CFG MEM-IN-CONTROL and initiates a new request for a block. Inside the control structure, the FSM updates three tuples of parameters corresponding to the three input volumes. Each tuple consists of the base pointer of the volume (FSx) and the block number (FBvx). These tuples are used by the Data-Set Manager to access external data through the programmable memory controller. The flow of data between the Data-Set Manager and the CFG MEM-IN-CONTROL is synchronized with handshake signals between the two interfaces.

C. Smith-Waterman (SW)

The implementation of the Smith-Waterman algorithm results in a systolic array of processing cells. This kind of data flow is also well suited to the compute blocks of the TARCAD architecture. The left part of Figure 6 shows a TARCAD based systolic array of processing cells, mapped by joining a number of compute blocks to run the SW kernel. Each of the compute blocks consists of an algorithm specific processing cell. This processing cell, in our case, follows the Smith-Waterman compute architecture proposed by Hasan et al. [11]. The input data for a compute block constitutes a single branch set that consists of Ax and By (the two sequences) and Mup and MDiag (the top and diagonal elements) from the similarity matrix. MLD represents the current data passed through the LM to the next compute block as the left-side matrix-M data. This data word is also passed in a staircase flow to be used as a diagonal data element. The generic layout of the compute block in TARCAD is shown in Figure 6 (right). Each compute block keeps a dual ported local memory (LM) to communicate low latency data with other compute blocks. Each word of this local memory is accompanied by a valid bit which describes the validity of the data written to it. This valid bit is invalidated by the receiving compute block. If there is more than one receiving block, only one of them drives the invalidation port of the source compute block and the others work synchronously with it. Inside a compute block, the LM is written as a circular buffer; therefore the invalidation of the valid bit cannot create read/write hazards on the LM data between the source and destination for a few consecutive cycles. The width and depth of the LM are parameterized and can be decided at translation time. Moreover, each compute block also has a local memory read and invalidate control (LM R/I Ctrl), which helps to read and invalidate a word of the source block's LM. The read word is placed into a FIFO which is readable by the compute block's algorithm specific processing cell.
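For reference, the per-cell update that such a processing cell implements is the standard Smith-Waterman recurrence with a linear gap penalty, sketched below in Python. This is the textbook formulation and is only meant to clarify what Mup, MDiag and the left value contribute to each cell; the exact scoring scheme of the cell design in [11] may differ.

    def sw_cell(a, b, m_diag, m_up, m_left, match=2, mismatch=-1, gap=1):
        """One Smith-Waterman cell update with a linear gap penalty.
        m_diag, m_up and m_left are the similarity-matrix values coming from the
        diagonal, top and left neighbours (MDiag, Mup and the LM-forwarded MLD)."""
        substitution = match if a == b else mismatch
        return max(0,
                   m_diag + substitution,
                   m_up - gap,
                   m_left - gap)

    # Each systolic cell applies this update once per cycle as the two sequences
    # stream through the array; the running maximum gives the alignment score.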
Figure 6. Smith-Waterman: (left) the systolic array of compute blocks and (right) the architectural support for inter-compute-block communication (local memory LM with valid bits and the LM read/invalidate control).

D. Fast Fourier Transform (FFT)

The TARCAD layout is flexible and can also integrate third party cores. For the FFT case, Figure 7 shows how TARCAD interfaces with an FFT core generated by Xilinx CoreGen [12]. TARCAD interfaces with and controls the single or multiple input/output data streams corresponding to one or more instantiations of the FFT cores.

Figure 7. Mapping an existing FFT core on TARCAD.

E. Sparse Matrix-Vector Multiplication (SpMV)

In our TARCAD based mapping of the SpMV kernel, we use an efficient architecture based on a row-interleaved input data flow described by Dickov et al. [13]. TARCAD's FSM in the CFG MEM-IN-CONTROL takes a standard generic sparse matrix format and converts it internally to the row-interleaved format before feeding the compute block. However, this methodology needs to know in advance (at translation time) the maximum possible number of non-zero elements in any row of the matrix. This information helps the translator to correctly estimate the maximum number of rows that can be decoded and maintained inside the SpMV memory layout.
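The row-interleaved flow described above can be modelled in software roughly as follows: non-zero elements from a window of rows are emitted round-robin, so that consecutive elements delivered to an accumulating pipeline belong to different rows. The Python sketch below is only a simplified illustration of the idea behind [13], written against a CSR-style input; the actual hardware format and the conversion performed by CFG MEM-IN-CONTROL are not specified here.

    def row_interleave(values, col_idx, row_ptr, rows_in_flight):
        """Yield (row, column, value) triples from CSR arrays, interleaving the
        non-zeros of `rows_in_flight` consecutive rows in round-robin order."""
        n_rows = len(row_ptr) - 1
        for base in range(0, n_rows, rows_in_flight):
            window = list(range(base, min(base + rows_in_flight, n_rows)))
            cursors = {r: row_ptr[r] for r in window}
            pending = True
            while pending:
                pending = False
                for r in window:
                    if cursors[r] < row_ptr[r + 1]:
                        k = cursors[r]
                        cursors[r] += 1
                        pending = True
                        yield (r, col_idx[k], values[k])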

F. Multiple Kernels on TARCAD

TARCAD can handle multiple algorithms working at the same time. In general, each algorithm should be maintained with separate data paths, memory organization and compute units. Only the data requests to the global memory (through the Data-Set Manager) are shared. However, design schemes like the spatially mapped, shared memory layout recently presented by Shafiq et al. [8] could help to use shared data for certain kernels with different types of compute block instantiations.

IV. EVALUATION METHODOLOGY

To evaluate the TARCAD system, we simulate the mappings of the application kernels presented in Section III using a Xilinx Virtex-6 XC6VSX475T device.

Table I. Applications mapped to TARCAD using Virtex-6 & ISE 12.4.

    Application   Compute Blocks   Freq (MHz)   DSP48E1    Slices    BRAMs (36Kb)
    M-M Mul       403              105          2015       49757     432
    AWE Solver    22               118          2008       45484     677
    SW            4922             146          2012       63989     85
    FFT           4-48             125          2016-472   59K-48K   0-1060
    SpMVM         134              115          2010       33684     516

The HDL designs were placed and routed using the ISE 12.4 environment. The Virtex-6 device used in our evaluations has a very large number (more than 2K) of DSP48E1 modules. Therefore, we instantiated the maximum possible number of compute blocks for each kernel and used the device's maximum post place-and-route operational frequency for all the back-end instantiations. The external memory support for TARCAD depends on the board design. In our simulated evaluations we use an aggressive external memory interface with multiple memory controllers, providing an aggregate peak bandwidth between 100 GB/s and 144 GB/s. This external memory interface performance is similar to what GPUs can achieve today. In our evaluations, the efficiency of the application kernels mapped on the TARCAD layout is compared with state of the art implementations of the same kernels on various GPU devices. The choice of the GPU based implementation is based on two points: the GPU implementation should be the best available one for the best possible GPU device, and we should be able to reproduce the same input test data for the TARCAD based implementations. The architectural efficiencies shown in Section V (Figure 8(b)-(f)) are defined differently for kernels using floating point computations and kernels using cell updates:

Arch. efficiency for kernels with FP operations = GFLOPS achieved / device max. GFLOPS

Arch. efficiency for kernels with CUPS = CUPS achieved / operational frequency
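The two efficiency metrics above can be written directly as code. The small Python helpers below simply restate those formulas; they are included only to make the units explicit.

    def arch_efficiency_flops(gflops_achieved, device_peak_gflops):
        """Architectural efficiency for floating-point kernels:
        achieved GFLOPS divided by the device's peak GFLOPS."""
        return gflops_achieved / device_peak_gflops

    def arch_efficiency_cups(cups_achieved, operational_freq_hz):
        """Architectural efficiency for cell-update kernels (e.g. Smith-Waterman):
        achieved cell updates per second divided by the operational frequency,
        i.e. cell updates per cycle."""
        return cups_achieved / operational_freq_hz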

V. RESULTS & DISCUSSION

The overall performance (Figure 8(a)) of the various kernels mapped on TARCAD remained below 100 GFLOPS (or GCUPS for SW). This is considerably lower than the reference performance on GPUs. In fact, this is expected, as current reconfigurable technology operates at an order of magnitude lower operational frequency (see Table I) for the mapped designs. However, the efficiency of the TARCAD mapped applications is quite promising due to the customized arrangement of data and compute blocks. In the following we discuss the efficiency of each kernel. In support of the discussion, the total number of compute units instantiated, their operational frequencies and the chip resource usage are given in Table I. The numbers for FFT correspond to implementations from 128 points to 65536 points, and the frequency given is the lowest value.

1) Matrix-Matrix Multiplication (MM-M): In the case of MM-M, we can observe from Figure 8(b) that the efficiency of the TARCAD based implementation is on average 4 times higher than that of the GPU. However, for smaller matrices the efficiency is relatively lower because of two factors: either the number of columns in matrix B is less than 403 (the total number of compute block instantiations), or the number of columns is not a multiple of 403. Both cases lead to sub-optimal usage of the available compute units on TARCAD.

2) Acoustic Wave Equation (AWE) Solver: The TARCAD mapped memory layout for the AWE kernel can handle sub-volumes of size 320 × 320 × ∞ in the Z, X and Y axes respectively. The results for AWE (Figure 8(c)) show that the TARCAD based AWE kernel efficiency reaches 14 times that of the GPU based implementation. However, it drops to as low as 5× for 384-point 3D volumes. This is because 384 is not a multiple of the basic size (320 × 320 × ∞) of the AWE specialized memory layout, which incurs a large data and computational overhead. This penalty shrinks as the size of the actual input volumes increases.

3) Smith-Waterman (SW): The Smith-Waterman implementation on TARCAD is approximately 3 times more efficient (Figure 8(d)) than the referenced GPU based implementation. In fact, this edge in architectural efficiency is purely a result of the customized mapping of the computing cells and the systolic array. The front-end data management only buffers new sequences for comparison and feeds back the results from the cells on the boundary of the systolic array through the CFG MEM-OUT-CTRL, Data-Set Manager and CFG MEM-IN-CTRL path.

4) Fast Fourier Transform (FFT): The memory requirement of the floating point, streaming based implementation of Xilinx's FFT core increases rapidly for larger numbers of points. In the TARCAD based mapping, the instantiations of the FFT kernel for 16384 or more points are limited by the total available BRAM of the device. This limitation is apparent in the plot shown in Figure 8(e). For lower numbers of points (8192 or fewer), the number of FFT compute block instantiations is dictated by the total number of DSP48E modules available on the device.

5) Sparse Matrix-Vector Multiplication (SpMV): In the SpMV mapping on TARCAD, we modified the original design of Dickov et al. [13] into a special yet generic compute block for handling any kind of Laplacian data. This design has a three-point front-end which accumulates three dot products at a time from a row. However, inefficiencies for this Laplacian specific compute block appear when the number of non-zero diagonals in the Laplacian matrix is not a multiple of 3.

VI. RELATED WORK

The topic of developing a compute template for FPGAs is related to many areas of research. In Section II-A we already presented a brief look into several recent developments that are directly related to this paper. Here we provide a succinct overview of other related developments.

Figure 8. (a) Performance numbers for the TARCAD based kernels (GFLOPS, and GCUPS for SW) using the Virtex-6 XC6VSX475T device, and (b)-(f) the architectural efficiency for: (b) MM-M (GPU: Tesla C2050 in [14]), (c) AWE / 3D Reverse Time Migration (GPU: Tesla C1060 in [10]), (d) SW (GPU: Tesla C1060 in [15]), (e) FFT (GPU: Tesla C2050 in [14]), (f) SpMV (GPU: GTX 280, cache enabled, in [16]).

TARCAD defines both a high-level model for the computation flow and a strategy for organizing resources, managing the parallelism in the implementation, and facilitating optimization and design scaling. Following DeHon's taxonomy, these two correspond to the fields of compute models and system architectures [17]. Compute models include abstractions such as Dataflow, Sequential Control or Hoare's CSP. System architectures roughly consist of Dataflow Machines, von Neumann computers and data parallel architectures such as SIMD, SPMD or SIMT machines. Field Programmable Gate Arrays offer a raw and unconfigured computation substrate that allows all of the previous models to be mapped on a chip. This provides great flexibility but, as already discussed in this paper, at the cost of many design overheads. The raw logic and routing hardware also creates performance bottlenecks, as it imposes a considerable area penalty and limits the frequency at which a circuit can operate. As a consequence, many researchers have attempted to reduce the flexibility and improve frequency by designing new reconfigurable hardware with reduced interconnection networks and more full-custom functional units. Such chips are often called Coarse-Grained Reconfigurable Architectures (CGRAs). Similarly to TARCAD, they also define stricter compute models and system architectures. PipeRench [18], MuCCRA [19] and ADRES [20] are examples of CGRA architectures. A related class of architectures are the so-called Massively Parallel Processor Arrays (MPPAs), which are similar to CGRAs but include complete, although very simple, processors instead of the functional units featured within CGRAs. PACT XPP [21] is an example of an MPPA-style architecture.

Defining a compute model and a system architecture is not specific to chip design alone. Several efforts have concentrated on defining environments in which to accommodate

FPGA chips. Kelm et al. [22] used a model based on local input/output buffers on the accelerator with DMA support to access external memory. Brandon et al. [23] propose a platform-independent approach by managing a virtual address space inside their accelerator. Several commercially available machines, such as the SGI Altix 4700 [24] or the Convey HC-1 [25], propose system level models to accelerate application kernels using FPGAs. These models combine a CPU with one or multiple FPGAs connected over a system bus. Another option is to integrate the CPU and the FPGA directly in a single chip. Several research projects have covered this possibility. In the Chimaera architecture [26], the accelerator targets special instructions that tell the microprocessor to execute the accelerator function. The accelerator in the Molen processor [27] uses exchange registers which get their data from the processor register file. The major FPGA vendors are now introducing new FPGAs that include processor cores together with FPGA logic and that are specifically designed to be used with High Level Synthesis tools. The Zynq-7000 family of devices is a recent commercial architecture that combines 7-Series reconfigurable logic with ARM cores.

VII. CONCLUSIONS

In this paper we have presented our developments towards a unified accelerator design framework for FPGAs that improves FPGA design productivity and portability without constraining customization. The evaluation on several scientific kernels shows that the template makes efficient use of resources and achieves good performance. In this work we have focused on showing how these properties are achieved for HDL-based designs. Our TARCAD design also considers adoption by High Level Synthesis tools as a main goal, in order to provide interoperability and

high customization to such tools. In the future we plan to analyze this possibility in more detail. Although we have shown that TARCAD is more efficient than GPUs, its final performance is often lower due to the slower operational frequencies of FPGAs. Designing a coarse-grained reconfigurable architecture (CGRA) based on the TARCAD architecture is an interesting idea to improve final performance that could be explored in the future.

REFERENCES
[1] B. Cope, P. Y. Cheung, W. Luk, and L. Howes, "Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study," IEEE Transactions on Computers, August 2009.
[2] M. Garland and D. B. Kirk, "Understanding throughput-oriented architectures," Commun. ACM, vol. 53, pp. 58-66, November 2010.
[3] M. Shafiq, M. Pericàs, R. de la Cruz, M. Araya-Polo, N. Navarro, and E. Ayguadé, "Exploiting Memory Customization in FPGA for 3D Stencil Computations," IEEE FPT, December 2009.
[4] B. Buyukkurt, J. Cortes, J. Villarreal, and W. A. Najjar, "Impact of high-level transformations within the ROCCC framework," ACM Trans. Archit. Code Optim., December 2010.
[5] P. Coussy and D. Heller, "GAUT - High-Level Synthesis tool from C to RTL."
[6] J. E. Smith, "Decoupled access/execute computer architectures," in Proceedings of the 9th Annual Symposium on Computer Architecture (ISCA '82), Los Alamitos, CA, USA: IEEE Computer Society Press, 1982, pp. 112-119.
[7] T. Hussain, M. Pericàs, and E. Ayguadé, "Reconfigurable Memory Controller with Programmable Pattern Support," HiPEAC WRC, Heraklion, Crete, January 2011.
[8] M. Shafiq, M. Pericàs, N. Navarro, and E. Ayguadé, "FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators," 2010.
[9] M. Shafiq, M. Pericàs, N. Navarro, and E. Ayguadé, "A Template System for the Efficient Compilation of Domain Abstractions onto Reconfigurable Computers," HiPEAC WRC, Heraklion, Crete, January 2011.
[10] M. Araya-Polo, J. Cabezas, M. Hanzich, M. Pericàs, F. Rubio, I. Gelado, M. Shafiq, E. Morancho, N. Navarro, E. Ayguadé, J. M. Cela, and M. Valero, "Assessing Accelerator-Based HPC Reverse Time Migration," IEEE Transactions on Parallel and Distributed Systems, vol. 22, pp. 147-162, 2011.
[11] L. Hasan, Y. M. Khawaja, and A. Bais, "A Systolic Array Architecture for the Smith-Waterman Algorithm with High Performance Cell Design," Proceedings of the IADIS European Conference on Data Mining, 2008.
[12] Xilinx, ISE Design Suite CORE Generator IP Updates. [Online]. Available: http://www.xilinx.com/ipcenter/coregen/updates.htm

[13] B. Dickov, M. Pericàs, N. Navarro, and E. Ayguadé, "Row-interleaved streaming data flow implementation of Sparse Matrix Vector Multiplication in FPGA," in 4th Workshop on Reconfigurable Computing (WRC-2010), 2010.
[14] NVIDIA, "Tesla C2050 Performance Benchmarks," Tech. Rep., 2010. [Online]. Available: www.siliconmechanics.com/files/C2050Benchmarks.pdf
[15] NVIDIA, "CUDASW++ on Tesla GPUs," 2010. [Online]. Available: http://www.nvidia.com/object/swplusplus_on_tesla.html
[16] N. Bell and M. Garland, "Efficient sparse matrix-vector multiplication on CUDA," NVIDIA Technical Report NVR-2008-004, Dec. 2008.
[17] S. Hauck and A. DeHon, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, November 2007.
[18] S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer, "PipeRench: a co/processor for streaming multimedia acceleration," in Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA '99), Washington, DC, USA: IEEE Computer Society, 1999, pp. 28-39.
[19] Y. Saito, T. Sano, M. Kato, V. Tunbunheng, Y. Yasuda, M. Kimura, and H. Amano, "MuCCRA-3: a low power dynamically reconfigurable processor array," in Proceedings of the 2010 Asia and South Pacific Design Automation Conference (ASP-DAC '10), Piscataway, NJ, USA: IEEE Press, 2010, pp. 377-378.
[20] J. Bormans, "ADRES Architecture - Reconfigurable Array Processor," Chip Design Magazine, November 2006.
[21] V. Baumgarte, G. Ehlers, F. May, A. Nückel, M. Vorbach, and M. Weinhardt, "PACT XPP - a self-reconfigurable data processing architecture," J. Supercomput., vol. 26, pp. 167-184, September 2003.
[22] J. Kelm, I. Gelado, K. Hwang, D. Burke, S.-Z. Ueng, N. Navarro, S. Lumetta, and W.-m. Hwu, "Operating System Interfaces: Bridging the Gap between CPU and FPGA Accelerators," poster at the International Symposium on FPGAs (FPGA '07), 2007.
[23] A. Brandon, I. Sourdis, and G. N. Gaydadjiev, "General Purpose Computing with Reconfigurable Acceleration," International Conference on Field Programmable Logic and Applications, 2010.
[24] SGI, "Reconfigurable Application-Specific Computing User Guide," Tech. Rep., 2008.
[25] Convey Computer Corporation, "The Convey HC-1: The World's First Hybrid-Core Computer," HC-1 Data Sheet, 2008.
[26] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao, "The Chimaera reconfigurable functional unit," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, pp. 206-217, 2004.
[27] S. Vassiliadis, S. Wong, G. Gaydadjiev, K. Bertels, G. Kuzmanov, and E. M. Panainte, "The MOLEN Polymorphic Processor," IEEE Transactions on Computers, vol. 53, pp. 1363-1375, 2004.

Similar Documents

Premium Essay

Science

...Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science Science science science science science...

Words: 460 - Pages: 2

Premium Essay

The Importance Of Science In Science

...Our project specifically focused on the DNA sequence analysis of the genes in duckweed and how those genes fit into bioremediation. As an intimidated and shy freshman with a strong yearning to be part of the science community at my school, decided to join the intensive program with barely any knowledge about biology. It didn't seem like a smart move at the time, but I'm glad that I had persistence because I learned so much about the field through the guidance from the team. Exploring PCR and Restriction Digests, my group and I were able to publish new proteins on the national GenBank. I learned more about biology and how to work/perform in a team. This motivated me to join the Science/Biology Olympiads where we do independent research and come together as a team to compete. I found that these opportunities along with the research in my science classes not only help me learn actual science but provide me with valuable life skills that will help in the...

Words: 918 - Pages: 4

Premium Essay

Science

...The word 'science' is derived from the Latin word 'scientia' which means knowledge. Therefore, science is about gaining knowledge either through observing, studying, experience, or practice. Entire knowledge acquired through science is about discovering truths, finding facts, uncovering phenomenon hidden by the nature. Observations and experimentation, in science, support in describing truth and realities through systematic processes and procedures. For me, science is an intellectual set of activities designed to uncover information about anything related to this world in which we live. The information gathered is organized through scientific methods to form eloquent patterns. In my opinion the primary objective of science is to gather information and to distinguish the order found between facts. What Science Means to Me as an Upcoming Scientist Science exposes several ideas along with significant themes so that I could test them independently and without any bias to arrive at solid conclusion. For this purpose exchange of data and materials is necessary. I am able to generate real and tangible facts supported by reliable evidence. Work of scientist is based on theoretical science. It means, in theoretical science, there is only a sign, just a hint on which discoveries could be made, facts could be found. While studying science I am always working for determining truth, based on my perceptions, judgment, observation, experience, and knowledge collected through several means...

Words: 1529 - Pages: 7

Free Essay

Science

...Science is the concerted human effort to understand, or to understand better, the history of the natural world and how the natural world works, with observable physical evidence as the basis of that understanding1. It is done through observation of natural phenomena, and/or through experimentation that tries to simulate natural processes under controlled conditions. (There are, of course, more definitions of science.) Consider some examples. An ecologist observing the territorial behaviors of bluebirds and a geologist examining the distribution of fossils in an outcrop are both scientists making observations in order to find patterns in natural phenomena. They just do it outdoors and thus entertain the general public with their behavior. An astrophysicist photographing distant galaxies and a climatologist sifting data from weather balloons similarly are also scientists making observations, but in more discrete settings. The examples above are observational science, but there is also experimental science. A chemist observing the rates of one chemical reaction at a variety of temperatures and a nuclear physicist recording the results of bombardment of a particular kind of matter with neutrons are both scientists performing experiments to see what consistent patterns emerge. A biologist observing the reaction of a particular tissue to various stimulants is likewise experimenting to find patterns of behavior. These folks usually do their work in labs and wear impressive...

Words: 306 - Pages: 2

Premium Essay

Science

...Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science Science...

Words: 462 - Pages: 2

Premium Essay

Science

...Chapter 132 - Science and Technology Section SCIENCE AND TECHNOLOGY Science and technology provide people with the knowledge and tools to understand and address many of the challenges. Students must be provided with opportunities to access, understand, and evaluate current information and tools related to science and technology if they are to be ready to live in a 21st century global society. The study of science and technology includes both processes and bodies of knowledge. Scientific processes are the ways scientists investigate and communicate about the natural world. The scientific body of knowledge includes concepts, principles, facts, laws, and theories about the way the world around us works. Technology includes the technological design process and the body of knowledge related to the study of tools and the effect of technology on society. Science and technology merge in the pursuit of knowledge and solutions to problems that require the application of scientific understanding and product design. Solving technological problems demands scientific knowledge while modern technologies make it possible to discover new scientific knowledge. In a world shaped by science and technology, it is important for students to learn how science and technology connect with the demands of society and the knowledge of all content areas. It is equally important that students are provided with learning experiences that integrate tools, knowledge, and processes of science and technology...

Words: 8232 - Pages: 33

Free Essay

Science

...Blueprint to address Australia’s lack of science strategy unveiled Chief scientist makes series of recommendations to improve the country’s skills in science, technology, engineering and maths Australia’s chief scientist, Ian Chubb, has unveiled a blueprint to address Australia’s lack of a science strategy, with proposals aimed at improving skills, supporting research and linking scientific work to other countries. Chubb has made a series of recommendations to the federal government to increase focus on science, technology, engineering and maths skills. The strategy is partially aimed at addressing the declining number of students taking advanced maths in year 11 and 12, as well as the shortage of qualified maths and science teachers. Chubb said each primary school should have at least one specialist maths and science teacher, a policy currently used in South Australia and Victoria. This would be encouraged by improving incentives, including pay, for teachers. Other recommendations include supporting research potential, improving research collaboration with other countries and doing more to stress the importance of science to businesses and students. Chubb said: “We are the only OECD country without a science or technology strategy. Other countries have realised that such an approach is essential to remaining competitive in a world reliant on science and science-trained people. “Science is infrastructure and it is critical to our future. We must align our scientific effort...

Words: 510 - Pages: 3

Free Essay

Science

...1. Describe how fishing has changed at Apo Island, and the direct and indirect effects on people’s lives. Apo Island’s marine preserve allows fishing with hand-held lines, bamboo traps, large mesh nets, spear fishing without SCUBA gear, and hand netting. Fishing with dynamite, cyanide, trawling, and Muro-ami are forbidden. This has increased fish populations and made it easier to catch the fish needed to support a family. The healthy reef community now attracts ecotourists and provides jobs for islanders. 2. What are some basic assumptions of science? 3. Distinguish between a hypothesis and a theory. A hypothesis is the second step from the scientific method that forms an educated guess based off an observation. A theory is the information that was gathered to support the proof of an observation and confirms the hypothesis. 4. Describe the steps in the scientific method. 7. What’s the first step in critical thinking? The first step in critical thinking is 8. Distinguish between utilitarian conservation and biocentric preservation. Name two environmental leaders associated with each of these philosophies. Biocentric preservation emphasizes the fundamental right of living organisms to exist and to pursue their own good. While utilitarian conservation emphasized that resources should be used for the greater good for the greatest number for the longest time. Two environmental leaders associated with the biocentric preservation philosophy are John Muir...

Words: 294 - Pages: 2

Free Essay

Science

...SCIENCE My second month in Gusa Regional Science High School! Do you want to know what are the activities and what have I learn this month? As we all know this month is “Nutrition Month,” so I am excited what are the activities that would be held in celebrating the nutrition month. Come! and let us know what happened this July. On the first day of July we answer our wortext. We answer page 17, 1-5 in ½ lengthwise. The next day we had a contest about the scientist. We were gouped into two groups, group a and group b. Group a scored 27 while group b scored 31. Group b win with the score of 31, while group a lose with the score of 27. Group a’s punishment is they have to dance. The boys did it but the girls pleaded that they will just sing rather than dance. Teacher Cass agreed, and in the middle of singing “Nasayo Na Ang Lahat,”Teacher Cass gestured to the boys to join the girls singing. The boys didn’t insist in joining the girls. On Thursday, the rain was falling hard so teacher Cass is the one who come to us. We were trapped in Teacher Lory’s classroom. We had another game same us what we did yesterday. This time its boys vs girls. The girls won the game and as expected boys got a punishment. Their punishment was they did a fashion show. Some...

Words: 372 - Pages: 2

Premium Essay

Science

...Science: A Blessing Or A Curse Everything in the universe has its uses and abuses. The same applies to science. Science has revolutionized human existence and has made it happier and more comfortable. Modern science has many wonders. Electricity is one of its greatest wonders. It is a source of energy. It can run any type of machinery. With the help of electricity, we can light our rooms, run buses and trains and machinery, lift water for irrigation and can accomplish a multitude of other tasks. Much of the progress that mankind has made in different fields, right from the stone age to the modern age, is due to the progress made in the field of science. Not only material progress but also the mental outlook of man has been influenced by it. Agriculture, business, transport, communication and medicine, to name a few, are all highly indebted to the wonders of science. We have become scientifically much more advanced than our ancestors. This is because the world has undergone a tremendous change because of the rapid strides made by science and technology. The discovery and development of a large number of powerful energy sources – coal, petroleum, natural gas, electricity etc. – have enabled humanity to conquer the barriers of nature. All these have facilitated the growth of fast modes of transport and communication, which have metamorphosed the world into a global village. Science has given man the means of travelling to the moon. Science is a great help in the agricultural field...

Words: 2098 - Pages: 9

Premium Essay

Science

...Blessing of Science Blessings of science are numerous. Science has completely changed the living style of man. Man is now living in a totally different age. From home to office, from farm to factory, from village to town, in short, everywhere in life we can now see the unlimited blessings of science. At home, we find that science has provided many comforts to human beings. Whether in the kitchen or the lounge, the shaker, the chopper, the toaster and many other appliances have brought a revolution to the working of a kitchen and the life of a housewife. Although it is a fact that science cannot fight fate and often fails to defeat nature, it has done a lot to minimize nature’s disastrous effects. Scientists have invented machines such as the air conditioner and the heater that can give comfort to man in hot summers and in extreme winters respectively. Now there are instruments which can warn man against floods, earthquakes and windstorms. After getting such warnings, human beings are able to take preventive measures. Travelling and transportation were very difficult and painful in the past, but now the miracles of science have made travelling a luxury. There is now a variety of means of transportation, such as buses, cars, trains and aeroplanes, that have decreased distances and made the journey a comfort. Now hundreds of people can travel from their own country to another in one train or one aeroplane. The distance that could be covered by the people in the months...

Words: 1787 - Pages: 8

Premium Essay

Science

...Advantage Science gives us safe food, free from harmful bacteria, in clean containers or hygienic tins. It also teaches us to eat properly, indicating a diet balanced in protein and carbohydrate and containing vitamins. The result is freedom from disease and prolonged life. In pre-scientific days, food was monotonous and sometimes dangerous; today it is safe and varied. It is varied because, through improved sea, land and air transport, food can now be freely imported and exported. Science has also improved clothing and made it more appropriate for climatic and working conditions. Man-made fibers and versatile spinning machines today enable us to dress in clothes both comfortable and smart without being expensive. Home, school and office all bear witness to the progress and application of science. Nowadays, most homes possess electric lighting and cooking, but many also have washing machines, vacuum cleaners and kitchen appliances, all designed to increase comfort and cleanliness and reduce drudgery. Science produces the fan which cools the air, the machinery which makes the furniture and fabrics, and a hundred and one other features for good living. Books and papers are at school, and again everything from the piece of chalk to the closed-circuit television used for instruction is the direct or indirect result of scientific progress. Learning is therefore easier. And clerical work is made far more speedy and efficient by the office typewriter, quite apart from the hundreds of...

Words: 572 - Pages: 3

Free Essay

Science

...One of the major shortcomings of science supposedly is a lack of communication between scientists and the general public. Many argue that too often, science is presented only in written academic journals that are not easily obtained by the general public. This is discussed on a daily basis and was argued in the aftermath of the 2011 earthquake and tsunami in Japan, as well as in ongoing debates about other scientific theories and ideas. However, people fail to realize a few things. One of the major ones is that events like earthquakes and tsunamis simply cannot be predicted. You cannot blame scientists for not being able to predict an earthquake the way a meteorologist can predict weather events. Scientists can study things like seismic activity, and they can make assumptions as to what may happen should an earthquake of a high magnitude hit and cause something catastrophic like a tsunami. Yet some fail to realize that safety measures were taken, and even inspectors who visited the Fukushima nuclear power plant asked Japanese authorities to increase safety measures further. According to a France24 news article written three months after the catastrophe, “A three-page summary was issued at the end of the 18-member team’s May 24-June 2 inspector mission to Japan. It said the country underestimated the threat from tsunamis to the Fukushima plant and urged sweeping changes to its regulatory system. Japanese authorities have been criticised for...

Words: 769 - Pages: 4

Premium Essay

Science

...In this essay I will focus on the events surrounding the regulation of Alar (daminozide) up to and including 1985, as a case study of knowledge and decision-making amidst uncertainty (418-19). I pick this time period in particular because it is when the NRDC and other public interest groups began their campaign in protest against the EPA's decision not to ban Alar. My analysis of the events surrounding Alar will take shape around a critique of Michael Fumento's article "Environmental Hysteria: The Alar Scare," in which he paints the NRDC as "fanatics" launching a "smear campaign" not founded in any rational decision-making. This is an important argument to counter, because it has not only been taken up by many to condemn citizen-group action in the case of Alar, but also to criticize their activities in many other regulatory processes. The chief framework used to devalue public action in these cases is the technocratic model, wherein it is believed that decisions are best made by objective, rational experts acting on scientific knowledge. In this case, we can see a perfect example of a decision made by scientific experts, in accordance with the technocratic model. Fumento and other supporters of the technocratic model privilege the scientific knowledge of bodies such as the Scientific Advisory Panel in this case over other forms of knowledge. He denounces the NRDC as fanatics based on his claim that they acted in spite of, and in contradiction to, scientific...

Words: 2159 - Pages: 9

Premium Essay

Science

...Scientific papers are for sharing your own original research work with other scientists or for reviewing the research conducted by others. As such, they are critical to the evolution of modern science, in which the work of one scientist builds upon that of others. To reach their goal, papers must aim to inform, not impress. They must be highly readable — that is, clear, accurate, and concise. They are more likely to be cited by other scientists if they are helpful rather than cryptic or self-centered. Scientific papers typically have two audiences: first, the referees, who help the journal editor decide whether a paper is suitable for publication; and second, the journal readers themselves, who may be more or less knowledgeable about the topic addressed in the paper. To be accepted by referees and cited by readers, papers must do more than simply present a chronological account of the research work. Rather, they must convince their audience that the research presented is important, valid, and relevant to other scientists in the same field. To this end, they must emphasize both the motivation for the work and the outcome of it, and they must include just enough evidence to establish the validity of this outcome. Papers that report experimental work are often structured chronologically in five sections: first, Introduction; then Materials and Methods, Results, and Discussion (together, these three sections make up the paper's body); and finally, Conclusion. The Introduction...

Words: 373 - Pages: 2